Abstract:
Heterogeneous semi-structured tables are commonly used to represent data on the internet. Recent years have seen a flurry of work on tasks that aim to comprehend such tabular information, including table summarization, tabular question answering, and tabular fact-checking. In this work, we propose a new task in tabular data analysis called ‘Populating Semi-structured Tables’: given a partially filled table and related content, the aim is to generate text for the table's missing cells. Most approaches that reason over semi-structured tables use transformer-based sequence-to-sequence models, but the hierarchical structure and long-tailed nature of tables seem to limit the performance of language models. We therefore extend traditional sequence-to-sequence models and propose sequence-to-multi-sequence models, which handle the contents of multiple missing cells that are partially dependent on each other. Our inspiration comes from systems used for one-to-many sequence transduction problems with speech data, an approach that has yet to be explored for natural language generation tasks. The results show that our model, ‘Multiple Cell Filler’ (MuCeF), outperforms the strongest baseline by 15.44 in ROUGE score and 34.54 in METEOR score. Resources related to this work will be open-sourced for further research.