Zichao Li

Akshay Kalkunte

François Savard

Amirhossein Abaskohi

Pierre-Andre Noel

Shubham Agarwal

Sanket Biswas … (see 23 more)

Sara Shanian

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Sepideh Kharaghani

Sean Hughes

M. Özsu

Issam Hadj Laradji

Spandana Gella

Perouz Taslakian

David Vázquez

Sai Rajeswar

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .

2024-12-05

ArXiv (preprint)

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Juan Rodriguez

Xiangru Jian

Akshay Kalkunte

François Savard

Amirhossein Abaskohi

Pierre-Andre Noel

Shubham Agarwal

Sanket Biswas … (see 23 more)

Sara Shanian

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Sepideh Kharaghani

Sean Hughes

M. Özsu

Issam Hadj Laradji

Spandana Gella

Perouz Taslakian

David Vázquez

Sai Rajeswar

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .

2024-12-05

ArXiv (preprint)

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Juan Rodriguez

Xiangru Jian

Akshay Kalkunte

François Savard

Amirhossein Abaskohi

Pierre-Andre Noel

M. L. Richter

Saverio Vadacchino

Shubham Agarwal

Sanket Biswas … (see 23 more)

Sara Shanian

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Sepideh Kharagani

Sean Hughes

M. Özsu

Issam Hadj Laradji

Spandana Gella

Perouz Taslakian

David Vázquez

Sai Rajeswar

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .

2024-12-05

ArXiv (preprint)

BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks

Juan A. Rodriguez

Xiangru Jian

Akshay Kalkunte Suresh

Amirhossein Abaskohi

Pierre-Andre Noel

Sanket Biswas … (see 23 more)

Sara Shanian

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi Madhusudhan

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Sepideh Kharaghani

Sean Hughes

M. Özsu

Issam Hadj Laradji

Sai Rajeswar

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .

2024-12-05

ArXiv (preprint)

BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks

Juan A. Rodriguez

Xiangru Jian

Akshay Kalkunte Suresh

Amirhossein Abaskohi

Pierre-Andre Noel

Sanket Biswas … (see 23 more)

Sara Shanian

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi Madhusudhan

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Sepideh Kharaghani

Sean Hughes

M. Özsu

Issam Hadj Laradji

Sai Rajeswar

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows, extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .

2024-10-10

NeurIPS.cc/2024/Workshop/RBFM (poster)

openreview.net

Do LLMs Build World Representations? Probing Through the Lens of State Abstraction

Zichao Li

Yanshuai Cao

Jackie Cheung

2024-09-25

NeurIPS.cc/2024/Conference (poster)

openreview.net

Evaluating Dependencies in Fact Editing for Language Models: Specificity and Implication Awareness

The potential of using a large language model (LLM) as a knowledge base (KB) has sparked significant interest. To maintain the knowledge acquired by LLMs, we need to ensure that the editing of learned facts respects internal logical constraints, which are known as dependency of knowledge. Existing work on editing LLMs has partially addressed the issue of dependency, when the editing of a fact should apply to its lexical variations without disrupting irrelevant ones. However, they neglect the dependency between a fact and its logical implications. We propose an evaluation protocol with an accompanying question-answering dataset, StandUp, that provides a comprehensive assessment of the editing process considering the above notions of dependency. Our protocol involves setting up a controlled environment in which we edit facts and monitor their impact on LLMs, along with their implications based on If-Then rules. Extensive experiments on StandUp show that existing knowledge editing methods are sensitive to the surface form of knowledge, and that they have limited performance in inferring the implications of edited facts.

2023-01-01

EMNLP (Findings) (published)

openreview.net

Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

2022-05-01

Findings of the Association for Computational Linguistics: ACL 2022 (published)

Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

Most research on question answering focuses on the pre-deployment stage; i.e., building an accurate model for deployment. In this paper, we ask the question: Can we improve QA systems further post-deployment based on user interactions? We focus on two kinds of improvements: 1) improving the QA system’s performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer. We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users. We collect this dataset by deploying a base QA system to crowdworkers who then engage with the system and provide feedback on the quality of its answers. The feedback contains both structured ratings and unstructured natural language explanations. We train a neural model with this feedback data that can generate explanations and re-score answer candidates. We show that feedback data not only improves the accuracy of the deployed QA system but also other stronger non-deployed systems. The generated explanations also help users make informed decisions about the correctness of answers.

2022-04-06

ArXiv (preprint)

EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing

Yue Dong

Zichao Li

Mehdi Rezagholizadeh

Jackie Cheung

We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-interpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our neural programmer-interpreter is directly trained to predict explicit edit operations on targeted parts of the input sentence, resembling the way that humans perform simplification and revision. Our model outperforms previous state-of-the-art neural sentence simplification models (without external knowledge) by large margins on three benchmark text simplification corpora in terms of SARI (+0.95 WikiLarge, +1.89 WikiSmall, +1.41 Newsela), and is judged by humans to produce overall better and simpler output sentences.

2019-07-01

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (published)