Zichao Li

Hierarchical Retrieval at Scale: Bridging Transparency and Efficiency

Tianyi Chen

Valentina Zantedeschi

Information retrieval is a core component of many intelligent systems as it enables conditioning of outputs on new and large-scale datasets.… (voir plus) While effective, the standard practice of encoding data into high-dimensional representations for similarity search entails large memory and compute footprints, and also makes it hard to inspect the inner workings of the system. Hierarchical retrieval methods offer an interpretable alternative by organizing data at multiple granular levels, yet do not match the efficiency and performance of flat retrieval approaches. In this paper, we propose ReTreever, a tree-based method that makes hierarchical retrieval viable at scale by directly optimizing its structure for retrieval performance while naturally providing transparency through meaningful semantic groupings. Our method offers the flexibility to balance cost and utility by indexing data using representations from any tree level. We show that ReTreever delivers strong coarse (intermediate levels) and fine representations (terminal level), while achieving the highest retrieval accuracy at the lowest latency among hierarchical methods. These results demonstrate that this family of techniques is viable in practical applications.

2026-03-01

Trustworthy AI @ International Conference on Learning Representations (publié)

doi.org

openreview.net

WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation

Christopher Pal

Juan A. Rodriguez

Sai Rajeswar

We present WebMMU, a multilingual benchmark that evaluates three core web tasks: (1) website visual question answering, (2) code editing inv… (voir plus)olving HTML/CSS/JavaScript, and (3) mockup-to-code generation. Unlike prior benchmarks that treat these tasks separately, WebMMU unifies them using expert-annotated, real-world web data to assess models'abilities in complex multi-step reasoning, precise element grounding, and functional UI comprehension and coding. Our evaluation shows that while multimodal large language models (MLLMs) perform well on basic information extraction, they struggle with reasoning and grounding, editing code to preserve functionality, and generating design-to-code that maintains hierarchy and supports multilingual content. These findings reveal key limitations in current MLLMs and underscore the need for improved multimodal and cross-lingual reasoning to build future web agents capable of automating diverse web development tasks.

2025-10-31

Conference on Empirical Methods in Natural Language Processing (publié)

doi.org

openreview.net

Partial Perspectives: How LLMs Handle Logically Inconsistent Knowledge in Reasoning Tasks

Zichao Li

Ines Arous

Jackie CK Cheung

Most natural language reasoning tasks in the research community assume consistent input knowledge. Nevertheless, real-world scenarios often … (voir plus)involve inconsistent information, which might lead to divergent conclusions and are typically associated with varying levels of uncertainty. This raises a key research question: can large language models (LLMs) effectively handle uncertainty in their reasoning process to maximize knowledge consistency? In this paper, we propose a framework for evaluating reasoning over inconsistent knowledge. Our approach models uncertainty via weights of logical rules, leveraging Markov logic networks (MLN), which integrate probabilistic reasoning with first-order logic. This enables us to quantify inconsistencies in knowledge bases, and hence rigorously evaluate LLM reasoning. We introduce two tasks using this framework: 1) QA, which involves answering questions by integrating inconsistent knowledge; and 2) knowledge rectification, where we aim to rectify language models' acquired knowledge to improve consistency. We curate a dataset of 3,000 MLN-formatted knowledge bases to implement these tasks. We evaluate state-of-the-art LLMs on these tasks and highlight their limitations in uncertainty-aware reasoning over inconsistent logical knowledge.

2025-07-06

colmweb.org/COLM/2025/Conference (accepté)

openreview.net

BigDocs: An Open Dataset for Training Multi-modal Models on Document and Code Tasks

Juan Rodriguez

Xiangru Jian

Siba Smarak Panigrahi

Akshay Kalkunte

Amirhossein Abaskohi

Pierre-Andre Noel

Sanket Biswas … (voir 23 de plus)

Sara Shanian

Ying Zhang

Noah Bolger

Kurt MacDonald

Simon Fauvel

Sathwik Tejaswi

Srinivas Sunkara

Joao Monteiro

Krishnamurthy Dj Dvijotham

Torsten Scholak

Nicolas Chapados

Sepideh Kharagani

Sean Hughes

M. Özsu

Christopher Pal

Sai Rajeswar

Multimodal AI has the potential to significantly enhance document-understanding tasks, such as processing receipts, understanding workflows,… (voir plus) extracting data from documents, and summarizing reports. Code generation tasks that require long-structured outputs can also be enhanced by multimodality. Despite this, their use in commercial applications is often limited due to limited access to training data and restrictive licensing, which hinders open access. To address these limitations, we introduce BigDocs-7.5M, a high-quality, open-access dataset comprising 7.5 million multimodal documents across 30 tasks. We use an efficient data curation process to ensure our data is high-quality and license-permissive. Our process emphasizes accountability, responsibility, and transparency through filtering rules, traceable metadata, and careful content analysis. Additionally, we introduce BigDocs-Bench, a benchmark suite with 10 novel tasks where we create datasets that reflect real-world use cases involving reasoning over Graphical User Interfaces (GUI) and code generation from images. Our experiments show that training with BigDocs-Bench improves average performance up to 25.8% over closed-source GPT-4o in document reasoning and structured output tasks such as Screenshot2HTML or Image2Latex generation. Finally, human evaluations showed a preference for outputs from models trained on BigDocs over GPT-4o. This suggests that BigDocs can help both academics and the open-source community utilize and improve AI tools to enhance multimodal capabilities and document reasoning. The project is hosted at https://bigdocs.github.io .

2025-01-21

International Conference on Learning Representations (poster)

doi.org

openreview.net

Retreever: Tree-Based Coarse-to-Fine Representations for Retrieval

Tianyi Chen

Valentina Zantedeschi

Document retrieval is a core component of question-answering systems, as it enables conditioning answer generation on new and large-scale co… (voir plus)rpora. While effective, the standard practice of encoding documents into high-dimensional embeddings for similarity search entails large memory and compute footprints, and also makes it hard to inspect the inner workings of the system. In this paper, we propose a tree-based method for organizing and representing reference documents at various granular levels, which offers the flexibility to balance cost and utility, and eases the inspection of the corpus content and retrieval operations. Our method, called ReTreever, jointly learns a routing function per internal node of a binary tree such that query and reference documents are assigned to similar tree branches, hence directly optimizing for retrieval performance. Our evaluations show that ReTreever generally preserves full representation accuracy. Its hierarchical structure further provides strong coarse representations and enhances transparency by indirectly learning meaningful semantic groupings. Among hierarchical retrieval methods, ReTreever achieves the best retrieval accuracy at the lowest latency, proving that this family of techniques can be viable in practical applications.

2024-12-31

arXiv (prépublication)

doi.org

arxiv.org

Do LLMs Build World Representations? Probing Through the Lens of State Abstraction

Zichao Li

Yanshuai Cao

Jackie C.K. Cheung

2024-09-24

NeurIPS.cc/2024/Conference (poster)

doi.org

openreview.net

Evaluating Dependencies in Fact Editing for Language Models: Specificity and Implication Awareness

Zichao Li

Ines Arous

Siva Reddy

Jackie C.K. Cheung

The potential of using a large language model (LLM) as a knowledge base (KB) has sparked significant interest. To maintain the knowledge acq… (voir plus)uired by LLMs, we need to ensure that the editing of learned facts respects internal logical constraints, which are known as dependency of knowledge. Existing work on editing LLMs has partially addressed the issue of dependency, when the editing of a fact should apply to its lexical variations without disrupting irrelevant ones. However, they neglect the dependency between a fact and its logical implications. We propose an evaluation protocol with an accompanying question-answering dataset, StandUp, that provides a comprehensive assessment of the editing process considering the above notions of dependency. Our protocol involves setting up a controlled environment in which we edit facts and monitor their impact on LLMs, along with their implications based on If-Then rules. Extensive experiments on StandUp show that existing knowledge editing methods are sensitive to the surface form of knowledge, and that they have limited performance in inferring the implications of edited facts.

2023-10-06

Conference on Empirical Methods in Natural Language Processing (accepté)

doi.org

openreview.net

Using Interactive Feedback to Improve the Accuracy and Explainability of Question Answering Systems Post-Deployment

Jackie CK Cheung

2022-04-30

Findings of the Association for Computational Linguistics: ACL 2022 (publié)

doi.org

arxiv.org

EditNTS: An Neural Programmer-Interpreter Model for Sentence Simplification through Explicit Editing

Yue Dong

Zichao Li

Mehdi Rezagholizadeh

Jackie CK Cheung

We present the first sentence simplification model that learns explicit edit operations (ADD, DELETE, and KEEP) via a neural programmer-inte… (voir plus)rpreter approach. Most current neural sentence simplification systems are variants of sequence-to-sequence models adopted from machine translation. These methods learn to simplify sentences as a byproduct of the fact that they are trained on complex-simple sentence pairs. By contrast, our neural programmer-interpreter is directly trained to predict explicit edit operations on targeted parts of the input sentence, resembling the way that humans perform simplification and revision. Our model outperforms previous state-of-the-art neural sentence simplification models (without external knowledge) by large margins on three benchmark text simplification corpora in terms of SARI (+0.95 WikiLarge, +1.89 WikiSmall, +1.41 Newsela), and is judged by humans to produce overall better and simpler output sentences.

2019-06-30

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (publié)

doi.org

arxiv.org

Publications du Fellowship en politiques de l'IA

La plateforme Mila Ventures

Boussole des politiques en IA

Publications

Publications du Fellowship en politiques de l'IA

La plateforme Mila Ventures

Boussole des politiques en IA

Mots-clés populaires:

Zichao Li

Publications