Publications

Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Aniket Rajiv Didolkar
Anirudh Goyal
Nan Rosemary Ke
Siyuan Guo
Michal Valko
Timothy P Lillicrap
Danilo Jimenez Rezende
Michael Curtis Mozer
Sanjeev Arora
Turns Out I'm Not Real: Towards Robust Detection of AI-Generated Videos
Qingyuan Liu
Pengyuan Shi
Yun-Yun Tsai
Chengzhi Mao
Junfeng Yang
Grounding Multimodal Large Language Models in Actions
Andrew Szot
Bogdan Mazoure
Harsh Agrawal
Zsolt Kira
Alexander T Toshev
PathOCL: Path-Based Prompt Augmentation for OCL Generation with GPT-4
Seif Abukhalaf
Mohammad Hamdaqa
The rapid progress of AI-powered programming assistants, such as GitHub Copilot, has facilitated the development of software applications. These assistants rely on large language models (LLMs), which are foundation models (FMs) that support a wide range of tasks related to understanding and generating language. LLMs have demonstrated their ability to express UML model specifications using formal languages like the Object Constraint Language (OCL). However, the context size of the prompt is limited by the number of tokens an LLM can process. This limitation becomes significant as the size of UML class models increases. In this study, we introduce PathOCL, a novel path-based prompt augmentation technique designed to facilitate OCL generation. PathOCL addresses the limitations of LLMs, specifically their token processing limit and the challenges posed by large UML class models. PathOCL is based on the concept of chunking, which selectively augments the prompts with a subset of UML classes relevant to the English specification. Our findings demonstrate that PathOCL, compared to augmenting the complete UML class model (UML-Augmentation), generates a higher number of valid and correct OCL constraints using the GPT-4 model. Moreover, the average prompt size crafted using PathOCL significantly decreases when scaling the size of the UML class models.
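The chunking idea lends itself to a compact illustration. The Python sketch below scores UML classes by lexical overlap with the English specification and keeps only the top-ranked ones in the prompt; the overlap heuristic, function names, and prompt template are illustrative assumptions rather than PathOCL's actual selection strategy.

```python
# Minimal sketch of chunking for OCL generation: keep only the UML classes most
# relevant to the specification, so large class models still fit the prompt.
# The lexical-overlap scoring below is a placeholder assumption.

def select_relevant_classes(spec: str, uml_classes: dict[str, str], k: int = 3) -> dict[str, str]:
    """uml_classes maps class names to their textual description (attributes, associations)."""
    spec_tokens = set(spec.lower().split())
    scored = []
    for name, description in uml_classes.items():
        overlap = len(spec_tokens & set(description.lower().split()))
        scored.append((overlap, name))
    scored.sort(reverse=True)
    return {name: uml_classes[name] for _, name in scored[:k]}

def build_prompt(spec: str, uml_classes: dict[str, str]) -> str:
    """Augment the prompt with the selected subset instead of the full class model."""
    relevant = select_relevant_classes(spec, uml_classes)
    context = "\n".join(f"class {name}: {desc}" for name, desc in relevant.items())
    return f"{context}\n\nWrite an OCL constraint for: {spec}"
```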
Self-Consuming Generative Models with Curated Data Provably Optimize Human Preferences
Damien Ferbach
Quentin Bertrand
Joey Bose
The rapid progress in generative models has resulted in impressive leaps in generation quality, blurring the lines between synthetic and real data. Web-scale datasets are now prone to the inevitable contamination by synthetic data, directly impacting the training of future generative models. Already, some theoretical results on self-consuming generative models (a.k.a., iterative retraining) have emerged in the literature, showcasing that either model collapse or stability could be possible depending on the fraction of generated data used at each retraining step. However, in practice, synthetic data is often subject to human feedback and curated by users before being used and uploaded online. For instance, many interfaces of popular text-to-image generative models, such as Stable Diffusion or Midjourney, produce several variations of an image for a given query, which can eventually be curated by the users. In this paper, we theoretically study the impact of data curation on iterated retraining of generative models and show that it can be seen as an implicit preference optimization mechanism. However, unlike standard preference optimization, the generative model does not have access to the reward function or negative samples needed for pairwise comparisons. Moreover, our study does not require access to the density function, only to samples. We prove that, if the data is curated according to a reward model, then the expected reward of the iterative retraining procedure is maximized. We further provide theoretical results on the stability of the retraining loop when using a positive fraction of real data at each step. Finally, we conduct illustrative experiments on both synthetic datasets and on CIFAR10, showing that such a procedure amplifies biases of the reward model.
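The retraining loop described above can be summarized in a short sketch. The Python below shows one round-based interpretation, assuming abstract train, generate, and reward_model callables (all hypothetical): in each round, several variations are generated per sample, the reward model curates the best one, and the next model is trained on a mix of curated synthetic data and a positive fraction of real data.

```python
# Sketch of iterated retraining with best-of-n curation by a reward model.
# train, generate, and reward_model are placeholder callables, not a real API.
import random

def curate(candidates, reward_model):
    """Keep the candidate preferred by the reward model (best-of-n curation)."""
    return max(candidates, key=reward_model)

def retraining_loop(real_data, train, generate, reward_model,
                    n_rounds=5, n_candidates=4, real_fraction=0.5):
    model = train(real_data)
    for _ in range(n_rounds):
        # Users implicitly curate: one sample survives out of several variations.
        synthetic = [curate([generate(model) for _ in range(n_candidates)], reward_model)
                     for _ in range(len(real_data))]
        # Mixing in a positive fraction of real data stabilizes the loop.
        n_real = int(real_fraction * len(real_data))
        mixed = random.sample(real_data, n_real) + synthetic
        model = train(mixed)
    return model
```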
GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews
Maxime Darrin
Ines Arous
Scientific peer review is essential for the quality of academic publications. However, the increasing number of paper submissions to conferences has strained the reviewing process. This surge poses a burden on area chairs, who have to carefully read an ever-growing volume of reviews and discern each reviewer's main arguments as part of their decision process. In this paper, we introduce GLIMPSE, a summarization method designed to offer a concise yet comprehensive overview of scholarly reviews. Unlike traditional consensus-based methods, GLIMPSE extracts both common and unique opinions from the reviews. We introduce novel uniqueness scores based on the Rational Speech Act framework to identify relevant sentences in the reviews. Our method aims to provide a pragmatic glimpse into all reviews, offering a balanced perspective on their opinions. Our experimental results show that GLIMPSE generates summaries that human evaluators find more discriminative than those of baseline methods, while achieving comparable performance on automatic metrics.
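For readers unfamiliar with the Rational Speech Act (RSA) framework, the snippet below shows a generic RSA speaker computation that could underpin a uniqueness-style score; it is an expository sketch, not GLIMPSE's exact formulation, and the literal_scores matrix is an assumed input.

```python
# Generic RSA speaker computation (expository sketch, not the paper's method).
# A sentence whose speaker probability concentrates on its own review is
# "unique" to it; spread-out probability suggests a commonly shared opinion.
import numpy as np

def rsa_speaker(literal_scores: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """literal_scores[i, j]: how well sentence i describes review j.
    Returns S1[i, j]: probability a pragmatic speaker of review j picks sentence i."""
    # Literal listener: normalize over reviews for each sentence.
    l0 = literal_scores / literal_scores.sum(axis=1, keepdims=True)
    # Pragmatic speaker: softmax over sentences for each review.
    utility = alpha * np.log(l0 + 1e-12)
    s1 = np.exp(utility - utility.max(axis=0, keepdims=True))
    return s1 / s1.sum(axis=0, keepdims=True)
```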
Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems
Heiko Hoppe
Tobias Enders
Maximilian Schiffer
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit and leads to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves the goal conflicts that previously existed between the trained agents and the operator by assigning rewards to agents using a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis showing that the use of global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
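The counterfactual baseline mentioned above admits a compact illustration. The Python sketch below computes a COMA-style advantage (an assumed instantiation, not necessarily the paper's exact estimator), where q_values holds estimated system-wide returns for each action of one agent with the other agents' actions held fixed; the numbers and variable names are hypothetical.

```python
# Counterfactual credit assignment from a global reward signal (COMA-style sketch).
import numpy as np

def counterfactual_advantage(q_values: np.ndarray, policy: np.ndarray, action: int) -> float:
    """Advantage of the chosen action over the policy-weighted counterfactual baseline."""
    baseline = float(np.dot(policy, q_values))   # expected global return over this agent's actions
    return float(q_values[action]) - baseline    # credit for this agent's actual choice

# Example: three dispatch actions (serve request A, serve request B, stay idle).
q = np.array([5.0, 3.0, 1.0])   # estimated system-wide profit per action
pi = np.array([0.5, 0.3, 0.2])  # the agent's current policy
print(counterfactual_advantage(q, pi, action=0))  # 1.4: action 0 beats the baseline
```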
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Lu Li
Tianyu Zhang
Zhiqi Bu
Suyuchen Wang
Huan He
Jie Fu
Yonghui Wu
Jiang Bian
Yong Chen
Model merging has emerged as an effective approach to combine multiple single-task models, fine-tuned from the same pre-trained model, into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during model merging. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP identifies a Pareto set of scaling coefficients for merging multiple models to reflect the trade-offs. The core component of MAP is approximating the evaluation metrics of the various tasks using a quadratic approximation surrogate model derived from a pre-selected set of scaling coefficients, enabling amortized inference. Experimental results on vision and natural language processing tasks show that MAP can accurately identify the Pareto front. To further reduce the required computation of MAP, we propose (1) a Bayesian adaptive sampling algorithm and (2) a nested merging scheme with multiple stages.
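A minimal sketch of the amortization step may help: evaluate merged models at a small, pre-selected set of scaling coefficients, fit one quadratic surrogate per task metric, and then query the surrogates when searching for Pareto-optimal coefficients. The feature construction and least-squares fit below are illustrative assumptions about how such a surrogate could be implemented, not MAP's exact procedure.

```python
# Quadratic surrogate for a task metric as a function of merging coefficients.
# After fitting one surrogate per task, candidate coefficient vectors can be
# screened cheaply and only the non-dominated (Pareto) ones kept.
import itertools
import numpy as np

def quadratic_features(c: np.ndarray) -> np.ndarray:
    """[1, c_i, c_i * c_j] features for a coefficient vector c."""
    pairs = [c[i] * c[j] for i, j in itertools.combinations_with_replacement(range(len(c)), 2)]
    return np.concatenate([[1.0], c, pairs])

def fit_surrogate(coeff_samples: np.ndarray, metric_values: np.ndarray) -> np.ndarray:
    """Least-squares fit of one task's metric over the pre-selected coefficient samples."""
    X = np.stack([quadratic_features(c) for c in coeff_samples])
    w, *_ = np.linalg.lstsq(X, metric_values, rcond=None)
    return w

def predict(w: np.ndarray, c: np.ndarray) -> float:
    """Amortized inference: query the surrogate instead of re-evaluating the merged model."""
    return float(quadratic_features(c) @ w)
```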
MINERS: Multilingual Language Models as Semantic Retrievers
Genta Indra Winata
Ruochen Zhang
Words have been represented in a high-dimensional vector space that encodes their semantic similarities, enabling downstream applications such as retrieving synonyms, antonyms, and relevant contexts. However, despite recent advances in multilingual language models (LMs), the effectiveness of these models' representations in semantic retrieval contexts has not been comprehensively explored. To fill this gap, this paper introduces MINERS, a benchmark designed to evaluate the ability of multilingual LMs in semantic retrieval tasks, including bitext mining and classification via retrieval-augmented contexts. We create a comprehensive framework to assess the robustness of LMs in retrieving samples across more than 200 diverse languages, including extremely low-resource languages, in challenging cross-lingual and code-switching settings. Our results demonstrate that solely retrieving semantically similar embeddings yields performance competitive with state-of-the-art approaches, without requiring any fine-tuning.
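The retrieval-based classification setup can be sketched in a few lines: embed a query with any multilingual LM, retrieve the most similar labelled samples by cosine similarity, and predict the majority label, with no fine-tuning. The function below is an illustrative outline assuming precomputed embeddings, not the benchmark's exact protocol.

```python
# Classification via retrieval over precomputed multilingual embeddings.
from collections import Counter
import numpy as np

def retrieve_and_classify(query_emb: np.ndarray,
                          sample_embs: np.ndarray,
                          sample_labels: list[str],
                          k: int = 5) -> str:
    # Cosine similarity between the query and every labelled sample.
    sims = sample_embs @ query_emb / (
        np.linalg.norm(sample_embs, axis=1) * np.linalg.norm(query_emb) + 1e-12)
    top_k = np.argsort(-sims)[:k]
    # Majority vote over the retrieved neighbours; no fine-tuning involved.
    return Counter(sample_labels[i] for i in top_k).most_common(1)[0][0]
```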
When is an Embedding Model More Promising than Another?
Maxime Darrin
Philippe Formont
Ismail Ben Ayed
Embedders play a central role in machine learning, projecting any object into numerical representations that can, in turn, be leveraged to perform various downstream tasks. The evaluation of embedding models typically depends on domain-specific empirical approaches utilizing downstream tasks, primarily because of the lack of a standardized framework for comparison. However, acquiring adequately large and representative datasets for conducting these assessments is not always viable and can prove to be prohibitively expensive and time-consuming. In this paper, we present a unified approach to evaluate embedders. First, we establish theoretical foundations for comparing embedding models, drawing upon the concepts of sufficiency and informativeness. We then leverage these concepts to devise a tractable comparison criterion (information sufficiency), leading to a task-agnostic and self-supervised ranking procedure. We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials.
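As a rough, task-agnostic illustration of the sufficiency idea (an expository assumption, not the paper's information-sufficiency criterion), one can ask how well one embedder's representations linearly predict another's for the same objects, and rank embedders by this predictability without any labelled downstream task.

```python
# Crude self-supervised proxy: if embedder A's representations can linearly
# reconstruct embedder B's representations of the same objects, A carries at
# least as much usable information about them in this limited sense.
import numpy as np

def linear_predictability(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fraction of emb_b's variance recovered by a least-squares map from emb_a."""
    W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)
    residual = emb_b - emb_a @ W
    return 1.0 - residual.var() / emb_b.var()
```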
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark
David Romero
Chenyang Lyu
Haryo Akbarianto Wibowo
Teresa Lynn
Injy Hamed
Aditya Nanda Kishore
Aishik Mandal
Alina Dragonetti
Artem Abzaliev
Atnafu Lambebo Tonja
Bontu Fufa Balcha
Chenxi Whitehouse
Christian Salamea
Dan John Velasco
D. Meur
Emilio Villa-Cueva
Fajri Koto
Fauzan Farooqui
Frederico Belcavello
Ganzorig Batnasan
Gisela Vallejo
Grainne Caulfield
Guido Ivetta
Haiyue Song
Henok Biadglign Ademtew
Hernán Maina
Holy Lovenia
Israel Abebe Azime
Jan Christian Blaise Cruz
Jay Gala
Jiahui Geng
Jesús-Germán Ortiz-Barajas
Jinheon Baek
Jocelyn Dunstan
Laura Alonso Alemany
Kumaranage Ravindu Yasas Nagasinghe
Luciana Benotti
Luis Fernando D'Haro
Marcelo Viridiano
Marcos Estecha-Garitagoitia
Maria Camila Buitrago Cabrera
Mario Rodríguez-Cantelar
Mélanie Jouitteau
Mihail Mihaylov
Mohamed Fazli Mohamed Imam
Muhammad Farid Adilazuarda
Munkhjargal Gochoo
Munkh-Erdene Otgonbold
Naome Etori
Olivier Niyomugisha
Paula Mónica Silva
Pranjal A. Chitale
Raj Dabre
Rendi Chevi
Ruochen Zhang
Ryandito Diandaru
Samuel Cahyawijaya
Santiago Góngora
Soyeong Jeong
Sukannya Purkayastha
Tatsuki Kuribayashi
Thanmay Jayakumar
Tiago Timponi Torrent
Toqeer Ehsan
Vladimir Araujo
Yova Kementchedjhieva
Zara Burzo
Zheng Wei Lim
Zheng-Xin Yong
O. Ignat
Joan Nwatu
Rada Mihalcea
Thamar Solorio
Alham Fikri Aji
Simulating federated learning for steatosis detection using ultrasound images
Yue Qi
Pedro Vianna
Alexandre Cadrin-Chênevert
Katleen Blanchet
Emmanuel Montagnon
Guy Wolf
Louis-Antoine Mullie
Guy Cloutier
Michael Chassé
An Tang