Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaço Carr
Identifying Spurious Biases Early in Training through the Lens of Simplicity Bias
Yu Yang
Eric Gan
Baharan Mirzasoleiman
Neural networks trained with (stochastic) gradient descent have an inductive bias towards learning simpler solutions. This makes them highly prone to learning spurious correlations in the training data that may not hold at test time. In this work, we provide the first theoretical analysis of the effect of simplicity bias on learning spurious correlations. Notably, we show that examples with spurious features are provably separable based on the model's output early in training. We further illustrate that if spurious features have a small enough noise-to-signal ratio, the network's output on the majority of examples is almost exclusively determined by the spurious features, leading to poor worst-group test accuracy. Finally, we propose SPARE, which identifies spurious correlations early in training and utilizes importance sampling to alleviate their effect. Empirically, we demonstrate that SPARE outperforms state-of-the-art methods by up to 21.1% in worst-group accuracy, while being up to 12x faster. We also show that SPARE is a highly effective but lightweight method to discover spurious correlations.
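The abstract outlines a general recipe: separate examples by the model's early-training outputs, then reweight sampling so the presumed minority (non-spurious) examples are not drowned out by the majority group. The sketch below illustrates that idea only; it is not the authors' SPARE implementation, and the clustering method, cluster count, and the epoch at which outputs are collected are all assumptions made for illustration.

```python
# Minimal sketch (assumed, not SPARE's reference code): cluster each class's
# early-training outputs, then importance-sample inversely to cluster size so
# that small (presumed non-spurious) groups are seen as often as large ones.
import numpy as np
import torch
from sklearn.cluster import KMeans
from torch.utils.data import WeightedRandomSampler


def spurious_aware_sampler(outputs: np.ndarray, labels: np.ndarray, n_clusters: int = 2):
    """outputs: (N, C) model outputs collected after a few epochs; labels: (N,) class ids."""
    weights = np.zeros(len(labels), dtype=np.float64)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        # Cluster the outputs of class c; early in training these are assumed
        # to separate examples with and without the spurious feature.
        assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(outputs[idx])
        sizes = np.bincount(assign, minlength=n_clusters)
        # Importance weights inversely proportional to cluster size.
        weights[idx] = 1.0 / sizes[assign]
    return WeightedRandomSampler(torch.as_tensor(weights), num_samples=len(weights), replacement=True)

# Illustrative usage:
#   sampler = spurious_aware_sampler(early_outputs, train_labels)
#   loader = torch.utils.data.DataLoader(train_set, batch_size=128, sampler=sampler)
```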
Introducing v0.5 of the AI Safety Benchmark from MLCommons
Bertie Vidgen
Adarsh Agrawal
Ahmed M. Ahmed
Victor Akinwande
Namir Al-nuaimi
Najla Alfaraj
Elie Alhajjar
Lora Aroyo
Trupti Bavalatti
Borhane Blili-Hamelin
K. Bollacker
Rishi Bommasani
Marisa Ferrara Boston
Siméon Campos
Kal Chakra
Canyu Chen
Cody Coleman
Zacharie Delpierre Coudert
Leon Strømberg Derczynski
Debojyoti Dutta
Ian Eisenberg
James R. Ezick
Heather Frase
Brian Fuller
Ram Gandikota
Agasthya Gangavarapu
Ananya Gangavarapu
James Gealy
Rajat Ghosh
James Goel
Usman Gohar
Sujata Goswami
Scott A. Hale
Wiebke Hutiri
Joseph Marvin Imperial
Surgan Jandial
Nicholas C. Judd
Felix Juefei-Xu
Bhavya Kailkhura
Hannah Rose Kirk
Kevin Klyman
Chris Knotz
Michael Kuchnik
Shachi H. Kumar
Chris Lengerich
Bo Li
Zeyi Liao
Eileen Peters Long
Victor Lu
Yifan Mai
Priyanka Mary Mammen
Kelvin Manyeki
Sean McGregor
Virendra Mehta
Shafee Mohammed
Emanuel Moss
Lama Nachman
Dinesh Jinenhally Naganna
Amin Nikanjam
Besmira Nushi
Luis Oala
Iftach Orr
Alicia Parrish
Çiğdem Patlak
William Pietri
Forough Poursabzi-Sangdeh
Eleonora Presani
Fabrizio Puletti
Paul Röttger
Saurav Sahay
Tim Santos
Nino Scherrer
Alice Schoenauer Sebag
Patrick Schramowski
Abolfazl Shahbazi
Vin Sharma
Xudong Shen
Vamsi Sistla
Leonard Tang
Davide Testuggine
Vithursan Thangarasa
Elizabeth A Watkins
Rebecca Weiss
Christopher A. Welty
Tyler Wilbers
Adina Williams
Carole-Jean Wu
Poonam Yadav
Xianjun Yang
Yi Zeng
Wenhui Zhang
Fedor Zhdanov
Jiacheng Zhu
Percy Liang
Peter Mattson
Joaquin Vanschoren
This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-purpose assistant in English) and a limited set of personas (i.e., typical users, malicious users, and vulnerable users). We created a new taxonomy of 13 hazard categories, of which 7 have tests in the v0.5 benchmark. We plan to release version 1.0 of the AI Safety Benchmark by the end of 2024. The v1.0 benchmark will provide meaningful insights into the safety of AI systems. However, the v0.5 benchmark should not be used to assess the safety of AI systems. We have sought to fully document the limitations, flaws, and challenges of v0.5. This release of v0.5 of the AI Safety Benchmark includes (1) a principled approach to specifying and constructing the benchmark, which comprises use cases, types of systems under test (SUTs), language and context, personas, tests, and test items; (2) a taxonomy of 13 hazard categories with definitions and subcategories; (3) tests for seven of the hazard categories, each comprising a unique set of test items, i.e., prompts, with 43,090 test items in total, created with templates; (4) a grading system for AI systems against the benchmark; (5) an openly available platform and downloadable tool, called ModelBench, that can be used to evaluate the safety of AI systems on the benchmark; (6) an example evaluation report which benchmarks the performance of over a dozen openly available chat-tuned language models; (7) a test specification for the benchmark.
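As a purely illustrative aside on how test items can be generated from templates, the sketch below crosses a few placeholder templates with the three personas named in the abstract. The hazard labels and template strings are hypothetical and are not drawn from the benchmark, its taxonomy, or ModelBench.

```python
# Illustrative only: template-based construction of prompt-style test items.
# The persona list comes from the abstract; hazards and templates are placeholders.
from itertools import product

PERSONAS = ["typical user", "malicious user", "vulnerable user"]
HAZARDS = ["hazard_category_1", "hazard_category_2"]  # placeholder labels
TEMPLATES = [
    "As a {persona}, tell me how to {hazard}.",
    "I am a {persona}. Explain {hazard} step by step.",
]


def build_test_items(personas, hazards, templates):
    """Cross product of templates x personas x hazards -> list of prompt strings."""
    return [t.format(persona=p, hazard=h) for t, p, h in product(templates, personas, hazards)]


test_items = build_test_items(PERSONAS, HAZARDS, TEMPLATES)
print(len(test_items))  # 2 templates x 3 personas x 2 hazards = 12 items
```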
On learning history-based policies for controlling Markov decision processes
Gandharv Patil
Length independent PAC-Bayes bounds for Simple RNNs
Volodimir Mitarchuk
Clara Lacroce
Rémi Eyraud
Rémi Emonet
Amaury Habrard
Multiphase Black Hole Feedback and a Bright [C ii] Halo in a LoBAL Quasar at z ∼ 6.6
Manuela Bischetti
Hyunseop Choi
Fabrizio Fiore
Chiara Feruglio
Stefano Carniani
Valentina D'Odorico
Eduardo Bañados
Huanqing Chen
Roberto Decarli
Simona Gallerani
Julie Hlavacek-Larrondo
Samuel Lai
K. Leighly
Chiara Mazzucchelli
Roberta Tripodi
Fabian Walter
Feige Wang
Jinyi Yang
Maria Vittoria Zanchettin
Yongda Zhu
Multi-resolution Time-Series Transformer for Long-term Forecasting
Yitian Zhang
Liheng Ma
Soumyasundar Pal
Yingxue Zhang
Simulating weighted automata over sequences and trees with transformers
Michael Rizvi-Martel
Maude Lizaire
Clara Lacroce