Stick-breaking Attention
Shawn Tan
Yikang Shen
Songlin Yang
Rameswar Panda
Stick-breaking Attention
Shawn Tan
Yikang Shen
Songlin Yang
Rameswar Panda
Stick-breaking Attention
Shawn Tan
Yikang Shen
Songlin Yang
Rameswar Panda
Stick-breaking Attention
Shawn Tan
Yikang Shen
Songlin Yang
Rameswar Panda
Symmetry-Aware Generative Modeling through Learned Canonicalization
Kusha Sareen
Daniel Levy
Arnab Kumar Mondal
Sékou-Oumar Kaba
Tara Akhound-Sadegh
Generative modeling of symmetric densities has a range of applications in AI for science, from drug discovery to physics simulations. The ex… (voir plus)isting generative modeling paradigm for invariant densities combines an invariant prior with an equivariant generative process. However, we observe that this technique is not necessary and has several drawbacks resulting from the limitations of equivariant networks. Instead, we propose to model a learned slice of the density so that only one representative element per orbit is learned. To accomplish this, we learn a group-equivariant canonicalization network that maps training samples to a canonical pose and train a non-equivariant generative model over these canonicalized samples. We implement this idea in the context of diffusion models. Our preliminary experimental results on molecular modeling are promising, demonstrating improved sample quality and faster inference time.
FairLoRA: Unpacking Bias Mitigation in Vision Models with Fairness-Driven Low-Rank Adaptation
Rohan Sukumaran
Aarash Feizi
Adriana Romero-Sorian
Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think
Massimo Caccia
Megh Thakkar
Léo Boisvert
Thibault Le Sellier de Chezelles
Alexandre Piché
Alexandre Lacoste
Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing digital … (voir plus)tasks through web interfaces in a human-like manner. However, even the strongest closed-source models often struggle to achieve robust results on several benchmarks, while a notable performance gap exists between them and open-source counterparts. This study investigates the potential of fine-tuning to enhance the performance of a smaller, lower-performing but cost-efficient LLM by leveraging successful traces from stronger LLMs, referred to as experts. We outline a comprehensive pipeline for data collection, filtering, and supervised fine-tuning and explore various behavior cloning parameters. Our experiments provide key insights into the challenges of fine-tuning LLMs into web agents on benchmarks like MiniWoB and WorkArena. Notably, we find that the fine-tuned agents' ability to predict expert trajectories does not consistently lead to improved downstream task performance. This raises issues such as off-policy bias and the loss of reasoning abilities during fine-tuning. We discuss potential solutions to these challenges and make both the codebase and a dataset of 140M tokens open-source for the community to build upon.
Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think
Massimo Caccia
Megh Thakkar
Léo Boisvert
Thibault Le Sellier de Chezelles
Alexandre Piché
Alexandre Lacoste
Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing digital … (voir plus)tasks through web interfaces in a human-like manner. However, even the strongest closed-source models often struggle to achieve robust results on several benchmarks, while a notable performance gap exists between them and open-source counterparts. This study investigates the potential of fine-tuning to enhance the performance of a smaller, lower-performing but cost-efficient LLM by leveraging successful traces from stronger LLMs, referred to as experts. We outline a comprehensive pipeline for data collection, filtering, and supervised fine-tuning and explore various behavior cloning parameters. Our experiments provide key insights into the challenges of fine-tuning LLMs into web agents on benchmarks like MiniWoB and WorkArena. Notably, we find that the fine-tuned agents' ability to predict expert trajectories does not consistently lead to improved downstream task performance. This raises issues such as off-policy bias and the loss of reasoning abilities during fine-tuning. We discuss potential solutions to these challenges and make both the codebase and a dataset of 140M tokens open-source for the community to build upon.
Graph Knowledge Distillation to Mixture of Experts
Pavel Rumiantsev
Health satisfaction outcome from integrated autonomous mobile clinics
Yuzhang Huang
Shaoshan Liu
Zhongying Pan
Carl Wu
Herng-Chia Chiu
Leiyu Shi
Do Robot Snakes Dream like Electric Sheep? Investigating the Effects of Architectural Inductive Biases on Hallucination
Jerry Huang
Prasanna Parthasarathi
Mehdi Rezagholizadeh
Boxing Chen
The growth in prominence of large language models (LLMs) in everyday life can be largely attributed to their generative abilities, yet some … (voir plus)of this is also owed to the risks and costs associated with their use. On one front is their tendency to \textit{hallucinate} false or misleading information, limiting their reliability. On another is the increasing focus on the computational limitations associated with traditional self-attention based LLMs, which has brought about new alternatives, in particular recurrent models, meant to overcome them. Yet it remains uncommon to consider these two concerns simultaneously. Do changes in architecture exacerbate/alleviate existing concerns about hallucinations? Do they affect how and where they occur? Through an extensive evaluation, we study how these architecture-based inductive biases affect the propensity to hallucinate. While hallucination remains a general phenomenon not limited to specific architectures, the situations in which they occur and the ease with which specific types of hallucinations can be induced can significantly differ based on the model architecture. These findings highlight the need for better understanding both these problems in conjunction with each other, as well as consider how to design more universal techniques for handling hallucinations.
GFlowNets for Hamiltonian decomposition in groups of compatible operators
Isaac L. Huidobro-Meezs
Jun Dai
R. A. Vargas-Hern'andez
Quantum computing presents a promising alternative for the direct simulation of quantum systems with the potential to explore chemical probl… (voir plus)ems beyond the capabilities of classical methods. However, current quantum algorithms are constrained by hardware limitations and the increased number of measurements required to achieve chemical accuracy. To address the measurement challenge, techniques for grouping commuting and anti-commuting terms, driven by heuristics, have been developed to reduce the number of measurements needed in quantum algorithms on near-term quantum devices. In this work, we propose a probabilistic framework using GFlowNets to group fully (FC) or qubit-wise commuting (QWC) terms within a given Hamiltonian. The significance of this approach is demonstrated by the reduced number of measurements for the found groupings; 51% and 67% reduction factors respectively for FC and QWC partitionings with respect to greedy coloring algorithms, highlighting the potential of GFlowNets for future applications in the measurement problem. Furthermore, the flexibility of our algorithm extends its applicability to other resource optimization problems in Hamiltonian simulation, such as circuit design.