Publications

Bugs in Large Language Models Generated Code: An Empirical Study

Florian Tambon

Arghavan Moradi Dakhel

Amin Nikanjam

Foutse Khomh

Michel C. Desmarais

Giuliano Antoniol

2024-03-13

ArXiv (preprint)

doi.org

arxiv.org

Online Bayesian optimization of vagus nerve stimulation.

Lorenz Wernisch

Tristan Edwards

Antonin Berthon

Olivier Tessier-Lariviere

Elvijs Sarkans

Myrta Stoukidi

Pascal Fortier-Poisson

Max Pinkney

Michael Thornton

Catherine Hanley

Susannah Lee

Joel Jennings

Ben Appleton

Philip Garsed

Bret Patterson

Buttinger Will

Samuel Gonshaw

Matjaž Jakopec

Sudhakaran Shunmugam

Jorin Mamen … (see 4 more)

Aleksi Tukiainen

Guillaume Lajoie

Oliver Armitage

Emil Hewage

OBJECTIVE In bioelectronic medicine, neuromodulation therapies induce neural signals to the brain or organs, modifying their function. Stimu… (see more)lation devices capable of triggering exogenous neural signals using electrical waveforms require a complex and multi-dimensional parameter space to control such waveforms. Determining the best combination of parameters (waveform optimization or dosing) for treating a particular patient's illness is therefore challenging. Comprehensive parameter searching for an optimal stimulation effect is often infeasible in a clinical setting due to the size of the parameter space. Restricting this space, however, may lead to suboptimal therapeutic results, reduced responder rates, and adverse effects. Approach. As an alternative to a full parameter search, we present a flexible machine learning, data acquisition, and processing framework for optimizing neural stimulation parameters, requiring as few steps as possible using Bayesian optimization. This optimization builds a model of the neural and physiological responses to stimulations, enabling it to optimize stimulation parameters and provide estimates of the accuracy of the response model. The vagus nerve innervates, among other thoracic and visceral organs, the heart, thus controlling heart rate, making it an ideal candidate for demonstrating the effectiveness of our approach. Main results. The efficacy of our optimization approach was first evaluated on simulated neural responses, then applied to vagus nerve stimulation intraoperatively in porcine subjects. Optimization converged quickly on parameters achieving target heart rates and optimizing neural B-fiber activations despite high intersubject variability. Significance. An optimized stimulation waveform was achieved in real time with far fewer stimulations than required by alternative optimization strategies, thus minimizing exposure to side effects. Uncertainty estimates helped avoiding stimulations outside a safe range. Our approach shows that a complex set of neural stimulation parameters can be optimized in real-time for a patient to achieve a personalized precision dosing. .

2024-03-13

Journal of Neural Engineering (published)

doi.org

Scattered Mixture-of-Experts Implementation

Shawn Tan

Yikang Shen

Rameswar Panda

Aaron Courville

We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs. ScatterMoE builds upon existing implementations, and o… (see more)vercoming some of the limitations to improve inference and training speed, and memory footprint. This implementation achieves this by avoiding padding and making excessive copies of the input. We introduce ParallelLinear, the main component we use to build our implementation and the various kernels used to speed up the operation. We benchmark our implementation against Megablocks, and show that it enables a higher throughput and lower memory footprint. We also show how ParallelLinear enables extension of the Mixture-of-Experts concept by demonstrating with an implementation of Mixture of Attention.

2024-03-13

ArXiv (preprint)

doi.org

arxiv.org

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Anthony

Timothee LESORT

Eugene Belilovsky

Irina Rish

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes ava… (see more)ilable. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

ArXiv (preprint)

doi.org

arxiv.org

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

Simon Dufort-Labbé

Pierluca D'Oro

Evgenii Nikishin

Razvan Pascanu

Pierre-Luc Bacon

Aristide Baratin

2024-03-12

ArXiv (preprint)

doi.org

arxiv.org

Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct

Peter Henderson

Jieru Hu

Mona Diab

Joelle Pineau

2024-03-12

Proceedings of the Symposium on Computer Science and Law (published)

doi.org

Simulating Weighted Automata over Sequences and Trees with Transformers

Michael Rizvi

Maude Lizaire

Clara Lacroce

Guillaume Rabusseau

2024-03-12

ArXiv (preprint)

doi.org

arxiv.org

WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks?

Alexandre Drouin

Maxime Gasse

Massimo Caccia

Issam Hadj Laradji

Manuel Del Verme

Tom Marty

Léo Boisvert

Megh Thakkar

Quentin Cappart

David Vazquez

Nicolas Chapados

Alexandre Lacoste

We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuri… (see more)ng the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 29 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.

2024-03-12

ArXiv (preprint)

doi.org

arxiv.org

Ant Colony Sampling with GFlowNets for Combinatorial Optimization

Minsu Kim

Sanghyeok Choi

Jiwoo Son

Hyeon-Seob Kim

Jinkyoo Park

Yoshua Bengio

2024-03-11

ArXiv (preprint)

doi.org

arxiv.org

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Lucas Lehnert

Sainbayar Sukhbaatar

Paul McVay

Michael Rabbat

Yuandong Tian

While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symboli… (see more)c planners for solving complex decision making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the _search dynamics_ of the

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

doi.org

openreview.net

Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

Guy Wolf

Yoshua Bengio

Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their si… (see more)mulation-based maximum likelihood training. We introduce the generalized \textit{conditional flow matching} (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schrödinger bridge inference.

2024-03-11

TMLR (accepted)

openreview.net

IntentGPT: Few-shot Intent Discovery with Large Language Models

Juan A. Rodriguez

Nicholas Botzer

David Vazquez

Chris Pal

Marco Pedersoli

Issam Hadj Laradji

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

openreview.net

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Publications

AI Research Driven by Real-World Problems

AI Policy Compass

Student Life and Resources

Popular keywords:

Publications