Publications

Implications of conscious AI in primary healthcare

Dorsai Ranjbari

Samira Abbasgholizadeh-Rahimi

2024-03-14

Family Medicine and Community Health (published)

doi.org

One-shot Learning for MIPs with SOS1 Constraints

Charly Robinson La Rocca

Jean-François Cordeau

Emma Frejinger

2024-03-14

ArXiv (preprint)

arxiv.org

Bugs in Large Language Models Generated Code: An Empirical Study

Florian Tambon

Arghavan Moradi Dakhel

Amin Nikanjam

Foutse Khomh

Michel C. Desmarais

Giuliano Antoniol

2024-03-13

ArXiv (preprint)

doi.org

arxiv.org

Scattered Mixture-of-Experts Implementation

Shawn Tan

Yikang Shen

Rameswar Panda

Aaron Courville

We present ScatterMoE, an implementation of Sparse Mixture-of-Experts (SMoE) on GPUs. ScatterMoE builds upon existing implementations and overcomes some of their limitations to improve inference and training speed and reduce memory footprint. It achieves this by avoiding padding and excessive copies of the input. We introduce ParallelLinear, the main component we use to build our implementation, and the various kernels used to speed up the operation. We benchmark our implementation against Megablocks and show that it enables higher throughput and a lower memory footprint. We also show how ParallelLinear enables extension of the Mixture-of-Experts concept by demonstrating an implementation of Mixture of Attention.

2024-03-13

ArXiv (preprint)

doi.org

arxiv.org

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Adam Ibrahim

Benjamin Thérien

Kshitij Gupta

Mats Leon Richter

Quentin Anthony

Timothee LESORT

Eugene Belilovsky

Irina Rish

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually pre-train these models, saving significant compute compared to re-training. However, the distribution shift induced by new data typically results in degraded performance on previous data or poor adaptation to the new data. In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks. Specifically, we show this for a weak but realistic distribution shift between two commonly used LLM pre-training datasets (English

2024-03-13

ArXiv (preprint)

doi.org

arxiv.org

Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons

Simon Dufort-Labbé

Pierluca D'Oro

Evgenii Nikishin

Razvan Pascanu

Pierre-Luc Bacon

Aristide Baratin

2024-03-12

ArXiv (preprint)

doi.org

arxiv.org

Rethinking Machine Learning Benchmarks in the Context of Professional Codes of Conduct

Peter Henderson

Jieru Hu

Mona Diab

Joelle Pineau

2024-03-12

Proceedings of the Symposium on Computer Science and Law (published)

doi.org

Simulating Weighted Automata over Sequences and Trees with Transformers

Michael Rizvi

Maude Lizaire

Clara Lacroce

Guillaume Rabusseau

2024-03-12

ArXiv (preprint)

doi.org

arxiv.org

Ant Colony Sampling with GFlowNets for Combinatorial Optimization

Minsu Kim

Sanghyeok Choi

Jiwoo Son

Hyeon-Seob Kim

Jinkyoo Park

Yoshua Bengio

2024-03-11

ArXiv (preprint)

doi.org

arxiv.org

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Lucas Lehnert

Sainbayar Sukhbaatar

Paul McVay

Michael Rabbat

Yuandong Tian

While Transformers have enabled tremendous progress in various application settings, such architectures still lag behind traditional symbolic planners for solving complex decision-making tasks. In this work, we demonstrate how to train Transformers to solve complex planning tasks. This is accomplished by training an encoder-decoder Transformer model to predict the _search dynamics_ of the

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

doi.org

openreview.net

Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport

Alexander Tong

Nikolay Malkin

Guillaume Huguet

Yanlei Zhang

Jarrid Rector-Brooks

Kilian FATRAS

Guy Wolf

Yoshua Bengio

Continuous normalizing flows (CNFs) are an attractive generative modeling technique, but they have been held back by limitations in their simulation-based maximum likelihood training. We introduce the generalized _conditional flow matching_ (CFM) technique, a family of simulation-free training objectives for CNFs. CFM features a stable regression objective like that used to train the stochastic flow in diffusion models but enjoys the efficient inference of deterministic flow models. In contrast to both diffusion models and prior CNF training algorithms, CFM does not require the source distribution to be Gaussian or require evaluation of its density. A variant of our objective is optimal transport CFM (OT-CFM), which creates simpler flows that are more stable to train and lead to faster inference, as evaluated in our experiments. Furthermore, OT-CFM is the first method to compute dynamic OT in a simulation-free way. Training CNFs with CFM improves results on a variety of conditional and unconditional generation tasks, such as inferring single cell dynamics, unsupervised image translation, and Schrödinger bridge inference.

2024-03-11

TMLR (accepted)

openreview.net

IntentGPT: Few-shot Intent Discovery with Large Language Models

Juan A. Rodriguez

Nicholas Botzer

David Vazquez

Chris Pal

Marco Pedersoli

Issam Hadj Laradji

2024-03-11

ICLR.cc/2024/Workshop/LLMAgents (poster)

openreview.net
