Publications

Edouard Oyallon

2024-06-03

arXiv (preprint)

ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training

Adel Nabli

Louis Fournier

Pierre Erbacher

Louis Serrano

Edouard Oyallon

Training LLMs relies on distributed implementations using multiple GPUs to compute gradients in parallel with sharded optimizers. However, synchronizing gradients in data parallel setups introduces communication overhead that grows with the number of workers, limiting parallelization efficiency. Local optimization algorithms reduce communications but incur high memory costs as they prevent optimizer state sharding, hindering scalability. To address this, we propose ACcumulate while COmmunicate (ACCO), a memory-efficient optimization algorithm for distributed LLM training. By synchronizing delayed gradients while computing new ones, ACCO reduces GPU idle time and supports heterogeneous hardware. To mitigate the convergence issues caused by delayed updates, we introduce a novel technique ensuring training dynamics align with standard distributed optimization. Compared to ZeRO-1, our approach is significantly faster and scales effectively across heterogeneous hardware.

2024-06-03

arXiv (preprint)

ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training

Adel Nabli

Louis Fournier

Pierre Erbacher

Louis Serrano

Edouard Oyallon

Training Large Language Models (LLMs) relies heavily on distributed implementations, employing multiple GPUs to compute stochastic gradients on model replicas in parallel. However, synchronizing gradients in data parallel settings induces a communication overhead increasing with the number of distributed workers, which can impede the efficiency gains of parallelization. To address this challenge, optimization algorithms reducing inter-worker communication have emerged, such as local optimization methods used in Federated Learning. While effective in minimizing communication overhead, these methods incur significant memory costs, hindering scalability: in addition to extra momentum variables, if communications are only allowed between multiple local optimization steps, then the optimizer's states cannot be sharded among workers. In response, we propose

2024-06-03

arXiv (preprint)

From Feature Visualization to Visual Circuits: Effect of Adversarial Model Manipulation

Géraldin Nanfack

Michael Eickenberg

Understanding the inner working functionality of large-scale deep neural networks is challenging yet crucial in several high-stakes applications. Mechanistic interpretability is an emergent field that tackles this challenge, often by identifying human-understandable subgraphs in deep neural networks known as circuits. In vision-pretrained models, these subgraphs are usually interpreted by visualizing their node features through a popular technique called feature visualization. Recent works have analyzed the stability of different feature visualization types under the adversarial model manipulation framework. This paper starts by addressing limitations in existing works by proposing a novel attack called ProxPulse that simultaneously manipulates the two types of feature visualizations. Surprisingly, when analyzing these attacks under the umbrella of visual circuits, we find that visual circuits show some robustness to ProxPulse. We, therefore, introduce a new attack based on ProxPulse that unveils the manipulability of visual circuits, shedding light on their lack of robustness. The effectiveness of these attacks is validated using pre-trained AlexNet and ResNet-50 models on ImageNet.

2024-06-03

arXiv (preprint)

MOSEAC: Streamlined Variable Time Step Reinforcement Learning

Dong Wang

Giovanni Beltrame

2024-06-03

arXiv (preprint)

Political Dynasties in Canada

Alex B. Rivard

Jean-François Godbout

Marc André Bodet

Using a unique dataset of legislators' electoral and biographical data in the Canadian provinces of Ontario, Quebec, New Brunswick, Nova Scotia and the federal parliament, this article analyses the extent to which family dynasties affected the career development of legislators since the mid-18th century. We find that the prevalence of dynasties was higher in provincial legislatures than it was in the federal parliament, that the number of dynasties in the Senate increased until the mid-20th century, and that the proportion of dynastic legislators at the subnational level was similar to the numbers seen in the United Kingdom during the early 19th century. Our results confirm the existence of a clear career benefit in terms of cabinet and senate appointments. In contrast to the American case and in line with the United Kingdom experience, we find no causal relationship between a legislator's tenure length and the presence of a dynasty.

2024-06-03

Government and Opposition (published)

AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages

Jiayi Wang

David Ifeoluwa Adelani

Sweta Agrawal

Marek Masiak

Ricardo Rei

Eleftheria Briakou

Marine Carpuat

Xuanli He

Sofia Bourhim

Andiswa Bukula

Muhidin A. Mohamed

Temitayo Olatoye

Tosin Adewumi

Hamam Mokayed

Christine Mwase

Wangui Kimotho

Foutse Yuehgoh

Aremu Anuoluwapo

Jessica Ojo

Shamsuddeen Hassan Muhammad

Salomey Osei

Abdul-Hakeem Omotayo

Chiamaka Ijeoma Chukwuneke

Perez Ogayo

Oumaima Hourrane

Salma El Anigri

Lolwethu Ndolela

Thabiso Mangwana

Shafie Abdi Mohamed

Hassan Ayinde

Ayinde Hassan

Oluwabusayo Olufunke Awoyomi

Lama Alkhaled

Sana Sabah Al-Azzawi

Naome Etori

Millicent Ochieng

Clemencia Siro

Samuel Njoroge

Njoroge Kiragu

Eric Muchiri

Wangari Kimotho

Lyse Naomi Wamba

Daud Abolade

Simbiat Ajao

Iyanuoluwa Shode

Ricky Macharm

Ruqayya Nasir Iro

Saheed Salahudeen Abdullahi

Stephen Moore

Bernard Opoku

Zainab Akinjobi

Abeeb Afolabi

Nnaemeka Casmir Obiefuna

Onyekachi Ogbu

Sam Brian

Sam Ochieng’

Verrah Akinyi Otiende

Chinedu Emmanuel Mbonu

Toadoum Sari Sakayo

Yao Lu

Pontus Stenetorp

Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).

2024-06-01

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (published)

Attention as a Hypernetwork

Simon Schug

Seijin Kobayashi

Yassir Akram

João Sacramento

Razvan Pascanu

Transformers can under some circumstances generalize to novel problem instances whose constituent parts might have been encountered during training, but whose compositions have not. What mechanisms underlie this ability for compositional generalization? By reformulating multi-head attention as a hypernetwork, we reveal that a composable, low-dimensional latent code specifies key-query specific operations. We find empirically that this latent code is predictive of the subtasks the network performs on unseen task compositions, revealing that latent codes acquired during training are reused to solve unseen problem instances. To further examine the hypothesis that the intrinsic hypernetwork of multi-head attention supports compositional generalization, we ablate whether making the hypernetwork-generated linear value network nonlinear strengthens compositionality. We find that this modification improves compositional generalization on abstract reasoning tasks. In particular, we introduce a symbolic version of the Raven's Progressive Matrices human intelligence test, which gives us precise control over the problem compositions encountered during training and evaluation. We demonstrate on this task how scaling model size and data enables compositional generalization in transformers and gives rise to a functionally structured latent space.

2024-06-01

arXiv (published)

Better entity matching with transformers through ensembles

Jwen Fai Low

Benjamin Fung

Pulei Xiong

2024-06-01

Knowledge-Based Systems (published)

Caffeine induces age-dependent increases in brain complexity and criticality during sleep

Philipp Thölke

Maxine Arcand-Lavigne

Tarek Lajnef

Sonia Frenette

Julie Carrier

Karim Jerbi

Caffeine is the most widely consumed psychoactive stimulant worldwide. Yet important gaps persist in understanding its effects on the brain, especially during sleep. We analyzed sleep EEG in 40 subjects, contrasting 200 mg of caffeine against a placebo condition, utilizing inferential statistics and machine learning. We found that caffeine ingestion led to an increase in brain complexity, a widespread flattening of the power spectrum’s 1/f-like slope, and a reduction in long-range temporal correlations. Being most prominent during non-REM sleep, these results suggest that caffeine shifts the brain towards a critical regime and more diverse neural dynamics. Interestingly, this was more pronounced in younger adults (20-27 years) compared to middle-aged participants (41-58 years) whose sleep brain dynamics were less affected by caffeine. Interpreting these data in the light of modeling and empirical work on EEG-derived measures of excitation-inhibition balance provides novel insights into the effects caffeine has on the sleeping brain.

2024-06-01

bioRxiv (preprint)

Efficient Evolutionary Search Over Chemical Space with Large Language Models

Haorui Wang

Marta Skreta

Cher Tian Ser

Wenhao Gao

Lingkai Kong

Felix Strieth-Kalthoff

Chenru Duan

Yuchen Zhuang

Yue Yu

Yanqiao Zhu

Yuanqi Du

Alan Aspuru-Guzik

Kirill Neklyudov

Chao Zhang

Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations. In this work, we ameliorate this shortcoming by incorporating chemistry-aware Large Language Models (LLMs) into EAs. Namely, we redesign crossover and mutation operations in EAs using LLMs trained on large corpora of chemical information. We perform extensive empirical studies on both commercial and open-source models on multiple tasks involving property optimization, molecular rediscovery, and structure-based drug design, demonstrating that the joint usage of LLMs with EAs yields superior performance over all baseline models across single- and multi-objective settings. We demonstrate that our algorithm improves both the quality of the final solution and convergence speed, thereby reducing the number of required objective evaluations. Our code is available at http://github.com/zoom-wang112358/MOLLEO

2024-06-01

arXiv (published)

Evaluating In-Context Learning of Libraries for Code Generation

Arkil Patel

Siva Reddy

Dzmitry Bahdanau

Pradeep Dasigi

2024-06-01

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) (published)