Publications

Handling Delay in Reinforcement Learning Caused by Parallel Computations of Neurons

Ivan Anokhin

Rishav

Stephen Chung

Biological neural networks operate in parallel, a feature that sets them apart from artificial neural networks and can significantly enhance inference speed. However, this parallelism introduces challenges: when each neuron operates asynchronously with a fixed execution time, an…

2024-06-19

ICML.cc/2024/Workshop/ARLET (poster)

openreview.net

Realtime Reinforcement Learning: Towards Rapid Asynchronous Deployment of Large Models

Matthew D Riemer

Gopeshh Subbaraj

Glen Berseth

Irina Rish

Realtime environments change even as agents perform action inference and learning, thus requiring high interaction frequencies to effectively minimize long-term regret. However, recent advances in machine learning involve larger neural networks with longer inference times, raising questions about their applicability in realtime systems where reaction time is crucial. We present an analysis of lower bounds on regret in realtime environments to show that minimizing long-term regret is generally impossible within the typical sequential interaction and learning paradigm, but often becomes possible when sufficient asynchronous compute is available. We propose novel algorithms for staggering asynchronous inference processes to ensure that actions are taken at consistent time intervals, and demonstrate that the use of models with high action inference times is only constrained by the environment's effective stochasticity over the inference horizon, and not by action frequency. Our analysis shows that the number of inference processes needed scales linearly with increasing inference times while enabling use of models that are multiple orders of magnitude larger than existing approaches when learning from a realtime simulation of Game Boy games such as Pokemon and Tetris.

2024-06-19

ICML.cc/2024/Workshop/ARLET (poster)

openreview.net
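The abstract notes that the number of inference processes needed scales linearly with inference time. A minimal sketch of why, assuming a fixed, known inference time and a simple round-robin assignment of start times (the names `num_inference_processes` and `staggered_schedule` are hypothetical, and this is not the authors' algorithm): if one action must be emitted every `interval_ms` and each inference takes `inference_ms`, then ceil(inference_ms / interval_ms) staggered workers are enough for one result to land every interval.

```python
from typing import List, Tuple


def num_inference_processes(inference_ms: int, interval_ms: int) -> int:
    """Fewest staggered workers so that one inference finishes every interval_ms.

    Assumes a fixed inference time; integer milliseconds avoid float rounding.
    """
    return (inference_ms + interval_ms - 1) // interval_ms  # ceiling division


def staggered_schedule(inference_ms: int, interval_ms: int,
                       horizon_ms: int) -> List[Tuple[int, int, int]]:
    """Round-robin schedule of (worker_id, start_ms, finish_ms) up to horizon_ms.

    Job k starts at k * interval_ms on worker k % n. That worker is free by then,
    because its previous job started n intervals earlier and
    n * interval_ms >= inference_ms.
    """
    n = num_inference_processes(inference_ms, interval_ms)
    schedule = []
    k = 0
    while k * interval_ms + inference_ms <= horizon_ms:
        start = k * interval_ms
        schedule.append((k % n, start, start + inference_ms))
        k += 1
    return schedule
```

For example, `num_inference_processes(500, 100)` gives 5, and doubling the inference time to 1000 ms gives 10, which illustrates the linear scaling described above.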

A deeper look at depth pruning of LLMs

Shoaib Ahmed Siddiqui

Xin Dong

Greg Heinrich

Thomas Breuel

Jan Kautz

David Scott Krueger

Pavlo Molchanov

Large Language Models (LLMs) are not only resource-intensive to train but even more costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs based on cheap proxies for estimating block importance, effectively removing 10% of blocks in well-trained LLaMa-2 and Mistral 7B models without any significant degradation of downstream metrics. In this paper, we explore different block importance metrics by considering adaptive metrics such as Shapley value in addition to static ones explored in prior work. We show that *adaptive metrics exhibit a trade-off in performance between tasks, i.e., improvement on one task may degrade performance on the other due to differences in the computed block influences*. Furthermore, we extend this analysis from a complete block to individual self-attention and feed-forward layers, highlighting the propensity of the self-attention layers to be more amenable to pruning, even allowing ***removal of up to 33% of the self-attention layers without incurring any performance degradation on MMLU for Mistral 7B*** (a significant reduction in the costly maintenance of the KV-cache). Finally, we look at simple performance recovery techniques to emulate the pruned layers by training lightweight additive bias or low-rank linear adapters. *Performance recovery using emulated updates avoids performance degradation for the initial blocks (up to 5% absolute improvement on MMLU)*, which is either competitive or superior to the learning-based technique.

2024-06-18

ICML.cc/2024/Workshop/TF2M (poster)

openreview.net
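The abstract contrasts adaptive metrics such as Shapley value with static block-importance proxies from prior work. As a hedged illustration of one static proxy, the sketch below scores each block by one minus the mean cosine similarity between its input and output hidden states (a choice assumed here, not taken from this paper) and marks the lowest-scoring fraction of blocks for removal. `block_importance` and `blocks_to_prune` are hypothetical names; the hidden states would come from running a calibration set through the model.

```python
import numpy as np


def block_importance(hidden_states):
    """Static proxy (assumed): 1 - mean cosine similarity between block input and output.

    hidden_states[i] is the input to block i and hidden_states[i + 1] its output,
    each of shape (tokens, d). A block that barely changes its input scores near 0.
    """
    scores = []
    for x_in, x_out in zip(hidden_states[:-1], hidden_states[1:]):
        dot = np.sum(x_in * x_out, axis=-1)
        norms = np.linalg.norm(x_in, axis=-1) * np.linalg.norm(x_out, axis=-1) + 1e-8
        scores.append(float(1.0 - np.mean(dot / norms)))
    return scores


def blocks_to_prune(scores, fraction):
    """Indices of the `fraction` of blocks with the lowest importance, in ascending order."""
    k = int(round(fraction * len(scores)))
    return sorted(np.argsort(scores)[:k].tolist())
```

A near-identity block is pruned first under this proxy. An adaptive metric such as Shapley value would instead account for interactions between blocks by averaging a block's marginal contribution over subsets of the other blocks.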

Insect Identification in the Wild: The AMI Dataset

Aditya Jain

Fagner Cunha

M. Bunsen

Juan Sebastián Canas

L. Pasi

N. Pinoy

Flemming Helsing

JoAnne Russo

Marc Botham

Michael Sabourin

Jonathan Fréchette

Alexandre Anctil

Yacksecari Lopez

Eduardo Navarro

Filonila Perez Pimentel

Ana Cecilia Zamora

José Alejandro Ramirez Silva

Jonathan Gagnon

T. August

Kim Bjerge

Alba Gomez Segura

Marc Bélisle

Yves Basset

K. P. McFarland

David Roy

Toke Thomas Høye

Maxim Larrivée

David Rolnick

Insects represent half of all global biodiversity, yet many of the world's insects are disappearing, with severe implications for ecosystems and agriculture. Despite this crisis, data on insect diversity and abundance remain woefully inadequate, due to the scarcity of human experts and the lack of scalable tools for monitoring. Ecologists have started to adopt camera traps to record and study insects, and have proposed computer vision algorithms as an answer for scalable data processing. However, insect monitoring in the wild poses unique challenges that have not yet been addressed within computer vision, including the combination of long-tailed data, extremely similar classes, and significant distribution shifts. We provide the first large-scale machine learning benchmarks for fine-grained insect recognition, designed to match real-world tasks faced by ecologists. Our contributions include a curated dataset of images from citizen science platforms and museums, and an expert-annotated dataset drawn from automated camera traps across multiple continents, designed to test out-of-distribution generalization under field conditions. We train and evaluate a variety of baseline algorithms and introduce a combination of data augmentation techniques that enhance generalization across geographies and hardware setups.

2024-06-18

ArXiv (preprint)

doi.org

arxiv.org

A machine learning pipeline for automated insect monitoring

Aditya Jain

Fagner Cunha

M. Bunsen

L. Pasi

Anna Viklund

Maxim Larrivée

David Rolnick

Climate change and other anthropogenic factors have led to a catastrophic decline in insects, endangering both biodiversity and the ecosystem services on which human society depends. Data on insect abundance, however, remains woefully inadequate. Camera traps, conventionally used for monitoring terrestrial vertebrates, are now being modified for insects, especially moths. We describe a complete, open-source machine learning-based software pipeline for automated monitoring of moths via camera traps, including object detection, moth/non-moth classification, fine-grained identification of moth species, and tracking individuals. We believe that our tools, which are already in use across three continents, represent the future of massively scalable data collection in entomology.

2024-06-18

ArXiv (preprint)

doi.org

arxiv.org
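The abstract lists tracking individuals among the pipeline stages but does not say how it is done. One simple way to link detections across consecutive camera-trap frames, assumed here purely for illustration, is greedy matching by intersection-over-union (IoU): each new box inherits the ID of the previous-frame box it overlaps most, provided the overlap exceeds a threshold, and otherwise starts a new track. The names `iou` and `track` and the 0.3 threshold are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track(frames: Sequence[Sequence[Box]], threshold: float = 0.3) -> List[List[int]]:
    """Assign a track ID to every detection, frame by frame (greedy IoU matching)."""
    next_id = 0
    prev: List[Tuple[int, Box]] = []  # (track_id, box) from the previous frame
    out = []
    for boxes in frames:
        # All previous/current pairs, highest overlap first.
        pairs = sorted(
            ((iou(pb, b), pi, bi) for pi, (_, pb) in enumerate(prev)
             for bi, b in enumerate(boxes)),
            reverse=True,
        )
        ids = [None] * len(boxes)
        used_prev = set()
        for score, pi, bi in pairs:
            if score < threshold:
                break  # remaining pairs overlap even less
            if pi in used_prev or ids[bi] is not None:
                continue
            ids[bi] = prev[pi][0]
            used_prev.add(pi)
        for bi in range(len(boxes)):
            if ids[bi] is None:  # unmatched detection starts a new track
                ids[bi] = next_id
                next_id += 1
        out.append(ids)
        prev = list(zip(ids, boxes))
    return out
```

Greedy matching keeps the sketch short; a real pipeline might add appearance features or an optimal assignment step, which are beyond what the abstract states.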

Many-Shot In-Context Learning

Rishabh Agarwal

Avi Singh

Lei M Zhang

Bernd Bohnet

Luis Rosias

Stephanie C.Y. Chan

Ankesh Anand

Zaheer Abbas

Biao Zhang

Azade Nova

John D. Co-Reyes

Eric Chu

Feryal M. P. Behbahani

Aleksandra Faust

Hugo Larochelle

Large language models (LLMs) excel at few-shot in-context learning (ICL) -- learning from a few examples provided in context at inference, without any weight updates. Newly expanded context windows allow us to investigate ICL with hundreds or thousands of examples -- the many-shot regime. Going from few-shot to many-shot, we observe significant performance gains across a wide variety of generative and discriminative tasks. While promising, many-shot ICL can be bottlenecked by the available amount of human-generated examples. To mitigate this limitation, we explore two new settings: Reinforced and Unsupervised ICL. Reinforced ICL uses model-generated chain-of-thought rationales in place of human examples. Unsupervised ICL removes rationales from the prompt altogether, and prompts the model only with domain-specific questions. We find that both Reinforced and Unsupervised ICL can be quite effective in the many-shot regime, particularly on complex reasoning tasks. Finally, we demonstrate that, unlike few-shot learning, many-shot learning is effective at overriding pretraining biases and can learn high-dimensional functions with numerical inputs. Our analysis also reveals the limitations of next-token prediction loss as an indicator of downstream ICL performance.

2024-06-18

ICML.cc/2024/Workshop/ICL (poster)

doi.org

openreview.net
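Reinforced and Unsupervised ICL differ mainly in what goes into each in-context example. A minimal prompt-assembly sketch follows; the `Q:`/`A:` template, the exact-match filter that keeps only model rationales reaching the reference answer, and the `generate` callable (a stand-in for any model call) are all assumptions made for illustration, not details confirmed by the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Shot:
    question: str
    rationale: Optional[str] = None  # chain-of-thought text; unused in Unsupervised ICL
    answer: Optional[str] = None


def reinforced_shots(problems: Sequence[Tuple[str, str]],
                     generate: Callable[[str], Tuple[str, str]]) -> List[Shot]:
    """Reinforced ICL (sketch): use model-generated rationales instead of human ones.

    `generate` is a hypothetical model call returning (rationale, final_answer).
    Keeping only rationales whose answer matches the reference is an assumed filter.
    """
    shots = []
    for question, reference in problems:
        rationale, answer = generate(question)
        if answer.strip() == reference.strip():
            shots.append(Shot(question, rationale, answer))
    return shots


def build_prompt(shots: Sequence[Shot], query: str, unsupervised: bool = False) -> str:
    """Concatenate the shots ahead of the query.

    With unsupervised=True (Unsupervised ICL), only the domain questions are shown.
    """
    blocks = []
    for s in shots:
        if unsupervised:
            blocks.append(f"Q: {s.question}")
        else:
            blocks.append(f"Q: {s.question}\nA: {s.rationale} The answer is {s.answer}.")
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)
```

In the many-shot regime the `shots` list would simply hold hundreds or thousands of entries, bounded by the model's context window.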

Scalable Approaches for a Theory of Many Minds

Maximilian Puelma Touzel

Amin Memarian

Matthew D Riemer

Andrei Mircea

Andrew Robert Williams

Elin Ahlstrand

Lucas Lehnert

Rupali Bhati

Guillaume Dumas

Irina Rish

A major challenge as we move towards building agents for real-world problems, which could involve a massive number of human and/or machine agents, is that we must learn to reason about the behavior of these many other agents. In this paper, we consider the problem of scaling a predictive Theory of Mind (ToM) model to a very large number of interacting agents with a fixed computational budget. Motivated by the limited diversity of agent types, existing approaches to scalable ToM learn versatile single-agent representations for quickly adapting to new agents encountered sequentially. We consider the more general setting in which many agents are observed in parallel and formulate the corresponding Theory of Many Minds (ToMM) problem of estimating the joint policy. We frame the scaling behavior of solutions in terms of parameter sharing schemes and in particular propose two parameter-free architectural features that endow models with the ability to exploit action correlations: encoding a multi-agent context, and decoding through an abstracted joint action space. The increased predictive capabilities that have come with foundation models have made it easier to imagine the possibility of using these models to make simulations that imitate the behavior of many agents within complex real-world systems. Being able to perform these simulations in a general-purpose way would not only help make more capable agents but would also be a very useful capability for applications in social science, political science, and economics.

2024-06-18

ICML.cc/2024/Workshop/Agentic_Markets (poster)

openreview.net

Assessing the Viability of Generative Modeling in Simulated Astronomical Observations

Patrick Janulewicz

Laurence Perreault-Levasseur

Tracy Webb

In this paper, we use methods for assessing the quality of generative models and apply them to a problem from the physical sciences. We turn our attention to astrophysics, where cosmological simulations are often used to create mock observations that mimic telescope images. These simulations and their mock observations are often slow and challenging to generate, inspiring some to use generative modeling to enhance the amount of data available to study. In this work, we add realism to simulated images of galaxy clusters and use probability mass estimation to assess their fidelity compared to reality. We find that the simulations are biased compared to real observations and suggest that researchers applying generative modeling to these systems should proceed with caution.

2024-06-17

ICML.cc/2024/Workshop/SPIGM (poster)

openreview.net

Augmenting Evolutionary Models with Structure-based Retrieval

Yining Huang

Zuobai Zhang

Jian Tang

Debora Susan Marks

Pascal Notin

2024-06-17

ICML.cc/2024/Workshop/ML4LMS (poster)

openreview.net

Bias-inducing geometries: exactly solvable data model with fairness implications

Stefano Sarao Mannelli

Federica Gerace

Negar Rostamzadeh

Luca Saglietti

Machine learning (ML) may be oblivious to human bias but it is not immune to its perpetuation. Marginalisation and iniquitous group representation are often traceable in the very data used for training, and may be reflected or even enhanced by the learning models. In this abstract, we aim to clarify the role played by data geometry in the emergence of ML bias. We introduce an exactly solvable high-dimensional model of data imbalance, where parametric control over the many bias-inducing factors allows for an extensive exploration of the bias inheritance mechanism. Through the tools of statistical physics, we analytically characterise the typical properties of learning models trained in this synthetic framework and obtain exact predictions for the observables that are commonly employed for fairness assessment. Simplifying the nature of the problem to its minimal components, we can retrace and unpack typical unfairness behaviour observed on real-world datasets.

2024-06-17

ICML.cc/2024/Workshop/GRaM (published)

openreview.net

Demystifying amortized causal discovery with transformers

Francesco Montagna

Max Cairney-Leeming

Dhanya Sridhar

Francesco Locatello

Supervised learning approaches for causal discovery from observational data often achieve competitive performance despite seemingly avoiding explicit assumptions that traditional methods make for identifiability. In this work, we investigate CSIvA (Ke et al., 2023), a transformer-based model promising to train on synthetic data and transfer to real data. First, we bridge the gap with existing identifiability theory and show that constraints on the training data distribution implicitly define a prior on the test observations. Consistent with classical approaches, good performance is achieved when we have a good prior on the test data, and the underlying model is identifiable. At the same time, we find new trade-offs. Training on datasets generated from different classes of causal models, unambiguously identifiable in isolation, improves the test generalization. Performance is still guaranteed, as the ambiguous cases resulting from the mixture of identifiable causal models are unlikely to occur (which we formally prove). Overall, our study finds that amortized causal discovery still needs to obey identifiability theory, but it also differs from classical methods in how the assumptions are formulated, trading more reliance on assumptions on the noise type for fewer hypotheses on the mechanisms.

2024-06-17

ICML.cc/2024/Workshop/SPIGM (poster)

openreview.net
