Publications

Meta- (out-of-context) learning in neural networks

Dmitrii Krasheninnikov

Egor Krasheninnikov

Bruno Mlodozeniec

David M. Krueger

Brown et al. (2020) famously introduced the phenomenon of in-context learning in large language models (LLMs). We establish the existence of… (voir plus) a phenomenon we call meta-out-of-context learning (meta-OCL) via carefully designed synthetic experiments with LLMs. Our results suggest that meta-OCL leads LLMs to more readily"internalize"the semantic content of text that is, or appears to be, broadly useful (such as true statements, or text from authoritative sources) and use it in appropriate circumstances. We further demonstrate meta-OCL in a synthetic computer vision setting, and propose two hypotheses for the emergence of meta-OCL: one relying on the way models store knowledge in their parameters, and another suggesting that the implicit gradient alignment bias of gradient-descent-based optimizers may be responsible. Finally, we reflect on what our results might imply about capabilities of future AI systems, and discuss potential risks. Our code can be found at https://github.com/krasheninnikov/internalization.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

Modeling Caption Diversity in Contrastive Vision-Language Pretraining

Samuel Lavoie

Polina Kirichenko

Mark Ibrahim

Mahmoud Assran

Andrew Gordon Wilson

Aaron Courville

Nicolas Ballas

There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its … (voir plus)caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

A Persuasive Approach to Combating Misinformation

Safwan Hossain

Andjela Mladenovic

Yiling Chen

Gauthier Gidel

Bayesian Persuasion is proposed as a tool for social media platforms to combat the spread of misinformation. Since platforms can use machine… (voir plus) learning to predict the popularity and misinformation features of to-be-shared posts, and users are largely motivated to share popular content, platforms can strategically signal this informational advantage to change user beliefs and persuade them not to share misinformation. We characterize the optimal signaling scheme with imperfect predictions as a linear program and give sufficient and necessary conditions on the classifier to ensure optimal platform utility is non-decreasing and continuous. Next, this interaction is considered under a performative model, wherein platform intervention affects the user's future behaviour. The convergence and stability of optimal signaling under this performative process are fully characterized. Lastly, we experimentally validate that our approach significantly reduces misinformation in both the single round and performative setting and discuss the broader scope of using information design to combat misinformation.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

Position: Cracking the Code of Cascading Disparity Towards Marginalized Communities

Golnoosh Farnadi

Mohammad Havaei

Negar Rostamzadeh

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

Quantitative Analysis of Miniature Synaptic Calcium Transients Using Positive Unlabeled Deep Learning

Frédéric Beaupré

Anthony Bilodeau

Theresa Wiesner

Gabriel Leclerc

Mado Lemieux

Gabriel Nadeau

Katrine Castonguay

Bolin Fan

Simon Labrecque

Renée Hložek

Paul De Koninck

Christian Gagné

Flavie Lavoie-Cardinal

Ca 2+ imaging methods are widely used for studying cellular activity in the brain, allowing detailed ana… (voir plus)lysis of dynamic processes across various scales. Enhanced by high-contrast optical microscopy and fluorescent Ca 2+ sensors, this technique can be used to reveal localized Ca 2+ fluctuations within neurons, including in sub-cellular compartments, such as the dendritic shaft or spines. Despite advances in Ca 2+ sensors, the analysis of miniature Synaptic Calcium Transients (mSCTs), characterized by variability in morphology and low signal-to-noise ratios, remains challenging. Traditional threshold-based methods struggle with the detection and segmentation of these small, dynamic events. Deep learning (DL) approaches offer promising solutions but are limited by the need for large annotated datasets. Positive Unlabeled (PU) learning addresses this limitation by leveraging unlabeled instances to increase dataset size and enhance performance. This approach is particularly useful in the case of mSCTs that are scarce and small, associated with a very small proportion of the foreground pixels. PU learning significantly increases the effective size of the training dataset, improving model performance. Here, we present a PU learning-based strategy for detecting and segmenting mSCTs. We evaluate the performance of two 3D deep learning models, StarDist-3D and 3D U-Net, which are well established for the segmentation of small volumetric structures in microscopy datasets. By integrating PU learning, we enhance the 3D U-Net’s performance, demonstrating significant gains over traditional methods. This work pioneers the application of PU learning in Ca 2+ imaging analysis, offering a robust framework for mSCT detection and segmentation. We also demonstrate how this quantitative analysis pipeline can be used for subsequent mSCTs feature analysis. We characterize morphological and kinetic changes of mSCTs associated with the application of chemical long-term potentiation (cLTP) stimulation in cultured rat hippocampal neurons. Our data-driven approach shows that a cLTP-inducing stimulus leads to the emergence of new active dendritic regions and differently affects mSCTs subtypes.

2024-07-07

bioRxiv (prépublication)

doi.org

Randomized Confidence Bounds for Stochastic Partial Monitoring

Maxime Heuillet

Ola Ahmad

Audrey Durand

The partial monitoring (PM) framework provides a theoretical formulation of sequential learning problems with incomplete feedback. On each r… (voir plus)ound, a learning agent plays an action while the environment simultaneously chooses an outcome. The agent then observes a feedback signal that is only partially informative about the (unobserved) outcome. The agent leverages the received feedback signals to select actions that minimize the (unobserved) cumulative loss. In contextual PM, the outcomes depend on some side information that is observable by the agent before selecting the action on each round. In this paper, we consider the contextual and non-contextual PM settings with stochastic outcomes. We introduce a new class of PM strategies based on the randomization of deterministic confidence bounds. We also extend regret guarantees to settings where existing stochastic strategies are not applicable. Our experiments show that the proposed RandCBP and RandCBPsidestar strategies have favorable performance against state-of-the-art baselines in multiple PM games. To advocate for the adoption of the PM framework, we design a use case on the real-world problem of monitoring the error rate of any deployed classification system.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

A Reinforcement Learning Pipeline for Band Gap-directed Crystal Generation

Prashant Govindarajan

Mathieu Reymond

Santiago Miret

Antoine Clavaud

Mariano Phielipp

A. Chandar

Property-driven AI-automated material discovery presents unique challenges owing to the complex nature of the chemical structural space and … (voir plus)computationally expensive simulations. For crystalline solids, the band gap is an important property for designing semiconductors and batteries. However, optimizing crystals for a target band gap is difficult and not well-explored. Reinforcement learning (RL) shows promise towards optimizing crystals, as it can freely explore the chemical space. However, it relies on regular band gap evaluations, which can only be accurately computed through expensive Density Functional Theory (DFT) simulations. In this study, we propose an active learning-inspired pipeline that combines RL and DFT simulations for optimizing crystal compositions given a target band gap. The pipeline includes an RL policy for predicting atom types and a band gap network that is fine-tuned with DFT data. Preliminary results indicate the need for furthering the state-of-the-art to address the inherent challenges of the problem.

2024-07-07

BOKU.ac.at/2024/AI4Mat (poster)

openreview.net

Robust Data-driven Prescriptiveness Optimization

Mehran Poursoltani

Erick Delage

Angelos Georghiou

The abundance of data has led to the emergence of a variety of optimization techniques that attempt to leverage available side information t… (voir plus)o provide more anticipative decisions. The wide range of methods and contexts of application have motivated the design of a universal unitless measure of performance known as the coefficient of prescriptiveness. This coefficient was designed to quantify both the quality of contextual decisions compared to a reference one and the prescriptive power of side information. To identify policies that maximize the former in a data-driven context, this paper introduces a distributionally robust contextual optimization model where the coefficient of prescriptiveness substitutes for the classical empirical risk minimization objective. We present a bisection algorithm to solve this model, which relies on solving a series of linear programs when the distributional ambiguity set has an appropriate nested form and polyhedral structure. Studying a contextual shortest path problem, we evaluate the robustness of the resulting policies against alternative methods when the out-of-sample dataset is subject to varying amounts of distribution shift.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

SelfIE: Self-Interpretation of Large Language Model Embeddings

Haozhe Chen

Carl Vondrick

Chengzhi Mao

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

Simple and Scalable Strategies to Continually Pre-train Large Language Models

Quentin Gregory Anthony

Timothee LESORT

Eugene Belilovsky

Irina Rish

2024-07-07

TMLR (accepté)

doi.org

openreview.net

SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning

Matthias Weissenbacher

Rishabh Agarwal

Yoshinobu Kawahara

An open challenge in reinforcement learning (RL) is the effective deployment of a trained policy to new or slightly different situations as … (voir plus)well as semantically-similar environments. We introduce Symmetry-Invariant Transformer (SiT), a scalable vision transformer (ViT) that leverages both local and global data patterns in a self-supervised manner to improve generalisation. Central to our approach is Graph Symmetric Attention, which refines the traditional self-attention mechanism to preserve graph symmetries, resulting in invariant and equivariant latent representations. We showcase SiT’s superior generalization over ViTs on MiniGrid and Procgen RL benchmarks, and its sample efficiency on Atari 100k and CIFAR10.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

proceedings.mlr.press

Stealing part of a production language model

Nicholas Carlini

Daniel Paleka

Krishnamurthy Dj Dvijotham

Thomas Steinke

Jonathan Hayase

A. Feder Cooper

Katherine Lee

Matthew Jagielski

Milad Nasr

Arthur Conmy

Eric Wallace

David Rolnick

Florian Tramèr

We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like … (voir plus)OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \\

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (publié)

doi.org

proceedings.mlr.press

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Publications

TRAIL : IA responsable pour les professionnels et les leaders

Fondateur en résidence Mila Ventures

Avantage IA : productivité dans la fonction publique

Mots-clés populaires:

Publications