Publications

Joint Multimodal Transformer for Emotion Recognition in the Wild

Paul Waligora

Muhammad Haseeb Aslam

Muhammad Osama Zeeshan

Soufiane Belharbi

Alessandro Lameiras Koerich

Marco Pedersoli

Simon Bacon

Eric Granger

Multimodal emotion recognition (MMER) systems typically outperform unimodal systems by leveraging the inter-and intra-modal relationships be… (voir plus)tween, e.g., visual, textual, physiological, and auditory modalities. This paper proposes an MMER method that relies on a joint multi-modal transformer (JMT) for fusion with key-based cross-attention. This framework can exploit the complementary nature of diverse modalities to improve predictive accuracy. Separate backbones capture intra-modal spatiotemporal dependencies within each modality over video sequences. Subsequently, our JMT fusion architecture integrates the individual modality embeddings, allowing the model to effectively capture inter- and intra-modal relationships. Extensive experiments on two challenging expression recognition tasks – (1) dimensional emotion recognition on the Affwild2 dataset (with face and voice) and (2) pain estimation on the Biovid dataset (with face and biosensors) – indicate that our JMT fusion can provide a cost-effective solution for MMER. Empirical results show that MMER systems with our proposed fusion allow us to outperform relevant baseline and state-of-the-art methods. Code is available at: https://github.com/PoloWlg/Joint-Multimodal-Transformer-6th-ABAW

2024-06-17

2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (publié)

doi.org

arxiv.org

Learning Generative Population Models From Multiple Clinical Datasets Via Probabilistic Programming

João Loula

Katherine M. Collins

Ulrich Schaechtle

Joshua B. Tenenbaum

Adrian Weller

Feras Saad

Timothy O'Donnell

Vikash Mansinghka

Accurate, efficient generative models of clinical populations could accelerate clinical research and improve patient outcomes. For example, … (voir plus)such models could infer probable treatment outcomes for different subpopulations, generate high-fidelity synthetic data that can be shared across organizational boundaries, and discover new relationships among clinical variables. Using Bayesian structure learning, we show that it is possible to learn probabilistic program models of clinical populations by combining data from multiple, sparsely overlapping clinical datasets. Through experiments with multiple clinical trials and real-world evidence from census health surveys, we show that our model generates higher quality synthetic data than neural network baselines, supports more accurate inferences across datasets than traditional statistical methods, and can be queried more efficiently than both, opening up new avenues for accessible and efficient AI assistance in clinical research.

2024-06-17

ICML.cc/2024/Workshop/AccMLBio (poster)

openreview.net

Learning Generative Population Models From Multiple Clinical Datasets Via Probabilistic Programming

João Loula

Katherine M. Collins

Ulrich Schaechtle

Joshua B. Tenenbaum

Adrian Weller

Feras Saad

Timothy O'Donnell

Vikash Mansinghka

Accurate, efficient generative models of clinical populations could accelerate clinical research and improve patient outcomes. For example, … (voir plus)such models could infer probable treatment outcomes for different subpopulations, generate high-fidelity synthetic data that can be shared across organizational boundaries, and discover new relationships among clinical variables. Using Bayesian structure learning, we show that it is possible to learn probabilistic program models of clinical populations by combining data from multiple, sparsely overlapping clinical datasets. Through experiments with multiple clinical trials and real-world evidence from census health surveys, we show that our model generates higher quality synthetic data than neural network baselines, supports more accurate inferences across datasets than traditional statistical methods, and can be queried more efficiently than both, opening up new avenues for accessible and efficient AI assistance in clinical research.

2024-06-17

ICML.cc/2024/Workshop/AccMLBio (poster)

openreview.net

Lost in Translation: The Algorithmic Gap Between LMs and the Brain

Tosato Tommaso

Tikeng Notsawo Pascal Junior

Helbling Saskia

Irina Rish

Guillaume Dumas

Language Models (LMs) have achieved impressive performance on various linguistic tasks, but their relationship to human language processing … (voir plus)in the brain remains unclear. This paper examines the gaps and overlaps between LMs and the brain at different levels of analysis, emphasizing the importance of looking beyond input-output behavior to examine and compare the internal processes of these systems. We discuss how insights from neuroscience, such as sparsity, modularity, internal states, and interactive learning, can inform the development of more biologically plausible language models. Furthermore, we explore the role of scaling laws in bridging the gap between LMs and human cognition, highlighting the need for efficiency constraints analogous to those in biological systems. By developing LMs that more closely mimic brain function, we aim to advance both artificial intelligence and our understanding of human cognition.

2024-06-17

ICML.cc/2024/Workshop/LLMs_and_Cognition (poster)

openreview.net

Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold

Lazar Atanackovic

Xi Zhang

Brandon Amos

Mathieu Blanchette

Leo J Lee

Yoshua Bengio

Alexander Tong

Kirill Neklyudov

Numerous biological and physical processes can be modeled as systems of interacting samples evolving continuously over time, e.g. the dynami… (voir plus)cs of communicating cells or physical particles. Flow-based models allow for learning these dynamics at the population level --- they model the evolution of the entire distribution of samples. However, current flow-based models are limited to a single initial population and a set of predefined conditions which describe different dynamics. We propose

2024-06-17

ICML.cc/2024/Workshop/GRaM (publié)

openreview.net

Neural Ratio Estimators Meet Distributional Shift and Mode Misspecification: A Cautionary Tale from Strong Gravitational Lensing

Andreas Filipp

Yashar Hezaveh

Laurence Perreault-Levasseur

In recent years, there has been increasing interest in the field of astrophysics in applying Neural Ratio Estimators (NREs) to large-scale i… (voir plus)nference problems where both amortization and marginalization over a large number of nuisance parameters are needed. Here, in order to assess the true potential of this method to produce unbiased inference on real data, we investigate the robustness of NREs to distribution shifts and model misspecification in the specific scientific application of the measurement of dark matter population-level parameters using strong gravitational lensing. We investigate the behaviour of a trained NRE for test data presenting distributional shifts inside the bounds of training, as well as out of distribution, both in the linear and non-linear parameters of this problem. While our results show that NREs perform when tested perfectly in distribution, we find that they exhibit significant biases and drawbacks when confronted with slight deviations from the examples seen in the training distribution. This indicates the necessity for caution when applying NREs to real astrophysical data, where underlying distributions are not perfectly known and models do not perfectly reconstruct the true underlying distributions.

2024-06-17

ICML.cc/2024/Workshop/SPIGM (poster)

openreview.net

Neural Ratio Estimators Meet Distributional Shift and Mode Misspecification: A Cautionary Tale from Strong Gravitational Lensing

Andreas Filipp

Yashar Hezaveh

Laurence Perreault-Levasseur

In recent years, there has been increasing interest in the field of astrophysics in applying Neural Ratio Estimators (NREs) to large-scale i… (voir plus)nference problems where both amortization and marginalization over a large number of nuisance parameters are needed. Here, in order to assess the true potential of this method to produce unbiased inference on real data, we investigate the robustness of NREs to distribution shifts and model misspecification in the specific scientific application of the measurement of dark matter population-level parameters using strong gravitational lensing. We investigate the behaviour of a trained NRE for test data presenting distributional shifts inside the bounds of training, as well as out of distribution, both in the linear and non-linear parameters of this problem. While our results show that NREs perform when tested perfectly in distribution, we find that they exhibit significant biases and drawbacks when confronted with slight deviations from the examples seen in the training distribution. This indicates the necessity for caution when applying NREs to real astrophysical data, where underlying distributions are not perfectly known and models do not perfectly reconstruct the true underlying distributions.

2024-06-17

ICML.cc/2024/Workshop/SPIGM (poster)

openreview.net

QGFN: Controllable Greediness with Action Values

Elaine Lau

Stephen Zhewen Lu

Ling Pan

Doina Precup

Emmanuel Bengio

Generative Flow Networks (GFlowNets; GFNs) are a family of reward/energy-based generative methods for combinatorial objects, capable of gene… (voir plus)rating diverse and high-utility samples. However, biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate,

2024-06-17

ICML.cc/2024/Workshop/SPIGM (poster)

doi.org

openreview.net

Reframing linguistic bootstrapping as joint inference using visually-grounded grammar induction models

Eva Portelance

Siva Reddy

Timothy John O'donnell

Semantic and syntactic bootstrapping posit that children use their prior knowledge of one linguistic domain, say syntactic relations, to hel… (voir plus)p later acquire another, such as the meanings of new words. Empirical results supporting both theories may tempt us to believe that these are different learning strategies, where one may precede the other. Here, we argue that they are instead both contingent on a more general learning strategy for language acquisition: joint learning. Using a series of neural visually-grounded grammar induction models, we demonstrate that both syntactic and semantic bootstrapping effects are strongest when syntax and semantics are learnt simultaneously. Joint learning results in better grammar induction, realistic lexical category learning, and better interpretations of novel sentence and verb meanings. Joint learning makes language acquisition easier for learners by mutually constraining the hypotheses spaces for both syntax and semantics. Studying the dynamics of joint inference over many input sources and modalities represents an important new direction for language modeling and learning research in both cognitive sciences and AI, as it may help us explain how language can be acquired in more constrained learning settings.

2024-06-17

ArXiv (prépublication)

doi.org

arxiv.org

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Joao Monteiro

Pierre-Andre Noel

Étienne Marcotte

Sai Rajeswar

Valentina Zantedeschi

David Vazquez

Nicolas Chapados

Chris Pal

Perouz Taslakian

Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includ… (voir plus)es encyclopedic documents that harbor a vast amount of general knowledge (e.g., Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.

2024-06-17

ArXiv (prépublication)

doi.org

arxiv.org

RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content

Joao Monteiro

Pierre-Andre Noel

Étienne Marcotte

Sai Rajeswar

Valentina Zantedeschi

David Vazquez

Nicolas Chapados

Chris Pal

Perouz Taslakian

Large Language Models (LLMs) are trained on vast amounts of data, most of which is automatically scraped from the internet. This data includ… (voir plus)es encyclopedic documents that harbor a vast amount of general knowledge (e.g., Wikipedia) but also potentially overlap with benchmark datasets used for evaluating LLMs. Consequently, evaluating models on test splits that might have leaked into the training set is prone to misleading conclusions. To foster sound evaluation of language models, we introduce a new test dataset named RepLiQA, suited for question-answering and topic retrieval tasks. RepLiQA is a collection of five splits of test sets, four of which have not been released to the internet or exposed to LLM APIs prior to this publication. Each sample in RepLiQA comprises (1) a reference document crafted by a human annotator and depicting an imaginary scenario (e.g., a news article) absent from the internet; (2) a question about the document's topic; (3) a ground-truth answer derived directly from the information in the document; and (4) the paragraph extracted from the reference document containing the answer. As such, accurate answers can only be generated if a model can find relevant content within the provided document. We run a large-scale benchmark comprising several state-of-the-art LLMs to uncover differences in performance across models of various types and sizes in a context-conditional language modeling setting. Released splits of RepLiQA can be found here: https://huggingface.co/datasets/ServiceNow/repliqa.

2024-06-17

ArXiv (prépublication)

doi.org

arxiv.org

Revisiting Successor Features for Inverse Reinforcement Learning

Arnav Kumar Jain

Harley Wiltzer

Jesse Farebrother

Irina Rish

Glen Berseth

Sanjiban Choudhury

2024-06-17

ICML.cc/2024/Workshop/MFHAIA (poster)

openreview.net

Avantage IA

Mettre à profit l'IA pour un avenir durable

Bourse Mila en politiques de l'IA

Avantage IA

Mettre à profit l'IA pour un avenir durable

Publications

Avantage IA

Mettre à profit l'IA pour un avenir durable

Bourse Mila en politiques de l'IA

Avantage IA

Mettre à profit l'IA pour un avenir durable

Mots-clés populaires:

Publications