Publications

Evaluating the transferability potential of deep learning models for climate downscaling

Ayush Prasad

Qidong Yang

Prasanna Sattegeri

D. Szwarcman

Campbell Watson

Climate downscaling, the process of generating high-resolution climate data from low-resolution simulations, is essential for understanding and adapting to climate change at regional and local scales. Deep learning approaches have proven useful in tackling this problem. However, existing studies usually focus on training models for one specific task, location and variable, which are therefore limited in their generalizability and transferability. In this paper, we evaluate the efficacy of training deep learning downscaling models on multiple diverse climate datasets to learn more robust and transferable representations. We evaluate the zero-shot transferability of several architectures: CNNs, Fourier Neural Operators (FNOs), and vision Transformers (ViTs). We assess the spatial, variable, and product transferability of downscaling models experimentally, to understand the generalizability of these different architecture types.

2024-07-17

ArXiv (preprint)

doi.org

arxiv.org

Evaluating the transferability potential of deep learning models for climate downscaling

Ayush Prasad

Paula Harder

Qidong Yang

Prasanna Sattegeri

Daniela Szwarcman

Campbell Watson

David Rolnick

Climate downscaling, the process of generating high-resolution climate data from low-resolution simulations, is essential for understanding and adapting to climate change at regional and local scales. Deep learning approaches have proven useful in tackling this problem. However, existing studies usually focus on training models for one specific task, location and variable, which are therefore limited in their generalizability and transferability. In this paper, we evaluate the efficacy of training deep learning downscaling models on multiple diverse climate datasets to learn more robust and transferable representations. We evaluate the zero-shot transferability of several architectures: CNNs, Fourier Neural Operators (FNOs), and vision Transformers (ViTs). We assess the spatial, variable, and product transferability of downscaling models experimentally, to understand the generalizability of these different architecture types.

2024-07-17

ArXiv (preprint)

doi.org

arxiv.org

Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models

Ayush Kaushal

Tejas Pandey

Tejas Vaidhya

Aaryan Bhagat

Irina Rish

2024-07-17

ArXiv (preprint)

doi.org

arxiv.org

Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale

Ayush Kaushal

Tejas Pandey

Tejas Vaidhya

Aaryan Bhagat

Irina Rish

2024-07-17

ArXiv (preprint)

arxiv.org

Spectra: Surprising Effectiveness of Pretraining Ternary Language Models at Scale

Ayush Kaushal

Tejas Pandey

Tejas Vaidhya

Arnab Kumar Mondal

Aaryan Bhagat

Irina Rish

2024-07-17

ArXiv (preprint)

arxiv.org

Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild

Nicolas Richet

Soufiane Belharbi

Muhammad Haseeb Aslam

Meike Emilie Schadt

Manuela González-González

Gustave Cortal

Alessandro Lameiras Koerich

Marco Pedersoli

Alain Finkel

Simon Bacon

Eric Granger

Systems for multimodal emotion recognition (ER) are commonly trained to extract features from different modalities (e.g., visual, audio, and textual) that are combined to predict individual basic emotions. However, compound emotions often occur in real-world scenarios, and the uncertainty of recognizing such complex emotions over diverse modalities is challenging for feature-based models. As an alternative, emerging large language models (LLMs) like BERT and LLaMA can rely on explicit non-verbal cues that may be translated from different non-textual modalities (e.g., audio and visual) into text. Textualization of modalities augments data with emotional cues to help the LLM encode the interconnections between all modalities in a shared text space. In such text-based models, prior knowledge of ER tasks is leveraged to textualize relevant non-verbal cues such as audio tone from vocal expressions, and action unit intensity from facial expressions. Since the pre-trained weights are publicly available for many LLMs, training on large-scale datasets is unnecessary, and the models can be fine-tuned for downstream tasks such as compound ER (CER). This paper compares the potential of text- and feature-based approaches for compound multimodal ER in videos. Experiments were conducted on the challenging C-EXPR-DB dataset in the wild for CER, and contrasted with results on the MELD dataset for basic ER. Our results indicate that multimodal textualization provides lower accuracy than feature-based models on C-EXPR-DB, where text transcripts are captured in the wild. However, higher accuracy can be achieved when the video data has rich transcripts. Our code is available.

2024-07-17

ArXiv (preprint)

doi.org

arxiv.org

UTG: Towards a Unified View of Snapshot and Event Based Models for Temporal Graphs

Emanuele Rossi

Many real-world graphs are inherently dynamic, constantly evolving with node and edge additions. These graphs can be represented by temporal graphs, either through a stream of edge events or a sequence of graph snapshots. Until now, the development of machine learning methods for both types has occurred largely in isolation, resulting in limited experimental comparison and theoretical cross-pollination between the two. In this paper, we introduce Unified Temporal Graph (UTG), a framework that unifies snapshot-based and event-based machine learning models under a single umbrella, enabling models developed for one representation to be applied effectively to datasets of the other. We also propose a novel UTG training procedure to boost the performance of snapshot-based models in the streaming setting. We comprehensively evaluate both snapshot and event-based models across both types of temporal graphs on the temporal link prediction task. Our main findings are threefold: first, when combined with UTG training, snapshot-based models can perform competitively with event-based models such as TGN and GraphMixer even on event datasets. Second, snapshot-based models are at least an order of magnitude faster than most event-based models during inference. Third, while event-based methods such as NAT and DyGFormer outperform snapshot-based methods on both types of temporal graphs, this is because they leverage joint neighborhood structural features, thus emphasizing the potential to incorporate these features into snapshot-based models as well. These findings highlight the importance of comparing model architectures independent of the data format and suggest the potential of combining the efficiency of snapshot-based models with the performance of event-based models in the future.

2024-07-17

ArXiv (preprint)

doi.org

arxiv.org

When can transformers compositionally generalize in-context?

Seijin Kobayashi

Simon Schug

Yassir Akram

Florian Redhardt

Johannes Von Oswald

Razvan Pascanu

Guillaume Lajoie

João Sacramento

Many tasks can be composed from a few independent components. This gives rise to a combinatorial explosion of possible tasks, only some of which might be encountered during training. Under what circumstances can transformers compositionally generalize from a subset of tasks to all possible combinations of tasks that share similar components? Here we study a modular multitask setting that allows us to precisely control compositional structure in the data generation process. We present evidence that transformers learning in-context struggle to generalize compositionally on this task despite being in principle expressive enough to do so. Compositional generalization becomes possible only when introducing a bottleneck that enforces an explicit separation between task inference and task execution.

2024-07-17

ArXiv (preprint)

doi.org

arxiv.org

scSemiProfiler: Advancing Large-scale Single-cell Studies through Semi-profiling with Deep Generative Models and Active Learning

Jingtao Wang

Gregory Fonseca

Jun Ding

2024-07-16

Nature Communications (published)

doi.org

Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild

Niloofar Mireshghallah

Maria Antoniak

Yash More

Yejin Choi

Golnoosh Farnadi

Measuring personal disclosures made in human-chatbot interactions can provide a better understanding of users' AI literacy and facilitate privacy research for large language models (LLMs). We run an extensive, fine-grained analysis on the personal disclosures made by real users to commercial GPT models, investigating the leakage of personally identifiable and sensitive information. To understand the contexts in which users disclose to chatbots, we develop a taxonomy of tasks and sensitive topics, based on qualitative and quantitative analysis of naturally occurring conversations. We discuss these potential privacy harms and observe that: (1) personally identifiable information (PII) appears in unexpected contexts such as in translation or code editing (48% and 16% of the time, respectively) and (2) PII detection alone is insufficient to capture the sensitive topics that are common in human-chatbot interactions, such as detailed sexual preferences or specific drug use habits. We believe that these high disclosure rates are of significant importance for researchers and data curators, and we call for the design of appropriate nudging mechanisms to help users moderate their interactions.

2024-07-16

ArXiv (preprint)

doi.org

arxiv.org

A benchmark of individual auto-regressive models in a massive fMRI dataset

François Paugam

Basile Pinsard

Guillaume Lajoie

Pierre Bellec

Lune Bellec

Dense functional magnetic resonance imaging datasets open new avenues to create auto-regressive models of brain activity. Individual idiosyncrasies are obscured by group models, but can be captured by purely individual models given sufficient amounts of training data. In this study, we compared several deep and shallow individual models on the temporal auto-regression of BOLD time series recorded during a natural video watching task. The best performing models were then analyzed in terms of their data requirements and scaling, subject specificity and the space-time structure of their predicted dynamics. We found the Chebnets, a type of graph convolutional neural network, to be best suited for temporal BOLD auto-regression, closely followed by linear models. Chebnets demonstrated an increase in performance with increasing amounts of data, with no complete saturation at 9 h of training data. Good generalization to other kinds of video stimuli and to resting state data marked the Chebnets’ ability to capture intrinsic brain dynamics rather than only stimulus-specific autocorrelation patterns. Significant subject specificity was found at short prediction time lags. The Chebnets were found to capture lower frequencies at longer prediction time lags, and the spatial correlations in predicted dynamics were found to match traditional functional connectivity networks. Overall, these results demonstrate that large individual fMRI datasets can be used to efficiently train purely individual auto-regressive models of brain activity, and that massive amounts of individual data are required to do so. The excellent performance of the Chebnets likely reflects their ability to combine spatial and temporal interactions on large time scales at a low complexity cost. The non-linearities of the models did not appear as a key advantage. In fact, surprisingly, linear versions of the Chebnets appeared to outperform the original nonlinear ones.
Individual temporal auto-regressive models have the potential to improve the predictability of the BOLD signal. This study is based on a massive, publicly-available dataset, which can serve for future benchmarks of individual auto-regressive modeling.

2024-07-15

Imaging Neuroscience (published)

doi.org
