Publications

Simultaneous linear connectivity of neural networks modulo permutation

Ekansh Sharma

Devin Kwok

Tom Denton

Daniel M. Roy

David Rolnick

Gintare Karolina Dziugaite

2024-04-09

ArXiv (preprint)

arxiv.org

Structure-function coupling and decoupling during movie-watching and resting-state: Novel insights bridging EEG and structural imaging

Venkatesh Subramani

Giulia Lioi

Karim Jerbi

Nicolas Farrugia

2024-04-09

bioRxiv (preprint)

doi.org

What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models

Emily M. Bender

Timnit Gebru

Angelina McMillan-Major

Su Lin Blodgett

Solon Barocas

Hal Daumé III

Gilsinia Lopez

Alexandra Olteanu

Robert Sim

Hanna Wallach

Bias is a disproportionate prejudice in favor of one side against another. Due to the success of transformer-based Masked Language Models (MLMs) and their impact on many NLP tasks, a systematic evaluation of bias in these models is needed more than ever. While many studies have evaluated gender bias in English MLMs, only a few have addressed the task in other languages. This paper proposes a multilingual approach to estimate gender bias in MLMs for five languages: Chinese, English, German, Portuguese, and Spanish. Unlike previous work, our approach does not depend on parallel corpora coupled with English to detect gender bias in other languages using multilingual lexicons. Moreover, a novel model-based method is presented to generate sentence pairs for a more robust analysis of gender bias than the traditional lexicon-based method. For each language, the lexicon-based and model-based methods are each applied to create a dataset, and the two datasets are used to evaluate gender bias in an MLM trained specifically for that language using one existing and three new scoring metrics. Our results show that the previous approach is data-sensitive and unstable, as it does not remove contextual dependencies irrelevant to gender. In fact, the results often flip when different scoring metrics are applied to the same dataset, suggesting that, as a best practice, gender bias should be studied on a large dataset using multiple evaluation metrics.

2024-04-09

ArXiv (preprint)

arxiv.org
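The abstract contrasts a traditional lexicon-based method for building gendered sentence pairs with a new model-based one. As a hedged illustration of the lexicon-based idea only, the sketch below swaps words found in a small bidirectional lexicon. The lexicon contents and the helper names `gender_swap` and `make_pair` are hypothetical; the paper's multilingual lexicons, its model-based generator, and its scoring metrics are not reproduced here.

```python
from typing import Optional, Tuple

# Hypothetical, tiny English lexicon (masculine -> feminine); real lexicons are far larger.
LEXICON = {"he": "she", "man": "woman", "father": "mother", "son": "daughter", "king": "queen"}
# Make the mapping bidirectional so either form can be swapped.
SWAP = {**LEXICON, **{f: m for m, f in LEXICON.items()}}


def gender_swap(sentence: str) -> str:
    """Replace every whitespace-separated token found in SWAP, keeping initial capitals."""
    out = []
    for tok in sentence.split():
        low = tok.lower()
        if low in SWAP:
            rep = SWAP[low]
            tok = rep.capitalize() if tok[0].isupper() else rep
        out.append(tok)
    return " ".join(out)


def make_pair(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (original, swapped) if the lexicon changed anything, else None."""
    swapped = gender_swap(sentence)
    return (sentence, swapped) if swapped != sentence else None


print(make_pair("He is a king"))
print(make_pair("the mother called her son"))
```

In the second sentence, "her" is not in the hypothetical lexicon and stays unchanged, which shows how a small lexicon can leave gendered context behind. The sketch also ignores punctuation attached to tokens.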

Evaluating Interventional Reasoning Capabilities of Large Language Models

Tejas Kasetty

Divyat Mahajan

Gintare Karolina Dziugaite

Alexandre Drouin

Dhanya Sridhar

Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consider using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs' ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, in this paper we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and that enable a study of intervention-based reasoning. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from an intervention from their ability to memorize facts or find other shortcuts. Our analysis of four LLMs highlights that while GPT-4 models show promising accuracy at predicting the intervention effects, they remain sensitive to distracting factors in the prompts.

2024-04-08

ArXiv (preprint)

arxiv.org
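To make "updating a data-generating process in response to an intervention" concrete for the confounding case mentioned in the abstract, here is a toy sketch. The graph (Z → X, Z → Y, X → Y) and all probabilities are hypothetical and are not taken from the paper's benchmarks. It computes observational P(Y = 1 | X = x) and interventional P(Y = 1 | do(X = x)) by exact enumeration; the intervention replaces X's mechanism with a constant and leaves the other mechanisms untouched.

```python
from typing import Optional

# Hypothetical toy model: Z confounds X and Y (Z -> X, Z -> Y), and X also causes Y.
P_Z1 = 0.5  # P(Z = 1)


def p_x1_given_z(z: int) -> float:
    """Mechanism for X: P(X = 1 | Z = z)."""
    return 0.8 if z == 1 else 0.2


def p_y1_given_xz(x: int, z: int) -> float:
    """Mechanism for Y: P(Y = 1 | X = x, Z = z)."""
    return 0.1 + 0.3 * x + 0.5 * z


def bern(p1: float, v: int) -> float:
    """Probability that a Bernoulli(p1) variable takes value v."""
    return p1 if v == 1 else 1.0 - p1


def joint(z: int, x: int, y: int, do_x: Optional[int] = None) -> float:
    """P(Z=z, X=x, Y=y), optionally under do(X = do_x), which cuts the Z -> X edge."""
    px = bern(p_x1_given_z(z), x) if do_x is None else float(x == do_x)
    return bern(P_Z1, z) * px * bern(p_y1_given_xz(x, z), y)


def p_y1_seeing(x: int) -> float:
    """Observational P(Y = 1 | X = x): condition on X by renormalizing the joint."""
    num = sum(joint(z, x, 1) for z in (0, 1))
    den = sum(joint(z, x, y) for z in (0, 1) for y in (0, 1))
    return num / den


def p_y1_doing(x: int) -> float:
    """Interventional P(Y = 1 | do(X = x)): marginalize the intervened joint."""
    return sum(joint(z, xv, 1, do_x=x) for z in (0, 1) for xv in (0, 1))


print(round(p_y1_seeing(1), 3), round(p_y1_doing(1), 3))  # 0.8 0.65
```

The gap between 0.8 and 0.65 comes entirely from the confounder Z. Observing X = 1 also raises the likelihood that Z = 1, whereas setting X = 1 by intervention does not.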

Learning Heuristics for Transit Network Design and Improvement with Deep Reinforcement Learning

Andrew Holliday

A. El-Geneidy

Gregory Dudek

2024-04-08

ArXiv (preprint)

arxiv.org

Learning Minimal NAP Specifications for Neural Network Verification

Chuqin Geng

Zhaoyue Wang

Haolin Ye

Saifei Liao

Xujie Si

Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represented as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, which are often limited to very small regions in the input space. In this paper, we study the following problem: given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner, whereas the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons than the most refined NAP specifications, yet they can expand the verifiable boundaries by several orders of magnitude.

2024-04-06

ArXiv (preprint)

arxiv.org
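A neural activation pattern (NAP) records which ReLU neurons are on or off for a given input. The sketch below is a minimal illustration under stated assumptions: a toy network and the hypothetical helpers `activation_pattern` and `satisfies`. It only shows why a coarser NAP, which constrains fewer neurons, covers a larger input region than the most refined one. It does not implement the paper's exact or approximate search for minimal NAPs, nor any call to a verification tool.

```python
from typing import Dict, List, Sequence, Tuple

# A dense layer as (weight rows, biases); every layer is followed by ReLU in this sketch.
Layer = Tuple[List[List[float]], List[float]]
Neuron = Tuple[int, int]  # (layer index, neuron index)


def activation_pattern(layers: Sequence[Layer], x: Sequence[float]) -> Dict[Neuron, bool]:
    """Map every ReLU neuron to True if its pre-activation is positive on input x."""
    pattern: Dict[Neuron, bool] = {}
    h = list(x)
    for li, (weights, biases) in enumerate(layers):
        pre = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        for ni, p in enumerate(pre):
            pattern[(li, ni)] = p > 0
        h = [max(p, 0.0) for p in pre]  # ReLU output feeds the next layer
    return pattern


def satisfies(spec: Dict[Neuron, bool], layers: Sequence[Layer], x: Sequence[float]) -> bool:
    """True if x's activation states match every neuron the (possibly partial) spec constrains."""
    states = activation_pattern(layers, x)
    return all(states[n] == on for n, on in spec.items())


# One hidden layer on a 1-D input: neuron 0 is active for x > 0, neuron 1 for x > 1.
layers = [([[1.0], [1.0]], [0.0, -1.0])]
refined = activation_pattern(layers, [2.0])  # constrains both neurons -> region x > 1
coarse = {(0, 0): True}                      # constrains one neuron  -> region x > 0
print(satisfies(refined, layers, [0.5]), satisfies(coarse, layers, [0.5]))  # False True
```

The input x = 0.5 falls outside the refined pattern's region but inside the coarse one. This is the sense in which dropping neuron constraints can enlarge the region a verifier is asked to certify.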

SAT-DIFF: A Tree Diffing Framework Using SAT Solving

Chuqin Geng

Haolin Ye

Yihan Zhang

Brigitte Pientka

Xujie Si

Computing differences between tree-structured data is a critical but challenging problem in software analysis. In this paper, we propose a novel tree diffing approach called SatDiff, which reformulates the structural diffing problem as a MaxSAT problem. By encoding the necessary transformations from the source tree to the target tree, SatDiff generates correct, minimal, and type-safe low-level edit scripts with formal guarantees. We then synthesize concise high-level edit scripts by merging low-level edits in the appropriate topological order. Our empirical results demonstrate that SatDiff outperforms existing heuristic-based approaches by a significant margin in terms of conciseness while maintaining a reasonable runtime.

2024-04-06

ArXiv (preprint)

arxiv.org

PopulAtion Parameter Averaging (PAPA)

Alexia Jolicoeur-Martineau

Emy Gervais

Kilian Fatras

Yan Zhang

Simon Lacoste-Julien

2024-04-05

TMLR (accepted)

doi.org

openreview.net

Regulating advanced artificial agents

Michael K. Cohen

Noam Kolt

Yoshua Bengio

Gillian K. Hadfield

Stuart Russell

2024-04-05

Science (published)

doi.org

Scope Ambiguities in Large Language Models

Gaurav Kamath

Sebastian Schuster

Sowmya Vajjala

Siva Reddy

2024-04-05

ArXiv (preprint)

arxiv.org

Applying Recurrent Neural Networks and Blocked Cross-Validation to Model Conventional Drinking Water Treatment Processes

Aleksandar Jakovljevic

Laurent Charlin

Benoit Barbeau

The jar test is the current standard method for predicting the performance of a conventional drinking water treatment (DWT) process and optimizing the coagulant dose. This test is time-consuming and requires human intervention, which makes it infeasible for continuous process predictions. As a potential alternative, we developed a machine learning (ML) model from historical DWT plant data that can operate continuously on real-time sensor data, without human intervention, to predict clarified water turbidity 15 min in advance. We evaluated three types of models: the multilayer perceptron (MLP), the long short-term memory (LSTM) recurrent neural network (RNN), and the gated recurrent unit (GRU) RNN. We also employed two training methodologies: the commonly used holdout method and the theoretically correct blocked cross-validation (BCV) method. We found that the GRU RNN was the best model type overall, achieving a mean absolute error as low as 0.044 NTU on an independent production set. We further found that models trained using BCV typically achieve errors equal to or lower than their counterparts trained using holdout. These results suggest that, for developing ML models of DWT processes, RNNs trained using BCV are superior to the approaches reported in earlier literature.

2024-04-04

Water (published)

doi.org
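The abstract above contrasts a single holdout split with blocked cross-validation (BCV). The sketch below shows one common BCV layout, which is not necessarily the one used in the paper. The hypothetical helper `blocked_cv_splits` partitions time-ordered sample indices into contiguous blocks and uses each block once as the validation fold, so no fold is built by shuffling individual time steps. The optional `gap` parameter is an assumption not described in the abstract; it drops samples adjacent to the validation block from training.

```python
from typing import Iterator, List, Tuple


def blocked_cv_splits(
    n_samples: int, n_blocks: int, gap: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs over time-ordered samples.

    Each validation fold is one contiguous block of time steps; the remaining blocks
    form the training set. `gap` (hypothetical, default 0) additionally removes that
    many samples on each side of the validation block from training, to limit
    leakage through autocorrelation.
    """
    if not 2 <= n_blocks <= n_samples:
        raise ValueError("need 2 <= n_blocks <= n_samples")
    # Contiguous, near-equal block boundaries; integer division keeps them deterministic.
    bounds = [(i * n_samples) // n_blocks for i in range(n_blocks + 1)]
    for start, stop in zip(bounds[:-1], bounds[1:]):
        val_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start - gap or i >= stop + gap]
        yield train_idx, val_idx


for train_idx, val_idx in blocked_cv_splits(10, 5, gap=1):
    print("validate on", val_idx, "train on", train_idx)
```

Unlike shuffled k-fold cross-validation, this keeps neighboring time steps (for example, consecutive sensor readings) together in the same fold. That is the usual argument for preferring BCV on autocorrelated process data.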

Assessing the emergence time of SARS-CoV-2 zoonotic spillover

Stéphane Samson

Étienne Lord

Vladimir Makarenkov

2024-04-04

PLoS ONE (published)

doi.org
