Publications

Dev2vec: Representing Domain Expertise of Developers in an Embedding Space

Arghavan Moradi Dakhel

Michel C. Desmarais

Foutse Khomh

2023-06-30

Information and Software Technology (published)

doi.org

arxiv.org

A double-oracle, logic-based Benders decomposition approach to solve the K-adaptability problem

A. Ghahtarani

A. Saif

A. Ghasemi

Erick Delage

2023-06-30

Computers & Operations Research (published)

doi.org

arxiv.org

FairPrism: Evaluating Fairness-Related Harms in Text Generation

Eve Fleisig

Aubrie Amstutz

Chad Atalla

Su Lin Blodgett

Hal Daumé III

A.R. Olteanu

Emily Sheng

Dan Vann

Hanna Wallach

It is critical to measure and mitigate fairness-related harms caused by AI text generation systems, including stereotyping and demeaning harms. To that end, we introduce FairPrism, a dataset of 5,000 examples of AI-generated English text with detailed human annotations covering a diverse set of harms relating to gender and sexuality. FairPrism aims to address several limitations of existing datasets for measuring and mitigating fairness-related harms, including improved transparency, clearer specification of dataset coverage, and accounting for annotator disagreement and harms that are context-dependent. FairPrism’s annotations include the extent of stereotyping and demeaning harms, the demographic groups targeted, and appropriateness for different applications. The annotations also include specific harms that occur in interactive contexts and harms that raise normative concerns when the “speaker” is an AI system. Due to its precision and granularity, FairPrism can be used to diagnose (1) the types of fairness-related harms that AI text generation systems cause, and (2) the potential limitations of mitigation methods, both of which we illustrate through case studies. Finally, the process we followed to develop FairPrism offers a recipe for building improved datasets for measuring and mitigating harms caused by AI systems.

2023-06-30

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

doi.org

Reinforcement Learning-Based Adaptive Feature Boosting for Smart Grid Intrusion Detection

Chengming Hu

Jun Yan

Xue Liu

Intrusion detection systems (IDSs) are crucial in the security monitoring for the smart grid with increasing machine-to-machine communications and cyber threats thereafter. However, the multi-sourced, correlated, and heterogeneous smart grid data pose significant challenges to the accurate attack detection by IDSs. To improve the attack detection, this paper proposes Reinforcement Learning-based Adaptive Feature Boosting, which aims to leverage a series of AutoEncoders (AEs) to capture critical features from the multi-sourced smart grid data for the classification of normal, fault, and attack events. Multiple AEs are utilized to extract representative features from different feature sets that are automatically generated through a weighted feature sampling process; each AE-extracted feature set is then applied to build a Random Forest (RF) base classifier. In the feature sampling process, Deep Deterministic Policy Gradient (DDPG) is introduced to dynamically determine the feature sampling probability based on the classification accuracy. The critical features that improve the classification accuracy are assigned larger sampling probabilities and increasingly participate in the training of next AE. The presence of critical features is increased in the event classification over the multi-sourced smart grid data. Considering potential different alarms among base classifiers, an ensemble classifier is further built to distinguish normal, fault, and attack events. Our proposed approach is evaluated on the two realistic datasets collected from Hardware-In-the-Loop (HIL) and WUSTIL-IIOT-2021 security testbeds, respectively. The evaluation on the HIL security dataset shows that our proposed approach achieves the classification accuracy with 97.28%, an effective 5.5% increase over the vanilla Adaptive Feature Boosting. Moreover, the proposed approach not only accurately and stably selects critical features on the WUSTIL-IIOT-2021 dataset based on the significant difference of feature sampling probabilities between critical and uncritical features, i.e., the probabilities greater than 0.08 and less than 0.01, but also outperforms the other best-performing approaches with the increasing Matthew Correlation Coefficient (MCC) of 8.03%.

2023-06-30

IEEE Transactions on Smart Grid (published)

doi.org

SkillQG: Learning to Generate Question for Reading Comprehension Assessment

Xiaoqiang Wang

Bang Liu

Siliang Tang

Lingfei Wu

2023-06-30

Findings of the Association for Computational Linguistics: ACL 2023 (published)

doi.org

arxiv.org

Studying the challenges of developing hardware description language programs

Fatemeh Yousefifeshki

Heng Li

Foutse Khomh

2023-06-30

Information and Software Technology (published)

doi.org

EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models

Xinlin Li

Mariana Parazeres

Adam Oberman

Alireza Ghaffari

Masoud Asgharian

Vahid Nia

2023-06-29

SN Computer Science (published)

doi.org

arxiv.org

Reference panel-guided super-resolution inference of Hi-C data

Yanlin Zhang

Mathieu Blanchette

Motivation: Accurately assessing contacts between DNA fragments inside the nucleus with Hi-C experiment is crucial for understanding the role of 3D genome organization in gene regulation. This challenging task is due in part to the high sequencing depth of Hi-C libraries required to support high-resolution analyses. Most existing Hi-C data are collected with limited sequencing coverage, leading to poor chromatin interaction frequency estimation. Current computational approaches to enhance Hi-C signals focus on the analysis of individual Hi-C datasets of interest, without taking advantage of the facts that (i) several hundred Hi-C contact maps are publicly available and (ii) the vast majority of local spatial organizations are conserved across multiple cell types. Results: Here, we present RefHiC-SR, an attention-based deep learning framework that uses a reference panel of Hi-C datasets to facilitate the enhancement of Hi-C data resolution of a given study sample. We compare RefHiC-SR against tools that do not use reference samples and find that RefHiC-SR outperforms other programs across different cell types, and sequencing depths. It also enables high-accuracy mapping of structures such as loops and topologically associating domains. Availability and implementation: https://github.com/BlanchetteLab/RefHiC.

2023-06-29

Bioinformatics (published)

doi.org

Should We Feed the Trolls? Using Marketer-Generated Content to Explain Average Toxicity and Product Usage

Marcelo Vinhal Nepomuceno

Hooman Rahemi

Tolga Cenesizoglu

Laurent Charlin

2023-06-28

Journal of Interactive Marketing (published)

doi.org

Cortico-Cerebellar neurodynamics during social interaction in Autism Spectrum Disorders

Fleur Gaudfernau

Aline Lefebvre

Denis-Alexander Engemann

Amandine Pedoux

Anna Bánki

Florence Baillin

Benjamin Landman

Frederique Amsellem

Anna Maruani

Thomas Bourgeron

Richard Delorme

Guillaume Dumas

2023-06-27

NeuroImage: Clinical (published)

doi.org

Pixelated Reconstruction of Foreground Density and Background Surface Brightness in Gravitational Lensing Systems using Recurrent Inference Machines

Alexandre Adam

Laurence Perreault-Levasseur

Yashar Hezaveh

Max Welling

Modeling strong gravitational lenses in order to quantify the distortions in the images of background sources and to reconstruct the mass density in the foreground lenses has been a difficult computational challenge. As the quality of gravitational lens images increases, the task of fully exploiting the information they contain becomes computationally and algorithmically more difficult. In this work, we use a neural network based on the Recurrent Inference Machine (RIM) to simultaneously reconstruct an undistorted image of the background source and the lens mass density distribution as pixelated maps. The method iteratively reconstructs the model parameters (the image of the source and a pixelated density map) by learning the process of optimizing the likelihood given the data using the physical model (a ray-tracing simulation), regularized by a prior implicitly learned by the neural network through its training data. When compared to more traditional parametric models, the proposed method is significantly more expressive and can reconstruct complex mass distributions, which we demonstrate by using realistic lensing galaxies taken from the IllustrisTNG cosmological hydrodynamic simulation.

2023-06-26

The Astrophysical Journal (published)

doi.org

arxiv.org

Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization for Heterogeneous Representational Coarseness

Dianbo Liu

Alex Lamb

Xu Ji

Pascal Junior Tikeng Notsawo

Michael Mozer

Yoshua Bengio

Kenji Kawaguchi

Vector Quantization (VQ) is a method for discretizing latent representations and has become a major part of the deep learning toolkit. It has been theoretically and empirically shown that discretization of representations leads to improved generalization, including in reinforcement learning where discretization can be used to bottleneck multi-agent communication to promote agent specialization and robustness. The discretization tightness of most VQ-based methods is defined by the number of discrete codes in the representation vector and the codebook size, which are fixed as hyperparameters. In this work, we propose learning to dynamically select discretization tightness conditioned on inputs, based on the hypothesis that data naturally contains variations in complexity that call for different levels of representational coarseness which is observed in many heterogeneous data sets. We show that dynamically varying tightness in communication bottlenecks can improve model performance on visual reasoning and reinforcement learning tasks with heterogeneity in representations.

2023-06-25

Proceedings of the AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org
