VulEXplaineR: XAI for Vulnerability Detection on Assembly Code
Samaneh Mahdavifar
Mohd Saqib
Philippe Charland
Andrew Walenstein
What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models
Jeongrok Yu
Seong Ug Kim
Jacob Choi
Jinho D. Choi
Bias is a disproportionate prejudice in favor of one side against another. Due to the success of transformer-based Masked Language Models (MLMs) and their impact on many NLP tasks, a systematic evaluation of bias in these models is needed more than ever. While many studies have evaluated gender bias in English MLMs, only a few works have addressed the task in other languages. This paper proposes a multilingual approach to estimate gender bias in MLMs from five languages: Chinese, English, German, Portuguese, and Spanish. Unlike previous work, our approach does not depend on parallel corpora coupled with English to detect gender bias in other languages using multilingual lexicons. Moreover, a novel model-based method is presented to generate sentence pairs, allowing a more robust analysis of gender bias than the traditional lexicon-based method. For each language, both the lexicon-based and model-based methods are applied to create two datasets, which are used to evaluate gender bias in an MLM specifically trained for that language using one existing and three new scoring metrics. Our results show that the previous approach is data-sensitive and unstable, as it does not remove contextual dependencies irrelevant to gender. In fact, the results often flip when different scoring metrics are used on the same dataset, suggesting that, as best practice, gender bias should be studied on a large dataset using multiple evaluation metrics.
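The lexicon-based probing described above can be illustrated with a short sketch. The following is a minimal, hypothetical example, not the paper's four scoring metrics: it uses the Hugging Face transformers library and bert-base-uncased (both assumptions, since the paper evaluates language-specific MLMs) to compare the masked-token log-probabilities an MLM assigns to a gendered word pair in an otherwise identical context.

```python
# Minimal sketch of a lexicon-based sentence-pair bias probe for an MLM.
# Model choice and template are illustrative assumptions, not the paper's setup.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def masked_word_logprob(template: str, word: str) -> float:
    """Log-probability the MLM assigns to `word` at the [MASK] slot."""
    inputs = tokenizer(template, return_tensors="pt")
    mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    word_id = tokenizer.convert_tokens_to_ids(word)
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]
    return torch.log_softmax(logits, dim=-1)[word_id].item()

# A lexicon-based pair: identical context, gendered word swapped.
template = "The nurse said that [MASK] would be late."
bias_gap = masked_word_logprob(template, "she") - masked_word_logprob(template, "he")
print(f"log P(she) - log P(he) = {bias_gap:.3f}")  # > 0 leans female
```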
Winning the 2023 CityLearn Challenge: A Community-Based Hierarchical Energy Systems Coordination Algorithm
Andoni I. Garmendia
Francesco Morri
Hélène Le Cadre
The effective management and control of building energy systems are crucial for reducing peak energy consumption loads and CO2 emissions, and for ensuring the stability of the power grid, while maintaining optimal comfort levels within buildings. The difficulty of accommodating this trade-off is amplified by dynamic environmental conditions and the need for scalable solutions that can adapt across various building types and geographic locations. Acknowledging the importance of this problem, the NeurIPS conference has hosted the CityLearn control challenge since 2020 to foster the design of innovative solutions in building energy management. Participants were tasked with developing strategies that not only enhance energy efficiency but also prioritize sustainability and occupant comfort. This paper introduces the Community-based Hierarchical Energy Systems Coordination Algorithm (CHESCA), the winning approach of the 2023 edition. We rely on a hierarchical approach adaptable to an arbitrary number of buildings, first optimizing building-level metrics individually, and later refining these through a central community-level controller to improve grid-related metrics. Compared to the other high-ranked competitors, our approach demonstrated fast inference capabilities like learning-based methods, while offering better interpretability and superior generalization capabilities with minimal data requirements. This paper details our approach, supported by comprehensive experimental results and ablation studies.
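To make the two-level structure concrete, here is a minimal, self-contained sketch of the hierarchy the abstract describes, not CHESCA itself: each building first picks a battery action from its local state, and a community-level pass then adjusts those actions toward a grid-friendly aggregate target. All classes, rules, and numbers are illustrative assumptions.

```python
# Toy two-level controller: per-building rule, then community coordination.
# Not CHESCA; capacities, rules, and numbers are made up for illustration.
from dataclasses import dataclass

@dataclass
class Building:
    demand_kw: float  # current electrical demand
    solar_kw: float   # current PV generation
    soc: float        # battery state of charge in [0, 1]

def building_action(b: Building) -> float:
    """Level 1: greedy per-building rule -- charge on PV surplus,
    discharge to cover unmet demand (positive = charge, in kW)."""
    surplus = b.solar_kw - b.demand_kw
    if surplus > 0:                       # store excess solar
        return min(surplus, (1.0 - b.soc) * 5.0)
    return max(surplus, -b.soc * 5.0)     # discharge up to stored energy

def community_coordinate(buildings, actions, target_kw):
    """Level 2: shift battery actions so the community net load
    tracks a grid-friendly target instead of each building's optimum."""
    net = sum(b.demand_kw - b.solar_kw + a for b, a in zip(buildings, actions))
    correction = (target_kw - net) / max(len(actions), 1)
    return [a + correction for a in actions]

buildings = [Building(3.0, 1.0, 0.5), Building(2.0, 4.0, 0.2)]
local = [building_action(b) for b in buildings]
print(community_coordinate(buildings, local, target_kw=2.0))
```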
Würstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models
Pablo Pernias
Dominic Rampas
Mats Leon Richter
Marc Aubreville
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference
João Monteiro
Étienne Marcotte
Pierre-André Noël
Valentina Zantedeschi
David Vazquez
Perouz Taslakian
In-context learning (ICL) approaches typically leverage prompting to condition decoder-only language model generation on reference information. Just-in-time processing of a context is inefficient due to the quadratic cost of self-attention operations, and caching is desirable. However, caching transformer states can easily require almost as much space as the model parameters, and when the right context isn't known in advance, caching ICL can be challenging. This work addresses these limitations by introducing models that, inspired by the encoder-decoder architecture, use cross-attention to condition generation on reference text without the prompt. More precisely, we leverage pre-trained decoder-only models and only train a small number of added layers. We use Question-Answering (QA) as a testbed to evaluate the ability of our models to perform conditional generation and observe that they outperform ICL, are comparable to fine-tuned prompted LLMs, and drastically reduce the space footprint relative to standard KV caching, by two orders of magnitude.
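A minimal sketch of the architectural idea, not the authors' exact model: a small trainable cross-attention block (built here from PyTorch's nn.MultiheadAttention, an assumption) lets frozen decoder hidden states attend to context representations that were encoded once and cached, so the reference text never re-enters the prompt.

```python
# Sketch of cross-attending to cached context; sizes are illustrative.
import torch
import torch.nn as nn

class CrossAttendToCache(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, decoder_h: torch.Tensor, cached_ctx: torch.Tensor):
        # decoder_h:  (batch, tgt_len, d_model) states from a frozen decoder
        # cached_ctx: (batch, ctx_len, d_model) encoded once, reused per query
        attended, _ = self.attn(decoder_h, cached_ctx, cached_ctx)
        return self.norm(decoder_h + attended)  # residual connection

# The context is encoded once and cached; only this block is trained.
layer = CrossAttendToCache()
cached = torch.randn(1, 1000, 512)  # long reference document, cached
h = torch.randn(1, 16, 512)         # decoder states for the query
print(layer(h, cached).shape)       # torch.Size([1, 16, 512])
```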
Zero-shot Logical Query Reasoning on any Knowledge Graph
Mikhail Galkin
Jincheng Zhou
Bruno Ribeiro
Zhaocheng Zhu
Complex logical query answering (CLQA) in knowledge graphs (KGs) goes beyond simple KG completion and aims at answering compositional queries comprised of multiple projections and logical operations. Existing CLQA methods that learn parameters bound to certain entity or relation vocabularies can only be applied to the graph they are trained on, which requires substantial training time before deployment on a new graph. Here we present UltraQuery, an inductive reasoning model that can answer logical queries zero-shot on any KG. The core idea of UltraQuery is to derive both projections and logical operations as vocabulary-independent functions which generalize to new entities and relations in any KG. With the projection operation initialized from a pre-trained inductive KG reasoning model, UltraQuery can solve CLQA on any KG even if it is only finetuned on a single dataset. Experimenting on 23 datasets, UltraQuery in the zero-shot inference mode shows competitive or better query answering performance than the best available baselines and sets a new state of the art on 14 of them.
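A toy sketch of what vocabulary-independent logical operators can look like, under illustrative assumptions (product t-norm operators and a random stub in place of the pre-trained inductive link predictor): each sub-query produces a fuzzy score per entity, and conjunction, disjunction, and negation are parameter-free operations on those score vectors, so nothing is tied to a specific entity or relation vocabulary.

```python
# Fuzzy logical operators over per-entity score vectors; the projection
# stub stands in for a pre-trained inductive link predictor.
import numpy as np

def conj(a, b):  # product t-norm: entity must satisfy both branches
    return a * b

def disj(a, b):  # probabilistic sum (dual t-conorm)
    return a + b - a * b

def neg(a):      # fuzzy negation
    return 1.0 - a

def project(scores, relation):
    """Stub projection: in UltraQuery this is a pre-trained,
    vocabulary-independent KG reasoning model; here it is random."""
    rng = np.random.default_rng(sum(map(ord, relation)))
    return np.clip(scores @ rng.random((scores.size, scores.size)) / scores.size, 0, 1)

n_entities = 5
anchor = np.eye(n_entities)[0]  # one-hot anchor entity
# Query: entities reached by r1 from the anchor AND not reached by r2.
answer = conj(project(anchor, "r1"), neg(project(anchor, "r2")))
print(answer.round(3))
```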
Penalties and Rewards for Fair Learning in Paired Kidney Exchange Programs
Alison Caulfield
Yi Lin
Adrian Vetta
A kidney exchange program, also called a kidney paired donation program, can be viewed as a repeated, dynamic trading and allocation mechanism. This suggests that a dynamic algorithm for transplant exchange selection may have superior performance in comparison to the repeated use of a static algorithm. We confirm this hypothesis using a full-scale simulation of the Canadian Kidney Paired Donation Program: learning algorithms that attempt to learn optimal patient-donor weights in advance via dynamic simulations do lead to improved outcomes. Specifically, our learning algorithms, designed with the objective of fairness (that is, equity in terms of transplant accessibility across cPRA groups), also lead to an increased number of transplants and shorter average waiting times. Indeed, our highest performing learning algorithm improves egalitarian fairness by 10% whilst also increasing the number of transplants by 6% and decreasing waiting times by 24%. However, our main result is much more surprising. We find that the most critical factor in determining the performance of a kidney exchange program is not the judicious assignment of positive weights (rewards) to patient-donor pairs. Rather, the key factor in increasing the number of transplants, decreasing waiting times, and improving group fairness is the judicious assignment of a negative weight (penalty) to the small number of non-directed donors in the kidney exchange program.
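The role of a negative weight can be illustrated with a deliberately simplified sketch: pairwise exchanges modeled as a maximum-weight matching with networkx, where the non-directed donor carries a penalty. Real kidney exchange programs optimize over cycles and chains, and all weights below are made up; this only shows the mechanism by which a penalty steers selections.

```python
# Simplified two-way exchange model: maximum-weight matching where
# a non-directed donor ("NDD") carries a negative weight (penalty).
import networkx as nx

# Node weights: positive for patient-donor pairs, negative for the NDD.
node_weight = {"P1": 1.0, "P2": 1.0, "P3": 1.5, "NDD": -1.2}

# Edges are mutually compatible exchanges; edge weight sums node weights.
compatible = [("P1", "P2"), ("P2", "P3"), ("P3", "NDD"), ("P1", "NDD")]
G = nx.Graph()
for u, v in compatible:
    G.add_edge(u, v, weight=node_weight[u] + node_weight[v])

matching = nx.max_weight_matching(G)
print(matching)  # the penalty steers the matching away from the NDD
```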
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
Thomas Krendl Gilbert
Jérémy Scheurer
Javier Rando
Rachel Freedman
Tomasz Korbak
David Lindner
Pedro Freire
Tony Tong Wang
Samuel Marks
Charbel-Raphael Segerie
Micah Carroll
Andi Peng
Phillip Christoffersen
Mehul Damani
Stewart Slocum
Usman Anwar
Anand Siththaranjan
Max Nadeau
Eric J Michaud
Jacob Pfau
Dmitrii Krasheninnikov
Xin Chen
Lauro Langosco
Peter Hase
Erdem Biyik
Anca Dragan
Dorsa Sadigh
Dylan Hadfield-Menell
Latent Idiom Recognition for a Minimalist Functional Array Language Using Equality Saturation
Jonathan Van Der Cruysse
Accelerating programs is typically done by recognizing code idioms matching high-performance libraries or hardware interfaces. However, recognizing such idioms automatically is challenging. The idiom recognition machinery is difficult to write and requires expert knowledge. In addition, slight variations in the input program might hide the idiom and defeat the recognizer. This paper advocates for the use of a minimalist functional array language supporting a small, but expressive, set of operators. The minimalist design leads to a tiny set of rewrite rules, which encode the language semantics. Crucially, the same minimalist language is also used to encode idioms. This removes the need for hand-crafted analysis passes, or for having to learn a complex domain-specific language to define the idioms. Coupled with equality saturation, this approach is able to match the core functions from the BLAS and PyTorch libraries on a set of computational kernels. Compared to reference C kernel implementations, the approach produces a geometric mean speedup of 1.46× for C programs using BLAS, when generating such programs from the high-level minimalist language.
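To give a flavor of recognition by rewriting, here is a toy Python sketch, not the paper's system: expressions are nested tuples, a couple of semantics-preserving rules (map fusion and a fused reduce-of-map form, both illustrative) grow a set of equivalent programs to a fixpoint, a crude stand-in for an e-graph, and an idiom is "recognized" when one of its forms appears in the set.

```python
# Toy saturation over tuple-encoded array expressions; rules and the
# "idiom" are illustrative, not the paper's language or rule set.

def rewrites(expr):
    """Yield expressions one rewrite step away from `expr`."""
    if isinstance(expr, tuple):
        op, *args = expr
        if op == "map" and len(args) == 2 and isinstance(args[1], tuple) and args[1][0] == "map":
            f, (_, g, xs) = args[0], args[1]
            yield ("map", ("compose", f, g), xs)    # map fusion
        if op == "reduce" and len(args) == 3 and isinstance(args[2], tuple) and args[2][0] == "map":
            acc, init, (_, f, xs) = args
            yield ("reduce_map", acc, init, f, xs)  # fused idiom form
        # Recurse: rewrite any sub-expression in place.
        for i, a in enumerate(args):
            for r in rewrites(a):
                yield (op, *args[:i], r, *args[i + 1:])

def saturate(expr, limit=1000):
    """Grow the set of equivalent expressions to a fixpoint."""
    seen, frontier = {expr}, [expr]
    while frontier and len(seen) < limit:
        for r in rewrites(frontier.pop()):
            if r not in seen:
                seen.add(r)
                frontier.append(r)
    return seen

# A dot product written naively as reduce(+, 0, map(*, ...)) matches the
# fused "reduce_map" idiom, which a backend could map to a BLAS call.
naive = ("reduce", "+", 0, ("map", "*", "xys"))
forms = saturate(naive)
print(any(f[0] == "reduce_map" for f in forms))  # True: idiom found
```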
Performance reserves in brain-imaging-based phenotype prediction
Marc-Andre Schulz
Stefan Haufe
John-Dylan Haynes
Kerstin Ritter
Machine learning studies have shown that various phenotypes can be predicted from structural and functional brain images. However, in most such studies, prediction performance ranged from moderate to disappointing. It is unclear whether prediction performance will substantially improve with larger sample sizes or whether insufficient predictive information in brain images impedes further progress. Here, we systematically assess the effect of sample size on prediction performance using sample sizes far beyond what is possible in common neuroimaging studies. We project 3-9 fold improvements in prediction performance for behavioral and mental health phenotypes when moving from one thousand to one million samples. Moreover, we find that moving from single imaging modalities to multimodal input data can lead to further improvements in prediction performance, often on par with doubling the sample size. Our analyses reveal considerable performance reserves for neuroimaging-based phenotype prediction. Machine learning models may benefit much more from extremely large neuroimaging datasets than currently believed.
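Projections of this kind typically come from fitting a learning curve at feasible sample sizes and extrapolating; the sketch below, with entirely made-up error values, fits a common three-parameter power law and extrapolates it to one million samples. It illustrates the generic methodology, not the authors' exact fitting procedure.

```python
# Learning-curve extrapolation: fit err(n) = a * n**(-b) + c on small
# sample sizes, then extrapolate. Data points are made up for illustration.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a * n ** (-b) + c  # c is the irreducible error floor

# Hypothetical prediction errors observed at increasing sample sizes.
n_obs = np.array([250, 500, 1000, 2000, 4000])
err_obs = np.array([0.48, 0.44, 0.41, 0.385, 0.365])

params, _ = curve_fit(power_law, n_obs, err_obs, p0=(1.0, 0.3, 0.3), maxfev=10000)
for n in (1_000, 100_000, 1_000_000):
    print(f"n={n:>9,}: projected error {power_law(n, *params):.3f}")
```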