Publications

Self-evaluation and self-prompting to improve the reliability of LLMs

Alexandre Piché

Aristides Milios

Dzmitry Bahdanau

Christopher Pal

In order to safely deploy Large Language Models (LLMs), they must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likelihood which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a simple objective that can encourage the model to produce generation that the model is confident in. To optimize this objective, we introduce ReSearch, an iterative search algorithm based on self-evaluation and self-prompting. Our method results in fewer hallucinations overall, both for known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to decline, when the model assesses that it cannot provide a response without a high proportion of hallucination.

2024-03-03

ICLR.cc/2024/Workshop/SeT_LLM (published)

Structure-Informed Protein Language Model

Zuobai Zhang

Jiarui Lu

Vijil Chenthamarakshan

Aurelie Lozano

Payel Das

Jian Tang

Protein language models are a powerful tool for learning protein representations through pre-training on vast protein sequence datasets. However, traditional protein language models lack explicit structural supervision, despite its relevance to protein function. To address this issue, we introduce the integration of remote homology detection to distill structural information into protein language models without requiring explicit protein structures as input. We evaluate the impact of this structure-informed training on downstream protein function prediction tasks. Experimental results reveal consistent improvements in function annotation accuracy for EC number and GO term prediction. Performance on mutant datasets, however, varies based on the relationship between targeted properties and protein structures. This underscores the importance of considering this relationship when applying structure-aware training to protein function prediction tasks. Code and model weights will be made available upon acceptance.

2024-03-03

GEM @ International Conference on Learning Representations (poster)

Towards DNA-Encoded Library Generation with GFlowNets

Michał Koziarski

Mohammed Abukalam

Vedant Shah

Louis Vaillancourt

Doris Alexandra Schuetz

Moksh Jain

Almer Van Der Sloot

Mathieu Bourgey

Anne Marinier

Yoshua Bengio

2024-03-03

GEM @ International Conference on Learning Representations (poster)

Distinct social behavior and inter-brain connectivity in Dyads with autistic individuals

Quentin Moreau

Florence Brun

Anaël Ayrolles

Jacqueline Nadel

Guillaume Dumas

Autism Spectrum Disorder (ASD) is defined by distinctive socio-cognitive behaviors that deviate from typical patterns. Notably, social imitation skills appear to be particularly impacted, manifesting early on in development. This paper compared the behavior and inter-brain dynamics of dyads made up of two typically developing (TD) participants with mixed dyads made up of ASD and TD participants during social imitation tasks. By combining kinematics and EEG-hyperscanning, we show that individuals with ASD exhibited a preference for the follower rather than the lead role in imitating scenarios. Moreover, the study revealed inter-brain synchrony differences, with low-alpha inter-brain synchrony differentiating control and mixed dyads. The study’s findings suggest the importance of studying interpersonal phenomena in dynamic and ecological settings and using hyperscanning methods to capture inter-brain dynamics during actual social interactions.

2024-03-02

Social Neuroscience (published)

Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

Amal Rannen-Triki

Jörg Bornschein

Razvan Pascanu

Marcus Hutter

András György

Alexandre Galashov

Yee Whye Teh

Michalis K. Titsias

We consider the problem of online fine tuning the parameters of a language model at test time, also known as dynamic evaluation. While it is generally known that this approach improves the overall predictive performance, especially when considering distributional shift between training and evaluation data, we here emphasize the perspective that online adaptation turns parameters into temporally changing states and provides a form of context-length extension with memory in weights, more in line with the concept of memory in neuroscience. We pay particular attention to the speed of adaptation (in terms of sample efficiency), sensitivity to the overall distributional drift, and the computational overhead for performing gradient computations and parameter updates. Our empirical study provides insights on when online adaptation is particularly interesting. We highlight that with online adaptation the conceptual distinction between in-context learning and fine tuning blurs: both are methods to condition the model on previously observed tokens.

2024-03-02

ArXiv (preprint)

Communicating Study Design Trade-offs in Software Engineering

Martin P. Robillard

Deeksha M. Arya

Neil Ernst

Jin L.C. Guo

Maxime Lamothe

Mathieu Nassif

Nicole Novielli

Alexander Serebrenik

Igor Steinmacher

Klaas-Jan Stol

2024-03-01

ACM Transactions on Software Engineering and Methodology (published)

A Compositional Typed Semantics for Universal Dependencies

Laurestine Bradford

Timothy John O'Donnell

Siva Reddy

2024-03-01

ArXiv (preprint)

Latent Idiom Recognition for a Minimalist Functional Array Language Using Equality Saturation

Jonathan Van der Cruysse

Christophe Dubach

Accelerating programs is typically done by recognizing code idioms matching high-performance libraries or hardware interfaces. However, recognizing such idioms automatically is challenging. The idiom recognition machinery is difficult to write and requires expert knowledge. In addition, slight variations in the input program might hide the idiom and defeat the recognizer. This paper advocates for the use of a minimalist functional array language supporting a small, but expressive, set of operators. The minimalist design leads to a tiny set of rewrite rules, which encode the language semantics. Crucially, the same minimalist language is also used to encode idioms. This removes the need for hand-crafted analysis passes, or for having to learn a complex domain-specific language to define the idioms. Coupled with equality saturation, this approach is able to match the core functions from the BLAS and PyTorch libraries on a set of computational kernels. Compared to reference C kernel implementations, the approach produces a geometric mean speedup of 1.46× for C programs using BLAS, when generating such programs from the high-level minimalist language.

2024-03-01

2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (published)

Learning and Aligning Structured Random Feature Networks

Vivian White

Muawiz Sajjad Chaudhary

Guy Wolf

Guillaume Lajoie

Kameron Decker Harris

Artificial neural networks (ANNs) are considered "black boxes" due to the difficulty of interpreting their learned weights. While choosing the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to develop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models where we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparably to conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.

2024-03-01

ICLR.cc/2024/Workshop/Re-Align (poster)

Quality of Service-Constrained Online Routing in High Throughput Satellites

Olivier Bélanger

Olfa Ben Yahia

Stéphane Martel

Antoine Lesage-Landry

Gunes Karabulut Kurt

High throughput satellites (HTSs) outpace traditional satellites due to their multi-beam transmission. The rise of low Earth orbit mega constellations amplifies HTS data rate demands to terabits/second with acceptable latency. This surge in data rate necessitates multiple modems, often exceeding single device capabilities. Consequently, satellites employ several processors, forming a complex packet-switch network. This can lead to potential internal congestion and challenges in adhering to strict quality of service (QoS) constraints. While significant research exists on constellation-level routing, a literature gap remains on the internal routing within a single HTS. The intricacy of this internal network architecture presents a significant challenge to achieve high data rates. This paper introduces an online optimal flow allocation and scheduling method for HTSs. The problem is presented as a multi-commodity flow instance with different priority data streams. An initial full time horizon model is proposed as a benchmark. We apply a model predictive control (MPC) approach to enable adaptive routing based on current information and the forecast within the prediction time horizon while allowing for deviation of the latter. Importantly, MPC is inherently suited to handle uncertainty in incoming flows. Our approach minimizes the packet loss by optimally and adaptively managing the priority queue schedulers and flow exchanges between satellite processing modules. Central to our method is a routing model focusing on optimal priority scheduling to enhance data rates and maintain QoS. The model's stages are critically evaluated, and results are compared to traditional methods via numerical simulations. Through simulations, our method demonstrates performance nearly on par with the hindsight optimum, showcasing its efficiency and adaptability in addressing satellite communication challenges.

2024-03-01

2024 IEEE Aerospace Conference (published)

ADMM-Based Hierarchical Single-Loop Framework for EV Charging Scheduling Considering Power Flow Constraints

Sina Kiani

Keyhan Sheshyekani

Hanane Dagdougui

This article presents a three-layer hierarchical distributed framework for optimal electric vehicle charging scheduling (EVCS). The proposed hierarchical EVCS structure includes a distribution system operator (DSO) at the top layer, electric vehicle aggregators (EVAs) at the middle layer, and electric vehicle (EV) charging stations at the bottom layer. A single-loop iterative algorithm is developed to solve the EVCS problem by combining the alternating direction method of multipliers (ADMM) and the distribution line power flow model (DistFlow). Using the single-loop structure, the primal variables of all agents are updated simultaneously at every iteration, resulting in a reduced number of iterations and faster convergence. The developed framework is employed to provide charging cost minimization at the EV charging stations level, peak load shaving at the EVAs level, and voltage regulation at the DSO level. In order to further improve the performance of the optimization framework, a neural network-based load forecasting model is implemented to include the uncertainties related to non-EV residential load demand. The efficiency and the optimality of the proposed EVCS framework are evaluated through numerical simulations, conducted for a modified IEEE 13 bus test feeder with different EV penetration levels.

2024-02-29

IEEE Transactions on Transportation Electrification (published)