NLP in the era of generative AI, cognitive sciences, and societal transformation
Join us at Mila in October for a three-day workshop to explore the transformative potential of language technologies and their implications for society.
This program is designed to provide decision-makers, policymakers and professional working in policy with a foundational understanding of AI technology.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Publications
Self-evaluation and self-prompting to improve the reliability of LLMs
In order to safely deploy Large Language Models (LLMs), they must be capable of dynamically adapting their behavior based on their level of … (see more)knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next token likelihood which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a simple objective that can encourage the model to produce generation that the model is confident in. To optimize this objective, we introduce ReSearch, an iterative search algorithm based on self-evaluation and self-prompting. Our method results in fewer hallucinations overall, both for known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to decline, when the model assesses that it cannot provide a response without a high proportion of hallucination.
Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have obse… (see more)rved a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale mainly due to the lower efficiency of sparse operations, large data requirements, and lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from the increasing scale of depth, width, number of molecules, number of labels, and the diversity in the pretraining datasets, resulting in a 30.25% improvement when scaling to 1 billion parameters and 28.98% improvement when increasing size of dataset to eightfold. We further demonstrate strong finetuning scaling behavior on 38 tasks, outclassing previous large models. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.
Ensembling multiple models enhances predictive performance by utilizing the varied learned features of the different models but incurs signi… (see more)ficant computational and storage costs. Model fusion, which combines parameters from multiple models into one, aims to mitigate these costs but faces practical challenges due to the complex, non-convex nature of neural network loss landscapes, where learned minima are often separated by high loss barriers. Recent works have explored using permutations to align network features, reducing the loss barrier in parameter space. However, permutations are restrictive since they assume a one-to-one mapping between the different models' neurons exists. We propose a new model merging algorithm, CCA Merge, which is based on Canonical Correlation Analysis and aims to maximize the correlations between linear combinations of the model features. We show that our method of aligning models leads to better performances than past methods when averaging models trained on the same, or differing data splits. We also extend this analysis into the harder many models setting where more than 2 models are merged, and we find that CCA Merge works significantly better in this setting than past methods.
Accelerating programs is typically done by recognizing code idioms matching high-performance libraries or hardware interfaces. However, reco… (see more)gnizing such idioms automatically is challenging. The idiom recognition machinery is difficult to write and requires expert knowledge. In addition, slight variations in the input program might hide the idiom and defeat the recognizer. This paper advocates for the use of a minimalist functional array language supporting a small, but expressive, set of operators. The minimalist design leads to a tiny sets of rewrite rules, which encode the language semantics. Crucially, the same minimalist language is also used to encode idioms. This removes the need for hand-crafted analysis passes, or for having to learn a complex domain-specific language to define the idioms. Coupled with equality saturation, this approach is able to match the core functions from the BLAS and PyTorch libraries on a set of computational kernels. Compared to reference C kernel implementations, the approach produces a geometric mean speedup of 1.46× for C programs using BLAS, when generating such programs from the high-level minimalist language.
2024-03-02
2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (published)
Artificial neural networks (ANNs) are considered ``black boxes'' due to the difficulty of interpreting their learned weights.
While choosin… (see more)g the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to devlop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models where we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparatively with conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.
Artificial neural networks (ANNs) are considered "black boxes'' due to the difficulty of interpreting their learned weights.
While choosing… (see more) the best features is not well understood, random feature networks (RFNs) and wavelet scattering ground some ANN learning mechanisms in function space with tractable mathematics. Meanwhile, the genetic code has evolved over millions of years, shaping the brain to develop variable neural circuits with reliable structure that resemble RFNs. We explore a similar approach, embedding neuro-inspired, wavelet-like weights into multilayer RFNs. These can outperform scattering and have kernels that describe their function space at large width. We build learnable and deeper versions of these models where we can optimize separate spatial and channel covariances of the convolutional weight distributions. We find that these networks can perform comparatively with conventional ANNs while dramatically reducing the number of trainable parameters. Channel covariances are most influential, and both weight and activation alignment are needed for classification performance. Our work outlines how neuro-inspired configurations may lead to better performance in key cases and offers a potentially tractable reduced model for ANN learning.
This article presents a three-layer hierarchical distributed framework for optimal electric vehicle charging scheduling (EVCS). The proposed… (see more) hierarchical EVCS structure includes a distribution system operator (DSO) at the top layer, electric vehicle aggregators (EVAs) at the middle layer, and electric vehicles (EVs) charging stations at the bottom layer. A single-loop iterative algorithm is developed to solve the EVCS problem by combining the alternating direction method of multipliers (ADMM) and the distribution line power flow model (DistFlow). Using the single-loop structure, the primal variables of all agents are updated simultaneously at every iteration resulting in a reduced number of iterations and faster convergence. The developed framework is employed to provide charging cost minimization at the EV charging stations level, peak load shaving at the EVAs level, and voltage regulation at the DSO level. In order to further improve the performance of the optimization framework, a neural network-based load forecasting model is implemented to include the uncertainties related to non-EV residential load demand. The efficiency and the optimality of the proposed EVCS framework are evaluated through numerical simulations, conducted for a modified IEEE 13 bus test feeder with different EV penetration levels.
2024-03-01
IEEE Transactions on Transportation Electrification (published)
Why are some individuals better at recognising faces? Uncovering the neural mechanisms supporting face recognition ability has proven elusiv… (see more)e. To tackle this challenge, we used a multi-modal data-driven approach combining neuroimaging, computational modelling, and behavioural tests. We recorded the high-density electroencephalographic brain activity of individuals with extraordinary face recognition abilities—super-recognisers—and typical recognisers in response to diverse visual stimuli. Using multivariate pattern analyses, we decoded face recognition abilities from 1 second of brain activity with up to 80% accuracy. To better understand the mechanisms subtending this decoding, we compared computations in the brains of our participants with those in artificial neural network models of vision and semantics, as well as with those involved in human judgments of shape and meaning similarity. Compared to typical recognisers, we found stronger associations between early brain computations of super-recognisers and mid-level computations of vision models as well as shape similarity judgments. Moreover, we found stronger associations between late brain representations of super-recognisers and computations of the artificial semantic model as well as meaning similarity judgments. Overall, these results indicate that important individual variations in brain processing, including neural computations extending beyond purely visual processes, support differences in face recognition abilities. They provide the first empirical evidence for an association between semantic computations and face recognition abilities. We believe that such multi-modal data-driven approaches will likely play a critical role in further revealing the complex nature of idiosyncratic face recognition in the human brain. Significance The ability to robustly recognise faces is crucial to our success as social beings. Yet, we still know little about the brain mechanisms allowing some individuals to excel at face recognition. This study builds on a sizeable neural dataset measuring the brain activity of individuals with extraordinary face recognition abilities—super-recognisers—to tackle this challenge. Using state-of-the-art computational methods, we show robust prediction of face recognition abilities in single individuals from a mere second of brain activity, and revealed specific brain computations supporting individual differences in face recognition ability. Doing so, we provide direct empirical evidence for an association between semantic computations and face recognition abilities in the human brain—a key component of prominent face recognition models.
Federated learning (FL) is a key solution for datadriven the Artificial Intelligence of Things (AIoT). Although much progress has been made,… (see more) scalability remains a core challenge for real-world FL deployments. Existing solutions either suffer from accuracy loss or do not fully address the connectivity dynamicity of FL systems. In this article, we tackle the scalability issue with a novel, adaptive FL framework called FedSwarm, which improves system scalability for AIoT by deploying multiple collaborative edge servers. FedSwarm has two novel features: 1) adaptiveness on the number of local updates and 2) dynamicity of the synchronization between edge devices and edge servers. We formulate FedSwarm as a local update adaptation and perdevice dynamic server selection problem and prove FedSwarm‘s convergence bound. We further design a control mechanism consisting of a learning-based algorithm for collaboratively providing local update adaptation on the servers’ side and a bonus-based strategy for spurring dynamic per-device server selection on the devices’ side. Our extensive evaluation shows that FedSwarm significantly outperforms other studies with better scalability, lower energy consumption, and higher model accuracy.