Publications

Prediction of Final Phosphorus Content of Steel in a Scrap-Based Electric Arc Furnace Using Artificial Neural Networks

Riadh Azzaz

Valentin Hurel

Patrice Ménard

M. Jahazi

S Ebrahimi Kahou

Elmira Moosavi-Khoonsari

2024-10-24

ArXiv (preprint)

doi.org

arxiv.org

ConvNTC: Convolutional neural tensor completion for predicting the disease-related miRNA pairs and cell-related drug pairs

Pei Liu

Xiao Liang

Yuemei Li

Jiawei Luo

2024-10-23

bioRxiv (preprint)

doi.org

The roles of neural networks in language acquisition

Eva Portelance

Masoud Jasbi

How can modern neural networks like large language models be useful to the field of language acquisition, and more broadly cognitive science, if they are not a priori designed to be cognitive models? As natural language understanding and generation have improved by leaps and bounds, with models like GPT-4, the question of how these models can inform our understanding of human language acquisition has re-emerged. It is therefore critical to examine how, in practice, linking hypotheses between models and human learners can be safely established. To address these questions, we propose a model taxonomy comprising four modeling approaches, each with different goals, ranging from exploratory hypothesis generation to hypothesis differentiation and testing. We show how the goals of these approaches align with the overarching goals of science and linguistics by connecting our taxonomy to the realist vs. instrumentalist approaches in the philosophy of science. We survey recent work that has adopted each of our modeling approaches and address the importance of computational modeling in language acquisition studies.

2024-10-23

Language and Linguistics Compass (published)

doi.org

Minimally Invasive Morphology Adaptation via Parameter Efficient Fine-Tuning

Michael Przystupa

Hongyao Tang

Mariano Phielipp

Santiago Miret

Martin Jagersand

Glen Berseth

Training reinforcement learning policies to control individual robots is often computationally expensive because minor variations in robot morphology (e.g. dynamics or number of limbs) can negatively impact policy performance. This limitation has motivated morphology-agnostic policy learning, in which a monolithic deep learning policy learns to generalize across robotic morphologies. Unfortunately, these policies still have sub-optimal zero-shot performance compared to end-to-end fine-tuning on target morphologies. This limitation has ramifications for practical robotic applications, as online fine-tuning of large neural networks can require immense computation. In this work, we investigate parameter-efficient fine-tuning (PEFT) techniques that specialize morphology-agnostic policies to a target robot while minimizing the number of learnable parameters adapted during online learning. We compare direct fine-tuning, which updates subsets of the base model parameters, and input-learnable approaches, which add parameters that manipulate the inputs passed to the base model. Our analysis shows that tuning relatively few parameters (0.01% of the base model) can measurably improve policy performance over zero-shot. These results can guide future research on which PEFT approaches are best suited to adapting policies to new robotic morphologies.

2024-10-22

corl.org/2024/Workshop/MAPoDeL (published)

openreview.net

Multilingual Hallucination Gaps in Large Language Models

Cléa Chataigner

Afaf Taïk

Golnoosh Farnadi

Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that resembles human language. However, this shift is concerning, as LLMs often generate hallucinations: misleading or false information that appears highly credible. In this study, we explore hallucinations across multiple languages in free-form text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FActScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high- and low-resource languages, raising important questions about LLM multilingual performance and the challenges of evaluating hallucinations in multilingual free-form text generation.

2024-10-22

ArXiv (preprint)

doi.org

arxiv.org
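The basic quantity behind such hallucination gaps can be sketched as follows, assuming the atomic facts in each generated biography have already been extracted and labeled as supported or not by the reference source (Wikipedia in the study). In its basic form, FActScore is the fraction of supported atomic facts in a generation; here a gap is taken, as an illustrative assumption, to be the difference in mean FActScore between two languages. The function names and toy labels are hypothetical and reflect neither the paper's data nor its multilingual extension of the framework.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical input format: for each language, a list of generations, where each
# generation is a list of booleans (True = atomic fact supported by the reference source).
Labels = Dict[str, List[List[bool]]]

def factscore(fact_supported: List[bool]) -> float:
    """Basic FActScore of one generation: fraction of its atomic facts that are supported."""
    return sum(fact_supported) / len(fact_supported)

def language_scores(labels: Labels) -> Dict[str, float]:
    """Mean FActScore per language; generations with no atomic facts (e.g. abstentions) are skipped."""
    return {
        lang: mean(factscore(g) for g in gens if g)
        for lang, gens in labels.items()
    }

def hallucination_gap(scores: Dict[str, float], lang_a: str, lang_b: str) -> float:
    """Illustrative gap (an assumption): how much more factual lang_a's output is than lang_b's."""
    return scores[lang_a] - scores[lang_b]

# Toy labels for two placeholder languages (made up, not data from the study).
toy = {
    "lang_a": [[True, True, False, True], [True, True]],
    "lang_b": [[True, False, False], [False, True]],
}
scores = language_scores(toy)
gap = hallucination_gap(scores, "lang_a", "lang_b")
```

For these made-up labels, lang_a scores 0.875 and lang_b about 0.417, a gap of about 0.458; the corresponding hallucination rates are one minus these scores.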

Overcoming State and Action Space Disparities in Multi-Domain, Multi-Task Reinforcement Learning

Reginald McLean

Kai Yuan

Isaac Woungang

Nariman Farsad

Pablo Samuel Castro

Current multi-task reinforcement learning (MTRL) methods can perform a large number of tasks with a single policy. However, when interacting with a new domain, the MTRL agent must be retrained due to differences in domain dynamics and structure. Because of these limitations, we are forced to train multiple policies even though tasks may share dynamics, which requires more samples and is thus sample-inefficient. In this work, we explore the ability of MTRL agents to learn across domains with different dynamics by learning in multiple domains simultaneously, without the need to fine-tune extra policies. We find that an MTRL agent trained in multiple domains achieves up to a 70% increase in sample efficiency while maintaining its overall success rate.

2024-10-22

corl.org/2024/Workshop/MAPoDeL (published)

openreview.net

Scaling Stick-Breaking Attention: An Efficient Implementation and In-Depth Study

Shawn Tan

Yikang Shen

Songlin Yang

Aaron Courville

Rameswar Panda

The self-attention mechanism traditionally relies on the softmax operator, necessitating positional embeddings like RoPE, or position biases, to account for token order. However, current methods still face length generalisation challenges. We investigate an alternative attention mechanism based on the stick-breaking process in larger-scale settings. The method works as follows: for each token before the current one, we determine a break point, which represents the proportion of the stick (the attention weight) to allocate to that earlier token. We repeat this on the remaining stick until all tokens are allocated a weight, resulting in a sequence of attention weights. This process naturally incorporates recency bias, which has linguistic motivations for grammar parsing. We study the implications of replacing the conventional softmax-based attention mechanism with stick-breaking attention. We then discuss the implementation of numerically stable stick-breaking attention and adapt Flash Attention to accommodate this mechanism. When used as a drop-in replacement for current softmax+RoPE attention systems, we find that stick-breaking attention performs competitively with current methods on length generalisation and downstream tasks. Stick-breaking also performs well at length generalisation, allowing a model trained with

2024-10-22

ArXiv (preprint)

doi.org

openreview.net
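A minimal, unoptimized sketch of the stick-breaking weighting described in this abstract may help make the procedure concrete. It assumes (this is not stated in the listing) that each break point is the sigmoid of a scaled query-key dot product, that only strictly earlier tokens are attended, and that any unallocated remainder of the stick is simply left unused; all function names are hypothetical. It is not the authors' numerically stable Flash Attention kernel, only a plain-Python illustration that works in log space so the running product of remaining-stick factors does not underflow.

```python
import math
from typing import List

Matrix = List[List[float]]

def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def stick_breaking_weights(logits: Matrix) -> Matrix:
    """Causal stick-breaking weights; logits[i][j] is the break-point logit of query i for key j.

    Assumption: query i attends only to strictly earlier keys j < i, visited nearest first.
    """
    n = len(logits)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        log_remaining = 0.0  # log of the unallocated stick length, which starts at 1
        for j in range(i - 1, -1, -1):
            z = logits[i][j]
            # Token j takes a fraction sigmoid(z) of whatever stick is left.
            weights[i][j] = math.exp(log_sigmoid(z) + log_remaining)
            # The remaining stick shrinks by a factor 1 - sigmoid(z) = sigmoid(-z).
            log_remaining += log_sigmoid(-z)
    return weights

def stick_breaking_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Self-attention output: stick-breaking weights from scaled dot products applied to the values."""
    scale = 1.0 / math.sqrt(len(q[0]))
    logits = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    w = stick_breaking_weights(logits)
    return [
        [sum(w[i][j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        for i in range(len(q))
    ]
```

With all logits equal to zero every break point is 0.5, so the last of four tokens gives weights 0.5, 0.25 and 0.125 to the three preceding tokens, nearest first, which illustrates the built-in recency bias.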

Symmetry-Aware Generative Modeling through Learned Canonicalization

Kusha Sareen

Daniel Levy

Arnab Kumar Mondal

Sékou-Oumar Kaba

Tara Akhound-Sadegh

Siamak Ravanbakhsh

Generative modeling of symmetric densities has a range of applications in AI for science, from drug discovery to physics simulations. The existing generative modeling paradigm for invariant densities combines an invariant prior with an equivariant generative process. However, we observe that this technique is not necessary and has several drawbacks resulting from the limitations of equivariant networks. Instead, we propose to model a learned slice of the density so that only one representative element per orbit is learned. To accomplish this, we learn a group-equivariant canonicalization network that maps training samples to a canonical pose and train a non-equivariant generative model over these canonicalized samples. We implement this idea in the context of diffusion models. Our preliminary experimental results on molecular modeling are promising, demonstrating improved sample quality and faster inference time.

2024-10-22

NeurIPS.cc/2024/Workshop/NeurReps (poster)

doi.org

openreview.net
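The canonicalize-then-model idea in this abstract can be illustrated with a deliberately simple stand-in. The paper learns a group-equivariant canonicalization network; the sketch below instead uses a fixed, hand-crafted canonicalizer for 2D point clouds under rotations and translations (center on the centroid, then rotate the farthest point onto the positive x-axis). It assumes the farthest point is unique, and its names are hypothetical. Every rotated or translated copy of a cloud maps to the same representative, so a non-equivariant generative model could then be trained on the outputs of canonicalize.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def rotate(points: List[Point], angle: float) -> List[Point]:
    """Rotate every point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def canonicalize(points: List[Point]) -> List[Point]:
    """Hand-crafted stand-in for a learned canonicalizer: one representative per orbit
    of the group of 2D rotations and translations (assumes a unique farthest point)."""
    # 1. Remove translation by centering the cloud on its centroid.
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centered = [(x - cx, y - cy) for x, y in points]
    # 2. Pick the point farthest from the centroid as the reference direction.
    rx, ry = max(centered, key=lambda p: p[0] ** 2 + p[1] ** 2)
    # 3. Rotate so the reference point lies on the positive x-axis.
    return rotate(centered, -math.atan2(ry, rx))

pts = [(0.0, 0.0), (2.0, 0.5), (-1.0, 1.0), (0.3, -0.7)]
moved = [(x + 3.0, y - 1.5) for x, y in rotate(pts, 1.234)]
```

Because the point order is preserved, canonicalize(pts) and canonicalize(moved) agree element by element up to floating-point error.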

Correction: AI content detection in the emerging information ecosystem: new obligations for media and tech companies

Alistair Knott

Dino Pedreschi

Toshiya Jitsuzumi

Susan Leavy

David Eyers

Tapabrata Chakraborti

Andrew Trotman

Sundar Sundareswaran

Ricardo Baeza-Yates

Przemyslaw Biecek

Adrian Weller

Paul D. Teal

Subhadip Basu

Mehmet Haklidir

Virginia Morini

Stuart Russell

Yoshua Bengio

2024-10-21

Ethics and Information Technology (published)

doi.org

FairLoRA: Unpacking Bias Mitigation in Vision Models with Fairness-Driven Low-Rank Adaptation

Rohan Sukumaran

Aarash Feizi

Adriana Romero-Soriano

Golnoosh Farnadi

2024-10-21

ArXiv (preprint)

doi.org

arxiv.org

Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think

Massimo Caccia

Megh Thakkar

Léo Boisvert

Thibault Le Sellier De Chezelles

Alexandre Piché

Nicolas Chapados

Alexandre Drouin

Maxime Gasse

Alexandre Lacoste

Recent advancements in large language models (LLMs) have sparked interest in developing autonomous web agents capable of performing digital tasks through web interfaces in a human-like manner. However, even the strongest closed-source models often struggle to achieve robust results on several benchmarks, and a notable performance gap exists between them and their open-source counterparts. This study investigates the potential of fine-tuning to enhance the performance of a smaller, lower-performing but cost-efficient LLM by leveraging successful traces from stronger LLMs, referred to as experts. We outline a comprehensive pipeline for data collection, filtering, and supervised fine-tuning, and explore various behavior cloning parameters. Our experiments provide key insights into the challenges of fine-tuning LLMs into web agents on benchmarks like MiniWoB and WorkArena. Notably, we find that the fine-tuned agents' ability to predict expert trajectories does not consistently lead to improved downstream task performance. This points to issues such as off-policy bias and the loss of reasoning abilities during fine-tuning. We discuss potential solutions to these challenges and make both the codebase and a dataset of 140M tokens open-source for the community to build upon.

2024-10-21

NeurIPS.cc/2024/Workshop/OWA (poster)

openreview.net

Graph Knowledge Distillation to Mixture of Experts

Pavel Rumiantsev

Mark J. Coates

2024-10-21

TMLR (accepted)

doi.org

openreview.net
