TRAIL: Responsible AI for Professionals and Leaders
Learn how to integrate responsible AI practices into your organization with TRAIL. Join our information session on March 12, where you’ll discover the program in detail and can ask any questions you may have.
Learn how to leverage generative AI to support and improve your productivity at work. The next cohort will take place online on April 28 and 30, 2026, in French.
Reasoning LLMs suffer from quadratic compute growth as their context length increases, making reinforcement learning with verifiable rewards (RLVR) and test-time scaling prohibitively expensive. Prior work has tried to lighten the computational burden by shortening reasoning traces through pruning, summarization, or multi-stage training, but these methods remain bound to quadratic costs. We introduce Delethink, a thinking algorithm that realizes the Markovian Thinking Paradigm. Instead of producing one long monolithic reasoning trace, Delethink thinks in a sequence of chunks, the Delethink trace. Each chunk continues reasoning by referring only to a fixed number of prior tokens, which functions as a Markovian state sufficient for progressing reasoning, while deleting the rest. This preserves continuity without carrying the quadratic baggage. As a result, compute scales linearly and peak memory remains constant. In experiments, we show that Delethink can be applied directly to off-the-shelf reasoning models ranging from
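Why chunking with a fixed carryover gives linear compute and constant peak memory can be seen with a small cost model. This is a hypothetical sketch under stated assumptions, not Delethink's implementation: `markovian_generate`, `chunk_size` and `carryover` are names introduced here, `toy_next_token` is a stand-in for a model's decoding step, each chunk is assumed to generate `chunk_size` new tokens, and at each boundary the context is assumed to be rebuilt from the query plus the last `carryover` tokens. Attention work is approximated as the number of tokens each new token attends to.

```python
from typing import Callable, List, Tuple

Token = int


def markovian_generate(
    next_token: Callable[[List[Token]], Token],
    query: List[Token],
    total_tokens: int,
    chunk_size: int,
    carryover: int,
) -> Tuple[List[Token], int, int]:
    """Hypothetical chunked decoder; returns (trace, peak_context_len, attention_cost)."""
    if not 0 <= carryover <= chunk_size:
        raise ValueError("carryover must be between 0 and chunk_size")
    trace: List[Token] = []
    context: List[Token] = list(query)
    peak, cost, in_chunk = len(context), 0, 0
    while len(trace) < total_tokens:
        if in_chunk == chunk_size:
            # Chunk boundary (assumed scheme): keep the query plus the last
            # `carryover` generated tokens as the state and delete everything else.
            state = trace[-carryover:] if carryover else []
            context = list(query) + state
            in_chunk = 0
        cost += len(context)  # the new token attends to every token in context
        tok = next_token(context)
        context.append(tok)
        trace.append(tok)
        in_chunk += 1
        peak = max(peak, len(context))
    return trace, peak, cost


def toy_next_token(context: List[Token]) -> Token:
    # Stand-in for a language model: any deterministic function of the context.
    last = context[-1] if context else 0
    return (len(context) * 31 + last) % 1000


query = [1, 2, 3, 4]
# Chunked: 10 chunks of 100 tokens, carrying 20 tokens across each boundary.
_, peak_chunked, cost_chunked = markovian_generate(toy_next_token, query, 1000, 100, 20)
# Monolithic baseline: a single chunk covering the whole trace, nothing deleted.
_, peak_mono, cost_mono = markovian_generate(toy_next_token, query, 1000, 1000, 0)
print(peak_chunked, cost_chunked)  # 124 71500
print(peak_mono, cost_mono)        # 1004 503500
```

Under these assumptions, doubling the trace to 2000 tokens raises the chunked cost from 71500 to 145000 (about 2x) with the peak context still at 124 tokens, while the monolithic cost rises from 503500 to 2007000 (about 4x).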
2025-12-31
International Conference on Learning Representations (Accept (Poster))
Interpretability is essential for user trust in real-world anomaly detection applications. However, deep learning models, despite their strong performance, often lack transparency. In this work, we study the interpretability of autoencoder-based models for audio anomaly detection by comparing a standard autoencoder (AE) with a masked autoencoder (MAE) in terms of detection performance and interpretability. We applied several attribution methods, including error maps, saliency maps, SmoothGrad, Integrated Gradients, GradSHAP, and Grad-CAM. Although MAE shows slightly lower detection performance, it consistently provides more faithful and temporally precise explanations, suggesting better alignment with true anomalies. To assess the relevance of the regions highlighted by the explanation methods, we propose a perturbation-based faithfulness metric that replaces those regions with their reconstructions to simulate normal input. Our findings, based on experiments in a real industrial scenario, highlight the importance of incorporating interpretability into anomaly detection pipelines and show that masked training improves explanation quality without compromising performance.
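One plausible reading of the perturbation-based faithfulness metric can be sketched as follows: replace the most highly attributed time-frequency bins with the model's reconstruction and measure the relative drop in anomaly score. This is an illustration under assumptions, not the authors' metric: the functions `anomaly_score` and `faithfulness`, the top-10% `fraction`, the relative-drop formula, and the toy "autoencoder" that always outputs silence are all hypothetical choices made here.

```python
import numpy as np


def anomaly_score(x: np.ndarray, reconstruct) -> float:
    # Reconstruction error used as the anomaly score (mean squared error).
    return float(np.mean((x - reconstruct(x)) ** 2))


def faithfulness(x, attribution, reconstruct, fraction=0.1) -> float:
    """Relative drop in anomaly score after replacing the top-attributed bins
    with their reconstruction (hypothetical formulation)."""
    x_hat = reconstruct(x)
    k = max(1, int(round(fraction * x.size)))
    top = np.argsort(attribution, axis=None)[-k:]  # flat indices of top-k bins
    x_pert = x.copy()
    np.put(x_pert, top, x_hat.ravel()[top])  # simulate "normal" input there
    before = anomaly_score(x, reconstruct)
    after = anomaly_score(x_pert, reconstruct)
    return (before - after) / before


rng = np.random.default_rng(0)
# Toy spectrogram-like input: 64 frequency bins x 100 frames of low-level noise...
x = rng.normal(0.0, 0.1, size=(64, 100))
x[20:28, 40:90] += 3.0  # ...with an injected anomaly occupying 400 bins

# Stand-in for an autoencoder trained on normal data: it always outputs silence.
reconstruct = lambda z: np.zeros_like(z)

error_map = (x - reconstruct(x)) ** 2  # error-map attribution
random_map = rng.random(x.shape)       # uninformative baseline
f_error = faithfulness(x, error_map, reconstruct)
f_random = faithfulness(x, random_map, reconstruct)
print(round(f_error, 3), round(f_random, 3))
```

In this toy setting the error map covers the whole anomaly, so its score is close to 1, while the random map removes only about a tenth of the anomalous bins and scores near 0.1. A more faithful explanation yields a larger drop.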
Biomolecular interactions play a critical role in biological processes. While recent breakthroughs like AlphaFold 3 have enabled accurate modeling of biomolecular complex structures, predicting binding affinity remains challenging mainly due to limited high-quality data. Recent methods are often specialized for specific types of biomolecular interactions, limiting their generalizability. In this work, we repurpose AlphaFold 3 for representation learning to predict binding affinity, a non-trivial task that requires shifting from generative structure prediction to encoding observed geometry, simplifying the heavily conditioned trunk module, and designing a framework to jointly capture sequence and structural information. To address these challenges, we introduce the **Atom-level Diffusion Transformer (ADiT)**, which takes sequence and structure as inputs, employs a unified tokenization scheme, integrates diffusion transformers, and removes dependencies on multiple sequence alignments and templates. We pre-train three ADiT variants on the PDB dataset with a denoising objective and evaluate them across protein-ligand, drug-target, protein-protein, and antibody-antigen interactions. The model achieves state-of-the-art or competitive performance across benchmarks, scales effectively with model size, and successfully identifies wet-lab validated affinity-enhancing antibody mutations, establishing a generalizable framework for biomolecular interactions. We plan to release the code upon acceptance.
2025-12-31
International Conference on Learning Representations (Accept (Poster))
Learned optimizers are powerful alternatives to hand-designed rules like Adam, yet they have seen limited practical adoption since they often fail to meta-generalize beyond their training distribution and incur high meta-training cost. For instance, prior work, VeLO, scaled meta-training to 4,000 TPU months (
2025-12-31
International Conference on Learning Representations (Accept (Poster))
Abstract
This study introduces a self-supervised learning (SSL) approach to hyperscanning electroencephalography (EEG) data, targeting the identification of autism spectrum condition (ASC) during social interactions. Hyperscanning enables simultaneous recording of neural activity across interacting individuals, offering a novel path for studying brain-to-brain synchrony in ASC. Leveraging a large-scale, single-brain EEG dataset for SSL pretraining, we developed a multi-brain classification model fine-tuned with hyperscanning data from dyadic interactions involving ASC and neurotypical participants. The SSL model demonstrated superior performance (78.13% accuracy) compared to supervised baselines and logistic regression using spectral EEG biomarkers. These results underscore the efficacy of SSL in addressing the challenges of limited labeled data, enhancing EEG-based diagnostic tools for ASC, and advancing research in social neuroscience.
We propose a novel block for video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform mixing over space, and MLPs over channels. The resulting architecture TRecViT performs well on sparse and dense tasks, trained in supervised or self-supervised regimes. Notably, our model is causal and outperforms or is on par with a pure attention model ViViT-L on large scale video datasets (SSv2, Kinetics400), while having
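The time-space-channel factorisation can be sketched as a single block acting on a `(frames, patches, channels)` array. This is a simplified illustration, not TRecViT itself: the class `FactorisedVideoBlock`, its random weights, the plain gated recurrence `h_t = a * h_{t-1} + (1 - a) * gate_t * x_t` (a simplification standing in for the paper's LRU), single-head attention, and the omission of normalisation layers are all assumptions made here. It does show why the design is causal: only the time mixer crosses frames, and it looks only backwards.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class FactorisedVideoBlock:
    """Hypothetical time-space-channel block; input shape (T frames, N patches, C channels)."""

    def __init__(self, channels: int, hidden: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        s = channels ** -0.5
        self.w_gate = rng.normal(0.0, s, (channels, channels))
        self.decay_logit = rng.normal(2.0, 0.5, channels)  # per-channel decay a = sigmoid(.)
        self.wq, self.wk, self.wv = (rng.normal(0.0, s, (channels, channels)) for _ in range(3))
        self.w1 = rng.normal(0.0, s, (channels, hidden))
        self.w2 = rng.normal(0.0, hidden ** -0.5, (hidden, channels))

    def time_mix(self, x):
        # Gated linear recurrence along time, independently per patch and channel;
        # h_t depends only on frames <= t, so this step is causal.
        a = sigmoid(self.decay_logit)
        gate = sigmoid(x @ self.w_gate)
        h = np.zeros(x.shape[1:])
        out = np.empty_like(x)
        for t in range(x.shape[0]):
            h = a * h + (1.0 - a) * gate[t] * x[t]
            out[t] = h
        return out

    def space_mix(self, x):
        # Single-head self-attention among the N patches of each frame separately.
        q, k, v = x @ self.wq, x @ self.wk, x @ self.wv
        scores = np.einsum("tnc,tmc->tnm", q, k) / np.sqrt(x.shape[-1])
        return np.einsum("tnm,tmc->tnc", softmax(scores), v)

    def channel_mix(self, x):
        # Per-token MLP over channels.
        return np.maximum(x @ self.w1, 0.0) @ self.w2

    def __call__(self, x):
        # Residual connections around each mixer; normalisation layers omitted.
        x = x + self.time_mix(x)
        x = x + self.space_mix(x)
        return x + self.channel_mix(x)


block = FactorisedVideoBlock(channels=8, hidden=32)
rng = np.random.default_rng(1)
video = rng.normal(size=(6, 4, 8))  # 6 frames, 4 patches, 8 channels
edited = video.copy()
edited[4:] += 1.0                   # change only the last two frames
y, y_edit = block(video), block(edited)
print(y.shape, np.allclose(y[:4], y_edit[:4]))  # (6, 4, 8) True
```

Editing future frames leaves the outputs for earlier frames unchanged, which is the causality property the abstract highlights.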
The Clock and Pizza interpretations, associated with architectures differing in either uniform or learnable attention, were introduced to argue that different architectural designs can yield distinct circuits for modular addition. In this work, we show that this is not the case, and that both uniform attention and trainable attention architectures implement the same algorithm via topologically and geometrically equivalent representations. Our methodology goes beyond the interpretation of individual neurons and weights. Instead, we identify all of the neurons corresponding to each learned representation and then study the collective group of neurons as one entity. This method reveals that each learned representation is a manifold that we can study utilizing tools from topology. Based on this insight, we can statistically analyze the learned representations across hundreds of circuits to demonstrate the similarity between learned modular addition circuits that arise naturally from common deep learning paradigms.
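For readers unfamiliar with the Clock interpretation, the idea as commonly described is that each residue `a` is embedded as a point on a circle at angle `2*pi*k*a/p`, the two angles are added via trigonometric identities, and the output `c` with the highest `cos(theta_a + theta_b - theta_c)` wins. The sketch below uses a single hand-picked frequency `k` (trained networks typically use several); the functions are hypothetical, and it does not reproduce the paper's neuron-grouping or topological analysis, nor the Pizza variant. It does illustrate why a learned representation can be a 1-D manifold: all embeddings lie on one circle.

```python
import numpy as np


def clock_embed(a: int, p: int, k: int) -> np.ndarray:
    # Map a residue to a point on the unit circle at angle 2*pi*k*a/p.
    theta = 2.0 * np.pi * k * a / p
    return np.array([np.cos(theta), np.sin(theta)])


def clock_add(a: int, b: int, p: int, k: int = 1) -> int:
    ea, eb = clock_embed(a, p, k), clock_embed(b, p, k)
    # Angle addition: cos and sin of theta_a + theta_b from products of the embeddings.
    cos_ab = ea[0] * eb[0] - ea[1] * eb[1]
    sin_ab = ea[1] * eb[0] + ea[0] * eb[1]
    theta_c = 2.0 * np.pi * k * np.arange(p) / p
    # Logit for each candidate c is cos(theta_a + theta_b - theta_c), maximal at c = (a + b) mod p.
    logits = cos_ab * np.cos(theta_c) + sin_ab * np.sin(theta_c)
    return int(np.argmax(logits))


p, k = 13, 3  # k must be coprime with p so that the argmax is unique
correct = all(clock_add(a, b, p, k) == (a + b) % p for a in range(p) for b in range(p))
# The embeddings of all residues lie on one circle: a 1-D manifold.
norms = [np.linalg.norm(clock_embed(a, p, k)) for a in range(p)]
print(correct, np.allclose(norms, 1.0))  # True True
```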
Artificial intelligence (AI)-enabled technologies hold promise for assisting in the care of an aging population. Few studies have explored family caregivers’ (FCGs’) behavioural intention to use such innovations, and even fewer have employed a technology acceptance framework. This study examined the behavioural intention of FCGs of older adults to use AI-enabled technologies for caregiving. We conducted a theory-based cross-sectional quantitative survey. Eligible FCGs for this study were: (1) aged 45–64; (2) residing in Quebec, Canada; (3) providing care for at least one older adult (65+); (4) having access to a computer or smartphone with internet connectivity; and (5) having proficiency in reading and comprehending English or French. We adapted and expanded the Unified Theory of Acceptance and Use of Technology (UTAUT) framework to measure their behavioural intention to use AI-enabled technologies for caregiving. We used descriptive statistics and a random forest model to assess the most important predictive factors across nine variables and their direction of association with behavioural intention. The Consensus-Based Checklist for Reporting of Survey Studies (CROSS) guidelines were used to report the study’s results. Of the polling firm’s 100,000 panelists, 2740 eligible individuals were randomly chosen to receive an email invitation to the study. Of the 465 panelists who opened the survey (i.e., unique visitors), 199 were eligible and completed the online survey. The random forest model explained between 56% and 86% of the variance in behavioural intention to use AI, with social influence showing the highest predictive relevance, as indicated by a 35% increase in mean-squared error when it was removed from the model. Among the nine variables considered, six demonstrated a positive association with behavioural intention.
These variables included social influence, effort expectancy, performance expectancy, perceived trust, confidence in healthcare professionals’ advice for the use of AI-enabled technologies, and facilitating conditions. The variables perceived cost and technology anxiety indicated a negative association with behavioural intention. Our extended UTAUT model identified factors associated with FCGs' intention to use AI. While all nine variables contributed, attitudes toward AI within caregivers’ social circles were the strongest predictor. Stakeholders from industry, government, and healthcare can enhance the adoption of AI-enabled technologies in older adult care by leveraging facilitators and addressing barriers experienced by caregivers.
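The "percent increase in mean-squared error" used to rank predictors can be illustrated with permutation importance: shuffle one predictor so it no longer carries information, and report how much the model's MSE grows. The abstract does not say whether the variable was shuffled or the model refit without it, so permutation is an assumption here. To keep the sketch numpy-only, an ordinary least-squares model is swapped in for the random forest; the functions, predictor names, and data are hypothetical and synthetic, not the study's.

```python
import numpy as np


def fit_linear(X: np.ndarray, y: np.ndarray):
    # Ordinary least squares with an intercept; a stand-in for the random forest.
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def mse(y, y_pred) -> float:
    return float(np.mean((y - y_pred) ** 2))


def pct_increase_mse(predict, X, y, col, rng, n_repeats=20) -> float:
    """Mean % increase in MSE when column `col` is shuffled, breaking its link to y."""
    base = mse(y, predict(X))
    increases = []
    for _ in range(n_repeats):
        X_perm = X.copy()
        X_perm[:, col] = rng.permutation(X_perm[:, col])
        increases.append(100.0 * (mse(y, predict(X_perm)) - base) / base)
    return float(np.mean(increases))


# Synthetic data only (not the study's): 199 respondents, three hypothetical predictors.
rng = np.random.default_rng(0)
names = ["social_influence", "effort_expectancy", "unrelated_noise"]
X = rng.normal(size=(199, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=199)

predict = fit_linear(X, y)
importances = [pct_increase_mse(predict, X, y, col, rng) for col in range(len(names))]
for name, imp in zip(names, importances):
    print(f"{name}: {imp:.1f}% increase in MSE")
```

The predictor with the strongest true effect produces by far the largest MSE increase when shuffled, while the unrelated column barely changes it.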