Publications

Accelerated Benders Decomposition and Local Branching for Dynamic Maximum Covering Location Problems
Steven Lamontagne
Ribal Atallah
The maximum covering location problem (MCLP) is a key problem in facility location, with many applications and variants. One such variant is… (see more) the dynamic (or multi-period) MCLP, which considers the installation of facilities across multiple time periods. To the best of our knowledge, no exact solution method has been proposed to tackle large-scale instances of this problem. To that end, in this work, we expand upon the current state-of-the-art branch-and-Benders-cut solution method in the static case, by exploring several acceleration techniques. Additionally, we propose a specialised local branching scheme, that uses a novel distance metric in its definition of subproblems and features a new method for efficient and exact solving of the subproblems. These methods are then compared through extensive computational experiments, highlighting the strengths of the proposed methodologies.
A logistics provider’s profit maximization facility location problem with random utility maximizing followers
David Pinzon Ulloa
Bernard Gendron
DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations
Nima Fathi
Amar Kumar
Brennan Nichyporuk
Mohammad Havaei
Deep learning classifiers are prone to latching onto dominant confounders present in a dataset rather than on the causal markers associated … (see more)with the target class, leading to poor generalization and biased predictions. Although explainability via counterfactual image generation has been successful at exposing the problem, bias mitigation strategies that permit accurate explainability in the presence of dominant and diverse artifacts remain unsolved. In this work, we propose the DeCoDEx framework and show how an external, pre-trained binary artifact detector can be leveraged during inference to guide a diffusion-based counterfactual image generator towards accurate explainability. Experiments on the CheXpert dataset, using both synthetic artifacts and real visual artifacts (support devices), show that the proposed method successfully synthesizes the counterfactual images that change the causal pathology markers associated with Pleural Effusion while preserving or ignoring the visual artifacts. Augmentation of ERM and Group-DRO classifiers with the DeCoDEx generated images substantially improves the results across underrepresented groups that are out of distribution for each class. The code is made publicly available at https://github.com/NimaFathi/DeCoDEx.
Implementation of a Global Pediatric Trauma Course in an Upper Middle–Income Country: A Pilot Study
Abbie Naus
Madeleine Carroll
Ayla Gerk
David P. Mooney
Natalie L. Yanchar
Julia Ferreira
Karen E. Gripp
Caroline Ouellet
Fabio Botelho
On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics -- Empirical Study on Brown Build and Risk Prediction
Doriane Olewicki
Sarra Habchi
Mathieu Nayrolles
Mojtaba Faramarzi
Bram Adams
Nowadays, software analytics tools using machine learning (ML) models to, for example, predict the risk of a code change are well establishe… (see more)d. However, as the goals of a project shift over time, and developers and their habits change, the performance of said models tends to degrade (drift) over time. Current retraining practices typically require retraining a new model from scratch on a large updated dataset when performance decay is observed, thus incurring a computational cost; also there is no continuity between the models as the past model is discarded and ignored during the new model training. Even though the literature has taken interest in online learning approaches, those have rarely been integrated and evaluated in industrial environments. This paper evaluates the use of lifelong learning (LL) for industrial use cases at Ubisoft, evaluating both the performance and the required computational effort in comparison to the retraining-from-scratch approaches commonly used by the industry. LL is used to continuously build and maintain ML-based software analytics tools using an incremental learner that progressively updates the old model using new data. To avoid so-called"catastrophic forgetting"of important older data points, we adopt a replay buffer of older data, which still allows us to drastically reduce the size of the overall training dataset, and hence model training time.
Structured Learning in Time-dependent Cox Models
Guanbo Wang
Yi Lian
Robert W. Platt
Rui Wang
Sylvie Perreault
Marc Dorais
Mireille E. Schnitzer
Stimulus information guides the emergence of behavior-related signals in primary somatosensory cortex during learning.
Mariangela Panniello
Colleen J Gillon
Roberto Maffulli
Marco Celotto
Stefano Panzeri
Michael M Kohl
Managing AI Risks in an Era of Rapid Progress
Geoffrey Hinton
Andrew Yao
Dawn Song
Pieter Abbeel
Yuval Noah Harari
Trevor Darrell
Ya-Qin Zhang
Lan Xue
Shai Shalev-Shwartz
Gillian K. Hadfield
Jeff Clune
Frank Hutter
Atilim Güneş Baydin
Sheila McIlraith
Qiqi Gao
Ashwin Acharya
Anca Dragan … (see 5 more)
Philip Torr
Stuart Russell
Daniel Kahneman
Jan Brauner
Sören Mindermann
Towards Modular LLMs by Building and Reusing a Library of LoRAs
Oleksiy Ostapenko
Zhan Su
Edoardo Ponti
Matheus Pereira
Lucas Caccia
The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trai… (see more)ned adapters to improve performance for new tasks. We study how to best build a library of adapters given multi-task data and devise techniques for both zero-shot and supervised task generalization through routing in such library. We benchmark existing approaches to build this library and introduce model-based clustering, MBC, a method that groups tasks based on the similarity of their adapter parameters, indirectly optimizing for transfer across the multi-task dataset. To re-use the library, we present a novel zero-shot routing mechanism, Arrow, which enables dynamic selection of the most relevant adapters for new inputs without the need for retraining. We experiment with several LLMs, such as Phi-2 and Mistral, on a wide array of held-out tasks, verifying that MBC-based adapters and Arrow routing lead to superior generalization to new tasks. We make steps towards creating modular, adaptable LLMs that can match or outperform traditional joint training.
What Mechanisms Does Knowledge Distillation Distill?
Cindy Wu
Ekdeep Singh Lubana
Bruno Mlodozeniec
Robert Kirk
Knowledge distillation is a commonly-used compression method in ML due to the popularity of increasingly large-scale models, but it is uncle… (see more)ar if all the information a teacher model contains is distilled into the smaller student model. We aim to formalize the concept of ‘knowledge’ to investigate how knowledge is transferred during distillation, focusing on shared invariant outputs to counterfactual changes of dataset latent variables (we call these latents mechanisms). We define a student model to be a good stand-in model for a teacher if it shares the teacher’s learned mechanisms, and find that Jacobian matching and contrastive representation learning are viable methods by which to train such models. While these methods do not result in perfect transfer of mechanisms, we show they often improve student fidelity or mitigate simplicity bias (as measured by the teacher-to-student KL divergence and accuracy on various out-of-distribution test datasets), especially on datasets with spurious statistical correlations.
TEMPLATES: Characterization of a Merger in the Dusty Lensing SPT0418-47 System
Jared Cathey
Anthony H. Gonzalez
Sidney Lower
Kedar A. Phadke
Justin Spilker
Manuel Aravena
Matthew Bayliss
Jack E. Birkin
Simon Birrer
Scott Chapman
Håkon Dahle
Christopher C. Hayward
Ryley Hill
Taylor A. Hutchison
Keunho J. Kim
Guillaume Mahler
Daniel P. Marrone
Desika Narayanan
Alexander Navarre … (see 7 more)
Cassie Reuter
Jane R Rigby
Keren Sharon
Manuel Solimano
Nikolaus Sulzenauer
Joaquin Vieira
David Vizgan
ChainForge: A Visual Toolkit for Prompt Engineering and LLM Hypothesis Testing
Chelse Swoopes
Priyan Vaithilingam
Martin Wattenberg
Elena L. Glassman
Evaluating outputs of large language models (LLMs) is challenging, requiring making -- and making sense of -- many responses. Yet tools that… (see more) go beyond basic prompting tend to require knowledge of programming APIs, focus on narrow domains, or are closed-source. We present ChainForge, an open-source visual toolkit for prompt engineering and on-demand hypothesis testing of text generation LLMs. ChainForge provides a graphical interface for comparison of responses across models and prompt variations. Our system was designed to support three tasks: model selection, prompt template design, and hypothesis testing (e.g., auditing). We released ChainForge early in its development and iterated on its design with academics and online users. Through in-lab and interview studies, we find that a range of people could use ChainForge to investigate hypotheses that matter to them, including in real-world settings. We identify three modes of prompt engineering and LLM hypothesis testing: opportunistic exploration, limited evaluation, and iterative refinement.