Portrait of Alexandre Drouin

Alexandre Drouin

Associate Industry Member
Adjunct professor, Université Laval, Department of Electrical Engineering and Computer Engineering
Research Scientist, ServiceNow

Biography

Alexandre Drouin is a research scientist at ServiceNow Research in Montréal, and an adjunct professor of computer science at Université Laval. He also leads ServiceNow’s Human Decision Support research program, which explores the use of machine learning for decision-making in complex dynamic environments.

Droiun’s main research interest is causal decision-making under uncertainty, where the goal is to answer questions of causal nature (interventions, counterfactual), while accounting for sources of uncertainty, such as ambiguity in causal structures and unmeasured variables. He is also interested in probabilistic time series forecasting and its use in foreseeing the long-term effect of actions. His PhD in computer science from Université Laval was on machine learning algorithms for biomarker discovery in large genomic datasets and their application to the global problem of antibiotic resistance.

Current Students

Research Intern - Université de Montréal
Principal supervisor :
PhD - Université de Montréal
Principal supervisor :
PhD - Université Laval
Principal supervisor :

Publications

Evaluating Interventional Reasoning Capabilities of Large Language Models
Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consid… (see more)er using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, in this paper, we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from their ability to memorize facts or find other shortcuts. Our analysis on four LLMs highlights that while GPT- 4 models show promising accuracy at predicting the intervention effects, they remain sensitive to distracting factors in the prompts.
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Massimo Caccia
Issam Hadj Laradji
Manuel Del Verme
Tom Marty
Léo Boisvert
Megh Thakkar
David Vazquez
Alexandre Lacoste
We study the use of large language model-based agents for interacting with software via web browsers. Unlike prior work, we focus on measuri… (see more)ng the agents' ability to perform tasks that span the typical daily work of knowledge workers utilizing enterprise software systems. To this end, we propose WorkArena, a remote-hosted benchmark of 29 tasks based on the widely-used ServiceNow platform. We also introduce BrowserGym, an environment for the design and evaluation of such agents, offering a rich set of actions as well as multimodal observations. Our empirical evaluation reveals that while current agents show promise on WorkArena, there remains a considerable gap towards achieving full task automation. Notably, our analysis uncovers a significant performance disparity between open and closed-source LLMs, highlighting a critical area for future exploration and development in the field.
TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series
Arjun Ashok
Étienne Marcotte
Valentina Zantedeschi
We introduce a new model for multivariate probabilistic time series prediction, designed to flexibly address a range of tasks including fore… (see more)casting, interpolation, and their combinations. Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS), wherein the number of distributional parameters now scales linearly with the number of variables instead of factorially. The new objective requires the introduction of a training curriculum, which goes hand-in-hand with necessary changes to the original architecture. We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks, while maintaining the flexibility of prior work, such as seamless handling of unaligned and unevenly-sampled time series. Code is made available at https://github.com/ServiceNow/TACTiS.
Capture the Flag: Uncovering Data Insights with Large Language Models
Issam Hadj Laradji
Perouz Taslakian
Sai Rajeswar
Valentina Zantedeschi
Alexandre Lacoste
David Vazquez
The Unsolved Challenges of LLMs as Generalist Web Agents: A Case Study
Rim Assouel
Tom Marty
Massimo Caccia
Issam Hadj Laradji
Sai Rajeswar
Hector Palacios
David Vazquez
Alexandre Lacoste
Lag-Llama: Towards Foundation Models for Time Series Forecasting
Kashif Rasul
Arjun Ashok
Andrew Robert Williams
Arian Khorasani
George Adamopoulos
Rishika Bhagwatkar
Marin Biloš
Hena Ghonia
Nadhir Hassen
Anderson Schneider
Sahil Garg
Yuriy Nevmyvaka
Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-… (see more)Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
Kashif Rasul
Arjun Ashok
Andrew Robert Williams
Arian Khorasani
George Adamopoulos
Rishika Bhagwatkar
Marin Bilovs
Hena Ghonia
Nadhir Hassen
Anderson Schneider
Sahil Garg
Yuriy Nevmyvaka
GEO-Bench: Toward Foundation Models for Earth Monitoring
Alexandre Lacoste
Nils Lehmann
Pau Rodriguez
Evan David Sherwin
Hannah Kerner
Björn Lütjens
Jeremy Andrew Irvin
David Dao
Hamed Alemohammad
Mehmet Gunturkun
Gabriel Huang
David Vazquez
Dava Newman
Stefano Ermon
Xiao Xiang Zhu
Recent progress in self-supervision has shown that pre-training large neural networks on vast amounts of unsupervised data can lead to subst… (see more)antial increases in generalization to downstream tasks. Such models, recently coined foundation models, have been transformational to the field of natural language processing. Variants have also been proposed for image data, but their applicability to remote sensing tasks is limited. To stimulate the development of foundation models for Earth monitoring, we propose a benchmark comprised of six classification and six segmentation tasks, which were carefully curated and adapted to be both relevant to the field and well-suited for model evaluation. We accompany this benchmark with a robust methodology for evaluating models and reporting aggregated results to enable a reliable assessment of progress. Finally, we report results for 20 baselines to gain information about the performance of existing models. We believe that this benchmark will be a driver of progress across a variety of Earth monitoring tasks.
Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation
Chris Emezue
Tristan Deleu
Stefan Bauer
Causal Discovery with Language Models as Imperfect Experts
Stephanie Long
Alexandre Piché
Valentina Zantedeschi
Tibor Schuster
Understanding the causal relationships that underlie a system is a fundamental prerequisite to accurate decision-making. In this work, we ex… (see more)plore how expert knowledge can be used to improve the data-driven identification of causal graphs, beyond Markov equivalence classes. In doing so, we consider a setting where we can query an expert about the orientation of causal relationships between variables, but where the expert may provide erroneous information. We propose strategies for amending such expert knowledge based on consistency properties, e.g., acyclicity and conditional independencies in the equivalence class. We then report a case study, on real data, where a large language model is used as an imperfect expert.
Invariant Causal Set Covering Machines
Thibaud Godon
Baptiste Bauvin
Jacques Corbeil
Regions of Reliability in the Evaluation of Multivariate Probabilistic Forecasts
Étienne Marcotte
Valentina Zantedeschi
Multivariate probabilistic time series forecasts are commonly evaluated via proper scoring rules, i.e., functions that are minimal in expect… (see more)ation for the ground-truth distribution. However, this property is not sufficient to guarantee good discrimination in the non-asymptotic regime. In this paper, we provide the first systematic finite-sample study of proper scoring rules for time-series forecasting evaluation. Through a power analysis, we identify the"region of reliability"of a scoring rule, i.e., the set of practical conditions where it can be relied on to identify forecasting errors. We carry out our analysis on a comprehensive synthetic benchmark, specifically designed to test several key discrepancies between ground-truth and forecast distributions, and we gauge the generalizability of our findings to real-world tasks with an application to an electricity production problem. Our results reveal critical shortcomings in the evaluation of multivariate probabilistic forecasts as commonly performed in the literature.