Publications

Advancing science- and evidence-based AI policy.

Rishi Bommasani

Sanjeev Arora

Jennifer Chayes

Yejin Choi

Mariano-Florentino Cuéllar

Li Fei-Fei

Daniel E. Ho

Dan Jurafsky

Sanmi Koyejo

Hima Lakkaraju

Arvind Narayanan

Alondra Nelson

Emma Pierson

Joelle Pineau

Scott Singer

Gael Varoquaux

Suresh Venkatasubramanian

Ion Stoica

Percy Liang

Dawn Song

2025-07-30

Science (published)

doi.org

arxiv.org

Computing Approximate Nash Equilibria for Integer Programming Games

Aloïs Duguet

Margarida Carvalho

Gabriele Dragotto

Sandra Ulrich Ngueveu

2025-07-30

Optimization Letters (published)

doi.org

arxiv.org

Evaluating and Improving LitLLMs with Deep Research

Gaurav Sahu

Shubham Agarwal

Abhay Puri

Issam Hadj Laradji

Krishnamurthy Dj Dvijotham

Jason Stanley

Laurent Charlin

Christopher Pal

Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. This paper explores the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We decompose the task into two components: (1) Retrieving related works given a query abstract and (2) Writing a literature review based on the retrieved results. We analyze how effective LLMs are for both components. For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods while providing insights into the LLM's decision-making process. In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs to avoid test set contamination in zero-shot evaluations. We release this evaluation protocol to promote additional research and development in this regard. Our empirical results suggest that LLMs show promising potential for writing literature reviews when the task is decomposed into smaller components of retrieval and planning. Particularly, our "Deep Research" retrieval variant improves coverage by over 5x compared to standard keyword search, addressing a key bottleneck in the pipeline. Further, we demonstrate that our planning-based approach achieves higher-quality reviews by minimizing hallucinated references in the generated review by 18-26% compared to existing simpler LLM-based generation methods.

2025-07-30

colmweb.org/COLM/2025/Workshop/LM4Sci (published)

openreview.net

Towards a General GNN Framework for Combinatorial Optimization

Michael Perlmutter

2025-07-29

Proceedings of the Third Learning on Graphs Conference (published)

doi.org

proceedings.mlr.press

Capacity-Constrained Continual Learning

Zheng Wen

Doina Precup

Benjamin Van Roy

Satinder Singh

2025-07-28

ArXiv (preprint)

doi.org

arxiv.org

Latent brain subtypes of chronotype reveal unique behavioral and health profiles: an across-cohort validation

Julie Carrier

Kai-Florian Storch

Robin Dunbar

Danilo Bzdok

Chronotype is shaped by the complex interplay of endogenous and exogenous factors. This trait ties into various behaviors in the wider society and is linked to the prevalence of psychiatric and metabolic conditions. Despite its multifaceted nature, prior research has treated chronotype as a monolithic trait across the population, risking overlooking substantial heterogeneity in neural and behavioral fingerprints of both early risers and night owls. To test for such hidden subgroups, we developed a supervised pattern-learning framework for trait subtyping, integrating three complementary brain-imaging modalities with deep behavior, diagnosis, and drug prescription profiling from 27,030 UK Biobank participants. We identified and characterized five distinct biologically valid chronotype subtypes: (1) typical eveningness, (2) depression-associated eveningness, (3) typical morningness, (4) morningness with greater expression in females, and (5) eveningness with greater expression in males. Each uncovered subtype showed unique patterns across brain, behavioral and health profiles. We finally externally validated these subtypes in 10,550 US children from the ABCD Study® cohort, which revealed reversed age distributions and replicated sex-associated brain-behavioral patterns, underscoring the fact that potential divergences between chronotype traits observed throughout adulthood may begin to emerge early in life. These findings highlight underappreciated sources of population variation that echo the rhythm of people’s inner clock.

2025-07-28

Research Square (preprint)

doi.org

QComp: A QSAR-Based Imputation Framework for Drug Discovery.

Bingjia Yang

Yunsie Chung

Archer Y. Yang

Bo Yuan

Tianchi Chen

Xiang Yu

In drug discovery, in vitro and in vivo experiments generate biochemical activity data that are crucial for evaluating the efficacy and toxicity of compounds. These data sets are massive, sparse, and ever-evolving. Quantitative structure-activity relationship (QSAR) models, which predict biochemical activities from compound structures, face challenges in integrating the evolving experimental data agilely as studies progress. We developed QSAR-Complete (QComp), an imputation framework, to address these challenges. While QSAR models are updated at a slow pace through extensive retraining on enlarging data sets, QComp leverages existing QSAR models to immediately exploit new experimental data and improves the imputation of missing data. We demonstrate that the improvement is robust and substantial for imputing in vivo assays with only in vitro experimental data. Additionally, QComp assists in finding the optimal sequence of experiments by quantifying the reduction in statistical uncertainty for specific end points, aiding in rational decision-making throughout the drug discovery process.

2025-07-27

Journal of Chemical Information and Modeling (published)

doi.org

Semantic change in adults is not primarily a generational phenomenon

Gaurav Kamath

Michelle Yang

Siva Reddy

Morgan Sonderegger

Dallas Card

A central question in the study of language change is whether or not such change is generational. If a language changes over time generation-by-generation, the process looks as follows: New generations of speakers introduce innovations, while older speakers conserve their usage patterns, and the language changes as new generations replace older ones. At the opposite extreme, language change could be a zeitgeist phenomenon, in which changes are universally adopted by speakers simultaneously, regardless of age or generational cohort. This paper asks this question in the context of word meaning change. We analyze meaning change in over 100 words across more than 7.9 million U.S. congressional speeches, to observe whether, when a word sense rises or falls in prominence, adult speakers from different generations uniformly adopt it, or those from older generations conserve their prior usage. Using language model-based word sense induction methods, we identify different senses of each word, and then model the prevalence of each of these word senses as a function of time and speaker age. We find that most words show a small but statistically significant effect of speaker age; across almost 140 y of Congress, older speakers typically take longer than younger speakers to follow changes in word usage, but nevertheless do so within a few years. Our findings indicate that despite minor age-based differences, word meaning change among mature speakers is likely not a generational process, but rather a zeitgeist process, in which older adult speakers can readily adopt new word usage patterns.

2025-07-27

Proceedings of the National Academy of Sciences (published)

doi.org

A systematic review of risk stratification for pediatric appendicitis

Mahshid Mortazavi

Alexandra Dimmer

Elena Guadagno

Dan Poenaru

Sherif Emil

2025-07-26

Pediatric Surgery International (Print) (published)

doi.org

Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations

Jason Hartford

Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous. Biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit: the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species, or the differentiation process of cell types. Most work on generic invariant predictors has not assumed the existence of structured relationships between environments. However, this prior knowledge about environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.

2025-07-25

Transactions on Machine Learning Research (accepted)

doi.org

openreview.net

Uncovering Hidden Factions through Text-Network Representations: Unsupervised Public Opinion Mapping of Iran on Twitter in the 2022 Unrest

Sahar Omidi Shayegan

Jean-François Godbout

Reihaneh Rabbany

Ideological mapping on social media is typically framed as a supervised classification task that depends on stable party systems and abundant annotated data. These assumptions fail in contexts with weak political institutionalization, such as Iran. We recast ideology detection as a fully unsupervised mapping problem and introduce a text-network representation system, uncovering latent ideological factions on Persian Twitter during the 2022 Mahsa Amini protests. Using hundreds of millions of Persian tweets, we learn joint text–network embeddings by fine-tuning ParsBERT with a combined masked-language-modeling and contrastive objective and by passing the embeddings through a Graph Attention Network trained for link prediction on time-batched subgraphs. The pipeline integrates semantic and structural signals without observing labels. Density-based clustering reveals eight ideological blocs whose spatial relations mirror known political alliances. Alignment with 883 expert-labeled accounts yields 53% accuracy. This label-free framework scales to label-scarce contexts, offering new leverage for studying political debates online.

2025-07-25

colmweb.org/COLM/2025/Workshop/NLPOR (published)

openreview.net

What Can Grokking Teach Us About Learning Under Nonstationarity?

Clare Lyle

Gharda Sokar

Razvan Pascanu

András György

In continual learning problems, it is often necessary to overwrite components of a neural network's learned representation in response to changes in the data stream; however, neural networks often exhibit primacy bias, whereby early training data hinders the network's ability to generalize on later tasks. While feature-learning dynamics of nonstationary learning problems are not well studied, the emergence of feature-learning dynamics is known to drive the phenomenon of grokking, wherein neural networks initially memorize their training data and only later exhibit perfect generalization. This work conjectures that the same feature-learning dynamics which facilitate generalization in grokking also underlie the ability to overwrite previous learned features as well, and methods which accelerate grokking by facilitating feature-learning dynamics are promising candidates for addressing primacy bias in non-stationary learning problems. We then propose a straightforward method to induce feature-learning dynamics as needed throughout training by increasing the effective learning rate, i.e. the ratio between parameter and update norms. We show that this approach both facilitates feature-learning and improves generalization in a variety of settings, including grokking, warm-starting neural network training, and reinforcement learning tasks.

2025-07-25

ArXiv (preprint)

doi.org

arxiv.org
