Publications

Advancing science- and evidence-based AI policy.
Rishi Bommasani
Sanjeev Arora
Jennifer Chayes
Yejin Choi
Mariano-Florentino Cuéllar
Li Fei-Fei
Daniel E. Ho
Dan Jurafsky
Sanmi Koyejo
Hima Lakkaraju
Arvind Narayanan
Alondra Nelson
Emma Pierson
Scott Singer
Suresh Venkatasubramanian
Ion Stoica
Percy Liang
Dawn Song
Computing Approximate Nash Equilibria for Integer Programming Games
Aloïs Duguet
Gabriele Dragotto
Sandra-ulrich Ngueveu
Evaluating and Improving LitLLMs with Deep Research
Issam Hadj Laradji
Krishnamurthy Dj Dvijotham
Jason Stanley
Christopher Pal
Literature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. This paper explores the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We decompose the task into two components: (1) retrieving related works given a query abstract and (2) writing a literature review based on the retrieved results. We analyze how effective LLMs are for both components. For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods while providing insights into the LLM's decision-making process. In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review. To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs to avoid test set contamination in zero-shot evaluations. We release this evaluation protocol to promote additional research and development in this regard. Our empirical results suggest that LLMs show promising potential for writing literature reviews when the task is decomposed into smaller components of retrieval and planning. In particular, our "Deep Research" retrieval variant improves coverage by more than 5x compared to standard keyword search, addressing a key bottleneck in the pipeline.
Further, we demonstrate that our planning-based approach produces higher-quality reviews, reducing hallucinated references in the generated review by 18-26% compared to existing simpler LLM-based generation methods.
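The retrieval half of this pipeline can be sketched as below. This is a minimal illustration, not the authors' code: `extract_keywords`, `search`, and `rerank` are hypothetical callables standing in for the LLM keyword extractor, the external knowledge base, and the prompting-based re-ranker, and `recall` is plain recall, which may differ from the paper's normalized recall.

```python
from typing import Callable, Iterable


def retrieve_related(
    abstract: str,
    extract_keywords: Callable[[str], list[str]],
    search: Callable[[str], list[str]],
    rerank: Callable[[str, list[str]], list[str]],
    top_k: int = 10,
) -> list[str]:
    """Keyword extraction -> external search -> re-ranking (all steps pluggable)."""
    candidates: list[str] = []
    seen: set[str] = set()
    # Step 1: ask a model for search keywords (hypothetical callable).
    for keyword in extract_keywords(abstract):
        # Step 2: query an external knowledge base for each keyword.
        for paper_id in search(keyword):
            if paper_id not in seen:  # de-duplicate hits across queries
                seen.add(paper_id)
                candidates.append(paper_id)
    # Step 3: re-rank the pooled candidates against the abstract and truncate.
    return rerank(abstract, candidates)[:top_k]


def recall(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Fraction of the reference papers that appear in the retrieved list."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(retrieved)) / len(relevant_set)
```

Passing the components as callables keeps the sketch independent of any particular LLM or search API; a real system would wrap model and search-engine calls behind these signatures.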
Towards a General GNN Framework for Combinatorial Optimization
Latent brain subtypes of chronotype reveal unique behavioral and health profiles: an across-cohort validation
Julie Carrier
Kai-Florian Storch
Robin Dunbar
Chronotype is shaped by the complex interplay of endogenous and exogenous factors. This trait ties into various behaviors in the wider society and is linked to the prevalence of psychiatric and metabolic conditions. Despite its multifaceted nature, prior research has treated chronotype as a monolithic trait across the population, risking overlooking substantial heterogeneity in neural and behavioral fingerprints of both early risers and night owls. To test for such hidden subgroups, we developed a supervised pattern-learning framework for trait subtyping, integrating three complementary brain-imaging modalities with deep behavior, diagnosis, and drug prescription profiling from 27,030 UK Biobank participants. We identified and characterized five distinct biologically valid chronotype subtypes: (1) typical eveningness, (2) depression-associated eveningness, (3) typical morningness, (4) morningness with greater expression in females, and (5) eveningness with greater expression in males. Each uncovered subtype showed unique patterns across brain, behavioral and health profiles. We finally externally validated these subtypes in 10,550 US children from the ABCD Study® cohort, which revealed reversed age distributions and replicated sex-associated brain-behavioral patterns, underscoring the fact that potential divergences between chronotype traits observed throughout adulthood may begin to emerge early in life. These findings highlight underappreciated sources of population variation that echo the rhythm of people’s inner clock.
QComp: A QSAR-Based Imputation Framework for Drug Discovery
Bingjia Yang
Yunsie Chung
Archer Y. Yang
Bo Yuan
Tianchi Chen
Xiang Yu
In drug discovery, in vitro and in vivo experiments generate biochemical activity data that are crucial for evaluating the efficacy and toxicity of compounds. These data sets are massive, sparse, and ever-evolving. Quantitative structure-activity relationship (QSAR) models, which predict biochemical activities from compound structures, face challenges in integrating the evolving experimental data agilely as studies progress. We developed QSAR-Complete (QComp), an imputation framework, to address these challenges. While QSAR models are updated at a slow pace through extensive retraining on enlarging data sets, QComp leverages existing QSAR models to immediately exploit new experimental data and improves the imputation of missing data. We demonstrate that the improvement is robust and substantial for imputing in vivo assays with only in vitro experimental data. Additionally, QComp assists in finding the optimal sequence of experiments by quantifying the reduction in statistical uncertainty for specific end points, aiding in rational decision-making throughout the drug discovery process.
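The abstract does not specify QComp's statistical model. One standard way to let new measurements update model-based predictions, shown here purely as an assumed illustration, is to treat assay values as jointly Gaussian, use QSAR predictions as the prior mean `mu`, and condition on the observed assays. The covariance `cov` across assays is assumed to be known, and the function name `conditional_impute` is hypothetical; the drop in posterior variance is one way to quantify how much an experiment reduces uncertainty about the remaining end points.

```python
import numpy as np


def conditional_impute(mu, cov, observed_idx, observed_values):
    """Gaussian conditioning: update missing assays given observed ones.

    mu: prior means (assumed to come from QSAR predictions), shape (d,)
    cov: assumed assay covariance, shape (d, d)
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    o = np.asarray(observed_idx, dtype=int)
    m = np.setdiff1d(np.arange(mu.size), o)  # indices of missing assays
    s_oo = cov[np.ix_(o, o)]
    s_mo = cov[np.ix_(m, o)]
    s_mm = cov[np.ix_(m, m)]
    # gain = S_mo S_oo^{-1}, computed with a linear solve instead of an explicit inverse
    gain = np.linalg.solve(s_oo, s_mo.T).T
    resid = np.asarray(observed_values, dtype=float) - mu[o]
    post_mean = mu[m] + gain @ resid
    post_cov = s_mm - gain @ s_mo.T
    # Per-assay variance removed by the observation (prior minus posterior).
    var_reduction = np.diag(s_mm) - np.diag(post_cov)
    return m, post_mean, post_cov, var_reduction
```

For two assays with prior means 0, unit variances, and correlation 0.8, observing the first at 1.0 moves the second's mean to 0.8 and cuts its variance from 1.0 to 0.36.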
Semantic change in adults is not primarily a generational phenomenon
Morgan Sonderegger
Dallas Card
A central question in the study of language change is whether or not such change is generational. If a language changes over time generation-by-generation, the process looks as follows: New generations of speakers introduce innovations, while older speakers conserve their usage patterns, and the language changes as new generations replace older ones. At the opposite extreme, language change could be a zeitgeist phenomenon, in which changes are universally adopted by speakers simultaneously, regardless of age or generational cohort. This paper asks this question in the context of word meaning change. We analyze meaning change in over 100 words across more than 7.9 million U.S. congressional speeches, to observe whether, when a word sense rises or falls in prominence, adult speakers from different generations uniformly adopt it, or those from older generations conserve their prior usage. Using language model-based word sense induction methods, we identify different senses of each word, and then model the prevalence of each of these word senses as a function of time and speaker age. We find that most words show a small but statistically significant effect of speaker age; across almost 140 years of Congress, older speakers typically take longer than younger speakers to follow changes in word usage, but nevertheless do so within a few years. Our findings indicate that despite minor age-based differences, word meaning change among mature speakers is likely not a generational process, but rather a zeitgeist process, in which older adult speakers can readily adopt new word usage patterns.
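Once each occurrence of a word carries an induced sense label, the prevalence of a sense can be related to time and speaker age with a regression model. The abstract does not give the exact specification, so the sketch below assumes a plain logistic regression, P(sense) = sigmoid(b0 + b1·time + b2·age), with standardized predictors, fitted by Newton-Raphson (IRLS); `fit_logistic` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def fit_logistic(features, used_sense, n_iter=25):
    """Newton-Raphson (IRLS) fit of P(sense) = sigmoid(b0 + b1*time + b2*age).

    features: array of shape (n, 2) with standardized [time, age] columns (assumed)
    used_sense: 0/1 array, 1 if the occurrence was assigned the sense of interest
    """
    X = np.column_stack([np.ones(len(features)), np.asarray(features, dtype=float)])
    y = np.asarray(used_sense, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # current predicted probabilities
        w = p * (1.0 - p)                       # IRLS weights
        hessian = X.T @ (X * w[:, None])
        gradient = X.T @ (y - p)
        beta = beta + np.linalg.solve(hessian, gradient)  # Newton step
    return beta
```

For a sense that is rising (b1 > 0), a negative b2 means older speakers use it less at any given date, i.e. they lag behind younger speakers; whether that lag is small is then a question of the coefficient's magnitude.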
A systematic review of risk stratification for pediatric appendicitis
Mahshid Mortazavi
Alexandra Dimmer
Elena Guadagno
Sherif Emil
Leveraging Structure Between Environments: Phylogenetic Regularization Incentivizes Disentangled Representations
Recently, learning invariant predictors across varying environments has been shown to improve the generalization of supervised learning methods. This line of investigation holds great potential for application to biological problem settings, where data is often naturally heterogeneous. Biological samples often originate from different distributions, or environments. However, in biological contexts, the standard "invariant prediction" setting may not completely fit: the optimal predictor may in fact vary across biological environments. There also exists strong domain knowledge about the relationships between environments, such as the evolutionary history of a set of species, or the differentiation process of cell types. Most work on generic invariant predictors has not assumed the existence of structured relationships between environments. However, this prior knowledge about environments themselves has already been shown to improve prediction through a particular form of regularization applied when learning a set of predictors. In this work, we empirically evaluate whether a regularization strategy that exploits environment-based prior information can be used to learn representations that better disentangle causal factors that generate observed data. We find evidence that these methods do in fact improve the disentanglement of latent embeddings. We also show a setting where these methods can leverage phylogenetic information to estimate the number of latent causal features.
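The abstract does not state the regularizer's exact form. A common way to encode "related environments should have similar predictors", assumed here for illustration, is to give every node of the phylogeny (leaves and ancestors) its own weight vector and penalize squared differences along tree edges, scaled by inverse branch length so that short branches enforce stronger similarity. `phylo_penalty` and the `(parent, child, branch_length)` edge format are hypothetical.

```python
import numpy as np


def phylo_penalty(weights, edges, lam=1.0):
    """Tree-smoothness penalty: lam * sum over edges of ||w_parent - w_child||^2 / branch_length."""
    W = np.asarray(weights, dtype=float)  # one row of predictor weights per tree node
    total = 0.0
    for parent, child, branch_length in edges:
        diff = W[parent] - W[child]
        total += float(diff @ diff) / branch_length
    return lam * total
```

The penalty would be added to the summed per-environment losses; as `lam` grows, predictors for closely related environments are pulled together, while distant environments remain free to differ.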
Uncovering Hidden Factions through Text-Network Representations: Unsupervised Public Opinion Mapping of Iran on Twitter in the 2022 Unrest
Ideological mapping on social media is typically framed as a supervised classification task that depends on stable party systems and abundant annotated data. These assumptions fail in contexts with weak political institutionalization, such as Iran. We recast ideology detection as a fully unsupervised mapping problem and introduce a text-network representation system, uncovering latent ideological factions on Persian Twitter during the 2022 Mahsa Amini protests. Using hundreds of millions of Persian tweets, we learn joint text–network embeddings by fine-tuning ParsBERT with a combined masked-language-modeling and contrastive objective and by passing the embeddings through a Graph Attention Network trained for link prediction on time-batched subgraphs. The pipeline integrates semantic and structural signals without observing labels. Density-based clustering reveals eight ideological blocs whose spatial relations mirror known political alliances. Alignment with 883 expert-labeled accounts yields 53% accuracy. This label-free framework scales to label-scarce contexts, offering new leverage for studying political debates online.
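The abstract names density-based clustering without specifying the algorithm. As an assumed stand-in, the sketch below is textbook DBSCAN over low-dimensional embedding coordinates: points with at least `min_pts` neighbors within `eps` (counting themselves) seed clusters, clusters grow through such core points, and points reachable from no core point are labeled noise (`-1`).

```python
import math


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN; returns one label per point, -1 for noise."""
    labels = [None] * len(points)  # None = not yet visited

    def neighbors(i):
        # Brute-force neighborhood query (includes the point itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join cluster but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels
```

Unlike k-means, DBSCAN does not take the number of clusters as input, which suits an unsupervised setting where the number of factions is unknown in advance; the quadratic neighbor search here is only practical for small inputs.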
Comparative genomics of Pseudomonas paraeruginosa
Maxime Déraspe
Lori L. Burrows
Romé Voulhoux
Daniela Centrón
J. Corbeil
Paul H Roy
ABSTRACT The PA7-clade (or group 3) of Pseudomonas aeruginosa is now recognized as a distinct species, Pseudomonas paraeruginosa. We report here the genomic sequences of six new strains of P. paraeruginosa: Zw26 (the first complete genome of a cystic fibrosis isolate of P. paraeruginosa), draft genomes of four burn and wound strains from Argentina very closely related to PA7, and of Pa5196, the strain in which arabinosylation of type IV pili was documented. We compared the genomes of 82 strains of P. paraeruginosa and confirmed that the species is divided into two sub-clades. Core genomes are very similar, while most differences are found in “regions of genomic plasticity” (RGPs). Several genomic deletions were identified, and most are common to the CR1 sub-clade that includes Zw26 and Pa5196. All strains lack the type 3 secretion system (T3SS) and instead use an alternative virulence strategy involving an exolysin, a characteristic shared with group 5 P. aeruginosa. All strains tend to be multiresistant like PA7, with a significant proportion of carbapenem-resistant strains, either oprD mutants or carrying carbapenemase genes. Although P. paraeruginosa is still relatively rare, it has a worldwide distribution. Its multiresistance and its alternative virulence strategy need to be considered in future therapeutic development.
IMPORTANCE Pseudomonas aeruginosa is an important opportunistic pathogen causing respiratory infections, notably in cystic fibrosis, and burn and wound infections. Our study reports six new genomes of Pseudomonas paraeruginosa, a new species recently reported as distinct from P. aeruginosa. The number of sequenced genomes of P. paraeruginosa is only about 1% that of P. aeruginosa. We compare the genomic content of nearly all strains of P. paraeruginosa in GenBank, highlighting the differences in core and accessory genomes, antimicrobial resistance genes, and virulence factors. This novel species is very similar in environmental spectrum to P. aeruginosa but is notably resistant to last-line antibiotics and uses an alternative virulence strategy based on exolysin, a strategy shared with some P. aeruginosa outliers.
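The core/accessory split that such a comparison relies on can be sketched from per-strain gene presence sets. This assumes genes have already been grouped into orthologous families with shared identifiers; `partition_pangenome` and the gene names in the test are hypothetical, and `core_fraction=1.0` gives the strict definition (present in every genome), which pangenome tools often relax to a prevalence threshold.

```python
from collections import Counter


def partition_pangenome(strain_genes, core_fraction=1.0):
    """Split gene families into core (present in >= core_fraction of strains) and accessory."""
    n_strains = len(strain_genes)
    # Count each gene family at most once per strain.
    counts = Counter(g for genes in strain_genes.values() for g in set(genes))
    core = {g for g, c in counts.items() if c >= core_fraction * n_strains}
    accessory = set(counts) - core
    return core, accessory
```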