Publications

Gintare Karolina Dziugaite

Evaluating Interventional Reasoning Capabilities of Large Language Models

Tejas Kasetty

Divyat Mahajan

Alexandre Drouin

Dhanya Sridhar

Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consid… (voir plus)er using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, in this paper, we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from their ability to memorize facts or find other shortcuts. We evaluate six LLMs on the benchmarks, finding that GPT models show promising accuracy at predicting the intervention effects.

2024-04-08

ArXiv (prépublication)

Gintare Karolina Dziugaite

Evaluating Interventional Reasoning Capabilities of Large Language Models

Tejas Kasetty

Divyat Mahajan

Alexandre Drouin

Dhanya Sridhar

Numerous decision-making tasks require estimating causal effects under interventions on different parts of a system. As practitioners consid… (voir plus)er using large language models (LLMs) to automate decisions, studying their causal reasoning capabilities becomes crucial. A recent line of work evaluates LLMs ability to retrieve commonsense causal facts, but these evaluations do not sufficiently assess how LLMs reason about interventions. Motivated by the role that interventions play in causal inference, in this paper, we conduct empirical analyses to evaluate whether LLMs can accurately update their knowledge of a data-generating process in response to an intervention. We create benchmarks that span diverse causal graphs (e.g., confounding, mediation) and variable types, and enable a study of intervention-based reasoning. These benchmarks allow us to isolate the ability of LLMs to accurately predict changes resulting from their ability to memorize facts or find other shortcuts. We evaluate six LLMs on the benchmarks, finding that GPT models show promising accuracy at predicting the intervention effects.

2024-04-08

ArXiv (prépublication)

Learning Heuristics for Transit Network Design and Improvement with Deep Reinforcement Learning

Andrew Holliday

A. El-geneidy

Gregory Dudek

2024-04-08

ArXiv (prépublication)

Learning Minimal NAP Specifications for Neural Network Verification

Chuqin Geng

Zhaoyue Wang

Haolin Ye

Saifei Liao

Xujie Si

Specifications play a crucial role in neural network verification. They define the precise input regions we aim to verify, typically represe… (voir plus)nted as L-infinity norm balls. While recent research suggests using neural activation patterns (NAPs) as specifications for verifying unseen test set data, it focuses on computing the most refined NAPs, often limited to very small regions in the input space. In this paper, we study the following problem: Given a neural network, find a minimal (coarsest) NAP that is sufficient for formal verification of the network's robustness. Finding the minimal NAP specification not only expands verifiable bounds but also provides insights into which neurons contribute to the model's robustness. To address this problem, we propose several exact and approximate approaches. Our exact approaches leverage the verification tool to find minimal NAP specifications in either a deterministic or statistical manner. Whereas the approximate methods efficiently estimate minimal NAPs using adversarial examples and local gradients, without making calls to the verification tool. This allows us to inspect potential causal links between neurons and the robustness of state-of-the-art neural networks, a task for which existing verification frameworks fail to scale. Our experimental results suggest that minimal NAP specifications require much smaller fractions of neurons compared to the most refined NAP specifications, yet they can significantly expand the verifiable boundaries to several orders of magnitude larger.

2024-04-06

ArXiv (prépublication)

SAT-DIFF: A Tree Diffing Framework Using SAT Solving

Chuqin Geng

Haolin Ye

Yihan Zhang

Brigitte Pientka

Xujie Si

Computing differences between tree-structured data is a critical but challenging problem in software analysis. In this paper, we propose a n… (voir plus)ovel tree diffing approach called SatDiff, which reformulates the structural diffing problem into a MaxSAT problem. By encoding the necessary transformations from the source tree to the target tree, SatDiff generates correct, minimal, and type safe low-level edit scripts with formal guarantees. We then synthesize concise high-level edit scripts by effectively merging low-level edits in the appropriate topological order. Our empirical results demonstrate that SatDiff outperforms existing heuristic-based approaches by a significant margin in terms of conciseness while maintaining a reasonable runtime.

2024-04-06

ArXiv (prépublication)

Alexia Jolicoeur-Martineau

PopulAtion Parameter Averaging (PAPA)

Emy Gervais

Kilian Fatras

Yang Zhang

Simon Lacoste-Julien

2024-04-05

TMLR (accepté)

openreview.net

Regulating advanced artificial agents

Michael K. Cohen

Noam Kolt

Gillian K. Hadfield

Stuart Russell

Governance frameworks should address the prospect of AI systems that cannot be safely tested Technical experts and policy-makers have increa… (voir plus)singly emphasized the need to address extinction risk from artificial intelligence (AI) systems that might circumvent safeguards and thwart attempts to control them (1). Reinforcement learning (RL) agents that plan over a long time horizon far more effectively than humans present particular risks. Giving an advanced AI system the objective to maximize its reward and, at some point, withholding reward from it, strongly incentivizes the AI system to take humans out of the loop, if it has the opportunity. The incentive to deceive humans and thwart human control arises not only for RL agents but for long-term planning agents (LTPAs) more generally. Because empirical testing of sufficiently capable LTPAs is unlikely to uncover these dangerous tendencies, our core regulatory proposal is simple: Developers should not be permitted to build sufficiently capable LTPAs, and the resources required to build them should be subject to stringent controls.

2024-04-05

Science (publié)

Regulating advanced artificial agents

Michael K. Cohen

Noam Kolt

Gillian K. Hadfield

Stuart Russell

2024-04-05

Science (publié)

Regulating advanced artificial agents

Michael K. Cohen

Noam Kolt

Gillian K. Hadfield

Stuart Russell

Governance frameworks should address the prospect of AI systems that cannot be safely tested Technical experts and policy-makers have increa… (voir plus)singly emphasized the need to address extinction risk from artificial intelligence (AI) systems that might circumvent safeguards and thwart attempts to control them (1). Reinforcement learning (RL) agents that plan over a long time horizon far more effectively than humans present particular risks. Giving an advanced AI system the objective to maximize its reward and, at some point, withholding reward from it, strongly incentivizes the AI system to take humans out of the loop, if it has the opportunity. The incentive to deceive humans and thwart human control arises not only for RL agents but for long-term planning agents (LTPAs) more generally. Because empirical testing of sufficiently capable LTPAs is unlikely to uncover these dangerous tendencies, our core regulatory proposal is simple: Developers should not be permitted to build sufficiently capable LTPAs, and the resources required to build them should be subject to stringent controls.

2024-04-05

Science (publié)

Regulating advanced artificial agents

Michael K. Cohen

Noam Kolt

Gillian K. Hadfield

Stuart Russell

Governance frameworks should address the prospect of AI systems that cannot be safely tested Technical experts and policy-makers have increa… (voir plus)singly emphasized the need to address extinction risk from artificial intelligence (AI) systems that might circumvent safeguards and thwart attempts to control them (1). Reinforcement learning (RL) agents that plan over a long time horizon far more effectively than humans present particular risks. Giving an advanced AI system the objective to maximize its reward and, at some point, withholding reward from it, strongly incentivizes the AI system to take humans out of the loop, if it has the opportunity. The incentive to deceive humans and thwart human control arises not only for RL agents but for long-term planning agents (LTPAs) more generally. Because empirical testing of sufficiently capable LTPAs is unlikely to uncover these dangerous tendencies, our core regulatory proposal is simple: Developers should not be permitted to build sufficiently capable LTPAs, and the resources required to build them should be subject to stringent controls.

2024-04-05

Science (publié)

Scope Ambiguities in Large Language Models

Gaurav Kamath

Sebastian Schuster

Sowmya Vajjala

Siva Reddy

Abstract Sentences containing multiple semantic operators with overlapping scope often create ambiguities in interpretation, known as scope … (voir plus)ambiguities. These ambiguities offer rich insights into the interaction between semantic structure and world knowledge in language processing. Despite this, there has been little research into how modern large language models treat them. In this paper, we investigate how different versions of certain autoregressive language models—GPT-2, GPT-3/3.5, Llama 2, and GPT-4—treat scope ambiguous sentences, and compare this with human judgments. We introduce novel datasets that contain a joint total of almost 1,000 unique scope-ambiguous sentences, containing interactions between a range of semantic operators, and annotated for human judgments. Using these datasets, we find evidence that several models (i) are sensitive to the meaning ambiguity in these sentences, in a way that patterns well with human judgments, and (ii) can successfully identify human-preferred readings at a high level of accuracy (over 90% in some cases).1

2024-04-05

ArXiv (prépublication)