Mandana Samiei

Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?

Anthony GX-Chen

Rob Fergus

Kenneth Marino

Language model (LM) agents are increasingly used as autonomous decision-makers who need to actively gather information to guide their decisi… (see more)ons. A crucial cognitive skill for such agents is the efficient exploration and understanding of the causal structure of the world -- key to robust, scientifically grounded reasoning. Yet, it remains unclear whether LMs possess this capability or exhibit systematic biases leading to erroneous conclusions. In this work, we examine LMs' ability to explore and infer causal relationships, using the well-established"Blicket Test"paradigm from developmental psychology. We find that LMs reliably infer the common, intuitive disjunctive causal relationships but systematically struggle with the unusual, yet equally (or sometimes even more) evidenced conjunctive ones. This"disjunctive bias"persists across model families, sizes, and prompting strategies, and performance further declines as task complexity increases. Interestingly, an analogous bias appears in human adults, suggesting that LMs may have inherited deep-seated reasoning heuristics from their training data. To this end, we quantify similarities between LMs and humans, finding that LMs exhibit adult-like inference profiles (but not children-like). Finally, we propose a test-time sampling method which explicitly samples and eliminates hypotheses about causal relationships from the LM. This scalable approach significantly reduces the disjunctive bias and moves LMs closer to the goal of scientific, causally rigorous reasoning.

2025-07-07

colmweb.org/COLM/2025/Conference (accepted)

doi.org

openreview.net

AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models

Jacob Chmura

Shahrad Mohammadzadeh

Taz Scott-Talib

Nishanth Anand

2025-06-09

ICML.cc/2025/Workshop/CODEML (published)

openreview.net

Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?

Anthony GX-Chen

Rob Fergus

Kenneth Marino

2025-05-14

ArXiv (preprint)

arxiv.org

Torchmeta: A Meta-Learning library for PyTorch

The constant introduction of standardized benchmarks in the literature has helped accelerating the recent advances in meta-learning research… (see more). They offer a way to get a fair comparison between different algorithms, and the wide range of datasets available allows full control over the complexity of this evaluation. However, for a large majority of code available online, the data pipeline is often specific to one dataset, and testing on another dataset requires significant rework. We introduce Torchmeta, a library built on top of PyTorch that enables seamless and consistent evaluation of meta-learning algorithms on multiple datasets, by providing data-loaders for most of the standard benchmarks in few-shot classification and regression, with a new meta-dataset abstraction. It also features some extensions for PyTorch to simplify the development of models compatible with meta-learning algorithms. The code is available here: this https URL

2019-09-14

ArXiv (preprint)

arxiv.org

Speed Science

Leading in a New Era

Supervision Requests

Mandana Samiei

Publications

Speed Science

Leading in a New Era

Supervision Requests

Popular keywords:

Mandana Samiei

Publications