Sifan Wu

PhD - Université de Montréal

Supervisor

Bang Liu

Research Topics

Deep Learning

Natural Language Processing

Publications

What to Ask Next? Probing the Imaginative Reasoning of LLMs with TurtleSoup Puzzles

Mi Zhou

Sifan Wu

H. Zhang

Qi Sima

Bang Liu

We investigate the capacity of Large Language Models (LLMs) for imaginative reasoning: the proactive construction, testing, and revision of hypotheses in information-sparse environments. Existing benchmarks, often static or focused on social deduction, fail to capture the dynamic, exploratory nature of this reasoning process. To address this gap, we introduce a comprehensive research framework based on the classic "Turtle Soup" game, integrating a benchmark, an agent, and an evaluation protocol. We present TurtleSoup-Bench, the first large-scale, bilingual, interactive benchmark for imaginative reasoning, comprising 800 turtle soup stories sourced from both the Internet and expert authors. We also propose Mosaic-Agent, a novel agent designed to assess LLMs' performance in this setting. To evaluate reasoning quality, we develop a multi-dimensional protocol measuring logical consistency, detail completion, and conclusion alignment. Experiments with leading LLMs reveal clear capability limits, common failure patterns, and a significant performance gap compared to humans. Our work offers new insights into LLMs' imaginative reasoning and establishes a foundation for future research on exploratory agent behavior.

2026-03-13

AAAI Conference on Artificial Intelligence (published)

doi.org

arxiv.org

Towards Agentic Intelligence for Materials Science

Huan Zhang

Yizhan Li

Wenhao Huang

Ziyu Hou

Yu Song

Xuye Liu

Jinya Jiang

Leonard R. MacGillivray

Teruyasu Mizoguchi

Tianshu Yu

Lizi Liao

Yuyu Luo

Yu Rong

Jia LI

Ying Diao

Heng Ji … (see 1 more)

Bang Liu

The convergence of artificial intelligence and materials science presents a transformative opportunity, but achieving true acceleration in discovery requires moving beyond task-isolated, fine-tuned models toward agentic systems that plan, act, and learn across the full discovery loop. This survey advances a unique pipeline-centric view that spans from corpus curation and pretraining, through domain adaptation and instruction tuning, to goal-conditioned agents interfacing with simulation and experimental platforms. Unlike prior reviews, we treat the entire process as an end-to-end system to be optimized for tangible discovery outcomes rather than proxy benchmarks. This perspective allows us to trace how upstream design choices, such as data curation and training objectives, can be aligned with downstream experimental success through effective credit assignment. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science. We then analyze the field through two focused lenses: from the AI perspective, the survey details LLM strengths in pattern recognition, predictive analytics, and natural language processing for literature mining, materials characterization, and property prediction; from the materials science perspective, it highlights applications in materials design, process optimization, and the acceleration of computational workflows via integration with external tools (e.g., DFT, robotic labs). Finally, we contrast passive, reactive approaches with agentic design, cataloging current contributions while motivating systems that pursue long-horizon goals with autonomy, memory, and tool use. This survey charts a practical roadmap towards autonomous, safety-aware LLM agents aimed at discovering novel and useful materials.

2025-12-31

arXiv (preprint)

doi.org

arxiv.org
