Aurélien Bück-Kaeffer

Research Topics

AGI (Artificial General Intelligence)

AI Alignment

AI Ethics

Applied Machine Learning

Deep Learning

Deep Neural Networks

Generative Models

Large Language Models (LLM)

Multimodal Learning

Natural Language Processing

Neural Networks

Optimization

Reinforcement Learning

GitHub

Publications

EASE Configuration Facilitates A Reproducible Science of LLM Social Simulations

Sneheel Sarangi

Aurélien Bück-Kaeffer

LLMs are increasingly deployed to simulate social interactions, yet many of the existing simulators remain ad hoc and monolithic. This lack of architectural standardization prevents reproducible research and complicates downstream evaluation. We advance a rigorous science of LLM-based multi-agent simulation by modularizing core components into Environments, Agents, Simulation engines, and Evaluation metrics (EASE). We demonstrate the utility of EASE configuration by wrapping it in an experimental study schema for orchestrating workflows centered around answering explicit research questions in generated scenarios. We contribute SiliSocS, an open-source, research-ready Silicon Society Sandbox implementing a study-structured EASE configuration to enable highly configurable and reproducible LLM-based social simulations. Using SiliSocS and EASE, we present three case studies, showcasing the system's comprehensive assessment of existing questions, ability to dive deeper into complex questions, and elaboration of existing studies, respectively. Together, these case studies highlight the limitations of current modeling approaches and isolate the impacts of design choices on key results.

2026-05-27

arXiv (preprint)

doi.org

arxiv.org

The $\textit{Silicon Society}$ Cookbook: Design Space of LLM-based Social Simulations

Aurélien Bück-Kaeffer

Sneheel Sarangi

Studies attempting to simulate human behavior with …

2026-04-29

arXiv (preprint)

doi.org

arxiv.org

Position: Time to Close The Validation Gap in LLM Social Simulations

Sneheel Sarangi

Aurélien Bück-Kaeffer

LLM-based social simulations, in which many language model agents interact over multiple turns, are rapidly proliferating across policy analysis, epidemiology, and computational social science. Yet the field lacks consensus on how to validate these simulations, with evaluation methods that are sparse, inconsistent, and rarely shared across disciplinary silos. We argue this creates a serious risk: premature deployment of unvalidated simulators in high-stakes domains. Our position is that the field must pivot from expansion to consolidation, prioritizing methodological standardization: shared benchmarks, open data, and reproducible evaluation protocols grounded in social science and complex systems research. We outline a concrete research program organized around specific learning problems/benchmarks, providing a path toward answering the fundamental question: when are LLM social simulations useful modelling objects?

2025-12-31

International Conference on Machine Learning (Accept (regular))

openreview.net

$\texttt{BluePrint}$: A Social Media User Dataset for LLM Persona Evaluation and Training

Aurélien Bück-Kaeffer

Je Qin Chooi

Dan Zhao

Kellin Pelrine