Position: Probabilistic Modelling is Sufficient for Causal Inference
Bruno Mlodozeniec
Richard E. Turner
Proceedings of 1st Workshop on Advancing Artificial Intelligence through Theory of Mind
Mouad Abrini
Omri Abend
Dina M. Acklin
Henny Admoni
Gregor Aichinger
Nitay Alon
Zahra Ashktorab
Ashish Atreja
Moises Auron
Alexander Aufreiter
Raghav Awasthi
Soumya Banerjee
Joseph Barnby
Rhea Basappa
Severin Bergsmann
Djallel Bouneffouf
Patrick Callaghan
Marc Cavazza
Thierry Chaminade
Sonia Chernova
Mohamed Chetouan
Moumita Choudhury
Axel Cleeremans
J. Cywinski
Fabio Cuzzolin
Hokin Deng
N'yoma Diamond
C. D. Pasquasio
Max J. van Duijn
Mahapatra Dwarikanath
Qingying Gao
Ashok Goel
Rebecca R. Goldstein
Matthew C. Gombolay
Gabriel Enrique Gonzalez
Amar Halilovic
Tobias Halmdienst
Mahimul Islam
Julian Jara-Ettinger
Natalie Kastel
Renana Keydar
Ashish K. Khanna
Mahdi Khoramshahi
Jihyun Kim
Mihyeon Kim
Youngbin Kim
Senka Krivic
Nikita Krasnytskyi
Arun Kumar
Junehyoung Kwon
EunJu Lee
Shane Lee
Peter R. Lewis
Xue Li
Yijiang Li
Michal Lewandowski
Nathan Lloyd
Matthew B. Luebbers
Dezhi Luo
Haiyun Lyu
Dwarikanath Mahapatra
Kamal Maheshwari
Mallika Mainali
P. Mathur
Patrick Mederitsch
Shuwa Miura
Manuel Preston de Miranda
Reuth Mirsky
Shreya Mishra
Nina M. Moorman
Katelyn Morrison
John Muchovej
Bernhard Nessler
Felix Nessler
Hieu Minh Jord Nguyen
Abby Ortego
F. Papay
Antoine Pasquali
Hamed Rahimi
C. Raghu
Amanda L. Royka
Stefan Sarkadi
Jaelle Scheuerman
Simon Schmid
Paul Schrater
Anik Sen
Zahra Sheikhbahaee
Ke Shi
Reid G. Simmons
Nishant Singh
Mason O. Smith
Ramira van der Meulen
Anthia Solaki
Haoran Sun
Viktor Szolga
Matthew E. Taylor
Travis Taylor
Sanne van Waveren
Juan David Vargas
R. Verbrugge
Eitan Wagner
Justin D. Weisz
Ximing Wen
William Yeoh
Wenlong Zhang
Michelle Zhao
Shlomo Zilberstein
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Kusha Sareen
Morgane M Moss
Arian Hosseini
Real-time fine finger motion decoding for transradial amputees with surface electromyography
Zihan Weng
Yang Xiao
Peiyang Li
Chanlin Yi
Hailin Ma
Guang Yao
Yuan Lin
Fali Li
Dezhong Yao
Jingming Hou
Yangsong Zhang
Peng Xu
Rejecting Hallucinated State Targets during Planning
Harry Zhao
Mingde Zhao
Tristan Sylvain
Romain Laroche
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur
Nicholas Meade
Xing Han Lu
Alejandra Zambrano
Arkil Patel
Esin Durmus
Spandana Gella
Karolina Stanczak
Scaling Trends in Language Model Robustness
Nikolaus H. R. Howe
Ian R. McKenzie
Oskar John Hollinsworth
Michał Zając
Tom Tseng
Aaron David Tucker
Adam Gleave
SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs
Roozbeh Aghili
Xingfang Wu
Heng Li
Search-Based Correction of Reasoning Chains for Language Models
Minsu Kim
Jean-Pierre R. Falet
Oliver E. Richardson
Xiaoyin Chen
Moksh J. Jain
Sungjin Ahn
Sungsoo Ahn
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models
Lucas Berry
Axel Brando
Wei-Di Chang
Juan Higuera
Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen
Jiarui Lu
Minsu Kim
Dinghuai Zhang
Alexandre Piché
Nicolas Gontier
Ehsan Kamalloo
Self-Play $Q$-Learners Can Provably Collude in the Iterated Prisoner's Dilemma
Quentin Bertrand
Juan Agustin Duque
Emilio Calvano
A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome. In this work, we provide theoretical foundations for this phenomenon in the context of self-play multi-agent Q-learners in the iterated prisoner’s dilemma. We characterize broad conditions under which such agents provably learn the cooperative Pavlov (win-stay, lose-shift) policy rather than the Pareto-dominated “always defect” policy. We validate our theoretical results through additional experiments, demonstrating their robustness across a broader class of deep learning algorithms.
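As a rough illustration of the two policies named in the abstract, the sketch below hard-codes Pavlov (win-stay, lose-shift) and "always defect" as one-step-memory policies and rolls them out in self-play. It is not the paper's Q-learning setup: nothing is learned here, and the names `pavlov`, `always_defect`, and `play` are hypothetical helpers introduced only for this example. Under the usual prisoner's dilemma payoff ordering, win-stay, lose-shift reduces to "cooperate exactly when both players chose the same action last round", which is the form used below.

```python
# Illustrative sketch only: fixed policies, no learning.
# Actions are encoded as the strings "C" (cooperate) and "D" (defect).
C, D = "C", "D"


def pavlov(my_last, opp_last):
    """Win-stay, lose-shift: cooperate iff both players matched last round."""
    return C if my_last == opp_last else D


def always_defect(my_last, opp_last):
    """Defect regardless of the previous round."""
    return D


def play(policy_a, policy_b, start, rounds):
    """Roll out two one-step-memory policies from a starting joint action.

    Returns the list of joint actions, starting with `start`.
    """
    a, b = start
    history = [(a, b)]
    for _ in range(rounds):
        # Each policy sees its own last action first, then the opponent's.
        a, b = policy_a(a, b), policy_b(b, a)
        history.append((a, b))
    return history


if __name__ == "__main__":
    print(play(pavlov, pavlov, (C, D), 3))
    print(play(always_defect, always_defect, (C, C), 2))
```

In this toy rollout, two Pavlov players that start from a mismatched round pass through mutual defection once and then settle on mutual cooperation, whereas two "always defect" players remain at mutual defection.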