Marylou Fauchard

Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets

2025-09-22

NeurIPS.cc/2025/Workshop/WiML (published)

doi.org

openreview.net

Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets

Recent advances in reasoning with large language models (LLMs) have demonstrated strong performance on complex mathematical tasks, including… (see more) combinatorial optimization. Techniques such as Chain-of-Thought and In-Context Learning have further enhanced this capability, making LLMs both powerful and accessible tools for a wide range of users, including non-experts. However, applying LLMs to matching problems, which require reasoning under preferential and structural constraints, remains underexplored. To address this gap, we introduce a novel benchmark of 369 instances of the College Admission Problem, a canonical example of a matching problem with preferences, to evaluate LLMs across key dimensions: feasibility, stability, and optimality. We employ this benchmark to assess the performance of several open-weight LLMs. Our results first reveal that while LLMs can satisfy certain constraints, they struggle to meet all evaluation criteria consistently. They also show that reasoning LLMs, like QwQ and GPT-oss, significantly outperform traditional models such as Llama, Qwen or Mistral, defined here as models used without any dedicated reasoning mechanisms. Moreover, we observed that LLMs reacted differently to the various prompting strategies tested, which include Chain-of-Thought, In-Context Learning and role-based prompting, with no prompt consistently offering the best performance. Finally, we report the performances from iterative prompting with auto-generated feedback and show that they are not monotonic; they can peak early and then significantly decline in later attempts. Overall, this work offers a new perspective on model reasoning performance and the effectiveness of prompting strategies in combinatorial optimization problems with preferential constraints.

2025-09-16

ArXiv (preprint)

doi.org

arxiv.org

Reasoning with Preference Constraints: A Benchmark for Language Models in Many-to-One Matching Markets

Recent advances in reasoning with large language models (LLMs) have demonstrated strong performance on complex mathematical tasks, including… (see more) combinatorial optimization. Techniques such as Chain-of-Thought and In-Context Learning have further enhanced this capability, making LLMs both powerful and accessible tools for a wide range of users, including non-experts. However, applying LLMs to matching problems, which require reasoning under preferential and structural constraints, remains underexplored. To address this gap, we introduce a novel benchmark of 369 instances of the College Admission Problem, a canonical example of a matching problem with preferences, to evaluate LLMs across key dimensions: feasibility, stability, and optimality. We employ this benchmark to assess the performance of several open-weight LLMs. Our results first reveal that while LLMs can satisfy certain constraints, they struggle to meet all evaluation criteria consistently. They also show that reasoning LLMs, like QwQ and GPT-oss, significantly outperform traditional models such as Llama, Qwen or Mistral, defined here as models used without any dedicated reasoning mechanisms. Moreover, we observed that LLMs reacted differently to the various prompting strategies tested, which include Chain-of-Thought, In-Context Learning and role-based prompting, with no prompt consistently offering the best performance. Finally, we report the performances from iterative prompting with auto-generated feedback and show that they are not monotonic; they can peak early and then significantly decline in later attempts. Overall, this work offers a new perspective on model reasoning performance and the effectiveness of prompting strategies in combinatorial optimization problems with preferential constraints.

2025-09-01

arXiv (published)

doi.org

Mila AI Policy Conference

Leading in a New Era

TRAIL: Responsible AI for Professionals and Leaders

Marylou Fauchard

Publications

Mila AI Policy Conference

Leading in a New Era

TRAIL: Responsible AI for Professionals and Leaders

Popular keywords:

Marylou Fauchard

Publications