Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
Kusha Sareen
Morgane M Moss
Arian Hosseini
Real-time fine finger motion decoding for transradial amputees with surface electromyography
Zihan Weng
Yang Xiao
Peiyang Li
Chanlin Yi
Hailin Ma
Guang Yao
Yuan Lin
Fali Li
Dezhong Yao 0001
Jingming Hou
Yangsong Zhang
Peng Xu
REARANK: Reasoning Re-ranking Agent via Reinforcement Learning
Le Zhang
Bo Wang
Xipeng Qiu
We present REARANK, a large language model (LLM)-based listwise reasoning reranking agent. REARANK explicitly reasons before reranking, sign… (voir plus)ificantly improving both performance and interpretability. Leveraging reinforcement learning and data augmentation, REARANK achieves substantial improvements over baseline models across popular information retrieval benchmarks, notably requiring only 179 annotated samples. Built on top of Qwen2.5-7B, our REARANK-7B demonstrates performance comparable to GPT-4 on both in-domain and out-of-domain benchmarks and even surpasses GPT-4 on reasoning-intensive BRIGHT benchmarks. These results underscore the effectiveness of our approach and highlight how reinforcement learning can enhance LLM reasoning capabilities in reranking.
Rejecting Hallucinated State Targets during Planning
Harry Zhao
Mingde Zhao
Tristan Sylvain
Romain Laroche
Rootlets-based registration to the spinal cord PAM50 template
Sandrine B'edard
Jan Valošek
Valeria Oliva
Kenneth A. Weber
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Defne Tur
Nicholas Meade
Xing Han Lu
Alejandra Zambrano
Arkil Patel
Esin DURMUS
Spandana Gella
Karolina Stanczak
Scaling Trends in Language Model Robustness
Nikolaus H. R. Howe
Ian R. McKenzie
Oskar John Hollinsworth
Michał Zając
Tom Tseng
Aaron David Tucker
Adam Gleave
SCAR: Shapley Credit Assignment for More Efficient RLHF
Meng Cao
Shuyuan Zhang
Xiaojun Chang
SDLog: A Deep Learning Framework for Detecting Sensitive Information in Software Logs
Roozbeh Aghili
Xingfang Wu
Heng Li
Search-Based Correction of Reasoning Chains for Language Models
Minsu Kim
Jean-Pierre R. Falet
Oliver E. Richardson
Xiaoyin Chen
Moksh J. Jain
Sungjin Ahn
Sungsoo Ahn
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models
Lucas Berry
Axel Brando
Wei-Di Chang
Juan Higuera
Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen
Jiarui Lu
Minsu Kim
Dinghuai Zhang
Alexandre Piché
Nicolas Gontier
Ehsan Kamalloo