A scalable gene network model of regulatory dynamics in single cells
Paul Bertin
Joseph D Viviano
Alejandro Tejada-Lapuerta
Weixu Wang
Stefan Bauer
Fabian J. Theis
Capturing Individual Human Preferences with Reward Features
André Barreto
Vincent Dumoulin
Yiran Mao
Nicolas Perez-Nieves
Bobak Shahriari
Yann Dauphin
Reinforcement learning from human feedback usually models preferences using a reward model that does not distinguish between people. We argue that this is unlikely to be a good design choice in contexts with high potential for disagreement, like in the training of large language models. We propose a method to specialise a reward model to a person or group of people. Our approach builds on the observation that individual preferences can be captured as a linear combination of a set of general reward features. We show how to learn such features and subsequently use them to quickly adapt the reward model to a specific individual, even if their preferences are not reflected in the training data. We present experiments with large language models comparing the proposed architecture with a non-adaptive reward model and also adaptive counterparts, including models that do in-context personalisation. Depending on how much disagreement there is in the training data, our model either significantly outperforms the baselines or matches their performance with a simpler architecture and more stable training.
Offline Model-Based Optimization: Comprehensive Review
Minsu Kim
Jiayao Gu
Ye Yuan
Taeyoung Yun
Zixuan Liu
Can Chen
RL4Med-DDPO: Reinforcement Learning for Controlled Guidance Towards Diverse Medical Image Generation using Vision-Language Foundation Models
Parham Saremi
Amar Kumar
Mohammed Mohammed
Zahra Tehraninasab
Meditation induces shifts in neural oscillations, brain complexity and critical dynamics: Novel insights from MEG
Annalisa Pascarella
Philipp Thölke
David Meunier
Jordan O’Byrne
Tarek Lajnef
Antonino Raffone
Roberto Guidotti
Vittorio Pizzella
Laura Marzetti
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak
Xiangru Jian
Kevin Qinghong Lin
Juan A. Rodriguez
Montek Kalsi
Rabiul Awal
M. T. Özsu
David Vazquez
Perouz Taslakian
Spandana Gella
Sai Rajeswar
Human Annotator
Hitting the right pitch: Cortical tracking of fundamental frequency changes across speech rates in auditory and sensorimotor regions
Yorguin-Jose Mantilla-Ramos
Ana-Sofía Hincapié-Casas
Annalisa Pascarella
Tarek Lajnef
Richard M. Leahy
Emily B.J. Coffey
Véronique Boulenger
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
Meta-learning Optimizers for Communication-Efficient Learning
Charles-Étienne Joseph
Benjamin Thérien
Abhinav Moudgil
Boris Knyazev