Publications

Cross-lingual Open-Retrieval Question Answering for African Languages

Odunayo Ogundepo

Tajuddeen Gwadabe

Clara E. Rivera

Jonathan H. Clark

Sebastian Ruder

David Ifeoluwa Adelani

Bonaventure F. P. Dossou

Abdou Aziz Diop

Claytone Sikasote

Gilles Hacheme

Happy Buzaaba

Ignatius Ezeani

Rooweither Mabuya

Salomey Osei

Chris Emezue

Albert Kahira

Shamsuddeen Hassan Muhammad

Akintunde Oladipo

Abraham Toluwase Owodunni

Atnafu Lambebo Tonja

Iyanuoluwa Shode

Akari Asai

Aremu Anuoluwapo

Ayodele Awokoya

Bernard Opoku

Chiamaka Ijeoma Chukwuneke

Christine Mwase

Clemencia Siro

Stephen Arthur

Oyinkansola Awosan

Tunde Oluwaseyi Ajayi

Verrah Akinyi Otiende

Andre Niyongabo Rubungo

Boyd Sinkala

Daniel Ajisafe

Emeka Felix Onwuegbuzia

Falalu Lawan

Ibrahim Ahmad

Jesujoba Oluwadara Alabi

Habib Mbow

Chinedu Emmanuel Mbonu

Emile Niyomutabazi

Mofetoluwa Adeyemi

Eunice Mukonde

Mofya Phiri

Orevaoghene Ahia

Ruqayya Nasir Iro

Sonia Adhiambo

Martin Namukombo

Neo Putini

Ndumiso Mngoma

Priscilla A. Amuok

2023-12-01

Findings of the Association for Computational Linguistics: EMNLP 2023 (published)

doi.org

openreview.net

Current AI applications in neurology: Brain imaging

Tal Arbel

Joshua D. Durso-Finley

Jean-Pierre R. Falet

Raghav Mehta

Douglas Arnold

Nick Pawlowski

2023-12-01

Journal of the Neurological Sciences (published)

doi.org

DiPS: Discriminative Pseudo-Label Sampling with Self-Supervised Transformers for Weakly Supervised Object Localization

Shakeeb Murtaza

Soufiane Belharbi

Marco Pedersoli

Aydin Sarraf

Eric Granger

2023-12-01

Image and Vision Computing (published)

doi.org

arxiv.org

From physics to sentience: Deciphering the semantics of the free-energy principle and evaluating its claims: Comment on "Path integrals, particular kinds, and strange things" by Karl Friston et al.

Zahra Sheikhbahaee

Adam Safron

Casper Hesp

Guillaume Dumas

2023-12-01

Physics of Life Reviews (published)

doi.org

arxiv.org

Growth of TiO2 single crystals by the Verneuil method at different gas flow ratio

Xudong Liu

Hanshu Ma

Wei Wang

Yongqi Hu

Xudong Sun

2023-12-01

Journal of Crystal Growth (published)

doi.org

Large language models: What could they do for neurology?

Guillaume Lajoie

2023-12-01

Journal of the Neurological Sciences (published)

doi.org

A large-scale exploratory study of Android sports apps in the Google Play Store

Bhagya Chembakottu

Heng Li

Foutse Khomh

2023-12-01

Information and Software Technology (published)

doi.org

arxiv.org

Measuring the Knowledge Acquisition-Utilization Gap in Pretrained Language Models

Amirhossein Kazemnejad

Mehdi Rezagholizadeh

Prasanna Parthasarathi

Sarath Chandar

2023-12-01

Findings of the Association for Computational Linguistics: EMNLP 2023 (published)

doi.org

openreview.net

Nash Learning from Human Feedback

Rémi Munos

Michal Valko

Daniele Calandriello

Mohammad Gheshlaghi Azar

Mark Rowland

Zhaohan Daniel Guo

Yunhao Tang

Matthieu Geist

Thomas Mesnard

Andrea Michi

Marco Selvi

Sertan Girgin

Nikola Momchev

Olivier Bachem

Daniel J Mankowitz

Doina Precup

Bilal Piot

Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF involves the initial step of learning a reward model from human feedback, often expressed as preferences between pairs of text generations produced by a pre-trained LLM. Subsequently, the LLM's policy is fine-tuned by optimizing it to maximize the reward model through a reinforcement learning algorithm. However, an inherent limitation of current reward models is their inability to fully represent the richness of human preferences and their dependency on the sampling distribution. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a preference model, which is conditioned on two inputs given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. To demonstrate the effectiveness of our approach, we present experimental results involving the fine-tuning of an LLM for a text summarization task. We believe NLHF offers a compelling avenue for preference learning and policy optimization with the potential of advancing the field of aligning LLMs with human preferences.

2023-12-01

arXiv (preprint)

doi.org

arxiv.org
