NLP in the era of generative AI, cognitive sciences, and societal transformation
Join us at Mila in October for a three-day workshop to explore the transformative potential of language technologies and their implications for society.
This program is designed to provide decision-makers, policymakers, and professionals working in policy with a foundational understanding of AI technology.
Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Traditionally, RLHF involves the initial step of learning a reward model from pairwise human feedback, i.e., expressed as preferences between pairs of text generations. Subsequently, the LLM’s policy is fine-tuned to maximize the reward through a reinforcement learning algorithm. In this study, we introduce an alternative pipeline for the fine-tuning of LLMs using pairwise human feedback. Our approach entails the initial learning of a pairwise preference model, which is conditioned on two inputs (instead of a single input in the case of a reward model) given a prompt, followed by the pursuit of a policy that consistently generates responses preferred over those generated by any competing policy, thus defining the Nash equilibrium of this preference model. We term this approach Nash learning from human feedback (NLHF). In the context of a tabular policy representation, we present a novel algorithmic solution, Nash-MD, founded on the principles of mirror descent. This algorithm produces a sequence of policies, with the last iteration converging to the regularized Nash equilibrium. Additionally, we explore parametric representations of policies and introduce gradient descent algorithms for deep-learning architectures. We illustrate the effectiveness of our approach by presenting experimental results on a text summarization task. We believe NLHF offers a compelling avenue for fine-tuning LLMs and enhancing the alignment of LLMs with human preferences.
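The abstract's central construction, a policy whose responses are preferred over those of any competitor, i.e., a Nash equilibrium of the pairwise preference model, can be illustrated in the tabular setting. The sketch below is not the paper's Nash-MD algorithm; it runs a plain multiplicative-weights (entropy mirror descent) update against a fixed toy preference matrix, and the averaged iterate approximates a Nash equilibrium of the induced symmetric game. All numbers are illustrative.

```python
import numpy as np

# Toy pairwise preference matrix over 3 candidate responses:
# P[i, j] = probability that response i is preferred over response j.
P = np.array([
    [0.5, 0.7, 0.4],
    [0.3, 0.5, 0.6],
    [0.6, 0.4, 0.5],
])

def mirror_descent_nash(P, steps=2000, eta=0.1):
    """Multiplicative-weights (mirror descent with an entropy regularizer)
    on the symmetric preference game; the averaged policy approximates a
    Nash equilibrium of the preference model."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)   # start from the uniform policy
    avg = np.zeros(n)
    for _ in range(steps):
        # Expected preference of each response against the current policy.
        payoff = P @ pi - 0.5
        pi = pi * np.exp(eta * payoff)
        pi /= pi.sum()
        avg += pi
    return avg / steps

pi_star = mirror_descent_nash(P)
print("approximate Nash policy:", np.round(pi_star, 3))
```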
Introduction The application of large language models such as generative pre-trained transformers (GPTs) has been promising in medical education, and its performance has been tested for different medical exams. This study aims to assess the performance of GPTs in responding to a set of sample questions of short-answer management problems (SAMPs) from the certification exam of the College of Family Physicians of Canada (CFPC). Method Between August 8th and 25th, 2023, we used GPT-3.5 and GPT-4 in five rounds to answer a sample of 77 SAMPs questions from the CFPC website. Two independent certified family physician reviewers scored AI-generated responses twice: first, according to the CFPC answer key (ie, CFPC score), and second, based on their knowledge and other references (ie, Reviewers’ score). An ordinal logistic generalised estimating equations (GEE) model was applied to analyse repeated measures across the five rounds. Result According to the CFPC answer key, 607 (73.6%) lines of answers by GPT-3.5 and 691 (81%) by GPT-4 were deemed accurate. Reviewers’ scoring suggested that about 84% of the lines of answers provided by GPT-3.5 and 93% of GPT-4 were correct. The GEE analysis confirmed that over five rounds, the likelihood of achieving a higher CFPC Score Percentage for GPT-4 was 2.31 times more than GPT-3.5 (OR: 2.31; 95% CI: 1.53 to 3.47; p<0.001). Similarly, the Reviewers’ Score percentage for responses provided by GPT-4 over five rounds was 2.23 times more likely to exceed th
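As a quick sanity check (illustrative only, assuming the confidence interval is symmetric on the log-odds scale), the reported odds ratio and its 95% CI can be reproduced from the underlying GEE coefficient:

```python
import numpy as np

# Figures quoted in the abstract (GPT-4 vs GPT-3.5, CFPC score).
odds_ratio = 2.31
ci_low, ci_high = 1.53, 3.47

# The GEE coefficient lives on the log-odds scale, so the OR and CI
# come from exponentiating beta and beta +/- 1.96 * SE.
beta = np.log(odds_ratio)                              # ~0.84
se = (np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)   # ~0.21
print(np.exp(beta - 1.96 * se), np.exp(beta + 1.96 * se))  # ~ (1.53, 3.47)
```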
SelfIE: Self-Interpretation of Large Language Model Embeddings
Haozhe Chen
Carl Vondrick
Chengzhi Mao
How do large language models (LLMs) obtain their answers? The ability to explain and control an LLM's reasoning process is key for reliability, transparency, and future model development. We propose SelfIE (Self-Interpretation of Embeddings), a framework that enables LLMs to interpret their own embeddings in natural language by leveraging their ability to respond to inquiries about a given passage. Capable of interpreting open-world concepts in the hidden embeddings, SelfIE reveals LLM internal reasoning in cases such as making ethical decisions, internalizing prompt injection, and recalling harmful knowledge. SelfIE's text descriptions of hidden embeddings also open up new avenues for controlling LLM reasoning. We propose Supervised Control, which allows editing open-ended concepts while only requiring gradient computation for an individual layer. We extend RLHF to hidden embeddings and propose Reinforcement Control, which erases harmful knowledge in LLMs without supervision targets.
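A minimal sketch of the underlying idea, not the authors' implementation: take a hidden embedding from a forward pass, overwrite the embedding of a placeholder token in an "interpretation" prompt with it, and let the model describe the patched content in natural language. The model name (gpt2), layer indices, and prompt below are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" stands in for any decoder-only LLM with accessible hidden states.
name = "gpt2"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)
model.eval()

# 1) Run a passage and keep one hidden embedding we want to interpret.
passage = "The secret ingredient of the recipe is saffron."
enc = tok(passage, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)
layer, pos = 6, -1                       # which layer / token to interpret (illustrative)
h = out.hidden_states[layer][0, pos]     # shape: (hidden_dim,)

# 2) Build an interpretation prompt with a placeholder token, overwrite the
#    placeholder's hidden state with h at an early layer, then decode.
prompt = "X\nThe previous token refers to"
ids = tok(prompt, return_tensors="pt")
placeholder_pos = 0                      # position of the "X" placeholder

def patch_placeholder(module, inputs, output):
    hidden = output[0]
    if hidden.shape[1] == ids["input_ids"].shape[1]:   # patch the prefill pass only
        hidden[0, placeholder_pos] = h                  # in-place edit of the residual stream

hook = model.transformer.h[0].register_forward_hook(patch_placeholder)
with torch.no_grad():
    gen = model.generate(**ids, max_new_tokens=20, do_sample=False)
hook.remove()

print(tok.decode(gen[0], skip_special_tokens=True))
```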
Masked Image Modeling (MIM) is a promising self-supervised learning approach that enables learning from unlabeled images. Despite its recent success, learning good representations through MIM remains challenging because it requires predicting the right semantic content in accurate locations. For example, given an incomplete picture of a dog, we can guess that there is a tail, but we cannot determine its exact location. In this work, we propose to incorporate location uncertainty into MIM by using stochastic positional embeddings (StoP). Specifically, we condition the model on stochastic masked token positions drawn from a Gaussian distribution. StoP reduces overfitting to location features and guides the model toward learning features that are more robust to location uncertainties. Quantitatively, StoP improves downstream MIM performance on a variety of downstream tasks, including
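A minimal sketch of the idea under stated assumptions: masked-token positions are jittered with Gaussian noise before being embedded, so the predictor cannot rely on exact target locations. The sinusoidal embedding, noise scale, and projection layer below are illustrative choices, not the paper's exact parameterization.

```python
import math
import torch
import torch.nn as nn

def sinusoidal_embedding(positions, dim):
    """Standard sinusoidal embedding evaluated at (possibly non-integer) positions."""
    half = dim // 2
    freqs = torch.exp(-torch.arange(half, dtype=torch.float32) * (math.log(10000.0) / half))
    angles = positions.unsqueeze(-1).float() * freqs          # (..., half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)  # (..., dim)

class StochasticPositionalEmbedding(nn.Module):
    """Positions of masked tokens are perturbed with Gaussian noise at
    training time before being embedded, so the model cannot overfit to
    exact target locations."""
    def __init__(self, dim, sigma=1.0):
        super().__init__()
        self.sigma = sigma
        self.proj = nn.Linear(dim, dim)

    def forward(self, positions):
        # positions: (batch, num_masked) integer token positions
        noisy = positions.float()
        if self.training:
            noisy = noisy + self.sigma * torch.randn_like(noisy)
        return self.proj(sinusoidal_embedding(noisy, self.proj.in_features))

# Usage: embed 4 masked-patch positions for a batch of 2 images.
stop = StochasticPositionalEmbedding(dim=64, sigma=0.5)
pos = torch.randint(0, 196, (2, 4))
print(stop(pos).shape)   # torch.Size([2, 4, 64])
```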