
Aristides Milios

Lab Representative
PhD
Primary Supervisor
Chris Pal
Research Topics
Natural Language Processing

Biography

I am a passionate machine learning researcher and PhD student at Université de Montréal/Mila, specializing in natural language processing and vision/text foundation models. My research focuses on exploring the real-world reasoning and self-improvement capabilities of large language models, particularly in the context of language models as agents and tool use.

Previously, I completed an M.Sc. at McGill University under the supervision of Dr. Siva Reddy and Dr. Dzmitry Bahdanau (researching the use of large language models in combination with dense retrieval models for in-context demonstration selection), and gained industry experience as a research intern at ServiceNow (studying self-evaluation and self-improvement to promote conciseness on topics prone to eliciting hallucinations from language models).

Since starting my PhD under the supervision of Dr. Chris Pal at UdeM in September 2024, my goal has been to apply large language models to concrete use cases, such as interactive, iterative dialogue-based design assistants, while continuing to explore models' capacity for self-improvement through self-evaluation.

Publications

LLMs can learn self-restraint through iterative self-reflection
In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that can encourage the model to produce responses only when it is confident in them. This utility function can be used to score generations of different lengths and abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, our resulting models generate fewer hallucinations overall at no additional inference cost, for both known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.
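
Read as an algorithm, the abstract suggests a search loop of roughly the following shape. The sketch below is an illustration only: the `LLM` interface, the utility weighting, the sampling/reflection schedule, and the abstention wording are all assumptions made here, not the paper's actual implementation.

```python
from typing import Protocol


class LLM(Protocol):
    """Minimal stand-in for a text-generation model interface (assumed)."""

    def generate(self, prompt: str) -> str:
        ...

    def self_evaluate(self, question: str, answer: str) -> float:
        """Return a self-assessed confidence score in [0, 1]."""
        ...


# Hypothetical abstention answer; the paper augments the candidate pool
# with an answer expressing abstention, but its exact wording is not given.
ABSTAIN = "I am not confident enough about this topic to answer."


def utility(confidence: float, n_statements: int) -> float:
    """Assumed utility shape: confident statements add value, unconfident
    ones subtract it, so abstention (score 0) beats a shaky long answer."""
    return (2.0 * confidence - 1.0) * n_statements


def research(model: LLM, question: str, n_rounds: int = 3, n_samples: int = 4) -> str:
    """Keep the highest-utility candidate across rounds of self-prompting
    and self-evaluation; abstention is always part of the candidate pool."""
    best, best_score = ABSTAIN, 0.0  # abstaining scores 0 by construction
    prompt = question
    for _ in range(n_rounds):
        for _ in range(n_samples):
            answer = model.generate(prompt)                     # self-prompting
            confidence = model.self_evaluate(question, answer)  # self-evaluation
            score = utility(confidence, len(answer.split(".")))
            if score > best_score:
                best, best_score = answer, score
        # Reflect: condition the next round on the current best candidate.
        prompt = f"{question}\nDraft answer: {best}\nRevise it to be more accurate."
    return best  # (question, best) pairs can serve as synthetic finetuning data
```

Under these assumptions, the loop only returns a substantive answer when some candidate scores above the abstention baseline; the resulting (question, answer) pairs are what the abstract describes as synthetic finetuning data.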
Self-evaluation and self-prompting to improve the reliability of LLMs
In order to be safely deployed, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a simple objective that can encourage the model to produce generations that it is confident in. To optimize this objective, we introduce ReSearch, an iterative search algorithm based on self-evaluation and self-prompting. Our method results in fewer hallucinations overall, both for known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to decline when the model assesses that it cannot provide a response without a high proportion of hallucination.
In-Context Learning for Text Classification with Many Labels
ROSA: Random Orthogonal Subspace Adaptation