
Aristides Milios

Lab Representative
PhD student
Supervisor: Chris Pal
Research Topics: Natural Language Processing

Biography

I'm a passionate machine learning researcher and PhD student at Université de Montréal & MILA, specializing in Natural Language Processing and Vision/Text Foundation Models. My research focuses on investigating the real-world reasoning and self-improvement abilities of large language models, especially in the context of language models as agents and tool use.

Previously, I completed my M.Sc. at McGill University under the supervision of Dr. Siva Reddy and Dr. Dzmitry Bahdanau, where I researched using LLMs together with dense retrieval models for in-context demonstration selection. I also gained industry experience as a Research Intern at ServiceNow, researching self-evaluation and self-improvement as a way to promote conciseness on topics an LLM is likely to hallucinate about. Having started a PhD under Dr. Chris Pal at UdeM in September 2024, I aim to apply LLMs to real-world use cases, such as interactive, iterative dialogue-based design assistants, and to continue investigating the ability of models to self-improve through self-evaluation.

Publications

LLMs can learn self-restraint through iterative self-reflection
In order to be deployed safely, Large Language Models (LLMs) must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a utility function that encourages the model to produce responses only when it is confident in them. This utility function can be used to score generations of different lengths as well as abstention. To optimize this function, we introduce ReSearch, a process of "self-reflection" consisting of iterative self-prompting and self-evaluation. We use the ReSearch algorithm to generate synthetic data on which we finetune our models. Compared to their original versions, our resulting models generate fewer hallucinations overall at no additional inference cost, for both known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to abstain by augmenting the samples generated by the model during the search procedure with an answer expressing abstention.
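As a rough illustration of the kind of procedure the abstract describes, below is a minimal Python sketch of an iterative self-prompting and self-evaluation loop with an explicit abstention option. The function names (`generate`, `self_evaluate`), the utility weighting, and the loop structure are all assumptions made for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch of a ReSearch-style loop: iterative self-prompting and
# self-evaluation, with a utility that scores generations of different lengths
# and always keeps abstention as a candidate. Not the authors' implementation.

ABSTAIN = "I don't know enough about this topic to answer reliably."

def utility(statements, confidences, abstain_penalty=0.25):
    """Reward confident statements, penalize likely hallucinations.

    An empty statement list (abstention) gets a small fixed penalty, so
    abstaining beats emitting many low-confidence claims.
    """
    if not statements:
        return -abstain_penalty
    # Each statement contributes positively if confident, negatively otherwise.
    return sum(c - 0.5 for c in confidences) / len(statements)

def research_loop(prompt, generate, self_evaluate, n_rounds=3, n_samples=4):
    """Iteratively sample answers, self-evaluate them, and keep the best one."""
    candidates = [ABSTAIN]  # abstention is always an available candidate
    feedback = None
    for _ in range(n_rounds):
        # Self-prompting: condition the next round on the model's own critique.
        candidates += generate(prompt, feedback=feedback, n=n_samples)
        scored = []
        for answer in candidates:
            # Self-evaluation: split the answer into statements and score each
            # one's confidence (placeholder interface).
            statements, confidences = self_evaluate(prompt, answer)
            scored.append((utility(statements, confidences), answer))
        scored.sort(key=lambda x: x[0], reverse=True)
        best_score, best_answer = scored[0]
        feedback = f"Previous best answer (score {best_score:.2f}): {best_answer}"
        candidates = [a for _, a in scored[:2]]  # refine only the top candidates
    return best_answer  # later collected as synthetic fine-tuning data
```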
Self-evaluation and self-prompting to improve the reliability of LLMs
In order to safely deploy Large Language Models (LLMs), they must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a simple objective that encourages the model to produce generations that it is confident in. To optimize this objective, we introduce ReSearch, an iterative search algorithm based on self-evaluation and self-prompting. Our method results in fewer hallucinations overall, both for known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to decline when the model assesses that it cannot provide a response without a high proportion of hallucination.
In-Context Learning for Text Classification with Many Labels
ROSA: Random Orthogonal Subspace Adaptation