
Aristides Milios

Lab Representative
PhD
Primary Supervisor
Research Topics
Natural Language Processing

Biography

I am a passionate machine learning researcher and PhD student at Université de Montréal/Mila, specializing in natural language processing and vision/text foundation models. My research explores the real-world reasoning and self-improvement capabilities of large language models, particularly in the context of language models as agents and tool use.

Previously, I completed an M.Sc. at McGill University under the supervision of Dr. Siva Reddy and Dr. Dzmitry Bahdanau (researching the use of large language models combined with dense retrieval models for in-context demonstration selection), and gained industry experience as a research intern at ServiceNow (studying self-evaluation and self-improvement to promote conciseness on topics likely to elicit hallucinations from language models).

Since starting a PhD under the supervision of Dr. Chris Pal at UdeM in September 2024, my goal has been to apply large language models to concrete use cases, such as interactive, iterative dialogue-based design assistants, while continuing to explore models' capacity for self-improvement through self-evaluation.

Publications

Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
An AI system for professional floor plan design needs to be able to precisely control room dimensions and areas (quantitative constraints), while also balancing functional considerations and design aesthetics. Existing generative approaches focus primarily on respecting the requested connectivity between rooms, but do not support generating floor plans with numerical constraints. We introduce a text-based floor plan generation approach that fine-tunes a large language model (LLM) on real plans and then applies reinforcement learning with verifiable rewards (RLVR) to enforce both numerical (areas, dimensions) and spatial (topological) constraints. Furthermore, we design a set of constraint adherence metrics to systematically measure how generated floor plans align with user-defined constraints. Our model generates floor plans that satisfy numerical constraints and outperforms existing methods on realism, compatibility, and diversity scores. Specifically, our approach leads to up to a 94% reduction in compatibility score. Our results demonstrate that LLMs can effectively handle quantitative constraints in structured design tasks, suggesting broader applications for text-based generative modeling.
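The "verifiable" part of RLVR means the reward can be computed by checking a generated plan against the user's stated constraints. A minimal sketch of one such numerical check is below; the plan format (a mapping of room name to area) and the relative tolerance are illustrative assumptions, not the paper's actual reward design.

```python
def area_reward(plan: dict[str, float],
                constraints: dict[str, float],
                tol: float = 0.05) -> float:
    """Fraction of requested room areas that the generated plan satisfies
    within a relative tolerance `tol` (hypothetical reward shape)."""
    if not constraints:
        return 1.0
    satisfied = 0
    for room, target in constraints.items():
        actual = plan.get(room)
        # count the room as satisfied if its area is within tol of the target
        if actual is not None and abs(actual - target) <= tol * target:
            satisfied += 1
    return satisfied / len(constraints)
```

A reward like this can be computed automatically for every sampled plan, which is what makes RL feasible without a learned reward model.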
LLMs can learn self-restraint through iterative self-reflection
Self-evaluation and self-prompting to improve the reliability of LLMs
In order to safely deploy Large Language Models (LLMs), they must be capable of dynamically adapting their behavior based on their level of knowledge and uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize the next-token likelihood, which does not teach the model to modulate its answer based on its level of uncertainty. In order to learn self-restraint, we devise a simple objective that encourages the model to produce generations that the model is confident in. To optimize this objective, we introduce ReSearch, an iterative search algorithm based on self-evaluation and self-prompting. Our method results in fewer hallucinations overall, both for known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to decline, when the model assesses that it cannot provide a response without a high proportion of hallucination.
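The core loop described above, i.e. generate, self-evaluate, and self-prompt for a revision, can be sketched as follows. The `generate` and `self_evaluate` callables stand in for LLM calls, and the abstention threshold and revision prompt are assumptions for illustration, not the paper's exact procedure.

```python
from typing import Callable

def research_loop(prompt: str,
                  generate: Callable[[str], str],
                  self_evaluate: Callable[[str, str], float],
                  rounds: int = 3,
                  abstain_below: float = 0.5) -> str:
    """Iteratively generate candidate answers, score each with the model's
    own confidence estimate, and decline when none is confident enough."""
    best_answer, best_score = "", 0.0
    for _ in range(rounds):
        answer = generate(prompt)
        score = self_evaluate(prompt, answer)  # confidence in [0, 1]
        if score > best_score:
            best_answer, best_score = answer, score
        # self-prompting step: ask for a revision keeping only confident claims
        prompt = (f"{prompt}\nRevise your answer, keeping only claims "
                  f"you are confident in:\n{answer}")
    if best_score < abstain_below:
        return "I cannot answer this reliably."
    return best_answer
```

With low-confidence scores the loop returns the abstention string rather than a guess, which is the "ability to decline" the abstract refers to.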
In-Context Learning for Text Classification with Many Labels
ROSA: Random Orthogonal Subspace Adaptation