
Aristides Milios

Lab Representative
PhD student
Supervisor: Chris Pal
Research Topics: Natural Language Processing

Biography

I'm a passionate machine learning researcher and PhD student at Université de Montréal & MILA, specializing in Natural Language Processing and Vision/Text Foundation Models. My research focuses on investigating the real-world reasoning and self-improvement abilities of large language models, especially in the context of language models as agents and tool-use.

Previously, I completed my M.Sc. at McGill University under the supervision of Dr. Siva Reddy and Dr. Dzmitry Bahdanau (researching the use of LLMs together with dense retrieval models for in-context demonstration selection), and gained industry experience as a Research Intern at ServiceNow (researching self-evaluation and self-improvement, with a focus on encouraging conciseness on topics the LLM is likely to hallucinate about). Having started a PhD under Dr. Chris Pal at UdeM in September 2024, I aim to apply LLMs to real-world use cases, such as interactive, iterative dialogue-based design assistants, and to continue investigating the models' ability to self-improve through self-evaluation.

Publications

Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
An AI system for professional floor plan design needs to precisely control room dimensions and areas (quantitative constraints) while also balancing functional considerations and design aesthetics. Existing generative approaches focus primarily on respecting the requested connectivity between rooms, but do not support generating floor plans with numerical constraints. We introduce a text-based floor plan generation approach that fine-tunes a large language model (LLM) on real plans and then applies reinforcement learning with verifiable rewards (RLVR) to enforce both numerical (areas, dimensions) and spatial (topological) constraints. Furthermore, we design a set of constraint adherence metrics to systematically measure how well generated floor plans align with user-defined constraints. Our model generates floor plans that satisfy numerical constraints and outperforms existing methods on realism, compatibility, and diversity scores. Specifically, our approach leads to up to a 94% reduction in compatibility score. Our results demonstrate that LLMs can effectively handle quantitative constraints in structured design tasks, suggesting broader applications for text-based generative modeling.
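To make the "verifiable rewards" idea concrete, here is a minimal sketch of what a numerical-constraint reward could look like: it scores how closely generated room areas match requested target areas. This is an illustration only, not the paper's implementation; the function name, the tolerance, and the linear falloff are all assumptions.

```python
def area_reward(rooms, constraints, tol=0.05):
    """Illustrative verifiable reward for area constraints (not the
    paper's actual reward). `rooms` maps room name -> (width, height);
    `constraints` maps room name -> target area. Returns the mean
    per-room adherence score in [0, 1].
    """
    scores = []
    for name, target in constraints.items():
        if name not in rooms:
            scores.append(0.0)  # requested room is missing: no credit
            continue
        w, h = rooms[name]
        rel_err = abs(w * h - target) / target
        # full credit within the tolerance band, then a linear falloff
        # reaching zero at 100% relative error
        scores.append(max(0.0, 1.0 - max(0.0, rel_err - tol) / (1.0 - tol)))
    return sum(scores) / len(scores)
```

Because the score is computed deterministically from the generated plan, it can serve directly as an RLVR-style reward signal with no learned reward model.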
LLMs can learn self-restraint through iterative self-reflection
Self-evaluation and self-prompting to improve the reliability of LLMs
In order to safely deploy Large Language Models (LLMs), they must be capable of dynamically adapting their behavior based on their level of knowledge and the uncertainty associated with specific topics. This adaptive behavior, which we refer to as self-restraint, is non-trivial to teach, since it depends on the internal knowledge of an LLM. By default, LLMs are trained to maximize next-token likelihood, which does not teach the model to modulate its answers based on its level of uncertainty. In order to learn self-restraint, we devise a simple objective that encourages the model to produce generations it is confident in. To optimize this objective, we introduce ReSearch, an iterative search algorithm based on self-evaluation and self-prompting. Our method results in fewer hallucinations overall, both for known and unknown topics, as the model learns to selectively restrain itself. In addition, our method elegantly incorporates the ability to decline when the model assesses that it cannot provide a response without a high proportion of hallucination.
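The iterate-and-self-evaluate loop with the option to decline can be sketched as follows. This is a simplified illustration of the general pattern, not the ReSearch algorithm itself; `generate` and `self_evaluate` stand in for LLM calls, and the round count and confidence threshold are assumed parameters.

```python
def self_restraint_loop(prompt, generate, self_evaluate,
                        rounds=3, threshold=0.8):
    """Illustrative iterative self-evaluation search (not the paper's
    ReSearch algorithm). `generate(prompt)` returns a candidate answer;
    `self_evaluate(prompt, answer)` returns a confidence score in
    [0, 1]. Keeps the best-scoring candidate across rounds and declines
    to answer if no candidate is confident enough.
    """
    best, best_score = None, -1.0
    for _ in range(rounds):
        candidate = generate(prompt)
        score = self_evaluate(prompt, candidate)
        if score > best_score:
            best, best_score = candidate, score
        if best_score >= threshold:
            break  # confident enough; stop searching
    if best_score < threshold:
        return "I don't know."  # abstain rather than risk hallucinating
    return best
```

The key design choice is the abstention branch: when self-evaluation never clears the threshold, the loop returns a refusal instead of its best guess, trading coverage for reliability.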
In-Context Learning for Text Classification with Many Labels
ROSA: Random Orthogonal Subspace Adaptation