Portrait of Minsu Kim

Minsu Kim

Alumni

Publications

Improved Off-policy Reinforcement Learning in Biological Sequence Design
Designing biological sequences with desired properties is challenging due to vast search spaces and limited evaluation budgets. Although reinforcement learning methods use proxy models for rapid reward evaluation, insufficient training data can cause proxy misspecification on out-of-distribution inputs. To address this, we propose a novel off-policy search, …
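
To make the failure mode concrete, here is a minimal, hypothetical Python toy (not the paper's method): a proxy reward model fit by least squares on a few in-distribution points stays accurate locally but confidently overestimates reward out of distribution, which is exactly the misspecification a reward-maximizing searcher will exploit.

def true_reward(x):
    return -(x - 0.2) ** 2  # unknown oracle; expensive to query in practice

xs = [0.0, 0.1, 0.2]                 # limited in-distribution training data
ys = [true_reward(x) for x in xs]

# Ordinary least-squares line as a stand-in for the learned proxy model.
n, sx, sy = len(xs), sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def proxy_reward(x):
    return slope * x + intercept

for x in (0.2, 1.0):                 # in-distribution vs. far-OOD query
    print(f"x={x}: proxy={proxy_reward(x):+.3f}  true={true_reward(x):+.3f}")
# A searcher trusting the proxy is pulled toward x=1.0, where the proxy is
# confidently wrong: this is proxy misspecification on OOD inputs.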
Outsourced Diffusion Sampling: Efficient Posterior Inference in Latent Spaces of Generative Models
Any well-behaved generative model over a variable …
Active Attacks: Red-teaming LLMs via Adaptive Environments
We address the challenge of generating diverse attack prompts for large language models (LLMs) that elicit harmful behaviors (e.g., insults, sexual content) and are used for safety fine-tuning. Rather than relying on manual prompt engineering, attacker LLMs can be trained with reinforcement learning (RL) to automatically generate such prompts using only a toxicity classifier as a reward. However, capturing a wide range of harmful behaviors is a significant challenge that requires explicit diversity objectives. Existing diversity-seeking RL methods often collapse to limited modes: once high-reward prompts are found, exploration of new regions is discouraged. Inspired by the active learning paradigm that encourages adaptive exploration, we introduce Active Attacks, a novel RL-based red-teaming algorithm that adapts its attacks as the victim evolves. By periodically safety fine-tuning the victim LLM with collected attack prompts, rewards in exploited regions diminish, which forces the attacker to seek unexplored vulnerabilities. This process naturally induces an easy-to-hard exploration curriculum, where the attacker progresses beyond easy modes toward increasingly difficult ones. As a result, Active Attacks uncovers a wide range of local attack modes step by step, and their combination achieves wide coverage of the multi-mode distribution. Active Attacks, a simple plug-and-play module that seamlessly integrates into existing RL objectives, unexpectedly outperformed prior RL-based methods, including GFlowNets, PPO, and REINFORCE, by improving cross-attack success rates against GFlowNets, the previous state-of-the-art, from 0.07% to 31.28% (a relative gain greater than …
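
A minimal, hypothetical Python sketch of the loop described above, with toy stand-ins (discrete attack "modes" for prompts, and a scalar vulnerability table that plays both the victim and the toxicity-classifier reward); none of these names come from the paper:

import math
import random

NUM_MODES = 8
vulnerability = [1.0] * NUM_MODES  # victim's weakness per mode (toy reward)
logits = [0.0] * NUM_MODES         # attacker policy parameters

def sample_mode():
    weights = [math.exp(l) for l in logits]
    return random.choices(range(NUM_MODES), weights=weights, k=1)[0]

collected = set()
for step in range(1, 2001):
    mode = sample_mode()
    reward = vulnerability[mode]   # stand-in for the toxicity classifier
    logits[mode] += 0.1 * reward   # crude REINFORCE-style update
    collected.add(mode)
    if step % 500 == 0:
        # Periodic "safety fine-tuning": patch the victim on every mode the
        # attacker has exploited, so rewards there diminish and the attacker
        # is pushed toward unexplored vulnerabilities (easy-to-hard curriculum).
        for m in collected:
            vulnerability[m] *= 0.1
        collected.clear()

print("remaining vulnerabilities:", [round(v, 2) for v in vulnerability])

Running this, the attacker cycles through modes one after another as each exploited region is patched, which is the coverage mechanism the abstract describes.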
Adaptive Cyclic Diffusion for Inference Scaling
Gyubin Lee
Truong Nhat Nguyen Bao
Jaesik Yoon
Dongwoo Lee
Adaptive Inference-Time Scaling via Cyclic Diffusion Search
Gyubin Lee
Truong Nhat Nguyen Bao
Jaesik Yoon
Dongwoo Lee
Diffusion models have demonstrated strong generative capabilities across domains ranging from image synthesis to complex reasoning tasks. However, most inference-time scaling methods rely on fixed denoising schedules, limiting their ability to adaptively allocate computation based on instance difficulty or task-specific demands. We introduce the challenge of adaptive inference-time scaling (dynamically adjusting computational effort during inference) and propose Adaptive Bi-directional Cyclic Diffusion (ABCD), a flexible, search-based inference framework. ABCD refines outputs through bi-directional diffusion cycles while adaptively controlling exploration depth and termination. It comprises three components: Cyclic Diffusion Search, Automatic Exploration-Exploitation Balancing, and Adaptive Thinking Time. Experiments show that ABCD improves performance across diverse tasks while maintaining computational efficiency.
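
A minimal, hypothetical Python sketch of the search pattern the abstract outlines: each cycle re-noises the incumbent (forward pass), denoises it (backward pass), keeps the best-scoring candidate, and stops adaptively once extra cycles stop improving the score. The denoiser, noiser, and scorer here are one-dimensional toys, not the paper's models.

import random

def denoise(x):
    return x + 0.5 * (1.0 - x)          # toy "reverse diffusion" step

def renoise(x, scale=0.3):
    return x + random.gauss(0.0, scale)  # toy "forward diffusion" step

def score(x):
    return -abs(1.0 - x)                 # higher is better

def cyclic_diffusion_search(x0, max_cycles=50, patience=5):
    best_x, best_s = x0, score(x0)
    stale = 0
    for _ in range(max_cycles):
        x = denoise(renoise(best_x))     # one bi-directional cycle
        s = score(x)
        if s > best_s:
            best_x, best_s, stale = x, s, 0  # exploit the improvement
        else:
            stale += 1                       # keep exploring a little longer
        if stale >= patience:                # adaptive "thinking time": stop
            break
    return best_x

print(cyclic_diffusion_search(0.0))

The patience counter is a crude stand-in for adaptive termination: easy instances converge in a few cycles, hard ones earn more compute.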
Self-Evolving Curriculum for LLM Reasoning
Alex Piché
Nicolas Gontier
Ehsan Kamalloo
Search-Based Correction of Reasoning Chains for Language Models
Jean-Pierre R. Falet
Oliver E. Richardson
Moksh J. Jain
Sungsoo Ahn
Offline Model-Based Optimization: Comprehensive Review
Jiayao Gu
Zixuan Liu
Can Chen