Enning Yang

Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction

Yusong Wu

Stephen Brade

Aleksandra Teng Ma

Tia-Jane Fowler

Enning Yang

Berker Banar

Aaron Courville

Natasha Jaques

Cheng-Zhi Anna Huang

Most applications of generative AI involve a sequential interaction in which a person inputs a prompt and waits for a response, and where reaction time and adaptivity are not important factors. In contrast, live jamming is a collaborative interaction that requires real-time coordination and adaptation without access to the other player’s future moves, while preserving diversity to sustain a creative flow. Reinforcement learning post-training enables effective adaptation through on-policy interaction, yet it often reduces output diversity by exploiting coherence-based rewards. This collapse, known as "reward hacking", affects many RL post-training pipelines, but is especially harmful in live jamming, where musical creativity relies on dynamic variation and mutual responsiveness. In this paper, we propose a novel adversarial training method on policy-generated trajectories to mitigate reward hacking in RL post-training for melody-to-chord accompaniment. A co-evolving discriminator separates policy trajectories from the data distribution, while the policy maximizes the discriminator output in addition to coherence rewards to prevent collapse to trivial outputs. We evaluate accompaniment quality and output diversity in simulation with both fixed test melodies and learned melody agents, and we conduct a user study with the model deployed in a real-time interactive system with expert musicians. Quantitative evaluation and user feedback demonstrate improved output diversity, harmonic coherence, adaptation speed and user agency. Our results demonstrate a simple yet effective method to mitigate reward hacking in RL post-training of generative sequence models.

2025-12-31

International Conference on Learning Representations (Accept (Poster))

doi.org

openreview.net

The default network dominates neural responses to evolving movie stories

Enning Yang

Filip Milisav

Jakub Kopal

Avram J. Holmes

Georgios D. Mitsis

Bratislav Misic

Emily S. Finn

Danilo Bzdok

Neuroscientific studies exploring real-world dynamic perception often overlook the influence of continuous changes in narrative content. In our research, we utilize machine learning tools for natural language processing to examine the relationship between movie narratives and neural responses. By analyzing over 50,000 brain images of participants watching Forrest Gump from the studyforrest dataset, we find distinct brain states that capture unique semantic aspects of the unfolding story. The default network, associated with semantic information integration, is the most engaged during movie watching. Furthermore, we identify two mechanisms that underlie how the default network liaises with the amygdala and hippocampus. Our findings demonstrate effective approaches to understanding neural processes in everyday situations and their relation to conscious awareness.

2023-07-13

Nature Communications (published)

doi.org

Bringing language to dynamic brain states: the default network dominates neural responses to evolving movie stories

Enning Yang

Filip Milisav

Jakub Kopal

Avram J. Holmes

Georgios D. Mitsis

Bratislav Misic

Emily S. Finn

Danilo Bzdok

Naturalistic neuroscience opened the door to new insights into neural circuits that serve real-world dynamic perception. Such studies have often neglected the rich texture of the movie narrative itself, but semantic content can be used to contextualize the induced neural responses. Here, we translated natural language processing tools from machine learning to characterize brain states estimated from hidden Markov models. Our analytical strategy allowed pitting shallow unimodal against the deep associative brain network layers in explaining how semantic content of the movie links to observed neural activity. Pooling information across >53,000 brain image time points watching Forrest Gump, we could show that distinct dynamic brain states capture unique semantic facets along the unfolding movie narrative. The spatiotemporal dynamics of brain states explicitly captured subject-level responses throughout the brain network hierarchy. Across all analyses, the default network was most intimately linked to semantic information integration, and this neural system switched online for longest durations during movie watching. Further, we identified and described two mechanisms of how the default network liaises dynamically with microanatomically defined subregion partners: the amygdala and the hippocampus. Our study thus unlocks the potential of natural language processing to explore neural processes in everyday life situations that engage key aspects of conscious awareness.

2022-08-23

bioRxiv (accepted)

doi.org
