Tim Cooijmans

Adaptive Accompaniment with ReaLchords

Yusong Wu

Kyle Kastner

Adam Roberts

Ian Simon

Alexander Scarlatos

Chris Donahue

Cassie Tarakajian

Shayegan Omidshafiei

Pablo Samuel Castro

Natasha Jaques

Anna (Cheng-Zhi) Huang

Jamming requires coordination, anticipation, and collaborative creativity between musicians. Current generative models of music produce expressive output but are not able to generate in an online manner, meaning simultaneously with other musicians (human or otherwise). We propose ReaLchords, an online generative model for improvising chord accompaniment to user melody. We start with an online model pretrained by maximum likelihood, and use reinforcement learning to finetune the model for online use. The finetuning objective leverages both a novel reward model that provides feedback on both harmonic and temporal coherency between melody and chord, and a divergence term that implements a novel type of distillation from a teacher model that can see the future melody. Through quantitative experiments and listening tests, we demonstrate that the resulting model adapts well to unfamiliar input and produces fitting accompaniment. ReaLchords opens the door to live jamming, as well as simultaneous co-creation in other modalities.

2024-07-08

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

Best Response Shaping

Juan Agustin Duque

Shunichi Akatsuka

We investigate the challenge of multi-agent deep reinforcement learning in partially competitive environments, where traditional methods struggle to foster reciprocity-based cooperation. LOLA and POLA agents learn reciprocity-based cooperative policies by differentiation through a few look-ahead optimization steps of their opponent. However, there is a key limitation in these techniques. Because they consider a few optimization steps, a learning opponent that takes many steps to optimize its return may exploit them. In response, we introduce a novel approach, Best Response Shaping (BRS), which differentiates through an opponent approximating the best response, termed the "detective." To condition the detective on the agent's policy for complex games, we propose a state-aware differentiable conditioning mechanism, facilitated by a question answering (QA) method that extracts a representation of the agent based on its behaviour on specific environment states. To empirically validate our method, we showcase its enhanced performance against a Monte Carlo Tree Search (MCTS) opponent, which serves as an approximation to the best response in the Coin Game. This work expands the applicability of multi-agent RL in partially competitive environments and provides a new pathway towards achieving improved social welfare in general-sum games.

2024-05-14

rl-conference.cc/RLC/2024/Conference (published)

LOQA: Learning with Opponent Q-Learning Awareness

Juan Agustin Duque

In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper we introduce Learning with Opponent Q-Learning Awareness (LOQA), a novel reinforcement learning algorithm tailored to optimizing an agent's individual utility while fostering cooperation among adversaries in partially competitive environments. LOQA assumes that each agent samples actions proportionally to their action-value function Q. Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. LOQA achieves these outcomes with a significantly reduced computational footprint compared to previous works, making it a promising approach for practical multi-agent applications.

2024-01-16

ICLR.cc/2024/Conference (poster)
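
The LOQA abstract above states that each agent is assumed to sample actions proportionally to its action-value function Q. A minimal sketch of that assumption, read literally, is given here. It is not the authors' implementation: the function names are hypothetical, the Q-values are assumed non-negative, and the actual method may normalize Q differently.

```python
import random


def action_probabilities(q_values):
    """Return action probabilities proportional to the given Q-values.

    Assumption of this sketch: all Q-values are non-negative.
    """
    if any(q < 0 for q in q_values):
        raise ValueError("this sketch assumes non-negative Q-values")
    total = sum(q_values)
    if total == 0:
        # No action is preferred, so fall back to a uniform distribution.
        return [1.0 / len(q_values)] * len(q_values)
    return [q / total for q in q_values]


def sample_action(q_values, rng=random):
    """Sample an action index with probability proportional to its Q-value."""
    probs = action_probabilities(q_values)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 3.0]  # hypothetical Q-values for two actions
    print(action_probabilities(q))
    print(sample_action(q, random.Random(0)))
```

Because the opponent's policy is written as an explicit function of Q, an agent that models the opponent this way can reason about how its own behaviour shifts the opponent's action probabilities.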

Meta-Value Learning: a General Framework for Learning with Learning Awareness

2023-07-17

ArXiv (preprint)

doi.org

Learning with Learning Awareness using Meta-Values

2023-06-19

ICML.cc/2023/Workshop/Frontiers4LCD (published)

MIDI-DDSP: Detailed Control of Musical Performance via Hierarchical Modeling

Yusong Wu

Ethan Manilow

Yi Deng

Rigel Swavely

Kyle Kastner

Anna (Cheng-Zhi) Huang

Jesse Engel

Musical expression requires control of both what notes are played, and how they are performed. Conventional audio synthesizers provide detailed expressive controls, but at the cost of realism. Black-box neural audio synthesis and concatenative samplers can produce realistic audio, but have few mechanisms for control. In this work, we introduce MIDI-DDSP, a hierarchical model of musical instruments that enables both realistic neural audio synthesis and detailed user control. Starting from interpretable Differentiable Digital Signal Processing (DDSP) synthesis parameters, we infer musical notes and high-level properties of their expressive performance (such as timbre, vibrato, dynamics, and articulation). This creates a 3-level hierarchy (notes, performance, synthesis) that affords individuals the option to intervene at each level, or utilize trained priors (performance given notes, synthesis given performance) for creative assistance. Through quantitative experiments and listening tests, we demonstrate that this hierarchy can reconstruct high-fidelity audio, accurately predict performance attributes for a note sequence, independently manipulate the attributes of a given performance, and as a complete system, generate realistic audio from a novel note sequence. By utilizing an interpretable hierarchy, with multiple levels of granularity, MIDI-DDSP opens the door to assistive tools to empower individuals across a diverse range of musical experience.

2022-01-28

ICLR.cc/2022/Conference (oral)

Harmonic Recomposition using Conditional Autoregressive Modeling

We demonstrate a conditional autoregressive pipeline for efficient music recomposition, based on methods presented in van den Oord et al. (2017). Recomposition (Casal & Casey, 2010) focuses on reworking existing musical pieces, adhering to structure at a high level while also re-imagining other aspects of the work. This can involve reuse of pre-existing themes or parts of the original piece, while also requiring the flexibility to generate new content at different levels of granularity. Applying the aforementioned modeling pipeline to recomposition, we show diverse and structured generation conditioned on chord sequence annotations.

2018-11-18

ArXiv (preprint)

arxiv.org

Recurrent Batch Normalization

We propose a reparameterization of LSTM that brings the benefits of batch normalization to recurrent neural networks. Whereas previous works only apply batch normalization to the input-to-hidden transformation of RNNs, we demonstrate that it is both possible and beneficial to batch-normalize the hidden-to-hidden transition, thereby reducing internal covariate shift between time steps. We evaluate our proposal on various sequential problems such as sequence classification, language modeling and question answering. Our empirical results show that our batch-normalized LSTM consistently leads to faster convergence and improved generalization.

2017-01-01

ICLR.cc/2017/conference (poster)
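
The Recurrent Batch Normalization abstract above describes batch-normalizing the hidden-to-hidden transition in addition to the input-to-hidden transformation. A rough pure-Python sketch of that idea for a single gate pre-activation is given here. It is an illustration under assumptions, not the paper's code: the helper names, the fixed scale and shift values, and the plain-list representation are all hypothetical, and a real implementation would use learned parameters, a tensor library, and population statistics at test time.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature across the batch, then scale and shift."""
    n, dims = len(batch), len(batch[0])
    out = [[0.0] * dims for _ in range(n)]
    for j in range(dims):
        column = [row[j] for row in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            out[i][j] = gamma * (batch[i][j] - mean) / math.sqrt(var + eps) + beta
    return out


def matvec_batch(batch, weight):
    """Apply the same weight matrix (list of rows) to every vector in the batch."""
    return [[sum(w * v for w, v in zip(w_row, row)) for w_row in weight]
            for row in batch]


def bn_preactivation(x_batch, h_batch, w_x, w_h, bias):
    """Pre-activation with separately normalized input and recurrent terms."""
    bn_x = batch_norm(matvec_batch(x_batch, w_x))  # input-to-hidden term
    bn_h = batch_norm(matvec_batch(h_batch, w_h))  # hidden-to-hidden term
    return [[a + b + c for a, b, c in zip(row_x, row_h, bias)]
            for row_x, row_h in zip(bn_x, bn_h)]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]  # hypothetical inputs, batch of 2
    h = [[0.5, 0.0], [0.0, 0.5]]  # hypothetical previous hidden states
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(bn_preactivation(x, h, identity, identity, [0.0, 0.0]))
```

Normalizing the two terms separately lets each have its own statistics, which is one way to read the abstract's claim that the hidden-to-hidden transition can be normalized on its own.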