Publications

Findings of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
Akshita Bhagia
Marta R. Costa-jussà
Jesse Dodge
Fahim Faisal
Christian Federmann
Natalia N. Fedorova
Francisco S. Guzmán
Sergey Koshelev
Jean Maillard
Vukosi Marivate
Jonathan Mbuya
Alexandre Mourachko
Safiyyah Saleem
Guillaume Wenzek
We present the results of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages. The shared task included both a data and a systems track, along with additional innovations, such as a focus on African languages and extensive human evaluation of submitted systems. We received 14 system submissions from 8 teams, as well as 6 data track contributions. We report large progress in the quality of translation for African languages since the last iteration of this shared task: there is an increase of about 7.5 BLEU points across 72 language pairs, and the average BLEU score went from 15.09 to 22.60.
Flexible Diffusion Modeling of Long Videos
William Harvey
Saeid Naderiparizi
Vaden Masrani
Christian Dietrich Weilbach
Frank N. Wood
We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA autonomous driving simulator.
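The idea of sampling arbitrary frame subsets conditioned on previously sampled frames can be sketched as a schedule generator. This is an illustrative toy, not the paper's actual schedule: a coarse pass of sparse keyframes followed by batches that fill the gaps, each conditioned on frames already sampled.

```python
def hierarchical_schedule(num_frames, max_batch):
    """Yield (frames_to_sample, frames_to_condition_on) pairs covering a
    long video. Coarse-to-fine: sample sparse keyframes first, then fill
    the gaps in batches, conditioning on already-sampled frames. The
    concrete schedule is a hypothetical example of the kind of schedule
    the paper's framework lets one compare and optimize."""
    sampled = set()
    # Stage 1: sparse keyframes, sampled unconditionally.
    stride = max(1, num_frames // max_batch)
    keyframes = list(range(0, num_frames, stride))[:max_batch]
    yield keyframes, []
    sampled.update(keyframes)
    # Stage 2: fill remaining frames, conditioning each batch on a
    # bounded set of previously sampled frames (sparse conditioning).
    remaining = [i for i in range(num_frames) if i not in sampled]
    for start in range(0, len(remaining), max_batch):
        batch = remaining[start:start + max_batch]
        cond = sorted(sampled)[:max_batch]
        yield batch, cond
        sampled.update(batch)
```

Every frame is visited exactly once, and each diffusion call handles at most `max_batch` frames, which is what makes very long videos tractable.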
Forgetting Enhances Episodic Control With Structured Memories
Blake A. Richards
Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required based on limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairment, provided it uses mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a performance benefit compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and infrequently visited states on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.
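A minimal sketch of an episodic-control memory with forgetting, assuming a simple least-recently-used eviction rule as a stand-in for the paper's age/usage-based forgetting (the paper's actual model and representations differ):

```python
from collections import OrderedDict

class EpisodicCache:
    """Bounded episodic-control memory (illustrative sketch).

    States map to the best return observed from them; when the cache
    exceeds capacity, the least-recently-used memory is forgotten. This
    mimics the setup where an agent forgets older memories while keeping
    frequently useful ones."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = OrderedDict()  # state -> best observed return

    def write(self, state, value):
        # Episodic control keeps the best return seen for a state.
        best = max(value, self.memory.get(state, float("-inf")))
        self.memory[state] = best
        self.memory.move_to_end(state)
        if len(self.memory) > self.capacity:
            self.memory.popitem(last=False)  # forget the stalest memory

    def read(self, state, default=0.0):
        if state in self.memory:
            self.memory.move_to_end(state)  # reading refreshes recency
            return self.memory[state]
        return default
```

With structured state representations, nearby states share value estimates, which is why (per the paper) such a bounded cache can forget aggressively without hurting performance.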
S5 Framework: A Review of Self-Supervised Shared Semantic Space Optimization for Multimodal Zero-Shot Learning
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Mirella Lapata
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph Turian
Ting Chen
Simon Kornblith
Mohammad Norouzi
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Zihang Dai
Hanxiao Liu
Quoc V. Le
Jia Deng
Wei Dong
Richard Socher
Li-Jia Li
Kai Li
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Jesse Dodge
Maarten Sap
Ana Marasovic
William Agnew
Gabriel Ilharco
Dirk Groeneveld
Matt Gardner
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu Wang
Jianfeng Gao
Ming Zhou
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Da-cheng Juan
Chun-Ta Lu
Zhen Li
Futang Peng
Aleksei Timofeev
Yi-Ting Chen
Yaxi Gao
Tom Duerig
Andrew Tomkins
Sujith Ravi
Lukasz Kaiser
Aidan N. Gomez
Noam M. Shazeer
Ashish Vaswani
Niki Parmar
Llion Jones
Jakob Uszkoreit
Alex G. Kendall
Yarin Gal
Roberto Cipolla
Salman H. Khan
Muzammal Naseer
Munawar Hayat
Waqas Zamir
Fahad Shahbaz Khan
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
Joshua Kravitz
Stephanie Chen
Mike Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdelrahman Mohamed
Omer Levy
Luke Zettlemoyer
Bohan Li
Hao Zhou
Junxian He
Mingxuan Wang
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
In this review, we aim to inspire research into Self-Supervised Shared Semantic Space (S5) multimodal learning problems. We equip non-expert researchers with a framework of informed modeling decisions via an extensive literature review, an actionable modeling checklist, as well as a series of novel zero-shot evaluation tasks. The core idea for our S5 checklist lies in learning contextual multimodal interactions at various granularity levels via a shared Transformer encoder with a denoising loss term, which is also regularized by a contrastive loss term to induce a semantic alignment prior on the contextual embedding space. Essentially, we aim to model human concept understanding and thus learn to “put a name to a face”. This ultimately enables interpretable zero-shot S5 generalization on a variety of novel downstream tasks. In summary, this review provides sufficient background and actionable strategies for training cutting-edge S5 multimodal networks.
FusionRetro: Molecule Representation Fusion via In-Context Learning for Retrosynthetic Planning
Songtao Liu
Rex Ying
Peilin Zhao
Lu Lin
Dinghao Wu
Retrosynthetic planning aims to devise a complete multi-step synthetic route from starting materials to a target molecule. Current strategies use a decoupled approach of single-step retrosynthesis models and search algorithms, taking only the product as the input to predict the reactants for each planning step and ignoring valuable context information along the synthetic route. In this work, we propose a novel framework that utilizes context information for improved retrosynthetic planning. We view synthetic routes as reaction graphs and propose to incorporate context through three principled steps: encode molecules into embeddings, aggregate information over routes, and readout to predict reactants. Our approach is the first attempt to utilize in-context learning for retrosynthesis prediction in retrosynthetic planning. The entire framework can be efficiently optimized in an end-to-end fashion and produce more practical and accurate predictions. Comprehensive experiments demonstrate that by fusing in the context information over routes, our model significantly improves the performance of retrosynthetic planning over baselines that are not context-aware, especially for long synthetic routes. Code is available at https://github.com/SongtaoLiu0823/FusionRetro.
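The encode / aggregate / readout pipeline can be sketched with toy stand-ins: a character-hash encoder instead of the learned molecule encoder, mean pooling instead of learned in-context aggregation, and a dot-product scorer instead of the learned readout head. Everything here is a hypothetical illustration of the three-step structure, not the paper's model.

```python
def encode(molecule, dim=8):
    """Toy molecule encoder: hash characters of a SMILES-like string
    into a fixed-size count vector (stands in for a learned encoder)."""
    vec = [0.0] * dim
    for ch in molecule:
        vec[ord(ch) % dim] += 1.0
    return vec

def aggregate(route_embeddings):
    """Aggregate context along the synthetic route (mean pooling here;
    the paper learns the aggregation over the route graph)."""
    n = len(route_embeddings)
    dim = len(route_embeddings[0])
    return [sum(e[i] for e in route_embeddings) / n for i in range(dim)]

def readout(context, candidates):
    """Score candidate reactants against the route context and return
    the best match (dot product stands in for a learned readout)."""
    def score(c):
        e = encode(c, dim=len(context))
        return sum(a * b for a, b in zip(context, e))
    return max(candidates, key=score)
```

The point of the structure is that `readout` sees the whole route's context, not just the current product, which is what makes the prediction context-aware.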
GANSpiration: Balancing Targeted and Serendipitous Inspiration in User Interface Design with Style-Based Generative Adversarial Network
Mohammad Amin Mozaffari
Xinyuan Zhang
Jinghui Cheng
Jin L.C. Guo
Inspiration from design examples plays a crucial role in the creative process of user interface design. However, current tools and techniques that support inspiration usually only focus on example browsing with limited user control or similarity-based example retrieval, leading to undesirable design outcomes such as focus drift and design fixation. To address these issues, we propose the GANSpiration approach that suggests design examples for both targeted and serendipitous inspiration, leveraging a style-based Generative Adversarial Network. A quantitative evaluation revealed that the outputs of GANSpiration-based example suggestion approaches are relevant to the input design, and at the same time include diverse instances. A user study with professional UI/UX practitioners showed that the examples suggested by our approach serve as viable sources of inspiration for overall design concepts and specific design elements. Overall, our work paves the way for using advanced generative machine learning techniques to support creative design practice.
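The targeted-versus-serendipitous balance can be sketched as a retrieval policy over design-example embeddings. This is a simplified illustration of the idea only, not the paper's StyleGAN-based method: nearest neighbours supply targeted inspiration, random picks from the remaining pool supply serendipity.

```python
import random

def suggest_examples(query, examples, k_targeted=2, k_serendip=2, seed=0):
    """Suggest design examples mixing targeted and serendipitous picks.

    `examples` is a list of dicts with an "embedding" key (hypothetical
    format). Nearest neighbours of the query embedding are 'targeted';
    random draws from the rest of the pool are 'serendipitous'."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ranked = sorted(examples, key=lambda e: dist(query, e["embedding"]))
    targeted = ranked[:k_targeted]
    pool = ranked[k_targeted:]
    rng = random.Random(seed)  # seeded for reproducibility
    serendipitous = rng.sample(pool, min(k_serendip, len(pool)))
    return targeted, serendipitous
```

Returning both sets side by side is what keeps the designer anchored to the current concept while still exposing them to distant examples that counteract fixation.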
A general class of surrogate functions for stable and efficient reinforcement learning
Olivier Bachem
Robert Müller
Shivam Garg
Matthieu Geist
Marlos C. Machado
Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives rise to an entire family of surrogate functions. We construct surrogate functions that enable policy improvement guarantees, a property not shared by most existing surrogate functions. Crucially, these guarantees hold regardless of the choice of policy parameterization. Moreover, a particular instantiation of FMA-PG recovers important implementation heuristics (e.g., using forward vs reverse KL divergence) resulting in a variant of TRPO with additional desirable properties. Via experiments on simple bandit problems, we evaluate the algorithms instantiated by FMA-PG. The proposed framework also suggests an improved variant of PPO, whose robustness and efficiency we empirically demonstrate on the MuJoCo suite.
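A generic member of this surrogate family can be sketched on a bandit: expected reward under the new policy minus a KL penalty keeping it close to the old policy. This is a standard mirror-ascent-style surrogate used for illustration, not the paper's exact FMA-PG objective.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def surrogate(logits, old_probs, rewards, lam=0.1):
    """KL-regularized bandit surrogate (illustrative instance of the
    family the paper studies): expected reward under softmax(logits)
    minus lam * KL(old || new), penalizing movement from the old policy."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    kl = sum(po * math.log(po / p) for po, p in zip(old_probs, probs))
    return expected - lam * kl

def ascent_step(logits, old_probs, rewards, lr=0.5, eps=1e-5):
    """One gradient-ascent step on the surrogate, using finite
    differences so the sketch stays dependency-free."""
    grads = []
    for i in range(len(logits)):
        bumped = logits[:]
        bumped[i] += eps
        g = (surrogate(bumped, old_probs, rewards)
             - surrogate(logits, old_probs, rewards)) / eps
        grads.append(g)
    return [l + lr * g for l, g in zip(logits, grads)]
```

Maximizing such a surrogate (rather than the raw return) is what yields the monotone-improvement guarantees the paper analyzes; the choice of divergence and its direction is exactly the design axis FMA-PG makes explicit.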
Generating physically-consistent high-resolution climate data with hard-constrained neural networks
Prasanna Sattigeri
Campbell Watson
D. Szwarcman
The availability of reliable, high-resolution climate and weather data is important to inform long-term decisions on climate adaptation and mitigation and to guide rapid responses to extreme events. Forecasting models are limited by computational costs and therefore often can only make coarse-resolution predictions. Statistical downscaling can provide an efficient method of upsampling low-resolution data. In this field, deep learning has been applied successfully, often using image super-resolution methods from computer vision. Despite achieving visually compelling results in some cases, such models often violate conservation laws when predicting physical variables. In order to conserve important physical quantities, we develop methods that guarantee physical constraints are satisfied by a deep downscaling model while also increasing its performance according to traditional metrics. We introduce two ways of constraining the network: a renormalization layer added to the end of the neural network and a successive approach that scales with increasing upsampling factors. We show the applicability of our methods across different popular architectures and upsampling factors using ERA5 reanalysis data.
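The renormalization idea can be sketched as a post-hoc layer: rescale each upsampled patch so its mean matches the corresponding low-resolution cell, making the coarse field conserved exactly. The multiplicative form below is an assumption for illustration; the paper defines its own constraint layers.

```python
def renormalize(hi_res, lo_res, factor):
    """Hard-constraint renormalization sketch: for each coarse cell
    (I, J), rescale the corresponding factor x factor high-resolution
    patch so its mean equals lo_res[I][J]. The downscaled field then
    conserves the coarse quantity by construction."""
    out = [row[:] for row in hi_res]
    for I, row in enumerate(lo_res):
        for J, target in enumerate(row):
            patch = [(i, j)
                     for i in range(I * factor, (I + 1) * factor)
                     for j in range(J * factor, (J + 1) * factor)]
            mean = sum(out[i][j] for i, j in patch) / len(patch)
            scale = target / mean if mean != 0 else 0.0
            for i, j in patch:
                out[i][j] *= scale
    return out
```

Because the constraint is enforced by construction rather than by a loss term, it holds exactly at inference time, which is the point of the "hard-constrained" approach.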
GitHub repositories with links to academic papers: Public access, traceability, and evolution
Supatsara Wattanakriengkrai
Bodin Chinthanet
Hideaki Hata
Raula Gaikovina Kula
Christoph Treude
Jin L.C. Guo
Kenichi Matsumoto
Goal-driven optimization of single-neuron properties in artificial networks reveals regularization role of neural diversity and adaptation in the brain
Neurons in the brain have rich and adaptive input-output properties. Features such as diverse f-I curves and spike frequency adaptation are known to place single neurons in optimal coding regimes when facing changing stimuli. Yet, it is still unclear how brain circuits exploit single-neuron flexibility, and how network-level requirements may have shaped such cellular function. To answer this question, a multi-scale approach is needed where the computations of single neurons and of neural circuits must be considered as a complete system. In this work, we use artificial neural networks to systematically investigate single-neuron input-output adaptive mechanisms, optimized in an end-to-end fashion. Throughout the optimization process, each neuron has the liberty to modify its nonlinear activation function, parametrized to mimic the f-I curves of biological neurons, and to learn adaptation strategies that modify activation functions in real time during a task. We find that such networks show much-improved robustness to noise and changes in input statistics. Importantly, we find that this procedure recovers precise coding strategies found in biological neurons, such as gain scaling and fractional-order differentiation/integration. Using tools from dynamical systems theory, we analyze the role of these emergent single-neuron properties and argue that neural diversity and adaptation play an active regularization role that enables neural circuits to optimally propagate information across time.
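A parametrized f-I-style activation and a toy gain-scaling rule can sketch what "adaptive input-output properties" means here. The softplus form and the specific adaptation rule are assumptions for illustration; the paper's parametrization and learned adaptation strategies differ.

```python
import math

def fi_curve(current, gain, threshold):
    """f-I-like nonlinearity: firing rate as a softplus of the input
    current, with a per-neuron gain and threshold (hypothetical
    parametrization standing in for the paper's)."""
    return math.log1p(math.exp(gain * (current - threshold)))

def adapt_gain(gain, recent_inputs, target_std=1.0, lr=0.1):
    """Toy gain-scaling rule: shrink the gain when input fluctuations
    grow and raise it when they shrink, so the neuron's operating range
    tracks input statistics. Gain scaling is one of the biological
    coding strategies the paper reports emerging from optimization."""
    mean = sum(recent_inputs) / len(recent_inputs)
    var = sum((x - mean) ** 2 for x in recent_inputs) / len(recent_inputs)
    std = math.sqrt(var)
    return gain + lr * (target_std - std) * gain
```

In the paper's setting, both the curve's parameters and the adaptation strategy are optimized end-to-end with the rest of the network rather than hand-set as here.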
Gradient Descent Is Optimal Under Lower Restricted Secant Inequality And Upper Error Bound
The study of first-order optimization is sensitive to the assumptions made on the objective functions. These assumptions induce complexity classes which play a key role in worst-case analysis, including the fundamental concept of algorithm optimality. Recent work argues that strong convexity and smoothness, popular assumptions in literature, lead to a pathological definition of the condition number (Guille-Escuret et al., 2021). Motivated by this result, we focus on the class of functions satisfying a lower restricted secant inequality and an upper error bound. On top of being robust to the aforementioned pathological behavior and including some non-convex functions, this pair of conditions displays interesting geometrical properties. In particular, the necessary and sufficient conditions to interpolate a set of points and their gradients within the class can be separated into simple conditions on each sampled gradient. This allows the performance estimation problem (PEP, Drori and Teboulle (2012)) to be solved analytically, leading to a lower bound on the convergence rate that proves gradient descent to be exactly optimal on this class of functions among all first-order algorithms.
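The two defining conditions are easy to state concretely. In the scalar case, the lower restricted secant inequality reads ⟨∇f(x), x − x*⟩ ≥ μ‖x − x*‖² and the upper error bound reads ‖∇f(x)‖ ≤ L‖x − x*‖. A sketch that checks them on sample points and runs plain gradient descent (step size here chosen for the quadratic test case, not the paper's optimal tuning):

```python
def satisfies_conditions(grad, xstar, points, mu, L):
    """Check, at given sample points, the scalar forms of:
    - lower restricted secant inequality: grad(x)*(x - x*) >= mu*(x - x*)^2
    - upper error bound:                  |grad(x)| <= L*|x - x*|
    Small tolerances absorb floating-point noise."""
    for x in points:
        d = x - xstar
        if grad(x) * d < mu * d * d - 1e-12:
            return False
        if abs(grad(x)) > L * abs(d) + 1e-12:
            return False
    return True

def gradient_descent(grad, x0, step, iters):
    """Plain gradient descent, the method the paper proves optimal
    on this function class among first-order algorithms."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x
```

For example, f(x) = x² satisfies both conditions with μ = L = 2 around x* = 0, and gradient descent converges linearly on it.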
Graph-Based Active Machine Learning Method for Diverse and Novel Antimicrobial Peptides Generation and Selection
Bonaventure F. P. Dossou
Dianbo Liu
Almer M. van der Sloot
Roger Palou
Michael Tyers
As antibiotic-resistant bacterial strains are rapidly spreading worldwide, infections caused by these strains are emerging as a global crisis, causing the death of millions of people every year. Antimicrobial Peptides (AMPs) are one of the candidates to tackle this problem because of their potential diversity and ability to favorably modulate the host immune response. However, large-scale screening of new AMP candidates is expensive, time-consuming, and not affordable in developing countries, which need the treatments the most. In this work, we propose a novel active machine learning-based framework that statistically minimizes the number of wet-lab experiments needed to design new AMPs, while ensuring high diversity and novelty of generated AMP sequences, in multi-round wet-lab AMP screening settings. Combining recurrent neural network models and a graph-based filter (GraphCC), our proposed approach delivers novel and diverse candidates and demonstrates better performance according to our defined metrics.
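A graph-based diversity filter in the spirit of GraphCC can be sketched as: connect candidate sequences that are too similar, then keep one representative per connected component, so everything sent to the wet lab is mutually dissimilar. The distance measure and threshold below are illustrative assumptions, not the paper's definitions.

```python
def hamming(a, b):
    """Crude sequence distance: positionwise mismatches plus the
    length difference (stands in for a proper edit distance)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def diverse_candidates(sequences, threshold=2):
    """Keep one representative per connected component of the
    similarity graph (edges join sequences with distance < threshold),
    so the returned candidates are pairwise dissimilar."""
    n = len(sequences)
    parent = list(range(n))

    def find(x):
        # Union-find with path halving to track components.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if hamming(sequences[i], sequences[j]) < threshold:
                parent[find(i)] = find(j)
    reps = {}
    for i, s in enumerate(sequences):
        reps.setdefault(find(i), s)
    return list(reps.values())
```

Filtering before each wet-lab round is what lets the active-learning loop spend its limited experiment budget on genuinely distinct peptides.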