Publications

Efficient Continual Learning Ensembles in Neural Network Subspaces

Thang Doan

Seyed Iman Mirzadeh

Mehrdad Farajtabar

A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, the training and inference cost of ensembles can increase linearly with the number of models. Motivated by this limitation, we leverage the recent advances in the deep learning optimization literature, such as mode connectivity and neural network subspaces, to derive a new method that is both computationally advantageous and can outperform the state-of-the-art continual learning algorithms.

2021-12-31

arXiv.org (preprint)

dblp.uni-trier.de

Embedding Signals on Graphs with Unbalanced Diffusion Earth Mover's Distance

Dennis Shung

Manik Kuchroo

In modern relational machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains. Further, in many cases the target entities for analysis are actually signals on such graphs. We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph. Typically, EMD is computed by optimizing over the cost of transporting one probability distribution to another over an underlying metric space. However, this is inefficient when computing the EMD between many signals. Here, we propose an unbalanced graph EMD that efficiently embeds the unbalanced EMD on an underlying graph into an

2021-12-31

IEEE International Conference on Acoustics, Speech and Signal Processing (unknown)

doi.org

arxiv.org

Enhanced Biomedical Knowledge Discovery From Unstructured Text Using Contextual Embeddings

Iz Beltagy

Kyle Lo

Arman Cohan

Yoshua Bengio

Réjean Ducharme

P Vincent

Rishi Bommasani

Kelly Davis

Claire Cardie

Billy Chiu

Sampo Pyysalo

Ivan Vulić

Extracting knowledge from large, unstructured text corpora presents a challenge. Recently, authors have utilized unsupervised, static word embeddings to uncover "latent knowledge" contained within domain-specific scientific corpora. Here semantic-similarity measures between representations of concepts, objects or entities were used to predict relationships, which were later verified using physical methods. Static language models have recently been surpassed at most downstream tasks by massively pre-trained, contextual language models like BERT. Some have postulated that contextualized embeddings potentially yield word representations superior to static ones for knowledge-discovery purposes. In an effort to address this question, two biomedically-trained BERT models (BioBERT, SciBERT) were used to encode n = 500, 1000 or 5000 sentences containing words of interest extracted from a biomedical corpus (Coronavirus Open Research Dataset). The n representations for the words of interest were subsequently extracted and then aggregated to yield static-equivalent word representations. These words belonged to the vocabularies of intrinsic benchmarking tools for the biomedical domain (Bio-SimVerb and Bio-SimLex), which assess quality of word representations using semantic-similarity and relatedness measures. Using intrinsic benchmarking tasks, feasibility of using contextualized word representations for knowledge discovery tasks can be assessed: Word representations that better encode described reality are expected to perform better (i.e. closer to domain experts). As postulated, BERT embeddings outperform static counterparts

2021-12-31

(published)

www.semanticscholar.org

Equivariant Networks for Crystal Structures

Sékou-Oumar Kaba

Siamak Ravanbakhsh

Supervised learning with deep models has tremendous potential for applications in materials science. Recently, graph neural networks have been used in this context, drawing direct inspiration from models for molecules. However, materials are typically much more structured than molecules, which is a feature that these models do not leverage. In this work, we introduce a class of models that are equivariant with respect to crystalline symmetry groups. We do this by defining a generalization of the message passing operations that can be used with more general permutation groups, or that can alternatively be seen as defining an expressive convolution operation on the crystal graph. Empirically, these models achieve competitive results with state-of-the-art on property prediction tasks.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net

Extended Abstract Track

Amin Mansouri

Jason Hartford

Kartik Ahuja

Yoshua Bengio

Christian Shewmake

Simone Azeglio

Arianna Di Bernardo

Nina Miolane

2021-12-31

(published)

www.semanticscholar.org

Extracting Person Names from User Generated Text: Named-Entity Recognition for Combating Human Trafficking

2021-12-31

Findings (published)

doi.org

f-Cal: Aleatoric uncertainty quantification for robot perception via calibrated neural regression

Dhaivat Bhatt

Kaustubh Mani

Dishank Bansal

Krishna Murthy

Hanju Lee

Liam Paull

While modern deep neural networks are performant perception modules, performance (accuracy) alone is insufficient, particularly for safety-critical robotic applications such as self-driving vehicles. Robot autonomy stacks also require these otherwise blackbox models to produce reliable and calibrated measures of confidence on their predictions. Existing approaches estimate uncertainty from these neural network perception stacks by modifying network architectures, inference procedure, or loss functions. However, in general, these methods lack calibration, meaning that the predictive uncertainties do not faithfully represent the true underlying uncertainties (process noise). Our key insight is that calibration is only achieved by imposing constraints across multiple examples, such as those in a mini-batch; as opposed to existing approaches which only impose constraints per-sample, often leading to overconfident (thus miscalibrated) uncertainty estimates. By enforcing the distribution of outputs of a neural network to resemble a target distribution by minimizing an

2021-12-31

International Conference on Robotics and Automation (published)

doi.org

Feeding What You Need by Understanding What You Learned

Xiaoqiang Wang

Bang Liu

Fangli Xu

Bo Long

Siliang Tang

Lingfei Wu

2021-12-31

ACL (1) (published)

doi.org

arxiv.org

Few-Shot Pidgin Text Adaptation via Contrastive Fine-Tuning

Ernie Chang

Jesujoba Oluwadara Alabi

David Ifeoluwa Adelani

Vera Demberg

The surging demand for multilingual dialogue systems often requires a costly labeling process for each language addition. For low resource languages, human annotators are continuously tasked with the adaptation of resource-rich language utterances for each new domain. However, this prohibitive and impractical process can often be a bottleneck for low resource languages that are still without proper translation systems or parallel corpora. In particular, it is difficult to obtain task-specific low resource language annotations for the English-derived creoles (e.g. Nigerian and Cameroonian Pidgin). To address this issue, we utilize the pretrained language models i.e. BART which has shown great potential in language generation/understanding – we propose to finetune the BART model to generate utterances in Pidgin by leveraging the proximity of the source and target languages, and utilizing positive and negative examples in contrastive training objectives. We collected and released the first parallel Pidgin-English conversation corpus in two dialogue domains and showed that this simple and effective technique suffices to yield impressive results for English-to-Pidgin generation, which are two closely-related languages.

2021-12-31

COLING (published)

dblp.uni-trier.de

Findings of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages

David Ifeoluwa Adelani

Md Mahfuz Ibn Alam

Antonios Anastasopoulos

Akshita Bhagia

Marta R. Costa-jussa

Jesse Dodge

Fahim Faisal

Christian Federmann

Natalia N. Fedorova

Francisco S. Guzmán

Sergey Koshelev

Jean Maillard

Vukosi Marivate

Jonathan Mbuya

Alexandre Mourachko

Safiyyah Saleem

Holger Schwenk

Guillaume Wenzek

We present the results of the WMT’22 Shared Task on Large-Scale Machine Translation Evaluation for African Languages. The shared task included both a data and a systems track, along with additional innovations, such as a focus on African languages and extensive human evaluation of submitted systems. We received 14 system submissions from 8 teams, as well as 6 data track contributions. We report substantial progress in the quality of translation for African languages since the last iteration of this shared task: there is an increase of about 7.5 BLEU points across 72 language pairs, and the average BLEU scores went from 15.09 to 22.60.

2021-12-31

Conference on Machine Translation (published)

doi.org

Flexible Diffusion Modeling of Long Videos

William Harvey

Saeid Naderiparizi

Vaden Masrani

Christian Dietrich Weilbach

Frank N. Wood

We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA autonomous driving simulator.

2021-12-31

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) (published)

doi.org

openreview.net

Forgetting Enhances Episodic Control With Structured Memories

Annik Yalnizyan-Carson

Blake A. Richards

Forgetting is a normal process in healthy brains, and evidence suggests that the mammalian brain forgets more than is required based on limitations of mnemonic capacity. Episodic memories, in particular, are liable to be forgotten over time. Researchers have hypothesized that it may be beneficial for decision making to forget episodic memories over time. Reinforcement learning offers a normative framework in which to test such hypotheses. Here, we show that a reinforcement learning agent that uses an episodic memory cache to find rewards in maze environments can forget a large percentage of older memories without any performance impairments, if it utilizes mnemonic representations that contain structural information about space. Moreover, we show that some forgetting can actually provide a benefit in performance compared to agents with unbounded memories. Our analyses of the agents show that forgetting reduces the influence of outdated information and states which are not frequently visited on the policies produced by the episodic control system. These results support the hypothesis that some degree of forgetting can be beneficial for decision making, which can help to explain why the brain forgets more than is required by capacity limitations.

2021-12-31

Frontiers in Computational Neuroscience (published)

doi.org
