Publications

Minimax and Neyman-Pearson Meta-Learning for Outlier Languages

Disha Shrivastava

Anders Søgaard

Model-agnostic meta-learning (MAML) has recently been put forth as a strategy to learn resource-poor languages in a sample-efficient fashion. Nevertheless, the properties of these languages are often not well represented by those available during training. Hence, we argue that the i.i.d. assumption ingrained in MAML makes it ill-suited for cross-lingual NLP. In fact, under a decision-theoretic framework, MAML can be interpreted as minimising the expected risk across training languages (with a uniform prior), which is known as the Bayes criterion. To increase its robustness to outlier languages, we create two variants of MAML based on alternative criteria: Minimax MAML reduces the maximum risk across languages, while Neyman-Pearson MAML constrains the risk in each language to a maximum threshold. Both criteria constitute fully differentiable two-player games. In light of this, we propose a new adaptive optimiser solving for a local approximation to their Nash equilibrium. We evaluate both model variants on two popular NLP tasks, part-of-speech tagging and question answering. We report gains for their average and minimum performance across low-resource languages in zero- and few-shot settings, compared to joint multi-source transfer and vanilla MAML.

2021-07-31

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (published)

doi.org

arxiv.org

On-the-Fly Attention Modulation for Neural Generation

Yue Dong

Chandra Bhagavatula

Ximing Lu

Jena D. Hwang

Antoine Bosselut

Jackie CK Cheung

Yejin Choi

Despite considerable advancements with deep neural language models (LMs), neural text generation still suffers from degeneration: the generated text is repetitive, generic, self-contradictory, and often lacks commonsense. Our analyses on sentence-level attention patterns in LMs reveal that neural degeneration may be associated with insufficient learning of task-specific characteristics by the attention mechanism. This finding motivates on-the-fly attention modulation -- a simple but effective method that enables the injection of priors into attention computation during inference. Automatic and human evaluation results on three text generation benchmarks demonstrate that attention modulation helps LMs generate text with enhanced fluency, creativity, and commonsense reasoning, in addition to significantly reducing sentence-level repetition.

2021-07-31

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (published)

doi.org

arxiv.org

Optimizing Deeper Transformers on Small Datasets

Peng Xu

Dhruv Kumar

Wei Yang

Wenjie Zi

Keyi Tang

Chenyang Huang

Jackie Chi Kit Cheung

Simon J.D. Prince

Yanshuai Cao

It is a common belief that training deep transformers from scratch requires large datasets. Consequently, for small datasets, people usually use shallow and simple additional layers on top of pre-trained models during fine-tuning. This work shows that this does not always need to be the case: with proper initialization and optimization, the benefits of very deep transformers can carry over to challenging tasks with small datasets, including Text-to-SQL semantic parsing and logical reading comprehension. In particular, we successfully train

2021-07-31

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (published)

doi.org

arxiv.org

Semantic and Syntactic Enhanced Aspect Sentiment Triplet Extraction

Zhexue Chen

Hong Huang

Bang Liu

Xuanhua Feng Shi

Hai-nan Jin

Aspect Sentiment Triplet Extraction (ASTE) aims to extract triplets from sentences, where each triplet includes an entity, its associated sentiment, and the opinion span explaining the reason for the sentiment. Most existing research addresses this problem in a multi-stage pipeline manner, which neglects the mutual information between these three elements and suffers from error propagation. In this paper, we propose a Semantic and Syntactic Enhanced aspect Sentiment triplet Extraction model (S3E2) to fully exploit the syntactic and semantic relationships between the triplet elements and jointly extract them. Specifically, we design a Graph-Sequence dual representation and modeling paradigm for the task of ASTE: we represent the semantic and syntactic relationships between word pairs in a sentence as a graph and encode it with Graph Neural Networks (GNNs), while modeling the original sentence with an LSTM to preserve the sequential information. Under this setting, we further apply a more efficient inference strategy for the extraction of triplets. Extensive evaluations on four benchmark datasets show that S3E2 significantly outperforms existing approaches, which demonstrates S3E2's superiority and flexibility in an end-to-end fashion.

2021-07-31

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (published)

doi.org

arxiv.org

StereoSet: Measuring stereotypical bias in pretrained language models

Moin Nadeem

Anna Bethke

Siva Reddy

A stereotype is an over-generalized belief about a particular group of people, e.g., Asians are good at math or African Americans are athletic. Such beliefs (biases) are known to hurt target groups. Since pretrained language models are trained on large real-world data, they are known to capture stereotypical biases. It is important to quantify to what extent these biases are present in them. Although this is a rapidly growing area of research, the existing literature is lacking in two important respects: 1) it mainly evaluates the bias of pretrained language models on a small set of artificial sentences, even though these models are trained on natural data; 2) current evaluations focus on measuring bias without considering the language modeling ability of a model, which could lead to misplaced trust in a model even if it is a poor language model. We address both these problems. We present StereoSet, a large-scale natural English dataset to measure stereotypical biases in four domains: gender, profession, race, and religion. We contrast both stereotypical bias and language modeling ability of popular models like BERT, GPT-2, RoBERTa, and XLNet. We show that these models exhibit strong stereotypical biases. Our data and code are available at https://stereoset.mit.edu.

2021-07-31

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (published)

doi.org

arxiv.org

StructFormer: Joint Unsupervised Induction of Dependency and Constituency Structure from Masked Language Modeling

Yikang Shen

Yi Tay

Che Zheng

Dara Bahri

Donald Metzler

Aaron Courville

There are two major classes of natural language grammar -- the dependency grammar that models one-to-one correspondences between words and the constituency grammar that models the assembly of one or several corresponding words. While previous unsupervised parsing methods mostly focus on inducing only one class of grammars, we introduce a novel model, StructFormer, that can simultaneously induce dependency and constituency structure. To achieve this, we propose a new parsing framework that can jointly generate a constituency tree and dependency graph. Then we integrate the induced dependency relations into the transformer, in a differentiable manner, through a novel dependency-constrained self-attention mechanism. Experimental results show that our model can achieve strong results on unsupervised constituency parsing, unsupervised dependency parsing, and masked language modeling at the same time.

2021-07-31

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (published)

doi.org

openreview.net

Supervised multi-specialist topic model with applications on large-scale electronic health record data

Ziyang Song

Xavier Sumba Toral

Yixin Xu

Aihua Liu

Liming Guo

Guido Powell

Aman Verma

David Buckeridge

Ariane Marelli

Yue Li

Motivation: Electronic health record (EHR) data provides a new avenue to elucidate disease comorbidities and latent phenotypes for precision medicine. To fully exploit its potential, a realistic data generative process of the EHR data needs to be modelled. We present MixEHR-S to jointly infer specialist-disease topics from the EHR data. As the key contribution, we model the specialist assignments and ICD-coded diagnoses as the latent topics based on each patient's underlying disease topic mixture in a novel unified supervised hierarchical Bayesian topic model. For efficient inference, we developed a closed-form collapsed variational inference algorithm to learn the model distributions of MixEHR-S. We applied MixEHR-S to two independent large-scale EHR databases in Quebec with three targeted applications: (1) Congenital Heart Disease (CHD) diagnostic prediction among 154,775 patients; (2) Chronic obstructive pulmonary disease (COPD) diagnostic prediction among 73,791 patients; (3) future insulin treatment prediction among 78,712 patients diagnosed with diabetes as a means to assess disease exacerbation. In all three applications, MixEHR-S conferred clinically meaningful latent topics among the most predictive latent topics and achieved superior target prediction accuracy compared to the existing methods, providing opportunities for prioritizing high-risk patients for healthcare services. MixEHR-S source code and scripts of the experiments are freely available at https://github.com/li-lab-mcgill/mixehrS

2021-07-31

Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (published)

doi.org

arxiv.org

A Survey of Data Augmentation Approaches for NLP

Steven Y. Feng

Varun Gangal

Jason Wei

Sarath Chandar

Soroush Vosoughi

Teruko Mitamura

Eduard Hovy

Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data. Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data. In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner. We first introduce and motivate data augmentation for NLP, and then discuss major methodologically representative approaches. Next, we highlight techniques that are used for popular NLP applications and tasks. We conclude by outlining current challenges and directions for future research. Overall, our paper aims to clarify the landscape of existing literature in data augmentation for NLP and motivate additional work in this area. We also present a GitHub repository with a paper list that will be continuously updated at https://github.com/styfeng/DataAug4NLP

2021-07-31

Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021 (published)

doi.org

arxiv.org

A systematic analysis of ICSD-3 diagnostic criteria and proposal for further structured iteration.

Christophe Gauld

Régis Lopez

Pierre A. Geoffroy

Charles Morin

Kelly Guichard

Elodie Giroux

Yves Dauvilliers

Guillaume Dumas

Pierre Philip

Jean‐Arthur Micoulaud‐Franchi

2021-07-31

Sleep Medicine Reviews (published)

doi.org

Temporal Profiles of Social Attention Are Different Across Development in Autistic and Neurotypical People.

Teresa Del Bianco

Luke Mason

Tony Charman

Julianne Tillman

Eva Loth

Hannah Hayward

F. Shic

Jan K. Buitelaar

Mark Johnson

Emily J. H. Jones

Jumana Ahmad

Sara Ambrosino

Tobias Banaschewski

Simon Baron-Cohen

Sarah Baumeister

Christian Beckmann

Sven Bölte

Thomas Bourgeron

Carsten Bours

M. Brammer

Daniel Brandeis

Claudia Brogna

Yvette de Bruijn

Ineke Cornelissen

Daisy Crawley

Flavio Dell’Acqua

Guillaume Dumas

Sarah Durston

Christine Ecker

Jessica Faulkner

Vincent Frouin

Pilar Garcés

David Goyard

Lindsay Ham

Joerg F. Hipp

Rosemary Holt

Meng-Chuan Lai

Xavier Liogier D’ardhuy

Michael V. Lombardo

David J. Lythgoe

René Mandl

Andre Marquand

Maarten Mennes

Andreas Meyer-Lindenberg

Carolin Moessnang

Nico Mueller

Declan Murphy

Beth Oakley

Larry O’Dwyer

Marianne Oldehinkel

Bob Oranje

Gahan Pandina

Antonio Persico

Barbara Ruggeri

Amber N. V. Ruigrok

Jessica Sabet

Roberto Sacco

Antonia San José Cáceres

Emily Simonoff

Will Spooren

Roberto Toro

Heike Tost

Jack Waldman

Steve C. R. Williams

Caroline Wooldridge

Marcel P. Zwiers

2021-07-31

Biological Psychiatry: Cognitive Neuroscience and Neuroimaging (published)

doi.org

Why do sleep disorders belong to mental disorder classifications? A network analysis of the "Sleep-Wake Disorders" section of the DSM-5.

Christophe Gauld

Régis Lopez

Charles Morin

Julien Maquet

Aileen McGonigal

Pierre A. Geoffroy

Eric Fakra

Pierre Philip

Guillaume Dumas

Jean‐Arthur Micoulaud‐Franchi

2021-07-31

Journal of Psychiatric Research (published)

doi.org

Atlas-Based Quantification of DTI Measures in a Typically Developing Pediatric Spinal Cord

Shiva Shahrampour

Benjamin De Leener

Mahdi Alizadeh

D. Middleton

Laura Krisa

Adam E. Flanders

S. Faro

Julien Cohen-Adad

F. Mohamed

2021-07-28

American Journal of Neuroradiology (published)

doi.org
