Publications

Masked Language Modeling and the Distributional Hypothesis: Order Word Matters Pre-training for Little

Robin Jia

Dieuwke Hupkes

Adina Williams

Douwe Kiela

A possible explanation for the impressive performance of masked language model (MLM) pre-training is that such models have learned to repres… (see more)ent the syntactic structures prevalent in classical NLP pipelines. In this paper, we propose a different explanation: MLMs succeed on downstream tasks almost entirely due to their ability to model higher-order word co-occurrence statistics. To demonstrate this, we pre-train MLMs on sentences with randomly shuffled word order, and show that these models still achieve high accuracy after fine-tuning on many downstream tasks—including tasks specifically designed to be challenging for models that ignore word order. Our models perform surprisingly well according to some parametric syntactic probes, indicating possible deficiencies in how we test representations for syntactic information. Overall, our results show that purely distributional information largely explains the success of pre-training, and underscore the importance of curating challenging evaluation datasets that require deeper linguistic knowledge.

2021-10-31

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

arxiv.org

Opioid prescribing among new users for non-cancer pain in the USA, Canada, UK, and Taiwan: A population-based cohort study

Meghna Jani

Nadyne Girard

David W. Bates

David L Buckeridge

Therese Sheppard

Jack Li

Usman Iqbal

Shelly Vik

Colin Weaver

Judy Seidel

William G. Dixon

Robyn Tamblyn

Background The opioid epidemic in North America has been driven by an increase in the use and potency of prescription opioids, with ensuing … (see more)excessive opioid-related deaths. Internationally, there are lower rates of opioid-related mortality, possibly because of differences in prescribing and health system policies. Our aim was to compare opioid prescribing rates in patients without cancer, across 5 centers in 4 countries. In addition, we evaluated differences in the type, strength, and starting dose of medication and whether these characteristics changed over time. Methods and findings We conducted a retrospective multicenter cohort study of adults who are new users of opioids without prior cancer. Electronic health records and administrative health records from Boston (United States), Quebec and Alberta (Canada), United Kingdom, and Taiwan were used to identify patients between 2006 and 2015. Standard dosages in morphine milligram equivalents (MMEs) were calculated according to The Centers for Disease Control and Prevention. Age- and sex-standardized opioid prescribing rates were calculated for each jurisdiction. Of the 2,542,890 patients included, 44,690 were from Boston (US), 1,420,136 Alberta, 26,871 Quebec (Canada), 1,012,939 UK, and 38,254 Taiwan. The highest standardized opioid prescribing rates in 2014 were observed in Alberta at 66/1,000 persons compared to 52, 51, and 18/1,000 in the UK, US, and Quebec, respectively. The median MME/day (IQR) at initiation was highest in Boston at 38 (20 to 45); followed by Quebec, 27 (18 to 43); Alberta, 23 (9 to 38); UK, 12 (7 to 20); and Taiwan, 8 (4 to 11). Oxycodone was the first prescribed opioid in 65% of patients in the US cohort compared to 14% in Quebec, 4% in Alberta, 0.1% in the UK, and none in Taiwan. One of the limitations was that data were not available from all centers for the entirety of the 10-year period. Conclusions In this study, we observed substantial differences in opioid prescribing practices for non-cancer pain between jurisdictions. The preference to start patients on higher MME/day and more potent opioids in North America may be a contributing cause to the opioid epidemic.

2021-10-31

PLoS Medicine (published)

doi.org

Refining BERT Embeddings for Document Hashing via Mutual Information Maximization

Zijing Ou

Qinliang Su

Jianxing Yu

Ruihui Zhao

Yefeng Zheng

Bang Liu

Existing unsupervised document hashing methods are mostly established on generative models. Due to the difficulties of capturing long depend… (see more)ency structures, these methods rarely model the raw documents directly, but instead to model the features extracted from them (e.g. bag-of-words (BOW), TFIDF). In this paper, we propose to learn hash codes from BERT embeddings after observing their tremendous successes on downstream tasks. As a first try, we modify existing generative hashing models to accommodate the BERT embeddings. However, little improvement is observed over the codes learned from the old BOW or TFIDF features. We attribute this to the reconstruction requirement in the generative hashing, which will enforce irrelevant information that is abundant in the BERT embeddings also compressed into the codes. To remedy this issue, a new unsupervised hashing paradigm is further proposed based on the mutual information (MI) maximization principle. Specifically, the method first constructs appropriate global and local codes from the documents and then seeks to maximize their mutual information. Experimental results on three benchmark datasets demonstrate that the proposed method is able to generate hash codes that outperform existing ones learned from BOW features by a substantial margin.

2021-10-31

Findings of the Association for Computational Linguistics: EMNLP 2021 (published)

doi.org

arxiv.org

The meaning of significant mean group differences for biomarker discovery

Eva Loth

Jumana Ahmad

Chris Chatham

Beatriz López

Ben Carter

Daisy Crawley

Bethany Oakley

Hannah Hayward

Jennifer Cooke

Antonia San José Cáceres

Danilo Bzdok

Emily Jones

Tony Charman

Christian Beckmann

Thomas Bourgeron

Roberto Toro

Jan Buitelaar

Declan Murphy

Guillaume Dumas

Over the past decade, biomarker discovery has become a key goal in psychiatry to aid in the more reliable diagnosis and prognosis of heterog… (see more)eneous psychiatric conditions and the development of tailored therapies. Nevertheless, the prevailing statistical approach is still the mean group comparison between “cases” and “controls,” which tends to ignore within-group variability. In this educational article, we used empirical data simulations to investigate how effect size, sample size, and the shape of distributions impact the interpretation of mean group differences for biomarker discovery. We then applied these statistical criteria to evaluate biomarker discovery in one area of psychiatric research—autism research. Across the most influential areas of autism research, effect size estimates ranged from small (d = 0.21, anatomical structure) to medium (d = 0.36 electrophysiology, d = 0.5, eye-tracking) to large (d = 1.1 theory of mind). We show that in normal distributions, this translates to approximately 45% to 63% of cases performing within 1 standard deviation (SD) of the typical range, i.e., they do not have a deficit/atypicality in a statistical sense. For a measure to have diagnostic utility as defined by 80% sensitivity and 80% specificity, Cohen’s d of 1.66 is required, with still 40% of cases falling within 1 SD. However, in both normal and nonnormal distributions, 1 (skewness) or 2 (platykurtic, bimodal) biologically plausible subgroups may exist despite small or even nonsignificant mean group differences. This conclusion drastically contrasts the way mean group differences are frequently reported. Over 95% of studies omitted the “on average” when summarising their findings in their abstracts (“autistic people have deficits in X”), which can be misleading as it implies that the group-level difference applies to all individuals in that group. We outline practical approaches and steps for researchers to explore mean group comparisons for the discovery of stratification biomarkers.

2021-10-31

PLoS Computational Biology (published)

doi.org

The Topic Confusion Task: A Novel Evaluation Scenario for Authorship Attribution

Malik H. Altakrori

Jackie CK Cheung

Benjamin C. M. Fung

2021-10-31

Findings of the Association for Computational Linguistics: EMNLP 2021 (published)

doi.org

arxiv.org

Visually Grounded Reasoning across Languages and Cultures

Fangyu Liu

Emanuele Bugliarello

Edoardo Ponti

Siva Reddy

Nigel Collier

Desmond Elliott

The design of widespread vision-and-language datasets and pre-trained encoders directly adopts, or draws inspiration from, the concepts and … (see more)images of ImageNet. While one can hardly overestimate how much this benchmark contributed to progress in computer vision, it is mostly derived from lexical databases and image queries in English, resulting in source material with a North American or Western European bias. Therefore, we devise a new protocol to construct an ImageNet-style hierarchy representative of more languages and cultures. In particular, we let the selection of both concepts and images be entirely driven by native speakers, rather than scraping them automatically. Specifically, we focus on a typologically diverse set of languages, namely, Indonesian, Mandarin Chinese, Swahili, Tamil, and Turkish. On top of the concepts and images obtained through this new protocol, we create a multilingual dataset for Multicultural Reasoning over Vision and Language (MaRVL) by eliciting statements from native speaker annotators about pairs of images. The task consists of discriminating whether each grounded statement is true or false. We establish a series of baselines using state-of-the-art models and find that their cross-lingual transfer performance lags dramatically behind supervised performance in English. These results invite us to reassess the robustness and accuracy of current state-of-the-art models beyond a narrow domain, but also open up new exciting challenges for the development of truly multilingual and multicultural systems.

2021-10-31

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

openreview.net

From Machine Learning to Robotics: Challenges and Opportunities for Embodied Intelligence

Nicholas Roy

Ingmar Posner

T. Barfoot

Philippe Beaudoin

Yoshua Bengio

Jeannette Bohg

Oliver Brock

Isabelle Depatie

Dieter Fox

D. Koditschek

Tom'as Lozano-p'erez

Vikash K. Mansinghka

Christopher Pal

Blake Aaron Richards

Dorsa Sadigh

Stefan Schaal

G. Sukhatme

Denis Therien

Marc Emile Toussaint

Michiel van de Panne

2021-10-27

ArXiv (preprint)

arxiv.org

How do AI systems fail socially?: an engineering risk analysis approach

Shalaleh Rismani

AJung Moon

Failure Mode and Effect Analysis (FMEA) has been used as an engineering risk assessment tool since 1949. FMEAs are effective in preemptively… (see more) identifying and addressing how a device or process might fail in operation and are often used in the design of high-risk technology applications such as military, automotive industry and medical devices. In this work, we explore whether FMEAs can serve as a risk assessment tool for machine learning practitioners, especially in deploying systems for high-risk applications (e.g. algorithms for recidivism assessment). In particular, we discuss how FMEAs can be used to identify social and ethical failures of Artificial Intelligent Systemss (AISs), recognizing that FMEAs have the potential to uncover a broader range of failures. We first propose a process for developing a Social FMEAs (So-FMEAs) by building on the existing FMEAs framework and a recently published definition of Social Failure Modes by Millar. We then demonstrate a simple proof-of-concept, So-FMEAs for the COMPAS algorithm, a risk assessment tool used by judges to make recidivism-related decisions for convicted individuals. Through this preliminary investigation, we illustrate how a traditional engineering risk management tool could be adapted for analyzing social and ethical failures of AIS. Engineers and designers of AISs can use this new approach to improve their system's design and perform due diligence with respect to potential ethical and social failures.

2021-10-27

2021 IEEE International Symposium on Ethics in Engineering, Science and Technology (ETHICS) (published)

doi.org

Rademacher Random Projections with Tensor Networks

Beheshteh T. Rakhshan

Guillaume Rabusseau

Random projection (RP) have recently emerged as popular techniques in the machine learning community for their ability in reducing the dimen… (see more)sion of very high-dimensional tensors. Following the work in [30], we consider a tensorized random projection relying on Tensor Train (TT) decomposition where each element of the core tensors is drawn from a Rademacher distribution. Our theoretical results reveal that the Gaussian low-rank tensor represented in compressed form in TT format in [30] can be replaced by a TT tensor with core elements drawn from a Rademacher distribution with the same embedding size. Experiments on synthetic data demonstrate that tensorized Rademacher RP can outperform the tensorized Gaussian RP studied in [30]. In addition, we show both theoretically and experimentally, that the tensorized RP in the Matrix Product Operator (MPO) format is not a Johnson-Lindenstrauss transform (JLT) and therefore not a well-suited random projection map

2021-10-25

ArXiv (preprint)

arxiv.org

Generating GitHub Repository Descriptions: A Comparison of Manual and Automated Approaches

Jazlyn Hellman

Eunbee Jang

Christoph Treude

Chenzhun Huang

Jin L.C. Guo

Given the vast number of repositories hosted on GitHub, project discovery and retrieval have become increasingly important for GitHub users.… (see more) Repository descriptions serve as one of the first points of contact for users who are accessing a repository. However, repository owners often fail to provide a high-quality description; instead, they use vague terms, the purpose of the repository is poorly explained, or the description is omitted entirely. In this work, we examine the current practice of writing GitHub repository descriptions. Our investigation leads to the proposal of the LSP (Language, Software technology, and Purpose) template to formulate good descriptions for GitHub repositories that are clear, concise, and informative. To understand the extent to which current automated techniques can support generating repository descriptions, we compare the performance of state-of-the-art text summarization methods on this task. Finally, our user study with GitHub users reveals that automated summarization can adequately be used for default description generation for GitHub repositories, while the descriptions which follow the LSP template offer the most effective instrument for communicating with GitHub users.

2021-10-24

ArXiv (preprint)

arxiv.org

CACHE (Critical Assessment of Computational Hit-finding Experiments): A public-private partnership benchmarking initiative to enable the development of computational methods for hit-finding

Suzanne Ackloo

Rima Al-awar

Rommie E. Amaro

Cheryl H. Arrowsmith

Hatylas Azevedo

Robert A. Batey

Yoshua Bengio

Ulrich A.K. Betz

Cristian G. Bologa

John D. Chodera

Wendy D. Cornell

Ian Dunham

Gerhard F. Ecker

Kristina Edfeldt

Aled M. Edwards

Michael K. Gilson

Claudia R. Gordijo

Gerhard Hessler

Alexander Hillisch

Anders Hogner … (see 19 more)

John J. Irwin

Johanna M. Jansen

Daniel Kuhn

Andrew R. Leach

Alpha A. Lee

Uta Lessel

John Moult

Ingo Muegge

Tudor I. Oprea

Benjamin G. Perry

Patrick Riley

Kumar Singh Saikatendu

Vijayaratnam Santhakumar

Matthieu Schapira

Cora Scholten

Matthew H. Todd

Masoud Vedadi

Andrea Volkamer

Timothy M. Willson

Computational approaches in drug discovery and development hold great promise, with artificial intelligence methods undergoing widespread co… (see more)ntemporary use, but the experimental validation of these new approaches is frequently inadequate. We are initiating Critical Assessment of Computational Hit-finding Experiments (CACHE) as a public benchmarking project that aims to accelerate the development of small molecule hit-finding algorithms by competitive assessment. Compounds will be identified by participants using a wide range of computational methods for dozens of protein targets selected for different types of prediction scenarios, as well as for their potential biological or pharmaceutical relevance. Community-generated predictions will be tested centrally and rigorously in an experimental hub(s), and all data, including the chemical structures of experimentally tested compounds, will be made publicly available without restrictions. The ability of a range of computational approaches to find novel compounds will be evaluated, compared, and published. The overarching goal of CACHE is to accelerate the development of computational chemistry methods by providing rapid and unbiased feedback to those developing methods, with an ancillary and valuable benefit of identifying new compound-protein binding pairs for biologically interesting targets. The initiative builds on the power of crowd sourcing and expands the open science paradigm for drug discovery.

2021-10-21

Nature Reviews Chemistry (published)

doi.org

Probabilistic fine-tuning of pruning masks and PAC-Bayes self-bounded learning

Soufiane Hayou

Bo He

Gintare Karolina Dziugaite

2021-10-21

ArXiv (preprint)

arxiv.org

TRAIL: Responsible AI for Professionals and Leaders

Mila Ventures Founder in Residence

AI Advantage: Productivity in Public Service

Publications

TRAIL: Responsible AI for Professionals and Leaders

Mila Ventures Founder in Residence

AI Advantage: Productivity in Public Service

Popular keywords:

Publications