Publications

S$^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks

Xinlin Li

Bang Liu

Yaoliang Yu

Wulong Liu

Chunjing Xu

Vahid Partovi Nia

openreview.net

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

Devendra Singh Sachan

Siva Reddy

William Hamilton

Chris Dyer

Dani Yogatama

We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectation-maximization algorithm. We iteratively estimate the value of our latent variable (the set of relevant documents for a given question) and then use this estimate to update the retriever and reader parameters. We hypothesize that such end-to-end training allows training signals to flow to the reader and then to the retriever better than stage-wise training. This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer. Experiments on three benchmark datasets demonstrate that our proposed method outperforms all existing approaches of comparable size by 2-3% absolute exact match points, achieving new state-of-the-art results. Our results also demonstrate the feasibility of learning to retrieve to improve answer generation without explicit supervision of retrieval decisions.

openreview.net

Learning to Combine Per-Example Solutions for Neural Program Synthesis

Disha Shrivastava

Hugo Larochelle

Danny Tarlow

The goal of program synthesis from examples is to find a computer program that is consistent with a given set of input-output examples. Most learning-based approaches try to find a program that satisfies all examples at once. Our work, by contrast, considers an approach that breaks the problem into two stages: (a) find programs that satisfy only one example, and (b) leverage these per-example solutions to yield a program that satisfies all examples. We introduce the Cross Aggregator neural network module based on a multi-head attention mechanism that learns to combine the cues present in these per-example solutions to synthesize a global solution. Evaluation across programs of different lengths and under two different experimental settings reveals that when given the same time budget, our technique significantly improves the success rate over PCCoder [Zohar et al., 2018] and other ablation baselines.

openreview.net

Lower and Upper Bounds on the Pseudo-Dimension of Tensor Network Models

Behnoush Khavari

Guillaume Rabusseau

Tensor network (TN) methods have been a key ingredient of advances in condensed matter physics and have recently sparked interest in the machine learning community for their ability to compactly represent very high-dimensional objects. TN methods can, for example, be used to efficiently learn linear models in exponentially large feature spaces [56]. In this work, we derive upper and lower bounds on the VC-dimension and pseudo-dimension of a large class of TN models for classification, regression and completion. Our upper bounds hold for linear models parameterized by arbitrary TN structures, and we derive lower bounds for common tensor decomposition models (CP, Tensor Train, Tensor Ring and Tucker) showing the tightness of our general upper bound. These results are used to derive a generalization bound which can be applied to classification with low-rank matrices as well as linear classifiers based on any of the commonly used tensor decomposition models. As a corollary of our results, we obtain a bound on the VC-dimension of the matrix product state classifier introduced in [56] as a function of the so-called bond dimension (i.e. tensor train rank), which answers an open problem listed by Cirac, Garre-Rubio and Pérez-García in [13].

openreview.net

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Nicolas Loizou

Hugo Berard

Gauthier Gidel

Ioannis Mitliagkas

Simon Lacoste-Julien

Two of the most prominent algorithms for solving unconstrained smooth games are the classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic consensus optimization (SCO) [Mescheder et al., 2017]. SGDA is known to converge to a stationary point for specific classes of games, but current convergence analyses require a bounded variance assumption. SCO is used successfully for solving large-scale adversarial problems, but its convergence guarantees are limited to its deterministic variant. In this work, we introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO under this condition for solving a class of stochastic variational inequality problems that are potentially non-monotone. We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size, and we propose insightful stepsize-switching rules to guarantee convergence to the exact solution. In addition, our convergence guarantees hold under the arbitrary sampling paradigm, and as such, we give insights into the complexity of minibatching.

openreview.net

The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning

Shahab Bakhtiari

Patrick J Mineault

Timothy P. Lillicrap

Christopher C. Pack

Blake Richards

The visual system of mammals comprises parallel, hierarchical specialized pathways. Different pathways are specialized insofar as they use representations that are more suitable for supporting specific downstream behaviours. In particular, the clearest example is the specialization of the ventral (“what”) and dorsal (“where”) pathways of the visual cortex. These two pathways support behaviours related to visual recognition and movement, respectively. To date, deep neural networks have mostly been used as models of the ventral, recognition pathway. However, it is unknown whether both pathways can be modelled with a single deep ANN. Here, we ask whether a single model with a single loss function can capture the properties of both the ventral and the dorsal pathways. We explore this question using data from mice, which, like other mammals, have specialized pathways that appear to support recognition and movement behaviours. We show that when we train a deep neural network architecture with two parallel pathways using a self-supervised predictive loss function, we can outperform other models in fitting mouse visual cortex. Moreover, we can model both the dorsal and ventral pathways. These results demonstrate that a self-supervised predictive learning approach applied to parallel pathway architectures can account for some of the functional specialization seen in mammalian visual systems.

openreview.net

On the Effectiveness of Interpretable Feedforward Neural Network

Miles Q. Li

Benjamin Fung

Adel Abusitta

Deep learning models have achieved state-of-the-art performance in many classification tasks. However, most of them cannot provide an explanation for their classification results. Machine learning models that are interpretable are usually linear or piecewise linear and yield inferior performance. Non-linear models achieve much better classification performance, but it is usually hard to explain their classification results. As a counter-example, an interpretable feedforward neural network (IFFNN) is proposed to achieve both high classification performance and interpretability for malware detection. If the IFFNN can perform well in a more flexible and general form for other classification tasks while providing meaningful explanations, it may be of great interest to the applied machine learning community. In this paper, we propose a way to generalize the interpretable feedforward neural network to multi-class classification scenarios and any type of feedforward neural networks, and evaluate its classification performance and interpretability on interpretable datasets. We conclude by finding that the generalized IFFNNs achieve comparable classification performance to their normal feedforward neural network counterparts and provide meaningful explanations. Thus, this kind of neural network architecture has great practical use.

2021-11-03

ArXiv (preprint)

doi.org

arxiv.org

Vesicular trafficking is a key determinant of the statin response in acute myeloid leukemia

Jana K Krosl

Marie-Eve Bordeleau

Céline Moison

Tara MacRae

Isabel Boivin

Nadine Mayotte

Deanne Gracias

Irène Baccelli

Vincent-Philippe Lavallee

Richard Bisaillon

Bernhard Lehnertz

Rodrigo Mendoza-Sanchez

Réjean Ruel

Thierry Bertomeu

Jasmin Coulombe-Huntington

Geneviève Boucher

Nandita Noronha

C. Pabst

M. Tyers

Patrick Gendron

Sébastien Lemieux

Frederic Barabe

Anne Marinier

Josée Hébert

Guy Sauvageau

Key Points: Inhibition of RAB protein function mediates the anti–acute myeloid leukemia activity of statins. Statin sensitivity is associated with enhanced vesicle-mediated traffic.

2021-11-03

Blood Advances (published)

doi.org

Back-Training excels Self-Training at Unsupervised Domain Adaptation of Question Generation and Passage Retrieval

Devang Kulshreshtha

Robert Belfer

Iulian V. Serban

Siva Reddy

In this work, we introduce back-training, an alternative to self-training for unsupervised domain adaptation (UDA). While self-training generates synthetic training data where natural inputs are aligned with noisy outputs, back-training results in natural outputs aligned with noisy inputs. This significantly reduces the gap between target domain and synthetic data distribution, and reduces model overfitting to source domain. We run UDA experiments on question generation and passage retrieval from the Natural Questions domain to machine learning and biomedical domains. We find that back-training vastly outperforms self-training by a mean improvement of 7.8 BLEU-4 points on generation, and 17.6% top-20 retrieval accuracy across both domains. We further propose consistency filters to remove low-quality synthetic data before training. We also release a new domain-adaptation dataset, MLQuestions, containing 35K unaligned questions, 50K unaligned passages, and 3K aligned question-passage pairs.

2021-11-01

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (published)

doi.org

arxiv.org

Estimating treatment effect for individuals with progressive multiple sclerosis using deep learning

JR Falet

Joshua D. Durso-Finley

Brennan Nichyporuk

Jan Schroeter

Francesca Bovis

Maria-Pia Sormani

Doina Precup

Tal Arbel

Douglas Arnold

2021-11-01

medRxiv (preprint)

doi.org

Opioid prescribing among new users for non-cancer pain in the USA, Canada, UK, and Taiwan: A population-based cohort study

Meghna Jani

Nadyne Girard

David W. Bates

David Buckeridge

Therese Sheppard

Jack Li

Usman Iqbal

Shelly Vik

Colin Weaver

Judy Seidel

William G. Dixon

Robyn Tamblyn

Background: The opioid epidemic in North America has been driven by an increase in the use and potency of prescription opioids, with ensuing excessive opioid-related deaths. Internationally, there are lower rates of opioid-related mortality, possibly because of differences in prescribing and health system policies. Our aim was to compare opioid prescribing rates in patients without cancer, across 5 centers in 4 countries. In addition, we evaluated differences in the type, strength, and starting dose of medication and whether these characteristics changed over time.
Methods and findings: We conducted a retrospective multicenter cohort study of adults who are new users of opioids without prior cancer. Electronic health records and administrative health records from Boston (United States), Quebec and Alberta (Canada), United Kingdom, and Taiwan were used to identify patients between 2006 and 2015. Standard dosages in morphine milligram equivalents (MMEs) were calculated according to the Centers for Disease Control and Prevention. Age- and sex-standardized opioid prescribing rates were calculated for each jurisdiction. Of the 2,542,890 patients included, 44,690 were from Boston (US), 1,420,136 Alberta, 26,871 Quebec (Canada), 1,012,939 UK, and 38,254 Taiwan. The highest standardized opioid prescribing rates in 2014 were observed in Alberta at 66/1,000 persons compared to 52, 51, and 18/1,000 in the UK, US, and Quebec, respectively. The median MME/day (IQR) at initiation was highest in Boston at 38 (20 to 45); followed by Quebec, 27 (18 to 43); Alberta, 23 (9 to 38); UK, 12 (7 to 20); and Taiwan, 8 (4 to 11). Oxycodone was the first prescribed opioid in 65% of patients in the US cohort compared to 14% in Quebec, 4% in Alberta, 0.1% in the UK, and none in Taiwan. One of the limitations was that data were not available from all centers for the entirety of the 10-year period.
Conclusions: In this study, we observed substantial differences in opioid prescribing practices for non-cancer pain between jurisdictions. The preference to start patients on higher MME/day and more potent opioids in North America may be a contributing cause to the opioid epidemic.

2021-11-01

PLoS Medicine (published)

doi.org
