Publications

Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation

Nuno M. Guerreiro

Pierre Colombo

André Martins

Neural machine translation (NMT) has become the de-facto standard in real-world machine translation applications. However, NMT models can unpredictably produce severely pathological translations, known as hallucinations, that seriously undermine user trust. It becomes thus crucial to implement effective preventive strategies to guarantee their proper functioning. In this paper, we address the problem of hallucination detection in NMT by following a simple intuition: as hallucinations are detached from the source content, they exhibit encoder-decoder attention patterns that are statistically different from those of good quality translations. We frame this problem with an optimal transport formulation and propose a fully unsupervised, plug-in detector that can be used with any attention-based NMT model. Experimental results show that our detector not only outperforms all previous model-based detectors, but is also competitive with detectors that employ external models trained on millions of samples for related tasks such as quality estimation and cross-lingual sentence similarity.

2022-12-31

ACL (1) (published)

doi.org

arxiv.org

Optimising Electric Vehicle Charging Station Placement using Advanced Discrete Choice Models

Steven Lamontagne

Margarida Carvalho

Emma Frejinger

Bernard Gendron

Miguel F. Anjos

Ribal Atallah

Département d'informatique et de recherche opérationnelle

U. Montréal

S. O. Mathematics

U. Edinburgh

Institut de Recherche d'Hydro-Québec

We present a new model for finding the optimal placement of electric vehicle charging stations across a multi-period time frame so as to maximise electric vehicle adoption. Via the use of advanced discrete choice models and user classes, this work allows for a granular modelling of user attributes and their preferences in regard to charging station characteristics. Instead of embedding an analytical probability model in the formulation, we adopt a simulation approach and pre-compute error terms for each option available to users for a given number of scenarios. This results in a bilevel optimisation model that is, however, intractable for all but the simplest instances. Using the pre-computed error terms to calculate the users covered by each charging station allows for a maximum covering model, for which solutions can be found more efficiently than for the bilevel formulation. The maximum covering formulation remains intractable in some instances, so we propose rolling horizon, greedy, and GRASP heuristics to obtain good quality solutions more efficiently. Extensive computational results are provided, which compare the maximum covering formulation with the current state-of-the-art, both for exact solutions and the heuristic methods.

Keywords: Electric vehicle charging stations, facility location, integer programming, discrete choice models, maximum covering

2022-12-31

INFORMS J. Comput. (published)

doi.org

arxiv.org

Optimism and Adaptivity in Policy Optimization

Veronica Chelu

Tom Zahavy

Arthur Guez

Doina Precup

Sebastian Flennerhag

2022-12-31

arXiv.org (preprint)

doi.org

Optimizing Fairness over Time with Homogeneous Workers (Short Paper)

Bart-jan Van Rossum

Ying Chen

Andrea Lodi

2022-12-31

ATMOS (published)

doi.org

Party Prediction for Twitter

Sacha Lévy

Gabrielle Desrosiers-Brisebois

Aarash Feizi

Cécile Amadoro

Andre Blais

Jean-François Godbout

Reihaneh Rabbany

A large number of studies on social media compare the behaviour of users from different political parties. As a basic step, they employ a predictive model for inferring their political affiliation. The accuracy of this model can change the conclusions of a downstream analysis significantly, yet the choice between different models seems to be made arbitrarily. In this paper, we provide a comprehensive survey and an empirical comparison of the current party prediction practices and propose several new approaches which are competitive with or outperform state-of-the-art methods, yet require less computational resources. Party prediction models rely on the content generated by the users (e.g., tweet texts), the relations they have (e.g., who they follow), or their activities and interactions (e.g., which tweets they like). We examine all of these and compare their signal strength for the party prediction task. This paper lets the practitioner select from a wide range of data types that all give strong performance. Finally, we conduct extensive experiments on different aspects of these methods, such as data collection speed and transfer capabilities, which can provide further insights for both applied and methodological research.

2022-12-31

arXiv (preprint)

doi.org

arxiv.org

Patient experience or patient satisfaction? A systematic review of child- and family-reported experience measures in pediatric surgery

Julia Ferreira

Prachikumari Patel

Elena Guadagno

Nikki Ow

Jo Wray

Sherif Emil

Dan Poenaru

2022-12-31

Journal of Pediatric Surgery (published)

doi.org

Performative Prediction with Neural Networks

Mehrnaz Mofakhami

Ioannis Mitliagkas

Gauthier Gidel

Performative prediction is a framework for learning models that influence the data they intend to predict. We focus on finding classifiers that are performatively stable, i.e. optimal for the data distribution they induce. Standard convergence results for finding a performatively stable classifier with the method of repeated risk minimization assume that the data distribution is Lipschitz continuous to the model's parameters. Under this assumption, the loss must be strongly convex and smooth in these parameters; otherwise, the method will diverge for some problems. In this work, we instead assume that the data distribution is Lipschitz continuous with respect to the model's predictions, a more natural assumption for performative systems. As a result, we are able to significantly relax the assumptions on the loss function. In particular, we do not need to assume convexity with respect to the model's parameters. As an illustration, we introduce a resampling procedure that models realistic distribution shifts and show that it satisfies our assumptions. We support our theory by showing that one can learn performatively stable classifiers with neural networks making predictions about real data that shift according to our proposed procedure.

2022-12-31

AISTATS (published)

doi.org

proceedings.mlr.press

Physics-Guided Adversarial Machine Learning for Aircraft Systems Simulation

Houssem Ben Braiek

Thomas Reid

Foutse Khomh

In the context of aircraft system performance assessment, deep learning technologies allow us to quickly infer models from experimental measurements, with less detailed system knowledge than usually required by physics-based modeling. However, this inexpensive model development also comes with new challenges regarding model trustworthiness. This article presents a novel approach, physics-guided adversarial machine learning (ML), which improves the confidence over the physics consistency of the model. The approach performs, first, a physics-guided adversarial testing phase to search for test inputs revealing behavioral system inconsistencies, while still falling within the range of foreseeable operational conditions. Then, it proceeds with a physics-informed adversarial training to teach the model the system-related physics domain foreknowledge through iteratively reducing the unwanted output deviations on the previously uncovered counterexamples. Empirical evaluation on two aircraft system performance models shows the effectiveness of our adversarial ML approach in exposing physical inconsistencies of both the models and in improving their propensity to be consistent with physics domain knowledge.

2022-12-31

IEEE Transactions on Reliability (published)

doi.org

arxiv.org

Preclinical-to-clinical Anti-cancer Drug Response Prediction and Biomarker Identification Using TINDL

David Earl Hostallero

Lixuan Wei

Liewei Wang

Junmei Cairns

Amin Emad

Prediction of the response of cancer patients to different treatments and identification of biomarkers of drug response are two major goals of individualized medicine. Here, we developed a deep learning framework called TINDL, completely trained on preclinical cancer cell lines (CCLs), to predict the response of cancer patients to different treatments. TINDL utilizes a tissue-informed normalization to account for the tissue type and cancer type of the tumors and to reduce the statistical discrepancies between CCLs and patient tumors. Moreover, by making the deep learning black box interpretable, this model identifies a small set of genes whose expression levels are predictive of drug response in the trained model, enabling identification of biomarkers of drug response. Using data from two large databases of CCLs and cancer tumors, we showed that this model can distinguish between sensitive and resistant tumors for 10 (out of 14) drugs, outperforming various other machine learning models. In addition, our small interfering RNA (siRNA) knockdown experiments on 10 genes identified by this model for one of the drugs (tamoxifen) confirmed that tamoxifen sensitivity is substantially influenced by all of these genes in MCF7 cells, and seven of these genes in T47D cells. Furthermore, genes implicated for multiple drugs pointed to shared mechanism of action among drugs and suggested several important signaling pathways. In summary, this study provides a powerful deep learning framework for prediction of drug response and identification of biomarkers of drug response in cancer. The code can be accessed at https://github.com/ddhostallero/tindl.

2022-12-31

Genomics, Proteomics & Bioinformatics (published)

doi.org

Predicting Time to and Average Quality of Future Offers for Kidney Transplant Candidates Declining a Current Deceased Donor Kidney Offer: A Retrospective Cohort Study

Jonathan Jalbert

Jean-Noel Weller

Pierre-Luc Boivin

Sylvain Lavigne

Mehdi Taobane

Mike Pieper

Andrea Lodi

Heloise Cardinal

By providing personalized quantitative estimates of time to and quality of future offers, our new approach can inform the shared decision-making process between transplant candidates and physicians when a kidney offer from a deceased donor is made by an ODO.

2022-12-31

Canadian Journal of Kidney Health and Disease (published)

doi.org

Preference-Based Offline Evaluation

C. Clarke

Fernando Diaz

Negar Arabzadeh

A core step in production model research and development involves the offline evaluation of a system before production deployment. Traditional offline evaluation of search, recommender, and other systems involves gathering item relevance labels from human editors. These labels can then be used to assess system performance using offline evaluation metrics. Unfortunately, this approach does not work when evaluating highly effective ranking systems, such as those emerging from the advances in machine learning. Recent work demonstrates that moving away from pointwise item and metric evaluation can be a more effective approach to the offline evaluation of systems. This tutorial, intended for both researchers and practitioners, reviews early work in preference-based evaluation and covers recent developments in detail.

2022-12-31

WSDM (published)

doi.org

Pre-Training Protein Encoder via Siamese Sequence-Structure Diffusion Trajectory Prediction

Zuobai Zhang

Minghao Xu

Aurelie Lozano

Vijil Chenthamarakshan

Payel Das

Jian Tang

Self-supervised pre-training methods on proteins have recently gained attention, with most approaches focusing on either protein sequences or structures, neglecting the exploration of their joint distribution, which is crucial for a comprehensive understanding of protein functions by integrating co-evolutionary information and structural characteristics. In this work, inspired by the success of denoising diffusion models in generative tasks, we propose the DiffPreT approach to pre-train a protein encoder by sequence-structure joint diffusion modeling. DiffPreT guides the encoder to recover the native protein sequences and structures from the perturbed ones along the joint diffusion trajectory, which acquires the joint distribution of sequences and structures. Considering the essential protein conformational variations, we enhance DiffPreT by a method called Siamese Diffusion Trajectory Prediction (SiamDiff) to capture the correlation between different conformers of a protein. SiamDiff attains this goal by maximizing the mutual information between representations of diffusion trajectories of structurally-correlated conformers. We study the effectiveness of DiffPreT and SiamDiff on both atom- and residue-level structure-based protein understanding tasks. Experimental results show that the performance of DiffPreT is consistently competitive on all tasks, and SiamDiff achieves new state-of-the-art performance, considering the mean ranks on all tasks. Our implementation is available at https://github.com/DeepGraphLearning/SiamDiff.

2022-12-31

arXiv.org (preprint)

doi.org

openreview.net
