Irina Rish

sayed.mansouri-tehrani@mila.quebec

Amin Darabi

PhD - Université de Montréal

amin.darabi@mila.quebec

Amin Memarian

Independent visiting researcher

memariaa@mila.quebec

Amin Mansouri

Master's Research - Université de Montréal

andrew.williams@mila.quebec

Andrei Mircea Romascanu

PhD - Université de Montréal

PhD - Université de Montréal

arian.khorasani@mila.quebec

Arian Khorasani

Master's Research - Université de Montréal

arnav-kumar.jain@mila.quebec

Arjun Ashok

PhD

Co-supervisor :

Alexandre Drouin

arjun.ashok@mila.quebec

PhD - Université de Montréal

PhD - Université de Montréal

Co-supervisor :

Collaborating researcher

ayush.kaushal@mila.quebec

Benjamin Therien

PhD - Université de Montréal

Co-supervisor :

benjamin.therien@mila.quebec

Collaborating researcher - Université de Montréal

connor.brennan@mila.quebec

Daria Yasafova

Research Intern - Technical University of Munich

daria.yasafova@mila.quebec

Dave Whipps

Master's Research - Université de Montréal

whippsda@mila.quebec

diganta.misra@mila.quebec

Diganta Misra

Master's Research - Université de Montréal

Postdoctorate

Principal supervisor :

Nicolas Le Roux

ekaterina.lobacheva@mila.quebec

PhD - McGill University

Principal supervisor :

Blake Richards

ethan.caballero@mila.quebec

george.adamopoulos@mila.quebec

George Adamopoulos

Research Intern

gopeshh.subbaraj@mila.quebec

Germán Abrevaya

Independent visiting researcher - Université de Montréal

Co-supervisor :

PhD - Université de Montréal

Gwen Legate

PhD - Concordia University

Principal supervisor :

gwendolyne.legate@mila.quebec

Ivan Anokhin

PhD - Université de Montréal

Co-supervisor :

Samira Ebrahimi Kahou

ivan.anokhin@mila.quebec

juan.mayor-torres@mila.quebec

Juan Manuel Mayor-Torres

Collaborating researcher

Collaborating Alumni - Université de Montréal

Co-supervisor :

Sarath Chandar Anbil Parthipan

kshitij.gupta@mila.quebec

Mahta Ramezanian

Master's Research - Université de Montréal

Co-supervisor :

mahta.ramezanian@mila.quebec

Matthew Riemer

PhD - Université de Montréal

matthew.riemer@mila.quebec

Maximilian Puelma Touzel

Collaborating researcher

PhD - Université de Montréal

arefinmr@mila.quebec

Mohammad Pezeshki

Collaborating researcher

pezeshki@mila.quebec

Mohammad-Javad Darvishi Bayazi

PhD - Université de Montréal

mohammad-javad.darvishi-bayasi@mila.quebec

PhD - Université de Montréal

faramarm@mila.quebec

Motahareh Pourrahimi

PhD - McGill University

Principal supervisor :

Pouya Bashivan

motahareh.pourrahimi@mila.quebec

nadhir.hassen@mila.quebec

Nadhir Hassen

Research Intern - Université de Montréal

Neeraj Kumar

Professional Master's - Université de Montréal

neeraj.kumar@mila.quebec

Nizar Islah

PhD - Université de Montréal

Principal supervisor :

Eilif Benjamin Muller

nizar.islah@mila.quebec

paolo.cudrano@mila.quebec

Omar Younis

Research Intern - Université de Montréal

omar.younis@mila.quebec

Collaborating researcher - Politecnico di Milano

Pascal Tikeng Notsawo

PhD - Université de Montréal

Co-supervisor :

pascal.tikeng@mila.quebec

Collaborating researcher

prateek.humane@mila.quebec

Master's Research - Université de Montréal

remus.mocanu@mila.quebec

Reza Bayat

Master's Research - Université de Montréal

Co-supervisor :

Pouya Bashivan

reza.bayat@mila.quebec

rishika.bhagwatkar@mila.quebec

Rishika Bhagwatkar

Master's Research - Université de Montréal

Collaborating researcher - Université de Montréal

roland.riachi@mila.quebec

Simon Dufort-Labbé

PhD - Université de Montréal

simon.dufort-labbe@mila.quebec

Sparsha Mishra

Master's Research - Université de Montréal

sparsha.mishra@mila.quebec

Tejas Vaidhya

Master's Research - Université de Montréal

tejas.vaidhya@mila.quebec

PhD - Université de Montréal

Co-supervisor :

Eilif Benjamin Muller

timothy.nest@mila.quebec

Vaibhav Singh

PhD - Concordia University

Principal supervisor :

vaibhav.singh@mila.quebec

Zahra Sheikhbahaee

Postdoctorate - Université de Montréal

Principal supervisor :

zahra.sheikhbahaee@mila.quebec

Publications

Comparison of Radiologists and Deep Learning for US Grading of Hepatic Steatosis.

Pedro Vianna

Sara-Ivana Calce

Pamela Boustros

Cassandra Larocque-Rigney

Laurent Patry-Beaudoin

Yi Hui Luo

Emre Aslan

John Marinos

Talal M. Alamri

Kim-Nhien Vu

Jessica Murphy-Lavallée

Jean-Sébastien Billiard

Emmanuel Montagnon

Hongliang Li

Samuel Kadoury

Bich Nguyen

Shanel Gauthier

Benjamin Thérien

Eugene Belilovsky … (see 4 more)

Guy Wolf

Michaël Chassé

Guy Cloutier

An Tang

Background Screening for nonalcoholic fatty liver disease (NAFLD) is suboptimal due to the subjective interpretation of US images. Purpose T… (see more)o evaluate the agreement and diagnostic performance of radiologists and a deep learning model in grading hepatic steatosis in NAFLD at US, with biopsy as the reference standard. Materials and Methods This retrospective study included patients with NAFLD and control patients without hepatic steatosis who underwent abdominal US and contemporaneous liver biopsy from September 2010 to October 2019. Six readers visually graded steatosis on US images twice, 2 weeks apart. Reader agreement was assessed with use of κ statistics. Three deep learning techniques applied to B-mode US images were used to classify dichotomized steatosis grades. Classification performance of human radiologists and the deep learning model for dichotomized steatosis grades (S0, S1, S2, and S3) was assessed with area under the receiver operating characteristic curve (AUC) on a separate test set. Results The study included 199 patients (mean age, 53 years ± 13 [SD]; 101 men). On the test set (n = 52), radiologists had fair interreader agreement (0.34 [95% CI: 0.31, 0.37]) for classifying steatosis grades S0 versus S1 or higher, while AUCs were between 0.49 and 0.84 for radiologists and 0.85 (95% CI: 0.83, 0.87) for the deep learning model. For S0 or S1 versus S2 or S3, radiologists had fair interreader agreement (0.30 [95% CI: 0.27, 0.33]), while AUCs were between 0.57 and 0.76 for radiologists and 0.73 (95% CI: 0.71, 0.75) for the deep learning model. For S2 or lower versus S3, radiologists had fair interreader agreement (0.37 [95% CI: 0.33, 0.40]), while AUCs were between 0.52 and 0.81 for radiologists and 0.67 (95% CI: 0.64, 0.69) for the deep learning model. Conclusion Deep learning approaches applied to B-mode US images provided comparable performance with human readers for detection and grading of hepatic steatosis. Published under a CC BY 4.0 license. Supplemental material is available for this article. See also the editorial by Tuthill in this issue.

2023-10-01

Radiology (published)

LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression

Ayush Kaushal

Tejas Vaidhya

Low Rank Decomposition of matrix - splitting a large matrix into a product of two smaller matrix offers a means for compression that reduces… (see more) the parameters of a model without sparsification, and hence delivering more speedup on modern hardware. Moreover, unlike quantization, the compressed linear layers remain fully differentiable and all the parameters trainable, while being able to leverage the existing highly efficient kernels over floating point matrices. We study the potential to compress Large Language Models (LLMs) for monolingual Code generation via Low Rank Decomposition (LoRD) and observe that ranks for the linear layers in these models can be reduced by upto 39.58% with less than 1% increase in perplexity. We then use Low Rank Decomposition (LoRD) to compress StarCoder 16B to 13.2B parameter with no drop and to 12.3B with minimal drop in HumanEval Pass@1 score, in less than 10 minutes on a single A100. The compressed models speeds up inference by up to 22.35% with just a single line of change in code over huggingface's implementation with pytorch backend. Low Rank Decomposition (LoRD) models remain compatible with state of the art near-lossless quantization method such as SpQR, which allows leveraging further compression gains of quantization. Lastly, QLoRA over Low Rank Decomposition (LoRD) model further reduces memory requirements by as much as 21.2% over vanilla QLoRA while offering similar gains from parameter efficient fine tuning. Our work shows Low Rank Decomposition (LoRD) as a promising new paradigm for LLM compression.

2023-09-25

ArXiv (preprint)

arxiv.org

Maximum State Entropy Exploration using Predecessor and Successor Representations

Arnav Kumar Jain

Lucas Lehnert

Glen Berseth

Animals have a developed ability to explore that aids them in important tasks such as locating food, exploring for shelter, and finding misp… (see more)laced items. These exploration skills necessarily track where they have been so that they can plan for finding items with relative efficiency. Contemporary exploration algorithms often learn a less efficient exploration strategy because they either condition only on the current state or simply rely on making random open-loop exploratory moves. In this work, we propose

WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series

Jean-Christophe Gagnon-Audet

Kartik Ahuja

Mohammad Javad Darvishi Bayazi

Pooneh Mousavi

2023-09-01

TMLR (accepted)

Beyond performance: the role of task demand, effort, and individual differences in ab initio pilots

Mohammad-Javad Darvishi-Bayazi

Andrew Law

Sergio Mejia Romero

Sion Jennings

Jocelyn Faubert

2023-08-28

Scientific Reports (published)

Neural efficiency in an aviation task with different levels of difficulty: Assessing different biometrics during a performance task

Mohammad Javad Darvishi Bayazi

Andrew Law

Sergio Mejia Romero

Sion Jennings

Jocelyn Faubert

2023-08-01

Journal of Vision (published)

Cognitive Models as Simulators: Using Cognitive Models to Tap into Implicit Human Feedback

Ardavan S. Nobandegani

Thomas Shultz

2023-06-20

ICML.cc/2023/Workshop/ILHF (published)

Continual Pre-Training of Large Language Models: How to (re)warm your model?

Kshitij Gupta

Benjamin Thérien

Adam Ibrahim

Mats Leon Richter

Quentin Gregory Anthony

Timothee LESORT

Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes a… (see more)vailable. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch

2023-06-20

ICML.cc/2023/Workshop/ES-FoMO (poster)

Towards Out-of-Distribution Adversarial Robustness

Adam Ibrahim

Charles Guille-Escuret

Adversarial robustness continues to be a major challenge for deep learning. A core issue is that robustness to one type of attack often fail… (see more)s to transfer to other attacks. While prior work establishes a theoretical trade-off in robustness against different

2023-06-20

ICML.cc/2023/Workshop/AdvML-Frontiers (published)

Dialogue System with Missing Observation

Djallel Bouneffouf

Mayank Agarwal

Within the domain of dialogue, the ability to orchestrate multiple independently trained dialogue agents to create a unified system is of pa… (see more)rticular importance. Where we define orchestration as the task of selecting a subset of skills which most appropriately answer a user input using features extracted from both the user input and the individual skills. In this work, we study the task of online dialogue orchestration where the user feedback associated with the dialogue agent may not always be observed. In order to address the missing feedback setting, we propose to combine the attentive contextual bandit approach with an unsupervised learning mechanism such as clustering. By leveraging clustering to estimate missing reward, we are able to learn from each incoming event, even those with missing rewards. Promising empirical results are obtained on proprietary conversational datasets.

2023-06-04

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (published)

Estimating individual minimum calibration for deep-learning with predictive performance recovery: An example case of gait surface classification from wearable sensor gait data.

Guillaume Lam

P. Dixon

2023-06-01

Journal of Biomechanics (published)

Towards ethical multimodal systems

Alexis Roger

Esma Aimeur

2023-04-26

ArXiv (preprint)