Pierre Luc Carrier

Divya Sharma

pierre luc carrier

Michał Koziarski

Victor Schmidt

2023-10-27

NeurIPS.cc/2023/Workshop/AI4Mat (spotlight)

Crystal-GFN: sampling crystals with desirable properties and constraints

Alex Hernandez-Garcia

Alexandre AGM Duval

Alexandra Volokhova

Divya Sharma

pierre luc carrier

Michał Koziarski

Victor Schmidt

Accelerating material discovery holds the potential to greatly help mitigate the climate crisis. Discovering new solid-state materials such … (see more)as electrocatalysts, super-ionic conductors or photovoltaic materials can have a crucial impact, for instance, in improving the efficiency of renewable energy production and storage. In this paper, we introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials, namely the space group, composition and lattice parameters. This domain-inspired approach enables the flexible incorporation of physical and structural hard constraints, as well as the use of any available predictive model of a desired physicochemical property as an objective function. To design stable materials, one must target the candidates with the lowest formation energy. Here, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench. The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.

2023-10-07

ArXiv (preprint)

doi.org

Crystal-GFN: sampling crystals with desirable properties and constraints

Alex Hernandez-Garcia

Alexandre AGM Duval

Alexandra Volokhova

Divya Sharma

pierre luc carrier

Michał Koziarski

Victor Schmidt

Accelerating material discovery holds the potential to greatly help mitigate the climate crisis. Discovering new solid-state materials such … (see more)as electrocatalysts, super-ionic conductors or photovoltaic materials can have a crucial impact, for instance, in improving the efficiency of renewable energy production and storage. In this paper, we introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials, namely the space group, composition and lattice parameters. This domain-inspired approach enables the flexible incorporation of physical and structural hard constraints, as well as the use of any available predictive model of a desired physicochemical property as an objective function. To design stable materials, one must target the candidates with the lowest formation energy. Here, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench. The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.

2023-10-07

ArXiv (preprint)

doi.org

Predicting Infectiousness for Proactive Contact Tracing

Prateek Gupta

Tegan Maharaj

Nasim Rahaman

Martin Weiss

Tristan Deleu

Eilif Benjamin Muller

Meng Qu

Victor Schmidt

Pierre-Luc St-Charles

Hannah Alsdurf

Olexa Bilaniuk

David Buckeridge

gaetan caron

pierre luc carrier

Joumana Ghosn

satya ortiz gagne

Irina Rish

Bernhard Schölkopf … (see 3 more)

Abhinav Sharma

Jian Tang

Andrew Robert Williams

The COVID-19 pandemic has spread rapidly worldwide, overwhelming manual contact tracing in many countries and resulting in widespread lockdo… (see more)wns for emergency containment. Large-scale digital contact tracing (DCT) has emerged as a potential solution to resume economic and social activity while minimizing spread of the virus. Various DCT methods have been proposed, each making trade-offs between privacy, mobility restrictions, and public health. The most common approach, binary contact tracing (BCT), models infection as a binary event, informed only by an individual's test results, with corresponding binary recommendations that either all or none of the individual's contacts quarantine. BCT ignores the inherent uncertainty in contacts and the infection process, which could be used to tailor messaging to high-risk individuals, and prompt proactive testing or earlier warnings. It also does not make use of observations such as symptoms or pre-existing medical conditions, which could be used to make more accurate infectiousness predictions. In this paper, we use a recently-proposed COVID-19 epidemiological simulator to develop and test methods that can be deployed to a smartphone to locally and proactively predict an individual's infectiousness (risk of infecting others) based on their contact history and other information, while respecting strong privacy constraints. Predictions are used to provide personalized recommendations to the individual via an app, as well as to send anonymized messages to the individual's contacts, who use this information to better predict their own infectiousness, an approach we call proactive contact tracing (PCT). We find a deep-learning based PCT method which improves over BCT for equivalent average mobility, suggesting PCT could help in safe re-opening and second-wave prevention.

2021-01-12

ICLR.cc/2021/Conference (spotlight)

COVI-AgentSim: an Agent-based Model for Evaluating Methods of Digital Contact Tracing

Prateek Gupta

Tegan Maharaj

Martin Weiss

Nasim Rahaman

Hannah Alsdurf

Abhinav Sharma

Nanor Minoyan

Soren Harnois-Leblanc

Victor Schmidt

Pierre-Luc St-Charles

Tristan Deleu

Andrew Robert Williams

Akshay Patel

Meng Qu

Olexa Bilaniuk

gaetan caron

pierre luc carrier

satya ortiz gagne

Marc-Andre Rousseau

David Buckeridge … (see 9 more)

Joumana Ghosn

Yang Zhang

Bernhard Schölkopf

Jian Tang

Irina Rish

Joanna Merckx

Eilif Benjamin Muller

2020-10-02

OpenReview.net/Anonymous_Preprint (unknown)

Diet Networks: Thin Parameters for Fat Genomics

pierre luc carrier

Akram Erraqabi

Tristan Sylvain

Alex Auvolat

Etienne Dejoie

Marc-André Legault

Marie-Pierre Dubé

Julie Hussin

Learning tasks such as those involving genomic data often poses a serious challenge: the number of input features can be orders of magnitude… (see more) larger than the number of training examples, making it difficult to avoid overfitting, even when using the known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in medical research, more specifically in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer (number of input features times number of hidden units): each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed in data), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation (based on the feature's identity not its value) to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). This approach views the problem of producing the parameters associated with each feature as a multi-task learning problem. We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.

2017-01-01

ICLR.cc/2017/conference (poster)

Diet Networks: Thin Parameters for Fat Genomics

pierre luc carrier

Akram Erraqabi

Tristan Sylvain

Alex Auvolat

Etienne Dejoie

Marc-André Legault

Marie-Pierre Dubé

Julie Hussin

Learning tasks such as those involving genomic data often poses a serious challenge: the number of input features can be orders of magnitude… (see more) larger than the number of training examples, making it difficult to avoid overfitting, even when using the known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in medical research, more specifically in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer (number of input features times number of hidden units): each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed in data), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation (based on the feature's identity not its value) to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). This approach views the problem of producing the parameters associated with each feature as a multi-task learning problem. We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.

2017-01-01

ICLR.cc/2017/conference (poster)

Diet Networks: Thin Parameters for Fat Genomic

pierre luc carrier

Akram Erraqabi

Tristan Sylvain

Alex Auvolat

Etienne Dejoie

Marc-André Legault

M. Dubé

Julie Hussin

Learning tasks such as those involving genomic data often poses a serious challenge: the number of input features can be orders of magnitude… (see more) larger than the number of training examples, making it difficult to avoid overfitting, even when using the known regularization techniques. We focus here on tasks in which the input is a description of the genetic variation specific to a patient, the single nucleotide polymorphisms (SNPs), yielding millions of ternary inputs. Improving the ability of deep learning to handle such datasets could have an important impact in precision medicine, where high-dimensional data regarding a particular patient is used to make predictions of interest. Even though the amount of data for such tasks is increasing, this mismatch between the number of examples and the number of inputs remains a concern. Naive implementations of classifier neural networks involve a huge number of free parameters in their first layer: each input feature is associated with as many parameters as there are hidden units. We propose a novel neural network parametrization which considerably reduces the number of free parameters. It is based on the idea that we can first learn or provide a distributed representation for each input feature (e.g. for each position in the genome where variations are observed), and then learn (with another neural network called the parameter prediction network) how to map a feature's distributed representation to the vector of parameters specific to that feature in the classifier neural network (the weights which link the value of the feature to each of the hidden units). We show experimentally on a population stratification task of interest to medical studies that the proposed approach can significantly reduce both the number of parameters and the error rate of the classifier.

2016-11-04

ArXiv (preprint)

Theano: A Python framework for fast computation of mathematical expressions

Rami Al-rfou'

Guillaume Alain

Amjad Almahairi

Christof Angermüller

Dzmitry Bahdanau

Nicolas Ballas

Frédéric Bastien

Justin S. Bayer

A. Belikov

A. Belopolsky

Arnaud Bergeron

J. Bergstra

Valentin Bisson

Josh Bleecher Snyder

Nicolas Bouchard

Nicolas Boulanger-Lewandowski

Xavier Bouthillier

Alexandre De Brébisson

Olivier Breuleux … (see 92 more)

pierre luc carrier

Kyunghyun Cho

Jan Chorowski

Paul F. Christiano

Tim Cooijmans

Marc-Alexandre Côté

Myriam Côté

Aaron Courville

Yann Dauphin

Olivier Delalleau

Julien Demouth

Guillaume Desjardins

Sander Dieleman

Laurent Dinh

M'elanie Ducoffe

Vincent Dumoulin

Samira Ebrahimi Kahou

Dumitru Erhan

Ziye Fan

Orhan Firat

Mathieu Germain

Xavier Glorot

Ian J. Goodfellow

Matthew Graham

Caglar Gulcehre

Philippe Hamel

Iban Harlouchet

Jean-philippe Heng

Balázs Hidasi

Sina Honari

Arjun Jain

S'ebastien Jean

Kai Jia

Mikhail V. Korobov

Vivek Kulkarni

Alex Lamb

Pascal Lamblin

Eric P. Larsen

César Laurent

S. Lee

Simon-mark Lefrancois

Simon Lemieux

Nicholas Léonard

Zhouhan Lin

J. Livezey

Cory R. Lorenz

Jeremiah L. Lowin

Qianli M. Ma

Pierre-Antoine Manzagol

Olivier Mastropietro

R. McGibbon

Roland Memisevic

Bart van Merriënboer

Vincent Michalski

Mehdi Mirza

Alberto Orlandi

Razvan Pascanu

Mohammad Pezeshki

Colin Raffel

Daniel Renshaw

Matthew David Rocklin

Markus Dr. Roth

Peter Sadowski

John Salvatier

François Savard

Jan Schlüter

John D. Schulman

Gabriel Schwartz

Iulian V. Serban

Dmitriy Serdyuk

Samira Shabanian

Etienne Simon

Sigurd Spieckermann

S. Subramanyam

Jakub Sygnowski

Jérémie Tanguay

Gijs van Tulder

Joseph P. Turian

Sebastian Urban

Pascal Vincent

Francesco Visin

Harm de Vries

David Warde-Farley

Dustin J. Webb

M. Willson

Kelvin Xu

Lijun Xue

Li Yao

Saizheng Zhang

Ying Zhang

Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficie… (see more)ntly. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

2016-05-09

ArXiv (preprint)

Theano: A Python framework for fast computation of mathematical expressions

Rami Al-rfou'

Guillaume Alain

Amjad Almahairi

Christof Angermüller

Dzmitry Bahdanau

Nicolas Ballas

Frédéric Bastien

Justin S. Bayer

A. Belikov

A. Belopolsky

Arnaud Bergeron

James Bergstra

Valentin Bisson

Josh Bleecher Snyder

Nicolas Bouchard

Nicolas Boulanger-Lewandowski

Xavier Bouthillier

Alexandre De Brébisson

Olivier Breuleux … (see 92 more)

pierre luc carrier

Kyunghyun Cho

Jan Chorowski

Paul F. Christiano

Tim Cooijmans

Marc-Alexandre Côté

Myriam Côté

Aaron Courville

Yann Dauphin

Olivier Delalleau

Julien Demouth

Guillaume Desjardins

Sander Dieleman

Laurent Dinh

M'elanie Ducoffe

Vincent Dumoulin

Samira Ebrahimi Kahou

Dumitru Erhan

Ziye Fan

Orhan Firat

Mathieu Germain

Xavier Glorot

Ian G Goodfellow

Matthew Graham

Caglar Gulcehre

Philippe Hamel

Iban Harlouchet

Jean-philippe Heng

Balázs Hidasi

Sina Honari

Arjun Jain

Sébastien Jean

Kai Jia

Mikhail V. Korobov

Vivek Kulkarni

Alex Lamb

Pascal Lamblin

Eric Larsen

César Laurent

S. Lee

Simon-mark Lefrancois

Simon Lemieux

Nicholas Léonard

Zhouhan Lin

J. Livezey

Cory R. Lorenz

Jeremiah L. Lowin

Qianli M. Ma

Pierre-Antoine Manzagol

Olivier Mastropietro

R. McGibbon

Roland Memisevic

Bart van Merriënboer

Vincent Michalski

Mehdi Mirza

Alberto Orlandi

Razvan Pascanu

Mohammad Pezeshki

Colin Raffel

Daniel Renshaw

Matthew David Rocklin

Markus Dr. Roth

Peter Sadowski

John Salvatier

François Savard

Jan Schlüter

John D. Schulman

Gabriel Schwartz

Iulian V. Serban

Dmitriy Serdyuk

Samira Shabanian

Etienne Simon

Sigurd Spieckermann

S. Subramanyam

Jakub Sygnowski

Jérémie Tanguay

Gijs van Tulder

Joseph Turian

Sebastian Urban

Pascal Vincent

Francesco Visin

Harm de Vries

David Warde-Farley

Dustin J. Webb

M. Willson

Kelvin Xu

Lijun Xue

Li Yao

Saizheng Zhang

Ying Zhang

Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficie… (see more)ntly. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

2016-05-09

ArXiv (preprint)