Publications

SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation

End-to-end speech synthesis models directly convert the input characters into an audio representation (e.g., spectrograms). Despite their impressive performance, such models have difficulty disambiguating the pronunciations of identically spelled words. To mitigate this issue, a separate Grapheme-to-Phoneme (G2P) model can be employed to convert the characters into phonemes before synthesizing the audio. This paper proposes SoundChoice, a novel G2P architecture that processes entire sentences rather than operating at the word level. The proposed architecture takes advantage of a weighted homograph loss (that improves disambiguation), exploits curriculum learning (that gradually switches from word-level to sentence-level G2P), and integrates word embeddings from BERT (for further performance improvement). Moreover, the model inherits the best practices in speech recognition, including multi-task learning with Connectionist Temporal Classification (CTC) and beam search with an embedded language model. As a result, SoundChoice achieves a Phoneme Error Rate (PER) of 2.65% on whole-sentence transcription using data from LibriSpeech and Wikipedia. Index Terms: grapheme-to-phoneme, speech synthesis, text-to-speech, phonetics, pronunciation, disambiguation.

2022-09-17

Interspeech 2022 (published)

doi.org

arxiv.org

Intervertebral Disc Labeling With Learning Shape Information, A Look Once Approach

Reza Azad

Moein Heidari

Julien Cohen-Adad

Ehsan Adeli

Dorit Merhof

Accurate and automatic segmentation of intervertebral discs from medical images is a critical task for the assessment of spine-related diseases such as osteoporosis, vertebral fractures, and intervertebral disc herniation. To date, various approaches have been developed in the literature which routinely rely on detecting the discs as the primary step. A disadvantage of many cohort studies is that the localization algorithm also yields false-positive detections. In this study, we aim to alleviate this problem by proposing a novel U-Net-based structure to predict a set of candidates for intervertebral disc locations. In our design, we integrate the image shape information (image gradients) to encourage the model to learn rich and generic geometrical information. This additional signal guides the model to selectively emphasize the contextual representation and suppress the less discriminative features. On the post-processing side, to further decrease the false positive rate, we propose a permutation invariant 'look once' model, which accelerates the candidate recovery procedure. In comparison with previous studies, our proposed approach does not need to perform the selection in an iterative fashion. The proposed method was evaluated on the spine generic public multi-center dataset and demonstrated superior performance compared to previous work. We have provided the implementation code at https://github.com/rezazad68/intervertebral-lookonce

2022-09-15

Predictive Intelligence in Medicine (published)

doi.org

arxiv.org

Accurate machine learning prediction of sexual orientation based on brain morphology and intrinsic functional connectivity

Benjamin Clemens

Jeremy Lefort-Besnard

Christoph Ritter

Elke Smith

Mikhail Votinov

Birgit Derntl

Ute Habel

Danilo Bzdok

Sexual orientation in humans represents a multilevel construct that is grounded in both neurobiological and environmental factors. Here, we bring to bear a machine learning approach to predict sexual orientation from gray matter volumes (GMVs) or resting-state functional connectivity (RSFC) in a cohort of 45 heterosexual and 41 homosexual participants. In both brain assessments, we used penalized logistic regression models and nonparametric permutation. We found an average accuracy of 62% (±6.72) for predicting sexual orientation based on GMV and an average predictive accuracy of 92% (±9.89) using RSFC. Regions in the precentral gyrus, precuneus and the prefrontal cortex were significantly informative for distinguishing heterosexual from homosexual participants in both the GMV and RSFC settings. These results indicate that, aside from self-reports, RSFC offers neurobiological information valuable for highly accurate prediction of sexual orientation. We demonstrate for the first time that sexual orientation is reflected in specific patterns of RSFC, which enable personalized, brain-based predictions of this highly complex human trait. While these results are preliminary, our neurobiologically based prediction framework illustrates the great value and potential of RSFC for revealing biologically meaningful and generalizable predictive patterns in the human brain.

2022-09-13

Cerebral Cortex (unknown)

doi.org

Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization

The ability to accelerate the design of biological sequences can have a substantial impact on the progress of the medical field. The problem can be framed as a global optimization problem where the objective is an expensive black-box function that can be queried in large batches but only for a small number of rounds. Bayesian Optimization is a principled method for tackling this problem. However, the astronomically large state space of biological sequences renders brute-force iterating over all possible sequences infeasible. In this paper, we propose MetaRLBO where we train an autoregressive generative model via Meta-Reinforcement Learning to propose promising sequences for selection via Bayesian Optimization. We pose this problem as that of finding an optimal policy over a distribution of MDPs induced by sampling subsets of the data acquired in the previous rounds. Our in-silico experiments show that meta-learning over such ensembles provides robustness against reward misspecification and achieves competitive results compared to existing strong baselines.

2022-09-12

ArXiv (preprint)

doi.org

arxiv.org

Measuring Commonality in Recommendation of Cultural Content: Recommender Systems to Enhance Cultural Citizenship

Andres Ferraro

Gustavo Ferreira

Fernando Diaz

Georgina Born

2022-09-12

Proceedings of the 16th ACM Conference on Recommender Systems (published)

doi.org

arxiv.org

Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity

Kiwan Maeng

Haiyu Lu

Luca Melis

John Nguyen

Michael G. Rabbat

Carole-Jean Wu

2022-09-12

Proceedings of the 16th ACM Conference on Recommender Systems (published)

doi.org

arxiv.org

Rapidly Inferring Personalized Neurostimulation Parameters with Meta-Learning: A Case Study of Individualized Fiber Recruitment in Vagus Nerve Stimulation

Ximeng Mao

Yao-Chuan Chang

Stavros Zanos

Guillaume Lajoie

Our meta-learning framework is general and can be adapted to many input-response neurostimulation mapping problems. Moreover, this method leverages information from growing data sets of past patients, as a treatment is deployed. It can also be combined with several model types, including regression, Gaussian processes with Bayesian optimization, and beyond.

2022-09-07

bioRxiv (preprint)

doi.org

Unifying Generative Models with GFlowNets

Dinghuai Zhang

Ricky T. Q. Chen

Nikolay Malkin

Yoshua Bengio

There are many frameworks for deep generative modeling, each often presented with their own specific training algorithms and inference methods. Here, we demonstrate the connections between existing deep generative models and the recently introduced GFlowNet framework, a probabilistic inference machine which treats sampling as a decision-making process. This analysis sheds light on their overlapping traits and provides a unifying viewpoint through the lens of learning with Markovian trajectories. Our framework provides a means for unifying training and inference algorithms, and provides a route to shine a unifying light over many generative models. Beyond this, we provide a practical and experimentally verified recipe for improving generative modeling with insights from the GFlowNet perspective.

2022-09-05

ArXiv (preprint)

doi.org

arxiv.org

User Experience of a Computer-Based Decision Aid for Prenatal Trisomy Screening: Mixed Methods Explanatory Study

Titilayo Tatiana Agbadje

Chantale Pilon

Pierre Bérubé

Jean-Claude Forest

François Rousseau

S. A. Rahimi

Yves Giguère

France Légaré

2022-09-05

JMIR Pediatrics and Parenting (published)

doi.org

Assessing Intrapartum Risk of Hypoxic-Ischemic Encephalopathy using Fetal Heart Rate with Long Short-term Memory Networks

Derek Kweku Degbedzui

Michael Kuzniewicz

Marie-Coralie Cornet

Yvonne Wu

Heather Forquer

Lawrence Gerstley

Emily Hamilton

Doina Precup

Philip Warrick

Robert Kearney

This study investigated the prediction of the risk of hypoxic-ischemic encephalopathy using intrapartum cardiotocography records with a long short-term memory recurrent neural network. Across the 12 hours of labour, HIE sensitivity rose from 0.25 to 0.56 as delivery approached while specificity remained approximately constant with a mean of 0.71 and standard deviation of 0.04. The results show that classification improves as delivery approaches but that performance needs improvement. Future work will address the limitations of this preliminary study by investigating input signal transformations and the use of other network architectures to improve the model performance.

2022-09-03

2022 Computing in Cardiology (CinC) (published)

doi.org

Re-expression of CA1 and entorhinal activity patterns preserves temporal context memory at long timescales

Futing Zou

Wanjia Guo

Emily J. Allen

Yihan Wu

Ian Charest

Thomas Naselaris

Kendrick Kay

Brice A. Kuhl

J. Benjamin Hutchinson

Sarah DuBrow

Converging, cross-species evidence indicates that memory for time is supported by hippocampal area CA1 and entorhinal cortex. However, limited evidence characterizes how these regions preserve temporal memories over long timescales (e.g., months). At long timescales, memoranda may be encountered in multiple temporal contexts, potentially creating interference. Here, using 7T fMRI, we measured CA1 and entorhinal activity patterns as human participants viewed thousands of natural scene images distributed, and repeated, across many months. We show that memory for an image’s original temporal context was predicted by the degree to which CA1/entorhinal activity patterns from the first encounter with an image were re-expressed during re-encounters occurring minutes to months later. Critically, temporal memory signals were dissociable from predictors of recognition confidence, which were carried by distinct medial temporal lobe expressions. These findings suggest that CA1 and entorhinal cortex preserve temporal memories across long timescales by coding for and reinstating temporal context information.

2022-09-02

bioRxiv (preprint)

doi.org

Digitalization and the Anthropocene

Felix Creutzig

Daron Acemoglu

Xuemei Bai

Paul N. Edwards

Marie Josefine Hintz

Lynn H. Kaack

Siir Kilkis

Stefanie Kunkel

Amy Luers

Nikola Milojevic-Dupont

Dave Rejeski

Jürgen Renn

David Rolnick

Christoph Rosol

Daniela Russ

Thomas Turnbull

Elena Verdolini

Felix Wagner

Charlie Wilson

Aicha Zekar

Marius Zumwald

Great claims have been made about the benefits of dematerialization in a digital service economy. However, digitalization has historically increased environmental impacts at local and planetary scales, affecting labor markets, resource use, governance, and power relationships. Here we study the past, present, and future of digitalization through the lens of three interdependent elements of the Anthropocene: (a) planetary boundaries and stability, (b) equity within and between countries, and (c) human agency and governance, mediated via (i) increasing resource efficiency, (ii) accelerating consumption and scale effects, (iii) expanding political and economic control, and (iv) deteriorating social cohesion. While direct environmental impacts matter, the indirect and systemic effects of digitalization are more profoundly reshaping the relationship between humans, technosphere and planet. We develop three scenarios: planetary instability, green but inhumane, and deliberate for the good. We conclude by identifying leverage points that shift human–digital–Earth interactions toward sustainability.

2022-09-01

Annual Review of Environment and Resources (published)

doi.org
