Publications

Validation of an AI-assisted Treatment Outcome Measure for Gender-Affirming Voice Care: Comparing AI Accuracy to Listener's Perception of Voice Femininity.

Shane Simon

Einav Silverstein

Lauren Timmons-Sund

Jeremy Pinto

M. Eugenia Castro

Karla O’Dell

Michael M. Johns III

Wendy J. Mack

Yael Bensoussan

2023-12-01

Journal of Voice (published)

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

Sebastian Ruder

Jonathan H. Clark

Alexander Gutkin

Mihir Kale

Min Ma

Massimo Nicosia

Shruti Rijhwani

Parker Riley

Jean Michel Amath Sarr

Xinyi Wang

John Frederick Wieting

Nitish Gupta

Anna Katanova

Christo Kirov

Dana L Dickinson

Brian Roark

Bidisha Samanta

Connie Tao

David Ifeoluwa Adelani

Vera Axelrod

Isaac Rayburn Caswell

Colin Cherry

Dan Garrette

Reeve Ingle

Melvin Johnson

Dmitry Panteleev

Partha Talukdar

Data scarcity is a crucial issue for the development of highly multilingual NLP systems. Yet for many under-represented languages (ULs) -- languages for which NLP research is particularly far behind in meeting user needs -- it is feasible to annotate small amounts of data. Motivated by this, we propose XTREME-UP, a benchmark defined by: its focus on the scarce-data scenario rather than zero-shot; its focus on user-centric tasks -- tasks with broad adoption by speakers of high-resource languages; and its focus on under-represented languages where this scarce-data scenario tends to be most realistic. XTREME-UP evaluates the capabilities of language models across 88 under-represented languages over 9 key user-centric technologies including ASR, OCR, MT, and information access tasks that are of general utility. We create new datasets for OCR, autocomplete, semantic parsing, and transliteration, and build on and refine existing datasets for other tasks. XTREME-UP provides methodology for evaluating many modeling scenarios including text-only, multi-modal (vision, audio, and text), supervised parameter tuning, and in-context learning. We evaluate commonly used models on the benchmark. We release all code and scripts to train and evaluate models.

2023-12-01

Findings of the Association for Computational Linguistics: EMNLP 2023 (published)

openreview.net

Learning domain-invariant classifiers for infant cry sounds

Charles Onu

Hemanth K. Sheetha

Arsenii Gorin

Doina Precup

2023-11-30

ArXiv (preprint)

Active learning meets fractal decision boundaries: a cautionary tale from the Sitnikov three-body problem

Nicolas Payot

Mario Pasquato

Alessandro A. Trani

Yashar Hezaveh

Chaotic systems such as the gravitational N-body problem are ubiquitous in astronomy. Machine learning (ML) is increasingly deployed to predict the evolution of such systems, e.g. with the goal of speeding up simulations. Strategies such as active learning (AL) are a natural choice to optimize ML training. Here we showcase an AL failure when predicting the stability of the Sitnikov three-body problem, the simplest case of N-body problem displaying chaotic behavior. We link this failure to the fractal nature of our classification problem's decision boundary. This is a potential pitfall in optimizing large sets of N-body simulations via AL in the context of star cluster physics, galactic dynamics, or cosmology.

2023-11-29

ArXiv (preprint)

Bayesian Imaging for Radio Interferometry with Score-Based Priors

Noé Dia

M. J. Yantovski-Barth

Alexandre Adam

Micah Bowles

Pablo Lemos

A. Scaife

Yashar Hezaveh

U. Montréal

Ciela Institute

Flatiron Institute

2023-11-29

ArXiv (preprint)

Learning an Effective Evolution Equation for Particle-Mesh Simulations Across Cosmologies

Nicolas Payot

Pablo Lemos

Carolina Cuesta-Lazaro

C. Modi

Yashar Hezaveh

2023-11-29

ArXiv (preprint)

Silent bugs in deep learning frameworks: an empirical study of Keras and TensorFlow

Florian Tambon

Amin Nikanjam

Le An

Foutse Khomh

Giuliano Antoniol

2023-11-29

Empirical Software Engineering (published)

TimelyGPT: Extrapolatable Transformer Pre-training for Long-term Time-Series Forecasting in Healthcare

Ziyang Song

Qincheng Lu

Hao Xu

Mike He Zhu

David Buckeridge

Yue Li

Motivation: Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved great success in Natural Language Processing and Computer Vision domains. However, the development of PTMs on healthcare time-series data is lagging behind. This underscores the limitations of the existing transformer-based architectures, particularly their scalability to handle large-scale time series and ability to capture long-term temporal dependencies. Methods: In this study, we present Timely Generative Pre-trained Transformer (TimelyGPT). TimelyGPT employs an extrapolatable position (xPos) embedding to encode trend and periodic patterns into time-series representations. It also integrates recurrent attention and temporal convolution modules to effectively capture global-local temporal dependencies. Materials: We evaluated TimelyGPT on two large-scale healthcare time series datasets corresponding to continuous biosignals and irregularly-sampled time series, respectively: (1) the Sleep EDF dataset consisting of over 1.2 billion timesteps; (2) the longitudinal healthcare administrative database PopHR, comprising 489,000 patients randomly sampled from the Montreal population. Results: In forecasting continuous biosignals, TimelyGPT achieves accurate extrapolation up to 6,000 timesteps of body temperature during the sleep stage transition, given a short look-up window (i.e., prompt) containing only 2,000 timesteps. For irregularly-sampled time series, TimelyGPT with a proposed time-specific inference demonstrates high top recall scores in predicting future diagnoses using early diagnostic records, effectively handling irregular intervals between clinical records. Together, we envision TimelyGPT to be useful in various health domains, including long-term patient health state forecasting and patient risk trajectory prediction. Availability: The open-sourced code is available at GitHub.

2023-11-29

ArXiv (preprint)

Unraveling the Mysteries of Galaxy Clusters: Recurrent Inference Deconvolution of X-ray Spectra

C. L. Rhea

J. Hlavacek-Larrondo

Ralph P. Kraft

Ákos Bogdán

Alexandre Adam

2023-11-29

ArXiv (preprint)

H3K27me3 spreading organizes canonical PRC1 chromatin architecture to regulate developmental programs

Brian Krug

Bo Hu

Haifen Chen

Adam Ptack

Xiao Chen

Kristjan H. Gretarsson

Shriya Deshmukh

Nisha Kabir

Augusto Faria Andrade

Elias Jabbour

Ashot S. Harutyunyan

John J. Y. Lee

Maud Hulswit

Damien Faury

Caterina Russo

Xinjing Xu

Michael Johnston

Audrey Baguette

Nathan A. Dahl

Alexander G. Weil

Benjamin Ellezam

Rola Dali

Mathieu Blanchette

Khadija Wilson

Benjamin A. Garcia

Rajesh Kumar Soni

Marco Gallo

Michael D. Taylor

Claudia Kleinman

Jacek Majewski

Nada Jabado

Chao Lu

2023-11-28

bioRxiv (preprint)