Publications

Data Privacy for Record Linkage and Beyond
Shurong Lin
In a data-driven world, two prominent research problems are record linkage and data privacy, among others. Record linkage is essential for improving decision-making by integrating information of the same entities from different sources. On the other hand, data privacy research seeks to balance the need to extract accurate insights from data with the imperative to protect the privacy of the entities involved. Inevitably, data privacy issues arise in the context of record linkage. This article identifies two complementary aspects at the intersection of these two fields: (1) how to ensure privacy during record linkage and (2) how to mitigate privacy risks when releasing the analysis results after record linkage. We specifically discuss privacy-preserving record linkage, differentially private regression, and related topics.
Virtual Reality for Pediatric Trauma Education - A Preliminary Face and Content Validation Study
Fabio Botelho
Said Ashkar
Shreenik Kundu
Tj Matthews
Elena Guadagno
Herbarium collections remain essential in the age of community science
Isaac Eckert
Anne Bruneau
D. Metsger
Simon Joly
T. Dickinson
ProGRes: Prompted Generative Rescoring on ASR n-Best
Ada Defne Tur
Adel Moumen
Learning Multi-agent Multi-machine Tending by Mobile Robots
Abdalwhab Abdalwhab
David St-Onge
Robotics can help address the growing worker shortage challenge of the manufacturing industry. As such, machine tending is a task collaborative robots can tackle that can also highly boost productivity. Nevertheless, existing robotics systems deployed in that sector rely on a fixed single-arm setup, whereas mobile robots can provide more flexibility and scalability. In this work, we introduce a multi-agent multi-machine tending learning framework by mobile robots based on Multi-agent Reinforcement Learning (MARL) techniques with the design of a suitable observation and reward. Moreover, an attention-based encoding mechanism is developed and integrated into the Multi-agent Proximal Policy Optimization (MAPPO) algorithm to boost its performance for machine tending scenarios. Our model (AB-MAPPO) outperformed MAPPO in this new challenging scenario in terms of task success, safety, and resource utilization. Furthermore, we provided an extensive ablation study to support our various design decisions.
Active Semantic Mapping and Pose Graph Spectral Analysis for Robot Exploration
Rongge Zhang
Haechan Mark Bong
ARGV: 3D genome structure exploration using augmented reality
Chrisostomos Drogaris
Yanlin Zhang
Éric Zhang
Elena Nazarova
Roman Sarrazin-Gendron
Sélik Wilhelm-Landry
Yan Cyr
Jacek Majewski
Jérôme Waldispühl
A long-context RNA foundation model for predicting transcriptome architecture
Ali Saberi
Benedict Choi
Sean Wang
Aldo Hernández-Corchado
Mohsen Naghipourfar
Arsham Mikaeili Namini
Vijay Ramani
Hamed S. Najafabadi
Hani Goodarzi
Linking DNA sequence to genomic function remains one of the grand challenges in genetics and genomics. Here, we combine large-scale single-molecule transcriptome sequencing of diverse cancer cell lines with cutting-edge machine learning to build LoRNASH, an RNA foundation model that learns how the nucleotide sequence of unspliced pre-mRNA dictates transcriptome architecture, that is, the relative abundances and molecular structures of mRNA isoforms. Owing to its use of the StripedHyena architecture, LoRNASH handles extremely long sequence inputs (∼65 kilobase pairs), allowing for quantitative, zero-shot prediction of all aspects of transcriptome architecture, including isoform abundance, isoform structure, and the impact of DNA sequence variants on transcript structure and abundance. We anticipate that our public data release and proof-of-concept model will accelerate varying aspects of RNA biotechnology. More broadly, we envision the use of LoRNASH as a foundation for fine-tuning of any transcriptome-related downstream prediction task, including cell-type specific gene expression, splicing, and general RNA processing.
MeshUp: Multi-Target Mesh Deformation via Blended Score Distillation
Hyunwoo Kim
Itai Lang
Thibault Groueix
Vladimir Kim
Rana Hanocka
We propose MeshUp, a technique that deforms a 3D mesh towards multiple target concepts, and intuitively controls the region where each concept is expressed. Conveniently, the concepts can be defined as either text queries, e.g., "a dog" and "a turtle," or inspirational images, and the local regions can be selected as any number of vertices on the mesh. We can effectively control the influence of the concepts and mix them together using a novel score distillation approach, referred to as the Blended Score Distillation (BSD). BSD operates on each attention layer of the denoising U-Net of a diffusion model as it extracts and injects the per-objective activations into a unified denoising pipeline from which the deformation gradients are calculated. To localize the expression of these activations, we create a probabilistic Region of Interest (ROI) map on the surface of the mesh, and turn it into 3D-consistent masks that we use to control the expression of these activations. We demonstrate the effectiveness of BSD empirically and show that it can deform various meshes towards multiple objectives. Our project page is at https://threedle.github.io/MeshUp.
Pushing the frontiers in climate modelling and analysis with machine learning
Veronika Eyring
William D. Collins
Pierre Gentine
Elizabeth A. Barnes
Marcelo Barreiro
Tom Beucler
Marc Bocquet
Christopher S. Bretherton
Hannah M. Christensen
Katherine Dagon
David John Gagne
David Hall
Dorit Hammerling
Stephan Hoyer
Fernando Iglesias-Suarez
Ignacio Lopez-Gomez
Marie C. McGraw
Gerald A. Meehl
Maria J. Molina
Claire Monteleoni
Juliane Mueller
Michael S. Pritchard
Jakob Runge
Philip Stier
Oliver Watt-Meyer
Katja Weigel
Rose Yu
Laure Zanna
Development of a Framework for Establishing 'Gold Standard' Outbreak Data from Submitted SARS-CoV-2 Genome Samples
Yannan Shen
Russell Steele
Submitted genomic data for respiratory viruses reflect the emergence and spread of new variants. Although delays in submission limit the utility of these data for prospective surveillance, they may be useful for evaluating other surveillance sources. However, few studies have investigated the use of these data for evaluating aberration detection in surveillance systems. Our study used a Bayesian online change point detection algorithm (BOCP) to detect increases in the number of submitted genome samples as a means of establishing 'gold standard' dates of outbreak onset in multiple countries. We compared models using different data transformations and parameter values. BOCP detected change points that were not sensitive to different parameter settings. We also found data transformations were essential prior to change point detection. Our study presents a framework for using global genomic submission data to develop 'gold standard' dates about the onset of outbreaks due to new viral variants.
INTREPPPID - An Orthologue-Informed Quintuplet Network for Cross-Species Prediction of Protein-Protein Interaction
Joseph Szymborski
An overwhelming majority of protein-protein interaction (PPI) studies are conducted in a select few model organisms largely due to constraints in time and cost of the associated “wet lab” experiments. In silico PPI inference methods are ideal tools to overcome these limitations, but often struggle with cross-species predictions. We present INTREPPPID, a method which incorporates orthology data using a new “quintuplet” neural network, which is constructed with five parallel encoders with shared parameters. INTREPPPID incorporates both a PPI classification task and an orthologous locality task. The latter learns embeddings of orthologues that have small Euclidean distances between them and large distances between embeddings of all other proteins. INTREPPPID outperforms all other leading PPI inference methods tested on both the intra-species and cross-species tasks using strict evaluation datasets. We show that INTREPPPID’s orthologous locality loss increases performance because of the biological relevance of the orthologue data, and not due to some other specious aspect of the architecture. Finally, we introduce PPI.bio and PPI Origami, a web server interface for INTREPPPID and a software tool for creating strict evaluation datasets, respectively. Together, these two initiatives aim to make both the use and development of PPI inference tools more accessible to the community.
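The orthologous locality idea in the abstract above (small Euclidean distances between orthologue embeddings, large distances to other proteins) can be illustrated with a generic triplet-style margin loss. This is only a sketch under assumptions: the abstract does not give INTREPPPID's actual loss, and the function name `orthologue_locality_loss`, the `margin` parameter, and the plain-tuple embeddings are hypothetical choices made for illustration.

```python
import math


def orthologue_locality_loss(anchors, orthologues, others, margin=1.0):
    """Generic triplet margin loss on Euclidean distances.

    Illustrative only: NOT the loss used by INTREPPPID, whose exact
    form is not given in the abstract. Each argument is a sequence of
    equal-length embedding vectors (here, plain tuples of floats).
    """
    losses = []
    for anchor, orthologue, other in zip(anchors, orthologues, others):
        # Distance to the orthologue should be small ...
        d_pos = math.dist(anchor, orthologue)
        # ... and distance to an unrelated protein should be large.
        d_neg = math.dist(anchor, other)
        # Penalize only when the orthologue is not closer by at least `margin`.
        losses.append(max(0.0, d_pos - d_neg + margin))
    return sum(losses) / len(losses)


# Orthologue coincides with the anchor, unrelated protein is far away: no penalty.
well_separated = orthologue_locality_loss([(0.0, 0.0)], [(0.0, 0.0)], [(3.0, 4.0)])

# Roles swapped: the orthologue is far and the unrelated protein is close.
poorly_separated = orthologue_locality_loss([(0.0, 0.0)], [(3.0, 4.0)], [(0.0, 0.0)])
```

In a real model the embeddings would come from the shared-parameter encoders and the loss would be minimized jointly with the PPI classification objective; the margin value here is arbitrary.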