Data Privacy for Record Linkage and Beyond
Shurong Lin
In a data-driven world, two prominent research problems are record linkage and data privacy, among others. Record linkage is essential for improving decision-making by integrating information of the same entities from different sources. On the other hand, data privacy research seeks to balance the need to extract accurate insights from data with the imperative to protect the privacy of the entities involved. Inevitably, data privacy issues arise in the context of record linkage. This article identifies two complementary aspects at the intersection of these two fields: (1) how to ensure privacy during record linkage and (2) how to mitigate privacy risks when releasing the analysis results after record linkage. We specifically discuss privacy-preserving record linkage, differentially private regression, and related topics.
Do machine learning methods make better predictions in pharmacoepidemiology?
Ana Paula Pena-Gralle
Mireille E. Schnitzer
Sofia-Nada Boureguaa
Félix Morin
Caroline Sirois
Alice Dragomir
Lucie Blais
Predicting Five-Year All-Cause Mortality in COPD Patients Using Machine Learning
Ana Paula Pena-Gralle
Amélie Forget
Sofia-Nada Boureguaa
Lucie Blais
Virtual Reality for Pediatric Trauma Education - A Preliminary Face and Content Validation Study
Fabio Botelho
Said Ashkar
Shreenik Kundu
Tj Matthews
Elena Guadgano
Herbarium collections remain essential in the age of community science
Isaac Eckert
Anne Bruneau
D. Metsger
Simon Joly
T. Dickinson
Progres: Prompted Generative Rescoring on ASR N-Best
Ada Defne Tur
Adel Moumen
Large Language Models (LLMs) have shown their ability to improve the performance of speech recognizers by effectively rescoring the n-best hypotheses generated during the beam search process. However, the best way to exploit recent generative instruction-tuned LLMs for hypothesis rescoring is still unclear. This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand the n-best speech recognition hypotheses with new hypotheses generated through appropriately-prompted LLMs. Specifically, we introduce a new zero-shot method for ASR n-best rescoring, which combines confidence scores, LLM sequence scoring, and prompt-based hypothesis generation. We compare Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators with Llama-3 as sequence scorer LLM. We evaluated our approach using different speech recognizers and observed significant relative improvement in the word error rate (WER) ranging from 5% to 25%.
Active Semantic Mapping and Pose Graph Spectral Analysis for Robot Exploration
Rongge Zhang
Haechan Mark Bong
Exploration in unknown and unstructured environments is a pivotal requirement for robotic applications. A robot’s exploration behavior can be inherently affected by the performance of its Simultaneous Localization and Mapping (SLAM) subsystem, although SLAM and exploration are generally studied separately. In this paper, we formulate exploration as an active mapping problem and extend it with semantic information. We introduce a novel active metric-semantic SLAM approach, leveraging recent research advances in information theory and spectral graph theory: we combine semantic mutual information and the connectivity metrics of the underlying pose graph of the SLAM subsystem. We use the resulting utility function to evaluate different trajectories to select the most favorable strategy during exploration. Exploration and SLAM metrics are analyzed in experiments. Running our algorithm on the Habitat dataset, we show that, while maintaining efficiency close to the state-of-the-art exploration methods, our approach effectively increases the performance of metric-semantic SLAM with a 21% reduction in average map error and a 9% improvement in average semantic classification accuracy.
ARGV: 3D genome structure exploration using augmented reality
Chrisostomos Drogaris
Yanlin Zhang
Éric Zhang
Elena Nazarova
Roman Sarrazin-Gendron
Sélik Wilhelm-Landry
Yan Cyr
Jacek Majewski
Jérôme Waldispühl
A long-context RNA foundation model for predicting transcriptome architecture
Ali Saberi
Benedict Choi
Sean Wang
Aldo Hernández-Corchado
Mohsen Naghipourfar
Arsham Mikaeili Namini
Vijay Ramani
Hamed S. Najafabadi
Hani Goodarzi
Linking DNA sequence to genomic function remains one of the grand challenges in genetics and genomics. Here, we combine large-scale single-molecule transcriptome sequencing of diverse cancer cell lines with cutting-edge machine learning to build LoRNASH, an RNA foundation model that learns how the nucleotide sequence of unspliced pre-mRNA dictates transcriptome architecture: the relative abundances and molecular structures of mRNA isoforms. Owing to its use of the StripedHyena architecture, LoRNASH handles extremely long sequence inputs (∼65 kilobase pairs), allowing for quantitative, zero-shot prediction of all aspects of transcriptome architecture, including isoform abundance, isoform structure, and the impact of DNA sequence variants on transcript structure and abundance. We anticipate that our public data release and proof-of-concept model will accelerate varying aspects of RNA biotechnology. More broadly, we envision the use of LoRNASH as a foundation for fine-tuning of any transcriptome-related downstream prediction task, including cell-type specific gene expression, splicing, and general RNA processing.