Learn how to leverage generative AI to support and improve your productivity at work. The next cohort will take place online on April 28 and 30, 2026, in French.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
CODA: an open-source platform for federated analysis and machine learning on distributed healthcare data
Louis Mullie
Jonathan Afilalo
Patrick Archambault
Rima Bouchakri
Kip Brown
David L. Buckeridge
Yiorgos Alexandros Cavayas
Alexis F. Turgeon
Denis Martineau
François Lamontagne
Martine Lebrasseur
Renald Lemieux
Jeffrey Li
Michaël Sauthier
Pascal St-Onge
An Tang
William Witteman
Michael Chassé
Distributed computations facilitate multi-institutional data analysis while avoiding the costs and complexity of data pooling. Existing appr… (see more)oaches lack crucial features, such as built-in medical standards and terminologies, no-code data visualizations, explicit disclosure control mechanisms, and support for basic statistical computations, in addition to gradient-based optimization capabilities.
We describe the development of the Collaborative Data Analysis (CODA) platform, and the design choices undertaken to address the key needs identified during our survey of stakeholders. We use a public dataset (MIMIC-IV) to demonstrate end-to-end multi-modal FL using CODA. We assessed the technical feasibility of deploying the CODA platform at 9 hospitals in Canada, describe implementation challenges, and evaluate its scalability on large patient populations.
The CODA platform was designed, developed, and deployed between January 2020 and January 2023. Software code, documentation, and technical documents were released under an open-source license. Multi-modal federated averaging is illustrated using the MIMIC-IV and MIMIC-CXR datasets. To date, 8 out of the 9 participating sites have successfully deployed the platform, with a total enrolment of >1M patients. Mapping data from legacy systems to FHIR was the biggest barrier to implementation.
The CODA platform was developed and successfully deployed in a public healthcare setting in Canada, with heterogeneous information technology systems and capabilities. Ongoing efforts will use the platform to develop and prospectively validate models for risk assessment, proactive monitoring, and resource usage. Further work will also make tools available to facilitate migration from legacy formats to FHIR and DICOM.
In late December 1973, the United States enacted what some would come to call “the pitbull of environmental laws.” In the 50 years since… (see more), the formidable regulatory teeth of the Endangered Species Act (ESA) have been credited with considerable successes, obliging agencies to draw upon the best available science to protect species and habitats. Yet human pressures continue to push the planet toward extinctions on a massive scale. With that prospect looming, and with scientific understanding ever changing,
Science
invited experts to discuss how the ESA has evolved and what its future might hold.
—Brad Wible
Deep spectroscopic surveys with the Atacama Large Millimeter/submillimeter Array (ALMA) have revealed that some of the brightest infrared so… (see more)urces in the sky correspond to concentrations of submillimeter galaxies (SMGs) at high redshift. Among these, the SPT2349-56 protocluster system is amongst the most extreme examples given its high source density and integrated star formation rate. We conducted a deep Lyman-alpha line emission survey around SPT2349-56 using the Multi-Unit Spectroscopic Explorer (MUSE) at the Very Large Telescope (VLT) in order to characterize this uniquely dense environment. Taking advantage of the deep three-dimensional nature of this survey, we performed a sensitive search for Lyman-alpha emitters (LAEs) toward the core and northern extension of the protocluster, which correspond to the brightest infrared regions in this field. Using a smoothed narrowband image extracted from the MUSE datacube around the protocluster redshift, we searched for possible extended structures. We identify only three LAEs at
Widely considered a cornerstone of human morality, trust shapes many aspects of human social interactions. In this work, we present a theore… (see more)tical analysis of the
Recently, machine and deep learning (ML/DL) algorithms have been increasingly adopted in many software systems. Due to their inductive natur… (see more)e, ensuring the quality of these systems remains a significant challenge for the research community. Unlike traditional software built deductively by writing explicit rules, ML/DL systems infer rules from training data. Recent research in ML/DL quality assurance has adapted concepts from traditional software testing, such as mutation testing, to improve reliability. However, it is unclear if these proposed testing techniques are adopted in practice, or if new testing strategies have emerged from real-world ML deployments. There is little empirical evidence about the testing strategies.
To fill this gap, we perform the first fine-grained empirical study on ML testing in the wild to identify the ML properties being tested, the testing strategies, and their implementation throughout the ML workflow.
We conducted a mixed-methods study to understand ML software testing practices. We analyzed test files and cases from 11 open-source ML/DL projects on GitHub. Using open coding, we manually examined the testing strategies, tested ML properties, and implemented testing methods to understand their practical application in building and releasing ML/DL software systems.
Our findings reveal several key insights: 1.) The most common testing strategies, accounting for less than 40%, are Grey-box and White-box methods, such as Negative Testing, Oracle Approximation and Statistical Testing. 2.) A wide range of 17 ML properties are tested, out of which only 20% to 30% are frequently tested, including Consistency, Correctness}, and Efficiency. 3.) Bias and Fairness is more tested in Recommendation, while Security & Privacy is tested in Computer Vision (CV) systems, Application Platforms, and Natural Language Processing (NLP) systems.
Moiré superlattices have emerged as an exciting condensed-matter quantum simulator for exploring the exotic physics of strong electronic co… (see more)rrelations. Notable progress has been witnessed, but such correlated states are achievable usually at low temperatures. Here, we report evidence of possible room-temperature correlated electronic states and layer-hybridized SU(4) model simulator in AB-stacked MoS_{2} homobilayer moiré superlattices. Correlated insulating states at moiré band filling factors v=1, 2, 3 are unambiguously established in twisted bilayer MoS_{2}. Remarkably, the correlated electronic state at v=1 shows a giant correlated gap of ∼126 meV and may persist up to a record-high critical temperature over 285 K. The realization of a possible room-temperature correlated state with a large correlated gap in twisted bilayer MoS_{2} can be understood as the cooperation effects of the stacking-specific atomic reconstruction and the resonantly enhanced interlayer hybridization, which largely amplify the moiré superlattice effects on electronic correlations. Furthermore, extreme large nonlinear Hall responses up to room temperature are uncovered near correlated electronic states, demonstrating the quantum geometry of moiré flat conduction band.
There is increasing adoption of artificial intelligence in drug discovery. However, existing studies use machine learning to mainly utilize … (see more)the chemical structures of molecules but ignore the vast textual knowledge available in chemistry. Incorporating textual knowledge enables us to realize new drug design objectives, adapt to text-based instructions and predict complex biological activities. Here we present a multi-modal molecule structure–text model, MoleculeSTM, by jointly learning molecules’ chemical structures and textual descriptions via a contrastive learning strategy. To train MoleculeSTM, we construct a large multi-modal dataset, namely, PubChemSTM, with over 280,000 chemical structure–text pairs. To demonstrate the effectiveness and utility of MoleculeSTM, we design two challenging zero-shot tasks based on text instructions, including structure–text retrieval and molecule editing. MoleculeSTM has two main properties: open vocabulary and compositionality via natural language. In experiments, MoleculeSTM obtains the state-of-the-art generalization ability to novel biochemical concepts across various benchmarks. Machine learning methods in cheminformatics have made great progress in using chemical structures of molecules, but a large portion of textual information remains scarcely explored. Liu and colleagues trained MoleculeSTM, a foundation model that aligns the structure and text modalities through contrastive learning, and show its utility on the downstream tasks of structure–text retrieval, text-guided editing and molecular property prediction.
Human diseases are characterized by intricate cellular dynamics. Single-cell sequencing provides critical insights, yet a persistent gap rem… (see more)ains in computational tools for detailed disease progression analysis and targeted in-silico drug interventions. Here, we introduce UNAGI, a deep generative neural network tailored to analyze time-series single-cell transcriptomic data. This tool captures the complex cellular dynamics underlying disease progression, enhancing drug perturbation modeling and discovery. When applied to a dataset from patients with Idiopathic Pulmonary Fibrosis (IPF), UNAGI learns disease-informed cell embeddings that sharpen our understanding of disease progression, leading to the identification of potential therapeutic drug candidates. Validation via proteomics reveals the accuracy of UNAGI’s cellular dynamics analyses, and the use of the Fibrotic Cocktail treated human Precision-cut Lung Slices confirms UNAGI’s predictions that Nifedipine, an antihypertensive drug, may have antifibrotic effects on human tissues. UNAGI’s versatility extends to other diseases, including a COVID dataset, demonstrating adaptability and confirming its broader applicability in decoding complex cellular dynamics beyond IPF, amplifying its utility in the quest for therapeutic solutions across diverse pathological landscapes.