The next cohort of our program, designed to empower policy professionals with a comprehensive understanding of AI, will take place in Ottawa on November 28 and 29.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Publications
Implicitly Bayesian Prediction Rules in Deep Learning
The Bayesian approach leads to coherent updates of predictions under new data, which makes adhering to Bayesian principles appealing in deci… (see more)sion-making contexts. Traditionally, integrating Bayesian principles into models like deep neural networks involves setting priors on parameters and approximating posteriors. This is done despite the fact that, typically, priors on parameters reflect any prior beliefs only insofar as they dictate function space behaviour. In this paper, we rethink this approach and consider what properties characterise a prediction rule as being Bayesian. Algorithms meeting such criteria can be deemed implicitly Bayesian — they make the same predictions as some Bayesian model, without explicitly manifesting priors and posteriors. We argue this might be a more fruitful approach towards integrating Bayesian principles into deep learning. In this paper, we propose how to measure how close a general prediction rule is to being implicitly Bayesian, and empirically evaluate multiple prediction strategies using our approach. We also show theoretically that agents relying on non-implicitly Bayesian prediction rules can be easily exploited in adversarial betting settings.
2024-07-29
Proceedings of the 6th Symposium on Advances in Approximate Bayesian Inference (published)
Single-cell multi-omics illuminate intricate cellular states, yielding transformative insights into cellular dynamics and disease. Yet, whil… (see more)e the potential of this technology is vast, the integration of its multifaceted data presents challenges. Some modalities have not reached the robustness or clarity of established scRNA-seq. Coupled with data scarcity for newer modalities and integration intricacies, these challenges limit our ability to maximize single-cell omics benefits. We introduce scCross: a tool adeptly engineered using variational autoencoder, generative adversarial network principles, and the Mutual Nearest Neighbors (MNN) technique for modality alignment. This synergy ensures seamless integration of varied single-cell multi-omics data. Beyond its foundational prowess in multi-omics data integration, scCross excels in single-cell cross-modal data generation, multi-omics data simulation, and profound in-silico cellular perturbations. Armed with these capabilities, scCross is set to transform the field of single-cell research, establishing itself in the nuanced integration, generation, and simulation of complex multi-omics data.
Offline reinforcement learning has shown promise for solving tasks in safety-critical settings, such as clinical decision support. Its appli… (see more)cation, however, has been limited by the lack of interpretability and interactivity for clinicians. To address these challenges, we propose the medical decision transformer (MeDT), a novel and versatile framework based on the goal-conditioned reinforcement learning paradigm for sepsis treatment recommendation. MeDT uses the decision transformer architecture to learn a policy for drug dosage recommendation. During offline training, MeDT utilizes collected treatment trajectories to predict administered treatments for each time step, incorporating known treatment outcomes, target acuity scores, past treatment decisions, and current and past medical states. This analysis enables MeDT to capture complex dependencies among a patient's medical history, treatment decisions, outcomes, and short-term effects on stability. Our proposed conditioning uses acuity scores to address sparse reward issues and to facilitate clinician-model interactions, enhancing decision-making. Following training, MeDT can generate tailored treatment recommendations by conditioning on the desired positive outcome (survival) and user-specified short-term stability improvements. We carry out rigorous experiments on data from the MIMIC-III dataset and use off-policy evaluation to demonstrate that MeDT recommends interventions that outperform or are competitive with existing offline reinforcement learning methods while enabling a more interpretable, personalized and clinician-directed approach.
Approximately two-thirds of survivors of childhood acute lymphoblastic leukemia (ALL) cancer develop late adverse effects post-treatment. Pr… (see more)ior studies explored prediction models for personalized follow-up, but none integrated the usage of neural networks to date. In this work, we propose the Error Passing Network (EPN), a graph-based method that leverages relationships between samples to propagate residuals and adjust predictions of any machine learning model. We tested our approach to estimate patients’ \vo peak, a reliable indicator of their cardiac health. We used the EPN in conjunction with several baseline models and observed up to 12.16% improvement in the mean average percentage error compared to the last established equation predicting \vo peak in childhood ALL survivors. Along with this performance improvement, our final model is more efficient considering that it relies only on clinical variables that can be self-reported by patients, therefore removing the previous need of executing a resource-consuming physical test.
2024-07-24
Proceedings of the fifth Conference on Health, Inference, and Learning (published)
Background:
Recently, machine and deep learning (ML/DL) algorithms have been increasingly adopted in many software systems. Due to their in… (see more)ductive nature, ensuring the quality of these systems remains a significant challenge for the research community. Traditionally, software systems were constructed deductively, by writing explicit rules that govern the behavior of the system as program code. However, ML/DL systems infer rules from training data i.e., they are generated inductively). Recent research in ML/DL quality assurance has adapted concepts from traditional software testing, such as mutation testing, to improve reliability. However, it is unclear if these proposed testing techniques are adopted in practice, or if new testing strategies have emerged from real-world ML deployments. There is little empirical evidence about the testing strategies.
Aims:
To fill this gap, we perform the first fine-grained empirical study on ML testing in the wild to identify the ML properties being tested, the testing strategies, and their implementation throughout the ML workflow.
Method:
We conducted a mixed-methods study to understand ML software testing practices. We analyzed test files and cases from 11 open-source ML/DL projects on GitHub. Using open coding, we manually examined the testing strategies, tested ML properties, and implemented testing methods to understand their practical application in building and releasing ML/DL software systems.
Results:
Our findings reveal several key insights: 1.) The most common testing strategies, accounting for less than 40%, are Grey-box and White-box methods, such as
Negative Testing
,
Oracle Approximation
, and
Statistical Testing
. 2.) A wide range of
\(17\)
ML properties are tested, out of which only 20% to 30% are frequently tested, including
Consistency
,
Correctness
, and
Efficiency
. 3.)
Bias and Fairness
is more tested in Recommendation (6%) and CV (3.9%) systems, while
Security & Privacy
is tested in CV (2%), Application Platforms (0.9%), and NLP (0.5%). 4.) We identified 13 types of testing methods, such as
Unit Testing
,
Input Testing
, and
Model Testing
.
Conclusions:
This study sheds light on the current adoption of software testing techniques and highlights gaps and limitations in existing ML testing practices.
2024-07-24
ACM Transactions on Software Engineering and Methodology (published)
Bayesian inference for inverse problems hinges critically on the choice of priors. In the absence of specific prior information, population-… (see more)level distributions can serve as effective priors for parameters of interest. With the advent of machine learning, the use of data-driven population-level distributions (encoded, e.g., in a trained deep neural network) as priors is emerging as an appealing alternative to simple parametric priors in a variety of inverse problems. However, in many astrophysical applications, it is often difficult or even impossible to acquire independent and identically distributed samples from the underlying data-generating process of interest to train these models. In these cases, corrupted data or a surrogate, e.g. a simulator, is often used to produce training samples, meaning that there is a risk of obtaining misspecified priors. This, in turn, can bias the inferred posteriors in ways that are difficult to quantify, which limits the potential applicability of these models in real-world scenarios. In this work, we propose addressing this issue by iteratively updating the population-level distributions by retraining the model with posterior samples from different sets of observations and showcase the potential of this method on the problem of background image reconstruction in strong gravitational lensing when score-based models are used as data-driven priors. We show that starting from a misspecified prior distribution, the updated distribution becomes progressively closer to the underlying population-level distribution, and the resulting posterior samples exhibit reduced bias after several updates.
Fine-grained understanding of objects, attributes, and relationships between objects is crucial for visual-language models (VLMs). Existing … (see more)benchmarks primarily focus on evaluating VLMs' capability to distinguish between two very similar \textit{captions} given an image. In this paper, we introduce a new, challenging benchmark termed \textbf{Vis}ual \textbf{Min}imal-Change Understanding (VisMin), which requires models to predict the correct image-caption match given two images and two captions. The image pair and caption pair contain minimal changes, i.e., only one aspect changes at a time from among the following: \textit{object}, \textit{attribute}, \textit{count}, and \textit{spatial relation}. These changes test the models' understanding of objects, attributes (such as color, material, shape), counts, and spatial relationships between objects. We built an automatic framework using large language models and diffusion models, followed by a rigorous 4-step verification process by human annotators. Empirical experiments reveal that current VLMs exhibit notable deficiencies in understanding spatial relationships and counting abilities. We also generate a large-scale training dataset to finetune CLIP and Idefics2, showing significant improvements in fine-grained understanding across benchmarks and in CLIP's general image-text alignment. We release all resources, including the benchmark, training data, and finetuned model checkpoints, at https://vismin.net/.