Learn how to leverage generative AI to support and improve your productivity at work. The next cohort will take place online on April 28 and 30, 2026, in French.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
While neural networks are capable of achieving human-like performance in many tasks such as image classification, the impressive performance… (see more) of each model is limited to its own dataset. Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data, thus, increasing data privacy. Diversity in representation space can be vital to a model`s adaptability in varied and difficult domains. In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor. Motivated by the improved predictive performance of ensembles, we propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors with Distinct Backbone Architectures (DBA). Although diversity in feature space is increased, the unconstrained mutual information (MI) maximization may potentially introduce amplification of weak hypotheses. Thus we introduce the Weak Hypothesis Penalization (WHP) regularizer as a mitigation strategy. Our work proposes Penalized Diversity (PD) where the synergy of DBA and WHP is applied to unsupervised source-free domain adaptation for covariate shift. In addition, PD is augmented with a weighted MI maximization objective for label distribution shift. Empirical results on natural, synthetic, and medical domains demonstrate the effectiveness of PD under different distributional shifts.
Uncovering executive function profiles within interindividual variability: A data driven clustering exploration of design fluency in school-aged children
Large collections of high-dimensional data have become nearly ubiquitous across many academic fields and application domains, ranging from b… (see more)iology to the humanities. Since working directly with high-dimensional data poses challenges, the demand for algorithms that create low-dimensional representations, or embeddings, for data visualization, exploration, and analysis is now greater than ever. In recent years, numerous embedding algorithms have been developed, and their usage has become widespread in research and industry. This surge of interest has resulted in a large and fragmented research field that faces technical challenges alongside fundamental debates, and it has left practitioners without clear guidance on how to effectively employ existing methods. Aiming to increase coherence and facilitate future work, in this review we provide a detailed and critical overview of recent developments, derive a list of best practices for creating and using low-dimensional embeddings, evaluate popular approaches on a variety of datasets, and discuss the remaining challenges and open problems in the field.
We investigate the robustness of Neural Ratio Estimators (NREs) and Neural Posterior Estimators (NPEs) to distributional shifts in the conte… (see more)xt of measuring the abundance of dark matter subhalos using strong gravitational lensing data. While these data-driven inference frameworks can be accurate on test data from the same distribution as the training sets, in real applications, it is expected that simulated training data and true observational data will differ in their distributions. We explore the behavior of a trained NRE and trained sequential NPEs to estimate the population-level parameters of dark matter subhalos from a large sample of images of strongly lensed galaxies with test data presenting distributional shifts within and beyond the bounds of the training distribution in the nuisance parameters (e.g., the background source morphology). While our results show that NREs and NPEs perform well when tested perfectly in distribution, they exhibit significant biases when confronted with slight deviations from the examples seen in the training distribution. This indicates the necessity for caution when applying NREs and NPEs to real astrophysical data, where high-dimensional underlying distributions are not perfectly known.
RadiSeq: a single- and bulk-cell whole-genome DNA sequencing simulator for radiation-damaged cell models
Felix Mathew
Luc Galarneau
J. Kildea
Objective To build and validate a simulation framework to perform single-cell and bulk-cell whole genome sequencing simulation of radiation-… (see more)exposed Monte Carlo cell models to assist radiation genomics studies. Approach Sequencing the genomes of radiation-damaged cells can provide useful insight into radiation action for radiobiology research. However, carrying out post-irradiation sequencing experiments can often be challenging, expensive, and time-consuming. Although computational simulations have the potential to provide solutions to these experimental challenges, and aid in designing optimal experiments, the absence of tools currently limits such application. Monte Carlo toolkits exist to simulate radiation exposures of cell models but there are no tools to simulate single- and bulk-cell sequencing of cell models containing radiation-damaged DNA. Therefore, we aimed to develop a Monte Carlo simulation framework to address this gap by designing a tool capable of simulating sequencing processes for radiation-damaged cells. Main Results We developed RadiSeq – a multi-threaded whole-genome DNA sequencing simulator written in C++. RadiSeq can be used to simulate Illumina sequencing of radiation-damaged cell models produced by Monte Carlo simulations. RadiSeq has been validated through comparative analysis, where simulated data were matched against experimentally obtained data, demonstrating reasonable agreement between the two. Additionally, it comes with numerous features designed to closely resemble actual whole-genome sequencing. RadiSeq is also highly customizable with a single input parameter file. Significance RadiSeq enables the research community to perform complex simulations of radiation-exposed DNA sequencing, supporting the optimization, planning, and validation of costly and time-intensive radiation biology experiments. This framework provides a powerful tool for advancing radiation genomics research.
Foundation models based on large language models (LLMs) have shown great success in handling various tasks and modalities. However, adapting… (see more) these models for general-purpose audio-language tasks is challenging due to differences in acoustic environments and task variations. In this work, we introduce LiSTEN Learning Soft Token Embeddings for Neural Audio LLMs), a framework for adapting LLMs to speech and audio tasks. LiSTEN uses a dynamic prompt selection strategy with learnable key-value pairs, allowing the model to balance general and task-specific knowledge while avoiding overfitting in a multitask setting. Our approach reduces dependence on large-scale ASR or captioning datasets, achieves competitive performance with fewer trainable parameters, and simplifies training by using a single-stage process. Additionally, LiSTEN enhances interpretability by analyzing the diversity and overlap of selected prompts across different tasks.