Nous utilisons des témoins pour analyser le trafic et l’utilisation de notre site web, afin de personnaliser votre expérience. Vous pouvez désactiver ces technologies à tout moment, mais cela peut restreindre certaines fonctionnalités du site. Consultez notre Politique de protection de la vie privée pour en savoir plus.
Paramètre des cookies
Vous pouvez activer et désactiver les types de cookies que vous souhaitez accepter. Cependant certains choix que vous ferez pourraient affecter les services proposés sur nos sites (ex : suggestions, annonces personnalisées, etc.).
Cookies essentiels
Ces cookies sont nécessaires au fonctionnement du site et ne peuvent être désactivés. (Toujours actif)
Cookies analyse
Acceptez-vous l'utilisation de cookies pour mesurer l'audience de nos sites ?
Multimedia Player
Acceptez-vous l'utilisation de cookies pour afficher et vous permettre de regarder les contenus vidéo hébergés par nos partenaires (YouTube, etc.) ?
Publications
Deep Clustering with Self-Supervision using Pairwise Similarities
Deep clustering incorporates embedding into clustering to find a lower-dimensional space appropriate for clustering. In this paper, we propo… (voir plus)se a novel deep clustering framework with self-supervision using pairwise similarities (DCSS). The proposed method consists of two successive phases. In the first phase, we propose to form hypersphere-like groups of similar data points, i.e. one hypersphere per cluster, employing an autoencoder that is trained using cluster-specific losses. The hyper-spheres are formed in the autoencoder's latent space. In the second phase, we propose to employ pairwise similarities to create a
Studies of dataset development in machine learning call for greater attention to the data practices that make model development possible and… (voir plus) shape its outcomes. Many argue that the adoption of theory and practices from archives and data curation fields can support greater fairness, accountability, transparency, and more ethical machine learning. In response, this paper examines data practices in machine learning dataset development through the lens of data curation. We evaluate data practices in machine learning as data curation practices. To do so, we develop a framework for evaluating machine learning datasets using data curation concepts and principles through a rubric. Through a mixed-methods analysis of evaluation results for 25 ML datasets, we study the feasibility of data curation principles to be adopted for machine learning data work in practice and explore how data curation is currently performed. We find that researchers in machine learning, which often emphasizes model development, struggle to apply standard data curation principles. Our findings illustrate difficulties at the intersection of these fields, such as evaluating dimensions that have shared terms in both fields but non-shared meanings, a high degree of interpretative flexibility in adapting concepts without prescriptive restrictions, obstacles in limiting the depth of data curation expertise needed to apply the rubric, and challenges in scoping the extent of documentation dataset creators are responsible for. We propose ways to address these challenges and develop an overall framework for evaluation that outlines how data curation concepts and methods can inform machine learning data practices.
This research note reports on a new dataset about legislators in four Canadian provinces since the establishment of their colonial assemblie… (voir plus)s in the eighteenth century. Over 7,000 legislators from Ontario, Quebec, New Brunswick, and Nova Scotia are included, with consolidated information drawn from multiple sources about parliamentarians’ years of birth and death, religion, electoral performance, kinship, and several other biographical indicators. We also illustrate the utility of such data with the help of a few descriptive examples drawn from the four provinces. We believe this consolidated dataset offers several opportunities for future research on representation, legislative activities and party politics.
2024-05-03
Canadian Journal of Political Science/Revue canadienne de science politique (publié)
The emerging behaviors of swarms have fascinated scientists and gathered significant interest in the field of robotics. Traditionally, swarm… (voir plus)s are viewed as egalitarian, with robots sharing identical roles and capabilities. However, recent findings highlight the importance of hierarchy for deploying robot swarms more effectively in diverse scenarios. Despite nature's preference for hierarchies, the robotics field has clung to the egalitarian model, partly due to a lack of empirical evidence for the conditions favoring hierarchies. Our research demonstrates that while egalitarian swarms excel in environments proportionate to their collective sensing abilities, they struggle in larger or more complex settings. Hierarchical swarms, conversely, extend their sensing reach efficiently, proving successful in larger, more unstructured environments with fewer resources. We validated these concepts through simulations and physical robot experiments, using a complex radiation cleanup task. This study paves the way for developing adaptable, hierarchical swarm systems applicable in areas like planetary exploration and autonomous vehicles. Moreover, these insights could deepen our understanding of hierarchical structures in biological organisms.
Despite substantial progress in machine learning for scientific discovery in recent years, truly de novo design of small molecules which exh… (voir plus)ibit a property of interest remains a significant challenge. We introduce LambdaZero, a generative active learning approach to search for synthesizable molecules. Powered by deep reinforcement learning, LambdaZero learns to search over the vast space of molecules to discover candidates with a desired property. We apply LambdaZero with molecular docking to design novel small molecules that inhibit the enzyme soluble Epoxide Hydrolase 2 (sEH), while enforcing constraints on synthesizability and drug-likeliness. LambdaZero provides an exponential speedup in terms of the number of calls to the expensive molecular docking oracle, and LambdaZero de novo designed molecules reach docking scores that would otherwise require the virtual screening of a hundred billion molecules. Importantly, LambdaZero discovers novel scaffolds of synthesizable, drug-like inhibitors for sEH. In in vitro experimental validation, a series of ligands from a generated quinazoline-based scaffold were synthesized, and the lead inhibitor N-(4,6-di(pyrrolidin-1-yl)quinazolin-2-yl)-N-methylbenzamide (UM0152893) displayed sub-micromolar enzyme inhibition of sEH.
This paper tackles the challenge of detecting unreliable behavior in regression algorithms, which may arise from intrinsic variability (e.g.… (voir plus), aleatoric uncertainty) or modeling errors (e.g., model uncertainty). First, we formally introduce the notion of unreliability in regression, i.e., when the output of the regressor exceeds a specified discrepancy (or error). Then, using powerful tools for probabilistic modeling, we estimate the discrepancy density, and we measure its statistical diversity using our proposed metric for statistical dissimilarity. In turn, this allows us to derive a data-driven score that expresses the uncertainty of the regression outcome. We show empirical improvements in error detection for multiple regression tasks, consistently outperforming popular baseline approaches, and contributing to the broader field of uncertainty quantification and safe machine learning systems.
Clinical research emphasizes the implementation of rigorous and reproducible study designs that rely on between-group matching or controllin… (voir plus)g for sources of biological variation such as subject’s sex and age. However, corrections for body size (i.e. height and weight) are mostly lacking in clinical neuroimaging designs. This study investigates the importance of body size parameters in their relationship with spinal cord (SC) and brain magnetic resonance imaging (MRI) metrics. Data were derived from a cosmopolitan population of 267 healthy human adults (age 30.1±6.6 years old, 125 females). We show that body height correlated strongly or moderately with brain gray matter (GM) volume, cortical GM volume, total cerebellar volume, brainstem volume, and cross-sectional area (CSA) of cervical SC white matter (CSA-WM; 0.44≤r≤0.62). In comparison, age correlated weakly with cortical GM volume, precentral GM volume, and cortical thickness (-0.21≥r≥-0.27). Body weight correlated weakly with magnetization transfer ratio in the SC WM, dorsal columns, and lateral corticospinal tracts (-0.20≥r≥-0.23). Body weight further correlated weakly with the mean diffusivity derived from diffusion tensor imaging (DTI) in SC WM (r=-0.20) and dorsal columns (-0.21), but only in males. CSA-WM correlated strongly or moderately with brain volumes (0.39≤r≤0.64), and weakly with precentral gyrus thickness and DTI-based fractional anisotropy in SC dorsal columns and SC lateral corticospinal tracts (-0.22≥r≥-0.25). Linear mixture of sex and age explained 26±10% of data variance in brain volumetry and SC CSA. The amount of explained variance increased at 33±11% when body height was added into the mixture model. Age itself explained only 2±2% of such variance. In conclusion, body size is a significant biological variable. Along with sex and age, body size should therefore be included as a mandatory variable in the design of clinical neuroimaging studies examining SC and brain structure.
Clinical research emphasizes the implementation of rigorous and reproducible study designs that rely on between-group matching or controllin… (voir plus)g for sources of biological variation such as subject’s sex and age. However, corrections for body size (i.e. height and weight) are mostly lacking in clinical neuroimaging designs. This study investigates the importance of body size parameters in their relationship with spinal cord (SC) and brain magnetic resonance imaging (MRI) metrics. Data were derived from a cosmopolitan population of 267 healthy human adults (age 30.1±6.6 years old, 125 females). We show that body height correlated strongly or moderately with brain gray matter (GM) volume, cortical GM volume, total cerebellar volume, brainstem volume, and cross-sectional area (CSA) of cervical SC white matter (CSA-WM; 0.44≤r≤0.62). In comparison, age correlated weakly with cortical GM volume, precentral GM volume, and cortical thickness (-0.21≥r≥-0.27). Body weight correlated weakly with magnetization transfer ratio in the SC WM, dorsal columns, and lateral corticospinal tracts (-0.20≥r≥-0.23). Body weight further correlated weakly with the mean diffusivity derived from diffusion tensor imaging (DTI) in SC WM (r=-0.20) and dorsal columns (-0.21), but only in males. CSA-WM correlated strongly or moderately with brain volumes (0.39≤r≤0.64), and weakly with precentral gyrus thickness and DTI-based fractional anisotropy in SC dorsal columns and SC lateral corticospinal tracts (-0.22≥r≥-0.25). Linear mixture of sex and age explained 26±10% of data variance in brain volumetry and SC CSA. The amount of explained variance increased at 33±11% when body height was added into the mixture model. Age itself explained only 2±2% of such variance. In conclusion, body size is a significant biological variable. Along with sex and age, body size should therefore be included as a mandatory variable in the design of clinical neuroimaging studies examining SC and brain structure.