Publications

Automated prediction of extubation success in extremely preterm infants: the APEX multicenter study
Lara J. Kanbar
Wissam Shalish
Charles C. Onu
Samantha Latremouille
Lajos Kovacs
Martin Keszler
Sanjay Chawla
Karen A. Brown
Robert E. Kearney
Guilherme M. Sant’Anna
BioCaster in 2021: automatic disease outbreaks detection from global news media
Zaiqiao Meng
Anya Okhmatovskaia
Maxime Polleri
Guido Powell
Zihao Fu
Iris Ganser
Meiru Zhang
Nicholas B. King
Nigel Collier
SUMMARY: BioCaster was launched in 2008 to provide an ontology-based text mining system for early disease detection from open news sources. Following a 6-year break, we have re-launched the system in 2021. Our goal is to systematically upgrade the methodology using state-of-the-art neural network language models, whilst retaining the original benefits that the system provided in terms of logical reasoning and automated early detection of infectious disease outbreaks. Here, we present recent extensions such as neural machine translation in 10 languages, neural classification of disease outbreak reports and a new cloud-based visualization dashboard. Furthermore, we discuss our vision for further improvements, including combining risk assessment with event semantics and assessing the risk of outbreaks with multi-granularity. We hope that these efforts will benefit the global public health community. AVAILABILITY AND IMPLEMENTATION: BioCaster web-portal is freely accessible at http://biocaster.org.
A Parsimonious Description of Global Functional Brain Organization in Three Spatiotemporal Patterns
Taylor Bolt
Jason S. Nomi
Jorge A. Salas
Catie Chang
B.T. Thomas Yeo
Lucina Q. Uddin
Shella D. Keilholz
Resting-state functional MRI has yielded seemingly disparate insights into large-scale organization of the human brain. The brain’s large-scale organization can be divided into two broad categories - zero-lag representations of functional connectivity structure and time-lag representations of traveling wave or propagation structure. Here we sought to unify observed phenomena across these two categories in the form of three low-frequency spatiotemporal patterns composed of a mixture of standing and traveling wave dynamics. We showed that a range of empirical phenomena, including functional connectivity gradients, the task-positive/task-negative anti-correlation pattern, the global signal, time-lag propagation patterns, the quasiperiodic pattern, and the functional connectome network structure are manifestations of these three spatiotemporal patterns. These patterns account for much of the global spatial structure that underlies functional connectivity analyses, and unify phenomena in resting-state functional MRI previously thought distinct.
Explanatory latent representation of heterogeneous spatial maps of task-fMRI in large-scale datasets
Mariam Zabihi
Seyed Mostafa Kia
Thomas Wolfers
Stijn de Boer
Charlotte Fraza
Sourena Soheili-Nezhad
Richard Dinga
Alberto Llera Arenas
Christian F. Beckmann
Andre Marquand
Finding an interpretable and compact representation of complex neuroimage data can be extremely useful for understanding brain behavioral mapping and hence for explaining the biological underpinnings of mental disorders. Hand-crafted representations, as well as linear transformations, may not accurately reflect the significant variability across individuals. Here, we applied a data-driven approach to learn interpretable and generalizable latent representations that link cognition with underlying brain systems; we applied a three-dimensional autoencoder to two large-scale datasets to find an interpretable latent representation of high dimensional task fMRI image data. This representation also accounts for demographic characteristics, achieved by solving a joint optimization problem that simultaneously reconstructs the data and predicts clinical or demographic variables. We then applied normative modeling to the latent variables to define summary statistics (‘latent indices’) to find a multivariate mapping to non-imaging measures. We trained our model with multi-task fMRI data derived from the Human Connectome Project (HCP) that provides whole-brain coverage across a range of cognitive tasks. Next, in a transfer learning setting, we tested the generalization of our latent space on UK Biobank data as an independent dataset. Our model showed high performance in terms of age and predictions and was capable of capturing complex behavioral characteristics and preserving the individualized variabilities using a highly interpretable latent representation.
Global fMRI signal topography differs systematically across the lifespan
Jason S. Nomi
Jingwei Li
Taylor Bolt
Catie Chang
Salome Kornfeld
Zachary T. Goodman
B.T. Thomas Yeo
R. Nathan Spreng
Lucina Q. Uddin
The global signal (GS) in resting-state fMRI, known to contain artifacts and non-neuronal physiological signals, also contains important neural information related to individual state and trait characteristics. Here we show distinct linear and curvilinear lifespan patterns of GS topography in a cross-sectional lifespan sample, demonstrating its importance for consideration in studies of development and aging. Subcortical brain regions such as the thalamus and putamen show linear associations with the GS across the lifespan. The thalamus has stronger coupling in older-age individuals compared with younger-aged individuals, while the putamen has stronger coupling in younger individuals compared with older individuals. The subcortical nucleus basalis shows a u-shaped pattern similar to cortical regions within the lateral frontoparietal network and dorsal attention network, where coupling with the GS is stronger at early and old age, with weaker coupling in middle age. This differentiation in coupling strength between subcortical and cortical brain activity across the lifespan supports a dual-layer model of GS composition, where subcortical aspects of the GS are differentiated from cortical aspects of the GS. We find that these subcortical-cortical contributions to the GS depend strongly on the lifespan stage of individuals. Our findings demonstrate how neurobiological information within the GS differs across development and highlight the need to carefully consider whether or not to remove this signal when investigating age-related functional differences in the brain.
H4rm0ny: A Competitive Zero-Sum Two-Player Markov Game for Multi-Agent Learning on Evasive Malware Generation and Detection
Christopher Molloy
Steven H. H. Ding
Benjamin C. M. Fung
Philippe Charland
To combat the increasingly versatile and mutable modern malware, Machine Learning (ML) is now a popular and effective complement to the existing signature-based techniques for malware triage and identification. However, ML is also a readily available tool for adversaries. Recent studies have shown that malware can be modified by deep Reinforcement Learning (RL) techniques to bypass AI-based and signature-based anti-virus systems without altering their original malicious functionalities. These studies only focus on generating evasive samples and assume a static detection system as the enemy. Malware detection and evasion essentially form a two-party cat-and-mouse game. Simulating the real-life scenarios, in this paper we present the first two-player competitive game for evasive malware detection and generation, following the zero-sum Multi-Agent Reinforcement Learning (MARL) paradigm. Our experiments on recent malware show that the produced malware detection agent is more robust against adversarial attacks. Also, the produced malware modification agent is able to generate more evasive samples fooling both AI-based and other anti-malware techniques.
Implications of Topological Imbalance for Representation Learning on Biomedical Knowledge Graphs
Stephen Bonner
Ufuk Kirik
Ola Engkvist
Ian P Barrett
Adoption of recently developed methods from machine learning has given rise to creation of drug-discovery knowledge graphs (KG) that utilize the interconnected nature of the domain. Graph-based modelling of the data, combined with KG embedding (KGE) methods, is promising as it provides a more intuitive representation and is suitable for inference tasks such as predicting missing links. One common application is to produce ranked lists of genes for a given disease, where the rank is based on the perceived likelihood of association between the gene and the disease. It is thus critical that these predictions are not only pertinent but also biologically meaningful. However, KGs can be biased either directly due to the underlying data sources that are integrated or due to modeling choices in the construction of the graph, one consequence of which is that certain entities can get topologically overrepresented. We demonstrate the effect of these inherent structural imbalances, resulting in densely-connected entities being highly ranked no matter the context. We provide support for this observation across different datasets, models as well as predictive tasks. Further, we present various graph perturbation experiments which yield more support to the observation that KGE models can be more influenced by the frequency of entities rather than any biological information encoded within the relations. Our results highlight the importance of data modeling choices, and emphasize the need for practitioners to be mindful of these issues when interpreting model outputs and during KG composition.
On the Expressivity of Markov Reward (Extended Abstract)
David Abel
Will Dabney
Anna Harutyunyan
Mark K. Ho
Michael L. Littman
Satinder Singh
Flaky Performances when Pre-Training on Relational Databases with a Plan for Future Characterization Efforts
We explore the downstream task performances for graph neural network (GNN) self-supervised learning (SSL) methods trained on subgraphs extracted from relational databases (RDBs). Intuitively, this joint use of SSL and GNNs allows us to leverage more of the available data, which could translate to better results. However, while we observe positive transfer in some cases, others showed systematic performance degradation, including some spectacular ones. We hypothesize a mechanism that could explain this behaviour and draft the plan for future work testing it by characterizing how much relevant information different strategies can (theoretically and/or empirically) extract from (synthetic and/or real) RDBs.
Revisiting Hotels-50K and Hotel-ID
Arantxa Casanova
Adriana Romero
In this paper, we propose revisited versions for two recent hotel recognition datasets: Hotels-50K and Hotel-ID. The revisited versions provide evaluation setups with different levels of difficulty to better align with the intended real-world application, i.e. countering human trafficking. Real-world scenarios involve hotels and locations that are not captured in the current data sets, therefore it is important to consider evaluation settings where classes are truly unseen. We test this setup using multiple state-of-the-art image retrieval models and show that as expected, the models’ performances decrease as the evaluation gets closer to the real-world unseen settings. The rankings of the best performing models also change across the different evaluation settings, which further motivates using the proposed revisited datasets.
On the Generalization and Adaption Performance of Causal Models
Characterizing User Behaviors in Open-Source Software User Forums: An Empirical Study
Jazlyn Hellman
Jiahao Chen
Md. Sami Uddin
Jinghui Cheng
Jin L.C. Guo
User forums of Open Source Software (OSS) enable end-users to collaboratively discuss problems concerning the OSS applications. Despite decades of research on OSS, we know very little about how end-users engage with OSS communities on these forums, in particular, the challenges that hinder their continuous and meaningful participation in the OSS community. Many previous works are developer-centric and overlook the importance of end-user forums. As a result, end-users' expectations are seldom reflected in OSS development. To better understand user behaviors in OSS user forums, we carried out an empirical study analyzing about 1.3 million posts from user forums of four popular OSS applications: Zotero, Audacity, VLC, and RStudio. Through analyzing the contribution patterns of three common user types (end-users, developers, and organizers), we observed that end-users not only initiated most of the threads (above 96% of threads in three projects, 86% in the other), but also acted as the significant contributors for responding to other users' posts, even though they tended to lack confidence in their activities as indicated by psycho-linguistic analyses. Moreover, we found end-users more open, reflecting a more positive emotion in communication than organizers and developers in the forums. Our work contributes new knowledge about end-users' activities and behaviors in OSS user forums that the vital OSS stakeholders can leverage to improve end-user engagement in the OSS development process.