Publications

On the effectiveness of log representation for log-based anomaly detection
Xingfang Wu
Heng Li
Improving Image-Based Precision Medicine with Uncertainty-Aware Causal Models
Joshua D. Durso-Finley
Jean-Pierre R. Falet
Raghav Mehta
Douglas Arnold
Nick Pawlowski
Image-based precision medicine aims to personalize treatment decisions based on an individual's unique imaging features so as to improve the… (see more)ir clinical outcome. Machine learning frameworks that integrate uncertainty estimation as part of their treatment recommendations would be safer and more reliable. However, little work has been done in adapting uncertainty estimation techniques and validation metrics for precision medicine. In this paper, we use Bayesian deep learning for estimating the posterior distribution over factual and counterfactual outcomes on several treatments. This allows for estimating the uncertainty for each treatment option and for the individual treatment effects (ITE) between any two treatments. We train and evaluate this model to predict future new and enlarging T2 lesion counts on a large, multi-center dataset of MR brain images of patients with multiple sclerosis, exposed to several treatments during randomized controlled trials. We evaluate the correlation of the uncertainty estimate with the factual error, and, given the lack of ground truth counterfactual outcomes, demonstrate how uncertainty for the ITE prediction relates to bounds on the ITE error. Lastly, we demonstrate how knowledge of uncertainty could modify clinical decision-making to improve individual patient and clinical trial outcomes.
Mitigating Calibration Bias Without Fixed Attribute Grouping for Improved Fairness in Medical Imaging Analysis
Changjian Shui
Justin Szeto
Raghav Mehta
Douglas Arnold
Better Quality Pre-training Data and T5 Models for African Languages
Akintunde Oladipo
Mofetoluwa Adeyemi
Orevaoghene Ahia
Abraham Toluwase Owodunni
Odunayo Ogundepo
Jimmy Lin
In this study, we highlight the importance of enhancing the quality of pretraining data in multilingual language models. Existing web crawl… (see more)s have demonstrated quality issues, particularly in the context of low-resource languages. Consequently, we introduce a new multilingual pretraining corpus for
Crystal-GFN: sampling crystals with desirable properties and constraints
Alex Hernandez-Garcia
Alexandre AGM Duval
Alexandra Volokhova
Divya Sharma
pierre luc carrier
Michał Koziarski
Victor Schmidt
Accelerating material discovery holds the potential to greatly help mitigate the climate crisis. Discovering new solid-state materials such … (see more)as electrocatalysts, super-ionic conductors or photovoltaic materials can have a crucial impact, for instance, in improving the efficiency of renewable energy production and storage. In this paper, we introduce Crystal-GFN, a generative model of crystal structures that sequentially samples structural properties of crystalline materials, namely the space group, composition and lattice parameters. This domain-inspired approach enables the flexible incorporation of physical and structural hard constraints, as well as the use of any available predictive model of a desired physicochemical property as an objective function. To design stable materials, one must target the candidates with the lowest formation energy. Here, we use as objective the formation energy per atom of a crystal structure predicted by a new proxy machine learning model trained on MatBench. The results demonstrate that Crystal-GFN is able to sample highly diverse crystals with low (median -3.1 eV/atom) predicted formation energy.
Efficient Classification of Long Documents via State-Space Models
Peng Lu
Suyuchen Wang
Mehdi Rezagholizadeh
Ivan Kobyzev
EpiK-Eval: Evaluation for Language Models as Epistemic Models
Gabriele Prato
Jerry Huang
Prasanna Parthasarathi
Shagun Sodhani
In the age of artificial intelligence, the role of large language models (LLMs) is becoming increasingly central. Despite their growing prev… (see more)alence, their capacity to consolidate knowledge from different training documents—a crucial ability in numerous applications—remains unexplored. This paper presents the first study examining the capability of LLMs to effectively combine such information within their parameter space. We introduce EpiK-Eval, a novel question-answering benchmark tailored to evaluate LLMs' proficiency in formulating a coherent and consistent knowledge representation from segmented narratives. Evaluations across various LLMs reveal significant weaknesses in this domain. We contend that these shortcomings stem from the intrinsic nature of prevailing training objectives. Consequently, we advocate for refining the approach towards knowledge consolidation, as it harbors the potential to dramatically improve their overall effectiveness and performance. The findings from this study offer insights for developing more robust and reliable LLMs. Our code and benchmark are available at https://github.com/chandar-lab/EpiK-Eval
HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science
Yu Song
Santiago Miret
Huan Zhang
Investigating the Effect of Pre-finetuning BERT Models on NLI Involving Presuppositions
Jad Kabbara
MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization
Yuyan Chen
Zhihao Wen
Ge Fan
Zhengyu Chen
Wei Wu
Dayiheng Liu
Zhixu Li
Yanghua Xiao
MoqaGPT : Zero-Shot Multi-modal Open-domain Question Answering with Large Language Model
Le Zhang
Yihong Wu
Fengran Mo
Jian-Yun Nie
Multi-modal open-domain question answering typically requires evidence retrieval from databases across diverse modalities, such as images, t… (see more)ables, passages, etc. Even Large Language Models (LLMs) like GPT-4 fall short in this task. To enable LLMs to tackle the task in a zero-shot manner, we introduce MoqaGPT, a straightforward and flexible framework. Using a divide-and-conquer strategy that bypasses intricate multi-modality ranking, our framework can accommodate new modalities and seamlessly transition to new models for the task. Built upon LLMs, MoqaGPT retrieves and extracts answers from each modality separately, then fuses this multi-modal information using LLMs to produce a final answer. Our methodology boosts performance on the MMCoQA dataset, improving F1 by +37.91 points and EM by +34.07 points over the supervised baseline. On the MultiModalQA dataset, MoqaGPT surpasses the zero-shot baseline, improving F1 by 9.5 points and EM by 10.1 points, and significantly closes the gap with supervised methods. Our codebase is available at https://github.com/lezhang7/MOQAGPT.
RainProof: An Umbrella to Shield Text Generator from Out-Of-Distribution Data
Maxime Darrin
Pierre Colombo
Implementing effective control mechanisms to ensure the proper functioning and security of deployed NLP models, from translation to chatbots… (see more), is essential. A key ingredient to ensure safe system behaviour is Out-Of-Distribution (OOD) detection, which aims to detect whether an input sample is statistically far from the training distribution. Although OOD detection is a widely covered topic in classification tasks, most methods rely on hidden features output by the encoder. In this work, we focus on leveraging soft-probabilities in a black-box framework, i.e. we can access the soft-predictions but not the internal states of the model. Our contributions include: (i) RAINPROOF a Relative informAItioN Projection OOD detection framework; and (ii) a more operational evaluation setting for OOD detection. Surprisingly, we find that OOD detection is not necessarily aligned with task-specific measures. The OOD detector may filter out samples well processed by the model and keep samples that are not, leading to weaker performance. Our results show that RAINPROOF provides OOD detection methods more aligned with task-specific performance metrics than traditional OOD detectors.