Publications

Are LLMs Robust for Spoken Dialogues?
Seyed Mahed Mousavi
Gabriel Roccabruna
Simone Alghisi
Massimo Rizzoli
Giuseppe Riccardi
Large Pre-Trained Language Models have demonstrated state-of-the-art performance in different downstream tasks, including dialogue state tra… (see more)cking and end-to-end response generation. Nevertheless, most of the publicly available datasets and benchmarks on task-oriented dialogues focus on written conversations. Consequently, the robustness of the developed models to spoken interactions is unknown. In this work, we have evaluated the performance of LLMs for spoken task-oriented dialogues on the DSTC11 test sets. Due to the lack of proper spoken dialogue datasets, we have automatically transcribed a development set of spoken dialogues with a state-of-the-art ASR engine. We have characterized the ASR-error types and their distributions and simulated these errors in a large dataset of dialogues. We report the intrinsic (perplexity) and extrinsic (human evaluation) performance of fine-tuned GPT-2 and T5 models in two subtasks of response generation and dialogue state tracking, respectively. The results show that LLMs are not robust to spoken noise by default, however, fine-tuning/training such models on a proper dataset of spoken TODs can result in a more robust performance.
A primer on the use of machine learning to distil knowledge from data in biological psychiatry.
Thomas P. Quinn
Jonathan L. Hess
Victoria S. Marshe
Michelle M. Barnett
Anne-Christin Hauschild
Malgorzata Maciukiewicz
Samar S. M. Elsheikh
Xiaoyu Men
Emanuel Schwarz
Michael S. Breen
Eric J. Barnett
Yanli Zhang-James
Mehmet Eren Ahsen
Han Cao
Junfang Chen
Jiahui Hou
Asif Salekin
Ping-I Lin
Kristin K. Nicodemus … (see 7 more)
Andreas Meyer-Lindenberg
Isabelle Bichindaritz
Stephen V. Faraone
Murray J. Cairns
Gaurav Pandey
Daniel J. Müller
Stephen J. Glatt
AITA: AI trustworthiness assessment
Bertrand Braunschweig
Stefan Buijsman
Faicel Chamroukhi
Fredrik Heintz
Juliette Mattioli
Maximilian Poretschkin
Asymmetry in the complexity of the multi-commodity network pricing problem
Quang Minh Bui
Jos'e Neto
A Column Generation Scheme for Distributionally Robust Multi-Item Newsvendor Problems
Shanshan Wang
This paper studies a distributionally robust multi-item newsvendor problem, where the demand distribution is unknown but specified with a ge… (see more)neral event-wise ambiguity set. Using the event-wise affine decision rules, we can obtain a conservative approximation formulation of the problem, which can typically be further reformulated as a linear program. In order to efficiently solve the resulting large-scale linear program, we develop a column generation-based decomposition scheme and speed up the computational efficiency by exploiting a special column selection strategy and stopping early based on a Karush-Kuhn-Tucker condition test. Focusing on the Wasserstein ambiguity set and the event-wise mean absolute deviation set, a computational study demonstrates both the computational efficiency of the proposed algorithm, which significantly outperforms a commercial solver and a Benders decomposition method, and the out-of-sample superiority of distributionally robust solutions relative to their sample average approximation counterparts. History: Accepted by Nicola Secomandi, Area Editor for Stochastic Models & Reinforcement Learning. Funding: This work was supported by the Natural Sciences and Engineering Research Council of Canada [492997-2016, RGPIN-2016-05208], the National Natural Science Foundation of China [71972012], Alliance de recherche numérique du Canada, and Canada Research Chairs [CRC-2018-00105]. It was also supported by Groupe d’études et de recherche en analyse des décisions (GERAD). Finally, this research was enabled in part by support provided by Digital Research Alliance of Canada ( https://alliancecan.ca/en ). Supplemental Material: The software that supports the findings of this study is available within the paper and its supplemental information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.0010 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2022.0010 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .
Dataset Difficulty and the Role of Inductive Bias
Devin Kwok
Nikhil Anand
Jonathan Frankle
Motivated by the goals of dataset pruning and defect identification, a growing body of methods have been developed to score individual examp… (see more)les within a dataset. These methods, which we call"example difficulty scores", are typically used to rank or categorize examples, but the consistency of rankings between different training runs, scoring methods, and model architectures is generally unknown. To determine how example rankings vary due to these random and controlled effects, we systematically compare different formulations of scores over a range of runs and model architectures. We find that scores largely share the following traits: they are noisy over individual runs of a model, strongly correlated with a single notion of difficulty, and reveal examples that range from being highly sensitive to insensitive to the inductive biases of certain model architectures. Drawing from statistical genetics, we develop a simple method for fingerprinting model architectures using a few sensitive examples. These findings guide practitioners in maximizing the consistency of their scores (e.g. by choosing appropriate scoring methods, number of runs, and subsets of examples), and establishes comprehensive baselines for evaluating scores in the future.
GPS-SSL: Guided Positive Sampling to Inject Prior Into Self-Supervised Learning
Aarash Feizi
Randall Balestriero
Arantxa Casanova
We propose Guided Positive Sampling Self-Supervised Learning (GPS-SSL), a general method to inject a priori knowledge into Self-Supervised L… (see more)earning (SSL) positive samples selection. Current SSL methods leverage Data-Augmentations (DA) for generating positive samples and incorporate prior knowledge - an incorrect, or too weak DA will drastically reduce the quality of the learned representation. GPS-SSL proposes instead to design a metric space where Euclidean distances become a meaningful proxy for semantic relationship. In that space, it is now possible to generate positive samples from nearest neighbor sampling. Any prior knowledge can now be embedded into that metric space independently from the employed DA. From its simplicity, GPS-SSL is applicable to any SSL method, e.g. SimCLR or BYOL. A key benefit of GPS-SSL is in reducing the pressure in tailoring strong DAs. For example GPS-SSL reaches 85.58% on Cifar10 with weak DA while the baseline only reaches 37.51%. We therefore move a step forward towards the goal of making SSL less reliant on DA. We also show that even when using strong DAs, GPS-SSL outperforms the baselines on under-studied domains. We evaluate GPS-SSL along with multiple baseline SSL methods on numerous downstream datasets from different domains when the models use strong or minimal data augmentations. We hope that GPS-SSL will open new avenues in studying how to inject a priori knowledge into SSL in a principled manner.
Recovering Dantzig–Wolfe Bounds by Cutting Planes
Rui Chen
Oktay Günlük
Leveraging Dantzig–Wolfe Decomposition in the Original Variable Space for Mixed-Integer Programming Dantzig–Wolfe decomposition has been… (see more) extensively applied to solve large-scale mixed-integer programs with decomposable structures, leading to exact solution approaches, such as branch and price. However, these approaches would require solving the problem in an extended variable space and are not readily present in off-the-shelf solvers. In “Recovering Dantzig–Wolfe Bounds by Cutting Planes,” Chen, Günlük, and Lodi propose a computational effective approach for generating cutting planes from Dantzig–Wolfe decomposition to enhance branch and cut in the space of original variables. The proposed approach requires a relatively small number of cutting planes to recover the strength of the Dantzig–Wolfe dual bound and should be easy to implement in general-purpose mixed-integer programming solvers. The authors show that these cutting planes typically lead to a formulation with lower dual degeneracy and hence, a better computational performance than naïve approaches, such as the objective function cut.
SCIseg: Automatic Segmentation of T2-weighted Hyperintense Lesions in Spinal Cord Injury
Enamundram Naga Karthik
Jan Valošek
Andrew C. Smith
Dario Pfyffer
Simon Schading-Sassenhausen
Lynn Farner
Kenneth A. Weber
Patrick Freund
Background: Quantitative MRI biomarkers in spinal cord injury (SCI) can help understand the extent of the focal injury. However, due to the … (see more)lack of automatic segmentation methods, these biomarkers are derived manually, which is a time-consuming process prone to intra- and inter-rater variability, thus limiting large multi-site studies and translation to clinical workflows. Purpose: To develop a deep learning tool for the automatic segmentation of T2-weighted hyperintense lesions and the spinal cord in SCI patients. Material and Methods: This retrospective study included a cohort of SCI patients from three sites enrolled between July 2002 and February 2023 who underwent clinical MRI examination. A deep learning model, SCIseg, was trained on T2-weighted images with heterogeneous image resolutions (isotropic, anisotropic), and orientations (axial, sagittal) acquired using scanners from different manufacturers (Siemens, Philips, GE) and different field strengths (1T, 1.5T, 3T) for the automatic segmentation of SCI lesions and the spinal cord. The proposed method was visually and quantitatively compared with other open-source baseline methods. Quantitative biomarkers (lesion volume, lesion length, and maximal axial damage ratio) computed from manual ground-truth lesion masks and automatic SCIseg predictions were correlated with clinical scores (pinprick, light touch, and lower extremity motor scores). A between-group comparison was performed using the Wilcoxon signed-rank test. Results: MRI data from 191 SCI patients (mean age, 48.1 years {+/-} 17.9 [SD]; 142 males) were used for training. Compared to existing methods, SCIseg achieved the best segmentation performance for both the cord and lesions and generalized well to both traumatic and non-traumatic SCI patients. SCIseg is open-source and accessible through the Spinal Cord Toolbox. Conclusion: Automatic segmentation of intramedullary lesions commonly seen in traumatic SCI replaces the tedious manual annotation process and enables the extraction of relevant lesion morphometrics in large cohorts. The proposed model generalizes across lesion etiologies (traumatic, ischemic), scanner manufacturers and heterogeneous image resolutions.
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity
Melissa Hall
Candace Ross
Adina Williams
Nicolas Carion
Michal Drozdzal
The unprecedented photorealistic results achieved by recent text-to-image generative systems and their increasing use as plug-and-play conte… (see more)nt creation solutions make it crucial to understand their potential biases. In this work, we introduce three indicators to evaluate the realism, diversity and prompt-generation consistency of text-to-image generative systems when prompted to generate objects from across the world. Our indicators complement qualitative analysis of the broader impact of such systems by enabling automatic and efficient benchmarking of geographic disparities, an important step towards building responsible visual content creation systems. We use our proposed indicators to analyze potential geographic biases in state-of-the-art visual content creation systems and find that: (1) models have less realism and diversity of generations when prompting for Africa and West Asia than Europe, (2) prompting with geographic information comes at a cost to prompt-consistency and diversity of generated images, and (3) models exhibit more region-level disparities for some objects than others. Perhaps most interestingly, our indicators suggest that progress in image generation quality has come at the cost of real-world geographic representation. Our comprehensive evaluation constitutes a crucial step towards ensuring a positive experience of visual content creation for everyone. Code is available at https://github.com/facebookresearch/DIG-In/.
Influence of scanning plane on Human Spinal Cord functional Magnetic Resonance echo planar imaging
Marta Moraschi
Silvia Tommasin
Laura Maugeri
Mauro Dinuzzo
Marco Masullo
Fabio Mangini
Lorenzo Giovannelli
Daniele Mascali
Tommaso Gili
Valerio Pisani
Ugo Md Nocentini
Federico Giove
Michela Fratini
BACKGROUND: Functional Magnetic Resonance Imaging (fMRI) is based on the Blood Oxygenation Level Dependent contrast and has been exploited f… (see more)or the indirect study of the neuronal activity within both the brain and the spinal cord. However, the interpretation of spinal cord fMRI (scfMRI) is still controversial and its diffusion is rather limited because of technical limitations. Overcoming these limitations would have a beneficial effect for the assessment and follow-up of spinal injuries and neurodegenerative diseases. PURPOSE: This study was aimed at systematically verify whether sagittal scanning in scfMRI using EPI readout is a viable alternative to the more common axial scanning, and at optimizing a pipeline for EPI-based scfMRI data analysis, based on Spinal Cord Toolbox (SCT). METHODS: Forty-five healthy subjects underwent MRI acquisition in a Philips Achieva 3T MRI scanner. T2*-weighted fMRI data were acquired using a GE-EPI sequence along sagittal and axial planes during an isometric motor task. Differences on benchmarks were assessed via paired two-sample t-test at p=0.05. RESULTS: We investigated the impact of the acquisition strategy by means of various metrics such as Temporal Signal to Noise Ratio (tSNR), Dice Coefficient to assess geometric distortions, Reproducibility and Sensitivity. tSNR was higher in axial than in sagittal scans, as well as reproducibility within the whole cord mask (t=7.4, p0.01) and within the GM mask (t=4.2, p0.01). The other benchmarks, associated with distortion and functional response, showed no differenc
More than one way to skin a dose volume: the impact of dose-surface map calculation approach on study reproducibility.
Haley Patrick