Publications

Population variation in social brain morphology: Links to socioeconomic status and health disparity

Nathania Suryoputri

Hannah Kiesow

Health disparity across layers of society involves reasons beyond the healthcare system. Socioeconomic status (SES) shapes people's daily interaction with their social environment and is known to impact various health outcomes. Using generative probabilistic modeling, we investigate health satisfaction and complementary indicators of socioeconomic lifestyle in the human social brain. In a population cohort of ~10,000 UK Biobank participants, our first analysis probed the relationship between health status and subjective social standing (i.e., financial satisfaction). We identified volume effects in participants unhappy with their health in regions of the higher associative cortex, especially the dorsomedial prefrontal cortex (dmPFC) and bilateral temporo-parietal junction (TPJ). Specifically, participants in poor subjective health showed deviations in dmPFC and TPJ volume as a function of financial satisfaction. The second analysis on health status and objective social standing (i.e., household income) revealed volume deviations in regions of the limbic system for individuals feeling unhealthy. In particular, low-SES participants dissatisfied with their health showed deviations in volume distributions in the amygdala and hippocampus bilaterally. Thus, our population-level evidence speaks to the possibility that health status and socioeconomic position have characteristic imprints in social brain differentiation.

2022-05-03

Social Neuroscience (unknown)

doi.org

Amortized Rejection Sampling in Universal Probabilistic Programming

Saeid Naderiparizi

Adam Ścibior

Andreas Munk

Mehrdad Ghadiri

Atilim Güneş Baydin

Bradley Gram-Hansen

C. S. D. Witt

Robert Zinkov

Philip Torr

Tom Rainforth

Yee Whye Teh

Frank N. Wood

Existing approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. An instance of this is importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method's correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops, and we discuss how to implement our method in a generic probabilistic programming framework.

2022-05-02

Proceedings of The 25th International Conference on Artificial Intelligence and Statistics (published)

proceedings.mlr.press

Capacity Variation in the Many-to-one Stable Matching

Federico Bobbio

Margarida Carvalho

Andrea Lodi

Alfredo Torrico

2022-05-02

ArXiv (preprint)

doi.org

arxiv.org

Deep Learning Prediction of Response to Disease Modifying Therapy in Primary Progressive Multiple Sclerosis (P1-1.Virtual)

Jean-Pierre René Falet

Joshua Durso-Finley

Brennan Nichyporuk

Julien Schroeter

Francesca Bovis

Maria-Pia Sormani

Doina Precup

Tal Arbel

Douglas Arnold

2022-05-02

Neurology (unknown)

doi.org

Fast Continuous and Integer L-shaped Heuristics Through Supervised Learning

Eric Larsen

Emma Frejinger

Bernard Gendron

Andrea Lodi

We propose a methodology at the nexus of operations research and machine learning (ML) leveraging generic approximators available from ML to accelerate the solution of mixed-integer linear two-stage stochastic programs. We aim at solving problems where the second stage is highly demanding. Our core idea is to gain large reductions in online solution time while incurring small reductions in first-stage solution accuracy by substituting the exact second-stage solutions with fast, yet accurate supervised ML predictions. This upfront investment in ML would be justified when similar problems are solved repeatedly over time, for example, in transport planning related to fleet management, routing and container yard management. Our numerical results focus on the problem class seminally addressed with the integer and continuous L-shaped cuts. Our extensive empirical analysis is grounded in standardized families of problems derived from stochastic server location (SSLP) and stochastic multi knapsack (SMKP) problems available in the literature. The proposed method can solve the hardest instances of SSLP in less than 9% of the time it takes the state-of-the-art exact method, and in the case of SMKP the same figure is 20%. Average optimality gaps are in most cases less than 0.1%.

2022-05-01

ArXiv (preprint)

doi.org

arxiv.org

Challenges in machine learning application development

Md Saidur Rahman

Foutse Khomh

Emilio Rivera

Yann-Gaël Guéhéneuc

Bernd Lehnert

SAP is the market leader in enterprise application software offering an end-to-end suite of applications and services to enable their customers worldwide to operate their business. In particular, retail customers of SAP deal with millions of sales transactions for their day-to-day business. Transactions are created during retail sales at the point of sale (POS) terminals and those transactions are then sent to some central servers for validations and other business operations. A considerable proportion of the retail transactions may have inconsistencies or anomalies due to many technical and human errors. SAP provides an automated process for error detection but still requires a manual process by dedicated employees using workbench software for correction. However, manual corrections of these errors are time-consuming, labor-intensive, and might be prone to further errors due to incorrect modifications. Thus, automated detection and correction of transaction errors are very important regarding their potential business values and the improvement in the business workflow. In this paper, we report on our experience from a project where we develop an AI-based system to automatically detect transaction errors and propose corrections. We identify and discuss the challenges that we faced during this collaborative research and development project, from two distinct perspectives: Software Engineering and Machine Learning. We report on our experience and insights from the project with guidelines for the identified challenges. We collect developers’ feedback for qualitative analysis of our findings. We believe that our findings and recommendations can help other researchers and practitioners embarking on similar endeavours.

2022-04-30

2022 IEEE/ACM 1st International Workshop on Software Engineering for Responsible Artificial Intelligence (SE4RAI) (published)

doi.org

Characterizing Idioms: Conventionality and Contingency

Michaela Socolof

Jackie Chi Kit Cheung

Michael Wagner

Timothy J. O'Donnell

Idioms are unlike most phrases in two important ways. First, the words in an idiom have non-canonical meanings. Second, the non-canonical meanings of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we implement them using BERT (Devlin et al., 2019) and XLNet (Yang et al., 2019). We show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that special machinery to handle idioms may not be warranted.

2022-04-30

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

doi.org

arxiv.org

Compositional Generalization in Dependency Parsing

Emily Goodwin

Siva Reddy

Timothy J. O'Donnell

Dzmitry Bahdanau

Compositionality -- the ability to combine familiar units like words into novel phrases and sentences -- has been the focus of intense interest in artificial intelligence in recent years. To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ). This dataset maximizes the similarity between the test and train distributions over primitive units, like words, while maximizing the compound divergence: the dissimilarity between test and train distributions over larger structures, like phrases. Dependency parsing, however, lacks a compositional generalization benchmark. In this work, we introduce a gold-standard set of dependency parses for CFQ, and use this to analyze the behavior of a state-of-the-art dependency parser (Qi et al., 2020) on the CFQ dataset. We find that increasing compound divergence degrades dependency parsing performance, although not as dramatically as semantic parsing performance. Additionally, we find the performance of the dependency parser does not uniformly degrade relative to compound divergence, and the parser performs differently on different splits with the same compound divergence. We explore a number of hypotheses for what causes the non-uniform degradation in dependency parsing performance, and identify a number of syntactic structures that drive the dependency parser's lower performance on the most challenging splits.

2022-04-30

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

doi.org

arxiv.org

Elucidating Transcriptional Networks Driving Idiopathic Pulmonary Fibrosis

O. Lapohos

M. El-Hajjar

E.S. Charette

A. Emad

G. Fonseca

2022-04-30

Varied Omics Techniques Applied to Allergic and Respiratory Traits (published)

doi.org

An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models

Nicholas Meade

Elinor Poole-Dayan

Siva Reddy

Recent work has shown pre-trained language models capture social biases from the large amounts of text they are trained on. This has attracted attention to developing techniques that mitigate such biases. In this work, we perform an empirical survey of five recently proposed bias mitigation techniques: Counterfactual Data Augmentation (CDA), Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias. We quantify the effectiveness of each technique using three intrinsic bias benchmarks while also measuring the impact of these techniques on a model’s language modeling ability, as well as its performance on downstream NLU tasks. We experimentally find that: (1) Self-Debias is the strongest debiasing technique, obtaining improved scores on all bias benchmarks; (2) current debiasing techniques perform less consistently when mitigating non-gender biases; and (3) improvements on bias benchmarks such as StereoSet and CrowS-Pairs by using debiasing strategies are often accompanied by a decrease in language modeling ability, making it difficult to determine whether the bias mitigation was effective.

2022-04-30

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

doi.org

arxiv.org

Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization

Meng Cao

Yue Dong

Jackie Chi Kit Cheung

State-of-the-art abstractive summarization systems often generate hallucinations; i.e., content that is not directly inferable from the source text. Although hallucinated content is assumed to be incorrect, we find that much of it is actually consistent with world knowledge, which we call factual hallucinations. Including these factual hallucinations in a summary can be beneficial because they provide useful background information. In this work, we propose a novel detection approach that separates factual from non-factual hallucinations of entities. Our method is based on an entity’s prior and posterior probabilities according to pre-trained and finetuned masked language models, respectively. Empirical results suggest that our method vastly outperforms two baselines in both accuracy and F1 scores and has a strong correlation with human judgments on factuality classification tasks. Furthermore, we use our method as a reward signal to train a summarization system using an off-line reinforcement learning (RL) algorithm that can significantly improve the factuality of generated summaries while maintaining the level of abstractiveness.

2022-04-30

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

doi.org

arxiv.org

Identification of out-of-distribution cases of CNN using class-based surprise adequacy

Mira Marhaba

Ettore Merlo

Foutse Khomh

Giuliano Antoniol

Machine learning is vulnerable to possible incorrect classification of cases that are out of the distribution observed during training and calibration

2022-04-30

2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN) (published)

doi.org
