Publications

GIST: Generated Inputs Sets Transferability in Deep Learning
Florian Tambon
Giuliano Antoniol
Global rewards in multi-agent deep reinforcement learning for autonomous mobility on demand systems
Heiko Hoppe
Tobias Enders
Maximilian Schiffer
We study vehicle dispatching in autonomous mobility on demand (AMoD) systems, where a central operator assigns vehicles to customer requests… (voir plus) or rejects these with the aim of maximizing its total profit. Recent approaches use multi-agent deep reinforcement learning (MADRL) to realize scalable yet performant algorithms, but train agents based on local rewards, which distorts the reward signal with respect to the system-wide profit, leading to lower performance. We therefore propose a novel global-rewards-based MADRL algorithm for vehicle dispatching in AMoD systems, which resolves so far existing goal conflicts between the trained agents and the operator by assigning rewards to agents leveraging a counterfactual baseline. Our algorithm shows statistically significant improvements across various settings on real-world data compared to state-of-the-art MADRL algorithms with local rewards. We further provide a structural analysis which shows that the utilization of global rewards can improve implicit vehicle balancing and demand forecasting abilities. An extended version of our paper, including an appendix, can be found at https://arxiv.org/abs/2312.08884. Our code is available at https://github.com/tumBAIS/GR-MADRL-AMoD.
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Pranshu Malviya
Goncalo Mordido
Aristide Baratin
Reza Babanezhad Harikandeh
Jerry Huang
Razvan Pascanu
Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of su… (voir plus)ch optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.
DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations
Nima Fathi
Amar Kumar
Brennan Nichyporuk
Mohammad Havaei
Deep learning classifiers are prone to latching onto dominant confounders present in a dataset rather than on the causal markers associated … (voir plus)with the target class, leading to poor generalization and biased predictions. Although explainability via counterfactual image generation has been successful at exposing the problem, bias mitigation strategies that permit accurate explainability in the presence of dominant and diverse artifacts remain unsolved. In this work, we propose the DeCoDEx framework and show how an external, pre-trained binary artifact detector can be leveraged during inference to guide a diffusion-based counterfactual image generator towards accurate explainability. Experiments on the CheXpert dataset, using both synthetic artifacts and real visual artifacts (support devices), show that the proposed method successfully synthesizes the counterfactual images that change the causal pathology markers associated with Pleural Effusion while preserving or ignoring the visual artifacts. Augmentation of ERM and Group-DRO classifiers with the DeCoDEx generated images substantially improves the results across underrepresented groups that are out of distribution for each class. The code is made publicly available at https://github.com/NimaFathi/DeCoDEx.
Characterizing and Classifying Developer Forum Posts with their Intentions
Xingfang Wu
Eric Laufer
Heng Li
Santhosh Srinivasan
Jayden Luo
With the rapid growth of the developer community, the amount of posts on online technical forums has been growing rapidly, which poses diffi… (voir plus)culties for users to filter useful posts and find important information. Tags provide a concise feature dimension for users to locate their interested posts and for search engines to index the most relevant posts according to the queries. However, most tags are only focused on the technical perspective (e.g., program language, platform, tool). In most cases, forum posts in online developer communities reveal the author's intentions to solve a problem, ask for advice, share information, etc. The modeling of the intentions of posts can provide an extra dimension to the current tag taxonomy. By referencing previous studies and learning from industrial perspectives, we create a refined taxonomy for the intentions of technical forum posts. Through manual labeling and analysis on a sampled post dataset extracted from online forums, we understand the relevance between the constitution of posts (code, error messages) and their intentions. Furthermore, inspired by our manual study, we design a pre-trained transformer-based model to automatically predict post intentions. The best variant of our intention prediction framework, which achieves a Micro F1-score of 0.589, Top 1-3 accuracy of 62.6% to 87.8%, and an average AUC of 0.787, outperforms the state-of-the-art baseline approach. Our characterization and automated classification of forum posts regarding their intentions may help forum maintainers or third-party tool developers improve the organization and retrieval of posts on technical forums. We have released our annotated dataset and codes in our supplementary material package.
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
Jiayi Wang
Sweta Agrawal
Marek Masiak
Ricardo Rei
Eleftheria Briakou
Marine Carpuat
Xuanli He
Sofia Bourhim
Andiswa Bukula
Muhidin A. Mohamed
Temitayo Olatoye
Tosin Adewumi
Hamam Mokayed
Christine Mwase
Wangui Kimotho
Foutse Yuehgoh
Aremu Anuoluwapo
Jessica Ojo
Shamsuddeen Hassan Muhammad … (voir 41 de plus)
Salomey Osei
Abdul-Hakeem Omotayo
Chiamaka Ijeoma Chukwuneke
Perez Ogayo
Oumaima Hourrane
Salma El Anigri
Lolwethu Ndolela
Thabiso Mangwana
Shafie Abdi Mohamed
Hassan Ayinde
Ayinde Hassan
Oluwabusayo Olufunke Awoyomi
Lama Alkhaled
sana Sabah al-azzawi
Naome Etori
Millicent Ochieng
Clemencia Siro
Njoroge Kiragu
Samuel Njoroge
Eric Muchiri
Wangari Kimotho
Lyse Naomi Wamba
Daud Abolade
Simbiat Ajao
Iyanuoluwa Shode
Ricky Macharm
Ruqayya Nasir Iro
Saheed Salahudeen Abdullahi
Stephen Moore
Bernard Opoku
Zainab Akinjobi
Abeeb Afolabi
Nnaemeka Casmir Obiefuna
Onyekachi Ogbu
Sam Brian
Sam Ochieng’
Verrah Akinyi Otiende
CHINEDU EMMANUEL MBONU
Toadoum Sari Sakayo
Yao Lu
Pontus Stenetorp
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measur… (voir plus)ing this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
Evaluating In-Context Learning of Libraries for Code Generation
Arkil Patel
Pradeep Dasigi
Implementation of a Global Pediatric Trauma Course in an Upper Middle–Income Country: A Pilot Study
Abbie Naus
Madeleine Carroll
Ayla Gerk
David P. Mooney
Natalie L. Yanchar
Julia Ferreira
Karen E. Gripp
Caroline Ouellet
Fabio Botelho
"One-Size-Fits-All"? Examining Expectations around What Constitute"Fair"or"Good"NLG System Behaviors
Li Lucy
Su Lin Blodgett
Milad Shokouhi
Hanna Wallach
Fairness-related assumptions about what constitute appropriate NLG system behaviors range from invariance, where systems are expected to beh… (voir plus)ave identically for social groups, to adaptation, where behaviors should instead vary across them. To illuminate tensions around invariance and adaptation, we conduct five case studies, in which we perturb different types of identity-related language features (names, roles, locations, dialect, and style) in NLG system inputs. Through these cases studies, we examine people's expectations of system behaviors, and surface potential caveats of these contrasting yet commonly held assumptions. We find that motivations for adaptation include social norms, cultural differences, feature-specific information, and accommodation; in contrast, motivations for invariance include perspectives that favor prescriptivism, view adaptation as unnecessary or too difficult for NLG systems to do appropriately, and are wary of false assumptions. Our findings highlight open challenges around what constitute"fair"or"good"NLG system behaviors.
How well do models of visual cortex generalize to out of distribution samples?
Yifei Ren
On the Costs and Benefits of Adopting Lifelong Learning for Software Analytics -- Empirical Study on Brown Build and Risk Prediction
Doriane Olewicki
Sarra Habchi
Mathieu Nayrolles
Mojtaba Faramarzi
Bram Adams
Nowadays, software analytics tools using machine learning (ML) models to, for example, predict the risk of a code change are well establishe… (voir plus)d. However, as the goals of a project shift over time, and developers and their habits change, the performance of said models tends to degrade (drift) over time. Current retraining practices typically require retraining a new model from scratch on a large updated dataset when performance decay is observed, thus incurring a computational cost; also there is no continuity between the models as the past model is discarded and ignored during the new model training. Even though the literature has taken interest in online learning approaches, those have rarely been integrated and evaluated in industrial environments. This paper evaluates the use of lifelong learning (LL) for industrial use cases at Ubisoft, evaluating both the performance and the required computational effort in comparison to the retraining-from-scratch approaches commonly used by the industry. LL is used to continuously build and maintain ML-based software analytics tools using an incremental learner that progressively updates the old model using new data. To avoid so-called"catastrophic forgetting"of important older data points, we adopt a replay buffer of older data, which still allows us to drastically reduce the size of the overall training dataset, and hence model training time.
Structured Learning in Time-dependent Cox Models
Guanbo Wang
Yi Lian
Robert W. Platt
Rui Wang
Sylvie Perreault
Marc Dorais
Mireille E. Schnitzer