Publications

AmbieGen tool at the SBST 2022 Tool Competition

Dmytro Humeniuk

Giuliano Antoniol

AmbieGen is a tool for generating test cases for cyber-physical systems (CPS). In the context of SBST 2022 CPS tool competition, it has been… (see more) adapted to generating virtual roads to test a car lane keeping assist system. AmbieGen leverages a two objective NSGA-II algorithm to produce the test cases. It has achieved the highest final score, accounting for the test case efficiency, effectiveness and diversity in both testing configurations.

2022-05-01

International Workshop on Search-Based Software Testing (published)

Challenges in Machine Learning Application Development: An Industrial Experience Report

Md. Saidur Rahman

Emilio Martínez Rivera

Yann‐Gaël Guéhéneuc

Bernd Lehnert

SAP is the market leader in enterprise application software offering an end-to-end suite of applications and services to enable their custom… (see more)ers worldwide to operate their business. Especially, retail customers of SAP deal with millions of sales transactions for their day-to-day business. Transactions are created during retail sales at the point of sale (POS) terminals and those transactions are then sent to some central servers for validations and other business operations. A considerable proportion of the retail transactions may have inconsistencies or anomalies due to many technical and human errors. SAP provides an automated process for error detection but still requires a manual process by dedicated employees using workbench software for correction. However, manual corrections of these errors are time-consuming, labor-intensive, and might be prone to further errors due to incorrect modifications. Thus, automated detection and correction of transaction errors are very important regarding their potential business values and the improvement in the business workflow. In this paper, we report on our experience from a project where we develop an AI-based system to automatically detect transaction errors and propose corrections. We identify and discuss the challenges that we faced during this collaborative research and development project, from two distinct perspectives: Software Engineering and Machine Learning. We report on our experience and insights from the project with guidelines for the identified challenges. We collect developers’ feedback for qualitative analysis of our findings. We believe that our findings and recommendations can help other researchers and practitioners embarking into similar endeavours. CCS CONCEPTS • Software and its engineering → Programming teams.

2022-05-01

2022 IEEE/ACM 1st International Workshop on Software Engineering for Responsible Artificial Intelligence (SE4RAI) (published)

Challenges in Machine Learning Application Development: An Industrial Experience Report

Md Saidur Rahman

Emilio Rivera

Yann‐Gaël Guéhéneuc

Bernd Lehnert

2022-05-01

2022 IEEE/ACM 1st International Workshop on Software Engineering for Responsible Artificial Intelligence (SE4RAI) (published)

Characterizing Idioms: Conventionality and Contingency

Michaela Socolof

Jackie Cheung

Michael Wagner

Timothy O'Donnell

Idioms are unlike most phrases in two important ways. First, words in an idiom have non-canonical meanings. Second, the non-canonical meanin… (see more)gs of words in an idiom are contingent on the presence of other words in the idiom. Linguistic theories differ on whether these properties depend on one another, as well as whether special theoretical machinery is needed to accommodate idioms. We define two measures that correspond to the properties above, and we show that idioms fall at the expected intersection of the two dimensions, but that the dimensions themselves are not correlated. Our results suggest that introducing special machinery to handle idioms may not be warranted.

2022-05-01

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

Compositional Generalization in Dependency Parsing

Emily D. Goodwin

Siva Reddy

Timothy O'Donnell

Dzmitry Bahdanau

2022-05-01

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models

Nicholas Meade

Elinor Poole-Dayan

Siva Reddy

Recent work has shown pre-trained language models capture social biases from the large amounts of text they are trained on. This has attract… (see more)ed attention to developing techniques that mitigate such biases. In this work, we perform an empirical survey of five recently proposed bias mitigation techniques: Counterfactual Data Augmentation (CDA), Dropout, Iterative Nullspace Projection, Self-Debias, and SentenceDebias. We quantify the effectiveness of each technique using three intrinsic bias benchmarks while also measuring the impact of these techniques on a model’s language modeling ability, as well as its performance on downstream NLU tasks. We experimentally find that: (1) Self-Debias is the strongest debiasing technique, obtaining improved scores on all bias benchmarks; (2) Current debiasing techniques perform less consistently when mitigating non-gender biases; And (3) improvements on bias benchmarks such as StereoSet and CrowS-Pairs by using debiasing strategies are often accompanied by a decrease in language modeling ability, making it difficult to determine whether the bias mitigation was effective.

2022-05-01

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

Hallucinated but Factual! Inspecting the Factuality of Hallucinations in Abstractive Summarization

Meng Cao

Yue Dong

Jackie Cheung

2022-05-01

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

Identification of Out-of-Distribution Cases of CNN using Class-Based Surprise Adequacy

Mira Marhaba

Ettore Merlo

Giuliano Antoniol

Machine learning is vulnerable to possible incorrect classification of cases that are out of the distribution observed during training and c… (see more)alibration

2022-05-01

2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN) (published)

Identification of Out-of-Distribution Cases of CNN using Class-Based Surprise Adequacy

Mira Marhaba

Ettore Merlo

Giuliano Antoniol

Machine learning is vulnerable to possible incorrect classification of cases that are out of the distribution observed during training and c… (see more)alibration

2022-05-01

2022 IEEE/ACM 1st International Conference on AI Engineering – Software Engineering for AI (CAIN) (published)

Image Retrieval from Contextual Descriptions

Vibhav Vineet

Edoardo Ponti

The ability to integrate context, including perceptual and temporal cues, plays a pivotal role in grounding the meaning of a linguistic utte… (see more)rance. In order to measure to what extent current vision-and-language models master this ability, we devise a new multimodal challenge, Image Retrieval from Contextual Descriptions (ImageCoDe). In particular, models are tasked with retrieving the correct image from a set of 10 minimally contrastive candidates based on a contextual description.As such, each description contains only the details that help distinguish between images.Because of this, descriptions tend to be complex in terms of syntax and discourse and require drawing pragmatic inferences. Images are sourced from both static pictures and video frames.We benchmark several state-of-the-art models, including both cross-encoders such as ViLBERT and bi-encoders such as CLIP, on ImageCoDe.Our results reveal that these models dramatically lag behind human performance: the best variant achieves an accuracy of 20.9 on video frames and 59.4 on static pictures, compared with 90.8 in humans.Furthermore, we experiment with new model variants that are better equipped to incorporate visual and temporal context into their representations, which achieve modest gains. Our hope is that ImageCoDE will foster progress in grounded language understanding by encouraging models to focus on fine-grained visual differences.

2022-05-01

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (published)

Local Structure Matters Most: Perturbation Study in NLU

Louis Clouâtre

Prasanna Parthasarathi

Amal Zouaq

Sarath Chandar

Recent research analyzing the sensitivity of natural language understanding models to word-order perturbations has shown that neural models … (see more)are surprisingly insensitive to the order of words.In this paper, we investigate this phenomenon by developing order-altering perturbations on the order of words, subwords, and characters to analyze their effect on neural models’ performance on language understanding tasks.We experiment with measuring the impact of perturbations to the local neighborhood of characters and global position of characters in the perturbed texts and observe that perturbation functions found in prior literature only affect the global ordering while the local ordering remains relatively unperturbed.We empirically show that neural models, invariant of their inductive biases, pretraining scheme, or the choice of tokenization, mostly rely on the local structure of text to build understanding and make limited use of the global structure.

2022-05-01

Findings of the Association for Computational Linguistics: ACL 2022 (published)