Publications

WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series
Jean-Christophe Gagnon-Audet
Kartik Ahuja
Mohammad Javad Darvishi Bayazi
Pooneh Mousavi
Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing
Arghavan Moradi Dakhel
Amin Nikanjam
Vahid Majdinasab
Michel C. Desmarais
One of the critical phases in software development is software testing. Testing helps with identifying potential bugs and reducing maintenance costs. The goal of automated test generation tools is to ease the development of tests by suggesting efficient bug-revealing tests. Recently, researchers have leveraged Large Language Models (LLMs) of code to generate unit tests. While the code coverage of generated tests was usually assessed, the literature has acknowledged that the coverage is weakly correlated with the efficiency of tests in bug detection. To improve over this limitation, in this paper, we introduce MuTAP for improving the effectiveness of test cases generated by LLMs in terms of revealing bugs by leveraging mutation testing. Our goal is achieved by augmenting prompts with surviving mutants, as those mutants highlight the limitations of test cases in detecting bugs. MuTAP is capable of generating effective test cases in the absence of natural language descriptions of the Programs Under Test (PUTs). We employ different LLMs within MuTAP and evaluate their performance on different benchmarks. Our results show that our proposed method is able to detect up to 28% more faulty human-written code snippets. Among these, 17% remained undetected by both the current state-of-the-art fully automated test generation tool (i.e., Pynguin) and zero-shot/few-shot learning approaches on LLMs. Furthermore, MuTAP achieves a Mutation Score (MS) of 93.57% on synthetic buggy code, outperforming all other approaches in our evaluation. Our findings suggest that although LLMs can serve as a useful tool to generate test cases, they require specific post-processing steps to enhance the effectiveness of the generated test cases, which may suffer from syntactic or functional errors and may be ineffective in detecting certain types of bugs and testing corner cases of PUTs.
Learning Lyapunov-Stable Polynomial Dynamical Systems Through Imitation
Amin Abyaneh
Imitation learning is a paradigm to address complex motion planning problems by learning a policy to imitate an expert's behavior. However, relying solely on the expert's data might lead to unsafe actions when the robot deviates from the demonstrated trajectories. Stability guarantees have previously been provided utilizing nonlinear dynamical systems, acting as high-level motion planners, in conjunction with the Lyapunov stability theorem. Yet, these methods are prone to inaccurate policies, high computational cost, sample inefficiency, or quasi stability when replicating complex and highly nonlinear trajectories. To mitigate this problem, we present an approach for learning a globally stable nonlinear dynamical system as a motion planning policy. We model the nonlinear dynamical system as a parametric polynomial and learn the polynomial's coefficients jointly with a Lyapunov candidate. To showcase its success, we compare our method against the state of the art in simulation and conduct real-world experiments with the Kinova Gen3 Lite manipulator arm. Our experiments demonstrate the sample efficiency and reproduction accuracy of our method for various expert trajectories, while remaining stable in the face of perturbations.
Beyond the ML Model: Applying Safety Engineering Frameworks to Text-to-Image Development
Shalaleh Rismani
Renee Shelby
Andrew J Smart
Renelito Delos Santos
Identifying potential social and ethical risks in emerging machine learning (ML) models and their applications remains challenging. In this work, we applied two well-established safety engineering frameworks (FMEA, STPA) to a case study involving text-to-image (T2I) models at three stages of the ML product development pipeline: data processing, integration of a T2I model with other models, and use. Results of our analysis demonstrate that the safety frameworks, neither of which is explicitly designed to examine social and ethical risks, can uncover failures and hazards that pose social and ethical risks. We discovered a broad range of failures and hazards (i.e., functional, social, and ethical) by analyzing interactions (i.e., between different ML models in the product, between the ML product and user, and between development teams) and processes (i.e., preparation of training data or workflows for using an ML service/product). Our findings underscore the value and importance of looking beyond an ML model when examining social and ethical risks, especially when we have minimal information about an ML model.
Policy composition in reinforcement learning via multi-objective policy optimization
Shruti Mishra
Ankit Anand
Jordan Hoffmann
Nicolas Heess
Martin A. Riedmiller
Abbas Abdolmaleki
We enable reinforcement learning agents to learn successful behavior policies by utilizing relevant pre-existing teacher policies. The teacher policies are introduced as objectives, in addition to the task objective, in a multi-objective policy optimization setting. Using the Multi-Objective Maximum a Posteriori Policy Optimization algorithm (Abdolmaleki et al. 2020), we show that teacher policies can help speed up learning, particularly in the absence of shaping rewards. In two domains with continuous observation and action spaces, our agents successfully compose teacher policies in sequence and in parallel, and are also able to further extend the policies of the teachers in order to solve the task. Depending on the specified combination of task and teacher(s), teacher(s) may naturally act to limit the final performance of an agent. The extent to which agents are required to adhere to teacher policies is determined by hyperparameters, which govern both the effect of teachers on learning speed and the eventual performance of the agent on the task. In the humanoid domain (Tassa et al. 2018), we also equip agents with the ability to control the selection of teachers. With this ability, agents are able to meaningfully compose from the teacher policies to achieve a higher task reward on the walk task than in cases without access to the teacher policies. We show the resemblance of composed task policies with the corresponding teacher policies through videos.
Sociotechnical Harms of Algorithmic Systems: Scoping a Taxonomy for Harm Reduction
Renee Shelby
Shalaleh Rismani
Kathryn Henne
Paul Nicholas
N'Mah Yilla-Akbari
Jess Gallegos
Andrew J Smart
Emilio Garcia
Gurleen Virk
What does it mean to be a responsible AI practitioner: An ontology of roles and skills
Shalaleh Rismani
With the growing need to regulate AI systems across a wide variety of application domains, a new set of occupations has emerged in the industry. The so-called responsible Artificial Intelligence (AI) practitioners or AI ethicists are generally tasked with interpreting and operationalizing best practices for ethical and safe design of AI systems. Due to the nascent nature of these roles, however, it is unclear to future employers and aspiring AI ethicists what specific function these roles serve and what skills are necessary to serve the functions. Without clarity on these, we cannot train future AI ethicists with meaningful learning objectives. In this work, we examine what responsible AI practitioners do in the industry and what skills they employ on the job. We propose an ontology of existing roles alongside skills and competencies that serve each role. We created this ontology by examining the job postings for such roles over a two-year period (2020-2022) and conducting expert interviews with fourteen individuals who currently hold such a role in the industry. Our ontology contributes to business leaders looking to build responsible AI teams and provides educators with a set of competencies that an AI ethics curriculum can prioritize.
Beyond performance: the role of task demand, effort, and individual differences in ab initio pilots
Mohammad-Javad Darvishi-Bayazi
Andrew Law
Sergio Mejia Romero
Sion Jennings
Jocelyn Faubert
From Assistive Devices to Manufacturing Cobot Swarms
Monica Li
Bruno Belzile
Ali Imran
Lionel Birglen
David St-Onge
This paper provides an overview of the latest trends in robotics research and development, with a particular focus on applications in manufacturing and industrial settings. We highlight recent advances in robot design, including cutting-edge collaborative robot mechanics and advanced safety features, as well as exciting developments in perception and human-swarm interaction. By examining recent contributions from Kinova, a leading robotics company, we illustrate the differences between industry and academia in their approaches to developing innovative robotic systems and technologies that enhance productivity and safety in the workplace. Ultimately, this paper demonstrates the tremendous potential of robotics to revolutionize manufacturing and industrial operations, and underscores the crucial role of companies like Kinova in driving this transformation forward.
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
Slim Essid
Efficient Epistemic Uncertainty Estimation in Regression Ensemble Models Using Pairwise-Distance Estimators
Lucas Berry
This work introduces an efficient novel approach for epistemic uncertainty estimation for ensemble models for regression tasks using pairwise-distance estimators (PaiDEs). Utilizing the pairwise-distance between model components, these estimators establish bounds on entropy. We leverage this capability to enhance the performance of Bayesian Active Learning by Disagreement (BALD). Notably, unlike sample-based Monte Carlo estimators, PaiDEs exhibit a remarkable capability to estimate epistemic uncertainty at speeds up to 100 times faster while covering a significantly larger number of inputs at once and demonstrating superior performance in higher dimensions. To validate our approach, we conducted a varied series of regression experiments on commonly used benchmarks: 1D sinusoidal data,
Party Prediction for Twitter
Kellin Pelrine
Anne Imouza
Zachary Yang
Jacob-Junqi Tian
Sacha Lévy
Gabrielle Desrosiers-Brisebois
Aarash Feizi
Cécile Amadoro
André Blais
Jean-François Godbout