Samira Ebrahimi Kahou

The past decade has observed a significant advancement in AI, with deep learning-based models being deployed in diverse scenarios, including safety-critical applications. As these AI systems become deeply embedded in our societal infrastructure, the repercussions of their decisions and actions have significant consequences, making the ethical implications of AI deployment highly relevant and essential. The ethical concerns associated with AI are multifaceted, including challenging issues of fairness, privacy and data protection, responsibility and accountability, safety and robustness, transparency and explainability, and environmental impact. These principles together form the foundations of ethical AI considerations that concern every stakeholder in the AI system lifecycle. In light of the present ethical and future x-risk concerns, governments have shown increasing interest in establishing guidelines for the ethical deployment of AI. This work unifies the current and future ethical concerns of deploying AI into society. While we acknowledge and appreciate the technical surveys for each of the ethical principles concerned, in this paper, we aim to provide a comprehensive overview that not only addresses each principle from a technical point of view but also discusses them from a social perspective.

2025-11-01

Computational Intelligence (published)

Attention-Based Multi-Agent RL for Multi-Machine Tending Using Mobile Robots

Abdalwhab Bakheet Mohamed Abdalwhab

Giovanni Beltrame

David St-Onge

Robotics can help address the growing worker shortage challenge of the manufacturing industry. As such, machine tending is a task collaborative robots can tackle that can also greatly boost productivity. Nevertheless, existing robotics systems deployed in that sector rely on a fixed single-arm setup, whereas mobile robots can provide more flexibility and scalability. We introduce a multi-agent multi-machine-tending learning framework using mobile robots based on multi-agent reinforcement learning (MARL) techniques, with the design of a suitable observation and reward. Moreover, we integrate an attention-based encoding mechanism into the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm to boost its performance for machine-tending scenarios. Our model (AB-MAPPO) outperforms MAPPO in this new challenging scenario in terms of task success, safety, and resource utilization. Furthermore, we provided an extensive ablation study to support our design decisions.

2025-09-30

AI (published)

Source-free Domain Adaptation Requires Penalized Diversity

Laya Rafiee Sevyeri

Ivaxi Sheth

Farhood Farahnak

Alexandre See

Thomas Fevens

Mohammad Havaei

While neural networks are capable of achieving human-like performance in many tasks such as image classification, the impressive performance of each model is limited to its own dataset. Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data, thus increasing data privacy. Diversity in representation space can be vital to a model's adaptability in varied and difficult domains. In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor. Motivated by the improved predictive performance of ensembles, we propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors with Distinct Backbone Architectures (DBA). Although diversity in feature space is increased, the unconstrained mutual information (MI) maximization may potentially introduce amplification of weak hypotheses. Thus we introduce the Weak Hypothesis Penalization (WHP) regularizer as a mitigation strategy. Our work proposes Penalized Diversity (PD) where the synergy of DBA and WHP is applied to unsupervised source-free domain adaptation for covariate shift. In addition, PD is augmented with a weighted MI maximization objective for label distribution shift. Empirical results on natural, synthetic, and medical domains demonstrate the effectiveness of PD under different distributional shifts.

2025-08-25

Machine Learning (published)

Handling Delay in Real-Time Reinforcement Learning

Ivan Anokin

Rishav Rishav

Matthew Riemer

Stephen Chung

Irina Rish

Real-time reinforcement learning (RL) introduces several challenges. First, policies are constrained to a fixed number of actions per second due to hardware limitations. Second, the environment may change while the network is still computing an action, leading to observational delay. The first issue can partly be addressed with pipelining, leading to higher throughput and potentially better policies. However, the second issue remains: if each neuron operates in parallel with an execution time of

2025-01-21

ICLR.cc/2025/Conference (poster)

openreview.net

GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

Victor May

Massimo Caccia

The rapid evolution of software libraries poses a considerable hurdle for code generation, necessitating continuous adaptation to frequent version updates while preserving backward compatibility. While existing code evolution benchmarks provide valuable insights, they typically lack execution-based evaluation for generating code compliant with specific library versions. To address this, we introduce GitChameleon 2.0, a novel, meticulously curated dataset comprising 328 Python code completion problems, each conditioned on specific library versions and accompanied by executable unit tests. GitChameleon 2.0 rigorously evaluates the capacity of contemporary large language models (LLMs), LLM-powered agents, code assistants, and RAG systems to perform version-conditioned code generation that demonstrates functional accuracy through execution. Our extensive evaluations indicate that state-of-the-art systems encounter significant challenges with this task, with enterprise models achieving baseline success rates in the 48-51% range, underscoring the intricacy of the problem. By offering an execution-based benchmark emphasizing the dynamic nature of code libraries, GitChameleon 2.0 enables a clearer understanding of this challenge and helps guide the development of more adaptable and dependable AI code generation methods. We make the dataset and evaluation code publicly available at https://github.com/mrcabbage972/GitChameleonBenchmark.

2024-12-31

arXiv (preprint)

openreview.net

Learning to Play Atari in a World of Tokens

Pranav Agarwal

Sheldon Andrews

Model-based reinforcement learning agents utilizing transformers have shown improved sample efficiency due to their ability to model extended context, resulting in more accurate world models. However, for complex reasoning and planning tasks, these methods primarily rely on continuous representations. This complicates modeling of discrete properties of the real world such as disjoint object classes between which interpolation is not plausible. In this work, we introduce discrete abstract representations for transformer-based learning (DART), a sample-efficient method utilizing discrete representations for modeling both the world and learning behavior. We incorporate a transformer-decoder for auto-regressive world modeling and a transformer-encoder for learning behavior by attending to task-relevant cues in the discrete representation of the world model. For handling partial observability, we aggregate information from past time steps as memory tokens. DART outperforms previous state-of-the-art methods that do not use look-ahead search on the Atari 100k sample efficiency benchmark with a median human-normalized score of 0.790 and beats humans in 9 out of 26 games. We release our code at https://pranaval.github.io/DART/.

2024-07-07

Proceedings of the 41st International Conference on Machine Learning (published)

proceedings.mlr.press

Prioritizing Samples in Reinforcement Learning with Reducible Loss

Shivakanth Sujit

Somjit Nath

Pedro H.M. Braga

Most reinforcement learning algorithms take advantage of an experience replay buffer to repeatedly train on samples the agent has observed in the past. Not all samples carry the same amount of significance and simply assigning equal importance to each of the samples is a naïve strategy. In this paper, we propose a method to prioritize samples based on how much we can learn from a sample. We define the learn-ability of a sample as the steady decrease of the training loss associated with this sample over time. We develop an algorithm to prioritize samples with high learn-ability, while assigning lower priority to those that are hard-to-learn, typically caused by noise or stochasticity. We empirically show that our method is more robust than random sampling and also better than just prioritizing with respect to the training loss, i.e. the temporal difference loss, which is used in prioritized experience replay.

2023-09-20

NeurIPS.cc/2023/Conference (poster)

openreview.net

Discovering Object-Centric Generalized Value Functions From Pixels

Somjit Nath

Gopeshh Raaj Subbaraj

Khimya Khetarpal

Deep Reinforcement Learning has shown significant progress in extracting useful representations from high-dimensional inputs albeit using hand-crafted auxiliary tasks and pseudo rewards. Automatically learning such representations in an object-centric manner geared towards control and fast adaptation remains an open research problem. In this paper, we introduce a method that tries to discover meaningful features from objects, translating them to temporally coherent "question" functions and leveraging the subsequent learned general value functions for control. We compare our approach with state-of-the-art techniques alongside other ablations and show competitive performance in both stationary and non-stationary settings. Finally, we also investigate the discovered general value functions and through qualitative analysis show that the learned representations are not only interpretable but also centered around objects that are invariant to changes across tasks, facilitating fast adaptation.

2023-07-02

Proceedings of the 40th International Conference on Machine Learning (published)

proceedings.mlr.press

Towards Policy-Guided Conversational Recommendation with Dialogue Acts

Paul Crook

Y-Lan Boureau

J. Weston

Akbar Karimi

Leonardo Rossi

Andrea Prati

Wenqiang Lei

Xiangnan He

Qingyun Yisong Miao

Richang Wu

Min-Yen Hong

Kan Tat-Seng

Raymond Li

Hannes Schulz

Zujie Liang

Huang Hu

Can Xu

Jian Miao

Lizi Liao

Ryuichi Takanobu

Yunshan Ma

Xun Yang

Wenchang Ma

Minlie Huang

Minghao Tu

Iulian Serban

Aaron C. Courville

David Silver

Julian Schrittwieser

K. Simonyan

Ioannis Antonoglou

Aja Huang

A. Guez

Hanlin Zhu

O. Vinyals

Igor Babuschkin

Junyoung Chung

M. Mathieu

Max Jaderberg

Wojciech M. Czarnecki

A. Dudzik

Petko Georgiev

Richard Powell

T. Ewalds

Dan Horgan

M. Kroiss

Ivo Danihelka

J. Agapiou

Junhyuk Oh

Valentin Dalibard

David Choi

L. Sifre

Yury Sulsky

Sasha Vezhnevets

James Molloy

Trevor Cai

D. Budden

T. Paine

Caglar Gulçehre

Ziyu Wang

Tobias Pfaff

Tobias Pohlen

2021-12-31

(published)

www.semanticscholar.org

Accounting for Variance in Machine Learning Benchmarks

Mirko Bronzi

Naz Sepah

Edward Raff

Kanika Madan

Vikram Voleti

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameters choices. This is prohibitively expensive, and corners are cut to reach conclusions. We model the whole benchmarking process, revealing that variance due to data sampling, parameter initialization and hyperparameter choice impact markedly the results. We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator approaches better the ideal estimator at a 51 times reduction in compute cost. Building on these results, we study the error rate of detecting improvements, on five different deep-learning tasks/architectures. This study leads us to propose recommendations for performance comparisons.

2020-12-31

MLSys (published)

Tell, Draw, and Repeat: Generating and Modifying Images Based on Continual Linguistic Instruction

Alaaeldin El-Nouby

Shikhar Sharma

Hannes Schulz

Devon Hjelm

Layla El Asri

Yoshua Bengio

Graham W. Taylor

Conditional text-to-image generation is an active area of research, with many possible applications. Existing research has primarily focused on generating a single image from available conditioning information in one step. One practical extension beyond one-step generation is a system that generates an image iteratively, conditioned on ongoing linguistic input or feedback. This is significantly more challenging than one-step generation tasks, as such a system must understand the contents of its generated images with respect to the feedback history, the current feedback, as well as the interactions among concepts present in the feedback history. In this work, we present a recurrent image generation model which takes into account both the generated output up to the current step as well as all past instructions for generation. We show that our model is able to generate the background, add new objects, and apply simple transformations to existing objects. We believe our approach is an important step toward interactive generation. Code and data are available at: https://www.microsoft.com/en-us/research/project/generative-neural-visual-artist-geneva/ .

2019-11-01

2019 IEEE/CVF International Conference on Computer Vision (ICCV) (published)

Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies

Sarath Chandar

Chinnadhurai Sankar

Eugene Vorontsov

Yoshua Bengio

Modelling long-term dependencies is a challenge for recurrent neural networks. This is primarily due to the fact that gradients vanish during training, as the sequence length increases. Gradients can be attenuated by transition operators and are attenuated or dropped by activation functions. Canonical architectures like LSTM alleviate this issue by skipping information through a memory mechanism. We propose a new recurrent architecture (Non-saturating Recurrent Unit; NRU) that relies on a memory mechanism but forgoes both saturating activation functions and saturating gates, in order to further alleviate vanishing gradients. In a series of synthetic and real world tasks, we demonstrate that the proposed model is the only model that performs among the top 2 models across all tasks with and without long-term dependencies, when compared against a range of other architectures.

2019-07-16

Proceedings of the AAAI Conference on Artificial Intelligence (published)