Gregory Dudek

Github

Google Scholar

Steve Wen

Master's Research - McGill University

Principal supervisor :

Doina Precup

Publications

Trajectory-Constrained Deep Latent Visual Attention for Improved Local Planning in Presence of Heterogeneous Terrain

Stefan Wapnick

Travis Manderson

We present a reward-predictive, model-based learning method featuring trajectory-constrained visual attention for use in mapless, local visu… (see more)al navigation tasks. Our method learns to place visual attention at locations in latent image space which follow trajectories caused by vehicle control actions to later enhance predictive accuracy during planning. Our attention model is jointly optimized by the task-specific loss and additional trajectory-constraint loss, allowing adaptability yet encouraging a regularized structure for improved generalization and reliability. Importantly, visual attention is applied in latent feature map space instead of raw image space to promote efficient planning. We validated our model in visual navigation tasks of planning low turbulence, collision-free trajectories in off-road settings and hill climbing with locking differentials in the presence of slippery terrain. Experiments involved randomized procedural generated simulation and real-world environments. We found our method improved generalization and learning efficiency when compared to no-attention and self-attention alternatives.

2021-10-01

2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

Latent Attention Augmentation for Robust Autonomous Driving Policies

Ran Cheng

Christopher Agia

Florian Shkurti

Model-free reinforcement learning has become a viable approach for vision-based robot control. However, sample complexity and adaptability t… (see more)o domain shifts remain persistent challenges when operating in high-dimensional observation spaces (images, LiDAR), such as those that are involved in autonomous driving. In this paper, we propose a flexible framework by which a policy’s observations are augmented with robust attention representations in the latent space to guide the agent’s attention during training. Our method encodes local and global descriptors of the augmented state representations into a compact latent vector, and scene dynamics are approximated by a recurrent network that processes the latent vectors in sequence. We outline two approaches for constructing attention maps; a supervised pipeline leveraging semantic segmentation networks, and an unsupervised pipeline relying only on classical image processing techniques. We conduct our experiments in simulation and test the learned policy against varying seasonal effects and weather conditions. Our design decisions are supported in a series of ablation studies. The results demonstrate that our state augmentation method both improves learning efficiency and encourages robust domain adaptation when compared to common end-to-end frameworks and methods that learn directly from intermediate representations.

2021-09-27

IEEE/RJS International Conference on Intelligent RObots and Systems (published)

An Autonomous Probing System for Collecting Measurements at Depth from Small Surface Vehicles

Yuying Huang

Yiming Yao

Johanna Hansen

Jeremy Mallette

Sandeep Manjanna

2021-09-20

OCEANS 2021: San Diego – Porto (published)

Multimodal dynamics modeling for off-road autonomous vehicles

Jean-François Tremblay

Travis Manderson

Aurélio Noca

Dynamics modeling in outdoor and unstructured environments is difficult because different elements in the environment interact with the robo… (see more)t in ways that can be hard to predict. Leveraging multiple sensors to perceive maximal information about the robot’s environment is thus crucial when building a model to perform predictions about the robot’s dynamics with the goal of doing motion planning. We design a model capable of long-horizon motion predictions, leveraging vision, lidar and proprioception, which is robust to arbitrarily missing modalities at test time. We demonstrate in simulation that our model is able to leverage vision to predict traction changes. We then test our model using a real-world challenging dataset of a robot navigating through a forest, performing predictions in trajectories unseen during training. We try different modality combinations at test time and show that, while our model performs best when all modalities are present, it is still able to perform better than the baseline even when receiving only raw vision input and no proprioception, as well as when only receiving proprioception. Overall, our study demonstrates the importance of leveraging multiple sensors when doing dynamics modeling in outdoor conditions.

2021-06-05

2021 IEEE International Conference on Robotics and Automation (ICRA) (published)

Learning Intuitive Physics with Multimodal Generative Models

Sahand Rezaei-Shoshtari

Francois Hogan

M. Jenkin

Predicting the future interaction of objects when they come into contact with their environment is key for autonomous agents to take intelli… (see more)gent and anticipatory actions. This paper presents a perception framework that fuses visual and tactile feedback to make predictions about the expected motion of objects in dynamic scenes. Visual information captures object properties such as 3D shape and location, while tactile information provides critical cues about interaction forces and resulting object motion when it makes contact with the environment. Utilizing a novel See-Through-your-Skin (STS) sensor that provides high resolution multimodal sensing of contact surfaces, our system captures both the visual appearance and the tactile properties of objects. We interpret the dual stream signals from the sensor using a Multimodal Variational Autoencoder (MVAE), allowing us to capture both modalities of contacting objects and to develop a mapping from visual to tactile interaction and vice-versa. Additionally, the perceptual system can be used to infer the outcome of future physical interactions, which we validate through simulated and real-world experiments in which the resting state of an object is predicted from given initial conditions.

2021-05-18

Proceedings of the AAAI Conference on Artificial Intelligence (published)

Learning Domain Randomization Distributions for Training Robust Locomotion Policies

Melissa Mozian

Juan Higuera

This paper considers the problem of learning behaviors in simulation without knowledge of the precise dynamical properties of the target rob… (see more)ot platform(s). In this context, our learning goal is to mutually maximize task efficacy on each environment considered and generalization across the widest possible range of environmental conditions. The physical parameters of the simulator are modified by a component of our technique that learns the Domain Randomization (DR) that is appropriate at each learning epoch to maximally challenge the current behavior policy, without being overly challenging, which can hinder learning progress. This so-called sweet spot distribution is a selection of simulated domains with the following properties: 1) The trained policy should be successful in environments sampled from the domain randomization distribution; and 2) The DR distribution made as wide as possible, to increase variability in the environments. These properties aim to ensure the trajectories encountered in the target system are close to those observed during training, as existing methods in machine learning are better suited for interpolation than extrapolation. We show how adapting the DR distribution while training context-conditioned policies results in improvements on jump-start and asymptotic performance when transferring a learned policy to the target environment1.

2021-01-24

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (published)

Vision-Based Goal-Conditioned Policies for Underwater Navigation in the Presence of Obstacles

Travis Manderson

Juan Higuera

Stefan Wapnick

Jean-François Tremblay

Florian Shkurti

We present Nav2Goal, a data-efficient and end-to-end learning method for goal-conditioned visual navigation. Our technique is used to train … (see more)a navigation policy that enables a robot to navigate close to sparse geographic waypoints provided by a user without any prior map, all while avoiding obstacles and choosing paths that cover user-informed regions of interest. Our approach is based on recent advances in conditional imitation learning. General-purpose, safe and informative actions are demonstrated by a human expert. The learned policy is subsequently extended to be goal-conditioned by training with hindsight relabelling, guided by the robot's relative localization system, which requires no additional manual annotation. We deployed our method on an underwater vehicle in the open ocean to collect scientifically relevant data of coral reefs, which allowed our robot to operate safely and autonomously, even at very close proximity to the coral. Our field deployments have demonstrated over a kilometer of autonomous visual navigation, where the robot reaches on the order of 40 waypoints, while collecting scientifically relevant data. This is done while travelling within 0.5 m altitude from sensitive corals and exhibiting significant learned agility to overcome turbulent ocean conditions and to actively avoid collisions.

2020-07-12

Robotics: Science and Systems XVI (published)

Navigation in the Service of Enhanced Pose Estimation

Travis Manderson

Ran Cheng

2020-01-23

Springer Proceedings in Advanced Robotics (published)

Seeing Through Your Skin: A Novel Visuo-Tactile Sensor for Robotic Manipulation

Francois Hogan

Sahand Rezaei-Shoshtari

M. Jenkin

Yashveer Girdhar

This work describes the development of the novel tactile sensor, named Semitransparent Tactile Sensor (STS), designed to enable reactive and… (see more) robust manipulation skills. The design, inspired from recent developments in optical tactile sensing technology, addresses a key missing features of these sensors: the ability to capture an “in the hand” perspective prior to and during the contact interaction. Whereas optical tactile sensors are typically opaque and obscure the view of the object at the critical moment prior to manipulator-object contact, we present a sensor that has the dual capabilities of acting as a tactile sensor and as a visual camera. This paper details the design and fabrication of the sensor, showcases its dual sensing capabilities, and introduces a simulated environment of the sensor within the PyBullet simulator.

Detecting GAN generated errors

Xiru Zhu

Fengdi Che

Tianzi Yang

Tzuyang Yu

Despite an impressive performance from the latest GAN for generating hyper-realistic images, GAN discriminators have difficulty evaluating t… (see more)he quality of an individual generated sample. This is because the task of evaluating the quality of a generated image differs from deciding if an image is real or fake. A generated image could be perfect except in a single area but still be detected as fake. Instead, we propose a novel approach for detecting where errors occur within a generated image. By collaging real images with generated images, we compute for each pixel, whether it belongs to the real distribution or generated distribution. Furthermore, we leverage attention to model long-range dependency; this allows detection of errors which are reasonable locally but not holistically. For evaluation, we show that our error detection can act as a quality metric for an individual image, unlike FID and IS. We leverage Improved Wasserstein, BigGAN, and StyleGAN to show a ranking based on our metric correlates impressively with FID scores. Our work opens the door for better understanding of GAN and the ability to select the best samples from a GAN model.

2019-12-02

ArXiv (preprint)

Planning in Dynamic Environments with Conditional Autoregressive Models

Johanna Hansen

Kyle Kastner

Aaron Courville

We demonstrate the use of conditional autoregressive generative models (van den Oord et al., 2016a) over a discrete latent space (van den Oo… (see more)rd et al., 2017b) for forward planning with MCTS. In order to test this method, we introduce a new environment featuring varying difficulty levels, along with moving goals and obstacles. The combination of high-quality frame generation and classical planning approaches nearly matches true environment performance for our task, demonstrating the usefulness of this method for model-based planning in dynamic environments.

2018-11-25

ArXiv (preprint)