Collision-free motion generation

Published on 2024/4/25

Apparatuses, systems, and techniques to perform collision-free motion generation (e.g., to operate a real-world or virtual robot). In at least one embodiment, at least a portion of the collision-free motion generation is performed in parallel.
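
The abstract does not spell out how the parallelism is realized, but the idea of checking many motion candidates against the environment at once can be illustrated with a small batched collision check. The sketch below is an assumption-laden illustration, not the patented method: the robot at each candidate configuration is approximated by a handful of spheres, obstacles are spheres as well, and every candidate is tested against every obstacle in one vectorized pass.

```python
# Minimal sketch of batched ("parallel") collision checking for motion generation.
# Illustration only: the robot is reduced to S spheres per candidate configuration
# and obstacles are spheres; all shapes, sizes, and counts below are hypothetical.
import numpy as np

def batch_collision_free(robot_spheres, robot_radii, obstacle_centers, obstacle_radii):
    """Return a boolean mask of candidates whose spheres clear all obstacles.

    robot_spheres:    (N, S, 3) sphere centers for N candidate configurations
    robot_radii:      (S,)      sphere radii of the simplified robot model
    obstacle_centers: (M, 3)    obstacle sphere centers
    obstacle_radii:   (M,)      obstacle sphere radii
    """
    # Pairwise center distances between every robot sphere and every obstacle: (N, S, M)
    diff = robot_spheres[:, :, None, :] - obstacle_centers[None, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # A candidate collides if any of its spheres gets closer than the sum of radii.
    min_clearance = robot_radii[None, :, None] + obstacle_radii[None, None, :]
    in_collision = (dist < min_clearance).any(axis=(1, 2))
    return ~in_collision

# Example: keep only the collision-free candidates out of 1024 sampled configurations.
candidates = np.random.uniform(-1.0, 1.0, size=(1024, 8, 3))   # hypothetical sphere centers
free = batch_collision_free(candidates,
                            robot_radii=np.full(8, 0.05),
                            obstacle_centers=np.array([[0.4, 0.0, 0.3]]),
                            obstacle_radii=np.array([0.1]))
print(free.sum(), "of", len(candidates), "candidates are collision-free")
```

On a GPU the same computation maps directly onto batched tensor kernels, which is where evaluating many candidates in parallel pays off.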

Authors

Dieter Fox
University of Washington
H-Index: 128
Research Interests: Robotics, Artificial Intelligence, Computer Vision

Caelan Garrett
Massachusetts Institute of Technology
H-Index: 16
Research Interests: Robotics, Planning, Learning

Adam Fishman
University of Washington
H-Index: 6
Research Interests: Robotics, Computer Vision, Motion Control, Imitation Learning, Human-Robot Interaction

Other Articles from authors

Dieter Fox
University of Washington

arXiv preprint arXiv:2404.06089

EVE: Enabling Anyone to Train Robot using Augmented Reality

The increasing affordability of robot hardware is accelerating the integration of robots into everyday activities. However, training a robot to automate a task typically requires physical robots and expensive demonstration data from trained human annotators. Consequently, only those with access to physical robots produce demonstrations to train robots. To mitigate this issue, we introduce EVE, an iOS app that enables everyday users to train robots using intuitive augmented reality visualizations without needing a physical robot. With EVE, users can collect demonstrations by specifying waypoints with their hands, visually inspecting the environment for obstacles, modifying existing waypoints, and verifying collected trajectories. In a user study (, ) consisting of three common tabletop tasks, EVE outperformed three state-of-the-art interfaces in success rate and was comparable to kinesthetic teaching (physically moving a real robot) in completion time, usability, motion intent communication, enjoyment, and preference (). We conclude by enumerating limitations and design considerations for future AR-based demonstration collection systems for robotics.

Dieter Fox
University of Washington

Model predictive control techniques for autonomous systems

Apparatuses, systems, and techniques to infer a sequence of actions to perform using one or more neural networks trained, at least in part, by optimizing a probability distribution function using a cost function, wherein the probability distribution represents different sequences of actions that can be performed. In at least one embodiment, a model predictive control problem is formulated as a Bayesian inference task to infer a set of solutions.
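
One common way to realize "optimizing a probability distribution function using a cost function" is a sampling-based MPC update in the MPPI style: sample action sequences from the current distribution, roll them out, and reweight the distribution by exponentiated negative cost. The sketch below shows only that generic pattern with caller-supplied placeholder dynamics and cost; it is not the formulation claimed in the patent.

```python
# Sketch of a sampling-based MPC step: a distribution over action sequences is
# updated using a cost function (MPPI-style reweighting). Dynamics and cost are
# placeholders, not a specific library API or the patented formulation.
import numpy as np

def mpc_step(mean_actions, dynamics, cost_fn, state,
             n_samples=256, noise_std=0.1, temperature=1.0, seed=0):
    """mean_actions: (horizon, act_dim) mean of the current action-sequence distribution.
    dynamics(state, action) -> next state;  cost_fn(state, action) -> scalar cost."""
    rng = np.random.default_rng(seed)
    horizon = mean_actions.shape[0]

    # Sample candidate action sequences around the current mean.
    noise = rng.normal(0.0, noise_std, size=(n_samples, *mean_actions.shape))
    samples = mean_actions[None] + noise                     # (n_samples, horizon, act_dim)

    # Roll out every candidate and accumulate its cost.
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        s = state
        for t in range(horizon):
            s = dynamics(s, samples[i, t])
            costs[i] += cost_fn(s, samples[i, t])

    # Lower cost -> higher probability mass (softmax over negative cost).
    weights = np.exp(-(costs - costs.min()) / temperature)
    weights /= weights.sum()

    # New mean of the action distribution: cost-weighted average of the samples.
    return (weights[:, None, None] * samples).sum(axis=0)
```

In a receding-horizon loop, the first action of the updated mean is executed, the remaining actions are shifted forward, and the update repeats from the new state.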

Dieter Fox
University of Washington

Training machine learning models using simulation for robotics systems and applications

Systems and techniques are described related to training one or more machine learning models for use in control of a robot. In at least one embodiment, one or more machine learning models are trained based at least on simulations of the robot and renderings of such simulations—which may be performed using one or more ray tracing algorithms, operations, or techniques.

Dieter Fox
University of Washington

Grasp determination for an object in clutter

Apparatuses, systems, and techniques determine a set of grasp poses that would allow a robot to successfully grasp an object that is proximate to at least one additional object. In at least one embodiment, the set of grasp poses is modified based on a determination that at least one of the grasp poses in the set of grasp poses would interfere with at least one additional object that is proximate to the object.
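
A very reduced illustration of modifying a grasp set based on interference with nearby objects: approximate the gripper at each grasp point by a sphere, drop grasps whose sphere overlaps a neighboring object, and rank the survivors by clearance. The geometry and thresholds are assumptions made for the sketch, not the patented test.

```python
# Hypothetical pruning of grasp poses in clutter: keep only grasps whose gripper
# sphere clears every neighboring object, ranked by how much clearance remains.
import numpy as np

def filter_and_rank_grasps(grasp_points, neighbor_centers, neighbor_radii, gripper_radius=0.04):
    """grasp_points: (G, 3) candidate grasp positions; neighbors: (K, 3) centers, (K,) radii."""
    if len(neighbor_centers) == 0:
        return grasp_points, np.full(len(grasp_points), np.inf)
    dist = np.linalg.norm(grasp_points[:, None, :] - neighbor_centers[None, :, :], axis=-1)
    # Clearance of each grasp to its closest neighbor, after inflating by both radii.
    clearance = (dist - (neighbor_radii[None, :] + gripper_radius)).min(axis=1)   # (G,)
    keep = clearance > 0.0                       # drop grasps that would interfere
    order = np.argsort(-clearance[keep])         # most clearance first
    return grasp_points[keep][order], clearance[keep][order]
```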

Dieter Fox
University of Washington

arXiv preprint arXiv:2404.03336

Scaling Population-Based Reinforcement Learning with GPU Accelerated Simulation

In recent years, deep reinforcement learning (RL) has shown its effectiveness in solving complex continuous control tasks like locomotion and dexterous manipulation. However, this comes at the cost of an enormous amount of experience required for training, exacerbated by the sensitivity of learning efficiency and the policy performance to hyperparameter selection, which often requires numerous trials of time-consuming experiments. This work introduces a Population-Based Reinforcement Learning (PBRL) approach that exploits a GPU-accelerated physics simulator to enhance the exploration capabilities of RL by concurrently training multiple policies in parallel. The PBRL framework is applied to three state-of-the-art RL algorithms -- PPO, SAC, and DDPG -- dynamically adjusting hyperparameters based on the performance of learning agents. The experiments are performed on four challenging tasks in Isaac Gym -- Anymal Terrain, Shadow Hand, Humanoid, Franka Nut Pick -- by analyzing the effect of population size and mutation mechanisms for hyperparameters. The results show that PBRL agents achieve superior performance, in terms of cumulative reward, compared to non-evolutionary baseline agents. The trained agents are finally deployed in the real world for a Franka Nut Pick task, demonstrating successful sim-to-real transfer. Code and videos of the learned policies are available on our project website.
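
The population-based update can be sketched as follows: agents train in parallel, and at each evaluation interval the weakest quarter inherits hyperparameters from the strongest quarter with a random perturbation. The exploit/explore rule below is a generic illustration, not the exact mutation mechanism used in the paper.

```python
# Generic population-based hyperparameter update (illustrative, not the paper's rule).
import random

def pbrl_update(population):
    """population: list of dicts with keys 'hyperparams' (dict of floats) and 'reward'."""
    ranked = sorted(population, key=lambda agent: agent["reward"], reverse=True)
    quarter = max(1, len(ranked) // 4)
    top, bottom = ranked[:quarter], ranked[-quarter:]
    for agent in bottom:
        parent = random.choice(top)
        # Inherit the parent's hyperparameters, then perturb each by +/-20%.
        agent["hyperparams"] = {k: v * random.choice([0.8, 1.2])
                                for k, v in parent["hyperparams"].items()}
    return population

# Toy example with eight PPO-like agents; rewards would come from parallel rollouts.
pop = [{"hyperparams": {"lr": 3e-4, "entropy_coef": 0.01}, "reward": random.random()}
       for _ in range(8)]
pop = pbrl_update(pop)
```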

Dieter Fox
University of Washington

Grasp pose prediction

Apparatuses, systems, and techniques to generate and select grasp proposals. In at least one embodiment, grasp proposals are generated and selected using one or more neural networks, based on, for example, a latent code corresponding to an object.

Dieter Fox
University of Washington

Identifying objects using neural network-generated descriptors

Apparatuses, systems, and techniques are presented to identify one or more objects. In at least one embodiment, one or more neural networks can be used to identify one or more objects based, at least in part, on one or more descriptors of one or more segments of the one or more objects.

Adam Fishman
University of Washington

Model predictive control techniques for autonomous systems

Apparatuses, systems, and techniques to infer a sequence of actions to perform using one or more neural networks trained, at least in part, by optimizing a probability distribution function using a cost function, wherein the probability distribution represents different sequences of actions that can be performed. In at least one embodiment, a model predictive control problem is formulated as a Bayesian inference task to infer a set of solutions.

Dieter Fox
University of Washington

arXiv preprint arXiv:2404.01440

Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associate the two states. By explicitly modeling point-level correspondences and exploiting cues from images, 3D reconstructions, and kinematics, our method yields more accurate and stable results compared to prior work. It also handles more than one movable part and does not rely on any object shape or structure priors. Project page: https://github.com/NVlabs/DigitalTwinArt

Dieter Fox
University of Washington

ASID: Active Exploration for System Identification in Robotic Manipulation

Model-free control strategies such as reinforcement learning have shown the ability to learn control strategies without requiring an accurate model or simulator of the world. While this is appealing due to the lack of modeling requirements, real-world RL can be unsafe and sample inefficient, making it impractical in many safety-critical domains. On the other hand, model-based control techniques leveraging accurate simulators can circumvent these challenges and use a large amount of cheap simulation data to learn controllers that can effectively transfer to the real world. The challenge with such model-based techniques is the requirement for an extremely accurate simulation, requiring both the specification of appropriate simulation assets and physical parameters. This requires considerable human effort to design for every environment being considered. In this work, we propose a learning system that can leverage a small amount of real-world data to autonomously refine a simulation model, and then plan an accurate control strategy that can be deployed in the real world. Our approach critically relies on utilizing an initial (possibly inaccurate) simulator to design effective exploration policies that, when deployed in the real world, collect high-quality data. We demonstrate the efficacy of this paradigm in identifying articulation, mass, and other physical parameters in several challenging robotic manipulation tasks, and illustrate that only a small amount of real-world data can allow for effective sim-to-real transfer.

Published on 2023/10/13

Dieter Fox
University of Washington

arXiv preprint arXiv:2402.08191

THE COLOSSEUM: A Benchmark for Evaluating Generalization for Robotic Manipulation

To realize effective large-scale, real-world robotic applications, we must evaluate how well our robot policies adapt to changes in environmental conditions. Unfortunately, a majority of studies evaluate robot performance in environments closely resembling or even identical to the training setup. We present THE COLOSSEUM, a novel simulation benchmark, with 20 diverse manipulation tasks, that enables systematic evaluation of models across 12 axes of environmental perturbations. These perturbations include changes in color, texture, and size of objects, table-tops, and backgrounds; we also vary lighting, distractors, and camera pose. Using THE COLOSSEUM, we compare 4 state-of-the-art manipulation models to reveal that their success rate degrades by 30-50% across these perturbation factors. When multiple perturbations are applied in unison, the success rate degrades by 75%. We identify changes in the number of distractor objects, target object color, and lighting conditions as the perturbations that reduce model performance the most. To verify the ecological validity of our results, we show that our results in simulation are correlated () to similar perturbations in real-world experiments. We open source code for others to use THE COLOSSEUM, and also release code to 3D print the objects used to replicate the real-world perturbations. Ultimately, we hope that THE COLOSSEUM will serve as a benchmark to identify modeling decisions that systematically improve generalization for manipulation. See https://robot-colosseum.github.io/ for more details.

Dieter Fox
University of Washington

Controlling position of robot by determining goal proposals by using neural networks

A framework for offline learning from a set of diverse and suboptimal demonstrations operates by selectively imitating local sequences from the dataset. At least one embodiment recovers performant policies from large manipulation datasets by decomposing the problem into a goal-conditioned imitation and a high-level goal selection mechanism.
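
The decomposition described above can be pictured as a two-level controller: a high-level mechanism scores candidate goals proposed from the dataset, and a goal-conditioned policy imitates the local sequence toward the chosen goal. The sketch below uses placeholder callables for both learned components; only the structure is taken from the abstract.

```python
# Two-level control sketch: goal selection on top of a goal-conditioned imitation policy.
# goal_scorer and goal_conditioned_policy stand in for learned models.
import numpy as np

def act(state, candidate_goals, goal_scorer, goal_conditioned_policy):
    """Pick the highest-scoring goal, then act toward it with the low-level policy.

    candidate_goals:         (K, goal_dim) goal states proposed from the dataset
    goal_scorer:             callable (state, goal) -> float, e.g. a learned value
    goal_conditioned_policy: callable (state, goal) -> action
    """
    scores = np.array([goal_scorer(state, g) for g in candidate_goals])
    goal = candidate_goals[int(np.argmax(scores))]
    return goal_conditioned_policy(state, goal)
```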

Dieter Fox
University of Washington

Imitation learning system

Apparatuses, systems, and techniques to identify a goal of a demonstration. In at least one embodiment, video data of a demonstration is analyzed to identify a goal. Object trajectories identified in the video data are analyzed with respect to a task predicate satisfied by a respective object trajectory, and with respect to a motion predicate. Analysis with respect to the motion predicate is used to assess the intentionality of a trajectory with respect to the goal.
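
As a concrete (and deliberately simplified) example of such predicates, the sketch below treats the task predicate as "the object ends inside a goal region" and the motion predicate as "the object moved toward that region for most of the trajectory", using the latter as a crude proxy for intentionality. These definitions are assumptions made for illustration, not the patented ones.

```python
# Illustrative task and motion predicates over an object trajectory.
import numpy as np

def task_predicate(trajectory, goal_center, goal_radius):
    """trajectory: (T, 3) object positions; true if the object ends in the goal region."""
    return np.linalg.norm(trajectory[-1] - goal_center) < goal_radius

def motion_predicate(trajectory, goal_center, min_fraction=0.8):
    """True if the object moved toward the goal for most of the trajectory."""
    dists = np.linalg.norm(trajectory - goal_center, axis=1)
    approaching = np.diff(dists) < 0.0          # distance to goal decreased at this step
    return approaching.mean() >= min_fraction

def looks_intentional(trajectory, goal_center, goal_radius):
    return task_predicate(trajectory, goal_center, goal_radius) and \
           motion_predicate(trajectory, goal_center)
```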

Dieter Fox
University of Washington

Techniques for large-scale three-dimensional scene reconstruction via camera clustering

One embodiment of a method for generating representations of scenes includes assigning each image included in a set of images of a scene to one or more clusters of images based on a camera pose associated with the image, and performing one or more operations to generate, for each cluster included in the one or more clusters, a corresponding three-dimensional (3D) representation of the scene based on one or more images assigned to the cluster.
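
The first step, assigning images to clusters from their camera poses, could be as simple as k-means over camera positions, with each resulting cluster feeding its own reconstruction. The clustering criterion and cluster count below are assumptions; the described embodiment may use a different pose-based assignment.

```python
# Hypothetical pose-based image clustering as a front end for per-cluster reconstruction.
import numpy as np
from sklearn.cluster import KMeans

def cluster_images_by_pose(camera_positions, n_clusters=4):
    """camera_positions: (N, 3) translation component of each image's camera pose."""
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(camera_positions)
    return {c: np.flatnonzero(labels == c) for c in range(n_clusters)}  # cluster id -> image indices

# Each cluster's image indices would then feed a separate 3D reconstruction
# (e.g., one radiance field or mesh per cluster), which is the second step described.
```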

Dieter Fox
University of Washington

arXiv preprint arXiv:2404.07428

AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents. We introduce AdaDemo (Adaptive Online Demonstration Expansion), a general framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset. AdaDemo strategically collects new demonstrations to address the identified weakness in the existing policy, ensuring data efficiency is maximized. Through a comprehensive evaluation on a total of 22 tasks across two robotic manipulation benchmarks (RLBench and Adroit), we demonstrate AdaDemo's capability to progressively improve policy performance by guiding the generation of high-quality demonstration datasets in a data-efficient manner.

Dieter Fox
University of Washington

arXiv preprint arXiv:2402.02612

Fast Explicit-Input Assistance for Teleoperation in Clutter

The performance of prediction-based assistance for robot teleoperation degrades in unseen or goal-rich environments due to incorrect or quickly-changing intent inferences. Poor predictions can confuse operators or cause them to change their control input to implicitly signal their goal, resulting in unnatural movement. We present a new assistance algorithm and interface for robotic manipulation where an operator can explicitly communicate a manipulation goal by pointing the end-effector. Rapid optimization and parallel collision checking in a local region around the pointing target enable direct, interactive control over grasp and place pose candidates. We compare the explicit pointing interface to an implicit inference-based assistance scheme in a within-subjects user study (N=20) where participants teleoperate a simulated robot to complete a multi-step singulation and stacking task in cluttered environments. We find that operators prefer the explicit interface, which improved completion time, pick and place success rates, and NASA TLX scores. Our code is available at https://github.com/NVlabs/fast-explicit-teleop

Dieter Fox
University of Washington

Prompt generator for use with one or more machine learning processes

Apparatuses, systems, and techniques to generate a prompt for one or more machine learning processes. In at least one embodiment, the machine learning process(es) generate(s) a plan to perform a task (identified in the prompt) that is to be performed by an agent (real-world or virtual).
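
As a minimal illustration of the idea (not the patented prompt format), a prompt generator can fold the task and scene context into a text prompt for a planning model and parse the model's numbered reply back into plan steps; the template and parsing below are hypothetical.

```python
# Hypothetical prompt generator and plan parser for a planning model.
import re

def build_prompt(task, objects, agent="robot arm"):
    return (
        f"You control a {agent}.\n"
        f"Objects in the scene: {', '.join(objects)}.\n"
        f"Task: {task}\n"
        "Reply with a numbered list of steps."
    )

def parse_plan(reply_text):
    """Extract 'N. step' lines from the model's reply into an ordered list of steps."""
    return [m.group(1).strip() for m in re.finditer(r"^\s*\d+\.\s*(.+)$", reply_text, re.M)]

prompt = build_prompt("put the red block in the bin", ["red block", "bin", "mug"])
plan = parse_plan("1. Move above the red block\n2. Grasp it\n3. Place it in the bin")
```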