This paper introduces a novel methodology for 3D template matching that is scalable to higher-dimensional spaces and larger kernel sizes. We use the Hilbert Maps framework to model raw pointcloud information as a continuous occupancy function, and derive a closed-form solution to the convolution operation that takes place directly in the Reproducing Kernel Hilbert Space defining these functions. The result is a third function modeling activation values, which can be queried at arbitrary resolutions with logarithmic complexity; by iteratively searching for high-similarity areas we can determine matching candidates. Experimental results show substantial speed gains over standard discrete convolution techniques, such as sliding window and fast Fourier transform, along with a significant decrease in memory requirements, without accuracy loss. This efficiency allows the proposed methodology to be used in areas where discrete convolution is currently infeasible. As a practical example we explore the key problem in robotics of global localization, in which a vehicle must be positioned on a map using only its current sensor information, and provide comparisons with other state-of-the-art techniques in terms of computational speed and accuracy.

Robots assisting the disabled or elderly must perform complex manipulation tasks and must adapt to the home environment and preferences of their user. Learning from demonstration is a promising choice that would allow a non-technical user to teach the robot different tasks. However, collecting demonstrations in the home environment of a disabled user is time consuming and disruptive to the comfort of the user, and it presents safety challenges. It would be desirable to perform the demonstrations in a virtual environment. In this paper we describe a solution to the challenging problem of behavior transfer from virtual demonstration to a physical robot. The virtual demonstrations are used to train a deep neural network based controller, which uses a Long Short Term Memory (LSTM) recurrent neural network to generate trajectories. The training process uses a Mixture Density Network (MDN) to calculate an error signal suitable for the multimodal nature of demonstrations. The controller learned in the virtual environment is transferred to a physical robot (a Rethink Robotics Baxter). An off-the-shelf vision component is used to substitute for geometric knowledge available in the simulation, and an inverse kinematics module is used to allow the Baxter to enact the trajectory. Our experimental studies validate the three contributions of the paper: (1) the controller learned from virtual demonstrations can be used to successfully perform the manipulation tasks on a physical robot, (2) the LSTM+MDN architectural choice outperforms other choices, such as the use of feedforward networks and mean-squared error based training signals, and (3) allowing imperfect demonstrations in the training set also allows the controller to learn how to correct its manipulation mistakes.
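
To illustrate why a mixture-density error signal suits multimodal demonstrations better than mean-squared error, here is a minimal sketch. It assumes a scalar target and a one-dimensional Gaussian mixture; the function names (`gaussian_pdf`, `mdn_nll`) and the toy demonstration values are hypothetical and are not taken from the paper's controller.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mdn_nll(target, weights, means, stds):
    """Negative log-likelihood of a scalar target under a Gaussian mixture."""
    likelihood = sum(
        w * gaussian_pdf(target, m, s) for w, m, s in zip(weights, means, stds)
    )
    return -math.log(likelihood)

# Toy data (illustrative): two equally valid demonstrations of the same step,
# e.g. passing an obstacle on the left (-1.0) or on the right (+1.0).
demos = [-1.0, 1.0]

# The prediction that minimizes mean-squared error is the average of the
# demonstrations, which here matches neither of them.
mse_optimal_prediction = sum(demos) / len(demos)

# A two-component mixture can place one mode on each demonstration.
bimodal = {"weights": [0.5, 0.5], "means": [-1.0, 1.0], "stds": [0.1, 0.1]}
# A single Gaussian centred on the average must be wide to cover both.
unimodal = {"weights": [1.0], "means": [0.0], "stds": [1.0]}

avg_nll_bimodal = sum(mdn_nll(d, **bimodal) for d in demos) / len(demos)
avg_nll_unimodal = sum(mdn_nll(d, **unimodal) for d in demos) / len(demos)

print("MSE-optimal prediction:", mse_optimal_prediction)
print("average NLL, two modes:", avg_nll_bimodal)
print("average NLL, one mode: ", avg_nll_unimodal)
```

The mixture assigns a lower negative log-likelihood to the demonstrations than the single wide Gaussian does, while the mean-squared-error optimum sits between the two modes.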

Given a set of obstacles and two designated points in the plane, the Minimum Constraint Removal problem asks for a minimum number of obstacles that can be removed so that a collision-free path exists between the two designated points. It is a well-studied problem in both robotic motion planning and wireless computing that has been shown to be NP-hard in various settings. In this work, we extend the study of Minimum Constraint Removal. We start by presenting refined NP-hardness reductions for the two cases: (1) when all the obstacles are axes-parallel rectangles, and (2) when all the obstacles are line segments such that no three intersect at the same point. These results improve on existing results in the literature. As a byproduct of our NP-hardness reductions, we prove that, unless the Exponential-Time Hypothesis (ETH) fails, Minimum Constraint Removal cannot be solved in subexponential time $2^{o(n)}$, where n is the number of obstacles in the instance. This shows that significant improvement on the brute-force $2^{O(n)}$-time algorithm is unlikely. We then present a subexponential-time algorithm for instances of Minimum Constraint Removal in which the number of obstacles that overlap at any point is constant; the algorithm runs in time $2^{O(\sqrt{N})}$, where N is the number of the vertices in the auxiliary graph associated with the instance of the problem. We show that significant improvement on this algorithm is unlikely by showing that, unless ETH fails, Minimum Constraint Removal with bounded overlap number cannot be solved in time $2^{o(\sqrt{N})}$. We describe several exact algorithms and approximation algorithms that leverage heuristics and discuss their performance in an extensive empirical simulation.
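
The brute-force baseline mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the instance is given as a graph whose vertices are regions of the plane, each labelled with the set of obstacles covering it (one possible reading of the "auxiliary graph"), and the names `path_exists`, `min_constraint_removal`, `adjacency`, and `cover` are hypothetical. It enumerates obstacle subsets in order of increasing size, so its running time is exponential in the number of obstacles.

```python
from collections import deque
from itertools import combinations

def path_exists(adjacency, cover, start, goal, removed):
    """Breadth-first search using only regions whose obstacles are all removed."""
    if not cover[start] <= removed or not cover[goal] <= removed:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adjacency[node]:
            if nxt not in seen and cover[nxt] <= removed:
                seen.add(nxt)
                queue.append(nxt)
    return False

def min_constraint_removal(adjacency, cover, start, goal):
    """Return a smallest set of obstacles whose removal connects start to goal.

    Tries every subset of obstacles from smallest to largest; returns None
    if the two regions are not connected even with every obstacle removed.
    """
    obstacles = sorted(set().union(*cover.values()))
    for k in range(len(obstacles) + 1):
        for subset in combinations(obstacles, k):
            if path_exists(adjacency, cover, start, goal, frozenset(subset)):
                return set(subset)
    return None

# Toy instance (illustrative): two routes from s to t. The upper route
# crosses a region covered by obstacles A and B, the lower one only C.
adjacency = {"s": ["a", "b"], "a": ["s", "t"], "b": ["s", "t"], "t": ["a", "b"]}
cover = {"s": set(), "a": {"A", "B"}, "b": {"C"}, "t": set()}

solution = min_constraint_removal(adjacency, cover, "s", "t")
print(solution)
```

On this toy instance the search finds that removing the single obstacle on the lower route is enough, so it never needs to consider removing both obstacles on the upper route.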

Loop closure detection is a critical component of large-scale simultaneous localization and mapping (SLAM) in loopy environments. This capability is challenging to achieve in long-term SLAM, when the appearance of the environment varies significantly across times of day, months, and even seasons. In this paper, we introduce a novel formulation to learn an integrated long-term representation based upon both holistic and landmark information, which integrates two previous insights under a unified framework: (1) holistic representations outperform keypoint-based representations, and (2) landmarks as an intermediate representation provide informative cues to detect challenging locations. Our new approach learns the representation by projecting input visual data into a low-dimensional space, which preserves both the global consistency (to minimize representation error) and the local consistency (to preserve landmarks’ pairwise relationship) of the input data. To solve the formulated optimization problem, a new algorithm is developed with theoretically guaranteed convergence. Extensive experiments on two large-scale public benchmark data sets demonstrate the effectiveness of the proposed approach.

Inertial sensors play a pivotal role in indoor localization, which in turn lays the foundation for pervasive personal applications. However, low-cost inertial sensors, as commonly found in smartphones, are plagued by bias and noise, which leads to unbounded growth in error when accelerations are double integrated to obtain displacement. Small errors in state estimation propagate to make odometry virtually unusable in a matter of seconds. We propose to break the cycle of continuous integration, and instead segment inertial data into independent windows. The challenge becomes estimating the latent states of each window, such as velocity and orientation, as these are not directly observable from sensor data. We demonstrate how to formulate this as an optimization problem, and show how deep recurrent neural networks can yield highly accurate trajectories, outperforming state-of-the-art shallow techniques, on a wide range of tests and attachments. In particular, we demonstrate that IONet can generalize to estimate odometry for non-periodic motion, such as a shopping trolley or baby-stroller, an extremely challenging task for existing techniques.
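
The error growth from double integration can be illustrated with a minimal sketch. It assumes a stationary device whose accelerometer reports a constant bias; the bias of 0.05 m/s^2 and the 100 Hz sampling rate are illustrative values, not figures from the paper, and `integrate_displacement` is a hypothetical helper.

```python
def integrate_displacement(accels, dt):
    """Double-integrate acceleration samples (simple Euler steps)."""
    velocity = 0.0
    displacement = 0.0
    for a in accels:
        velocity += a * dt          # first integration: acceleration -> velocity
        displacement += velocity * dt  # second integration: velocity -> position
    return displacement

bias = 0.05  # m/s^2, assumed constant accelerometer bias (illustrative)
dt = 0.01    # s, assumed 100 Hz sampling

def drift_after(seconds):
    """Position error accumulated by a stationary device reporting only bias."""
    steps = int(round(seconds / dt))
    return integrate_displacement([bias] * steps, dt)

drift_1s = drift_after(1.0)
drift_10s = drift_after(10.0)

print("drift after 1 s: ", drift_1s, "m")
print("drift after 10 s:", drift_10s, "m")
```

Because the bias is integrated twice, the position error grows roughly as 0.5 * bias * t^2, so a tenfold increase in duration yields about a hundredfold increase in drift.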

We present a reinforcement learning approach to explore and optimize a safety-constrained Markov Decision Process (MDP). In this setting, the agent must maximize discounted cumulative reward while keeping the probability of entering unsafe states, as defined by a safety function, within some tolerance. The safety values of the states are not known a priori, and we probabilistically model them via a Gaussian Process (GP) prior. As such, properly behaving in such an environment requires balancing a three-way trade-off of exploring the safety function, exploring the reward function, and exploiting acquired knowledge to maximize reward. We propose a novel approach to balance this trade-off. Specifically, our approach explores unvisited states selectively; that is, it prioritizes the exploration of a state if visiting that state significantly improves the knowledge on the achievable cumulative reward. Our approach relies on a novel information gain criterion based on Gaussian Process representations of the reward and safety functions. We demonstrate the effectiveness of our approach on a range of experiments, including a simulation using real Martian terrain data.

Mapping the occupancy level of an environment is important for a robot to navigate in unknown and unstructured environments. To this end, continuous occupancy mapping techniques, which express the occupancy probability of a location as a function, are used. In this work, we provide a theoretical analysis to compare and contrast the two major branches of Bayesian continuous occupancy mapping techniques (Gaussian process occupancy maps and Bayesian Hilbert maps), considering that both utilize kernel functions to operate in a rich high-dimensional implicit feature space and use variational inference to learn parameters. Then, we extend the recent Bayesian Hilbert maps framework, which has so far been used only for stationary robots, to map large environments with moving robots. Finally, we propose convolution of kernels as a powerful tool to improve different aspects of continuous occupancy mapping. Our claims are also experimentally validated with both simulated and real-world datasets.

In many reinforcement learning problems, parameters of the model may vary with its phase while the agent attempts to learn through its interaction with the environment. For example, an autonomous car's reward on selecting a path may depend on traffic conditions at the time of the day, or the transition dynamics of a drone may depend on the current wind direction. Many such processes exhibit a cyclic phase-structure and could be represented with a control policy parameterized over a circular or cyclic phase space. Attempting to model such phase variations with a standard data-driven approach (e.g. deep networks) without explicitly modeling the phase of the model can be challenging. Ambiguities may arise as the optimal action for a given state can vary depending on the phase. To better model cyclic environments, we propose phase-parameterized policies and value function approximators that explicitly enforce a cyclic structure to the policy or value space. We apply our phase-parameterized reinforcement learning approach to both feed-forward and recurrent deep networks in the context of trajectory optimization and locomotion problems. Our experiments show that our proposed approach achieves better modeling performance than traditional function approximators in cyclic environments.
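
One common way to expose a cyclic phase to a function approximator is to encode it as a point on the unit circle. The sketch below illustrates only that generic idea; the abstract does not say how the proposed policies enforce cyclic structure, so the encoding, the 24-hour period, and the names `phase_features` and `feature_distance` are all assumptions made for illustration.

```python
import math

def phase_features(phase, period):
    """Map a phase value to (cos, sin) so that the encoding wraps around."""
    angle = 2.0 * math.pi * (phase % period) / period
    return (math.cos(angle), math.sin(angle))

def feature_distance(a, b):
    """Euclidean distance between two 2-D feature vectors."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

period = 24.0                 # assumed: time of day in hours
late, early = 23.5, 0.5       # two times that are one hour apart across midnight

raw_gap = abs(late - early)   # a raw scalar input sees these as far apart
wrap_distance = feature_distance(
    phase_features(late, period), phase_features(early, period)
)

print("gap in raw phase values:", raw_gap)
print("distance in (cos, sin) features:", wrap_distance)
```

With the raw scalar, 23:30 and 00:30 look almost a full period apart, whereas the circular encoding places them close together and maps phase 0 and phase 24 to the same features.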

In robotics, it is essential to be able to plan efficiently in high-dimensional continuous state-action spaces for long horizons. For such complex planning problems, unguided uniform sampling of actions until a path to a goal is found is hopelessly inefficient, and gradient-based approaches often fall short when the optimization manifold of a given problem is not smooth. In this paper, we present an approach that guides search in continuous spaces for generic planners by learning an action sampler from past search experience. We use a Generative Adversarial Network (GAN) to represent an action sampler, and address an important issue: search experience consists of a relatively large number of actions that are not on a solution path and a relatively small number of actions that actually are on a solution path. We introduce a new technique, based on an importance-ratio estimation method, for using samples from a non-target distribution to make GAN learning more data-efficient. We provide theoretical guarantees and empirical evaluation in three challenging continuous robot planning problems to illustrate the effectiveness of our algorithm.

Formal verification provides a high degree of confidence in safe system operation, but only if reality matches the verified model. Although a good model will be accurate most of the time, even the best models are incomplete. This is especially true in Cyber-Physical Systems because high-fidelity physical models of systems are expensive to develop and often intractable to verify. Conversely, reinforcement learning-based controllers are lauded for their flexibility in unmodeled environments, but do not provide guarantees of safe operation. This paper presents an approach for provably safe learning that provides the best of both worlds: the exploration and optimization capabilities of learning along with the safety guarantees of formal verification. Our main insight is that formal verification combined with verified runtime monitoring can ensure the safety of a learning agent. Verification results are preserved whenever learning agents limit exploration within the confines of verified control choices, as long as observed reality comports with the model used for off-line verification. When a model violation is detected, the agent abandons efficiency and instead attempts to learn a control strategy that guides the agent to a modeled portion of the state space. We prove that our approach toward incorporating knowledge about safe control into learning systems preserves safety guarantees, and demonstrate that we retain the empirical performance benefits provided by reinforcement learning. We also explore various points in the design space for these justified speculative controllers in a simple model of adaptive cruise control for autonomous cars.

There has been substantial work in recent years on grounded language acquisition, in which language and sensor data are used to create a model relating linguistic constructs to the perceivable world. While powerful, this approach is frequently hindered by ambiguities, redundancies, and omissions found in natural language. We describe an unsupervised system that learns language by training visual classifiers, first selecting important terms from object descriptions, then automatically choosing negative examples from a paired corpus of perceptual and linguistic data. We evaluate the effectiveness of each stage as well as the system's performance on the overall learning task.