| Total: 5
Many planning methods rely on the use of an immediate reward function as a portable and succinct representation of desired behavior. Rewards are often inferred from demonstrated behavior that is assumed to be near-optimal. We examine a framework, Distance Minimization IRL (DM-IRL), for learning reward functions from scores an expert assigns to possibly suboptimal demonstrations. By changing the expert’s role from a demonstrator to a judge, DM-IRL relaxes some of the assumptions present in IRL, enabling learning from the scoring of arbitrary demonstration trajectories with unknown transition functions. DM-IRL complements existing IRL approaches by addressing different assumptions about the expert. We show that DM-IRL is robust to expert scoring error and prove that finding a policy that produces maximally informative trajectories for an expert to score is strongly NP-hard. Experimentally, we demonstrate that the reward function DM-IRL learns from an MDP with an unknown transition model can transfer to an agent with known characteristics in a novel environment, and we achieve successful learning with limited available training data.
Robotic tactile recognition aims at identifying target objects or environments from tactile sensory readings. The advancement of unsupervised feature learning and biological tactile sensing inspire us proposing the model of 3T-RTCN that performs spatio-temporal feature representation and fusion for tactile recognition. It decomposes tactile data into spatial and temporal threads, and incorporates the strength of randomized tiling convolutional networks. Experimental evaluations show that it outperforms some state-of-the-art methods with a large margin regarding recognition accuracy, robustness, and fault-tolerance; we also achieve an order-of-magnitude speedup over equivalent networks with pretraining and finetuning. Practical suggestions and hints are summarized in the end for effectively handling the tactile data.
This paper presents to the best of our knowledge the first end-to-end object tracking approach which directly maps from raw sensor input to object tracks in sensor space without requiring any feature engineering or system identification in the form of plant or sensor models. Specifically, our system accepts a stream of raw sensor data at one end and, in real-time, produces an estimate of the entire environment state at the output including even occluded objects. We achieve this by framing the problem as a deep learning task and exploit sequence models in the form of recurrent neural networks to learn a mapping from sensor measurements to object tracks. In particular, we propose a learning method based on a form of input dropout which allows learning in an unsupervised manner, only based on raw, occluded sensor data without access to ground-truth annotations. We demonstrate our approach using a synthetic dataset designed to mimic the task of tracking objects in 2D laser data — as commonly encountered in robotics applications — and show that it learns to track many dynamic objects despite occlusions and the presence of sensor noise.
To solve ever more complex and longer tasks, mobile robots need to generate more elaborate plans and must handle dynamic environments and incomplete knowledge. We address this challenge by integrating two seemingly different approaches — PDDL-based planning for efficient plan generation and Golog for highly expressive behavior specification — in a coherent framework that supports continual planning. The latter allows to interleave plan generation and execution through assertions, which are placeholder actions that are dynamically expanded into conditional sub-plans (using classical planners) once a replanning condition is satisfied. We formalize and implement continual planning in Golog which was so far only supported in PDDL-based systems. This enables combining the execution of generated plans with regular Golog programs and execution monitoring. Experiments on autonomous mobile robots show that the approach supports expressive behavior specification combined with efficient sub-plan generation to handle dynamic environments and incomplete knowledge in a unified way.
CMDragons 2015 is the champion of the RoboCup Small Size League of autonomous robot soccer. The team won all of its six games, scoring a total of 48 goals and conceding 0. This unprecedented dominant performance is the result of various features, but we particularly credit our novel offense multi-robot coordination. This paper thus presents our Selectively Reactive Coordination (SRC) algorithm, consisting of two layers: A coordinated opponent-agnostic layer enables the team to create its own plans, setting the pace of the game in offense. An individual opponent-reactive action selection layer enables the robots to maintain reactivity to different opponents. We demonstrate the effectiveness of our coordination through results from RoboCup 2015, and through controlled experiments using a physics-based simulator and an automated referee.