AAAI.2020 - Robotics | Cool Papers - Immersive Paper Discovery

#1 That and There: Judging the Intent of Pointing Actions with Robotic Arms [PDF] [Copy] [Kimi] [REL]

Authors: Malihe Alikhani ; Baber Khalid ; Rahul Shome ; Chaitanya Mitash ; Kostas Bekris ; Matthew Stone

Collaborative robotics requires effective communication between a robot and a human partner. This work proposes a set of interpretive principles for how a robotic arm can use pointing actions to communicate task information to people by extending existing models from the related literature. These principles are evaluated through studies where English-speaking human subjects view animations of simulated robots instructing pick-and-place tasks. The evaluation distinguishes two classes of pointing actions that arise in pick-and-place tasks: referential pointing (identifying objects) and locating pointing (identifying locations). The study indicates that human subjects show greater flexibility in interpreting the intent of referential pointing compared to locating pointing, which needs to be more deliberate. The results also demonstrate the effects of variation in the environment and task context on the interpretation of pointing. Our corpus, experiments and design principles advance models of context, common sense reasoning and communication in embodied communication.

#2 Learning from Interventions Using Hierarchical Policies for Safe Learning [PDF] [Copy] [Kimi] [REL]

Authors: Jing Bi ; Vikas Dhiman ; Tianyou Xiao ; Chenliang Xu

Learning from Demonstrations (LfD) via Behavior Cloning (BC) works well on multiple complex tasks. However, a limitation of the typical LfD approach is that it requires expert demonstrations for all scenarios, including those in which the algorithm is already well-trained. The recently proposed Learning from Interventions (LfI) overcomes this limitation by using an expert overseer. The expert overseer only intervenes when it suspects that an unsafe action is about to be taken. Although LfI significantly improves over LfD, the state-of-the-art LfI fails to account for delay caused by the expert's reaction time and only learns short-term behavior. We address these limitations by 1) interpolating the expert's interventions back in time, and 2) by splitting the policy into two hierarchical levels, one that generates sub-goals for the future and another that generates actions to reach those desired sub-goals. This sub-goal prediction forces the algorithm to learn long-term behavior while also being robust to the expert's reaction time. Our experiments show that LfI using sub-goals in a hierarchical policy framework trains faster and achieves better asymptotic performance than typical LfD.

#3 On the Problem of Covering a 3-D Terrain [PDF] [Copy] [Kimi] [REL]

Authors: Eduard Eiben ; Isuru Godage ; Iyad Kanj ; Ge Xia

We study the problem of covering a 3-dimensional terrain by a sweeping robot that is equipped with a camera. We model the terrain as a mesh in a way that captures the elevation levels of the terrain; this enables a graph-theoretic formulation of the problem in which the underlying graph is a weighted plane graph. We show that the associated graph problem is NP-hard, and that it admits a polynomial time approximation scheme (PTAS). Finally, we implement two heuristic algorithms based on greedy approaches and report our findings.

#4 Long-Term Loop Closure Detection through Visual-Spatial Information Preserving Multi-Order Graph Matching [PDF] [Copy] [Kimi] [REL]

Authors: Peng Gao ; Hao Zhang

Loop closure detection is a fundamental problem for simultaneous localization and mapping (SLAM) in robotics. Most of the previous methods only consider one type of information, based on either visual appearances or spatial relationships of landmarks. In this paper, we introduce a novel visual-spatial information preserving multi-order graph matching approach for long-term loop closure detection. Our approach constructs a graph representation of a place from an input image to integrate visual-spatial information, including visual appearances of the landmarks and the background environment, as well as the second and third-order spatial relationships between two and three landmarks, respectively. Furthermore, we introduce a new formulation that formulates loop closure detection as a multi-order graph matching problem to compute a similarity score directly from the graph representations of the query and template images, instead of performing conventional vector-based image matching. We evaluate the proposed multi-order graph matching approach based on two public long-term loop closure detection benchmark datasets, including the St. Lucia and CMU-VL datasets. Experimental results have shown that our approach is effective for long-term loop closure detection and it outperforms the previous state-of-the-art methods.

#5 Adversarial Fence Patrolling: Non-Uniform Policies for Asymmetric Environments [PDF] [Copy] [Kimi] [REL]

Authors: Yaniv Oshrat ; Noa Agmon ; Sarit Kraus

Robot teams are very useful in patrol tasks, where the robots are required to repeatedly visit a target area in order to detect an adversary. In this work we examine the Fence Patrol problem, in which the robots must travel back and forth along an open polyline and the adversary is aware of the robots' patrol strategy. Previous work has suggested non-deterministic patrol schemes, characterized by a uniform policy along the entire area, guaranteeing that the minimal probability of penetration detection throughout the area is maximized. We present a patrol strategy with a non-uniform policy along different points of the fence, based on the location and other properties of the point. We explore this strategy in different kinds of tracks and show that the minimal probability of penetration detection achieved by this non-uniform (variant) policy is higher than former policies. We further consider applying this model in multi-robot scenarios, exploiting robot cooperation to enhance patrol efficiency. We propose novel methods for calculating the variant values, and demonstrate their performance empirically.

#6 Task and Motion Planning Is PSPACE-Complete [PDF] [Copy] [Kimi] [REL]

Authors: William Vega-Brown ; Nicholas Roy

We present a new representation for task and motion planning that uses constraints to capture both continuous and discrete phenomena in a unified framework. We show that we can decide if a feasible plan exists for a given problem instance using only polynomial space if the constraints are semialgebraic and all actions have uniform stratified accessibility, a technical condition closely related to both controllability and to the existence of a symbolic representation of a planning domain. We show that there cannot exist an algorithm that solves the more general problem of deciding if a plan exists for an instance with arbitrary semialgebraic constraints. Finally, we show that our formalism is universal, in the sense that every deterministic robotic planning problem can be well-approximated within our formalism. Together, these results imply task and motion planning is PSPACE-complete.

#7 AtLoc: Attention Guided Camera Localization [PDF] [Copy] [Kimi] [REL]

Authors: Bing Wang ; Changhao Chen ; Chris Xiaoxuan Lu ; Peijun Zhao ; Niki Trigoni ; Andrew Markham

Deep learning has achieved impressive results in camera localization, but current single-image techniques typically suffer from a lack of robustness, leading to large outliers. To some extent, this has been tackled by sequential (multi-images) or geometry constraint approaches, which can learn to reject dynamic objects and illumination conditions to achieve better performance. In this work, we show that attention can be used to force the network to focus on more geometrically robust objects and features, achieving state-of-the-art performance in common benchmark, even if using only a single image as input. Extensive experimental evidence is provided through public indoor and outdoor datasets. Through visualization of the saliency maps, we demonstrate how the network learns to reject dynamic objects, yielding superior global camera pose regression performance. The source code is avaliable at https://github.com/BingCS/AtLoc.

#8 RoboCoDraw: Robotic Avatar Drawing with GAN-Based Style Transfer and Time-Efficient Path Optimization [PDF] [Copy] [Kimi] [REL]

Authors: Tianying Wang ; Wei Qi Toh ; Hao Zhang ; Xiuchao Sui ; Shaohua Li ; Yong Liu ; Wei Jing

Robotic drawing has become increasingly popular as an entertainment and interactive tool. In this paper we present RoboCoDraw, a real-time collaborative robot-based drawing system that draws stylized human face sketches interactively in front of human users, by using the Generative Adversarial Network (GAN)-based style transfer and a Random-Key Genetic Algorithm (RKGA)-based path optimization. The proposed RoboCoDraw system takes a real human face image as input, converts it to a stylized avatar, then draws it with a robotic arm. A core component in this system is the AvatarGAN proposed by us, which generates a cartoon avatar face image from a real human face. AvatarGAN is trained with unpaired face and avatar images only and can generate avatar images of much better likeness with human face images in comparison with the vanilla CycleGAN. After the avatar image is generated, it is fed to a line extraction algorithm and converted to sketches. An RKGA-based path optimization algorithm is applied to find a time-efficient robotic drawing path to be executed by the robotic arm. We demonstrate the capability of RoboCoDraw on various face images using a lightweight, safe collaborative robot UR5.

#9 Dempster-Shafer Theoretic Learning of Indirect Speech Act Comprehension Norms [PDF] [Copy] [Kimi] [REL]

Authors: Ruchen Wen ; Mohammed Aun Siddiqui ; Tom Williams

For robots to successfully operate as members of human-robot teams, it is crucial for robots to correctly understand the intentions of their human teammates. This task is particularly difficult due to human sociocultural norms: for reasons of social courtesy (e.g., politeness), people rarely express their intentions directly, instead typically employing polite utterance forms such as Indirect Speech Acts (ISAs). It is thus critical for robots to be capable of inferring the intentions behind their teammates' utterances based on both their interaction context (including, e.g., social roles) and their knowledge of the sociocultural norms that are applicable within that context. This work builds off of previous research on understanding and generation of ISAs using Dempster-Shafer Theoretic Uncertain Logic, by showing how other recent work in Dempster-Shafer Theoretic rule learning can be used to learn appropriate uncertainty intervals for robots' representations of sociocultural politeness norms.

#10 Modular Robot Design Synthesis with Deep Reinforcement Learning [PDF] [Copy] [Kimi] [REL]

Authors: Julian Whitman ; Raunaq Bhirangi ; Matthew Travers ; Howie Choset

Modular robots hold the promise of versatility in that their components can be re-arranged to adapt the robot design to a task at deployment time. Even for the simplest designs, determining the optimal design is exponentially complex due to the number of permutations of ways the modules can be connected. Further, when selecting the design for a given task, there is an additional computational burden in evaluating the capability of each robot, e.g., whether it can reach certain points in the workspace. This work uses deep reinforcement learning to create a search heuristic that allows us to efficiently search the space of modular serial manipulator designs. We show that our algorithm is more computationally efficient in determining robot designs for given tasks in comparison to the current state-of-the-art.

#11 Visual Tactile Fusion Object Clustering [PDF] [Copy] [Kimi] [REL]

Authors: Tao Zhang ; Yang Cong ; Gan Sun ; Qianqian Wang ; Zhenming Ding

Object clustering, aiming at grouping similar objects into one cluster with an unsupervised strategy, has been extensively-studied among various data-driven applications. However, most existing state-of-the-art object clustering methods (e.g., single-view or multi-view clustering methods) only explore visual information, while ignoring one of most important sensing modalities, i.e., tactile information which can help capture different object properties and further boost the performance of object clustering task. To effectively benefit both visual and tactile modalities for object clustering, in this paper, we propose a deep Auto-Encoder-like Non-negative Matrix Factorization framework for visual-tactile fusion clustering. Specifically, deep matrix factorization constrained by an under-complete Auto-Encoder-like architecture is employed to jointly learn hierarchical expression of visual-tactile fusion data, and preserve the local structure of data generating distribution of visual and tactile modalities. Meanwhile, a graph regularizer is introduced to capture the intrinsic relations of data samples within each modality. Furthermore, we propose a modality-level consensus regularizer to effectively align the visual and tactile data in a common subspace in which the gap between visual and tactile data is mitigated. For the model optimization, we present an efficient alternating minimization strategy to solve our proposed model. Finally, we conduct extensive experiments on public datasets to verify the effectiveness of our framework.