Robotics | Cool Papers - Immersive Paper Discovery

#1 Personalized and Context-aware Route Planning for Edge-assisted Vehicles

Authors: Dinesh Cyril Selvaraj ; Falko Dressler ; Carla Fabiana Chiasserini

Conventional route planning services typically offer the same routes to all drivers, focusing primarily on a few standardized factors such as travel distance or time, overlooking individual driver preferences. With the inception of autonomous vehicles expected in the coming years, where vehicles will rely on routes decided by such planners, there arises a need to incorporate the specific preferences of each driver, ensuring personalized navigation experiences. In this work, we propose a novel approach based on graph neural networks (GNNs) and deep reinforcement learning (DRL), aimed at customizing routes to suit individual preferences. By analyzing the historical trajectories of individual drivers, we classify their driving behavior and associate it with relevant road attributes as indicators of driver preferences. The GNN is capable of representing the road network as graph-structured data effectively, while DRL is capable of making decisions utilizing reward mechanisms to optimize route selection with factors such as travel costs, congestion level, and driver satisfaction. We evaluate our proposed GNN-based DRL framework using a real-world road network and demonstrate its ability to accommodate driver preferences, offering a range of route options tailored to individual drivers. The results indicate that our framework can select routes that accommodate drivers' preferences with up to a 17% improvement compared to a generic route planner, and reduce the travel time by 33% (afternoon) and 46% (evening) relative to the shortest distance-based approach.

Subjects: Artificial Intelligence ; Robotics

Publish: 2024-07-25 12:14:12 UTC

#2 CRASAR-U-DROIDs: A Large Scale Benchmark Dataset for Building Alignment and Damage Assessment in Georectified sUAS Imagery

Authors: Thomas Manzini ; Priyankari Perali ; Raisa Karnik ; Robin Murphy

This document presents the Center for Robot Assisted Search And Rescue - Uncrewed Aerial Systems - Disaster Response Overhead Inspection Dataset (CRASAR-U-DROIDs) for building damage assessment and spatial alignment collected from small uncrewed aerial systems (sUAS) geospatial imagery. This dataset is motivated by the increasing use of sUAS in disaster response, the lack of previous work in utilizing high-resolution geospatial sUAS imagery for machine learning and computer vision models, the lack of alignment with operational use cases, and the hope of enabling further investigations between sUAS and satellite imagery. The CRASAR-U-DROIDs dataset consists of fifty-two (52) orthomosaics from ten (10) federally declared disasters (Hurricane Ian, Hurricane Ida, Hurricane Harvey, Hurricane Idalia, Hurricane Laura, Hurricane Michael, Musset Bayou Fire, Mayfield Tornado, Kilauea Eruption, and Champlain Towers Collapse) spanning 67.98 square kilometers (26.245 square miles), containing 21,716 building polygons and damage labels, and 7,880 adjustment annotations. The imagery was tiled and presented in conjunction with overlaid building polygons to a pool of 130 annotators who provided human judgments of damage according to the Joint Damage Scale. These annotations were then reviewed via a two-stage review process in which building polygon damage labels were first reviewed individually and then again by committee. Additionally, the building polygons have been aligned spatially to precisely overlap with the imagery to enable more performant machine learning models to be trained. It appears that CRASAR-U-DROIDs is the largest labeled dataset of sUAS orthomosaic imagery.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence ; Robotics

Publish: 2024-07-24 23:39:10 UTC

#3 PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

Authors: Cheng Qian ; Julen Urain ; Kevin Zakka ; Jan Peters

In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training our robot agents. In particular, for the case of piano-playing, YouTube is full of videos of professional pianists playing a wide variety of songs. In our work, we leverage these demonstrations to learn a generalist piano-playing agent capable of playing arbitrary songs. Our framework is divided into three parts: a data preparation phase to extract the informative features from the YouTube videos, a policy learning phase to train song-specific expert policies from the demonstrations, and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate the influence of the amount of training data on the generalization capability of the agent to novel songs not available in the dataset. We show that we are able to learn a policy with up to 56% F1 score on unseen songs.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence ; Robotics

Publish: 2024-07-25 16:37:07 UTC
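
The PianoMime abstract above reports its result as an F1 score. As a reminder of what that metric measures, here is a minimal sketch in Python. It assumes (this is not stated in the abstract) that a performance is scored as a set of (timestep, key) press events; the function name and the sample events are hypothetical.

```python
def f1_score(predicted: set, target: set) -> float:
    """Harmonic mean of precision and recall over set-valued events."""
    if not predicted and not target:
        return 1.0  # nothing to play and nothing played: treat as perfect
    true_positives = len(predicted & target)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # fraction of presses that were correct
    recall = true_positives / len(target)        # fraction of required presses that were hit
    return 2 * precision * recall / (precision + recall)


# Hypothetical (timestep, MIDI key) events, for illustration only.
target = {(0, 60), (0, 64), (1, 62), (2, 65)}
predicted = {(0, 60), (1, 62), (1, 67)}
score = f1_score(predicted, target)
```

With two of three presses correct and two of four required presses hit, precision is 2/3 and recall is 1/2, so the score is 4/7.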

#4 Lightweight Language-driven Grasp Detection using Conditional Consistency Model

Authors: Nghia Nguyen ; Minh Nhat Vu ; Baoru Huang ; An Vuong ; Ngan Le ; Thieu Vo ; Anh Nguyen

Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages the concept of lightweight diffusion models to achieve fast inference time. By integrating diffusion processes with grasping prompts in natural language, our method can effectively encode visual and textual information, enabling more accurate and versatile grasp positioning that aligns well with the text query. To overcome the long inference time problem in diffusion models, we leverage the image and text features as the condition in the consistency model to reduce the number of denoising timesteps during inference. The intensive experimental results show that our method outperforms other recent grasp detection methods and lightweight diffusion models by a clear margin. We further validate our method in real-world robotic experiments to demonstrate its fast inference time capability.

Subjects: Robotics ; Computer Vision and Pattern Recognition

Publish: 2024-07-25 11:39:20 UTC

#5 Driving pattern interpretation based on action phases clustering

Authors: Xue Yao ; Simeon C. Calvert ; Serge P. Hoogendoorn

Current approaches to identifying driving heterogeneity face challenges in comprehending fundamental patterns from the perspective of underlying driving behavior mechanisms. The concept of Action phases was proposed in our previous work, capturing the diversity of driving characteristics with physical meanings. This study presents a novel framework to further interpret driving patterns by classifying Action phases in an unsupervised manner. In this framework, a Resampling and Downsampling Method (RDM) is first applied to standardize the length of Action phases. Then the clustering calibration procedure including "Feature Selection", "Clustering Analysis", "Difference/Similarity Evaluation", and "Action phases Re-extraction" is iteratively applied until all differences among clusters and similarities within clusters reach the pre-determined criteria. Application of the framework using real-world datasets revealed six driving patterns in the I80 dataset, labeled as "Catch up", "Keep away", and "Maintain distance", with both "Stable" and "Unstable" states. Notably, Unstable patterns are more numerous than Stable ones. "Maintain distance" is the most common among Stable patterns. These observations align with the dynamic nature of driving. Two patterns "Stable keep away" and "Unstable catch up" are missing in the US101 dataset, which is in line with our expectations as this dataset was previously shown to have less heterogeneity. This demonstrates the potential of driving patterns in describing driving heterogeneity. The proposed framework promises advantages in addressing label scarcity in supervised learning and enhancing tasks such as driving behavior modeling and driving trajectory prediction.

Subjects: Artificial Intelligence ; Machine Learning ; Robotics ; Applications

Publish: 2024-07-17 10:40:23 UTC

#6 StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory

Authors: Zhiheng Li ; Yubo Cui ; Jiexi Zhong ; Zheng Fang

Moving object segmentation based on LiDAR is a crucial and challenging task for autonomous driving and mobile robotics. Most approaches explore spatio-temporal information from LiDAR sequences to predict moving objects in the current frame. However, they often focus on transferring temporal cues in a single inference and regard every prediction as independent of others. This may cause inconsistent segmentation results for the same object in different frames. To overcome this issue, we propose a streaming network with a memory mechanism, called StreamMOS, to build the association of features and predictions among multiple inferences. Specifically, we utilize a short-term memory to convey historical features, which can be regarded as a spatial prior of moving objects and adopted to enhance current inference by temporal fusion. Meanwhile, we build a long-term memory to store previous predictions and exploit them to refine the present forecast at voxel and instance levels through voting. Besides, we present a multi-view encoder with cascade projection and asymmetric convolution to extract motion features of objects in different representations. Extensive experiments validate that our algorithm achieves competitive performance on SemanticKITTI and Sipailou Campus datasets. Code will be released at https://github.com/NEU-REAL/StreamMOS.git.

Subjects: Computer Vision and Pattern Recognition ; Robotics

Publish: 2024-07-25 09:51:09 UTC
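
The StreamMOS abstract above describes a long-term memory that stores previous predictions and refines the current one through voting. The following is only a toy sketch of that idea, not the authors' network: it assumes each voxel has a hashable id and a discrete label, and it uses a plain majority vote over a sliding window. The class name `VotingMemory` and the `span` parameter are hypothetical.

```python
from collections import Counter, deque


class VotingMemory:
    """Keeps the last `span` labels per voxel and returns the majority label."""

    def __init__(self, span: int):
        self.span = span
        self.history = {}  # voxel id -> deque of the most recent labels

    def refine(self, voxel_id, label):
        votes = self.history.setdefault(voxel_id, deque(maxlen=self.span))
        votes.append(label)  # the oldest label is dropped once the window is full
        return Counter(votes).most_common(1)[0][0]


memory = VotingMemory(span=3)
raw_predictions = ["moving", "moving", "static", "moving"]
refined = [memory.refine("voxel_17", label) for label in raw_predictions]
```

In this run the single "static" flicker in the third frame is outvoted by its neighbours, which is the kind of frame-to-frame inconsistency the abstract says the memory is meant to suppress.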

#7 Adaptive Robot Detumbling of a Non-Rigid Satellite

Authors: Longsen Gao ; Claus Danielson ; Rafael Fierro

The challenge of stabilizing satellites, particularly those with uncertain flexible dynamics, has become a pressing concern in control and robotics. These uncertainties, especially the dynamics of a third-party client satellite, significantly complicate the stabilization task. This paper introduces a novel adaptive detumbling method to handle non-rigid satellites with unknown motion dynamics (translation and rotation). The distinctive feature of our approach is that we model the non-rigid tumbling satellite as a two-link serial chain with unknown stiffness and damping, in contrast to previous detumbling research, which considers the satellite a rigid body. We develop a novel adaptive robotics approach to detumble the satellite by using two space tugs as servicers despite the uncertain dynamics in the post-capture case. Notably, the stiffness properties and other physical parameters, including the mass and inertia of the two links, remain unknown to the servicers. Our proposed method addresses the challenges in detumbling tasks and paves the way for advanced manipulation of non-rigid satellites with uncertain dynamics.

Subject: Robotics

Publish: 2024-07-24 20:09:37 UTC

#8 YOCO: You Only Calibrate Once for Accurate Extrinsic Parameter in LiDAR-Camera Systems

Authors: Tianle Zeng ; Dengke He ; Feifan Yan ; Meixi He

In a multi-sensor fusion system composed of cameras and LiDAR, precise extrinsic calibration contributes to the system's long-term stability and accurate perception of the environment. However, methods based on extracting and registering corresponding points still face challenges in terms of automation and precision. This paper proposes a novel fully automatic extrinsic calibration method for LiDAR-camera systems that circumvents the need for corresponding point registration. In our approach, a novel algorithm to extract the required LiDAR correspondence points is proposed. This method can effectively filter out irrelevant points by computing the orientation of plane point clouds and extracting points by applying distance- and density-based thresholds. We avoid the need for corresponding point registration by introducing extrinsic parameters between the LiDAR and camera into the projection of extracted points and constructing co-planar constraints. These parameters are then optimized to solve for the extrinsic. We validated our method across multiple sets of LiDAR-camera systems. In synthetic experiments, our method demonstrates superior performance compared to current calibration techniques. Real-world data experiments further confirm the precision and robustness of the proposed algorithm, with average rotation and translation calibration errors between LiDAR and camera of less than 0.05 degrees and 0.015 m, respectively. This method enables automatic and accurate extrinsic calibration in a single step, emphasizing the potential of calibration algorithms beyond using corresponding point registration to enhance the automation and precision of LiDAR-camera system calibration.

Subjects: Robotics ; Computer Vision and Pattern Recognition

Publish: 2024-07-25 13:44:49 UTC

#9 CodedVO: Coded Visual Odometry

Authors: Sachin Shah ; Naitri Rajyaguru ; Chahat Deep Singh ; Christopher Metzler ; Yiannis Aloimonos

Autonomous robots often rely on monocular cameras for odometry estimation and navigation. However, the scale ambiguity problem presents a critical barrier to effective monocular visual odometry. In this paper, we present CodedVO, a novel monocular visual odometry method that overcomes the scale ambiguity problem by employing custom optics to physically encode metric depth information into imagery. By incorporating this information into our odometry pipeline, we achieve state-of-the-art performance in monocular visual odometry with a known scale. We evaluate our method in diverse indoor environments and demonstrate its robustness and adaptability. We achieve a 0.08 m average trajectory error in odometry evaluation on the ICL-NUIM indoor odometry dataset.

Subjects: Robotics ; Computer Vision and Pattern Recognition

Publish: 2024-07-25 17:54:58 UTC

#10 TiCoSS: Tightening the Coupling between Semantic Segmentation and Stereo Matching within A Joint Learning Framework

Authors: Guanfeng Tang ; Zhiyuan Wu ; Rui Fan

Semantic segmentation and stereo matching, respectively analogous to the ventral and dorsal streams in our human brain, are two key components of autonomous driving perception systems. Addressing these two tasks with separate networks is no longer the mainstream direction in developing computer vision algorithms, particularly with the recent advances in large vision models and embodied artificial intelligence. The trend is shifting towards combining them within a joint learning framework, especially emphasizing feature sharing between the two tasks. The major contributions of this study lie in comprehensively tightening the coupling between semantic segmentation and stereo matching. Specifically, this study introduces three novelties: (1) a tightly coupled, gated feature fusion strategy, (2) a hierarchical deep supervision strategy, and (3) a coupling tightening loss function. The combined use of these technical contributions results in TiCoSS, a state-of-the-art joint learning framework that simultaneously tackles semantic segmentation and stereo matching. Through extensive experiments on the KITTI and vKITTI2 datasets, along with qualitative and quantitative analyses, we validate the effectiveness of our developed strategies and loss function, and demonstrate the framework's superior performance compared to prior art, with a notable increase in mIoU of over 9%. Our source code will be publicly available at mias.group/TiCoSS upon publication.

Subjects: Computer Vision and Pattern Recognition ; Robotics

Publish: 2024-07-25 13:31:55 UTC

#11 Egocentric Robots in a Human-Centric World? Exploring Group-Robot-Interaction in Public Spaces

Authors: Ana Müller ; Anja Richert

The deployment of social robots in real-world scenarios is increasing, supporting humans in various contexts. However, they still struggle to grasp social dynamics, especially in public spaces, sometimes resulting in violations of social norms, such as interrupting human conversations. This behavior, originating from a limited processing of social norms, might be perceived as robot-centered. Understanding social dynamics, particularly in group-robot-interactions (GRI), underscores the need for further research and development in human-robot-interaction (HRI). Enhancing the interaction abilities of social robots, especially in GRIs, can improve their effectiveness in real-world applications on a micro-level, as group interactions lead to increased motivation and comfort. In this study, we assessed the influence of the interaction condition (dyadic vs. triadic) on the perceived extraversion (ext.) of social robots in public spaces. The research involved 40 HRIs, including 24 dyadic (i.e., one human and one robot) interactions and 16 triadic interactions, which involve at least three entities, including the robot.

Subject: Robotics

Publish: 2024-07-25 13:04:37 UTC

#12 Influence Vectors Control for Robots Using Cellular-like Binary Actuators

Authors: Alexandre Girard ; Jean-Sébastien Plante

Robots using cellular-like redundant binary actuators could outmatch electric-gearmotor robotic systems in terms of reliability, force-to-weight ratio and cost. This paper presents a robust fault tolerant control scheme that is designed to meet the control challenges encountered by such robots, i.e., discrete actuator inputs, complex system modeling and cross-coupling between actuators. In the proposed scheme, a desired vectorial system output, such as a position or a force, is commanded by recruiting actuators based on their influence vectors on the output. No analytical model of the system is needed; influence vectors are identified experimentally by sequentially activating each actuator. For position control tasks, the controller uses a probabilistic approach and a genetic algorithm to determine an optimal combination of actuators to recruit. For motion control tasks, the controller uses a sliding mode approach and independent recruiting decision for each actuator. Experimental results on a four degrees of freedom binary manipulator with twenty actuators confirm the method's effectiveness, and its ability to tolerate massive perturbations and numerous actuator failures.

Subject: Robotics

Publish: 2024-07-25 15:44:06 UTC
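
The abstract above describes recruiting binary actuators according to their influence vectors on the output. A minimal sketch of that selection problem follows. It makes two loud assumptions that are not in the abstract: the actuator contributions add linearly (the paper explicitly mentions cross-coupling, which this ignores), and an exhaustive search stands in for the genetic algorithm the authors use. The function name and the sample influence vectors are hypothetical.

```python
from itertools import product


def recruit_actuators(influence, desired):
    """Try every on/off pattern and keep the one whose summed influence
    vectors are closest (squared error) to the desired output."""
    best_pattern, best_error = None, float("inf")
    for pattern in product((0, 1), repeat=len(influence)):
        # Assumed linear superposition of the recruited actuators.
        output = [
            sum(on * vec[d] for on, vec in zip(pattern, influence))
            for d in range(len(desired))
        ]
        error = sum((o - t) ** 2 for o, t in zip(output, desired))
        if error < best_error:
            best_pattern, best_error = pattern, error
    return best_pattern, best_error


# Hypothetical 2-D influence vectors, e.g. measured by activating each actuator in turn.
influence = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (-1.0, 0.0)]
pattern, error = recruit_actuators(influence, desired=(1.5, 1.5))
```

Exhaustive search grows as 2^n in the number of actuators, which is presumably why a heuristic such as a genetic algorithm becomes attractive for the twenty-actuator manipulator mentioned in the abstract.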

#13 CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions

Authors: Haicheng Liao ; Haoyu Sun ; Huanming Shen ; Chengyue Wang ; Kahou Tam ; Chunlin Tian ; Li Li ; Chengzhong Xu ; Zhenning Li

Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This task presents substantial challenges stemming from the unpredictable nature of traffic accidents, their long-tail distribution, the intricacies of traffic scene dynamics, and the inherently constrained field of vision of onboard cameras. To address these challenges, this study introduces a novel accident anticipation framework for AVs, termed CRASH. It seamlessly integrates five components: object detector, feature extractor, object-aware module, context-aware module, and multi-layer fusion. Specifically, we develop the object-aware module to prioritize high-risk objects in complex and ambiguous environments by calculating the spatial-temporal relationships between traffic agents. In parallel, the context-aware module is devised to extend global visual information from the temporal to the frequency domain using the Fast Fourier Transform (FFT) and capture fine-grained visual features of potential objects and broader context cues within traffic scenes. To capture a wider range of visual cues, we further propose a multi-layer fusion that dynamically computes the temporal dependencies between different scenes and iteratively updates the correlations between different visual features for accurate and timely accident prediction. Evaluated on real-world datasets--Dashcam Accident Dataset (DAD), Car Crash Dataset (CCD), and AnAn Accident Detection (A3D) datasets--our model surpasses existing top baselines in critical evaluation metrics like Average Precision (AP) and mean Time-To-Accident (mTTA). Importantly, its robustness and adaptability are particularly evident in challenging driving scenarios with missing or limited training data, demonstrating significant potential for application in real-world autonomous driving systems.

Subjects: Computer Vision and Pattern Recognition ; Robotics

Publish: 2024-07-25 04:12:49 UTC

#14 Strategic Pseudo-Goal Perturbation for Deadlock-Free Multi-Agent Navigation in Social Mini-Games

Authors: Abhishek Jha ; Tanishq Gupta ; Sumit Singh Rawat ; Girish Kumar

This work introduces Strategic Pseudo-Goal Perturbation (SPGP), a novel approach to resolving deadlock situations in multi-agent navigation scenarios. Leveraging the robust framework of Safety Barrier Certificates, our method integrates a strategic perturbation mechanism that guides agents through social mini-games where deadlocks and collisions occur frequently. The method adopts a strategic calculation process in which agents, upon encountering a deadlock, select a pseudo-goal within a predefined radius around the current position to resolve the deadlock among agents. The calculation is based on a controlled strategic algorithm, ensuring that deviation towards the pseudo-goal is both purposeful and effective in resolving the deadlock. Once the agent reaches the pseudo-goal, it resumes the path towards the original goal, thereby enhancing navigational efficiency and safety. Experimental results demonstrate SPGP's efficacy in reducing deadlock instances and improving overall system throughput in a variety of multi-agent navigation scenarios.

Subjects: Multiagent Systems ; Robotics

Publish: 2024-07-25 04:46:51 UTC
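
The SPGP abstract above describes an agent that, on deadlock, picks a pseudo-goal within a radius of its current position, reaches it, and then resumes its original goal. The sketch below illustrates only that control flow. The abstract does not specify how the pseudo-goal is computed, so uniform random sampling inside a disc is used here as a stand-in, and the Safety Barrier Certificate layer is omitted entirely. All function names and the `tolerance` parameter are hypothetical.

```python
import math
import random


def pick_pseudo_goal(position, radius, rng):
    """Sample a point uniformly inside a disc of `radius` around `position`
    (a placeholder for the paper's strategic selection rule)."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    distance = radius * math.sqrt(rng.random())  # sqrt gives uniform area density
    return (
        position[0] + distance * math.cos(angle),
        position[1] + distance * math.sin(angle),
    )


def next_target(position, goal, pseudo_goal, tolerance=0.05):
    """Head for the pseudo-goal until it is reached, then resume the original goal."""
    if pseudo_goal is not None and math.dist(position, pseudo_goal) > tolerance:
        return pseudo_goal
    return goal


rng = random.Random(0)
pseudo_goal = pick_pseudo_goal((0.0, 0.0), radius=1.0, rng=rng)
```

A caller would invoke `pick_pseudo_goal` once when a deadlock is detected and then query `next_target` on every control step.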

#15 Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind

Authors: Francesca Bianco ; Silvia Rigato ; Maria Laura Filippetti ; Dimitri Ognibene

Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low-level mental states (e.g., intentions, goals) and attributing high-level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs-driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.

Subjects: Neural and Evolutionary Computing ; Artificial Intelligence ; Machine Learning ; Robotics

Publish: 2024-07-25 13:15:25 UTC

#16 Pose, Velocity and Landmark Position Estimation Using IMU and Bearing Measurements

Authors: Miaomiao Wang ; Abdelhamid Tayebi

This paper investigates the estimation problem of the pose (orientation and position) and linear velocity of a rigid body, as well as the landmark positions, using an inertial measurement unit (IMU) and a monocular camera. First, we propose a globally exponentially stable (GES) linear time-varying (LTV) observer for the estimation of body-frame landmark positions and velocity, using IMU and monocular bearing measurements. Thereafter, using the gyro measurements, some landmarks known in the inertial frame and the estimates from the LTV observer, we propose a nonlinear pose observer on $\mathrm{SO}(3)\times \mathbb{R}^3$. The overall estimation system is shown to be almost globally asymptotically stable (AGAS) using the notion of almost global input-to-state stability (ISS). Interestingly, we show that with the knowledge (in the inertial frame) of a small number of landmarks, we can recover (under some conditions) the unknown positions (in the inertial frame) of a large number of landmarks. Numerical simulation results are presented to illustrate the performance of the proposed estimation scheme.

Subjects: Systems and Control ; Robotics

Publish: 2024-07-25 15:03:33 UTC

#17 Taxonomy-Aware Continual Semantic Segmentation in Hyperbolic Spaces for Open-World Perception

Authors: Julia Hindel ; Daniele Cattaneo ; Abhinav Valada

Semantic segmentation models are typically trained on a fixed set of classes, limiting their applicability in open-world scenarios. Class-incremental semantic segmentation aims to update models with emerging new classes while preventing catastrophic forgetting of previously learned ones. However, existing methods impose strict rigidity on old classes, reducing their effectiveness in learning new incremental classes. In this work, we propose Taxonomy-Oriented Poincaré-regularized Incremental-Class Segmentation (TOPICS) that learns feature embeddings in hyperbolic space following explicit taxonomy-tree structures. This supervision provides plasticity for old classes, updating ancestors based on new classes while integrating new classes at fitting positions. Additionally, we maintain implicit class relational constraints on the geometric basis of the Poincaré ball. This ensures that the latent space can continuously adapt to new constraints while maintaining a robust structure to combat catastrophic forgetting. We also establish eight realistic incremental learning protocols for autonomous driving scenarios, where novel classes can originate from known classes or the background. Extensive evaluations of TOPICS on the Cityscapes and Mapillary Vistas 2.0 benchmarks demonstrate that it achieves state-of-the-art performance. We make the code and trained models publicly available at http://topics.cs.uni-freiburg.de.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence ; Robotics

Publish: 2024-07-25 15:49:26 UTC

#18 Passive wing deployment and retraction in beetles and flapping microrobots

Authors: Hoang-Vu Phan ; Hoon Cheol Park ; Dario Floreano

Birds, bats and many insects can tuck their wings against their bodies at rest and deploy them to power flight. Whereas birds and bats use well-developed pectoral and wing muscles and tendons, how insects control these movements remains unclear, as mechanisms of wing deployment and retraction vary among insect species. Beetles (Coleoptera) display one of the most complex wing mechanisms. For example, in rhinoceros beetles, the wing deployment initiates by fully opening the elytra and partially releasing the hindwings from the abdomen. Subsequently, the beetle starts flapping, elevates the hindwings at the bases, and unfolds the wingtips in an origami-like fashion. Whilst the origami-like folds have been extensively explored, limited attention has been given to the hindwing base deployment and retraction, which are believed to be driven by thoracic muscles. Using high-speed cameras and robotic flapping-wing models, here we demonstrate that rhinoceros beetles can effortlessly elevate the hindwings to flight position without the need for muscular activity. We show that opening the elytra triggers a spring-like partial release of the hindwings from the body, allowing the clearance needed for subsequent flapping motion that brings the hindwings into flight position. The results also show that after flight, beetles can leverage the elytra to push the hindwings back into the resting position, further strengthening the hypothesis of a passive deployment mechanism. Finally, we validate the hypothesis with a flapping microrobot that passively deploys its wings for stable controlled flight and retracts them neatly upon landing, which offers a simple yet effective approach to the design of insect-like flying micromachines.

Subjects: Biological Physics ; Robotics

Publish: 2024-07-25 16:39:42 UTC

#19 Meta-Reinforcement Learning for Universal Quadrupedal Locomotion Control

Authors: Fabrizio Di Giuro ; Fatemeh Zargarbashi ; Jin Cheng ; Dongho Kang ; Bhavya Sukhija ; Stelian Coros

This work presents a deep reinforcement learning-based approach to develop a policy for robot-agnostic locomotion control. Our method involves training an agent equipped with memory, implemented as a recurrent policy, on a diverse set of procedurally generated quadruped robots. We demonstrate that the policies trained by our framework transfer seamlessly to both simulated and real-world quadrupeds not encountered during training, maintaining high-quality motion across platforms. Through a series of simulation and hardware experiments, we highlight the critical role of the recurrent unit in enabling generalization, rapid adaptation to changes in the robot's dynamic properties, and sample efficiency.

Subject: Robotics

Publish: 2024-07-05 14:31:51 UTC

#20 Quality Diversity for Robot Learning: Limitations and Future Directions

Authors: Sumeet Batra ; Bryon Tjanaka ; Stefanos Nikolaidis ; Gaurav Sukhatme

Quality Diversity (QD) has shown great success in discovering high-performing, diverse policies for robot skill learning. While current benchmarks have led to the development of powerful QD methods, we argue that new paradigms must be developed to facilitate open-ended search and generalizability. In particular, many methods focus on learning diverse agents that each move to a different xy position in MAP-Elites-style bounded archives. Here, we show that such tasks can be accomplished with a single, goal-conditioned policy paired with a classical planner, achieving O(1) space complexity w.r.t. the number of policies and generalization to task variants. We hypothesize that this approach is successful because it extracts task-invariant structural knowledge by modeling a relational graph between adjacent cells in the archive. We motivate this view with emerging evidence from computational neuroscience and explore connections between QD and models of cognitive maps in human and other animal brains. We conclude with a discussion exploring the relationships between QD and cognitive maps, and propose future research directions inspired by cognitive maps towards future generalizable algorithms capable of truly open-ended search.

Subjects: Robotics ; Artificial Intelligence

Publish: 2024-07-09 23:29:54 UTC

#21 Amplifying the Kinematics of Origami Mechanisms With Spring Joints [PDF] [Copy] [Kimi]

Author: Malcolm Smith

Due to its rigid foldability and predictable kinematics, the reverse fold is the fundamental mechanism behind some of the most well-known origami kinematic structures, including the Miura Ori, Yoshimura, and waterbomb patterns. However, the reverse fold has only one parameter to control its behavior: the starting fold angle. In this paper, I introduce an alternative to the traditional reverse fold, based on the spring into action pattern, called the spring joint. This novel rigidly foldable mechanism is able to couple multiple reverse folds into a compact space to amplify the kinematic output of a traditional reverse fold by up to ten times, and to add one parameter for each reverse fold, giving more programmatic control of origami structures. Methods of parameterizing the starting angle, the path of travel, and the axis of motion are also introduced. Unfortunately, this versatility comes at the cost of a large buildup of layers, making the spring joint impractical for thick origami mechanisms. To solve this problem, I also introduce a modular alternative to the spring joint that has no additional layers and the same kinematic properties. Both of these mechanisms are tested as replacements for the reverse fold in both traditional and custom origami structures.

Subjects: Robotics ; Algebraic Geometry

Publish: 2024-07-10 07:38:25 UTC

#22 RL-augmented MPC Framework for Agile and Robust Bipedal Footstep Locomotion Planning and Control [PDF] [Copy] [Kimi]

Authors: Seung Hyeon Bang ; Carlos Arribalzaga Jové ; Luis Sentis

This paper proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to achieve agile and robust bipedal maneuvers. While MPC-based foot placement controllers have demonstrated their effectiveness in achieving dynamic locomotion, their performance is often limited by the use of simplified models and assumptions. To address this challenge, we develop a novel foot placement controller that leverages a learned policy to bridge the gap between the use of a simplified model and the more complex full-order robot system. Specifically, our approach employs a unique combination of an ALIP-based MPC foot placement controller for sub-optimal footstep planning and the learned policy for refining footstep adjustments, enabling the resulting footstep policy to capture the robot's whole-body dynamics effectively. This integration synergizes the predictive capability of MPC with the flexibility and adaptability of RL. We validate the effectiveness of our framework through a series of experiments using the full-body humanoid robot DRACO 3. The results demonstrate significant improvements in dynamic locomotion performance compared to the baseline ALIP-based MPC approach, including better tracking of a wide range of walking speeds, reliable turning, and traversal of challenging terrains, while preserving the robustness and stability of the walking gaits.
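One common way to combine a model-based planner with a learned policy is to treat the policy output as a bounded correction to the nominal plan. The sketch below assumes that structure for illustration only; the abstract does not specify how the refinement is applied, and the function name, the clipping step, and the 0.05 m bound are all hypothetical.

```python
def refine_footstep(nominal, residual, max_adjust=0.05):
    """Add a clipped correction to a nominal (x, y) footstep, in metres.

    nominal:    footstep proposed by the model-based planner (assumed input)
    residual:   adjustment proposed by a learned policy (assumed input)
    max_adjust: hypothetical per-axis bound on the adjustment
    """
    clipped = tuple(max(-max_adjust, min(max_adjust, r)) for r in residual)
    return tuple(n + c for n, c in zip(nominal, clipped))
```

Bounding the correction keeps the refined footstep close to the planner's output, so the learned term can compensate for model mismatch without overriding the plan entirely.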

Subject: Robotics

Publish: 2024-07-25 00:51:19 UTC

#23 PGD-VIO: An Accurate Plane-Aided Visual-Inertial Odometry with Graph-Based Drift Suppression [PDF] [Copy] [Kimi]

Authors: Yidi Zhang ; Fulin Tang ; Zewen Xu ; Yihong Wu ; Pengju Ma

Generally, high-level features provide more geometrical information than point features, which can be exploited to further constrain motions. Planes are commonplace in man-made environments, offering an active means to reduce drift, due to their extensive spatial and temporal observability. To make full use of planar information, we propose a novel visual-inertial odometry (VIO) using an RGBD camera and an inertial measurement unit (IMU), effectively integrating point and plane features in an extended Kalman filter (EKF) framework. Depth information of point features is leveraged to improve the accuracy of point triangulation, while plane features serve as direct observations added into the state vector. Notably, to benefit long-term navigation, a novel graph-based drift detection strategy is proposed to search for overlapping and identical structures in the plane map so that the cumulative drift is subsequently suppressed. The experimental results on two public datasets demonstrate that our system outperforms state-of-the-art methods in localization accuracy while generating a compact and consistent plane map, free of expensive global bundle adjustment and loop closing techniques.

Subject: Robotics

Publish: 2024-07-25 02:04:54 UTC

#24 Complex picking via entanglement of granular mechanical metamaterials [PDF] [Copy] [Kimi]

Authors: Ashkan Rezanejad ; Mostafa Mousa ; Matthew Howard ; Antonio Elia Forte

When objects are packed in a cluster, physical interactions are unavoidable. Such interactions emerge because of the objects' geometric features; some of these features promote entanglement, while others create repulsion. When entanglement occurs, the cluster exhibits a global, complex behaviour, which arises from the stochastic interactions between objects. We hereby refer to such a cluster as an entangled granular metamaterial. We investigate the geometrical features of the objects which make up the cluster, henceforth referred to as grains, that maximise entanglement. We hypothesise that a cluster composed of grains with a high propensity to tangle will also show a propensity to interact with a second cluster of tangled objects. To demonstrate this, we use the entangled granular metamaterials to perform complex robotic picking tasks, where conventional grippers struggle. We employ an electromagnet to attract the metamaterial (ferromagnetic) and drop it onto a second cluster of objects (targets, non-ferromagnetic). When the electromagnet is re-activated, the entanglement ensures that both the metamaterial and the targets are picked, with varying degrees of physical engagement that strongly depend on geometric features. Interestingly, although the metamaterial's structural arrangement is random, it creates repeatable and consistent interactions with a second tangled medium, enabling robust picking of the latter.

Subjects: Robotics ; Applied Physics

Publish: 2024-07-25 07:54:42 UTC

#25 Goal Estimation-based Adaptive Shared Control for Brain-Machine Interfaces Remote Robot Navigation [PDF] [Copy] [Kimi]

Authors: Tomoka Muraoka ; Tatsuya Aoki ; Masayuki Hirata ; Tadahiro Taniguchi ; Takato Horii ; Takayuki Nagai

In this study, we propose a shared control method for teleoperated mobile robots using brain-machine interfaces (BMI). The control commands generated through BMI for robot operation face issues of low input frequency, discreteness, and uncertainty due to noise. To address these challenges, our method estimates the user's intended goal from their commands and uses this goal to generate auxiliary commands through the autonomous system that are both at a higher input frequency and more continuous. Furthermore, by defining the confidence level of the estimation, we adaptively calculate the weights for combining user and autonomous commands, thus achieving shared control.
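The confidence-weighted combination described above can be sketched as follows. The abstract does not say how confidence is defined or how the weights are computed, so this sketch makes two assumptions: confidence is one minus the normalized entropy of a belief over candidate goals, and the blend is a linear interpolation. The function names are hypothetical.

```python
import math


def goal_confidence(probabilities):
    """Confidence in [0, 1] from a belief over candidate goals.

    Assumption: confidence = 1 - (entropy / maximum entropy), so a uniform
    belief gives 0 and a belief concentrated on one goal gives 1.
    """
    n = len(probabilities)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0.0)
    return 1.0 - entropy / math.log(n)


def blend_commands(user_cmd, auto_cmd, confidence):
    """Linear blend: higher confidence gives the autonomous command more weight."""
    w = min(max(confidence, 0.0), 1.0)
    return tuple((1.0 - w) * u + w * a for u, a in zip(user_cmd, auto_cmd))
```

For example, with commands given as (linear velocity, angular velocity) pairs, `blend_commands(user, auto, goal_confidence(belief))` follows the user almost entirely while the goal estimate is ambiguous and shifts toward the autonomous command as the belief sharpens.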

Subject: Robotics

Publish: 2024-07-25 10:49:16 UTC