Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction

Authors: Zhenjiang Mao, Ivan Ruchkin

Deep learning models are increasingly employed for perception, prediction, and control in robotic systems. To achieve realistic and consistent outputs, it is crucial to embed physical knowledge into their learned representations. However, doing so is difficult due to high-dimensional observation data, such as images, particularly under conditions of incomplete system knowledge and imprecise state sensing. To address this, we propose Physically Interpretable World Models, a novel architecture that aligns learned latent representations with real-world physical quantities. To this end, our architecture combines three key elements: (1) a vector-quantized image autoencoder, (2) a transformer-based physically interpretable autoencoder, and (3) a partially known dynamical model. The training incorporates weak interval-based supervision to eliminate the impractical reliance on ground-truth physical knowledge. Three case studies demonstrate that our approach achieves physical interpretability and accurate state predictions, thus advancing representation learning for robotics.
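As a rough illustration of what "weak interval-based supervision" could look like, the sketch below penalizes a latent value only when it falls outside a known interval, instead of regressing it to an exact ground-truth state. This is an assumption-laden sketch, not the authors' loss: the function name `interval_loss`, the squared-distance penalty, and the NumPy formulation are all hypothetical choices made for illustration.

```python
import numpy as np

def interval_loss(z, lower, upper):
    """Hypothetical interval penalty (not taken from the paper).

    Returns 0 when every latent value lies inside [lower, upper];
    otherwise returns the mean squared distance to the nearest bound.
    """
    z, lower, upper = (np.asarray(a, dtype=float) for a in (z, lower, upper))
    below = np.maximum(lower - z, 0.0)   # how far z falls under the lower bound
    above = np.maximum(z - upper, 0.0)   # how far z exceeds the upper bound
    return float(np.mean(below**2 + above**2))
```

With a loss of this shape, a latent that stays anywhere inside its interval receives no gradient from the supervision term, so the exact value is left to be determined by the other training signals, such as reconstruction and the dynamics model.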

Subject: Machine Learning

Publish: 2024-12-17 12:51:24 UTC

2412.12870
