Total: 1
While current deep learning models achieve high performance by learning statistical correlations from vast datasets,which stands in stark contrast to human learning. They lack the flexibility of humans-particularly preverbal infants-to autonomously acquire the underlying structure of the world from limited experience and adapt to novel situations. In this study, we propose an unsupervised representation learning method based on a hierarchical relationship in group operations, rather than statistical independence, aiming to build a computational model of the cognitive development of infants. The proposed model features an integrated architecture that simultaneously performs object segmentation and the extraction of motion laws from dynamic image sequences. By introducing the Homomorphism from algebra as a structural constraint within a neural network, the model structurally separates pixel-level changes into meaningful, decomposed transformation components, such as translation and deformation. Using interaction scenes (chasing and evading tasks) based on developmental science findings, we experimentally demonstrate that the model can segment multiple objects into individual slots without any ground-truth labels. Furthermore, we confirmed that relative movements between objects, such as approaching or receding, are accurately mapped and structured into a one-dimensional additive latent space. These results suggest that by introducing algebraic geometric constraints rather than relying solely on statistical correlation learning, physically interpretable "disentangled representations" can be acquired. This study contributes to the understanding of the process by which infants internalize environmental laws as structures and provides a new perspective for constructing artificial systems with developmental intelligence.