Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach


Authors: Hesham M. Shehata, Mohammad Abdolrahmani

Recent graph convolutional neural networks (GCNs) have shown high performance in human action recognition using human skeleton poses. However, they fail to recognize human-object interaction cases reliably because they lack an effective representation of scene information and an appropriate learning architecture. In this context, we propose a methodology to improve human action recognition performance by considering information about fixed objects in the environment and following a multi-task learning approach. To evaluate the proposed method, we collected real data from public environments and prepared our own dataset, which includes interaction classes of hands-on interaction with fixed objects (e.g., ATM ticketing machines and check-in/out machines) and non-interaction classes of walking and standing. The multi-task learning approach, combined with interaction area information, recognizes the studied interaction and non-interaction actions with an accuracy of 99.25%, outperforming the base model that uses only human skeleton poses by 2.75%.
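The abstract does not say how the interaction area is encoded or which auxiliary task the multi-task model learns. The sketch below is therefore illustrative only, built on loudly labeled assumptions: the interaction area is a hypothetical axis-aligned rectangle in front of a fixed object, it is turned into a scalar feature (the fraction of skeleton keypoints inside it), and the multi-task objective is assumed to be a main action-classification cross-entropy plus a weighted auxiliary binary "interaction vs. non-interaction" loss. The names `area_overlap_feature`, `multi_task_loss`, and `aux_weight` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

# Hypothetical interaction area: (x_min, y_min, x_max, y_max) in image coordinates,
# e.g. the region in front of an ATM or check-in machine (an assumption, not the paper's encoding).
Box = Tuple[float, float, float, float]


def area_overlap_feature(keypoints: Sequence[Tuple[float, float]], area: Box) -> float:
    """Fraction of 2D skeleton keypoints that fall inside the interaction area."""
    if not keypoints:
        return 0.0
    x0, y0, x1, y1 = area
    inside = sum(1 for x, y in keypoints if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / len(keypoints)


def softmax_cross_entropy(logits: Sequence[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for one sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def binary_cross_entropy(logit: float, target: int) -> float:
    """Numerically stable binary cross-entropy on a raw logit (target is 0 or 1)."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def multi_task_loss(
    action_logits: Sequence[float],
    action_label: int,
    interaction_logit: float,
    interaction_label: int,
    aux_weight: float = 0.5,  # hypothetical weighting between the two tasks
) -> float:
    """Main action-class loss plus a weighted auxiliary interaction/non-interaction loss."""
    main = softmax_cross_entropy(action_logits, action_label)
    aux = binary_cross_entropy(interaction_logit, interaction_label)
    return main + aux_weight * aux


# Usage: two of the four hand keypoints lie inside the assumed ATM area.
hands = [(1.0, 1.0), (1.5, 1.2), (5.0, 5.0), (6.0, 2.0)]
atm_area = (0.0, 0.0, 2.0, 2.0)
print(area_overlap_feature(hands, atm_area))  # 0.5
```

Sharing one backbone between a main and an auxiliary head is a common way to realize multi-task learning; in a GCN setting the feature above would typically be concatenated to the skeleton embedding, but that wiring is also an assumption here.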

Subject: Computer Vision and Pattern Recognition

Published: 2025-09-11 00:14:56 UTC

arXiv: 2509.09067

