Artificial-intelligence (AI) technologies are increasingly used in a wide range of real-world applications. Speech recognition is one such practical AI task and is regarded as a key application for edge AI systems. Consequently, it has been widely used as a representative benchmark for assessing the performance of physical reservoir computing (PRC). Although many PRCs have performed this task, most of them rely on frequency-extraction preprocessing, such as a cochleagram or mel-frequency cepstrum. The cochleagram in particular enables high-accuracy recognition; however, its preprocessing is computationally expensive, which makes it unsuitable for edge computing, where resources are limited. In this study, we employed a nonlinear interfered spin wave-based PRC, which has demonstrated superior computational performance in mathematical tasks. Using this PRC, we evaluated two types of speech recognition, spoken digit recognition and speaker classification, under four configurations: cochleagram alone, interfered spin wave-based PRC with cochleagram, baseline without PRC, and interfered spin wave-based PRC alone. This comparison quantifies the contributions of the cochleagram and of the interfered spin wave-based PRC to each task. Although the cochleagram alone yielded accuracies of around 90 % for both tasks, the accuracy still reached 85.8 % for speaker classification when only the interfered spin wave-based PRC was used. These results indicate the potential of the proposed PRC to handle speech recognition tasks without cochleagram preprocessing.