Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection

Authors: Ziwei Zhu, Zhiyong Wu, Runnan Li, Helen Meng, Lianhong Cai

With the explosive development of human-computer speech interaction, spoken term detection is widely required and has attracted increasing interest. In this paper, we propose a weakly supervised approach using a Siamese recurrent auto-encoder (RAE) to represent speech segments for query-by-example spoken term detection (QbyE-STD). The proposed approach uses word pairs containing different instances of the same or different word content as input to train the Siamese RAE. The last hidden state vector of the Siamese RAE encoder is used as the feature for QbyE-STD; it is a fixed-dimensional embedding that mostly carries information related to semantic content. The advantages of the proposed approach are: 1) it extracts a more compact, fixed-dimensional feature while keeping the semantic information needed for STD; 2) the extracted feature can describe the sequential phonetic structure of similar sounds to some degree, which makes it applicable to zero-resource QbyE-STD. Evaluations on real-scene Chinese speech interaction data and TIMIT confirm the effectiveness and efficiency of the proposed approach compared to conventional ones.
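
To make the idea of a fixed-dimensional embedding concrete, the sketch below shows how variable-length speech segments could be mapped to the last hidden state of a recurrent encoder and then compared against a query. It is only an illustration under stated assumptions, not the authors' implementation: the encoder is a plain tanh RNN with random, untrained weights (the abstract does not specify the cell type, and the Siamese training on word pairs is not shown), cosine similarity is an assumed scoring function, and the names `encode_segment`, `cosine_similarity`, and `rank_candidates` as well as the feature and hidden sizes are hypothetical.

```python
import numpy as np

def encode_segment(frames, W_in, W_rec, b):
    """Run a plain tanh RNN over a (T, feat_dim) segment; return the last hidden state."""
    h = np.zeros(W_rec.shape[0])
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h + b)
    return h  # fixed size regardless of the segment length T

def cosine_similarity(a, b):
    """Assumed scoring function; the abstract does not name the distance used."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_candidates(query, candidates, params):
    """Score each candidate segment against the query embedding, best match first."""
    q = encode_segment(query, *params)
    scores = [cosine_similarity(q, encode_segment(c, *params)) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Hypothetical sizes: 13-dimensional frame features, 32-dimensional embedding.
rng = np.random.default_rng(0)
feat_dim, hidden_dim = 13, 32
params = (
    rng.normal(scale=0.1, size=(hidden_dim, feat_dim)),    # input weights (untrained)
    rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),  # recurrent weights (untrained)
    np.zeros(hidden_dim),                                  # bias
)

query = rng.normal(size=(40, feat_dim))
candidates = [rng.normal(size=(t, feat_dim)) for t in (25, 40, 60)]
candidates.append(query.copy())  # an exact copy of the query should rank first

order, scores = rank_candidates(query, candidates, params)
print(order[0], [round(s, 3) for s in scores])
```

With trained weights, segments containing the same word would be expected to score higher than segments containing different words; with the random weights used here, only the exact copy of the query is guaranteed to score highest.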

Subject: INTERSPEECH.2018 - Others

zhu18b@interspeech_2018@ISCA
