zhu18b@interspeech_2018@ISCA

Total: 1

#1 Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection [PDF] [Copy] [Kimi1]

Authors: Ziwei Zhu ; Zhiyong Wu ; Runnan Li ; Helen Meng ; Lianhong Cai

With the explosive development of human-computer speech interaction, spoken term detection is widely required and has attracted increasing interest. In this paper, we propose a weak supervised approach using Siamese recurrent auto-encoder (RAE) to represent speech segments for query-by-example spoken term detection (QbyE-STD). The proposed approach exploits word pairs that contain different instances of the same/different word content as input to train the Siamese RAE. The encoder last hidden state vector of Siamese RAE is used as the feature for QbyE-STD, which is a fixed dimensional embedding feature containing mostly semantic content related information. The advantages of the proposed approach are: 1) extracting more compact feature with fixed dimension while keeping the semantic information for STD; 2) the extracted feature can describe the sequential phonetic structure of similar sounds to degree, which can be applied for zero-resource QbyE-STD. Evaluations on real scene Chinese speech interaction data and TIMIT confirm the effectiveness and efficiency of the proposed approach compared to the conventional ones.