Automatic human utility evaluation of ASR systems: does WER really predict performance?

favre13@interspeech_2013@ISCA

#1 Automatic human utility evaluation of ASR systems: does WER really predict performance?

Authors: Benoit Favre, Kyla Cheung, Siavash Kazemian, Adam Lee, Yang Liu, Cosmin Munteanu, Ani Nenkova, Dennis Ochei, Gerald Penn, Stephen Tratz, Clare Voss, Frauke Zeller

We propose an alternative evaluation metric to Word Error Rate (WER) for the decision audit task of meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Using machine learning on an initial seed of human-subject experimental data, our alternative metric handily outperforms WER, which correlates very poorly with human subjects' success in finding decisions given ASR transcripts with a range of WERs.

Subject: INTERSPEECH.2013 - Speech Processing
