favre13@interspeech_2013@ISCA

Automatic human utility evaluation of ASR systems: does WER really predict performance?

Authors: Benoit Favre ; Kyla Cheung ; Siavash Kazemian ; Adam Lee ; Yang Liu ; Cosmin Munteanu ; Ani Nenkova ; Dennis Ochei ; Gerald Penn ; Stephen Tratz ; Clare Voss ; Frauke Zeller

We propose an alternative to Word Error Rate (WER) as an evaluation metric for the decision audit task of meeting recordings, which exemplifies how to evaluate speech recognition within a legitimate application context. Trained with machine learning on an initial seed of human-subject experimental data, our alternative metric handily outperforms WER, which correlates very poorly with human subjects' success in finding decisions given ASR transcripts with a range of WERs.