2010.03494

TeaForN: Teacher-Forcing with N-grams

Authors: Sebastian Goodman ; Nan Ding ; Radu Soricut

Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications from a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
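The "stack of N decoders along a secondary time axis" can be illustrated with a toy forward pass. The sketch below is one possible reading of the abstract, not the authors' implementation: the names `toy_decoder` and `teaforn_style_loss`, the prefix-averaging "decoder", the parameters shared across the stack, and the choice to feed the expected embedding of each prediction into the next decoder are all assumptions made for illustration. NumPy has no automatic differentiation, so only the loss is computed here; in a framework with autodiff, gradients could flow through the `probs @ E` step across prediction steps.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def toy_decoder(inputs, W_out):
    # Hypothetical stand-in for a real decoder: position t sees only
    # inputs[0..t] (a causal prefix mean), then projects to vocabulary logits.
    T = inputs.shape[0]
    prefix_mean = np.cumsum(inputs, axis=0) / np.arange(1, T + 1)[:, None]
    return prefix_mean @ W_out  # shape (T, V)

def teaforn_style_loss(tokens, E, W_out, n_steps):
    """Return one cross-entropy loss per decoder in the stack.

    tokens: 1-D int array, with a start symbol at index 0.
    E: (V, d) embedding matrix; W_out: (d, V) output projection.
    Decoder k at position t is trained to predict tokens[t + 1 + k].
    """
    L = len(tokens)
    # Decoder 0 is ordinary teacher forcing: its inputs are gold embeddings.
    inputs = E[tokens[:-1]]
    losses = []
    for k in range(n_steps):
        T = L - 1 - k  # positions that still have a gold target k+1 steps ahead
        if T <= 0:
            break
        probs = softmax(toy_decoder(inputs[:T], W_out))
        targets = tokens[1 + k:1 + k + T]
        losses.append(-np.log(probs[np.arange(T), targets]).mean())
        # Assumption: the next decoder reads the expected embedding of this
        # decoder's predictions instead of gold tokens.
        inputs = probs @ E
    return losses

rng = np.random.default_rng(0)
V, d = 7, 4
E = rng.normal(size=(V, d))
W_out = rng.normal(size=(d, V))
tokens = np.array([0, 3, 5, 1, 2, 6])
per_step_losses = teaforn_style_loss(tokens, E, W_out, n_steps=3)
total_loss = sum(per_step_losses)  # unweighted sum; any weighting is a design choice
```

With `n_steps=1` the function reduces to a standard teacher-forcing loss, which matches the abstract's statement that the method requires minimal modifications to a standard teacher-forcing setup.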

Subject: Computation and Language

Published: 2020-10-07 15:58:25 UTC