Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

Authors: Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin

Variational autoencoders (VAEs) with an auto-regressive decoder have been applied to many natural language processing (NLP) tasks. The VAE objective consists of two terms, the KL regularization term and the reconstruction term, balanced by a weighting hyper-parameter 𝛽. One notorious training difficulty is that the KL term tends to vanish. In this paper we study different scheduling schemes for 𝛽, and show that KL vanishing is caused by the lack of good latent codes for training the decoder at the beginning of optimization. To remedy the issue, we propose a cyclical annealing schedule, which simply repeats the process of increasing 𝛽 multiple times. This new procedure allows us to learn more meaningful latent codes progressively by leveraging the results of previous learning cycles as warm restarts. The effectiveness of the cyclical annealing schedule is validated on a broad range of NLP tasks, including language modeling, dialog response generation, and semi-supervised text classification.
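The schedule described in the abstract, in which the increase of 𝛽 is repeated several times, can be illustrated with a minimal sketch. The abstract does not specify the shape of the increase, the number of cycles, or how long 𝛽 stays at its maximum, so the linear ramp and the parameters `n_cycles`, `ramp_fraction`, and `beta_max` below are assumptions chosen for illustration, and the function name `cyclical_beta` is hypothetical.

```python
def cyclical_beta(step, total_steps, n_cycles=4, ramp_fraction=0.5, beta_max=1.0):
    """Return the KL weight beta for a given training step.

    Assumed behaviour (not taken from the abstract): training is split into
    n_cycles equal cycles; within each cycle beta rises linearly from 0 to
    beta_max over the first ramp_fraction of the cycle, then stays at
    beta_max until the cycle ends and beta is reset to 0.
    """
    if ramp_fraction <= 0:
        return beta_max
    cycle_len = total_steps / n_cycles
    # Relative position inside the current cycle, in [0, 1).
    position = (step % cycle_len) / cycle_len
    return beta_max * min(1.0, position / ramp_fraction)


def vae_loss(reconstruction_loss, kl_divergence, beta):
    """Beta-weighted VAE objective: reconstruction term plus beta times KL term."""
    return reconstruction_loss + beta * kl_divergence


if __name__ == "__main__":
    total = 100
    for step in (0, 6, 12, 20, 25, 31, 50, 99):
        print(step, round(cyclical_beta(step, total), 3))
```

In a training loop, `cyclical_beta(step, total_steps)` would be evaluated at every step and passed to `vae_loss`. Setting `n_cycles=1` reduces this sketch to a single monotonic annealing ramp, which makes the two kinds of schedule easy to compare.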

Subject: NAACL.2019 - Main

N19-1021@ACL
