D17-1023@ACL


Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics

Authors: Zhe Zhao ; Tao Liu ; Shen Li ; Bofang Li ; Xiaoyong Du

Existing word representation methods mostly limit their information source to word co-occurrence statistics. In this paper, we introduce ngrams into four representation methods: SGNS, GloVe, the PPMI matrix, and its SVD factorization. Comprehensive experiments are conducted on word analogy and similarity tasks. The results show that improved word representations are learned from ngram co-occurrence statistics. We also demonstrate that the trained ngram representations are useful in many respects, such as finding antonyms and collocations. In addition, a novel approach to building the co-occurrence matrix is proposed to alleviate the hardware burden brought by ngrams.
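To make the idea of ngram co-occurrence statistics concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple setting in which each center word is paired with every unigram and bigram context that lies within a fixed window and does not overlap the center word, and it then computes PPMI scores from the resulting counts. The function names `ngram_pairs` and `ppmi`, the underscore used to join ngram tokens, and the exact window definition are illustrative assumptions rather than details taken from the paper.

```python
import math
from collections import Counter


def ngram_pairs(tokens, window=2, max_n=2):
    """Yield (center word, context ngram) pairs.

    Assumption: a context ngram counts as "in the window" when the gap
    between the center word and the nearest token of the ngram is at
    most `window`, and the ngram does not contain the center position.
    """
    for i, word in enumerate(tokens):
        for n in range(1, max_n + 1):
            for j in range(len(tokens) - n + 1):
                end = j + n - 1
                if end < i:        # ngram lies entirely to the left
                    dist = i - end
                elif j > i:        # ngram lies entirely to the right
                    dist = j - i
                else:              # ngram overlaps the center word
                    continue
                if dist <= window:
                    yield word, "_".join(tokens[j:j + n])


def ppmi(pair_counts):
    """Map each observed (word, context) pair to its positive PMI."""
    total = sum(pair_counts.values())
    word_counts, ctx_counts = Counter(), Counter()
    for (w, c), k in pair_counts.items():
        word_counts[w] += k
        ctx_counts[c] += k
    scores = {}
    for (w, c), k in pair_counts.items():
        pmi = math.log(k * total / (word_counts[w] * ctx_counts[c]))
        scores[(w, c)] = max(pmi, 0.0)
    return scores


tokens = "the cat sat on the mat".split()
counts = Counter(ngram_pairs(tokens, window=2, max_n=2))
scores = ppmi(counts)
```

With `max_n=1` the sketch reduces to ordinary word-word co-occurrence counting, and larger `max_n` values add ngram contexts such as `on_the` alongside the unigram ones. The context vocabulary grows quickly as `max_n` increases, which is the kind of hardware burden the abstract refers to.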