Ngram2vec: Learning Improved Word Representations from Ngram Co-occurrence Statistics

D17-1023@ACL

Authors: Zhe Zhao, Tao Liu, Shen Li, Bofang Li, Xiaoyong Du

Existing word representation methods mostly limit their information source to word co-occurrence statistics. In this paper, we introduce ngrams into four representation methods: SGNS, GloVe, the PPMI matrix, and its SVD factorization. Comprehensive experiments are conducted on word analogy and similarity tasks. The results show that improved word representations are learned from ngram co-occurrence statistics. We also demonstrate that the trained ngram representations are useful for tasks such as finding antonyms and collocations. In addition, we propose a novel approach to building the co-occurrence matrix that alleviates the hardware burden introduced by ngrams.
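To make the idea of ngram co-occurrence statistics concrete, the following is a minimal sketch of counting (center word, context ngram) pairs and turning the counts into PPMI scores. It is not the authors' implementation: the function names, the window and order parameters, the underscore-joined ngram strings, and the choice to skip context ngrams that overlap the center word are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_cooccurrence(tokens, window=2, max_order=2):
    """Count (center word, context ngram) pairs in a token list.

    Hypothetical helper: a context ngram qualifies when its first token
    lies within `window` positions of the center word.
    """
    counts = Counter()
    for i, center in enumerate(tokens):
        for order in range(1, max_order + 1):
            for start in range(i - window, i + window + 1):
                end = start + order
                if start < 0 or end > len(tokens):
                    continue  # ngram would run past the text boundary
                if start <= i < end:
                    continue  # assumption: skip ngrams overlapping the center
                counts[(center, "_".join(tokens[start:end]))] += 1
    return counts


def ppmi(counts):
    """Map pair counts to positive pointwise mutual information scores."""
    total = sum(counts.values())
    row, col = Counter(), Counter()
    for (word, context), n in counts.items():
        row[word] += n
        col[context] += n
    scores = {}
    for (word, context), n in counts.items():
        pmi = math.log(n * total / (row[word] * col[context]))
        if pmi > 0:  # keep only positive associations (sparse result)
            scores[(word, context)] = pmi
    return scores


tokens = "the cat sat on the mat".split()
counts = ngram_cooccurrence(tokens, window=2, max_order=2)
scores = ppmi(counts)
```

With `max_order=1` the sketch reduces to ordinary word-word co-occurrence counting; raising it adds bigram (or longer) contexts, which enlarges the context vocabulary and hence the matrix. That growth is the hardware cost the abstract refers to.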

Subject: EMNLP.2017 - Main