N18-4007@ACL

Total: 1

#1 Learning Word Embeddings for Data Sparse and Sentiment Rich Data Sets

Author: Prathusha Kameswara Sarma

This research proposal describes two algorithms aimed at learning word embeddings for data-sparse and sentiment-rich data sets. The goal is to use word embeddings adapted to domain-specific data sets in downstream applications such as sentiment classification. The first approach learns word embeddings in a supervised fashion via SWESA (Supervised Word Embeddings for Sentiment Analysis), an algorithm for sentiment analysis on data sets of modest size. SWESA leverages document labels to jointly learn polarity-aware word embeddings and a classifier for unseen documents. In the second approach, domain-adapted (DA) word embeddings are learned by exploiting the specificity of domain-specific data sets and the breadth of generic word embeddings. The new embeddings are formed by aligning corresponding word vectors using Canonical Correlation Analysis (CCA) or the related nonlinear Kernel CCA (KCCA). Experimental results with both approaches on binary sentiment classification over standard data sets are presented.
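
A minimal sketch of the CCA-based alignment idea is given below, assuming generic and domain-specific embeddings are available as word-to-vector dictionaries and using scikit-learn's CCA. The final combination step (averaging the two projected views) is an illustrative choice, not necessarily the exact construction used in the proposal.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def domain_adapted_embeddings(generic_vecs, domain_vecs, vocab, n_components=50):
    """Align generic and domain-specific word vectors with CCA and combine
    the projected views into one domain-adapted (DA) vector per word.

    generic_vecs, domain_vecs: dicts mapping word -> 1-D numpy array.
    vocab: iterable of words present in both embedding sets.
    """
    vocab = list(vocab)

    # Stack the two views, row-aligned by word.
    X = np.vstack([generic_vecs[w] for w in vocab])  # generic view
    Y = np.vstack([domain_vecs[w] for w in vocab])   # domain-specific view

    # Fit CCA to find maximally correlated low-dimensional projections
    # of the two views; KCCA would replace this with a kernelized variant.
    cca = CCA(n_components=n_components)
    X_c, Y_c = cca.fit_transform(X, Y)

    # Combine the aligned views (here: simple average) into DA embeddings.
    return {w: (X_c[i] + Y_c[i]) / 2.0 for i, w in enumerate(vocab)}
```

The resulting DA vectors could then be fed to any downstream sentiment classifier in place of the original generic embeddings.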