AAAI.2016 - AI and the Web

Total: 33

#1 Supervised Hashing via Uncorrelated Component Analysis

Authors: SungRyull Sohn, Hyunwoo Kim, Junmo Kim

The Approximate Nearest Neighbor (ANN) search problem is important in applications such as information retrieval. Several hashing-based search methods that provide effective solutions to the ANN search problem have been proposed. However, most of these focus on similarity preservation and coding error minimization, and pay little attention to optimizing the precision-recall curve or receiver operating characteristic curve. In this paper, we propose a novel projection-based hashing method that attempts to maximize the precision and recall. We first introduce an uncorrelated component analysis (UCA) by examining the precision and recall, and then propose a UCA-based hashing method. The proposed method is evaluated with a variety of datasets. The results show that UCA-based hashing outperforms state-of-the-art methods, and has computationally efficient training and encoding processes.
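
As a rough illustration of what "projection-based hashing" means in general, the sketch below turns real-valued vectors into binary codes by taking the sign of a few linear projections and then compares codes by Hamming distance. It is not the UCA method from the paper: the projection directions here are random Gaussian vectors used as a stand-in for learned ones, and all names (`make_projections`, `encode`, `hamming`) and data values are hypothetical.

```python
import random


def make_projections(num_bits, dim, seed=0):
    """Draw one random Gaussian direction per hash bit (a stand-in for learned directions)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def encode(x, projections):
    """Map a real-valued vector to a binary code: one bit per projection, set when w . x >= 0."""
    return [1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0
            for w in projections]


def hamming(a, b):
    """Count the positions at which two equal-length codes differ."""
    return sum(u != v for u, v in zip(a, b))


# Hypothetical toy data: one query and a two-item database.
projections = make_projections(num_bits=16, dim=4)
query = [0.9, 0.1, -0.3, 0.5]
database = {
    "near": [1.0, 0.0, -0.2, 0.4],
    "far": [-0.8, 0.3, 0.9, -0.6],
}

query_code = encode(query, projections)
# Rank database items by Hamming distance between their codes and the query code.
ranked = sorted(
    database,
    key=lambda name: hamming(query_code, encode(database[name], projections)),
)
print(ranked)
```

A supervised method such as the one described above would replace `make_projections` with directions chosen from labeled data; the encoding and Hamming-ranking steps would stay the same in spirit.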


#2 "8 Amazing Secrets for Getting More Clicks": Detecting Clickbaits in News Streams Using Article Informality

Authors: Prakhar Biyani, Kostas Tsioutsiouliklis, John Blackmer

Clickbaits are articles with misleading titles that exaggerate the content on the landing page. Their goal is to entice users to click on the title in order to monetize the landing page. The content on the landing page is usually of low quality. Their presence in the homepage streams of news aggregator sites (e.g., Yahoo News, Google News) may adversely impact user experience. Hence, it is important to identify and demote or block them on homepages. In this paper, we present a machine-learning model to detect clickbaits. We use a variety of features and show that the degree of informality of a webpage (as measured by different metrics) is a strong indicator of it being a clickbait. We conduct extensive experiments to evaluate our approach and analyze properties of clickbait and non-clickbait articles. Our model achieves high performance (74.9% F-1 score) in predicting clickbaits.


#3 Top-N Recommender System via Matrix Completion

Authors: Zhao Kang, Chong Peng, Qiang Cheng

Top-N recommender systems have been investigated widely both in industry and academia. However, the recommendation quality is far from satisfactory. In this paper, we propose a simple yet promising algorithm. We fill the user-item matrix based on a low-rank assumption and simultaneously keep the original information. To do that, a nonconvex rank relaxation rather than the nuclear norm is adopted to provide a better rank approximation and an efficient optimization strategy is designed. A comprehensive set of experiments on real datasets demonstrates that our method pushes the accuracy of Top-N recommendation to a new level.


#4 Identifying Sentiment Words Using an Optimization Model with L1 Regularization

Authors: Zhi-Hong Deng, Hongliang Yu, Yunlun Yang

Sentiment word identification is a fundamental task in numerous applications of sentiment analysis and opinion mining, such as review mining, opinion holder finding, and Twitter classification. In this paper, we propose an optimization model with L1 regularization, called ISOMER, for identifying sentiment words from a corpus. Unlike most existing approaches, which use seed words only, our model can employ both seed words and documents with sentiment labels. The L1 penalty in the objective function yields a sparse solution, since most candidate words have no sentiment. Experiments on real datasets show that ISOMER outperforms the classic approaches, and that the lexicon learned by ISOMER can be effectively adapted to document-level sentiment analysis.
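
To make the sparsity claim concrete, the sketch below applies soft-thresholding, the standard proximal step associated with an L1 penalty, to a handful of candidate word weights. It is a generic illustration rather than ISOMER's actual solver; the function name, the threshold `lam`, and the word weights are all hypothetical.

```python
def soft_threshold(weight, lam):
    """Shrink a weight toward zero by lam; weights with magnitude <= lam become exactly 0."""
    if weight > lam:
        return weight - lam
    if weight < -lam:
        return weight + lam
    return 0.0


# Hypothetical unregularized weights for four candidate words.
candidate_weights = {"great": 2.0, "table": 0.25, "awful": -1.5, "the": -0.125}

# One shrinkage step with a hypothetical penalty strength of 0.5.
shrunk = {word: soft_threshold(w, 0.5) for word, w in candidate_weights.items()}

# Words whose weight survives the shrinkage are kept as sentiment words.
sentiment_words = [word for word, w in shrunk.items() if w != 0.0]
print(shrunk)
print(sentiment_words)
```

Weights that carry little signal are set exactly to zero instead of merely becoming small, which is why an L1 penalty suits a setting where most candidate words are expected to be neutral.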


#5 Fortune Teller: Predicting Your Career Path

Authors: Ye Liu, Luming Zhang, Liqiang Nie, Yan Yan, David Rosenblum

People go to fortune tellers in hopes of learning things about their future. A future career path is one of the topics most frequently discussed. But rather than rely on "black arts" to make predictions, in this work we scientifically and systematically study the feasibility of career path prediction from social network data. In particular, we seamlessly fuse information from multiple social networks to comprehensively describe a user and characterize progressive properties of his or her career path. This is accomplished via a multi-source learning framework with fused lasso penalty, which jointly regularizes the source and career-stage relatedness. Extensive experiments on real-world data confirm the accuracy of our model.


#6 Unfolding Temporal Dynamics: Predicting Social Media Popularity Using Multi-scale Temporal Decomposition

Authors: Bo Wu, Tao Mei, Wen-Huang Cheng, Yongdong Zhang

Time information plays a crucial role in social media popularity. Existing research on popularity prediction, though effective, ignores temporal information that is highly related to user-item associations, and thus often achieves limited success. It is essential to consider all of these factors (user, item, and time), which together capture the dynamic nature of photo popularity. In this paper, we present a novel approach to factorize the popularity into user-item context and time-sensitive context for exploring the mechanism of dynamic popularity. The user-item context provides a holistic view of popularity, while the time-sensitive context captures the temporal dynamics of popularity. Accordingly, we develop two kinds of time-sensitive features: user activeness variability and photo prevalence variability. To predict photo popularity, we propose a novel framework named Multi-scale Temporal Decomposition (MTD), which decomposes the popularity matrix in latent spaces based on contextual associations. Specifically, the proposed MTD models time-sensitive context on different time scales, which helps it learn temporal patterns automatically. Based on experiments conducted on a real-world dataset with 1.29M photos from Flickr, our proposed MTD achieves a prediction accuracy of 79.8% and outperforms the best three state-of-the-art methods with a relative improvement of 9.6% on average.


#7 Predicting the Next Location: A Recurrent Model with Spatial and Temporal Contexts

Authors: Qiang Liu, Shu Wu, Liang Wang, Tieniu Tan

Spatial and temporal contextual information plays a key role in analyzing user behaviors, and is helpful for predicting where a user will go next. With the growing ability to collect information, more and more temporal and spatial contextual information is collected in systems, and the location prediction problem becomes crucial and feasible. Some works have been proposed to address this problem, but they all have their limitations. Factorizing Personalized Markov Chain (FPMC) is constructed based on a strong independence assumption among different factors, which limits its performance. Tensor Factorization (TF) faces the cold start problem in predicting future actions. The Recurrent Neural Network (RNN) model shows promising performance compared with FPMC and TF, but all these methods have difficulty modeling continuous time intervals and geographical distances. In this paper, we extend RNN and propose a novel method called Spatial Temporal Recurrent Neural Networks (ST-RNN). ST-RNN can model local temporal and spatial contexts in each layer with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances. Experimental results show that the proposed ST-RNN model yields significant improvements over the competitive compared methods on two typical datasets, i.e., the Global Terrorism Database (GTD) and the Gowalla dataset.


#8 Community-Based Question Answering via Heterogeneous Social Network Learning

Authors: Hanyin Fang, Fei Wu, Zhou Zhao, Xinyu Duan, Yueting Zhuang, Martin Ester

Community-based question answering (cQA) sites have accumulated vast amounts of questions and corresponding crowdsourced answers over time. How to efficiently share the underlying information and knowledge from reliable (usually highly reputable) answerers has become an increasingly popular research topic. A major challenge in cQA tasks is the accurate matching of high-quality answers to given questions. Many traditional approaches recommend answers merely on the basis of content similarity between questions and answers, and therefore suffer from the sparsity bottleneck of cQA data. In this paper, we propose a novel framework which encodes not only the contents of question-answer (Q-A) pairs but also the social interaction cues in the community to boost the cQA tasks. More specifically, our framework collaboratively utilizes the rich interaction among questions, answers and answerers to learn the relative quality rank of different answers to the same question. Moreover, the information in heterogeneous social networks is comprehensively employed to enhance the quality of question-answering (QA) matching by our deep random walk learning framework. Extensive experiments on a large-scale dataset from a real-world cQA site show that leveraging the heterogeneous social information indeed achieves better performance than other state-of-the-art cQA methods.


#9 VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback

Authors: Ruining He, Julian McAuley

Modern recommender systems model people and items by discovering or 'teasing apart' the underlying dimensions that encode the properties of items and users' preferences toward them. Critically, such dimensions are uncovered based on user feedback, often in implicit form (such as purchase histories, browsing logs, etc.); in addition, some recommender systems make use of side information, such as product attributes, temporal information, or review text. However, one important feature that is typically ignored by existing personalized recommendation and ranking methods is the visual appearance of the items being considered. In this paper we propose a scalable factorization model to incorporate visual signals into predictors of people's opinions, which we apply to a selection of large, real-world datasets. We make use of visual features extracted from product images using (pre-trained) deep networks, on top of which we learn an additional layer that uncovers the visual dimensions that best explain the variation in people's feedback. This not only leads to significantly more accurate personalized ranking methods, but also helps to alleviate cold start issues, and qualitatively to analyze the visual dimensions that influence people's opinions.
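
The sketch below shows one plausible reading of the idea, not the paper's exact model: a user-item score that adds a visual term to an ordinary latent-factor term, plugged into the standard Bayesian Personalized Ranking pairwise loss. The small `embedding` matrix plays the role of the learned layer on top of pre-extracted image features. Every name, dimension, and number here is a hypothetical choice made for illustration, and bias terms are omitted.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project(matrix, vector):
    """Apply a linear layer (list of rows) to a feature vector."""
    return [dot(row, vector) for row in matrix]


def score(user_latent, item_latent, user_visual, embedding, image_features):
    """Latent-factor term plus a visual term computed from projected image features."""
    item_visual = project(embedding, image_features)
    return dot(user_latent, item_latent) + dot(user_visual, item_visual)


def bpr_loss(score_pos, score_neg):
    """Pairwise loss -log(sigmoid(score_pos - score_neg)), computed in a numerically stable way."""
    diff = score_pos - score_neg
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Hypothetical parameters: 2 latent dimensions, 1 visual dimension, 3 raw image features.
user_latent, user_visual = [1.0, 0.0], [1.0]
embedding = [[0.5, 0.5, 0.0]]

# An item the user interacted with, and one the user did not.
s_pos = score(user_latent, [0.5, 2.0], user_visual, embedding, [1.0, 1.0, 4.0])
s_neg = score(user_latent, [0.0, 1.0], user_visual, embedding, [0.0, 0.0, 4.0])
print(s_pos, s_neg, bpr_loss(s_pos, s_neg))
```

Because the visual term depends only on image features and the shared `embedding`, an item with no interaction history still receives a meaningful score, which is one way to see why visual signals can help with cold start.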


#10 Context-Sensitive Twitter Sentiment Classification Using Neural Network

Authors: Yafeng Ren, Yue Zhang, Meishan Zhang, Donghong Ji

Sentiment classification on Twitter has attracted increasing research in recent years. Most existing work focuses on feature engineering according to the tweet content itself. In this paper, we propose a context-based neural network model for Twitter sentiment analysis, incorporating contextualized features from relevant Tweets into the model in the form of word embedding vectors. Experiments on both balanced and unbalanced datasets show that our proposed models outperform the current state-of-the-art.


#11 From Tweets to Wellness: Wellness Event Detection from Twitter Streams

Authors: Mohammad Akbari, Xia Hu, Nie Liqiang, Tat-Seng Chua

Social media platforms have become the most popular means for users to share what is happening around them. The abundance and growing usage of social media has resulted in a large repository of users' social posts, which provides a stethoscope for inferring individuals' lifestyle and wellness. As users' social accounts implicitly reflect their habits, preferences, and feelings, it is feasible for us to monitor and understand the wellness of users by harvesting social media data towards a healthier lifestyle. As a first step towards accomplishing this goal, we propose to automatically extract wellness events from users' published social contents. Existing approaches for event extraction are not applicable to personal wellness events because this domain is characterized by plenty of noise and variety in data, insufficient samples, and inter-relation among events. To tackle these problems, we propose an optimization learning framework that utilizes the content information of microblogging messages as well as the relations between event categories. By imposing a sparse constraint on the learning model, we also tackle the problems arising from noise and variation in microblogging texts. Experimental results on a real-world dataset from Twitter have demonstrated the superior performance of our framework.


#12 Recommendation with Social Dimensions

Authors: Jiliang Tang, Suhang Wang, Xia Hu, Dawei Yin, Yingzhou Bi, Yi Chang, Huan Liu

The pervasive presence of social media greatly enriches online users' social activities, resulting in abundant social relations. Social relations provide an independent source for recommendation, bringing about new opportunities for recommender systems. Exploiting social relations to improve recommendation performance has attracted a great amount of attention in recent years. Most existing social recommender systems treat social relations homogeneously and make use of direct connections (or strong dependency connections). However, connections in online social networks are intrinsically heterogeneous and are a composite of various relations. While connected users in online social networks form groups, and users in a group share similar interests, weak dependency connections are established among these users when they are not directly connected. In this paper, we investigate how to exploit the heterogeneity of social relations and weak dependency connections for recommendation. In particular, we employ social dimensions to simultaneously capture the heterogeneity of social relations and weak dependency connections, provide principled ways to model social dimensions, and propose a recommendation framework, SoDimRec, which incorporates the heterogeneity of social relations and weak dependency connections based on social dimensions. Experimental results on real-world data sets demonstrate the effectiveness of the proposed framework. We conduct further experiments to understand the important role of social dimensions in the proposed framework.


#13 Semantic Community Identification in Large Attribute Networks

Authors: Xiao Wang, Di Jin, Xiaochun Cao, Liang Yang, Weixiong Zhang

Identification of modular or community structures of a network is a key to understanding the semantics and functions of the network. While many network community detection methods have been developed, which primarily explore network topologies, they provide little semantic information of the communities discovered. Although structures and semantics are closely related, little effort has been made to discover and analyze these two essential network properties together. By integrating network topology and semantic information on nodes, e.g., node attributes, we study the problems of detection of communities and inference of their semantics simultaneously. We propose a novel nonnegative matrix factorization (NMF) model with two sets of parameters, the community membership matrix and community attribute matrix, and present efficient updating rules to evaluate the parameters with a convergence guarantee. The use of node attributes improves upon community detection and provides a semantic interpretation to the resultant network communities. Extensive experimental results on synthetic and real-world networks not only show the superior performance of the new method over the state-of-the-art approaches, but also demonstrate its ability to semantically annotate the communities.


#14 Capturing Semantic Correlation for Item Recommendation in Tagging Systems

Authors: Chaochao Chen, Xiaolin Zheng, Yan Wang, Fuxing Hong, Deren Chen

The popularity of tagging systems provides a great opportunity to improve the performance of item recommendation. Although existing approaches use topic modeling to mine the semantic information of items by grouping the tags labelled for items, they overlook an important property that tags link users and items as a bridge. Thus these methods cannot deal with the data sparsity without commonly rated items (DS-WO-CRI) problem, limiting their recommendation performance. Towards solving this challenging problem, we propose a novel tag and rating based collaborative filtering (CF) model for item recommendation, which first uses topic modeling to mine the semantic information of tags for each user and for each item respectively, and then incorporates the semantic information into matrix factorization to factorize rating information and to capture the bridging feature of tags and ratings between users and items. As a result, our model captures the semantic correlation between users and items, and is able to greatly improve recommendation performance, especially in DS-WO-CRI situations. Experiments conducted on two popular real-world datasets demonstrate that our proposed model significantly outperforms the conventional CF approach, the state-of-the-art social relation based CF approach, and the state-of-the-art topic modeling based CF approaches in terms of both precision and recall, and it is an effective approach to the DS-WO-CRI problem.


#15 Cross-Lingual Taxonomy Alignment with Bilingual Biterm Topic Model

Authors: Tianxing Wu, Guilin Qi, Haofen Wang, Kang Xu, Xuan Cui

As more and more multilingual knowledge becomes available on the Web, knowledge sharing across languages has become an important task to benefit many applications. One of the most crucial kinds of knowledge on the Web is taxonomy, which is used to organize and classify the Web data. To facilitate knowledge sharing across languages, we need to deal with the problem of cross-lingual taxonomy alignment, which discovers the most relevant category in the target taxonomy of one language for each category in the source taxonomy of another language. Current approaches for aligning cross-lingual taxonomies strongly rely on domain-specific information and the features based on string similarities. In this paper, we present a new approach to deal with the problem of cross-lingual taxonomy alignment without using any domain-specific information. We first identify the candidate matched categories in the target taxonomy for each category in the source taxonomy using the cross-lingual string similarity. We then propose a novel bilingual topic model, called Bilingual Biterm Topic Model (BiBTM), to perform exact matching. BiBTM is trained by the textual contexts extracted from the Web. We conduct experiments on two kinds of real world datasets. The experimental results show that our approach significantly outperforms the designed state-of-the-art comparison methods.


#16 Modeling Users’ Preferences and Social Links in Social Networking Services: A Joint-Evolving Perspective

Authors: Le Wu, Yong Ge, Qi Liu, Enhong Chen, Bai Long, Zhenya Huang

Researchers have long agreed that the evolution of a Social Networking Service (SNS) platform is driven by the interplay between users' preferences (reflected in user-item consumption behavior) and the social network structure (reflected in user-user interaction behavior), with both kinds of user behavior changing over time. However, traditional approaches either modeled these two kinds of behavior in isolation or relied on a static assumption about an SNS. Thus, it is still unclear how users' historical preferences and the dynamic social network structure affect the evolution of SNSs. Furthermore, can jointly modeling users' temporal behaviors in SNSs benefit both behavior prediction tasks? In this paper, we leverage the underlying social theories (i.e., social influence and the homophily effect) to investigate the interplay and evolution of SNSs. We propose a probabilistic approach to fuse these social theories for jointly modeling users' temporal behaviors in SNSs. Thus our proposed model has both explanatory ability and predictive power. Experimental results on two real-world datasets demonstrate the effectiveness of our proposed model.


#17 Detect Overlapping Communities via Ranking Node Popularities

Authors: Di Jin, Hongcui Wang, Jianwu Dang, Dongxiao He, Weixiong Zhang

Detection of overlapping communities has drawn much attention lately as they are essential properties of real complex networks. Despite its influence and popularity, the well studied and widely adopted stochastic model has not been made effective for finding overlapping communities. Here we extend the stochastic model method to detection of overlapping communities with the virtue of autonomous determination of the number of communities. Our approach hinges upon the idea of ranking node popularities within communities and using a Bayesian method to shrink communities to optimize an objective function based on the stochastic generative model. We evaluated the novel approach, showing its superior performance over five state-of-the-art methods, on large real networks and synthetic networks with ground-truths of overlapping communities.


#18 Online Cross-Modal Hashing for Web Image Retrieval

Authors: Liang Xie, Jialie Shen, Lei Zhu

Cross-modal hashing (CMH) is an efficient technique for the fast retrieval of web image data, and it has gained a lot of attention recently. However, traditional CMH methods usually apply batch learning for generating hash functions and codes. They are inefficient for the retrieval of web images, which usually arrive in a streaming fashion. Online learning can be exploited for CMH, but existing online hashing methods still cannot solve two essential problems: efficient updating of hash codes and analysis of cross-modal correlation. In this paper, we propose Online Cross-modal Hashing (OCMH), which can effectively address the above two problems by learning the shared latent codes (SLC). In OCMH, hash codes can be represented by the permanent SLC and a dynamic transfer matrix. Therefore, inefficient updating of hash codes is transformed into the efficient updating of the SLC and transfer matrix, and the time complexity is independent of the database size. Moreover, the SLC is shared by all the modalities, and thus it can encode the latent cross-modal correlation, which further improves the overall cross-modal correlation between heterogeneous data. Experimental results on two real-world multi-modal web image datasets, MIR Flickr and NUS-WIDE, demonstrate the effectiveness and efficiency of OCMH for online cross-modal web image retrieval.


#19 Improved Neural Machine Translation with SMT Features

Authors: Wei He, Zhongjun He, Hua Wu, Haifeng Wang

Neural machine translation (NMT) conducts end-to-end translation with a source language encoder and a target language decoder, achieving promising translation performance. However, as a newly emerged approach, the method has some limitations. An NMT system usually has to restrict its vocabulary to a certain size to avoid time-consuming training and decoding, which causes a serious out-of-vocabulary problem. Furthermore, the decoder lacks a mechanism to guarantee that all the source words are translated and usually favors short translations, resulting in fluent but inadequate translations. In order to solve the above problems, we incorporate statistical machine translation (SMT) features, such as a translation model and an n-gram language model, with the NMT model under the log-linear framework. Our experiments show that the proposed method significantly improves the translation quality of the state-of-the-art NMT system on Chinese-to-English translation tasks. Our method produces a gain of up to 2.33 BLEU score on NIST open test sets.
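
A log-linear framework, in its usual form, scores each candidate translation as a weighted sum of feature values and picks the highest-scoring one. The sketch below illustrates that combination under stated assumptions: the feature names, weights, and candidate values are hypothetical, and the `word_count` length feature is an illustrative addition not named in the abstract.

```python
def log_linear_score(features, weights):
    """Weighted sum of feature values: sum over i of lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature weights (in practice these would be tuned on held-out data).
weights = {
    "nmt_log_prob": 1.0,  # log-probability from the neural model
    "tm_log_prob": 0.5,   # log-probability from an SMT translation model
    "lm_log_prob": 0.5,   # log-probability from an n-gram language model
    "word_count": 0.25,   # assumed length feature that counteracts short outputs
}

# Two hypothetical candidates: a short one the NMT model prefers,
# and a longer one that covers more of the source sentence.
candidates = {
    "short": {"nmt_log_prob": -2.0, "tm_log_prob": -6.0,
              "lm_log_prob": -3.0, "word_count": 3.0},
    "complete": {"nmt_log_prob": -3.0, "tm_log_prob": -2.0,
                 "lm_log_prob": -4.0, "word_count": 6.0},
}

scores = {name: log_linear_score(f, weights) for name, f in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

With these toy numbers the NMT score alone would pick the short candidate, while the combined score picks the more complete one, which is the kind of correction the abstract attributes to adding SMT features.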


#20 Business-Aware Visual Concept Discovery from Social Media for Multimodal Business Venue Recognition

Authors: Bor-Chun Chen, Yan-Ying Chen, Francine Chen, Dhiraj Joshi

Image localization is important for marketing and recommendation of local business; however, the level of granularity is still a critical issue. Given a consumer photo and its rough GPS information, we are interested in extracting the fine-grained location information, i.e., business venues, of the image. To this end, we propose a novel framework for business venue recognition. The framework mainly contains three parts. First, business-aware visual concept discovery: we mine a set of concepts that are useful for business venue recognition based on three guidelines: business awareness, visual detectability, and discriminative power. We define concepts that satisfy all three criteria as business-aware visual concepts. Second, business-aware concept detection by convolutional neural networks (BA-CNN): we propose a new network configuration that can incorporate semantic signals mined from business reviews for extracting semantic concept features from a query image. Third, multimodal business venue recognition: we extend visually detected concepts to multimodal feature representations that allow a test image to be associated with business reviews and images from social media for business venue recognition. The experimental results show that the visual concepts detected by BA-CNN can achieve up to 22.5% relative improvement for business venue recognition compared to the state-of-the-art convolutional neural network features. Experiments also show that by leveraging multimodal information from social media we can further boost the performance, especially when the database images belonging to each business venue are scarce.


#21 Fusing Social Networks with Deep Learning for Volunteerism Tendency Prediction

Authors: Yongpo Jia, Xuemeng Song, Jingbo Zhou, Li Liu, Liqiang Nie, David Rosenblum

Social networks contain a wealth of useful information. In this paper, we study the challenging task of integrating users' information from multiple heterogeneous social networks to gain a comprehensive understanding of users' interests and behaviors. Although much effort has been dedicated to studying this problem, most existing approaches adopt linear or shallow models to fuse information from multiple sources. Such approaches cannot properly capture the complex nature of and relationships among different social networks. Adopting deep learning approaches to learn a joint representation can better capture the complexity, but this neglects measuring the level of confidence in each source and the consistency among different sources. In this paper, we present a framework for multiple social network learning, whose core is a novel model that fuses social networks using deep learning with source confidence and consistency regularization. To evaluate the model, we apply it to predict individuals' tendency to volunteerism. With extensive experimental evaluations, we demonstrate the effectiveness of our model, which outperforms several state-of-the-art approaches in terms of precision, recall and F1-score.


#22 STELLAR: Spatial-Temporal Latent Ranking for Successive Point-of-Interest Recommendation

Authors: Shenglin Zhao, Tong Zhao, Haiqin Yang, Michael Lyu, Irwin King

Successive point-of-interest (POI) recommendation in location-based social networks (LBSNs) has become a significant task, since it helps users to navigate a number of candidate POIs and provides the best POI recommendations based on users’ most recent check-in knowledge. However, all existing methods for successive POI recommendation focus only on modeling the correlation between POIs based on users’ check-in sequences, and ignore the important fact that successive POI recommendation is a time-subtle recommendation task. In fact, even with the same previous check-in information, users would prefer different successive POIs at different times. To capture the impact of time on successive POI recommendation, in this paper, we propose a spatial-temporal latent ranking (STELLAR) method to explicitly model the interactions among user, POI, and time. In particular, the proposed STELLAR model is built upon a ranking-based pairwise tensor factorization framework with a fine-grained modeling of user-POI, POI-time, and POI-POI interactions for successive POI recommendation. Moreover, we propose a new interval-aware weight utility function to differentiate successive check-ins’ correlations, which breaks the time interval constraint in prior work. Evaluations on two real-world datasets demonstrate that the STELLAR model outperforms the state-of-the-art successive POI recommendation model by about 20% in Precision@5 and Recall@5.
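
For readers unfamiliar with the reported metrics, the sketch below computes Precision@k and Recall@k under their usual definitions: the fraction of the top-k recommendations that are relevant, and the fraction of relevant items that appear in the top k. The POI names and the ground-truth set are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Return (Precision@k, Recall@k) for a ranked list against a set of relevant items."""
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked POI recommendations and the POIs the user actually visited next.
ranked_pois = ["cafe", "gym", "museum", "park", "bar", "library"]
visited_next = {"gym", "bar", "cinema"}

precision, recall = precision_recall_at_k(ranked_pois, visited_next, k=5)
print(precision, recall)  # 0.4 and 2/3: two of the top five are hits, out of three relevant POIs
```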


#23 Building a Large Scale Dataset for Image Emotion Recognition: The Fine Print and The Benchmark

Authors: Quanzeng You, Jiebo Luo, Hailin Jin, Jianchao Yang

Psychological research results have confirmed that people can have different emotional reactions to different visual stimuli. Several papers have been published on the problem of visual emotion analysis. In particular, attempts have been made to analyze and predict people's emotional reaction towards images. To this end, different kinds of hand-tuned features are proposed. The results reported on several carefully selected and labeled small image data sets have confirmed the promise of such features. While the recent successes of many computer vision related tasks are due to the adoption of Convolutional Neural Networks (CNNs), visual emotion analysis has not achieved the same level of success. This may be primarily due to the unavailability of confidently labeled and relatively large image data sets for visual emotion analysis. In this work, we introduce a new data set, which started from 3+ million weakly labeled images of different emotions and ended up 30 times as large as the current largest publicly available visual emotion data set. We hope that this data set encourages further research on visual emotion analysis. We also perform extensive benchmarking analyses on this large data set using the state of the art methods including CNNs.


#24 Predicting Online Protest Participation of Social Media Users

Authors: Suhas Ranganath, Fred Morstatter, Xia Hu, Jiliang Tang, Suhang Wang, Huan Liu

Social media has emerged as a popular platform for people to express their viewpoints on political protests like the Arab Spring. Millions of people use social media to communicate and mobilize their viewpoints on protests. Hence, it is a valuable tool for organizing social movements. However, the mechanisms by which protest affects the population are not known, making it difficult to estimate the number of protestors. In this paper, we are inspired by sociological theories of protest participation and propose a framework to predict from the user's past status messages and interactions whether the next post of the user will be a declaration of protest. Drawing concepts from these theories, we model the interplay over time between the user's status messages and the messages interacting with the user. We evaluate the framework using data from the social media platform Twitter on protests during the recent Nigerian elections and demonstrate that it can effectively predict whether the next post of a user is a declaration of protest.


#25 College Towns, Vacation Spots, and Tech Hubs: Using Geo-Social Media to Model and Compare Locations

Authors: Hancheng Ge, James Caverlee

In this paper, we explore the potential of geo-social media to construct location-based interest profiles to uncover the hidden relationships among disparate locations. Through an investigation of millions of geo-tagged Tweets, we construct a per-city interest model based on fourteen high-level categories (e.g., technology, art, sports). These interest models support the discovery of related locations that are connected based on these categorical perspectives (e.g., college towns or vacation spots) but perhaps not on the individual tweet level. We then connect these city-based interest models to underlying demographic data. By building multivariate multiple linear regression (MMLR) and neural network (NN) models, we show how a location's interest profile may be estimated based purely on its demographic features.