Human cognition suggests that contextual information provides a potentially powerful clue when recognizing objects. However, for fine-grained image classification, the contribution of context may vary across images, and sometimes the context even confuses the classification result. To alleviate this problem, we develop a novel approach, the two-stream contextualized Convolutional Neural Network, which provides a simple but effective context-content joint classification model under the deep learning framework. The network requires only the raw image and a coarse segmentation as input to extract both content and context features, without the need for human interaction. Moreover, our network adopts a weighted fusion scheme to combine the content and context classifiers, while a subnetwork is introduced to adaptively determine the weight for each image. In our experiments on public datasets, our approach achieves considerably high recognition accuracy without any tedious human involvement, as compared with state-of-the-art approaches.
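To make the fusion scheme concrete, here is a minimal sketch, not the authors' implementation: two classifiers produce class posteriors, and a per-image weight w blends them. The gating function below is a hypothetical stand-in for the weight subnetwork, using prediction entropy as a confidence proxy.

```python
import numpy as np

def fuse_predictions(p_content, p_context, w):
    """Blend content and context class posteriors with a per-image weight w in [0, 1]."""
    return w * p_content + (1.0 - w) * p_context

def gate_weight(p_content):
    """Hypothetical stand-in for the weight subnetwork: trust the content
    stream more when its posterior is confident (low entropy)."""
    entropy = -np.sum(p_content * np.log(p_content + 1e-12))
    return 1.0 - entropy / np.log(len(p_content))

p_content = np.array([0.7, 0.2, 0.1])   # content-stream posterior
p_context = np.array([0.4, 0.4, 0.2])   # context-stream posterior
w = gate_weight(p_content)
print(fuse_predictions(p_content, p_context, w))  # fused posterior
```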
Multi-instance learning (MIL) is useful for tackling labeling ambiguity in learning tasks by allowing a bag of instances to share one label. Recently, bag mapping methods, which transform a bag into a single instance in a new space via instance selection, have drawn significant attention. To date, most existing works are developed based on the original space, i.e., utilizing all instances for bag mapping, and instance selection is only indirectly tied to the MIL objective. As a result, it is hard to guarantee the discriminative capacity of the selected instances in the new bag-mapping space for MIL. In this paper, we propose a direct discriminative mapping approach for multi-instance learning (MILDM), which identifies instances that directly distinguish bags in the new mapping space. Experiments and comparisons on real-world learning tasks demonstrate the performance of the algorithm.
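A minimal sketch of the bag-mapping idea, under our own simplifying assumptions rather than the MILDM algorithm itself: each bag is represented by its best similarity to a set of selected instances, turning a variable-size bag into one fixed-length vector.

```python
import numpy as np

def map_bag(bag, prototypes, gamma=1.0):
    """Map a bag (n_instances x d) to a single vector: for each selected
    prototype instance, record the bag's best RBF similarity to it."""
    d2 = ((bag[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    sim = np.exp(-gamma * d2)   # RBF similarity, instances x prototypes
    return sim.max(axis=0)      # closest instance per prototype

rng = np.random.default_rng(0)
bag = rng.normal(size=(5, 3))          # a bag of 5 instances in R^3
prototypes = rng.normal(size=(4, 3))   # 4 selected instances
print(map_bag(bag, prototypes))        # a single 4-dim bag representation
```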
We present an algorithm (LsNet2Vec) that, given a large-scale network (millions of nodes), embeds the structural features of each node into a fixed, lower-dimensional real-valued vector. We experiment with and evaluate our proposed approach on twelve datasets collected from SNAP. Results show that our model performs comparably with state-of-the-art methods, such as the Katz method and the Random Walk with Restart method, in various experimental settings.
We propose Epitomic Image Super-Resolution (ESR) to enhance current internal SR methods that exploit the self-similarities in the input. Instead of the local nearest-neighbor patch matching used in most existing internal SR methods, ESR employs epitomic patch matching, which is robust to noise and supports both local and non-local patch matching. Extensive objective and subjective evaluations demonstrate the effectiveness and advantages of ESR on various images.
We consider the link prediction (LP) problem in a partially observed network, where the objective is to make predictions in the unobserved portion of the network. Many existing methods reduce LP to binary classification. However, the dominance of absent links in real-world networks makes misclassification error a poor performance metric. Instead, researchers have argued for using ranking performance measures, such as AUC, AP, and NDCG, for evaluation. We recast the LP problem as a learning-to-rank problem and use effective learning-to-rank techniques directly during training, which allows us to deal with the class imbalance problem systematically. As a demonstration of our general approach, we develop an LP method by optimizing the cross-entropy surrogate originally used in the popular ListNet ranking algorithm. We conduct extensive experiments on publicly available co-authorship, citation, and metabolic networks to demonstrate the merits of our method.
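For reference, a minimal rendering of the ListNet top-one cross-entropy surrogate, with made-up scores: predicted and target scores over candidate links are turned into top-one probability distributions via softmax, and the loss is the cross-entropy between them.

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def listnet_loss(pred_scores, true_scores):
    """ListNet top-one cross-entropy between the target and predicted
    permutation probabilities over a list of candidate links."""
    p_true = softmax(true_scores)
    p_pred = softmax(pred_scores)
    return -np.sum(p_true * np.log(p_pred + 1e-12))

# toy example: 1 = observed link, 0 = absent link
true_scores = np.array([1.0, 0.0, 0.0, 1.0])
pred_scores = np.array([0.8, 0.1, -0.3, 0.5])
print(listnet_loss(pred_scores, true_scores))
```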
With the development of Web 2.0, many users express their opinions online. This paper is concerned with the classification of social emotions on datasets of varied scale. Unlike traditional models, which weight training documents equally, we propose the concept of emotional entropy to estimate document weights and tackle the issue of noisy documents. Topic assignment is also used to distinguish different emotional senses of the same word. Experimental evaluations using different datasets validate the effectiveness of the proposed social emotion classification model.
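A minimal illustration of how an entropy-based document weight might look; this is our own reading of the idea, not the paper's exact formula: a document whose emotion votes are spread evenly is noisier and gets a lower weight.

```python
import numpy as np

def emotional_entropy_weight(votes):
    """Weight a training document by how concentrated its emotion votes
    are: near-uniform votes -> high entropy -> low weight."""
    p = np.asarray(votes, dtype=float)
    p = p / p.sum()
    entropy = -np.sum(p * np.log(p + 1e-12))
    return 1.0 - entropy / np.log(len(p))   # normalized to [0, 1]

print(emotional_entropy_weight([90, 5, 5]))    # clear emotion -> ~0.64
print(emotional_entropy_weight([34, 33, 33]))  # noisy document -> ~0.0
```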
We propose an unsupervised semantic role labeling method for Korean, an agglutinative language whose complicated suffix structures carry much of the syntax. First, we construct an argument embedding and then develop an indicator vector for the suffix, such as a Josa. We then construct an argument tuple by concatenating these two vectors. Role induction is performed by clustering the argument tuples. This method achieves up to a 70.16% F1-score and 75.85% accuracy.
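A minimal sketch of the pipeline as we read it; the embedding values, suffix inventory size, and cluster count below are placeholders: concatenate the argument embedding with the suffix indicator vector and cluster the tuples, e.g., with k-means.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_args, emb_dim, n_josa = 100, 50, 9

arg_embeddings = rng.normal(size=(n_args, emb_dim))               # placeholder embeddings
josa_indicator = np.eye(n_josa)[rng.integers(0, n_josa, n_args)]  # one-hot suffix vector

# argument tuple = [argument embedding ; suffix indicator]
tuples = np.hstack([arg_embeddings, josa_indicator])

# role induction: one cluster per induced semantic role
roles = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(tuples)
print(roles[:10])
```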
In practice, training language models for individual authors is often expensive because of limited data resources. In such cases, Neural Network Language Models (NNLMs) generally outperform traditional non-parametric N-gram models. Here we investigate the performance of a feed-forward NNLM on an authorship attribution problem with a moderate author set size and relatively limited data. We also consider how text topics impact performance. Compared with a well-constructed N-gram baseline with Kneser-Ney smoothing, the proposed method achieves a nearly 2.5% reduction in perplexity and increases author classification accuracy by 3.43% on average, given as few as 5 test sentences. The performance is very competitive with the state of the art in terms of accuracy and demands on test data.
For many researchers, one of the biggest issues is the lack of an efficient method to keep up with the latest academic progress in related research fields. We notice that many researchers tend to share their research progress, or recommend scholarly information they know of, on their microblogs. In order to exploit microblogging to benefit scientific research, we build a system called MicroScholar to automatically collect and mine scholarly information from Chinese microblogs. In this paper, we briefly introduce the system framework and focus on the scholarly microblog categorization component. Several kinds of features are used in this component, and experimental results demonstrate their usefulness.
Human Computer Interaction (HCI) is central to many applications, including hazardous environment inspection and telemedicine. Whereas traditional methods of HCI for teleoperating electromechanical systems include joysticks, levers, or buttons, our research focuses on using electromyography (EMG) signals to improve intuitiveness and response time. An important challenge is to accurately and efficiently extract and map EMG signals to known positions for real-time control. In this preliminary work, we compare the accuracy and real-time performance of several machine-learning techniques for recognizing specific arm positions. We present results from offline analysis, as well as end-to-end operation using a robotic arm.
Human mobility modeling, for either transportation system development or individual location-based services, has a tangible impact on people's everyday experience. In recent years, cell phone data has received a lot of attention as a promising data source because of its wide coverage, long observation period, and low cost. The challenge in utilizing such data is how to robustly extract people's trip sequences from sparse and noisy cell phone records and endow the extracted trips with semantic meaning, i.e., trip purposes. In this study, we reconstruct trip sequences from sparse cell phone records. Next, we propose a Bayesian trip purpose classification method and compare it to a Markov random field based trip purpose clustering method, representing scenarios with and without labeled training data, respectively. This procedure shows how cell phone data, despite its coarse granularity and sparsity, can be turned into a low-cost, long-term, and ubiquitous sensor network for mobility-related services.
Matching a question to its best answer is a common task in community question answering. In this paper, we focus on non-factoid questions and aim to pick out the best answer from the candidate answers. Most existing deep models directly measure the similarity between question and answer by their individual sentence embeddings. To tackle the lack of information in question descriptions and the lexical gap between questions and answers, we propose a novel deep architecture named SPAN. Specifically, we introduce support answers to help understand the question, defined as the best answers of questions similar to the original one. We then obtain two kinds of similarities: one between the question and the candidate answer, and the other between the support answers and the candidate answer. The matching score is finally generated by combining them. Experiments on Yahoo! Answers demonstrate that SPAN outperforms the baseline models.
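A minimal sketch of the scoring idea, in our own simplified form using cosine similarity over precomputed embeddings; the combination weight alpha is a placeholder, not the paper's learned combination: the candidate answer is scored against both the question and the support answers, and the two similarities are blended.

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def span_score(q, candidate, support_answers, alpha=0.5):
    """Combine question-candidate similarity with the average
    support-answer-candidate similarity (alpha is a placeholder weight)."""
    s_q = cosine(q, candidate)
    s_support = np.mean([cosine(s, candidate) for s in support_answers])
    return alpha * s_q + (1.0 - alpha) * s_support

rng = np.random.default_rng(0)
q = rng.normal(size=100)                             # question embedding
candidate = rng.normal(size=100)                     # candidate answer embedding
support = [rng.normal(size=100) for _ in range(3)]   # support answer embeddings
print(span_score(q, candidate, support))
```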
In graph-oriented machine learning research, the L1 graph is an efficient way to represent the connections among input data samples. Its construction algorithm is based on a numerical optimization motivated by Compressive Sensing theory. As a result, it is a nonparametric method, which is highly desirable. However, information about the data, such as geometric structure and density distribution, is ignored. In this paper, we propose a Structure Aware (SA) L1 graph to improve data clustering performance by capturing the manifold structure of the input data. We use a local dictionary for each datum when calculating its sparse coefficients. The SA-L1 graph not only preserves the locality of the data but also captures its geometric structure. Experimental results show that our new algorithm achieves better clustering performance than the L1 graph.
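A minimal sketch of the local-dictionary idea, in our own simplification using scikit-learn's Lasso; the neighborhood size k and regularization strength alpha are placeholders: each sample is sparsely coded over its k nearest neighbors rather than over all samples, and the coefficients define edge weights.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sa_l1_graph(X, k=10, alpha=0.1):
    """Sparse-code each sample over its k nearest neighbors (its local
    dictionary) and use the coefficients as graph edge weights."""
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        # local dictionary: k nearest neighbors of x_i (excluding itself)
        dists = np.linalg.norm(X - X[i], axis=1)
        neighbors = np.argsort(dists)[1:k + 1]
        coder = Lasso(alpha=alpha, fit_intercept=False)
        coder.fit(X[neighbors].T, X[i])   # columns of the design = atoms
        W[i, neighbors] = np.abs(coder.coef_)
    return W

X = np.random.default_rng(0).normal(size=(30, 5))
W = sa_l1_graph(X)
print(W.shape, (W > 0).sum(), "nonzero edges")
```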
Creating summaries of events of interest from a multitude of unstructured data is a challenging task commonly faced by intelligence analysts seeking increased situational awareness. This paper proposes a framework called Storyboarding that leverages unstructured text and images to explain events as sets of sub-events. The framework first generates a textual context for each human face detected in images and then builds a chain of coherent documents where two consecutive documents of the chain share a common theme as well as a context. Storyboarding helps analysts quickly narrow down a large number of possibilities to a few significant ones for further investigation. Empirical studies on Wikipedia documents, images, and news articles show that Storyboarding is able to provide deeper insights into events of interest.
In this paper, we describe ROOT13, a supervised system for the classification of hypernyms, co-hyponyms, and random words. The system relies on a Random Forest algorithm and 13 unsupervised corpus-based features. We evaluate it with 10-fold cross-validation on 9,600 pairs, equally distributed among the three classes and involving several parts of speech (i.e., adjectives, nouns, and verbs). When all the classes are present, ROOT13 achieves an F1 score of 88.3%, against a baseline of 57.6% (vector cosine). When the classification is binary, ROOT13 achieves the following results: hypernyms vs. co-hyponyms (93.4% vs. 60.2%), hypernyms vs. random (92.3% vs. 65.5%), and co-hyponyms vs. random (97.3% vs. 81.5%). Our results are competitive with state-of-the-art models.
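A minimal sketch of the evaluation setup as described; the features and labels below are random placeholders, since ROOT13's actual 13 corpus-based features are not reproduced here: a Random Forest over 13 features, scored with 10-fold cross-validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(9600, 13))     # placeholder for the 13 corpus-based features
y = rng.integers(0, 3, size=9600)   # hypernym / co-hyponym / random

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=10, scoring="f1_macro")
print(scores.mean())
```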
In this paper, we claim that vector cosine, which is generally considered among the most efficient unsupervised measures for identifying word similarity in Vector Space Models, can be outperformed by an unsupervised measure that calculates the extent of the intersection among the most mutually dependent contexts of the target words. To prove this claim, we describe and evaluate APSyn, a variant of Average Precision that, without any optimization, outperforms vector cosine and co-occurrence on the standard ESL test set, with an improvement ranging between +9.00% and +17.98%, depending on the number of top contexts chosen.
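A minimal sketch of the APSyn-style measure as we understand it; the ranked context lists below are toy data: two words are scored by the intersection of their top-N most related contexts, weighting each shared context by the inverse of its average rank.

```python
def apsyn(contexts_w1, contexts_w2, n=100):
    """contexts_w* are lists of contexts sorted by association strength
    (e.g., by a PMI-style score). Shared top-n contexts score 1 / avg rank."""
    top1 = {c: r + 1 for r, c in enumerate(contexts_w1[:n])}
    top2 = {c: r + 1 for r, c in enumerate(contexts_w2[:n])}
    shared = set(top1) & set(top2)
    return sum(1.0 / ((top1[c] + top2[c]) / 2.0) for c in shared)

# toy ranked context lists for two similar target words
print(apsyn(["eat", "fruit", "red", "tree"],
            ["fruit", "eat", "yellow", "peel"]))  # ~1.33
```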
Text simplification (TS) is the technique of reducing the lexical and syntactic complexity of text. Existing automatic TS systems can simplify text only by lexical simplification or by manually defined rules. Neural Machine Translation (NMT) is a recently proposed approach for Machine Translation (MT) that is receiving a lot of research interest. In this paper, we regard original English and simplified English as two languages and apply an NMT model, a Recurrent Neural Network (RNN) encoder-decoder, to TS so that the neural network learns text simplification rules by itself. We then discuss challenges and strategies in applying an NMT model to the task of text simplification.
We present preliminary work on constructing a knowledge curation system to advance research in the study of regional economics. The proposed system exploits natural language processing (NLP) techniques to automatically perform business event extraction, provides a user-facing interface to assist human curators, and includes a feedback loop to improve the performance of the information extraction model for the automated parts of the system. Progress to date has shown that we can improve standard NLP approaches for entity and relationship extraction through heuristic means and provide indexing of extracted relationships to aid curation.
Hedonic games are a well-studied model of coalition formation, in which selfish agents are partitioned into disjoint sets and care about the make-up of the coalition they end up in. The computational problem of finding a stable outcome tends to be intractable, even after severely restricting the types of preferences that agents are allowed to report. We investigate a structural way of achieving tractability, by requiring that agents' preferences interact in a well-behaved manner. Specifically, we show that stable outcomes can be found in linear time for hedonic games that satisfy a notion of bounded treewidth and bounded degree.
The restricted Boltzmann machine (RBM) has been used as a building block for many successful deep learning models, e.g., deep belief networks (DBNs) and deep Boltzmann machines (DBMs). The training of RBMs can be extremely slow in pathological regions. Second-order optimization methods, such as quasi-Newton methods, have been proposed to deal with this problem. However, non-convexity creates many obstructions to training RBMs, including the infeasibility of directly applying second-order optimization methods. To overcome this obstruction, we introduce an EM-like iterative projection quasi-Newton (IPQN) algorithm. Specifically, we iteratively alternate between a sampling procedure, in which parameters need not be updated, and a convex sub-training procedure. In the sub-training procedures, we apply quasi-Newton methods to deal with the pathological problem. We further show that Newton's method turns out to be a good approximation of the natural gradient (NG) method in RBM training. We evaluate IPQN in a series of density estimation experiments on an artificial dataset and the MNIST digit dataset. Experimental results indicate that IPQN achieves improved convergence over the traditional CD method.
The principle of counter-transitivity plays a vital role in argumentation. It states that an argument is strong when its attackers are weak, and weak when its attackers are strong. In this work, we develop a formal theory of argument ranking semantics based on this principle. Three approaches, quantity-based, quality-based, and their combination, are defined to implement the principle. We then present an iterative refinement algorithm for computing the ranking on arguments, based on the recursive nature of the principle.
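A minimal sketch of such an iterative refinement, in our own rendering using a standard quantity- and quality-sensitive update rather than necessarily the paper's exact semantics: each argument's strength is repeatedly recomputed from the strengths of its attackers until a fixed point is approached.

```python
def rank_arguments(attackers, n_iter=100):
    """attackers[a] lists the arguments attacking a. Strengths start at 1
    and are refined: strong attackers -> weak argument, and vice versa."""
    strength = {a: 1.0 for a in attackers}
    for _ in range(n_iter):
        strength = {
            a: 1.0 / (1.0 + sum(strength[b] for b in attackers[a]))
            for a in attackers
        }
    return strength

# "a" is attacked by "b" and "c"; "b" is attacked by "c"; "c" is unattacked
graph = {"a": ["b", "c"], "b": ["c"], "c": []}
print(rank_arguments(graph))  # unattacked "c" ranks strongest
```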
Game theory is a tool for modeling multi-agent decision problems and has been used to analyze strategies in domains such as poker, security, and trading agents. One method for solving very large games is to use abstraction techniques to shrink the game by removing detail, solve the reduced game, and then translate the solution back to the original game. We present a methodology for evaluating the robustness of different game-theoretic solution concepts to the errors introduced by the abstraction process. We present an initial empirical study of the robustness of several solution methods when using abstracted games.
Predicting links and their formation times in a knowledge network has been extensively studied in recent years. Most structure-based predictive methods consider the structure and the time information of edges separately, and thus fail to characterize the correlation between them. In this paper, we propose a structure called the Time-Difference-Labeled Path and a link prediction method (TDLP) based on it. Experiments show that TDLP outperforms state-of-the-art methods.
The Power TAC simulation emphasizes the strategic problems that broker agents face in managing the economics of a smart grid. The brokers must make trades in multiple markets, and to be successful, they must make many good predictions about future supply, demand, and prices. Clearing price prediction is an important part of a broker's wholesale market strategy because it helps the broker make intelligent decisions when purchasing energy at low cost in a day-ahead market. I describe my work on using machine learning methods to predict prices in the Power TAC wholesale market, which will be used in future bidding strategies.
We consider the problem of making efficient quality-time-cost trade-offs in collaborative crowdsourcing systems, in which different skills from multiple workers need to be combined to complete a task. We propose CrowdAsm, an approach that helps collaborative crowdsourcing systems determine how to combine the expertise of available workers to maximize the expected quality of results while minimizing the expected delays. Our analysis proves that CrowdAsm can achieve close-to-optimal profit for workers in a given crowdsourcing system if they follow its recommendations.