In machine learning contests such as the ImageNet Large Scale Visual Recognition Challenge and the KDD Cup, contestants can submit candidate solutions and receive from an oracle (typically the organizers of the competition) the accuracy of their guesses compared to the ground-truth labels. One of the most commonly used accuracy metrics for binary classification tasks is the Area Under the Receiver Operating Characteristic Curve (AUC). In this paper we provide proofs-of-concept of how knowledge of the AUC of a set of guesses can be used, in two different kinds of attacks, to improve the accuracy of those guesses. On the other hand, we also demonstrate the intractability of one kind of AUC exploit by proving that the number of possible binary labelings of n examples for which a candidate solution obtains an AUC score of c grows exponentially in n, for every c in (0,1).
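As a point of reference for the metric being attacked, here is a minimal pair-counting sketch of AUC: the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting half. The function below is an illustrative stand-in, not code from the paper.

```python
# Pair-counting AUC: P(score of random positive > score of random negative),
# ties counted as one half. O(n^2) but transparent.
import numpy as np

def auc(scores, labels):
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score against every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(auc([0.9, 0.4, 0.7, 0.1], [1, 0, 1, 0]))  # -> 1.0 (perfect ranking)
```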
Programming language processing (similar to natural language processing) is a hot research topic in the field of software engineering; it has also attracted growing interest in the artificial intelligence community. However, unlike a natural language sentence, a program contains rich, explicit, and complicated structural information. Hence, traditional NLP models may be inappropriate for programs. In this paper, we propose a novel tree-based convolutional neural network (TBCNN) for programming language processing, in which a convolution kernel is designed over programs' abstract syntax trees to capture structural information. TBCNN is a generic architecture for programming language processing; our experiments show its effectiveness in two different program analysis tasks: classifying programs according to functionality, and detecting code snippets of certain patterns. TBCNN outperforms baseline methods, including several neural models for NLP.
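To make the tree-based convolution concrete, below is a toy single-layer sketch in the spirit of TBCNN: each AST node's output mixes its own vector with its children's through separate weight matrices, and a max-pool over all nodes yields a fixed-size program representation. The weighting scheme and all names are simplifications for illustration, not the paper's implementation.

```python
# Toy tree convolution: per-node features combine the node's own embedding and
# its children's embeddings; a max-pool over nodes gives a fixed-size vector.
import numpy as np

rng = np.random.default_rng(0)
D = 8                                     # node embedding size
W_node, W_child = rng.normal(size=(D, D)), rng.normal(size=(D, D))
b = np.zeros(D)

def tree_conv(vec, children):
    # vec[i]: embedding of AST node i; children[i]: list of child node ids
    out = []
    for i, v in enumerate(vec):
        h = W_node @ v + b
        for c in children[i]:
            h += (W_child @ vec[c]) / max(len(children[i]), 1)
        out.append(np.maximum(h, 0))      # ReLU
    return np.max(out, axis=0)            # max-pool over nodes

# toy AST: node 0 is the root with children 1 and 2
vec = rng.normal(size=(3, D))
print(tree_conv(vec, {0: [1, 2], 1: [], 2: []}).shape)  # (8,)
```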
The number of mitoses per tissue area is an important indicator of the aggressiveness of invasive breast carcinoma. However, automatic mitosis detection in histology images remains a challenging problem. Traditional methods either employ hand-crafted features to discriminate mitoses from other cells or construct a pixel-wise classifier to label every pixel in a sliding-window way. While the former suffers from the large shape variation of mitoses and the existence of many mimics with similar appearance, the slow speed of the latter prohibits its use in clinical practice. To overcome these shortcomings, we propose a fast and accurate method to detect mitoses by designing a novel deep cascaded convolutional neural network composed of two components. First, by leveraging a fully convolutional neural network, we propose a coarse retrieval model to identify and locate candidate mitoses while preserving a high sensitivity. Based on these candidates, a fine discrimination model utilizing knowledge transferred from a cross-domain network is developed to further single out mitoses from hard mimics. Our approach outperformed other methods by a large margin in the 2014 ICPR MITOS-ATYPIA challenge in terms of detection accuracy. When compared with the state-of-the-art methods on the 2012 ICPR MITOSIS data (a smaller and less challenging dataset), our method achieved comparable or better results roughly 60 times faster.
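The coarse-to-fine cascade logic itself is simple to sketch: a cheap coarse model scores everything with a permissive threshold to preserve sensitivity, and the expensive fine model re-scores only the surviving candidates. The models and thresholds below are placeholders, not the paper's networks.

```python
# Skeleton of a two-stage detection cascade with placeholder models.
import numpy as np

def cascade_detect(patches, coarse_model, fine_model,
                   t_coarse=0.1, t_fine=0.5):
    coarse = coarse_model(patches)                   # fast, run on everything
    candidates = np.flatnonzero(coarse >= t_coarse)  # low threshold: few misses
    fine = fine_model(patches[candidates])           # slow, run on few survivors
    return candidates[fine >= t_fine]                # indices of detections

rng = np.random.default_rng(1)
patches = rng.normal(size=(1000, 32 * 32))
hits = cascade_detect(patches,
                      coarse_model=lambda x: rng.uniform(size=len(x)),
                      fine_model=lambda x: rng.uniform(size=len(x)))
print(len(hits), "detections")
```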
The goal of connectomics is to map the interconnections of the nervous system from Electron Microscopy (EM) images. However, the formidable size of EM image data renders human annotation impractical, as it could take decades to complete the whole job. An alternative way to reconstruct the connectome is a computerized scheme that automatically segments the neuronal structures. The segmentation of EM images is very challenging because the depicted structures can be very diverse. To address this difficult problem, we propose a deep contextual network that leverages multi-level contextual information from a deep hierarchical structure to achieve better segmentation performance. To further improve robustness against vanishing gradients and strengthen the back-propagation of the gradient flow, auxiliary classifiers are incorporated into the architecture of our deep neural network. We show that our method can effectively parse the semantic meaning of the images with the underlying neural network and accurately delineate structural boundaries with reference to low-level contextual cues. Experimental results on the benchmark dataset of the 2012 ISBI neuronal structure segmentation challenge suggest that the proposed method outperforms state-of-the-art methods by a large margin on different evaluation measurements. Our method can potentially facilitate automatic connectome analysis from EM images with less human intervention.
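The auxiliary-classifier idea can be sketched in a few lines (assuming PyTorch): intermediate layers get their own classification heads whose weighted losses are added to the main loss, so gradient signal reaches early layers directly. Layer sizes and the 0.3 weight are illustrative, not the paper's values.

```python
# Minimal deep-supervision sketch: an auxiliary head on an intermediate layer
# contributes a weighted loss alongside the main classifier's loss.
import torch
import torch.nn as nn

class DeepSupNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(64, 32), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
        self.aux_head = nn.Linear(32, 2)    # auxiliary classifier on block1
        self.main_head = nn.Linear(16, 2)   # main classifier
    def forward(self, x):
        h1 = self.block1(x)
        return self.main_head(self.block2(h1)), self.aux_head(h1)

net, ce = DeepSupNet(), nn.CrossEntropyLoss()
x, y = torch.randn(8, 64), torch.randint(0, 2, (8,))
main_out, aux_out = net(x)
loss = ce(main_out, y) + 0.3 * ce(aux_out, y)  # weighted auxiliary loss
loss.backward()  # gradients reach block1 through both paths
```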
Noisy and incomplete data restoration is a critical preprocessing step in developing effective learning algorithms, one which aims to reduce the effect of noise and missing values in data. By utilizing attribute correlations and/or instance similarities, various techniques have been developed for data denoising and imputation tasks. However, existing data restoration methods are either specifically designed for a particular task, or incapable of dealing with mixed-attribute data. In this paper, we develop a new probabilistic model to provide a general and principled method for restoring mixed-attribute data. The main contributions of this study are twofold: a) a unified generative model, utilizing a generic random mixed field (RMF) prior, is designed to exploit mixed-attribute correlations; and b) a structured mean-field variational approach is proposed to solve the challenging inference problem of simultaneous denoising and imputation. We evaluate our method with classification experiments on both synthetic data and real benchmark datasets. The experiments demonstrate that, compared with other data restoration methods, our approach can effectively improve the classification accuracy of noisy and incomplete data.
In this paper, we study a cold-start heterogeneous-device localization problem. This problem is challenging, because it results in an extreme inductive transfer learning setting, where there is only source domain data but no target domain data. This problem is also underexplored. As there is no target domain data for calibration, we aim to learn a robust feature representation only from the source domain. There is little previous work on such a robust feature learning task; besides, the existing robust feature representation proposals are both heuristic and inexpressive. As our contribution, we for the first time provide a principled and expressive robust feature representation to solve the challenging cold-start heterogeneous-device localization problem. We evaluate our model on two public real-world data sets, and show that it significantly outperforms the best baseline by 23.1%–91.3% across four pairs of heterogeneous devices.
The effectiveness of head pose estimation via embedding models has been demonstrated in recent works. However, most previous methods focus only on the manifold relationship among poses, while overlooking the underlying global structure among subjects and poses. To build a robust and effective head pose estimator, we propose a novel Pose-dependent Low-Rank Embedding (PLRE) method, which is designed to exploit a discriminative subspace that keeps within-pose samples close while keeping between-pose samples far away. Specifically, low-rank embedding is employed under a multi-task framework, where each subject can be naturally considered as one task. Then, two novel terms are incorporated to align multiple tasks and pursue a better pose-dependent embedding. One is a cross-task alignment term, aiming to constrain each low-rank coefficient to share a similar structure. The other is a pose-dependent graph regularizer, developed to capture the manifold structure of the same pose across different subjects. Experiments on the CMU-PIE, MIT-CBCL, and extended YaleB databases with different levels of random noise are conducted, and six embedding-model-based baselines are compared. The consistently superior results demonstrate the effectiveness of our proposed method.
When learning a hidden Markov model (HMM), sequential observations can often be complemented by real-valued summary response variables generated from the path of hidden states. Such settings arise in numerous domains, including many applications in biology, like motif discovery and genome annotation. In this paper, we present a flexible framework for jointly modeling both latent sequence features and the functional mapping that relates the summary response variables to the hidden state sequence. The algorithm is compatible with a rich set of mapping functions. Results show that the availability of additional continuous response variables can simultaneously improve the annotation of the sequential observations and yield good prediction performance in both synthetic data and real-world datasets.
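A toy generative picture of this setting, with illustrative parameters rather than the paper's model: a hidden Markov chain emits discrete observations, and a real-valued summary response is a noisy function of the whole hidden path.

```python
# Toy HMM whose hidden path also generates a real-valued summary response
# (here, the fraction of time spent in state 1, plus Gaussian noise).
import numpy as np

rng = np.random.default_rng(2)
T = np.array([[0.9, 0.1], [0.2, 0.8]])   # state transition matrix
E = np.array([[0.7, 0.3], [0.1, 0.9]])   # emission probabilities

def sample(n=50):
    states = [rng.integers(2)]
    for _ in range(n - 1):
        states.append(rng.choice(2, p=T[states[-1]]))
    obs = [rng.choice(2, p=E[s]) for s in states]
    response = np.mean(states) + rng.normal(0, 0.05)  # summary of hidden path
    return np.array(obs), response

obs, y = sample()
print(obs[:10], round(y, 3))
```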
A cascade classifier has proven effective in sliding-window based real-time object detection. In a cascade classifier, node learning is the key process, which includes feature selection and classifier design. Previous algorithms fail to effectively tackle the asymmetry and intersection problems in cascade classification, thereby limiting the performance of object detection. In this paper, we improve current feature selection algorithms by addressing both the asymmetry and intersection problems. We formulate asymmetric feature selection as a submodular function maximization problem. We then propose a new algorithm, SAFS, with a formal performance guarantee to solve this problem. We use face detection as a case study and perform experiments on two real-world face detection datasets. The experimental results demonstrate that our algorithm SAFS outperforms state-of-the-art feature selection algorithms in cascade object detection, such as FFS and LACBoost.
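The greedy rule at the heart of submodular feature selection is easy to sketch; for a monotone submodular objective under a cardinality constraint it enjoys the classic (1 - 1/e) guarantee. The coverage objective below is a toy stand-in for a real node-learning criterion, not SAFS itself.

```python
# Classic greedy submodular maximization: repeatedly add the element with
# the largest marginal gain until k elements are chosen.
import numpy as np

def greedy_select(gain, ground_set, k):
    chosen = []
    for _ in range(k):
        best = max((f for f in ground_set if f not in chosen),
                   key=lambda f: gain(chosen + [f]) - gain(chosen))
        chosen.append(best)
    return chosen

# toy submodular objective: number of examples "covered" by chosen features
covers = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}, 3: {1, 2}}
gain = lambda S: len(set().union(*(covers[f] for f in S))) if S else 0
print(greedy_select(gain, list(covers), k=2))  # [0, 1] covers {1, 2, 3, 4}
```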
Automatic image annotation is an important problem in several machine learning applications, such as image search. Since there exists a semantic gap between low-level image features and high-level semantics, the descriptive ability of an image representation can largely affect annotation results. In fact, image representation learning and image tagging are two closely related tasks. A proper image representation can achieve better image annotation results, and image tags can be treated as guidance for learning a more effective image representation. In this paper, we present an optimal predictive subspace learning method which jointly conducts multi-view representation learning and image tagging. The two tasks can promote each other, and the annotation performance can be further improved. To make the subspace more compact and discriminative, both visual structure and semantic information are exploited during learning. Moreover, we introduce powerful predictors (SVMs) for image tagging to achieve better annotation performance. Experiments on standard image annotation datasets demonstrate the advantages of our method over existing image annotation methods.
Multi-view data is highly common nowadays, since various viewpoints and different sensors tend to facilitate better data representation. However, data from different views show a large divergence. Specifically, each sample lies in two kinds of structures, a class structure and a view structure, which are intertwined with one another in the original feature space. To address this, we develop a Robust Multi-view Subspace Learning algorithm (RMSL) through dual low-rank decompositions, which seeks a low-dimensional view-invariant subspace for multi-view data. Through dual low-rank decompositions, RMSL aims to disentangle the two intertwined structures from each other in the low-dimensional subspace. Furthermore, we develop two novel graph regularizers to guide the dual low-rank decompositions in a supervised fashion. In this way, the semantic gap across different views is mitigated, so that RMSL can preserve more within-class information and reduce the influence of view variance to seek a more robust low-dimensional subspace. Extensive experiments on two multi-view benchmarks, face and object images, demonstrate the superiority of our proposed algorithm in comparison with state-of-the-art algorithms.
When dealing with images and semantics, most computational systems attempt to automatically extract meaning from images. Here we attempt to go in the other direction and autonomously create images that communicate concepts. We present an enhanced semantic model that is used to generate novel images that convey meaning. We employ a vector space model and a large corpus to learn vector representations of words, and then train the semantic model to predict word vectors that could describe a given image. Once trained, the model autonomously guides the process of rendering images that convey particular concepts. A significant contribution is that, because of the semantic associations encoded in these word vectors, we can also render images that convey concepts on which the model was not explicitly trained. We evaluate the semantic model with an image clustering technique and demonstrate that the model is successful in creating images that communicate semantic relationships.
Obtaining a protein's 3D structure is crucial to understanding its functions and its interactions with other proteins. It is critical to accelerate the protein crystallization process, with improved accuracy, for understanding cancer and designing drugs. Systematic high-throughput approaches to protein crystallization have been widely applied, generating a large number of protein crystallization-trial images. An efficient and effective automatic analysis for these images is therefore a top priority. In this paper, we present a novel system, CrystalNet, for automatically labeling outcomes of protein crystallization-trial images. CrystalNet is a deep convolutional neural network that automatically extracts features from X-ray protein crystallization images for classification. We show that (1) CrystalNet can provide real-time labels for crystallization images effectively, requiring approximately 2 seconds to label all 1536 images of a crystallization microassay plate; and (2) compared with the state-of-the-art classification systems in crystallization image analysis, our technique demonstrates an improvement of 8% in accuracy, achieving 90.8% classification accuracy. As part of a high-throughput pipeline that generates millions of images a year, CrystalNet can lead to a substantial reduction in labor-intensive screening.
This paper adapts topic models to the psychometric testing of MOOC students based on their online forum postings. Measurement theory from education and psychology provides statistical models for quantifying a person's attainment of intangible attributes such as attitudes, abilities or intelligence. Such models infer latent skill levels by relating them to individuals' observed responses on a series of items such as quiz questions. A set of items can be used to measure a latent skill if individuals' responses on them conform to a Guttman scale. Such well-scaled items differentiate between individuals, and the inferred levels span the entire range from the most basic to the most advanced. In practice, education researchers manually devise items (quiz questions) while optimising for conformance to a Guttman scale. Due to the costly nature and expert requirements of this process, psychometric testing has found limited use in everyday teaching. We aim to develop usable measurement models for highly-instrumented MOOC delivery platforms by using participation in automatically-extracted online forum topics as items. The challenge is to formalise the Guttman scale educational constraint and incorporate it into topic models. To favour topics that automatically conform to a Guttman scale, we introduce a novel regularisation into non-negative matrix factorisation-based topic modelling. We demonstrate the suitability of our approach with quantitative experiments on three Coursera MOOCs, and with a qualitative survey of topic interpretability on two MOOCs via domain expert interviews.
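The base topic model here is standard multiplicative-update NMF; the sketch below shows where an extra regulariser enters those updates, using a simple L1 penalty on the topic-document matrix as a placeholder for the paper's Guttman-scale term.

```python
# NMF via multiplicative updates for ||X - WH||_F^2 with an L1 penalty on H.
# The lam term in H's denominator is where a custom regulariser would enter.
import numpy as np

def nmf(X, k, lam=0.1, iters=200, eps=1e-9):
    rng = np.random.default_rng(3)
    n, m = X.shape
    W, H = rng.random((n, k)), rng.random((k, m))
    for _ in range(iters):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # lam realises an L1 penalty on H, standing in for the paper's
        # Guttman-scale regulariser.
        H *= (W.T @ X) / (W.T @ W @ H + lam + eps)
    return W, H

X = np.abs(np.random.default_rng(4).normal(size=(30, 20)))
W, H = nmf(X, k=5)
print(np.linalg.norm(X - W @ H))  # reconstruction error
```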
Tagging has become increasingly important in many real-world applications, notably including web applications such as web blogs and resource sharing systems. Despite this importance, tagging methods often face difficult challenges, such as limited training samples and incomplete labels, which usually lead to degenerated performance on tag prediction. To improve generalization performance, in this paper we propose Regularized Marginalized Cross-View learning (RMCV), which jointly models attribute noise and label noise. In more detail, the proposed model constructs infinite training examples with attribute noise drawn from known exponential-family distributions, and exploits label noise via a marginalized denoising autoencoder. The model thereby gains robustness and alleviates the problem of tag sparsity. While RMCV is a general method for learning to tag, in the evaluations we focus on the specific application of multi-label text tagging. Extensive evaluations on three benchmark data sets demonstrate that RMCV achieves superior performance in comparison with state-of-the-art methods.
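The marginalized denoising autoencoder ingredient has a well-known closed form (in the style of Chen et al.'s mSDA), which the sketch below reproduces: the expected reconstruction loss under feature dropout with probability p is minimised analytically, giving a single linear map W = P Q^{-1} rather than sampling corrupted copies.

```python
# Closed-form marginalized denoising autoencoder (mSDA-style single layer).
import numpy as np

def mda(X, p):
    # X: d x n data matrix; append a constant row for the bias term
    Xb = np.vstack([X, np.ones((1, X.shape[1]))])
    d = Xb.shape[0]
    q = np.full((d, 1), 1.0 - p)
    q[-1] = 1.0                                 # bias feature never dropped
    S = Xb @ Xb.T                               # scatter matrix
    Q = S * (q @ q.T)                           # E[corrupted x corrupted^T]
    np.fill_diagonal(Q, q.ravel() * np.diag(S))
    P = S[:-1, :] * q.ravel()                   # E[clean x corrupted^T]
    W = np.linalg.solve(Q.T, P.T).T             # W = P Q^{-1}
    return np.tanh(W @ Xb)                      # denoised hidden representation

H = mda(np.random.default_rng(5).normal(size=(10, 100)), p=0.3)
print(H.shape)  # (10, 100)
```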
We address the problem of learning behaviour policies to optimise online metrics from heterogeneous usage data. While online metrics, e.g., click-through rate, can be optimised effectively using exploration data, such data is costly to collect in practice, as it temporarily degrades the user experience. Leveraging related data sources to improve online performance would be extremely valuable, but is not possible using current approaches. We formulate this task as a policy transfer learning problem, and propose a first solution, called collective noise contrastive estimation (collective NCE). NCE is an efficient solution to approximating the gradient of a log-softmax objective. Our approach jointly optimises embeddings of heterogeneous data to transfer knowledge from the source domain to the target domain. We demonstrate the effectiveness of our approach by learning an effective policy for an online radio station jointly from user-generated playlists, and usage data collected in an exploration bucket.
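NCE itself is compact enough to sketch: the intractable softmax is replaced by a binary task of distinguishing one true example from k noise samples, with model scores shifted by the noise log-probabilities. The inputs below are illustrative placeholders.

```python
# Noise contrastive estimation loss for one true sample vs. k noise samples.
import numpy as np

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)  # numerically stable log(sigmoid(x))

def nce_loss(s_pos, logq_pos, s_neg, logq_neg, k):
    # s_*: model scores; logq_*: log-probabilities under the noise distribution
    pos = log_sigmoid(s_pos - np.log(k) - logq_pos)             # true sample
    neg = log_sigmoid(-(s_neg - np.log(k) - logq_neg)).sum(-1)  # k noise samples
    return -(pos + neg).mean()

rng = np.random.default_rng(6)
loss = nce_loss(rng.normal(size=32), np.full(32, -5.0),
                rng.normal(size=(32, 10)), np.full((32, 10), -5.0), k=10)
print(loss)
```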
Linear submodular bandits have proven effective in solving the diversification and feature-based exploration problems in retrieval systems. Concurrently, many web-based applications, such as news article recommendation and online ad placement, can be modeled as budget-limited problems. However, the diversification problem under a budget constraint has not been considered. In this paper, we first introduce the budget constraint to linear submodular bandits as a new problem called linear submodular bandits with a knapsack constraint. We then define an alpha-approximation unit-cost regret, given that submodular function maximization is NP-hard. To solve this problem, we propose two greedy algorithms based on a modified UCB rule, and prove regret bounds for both; the two algorithms differ in their bounds and computational costs. We also conduct a number of experiments, and the experimental results confirm our theoretical analyses.
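One natural form of a budget-aware greedy rule with a modified UCB is sketched below: repeatedly pick the item whose optimistic marginal gain per unit cost is largest, until the budget runs out. The linear-gain model (theta, Ainv, alpha) and the marginal feature map are assumptions for illustration, not the paper's exact algorithms.

```python
# Greedy selection under a knapsack budget, scoring items by an upper
# confidence bound on marginal gain divided by cost.
import numpy as np

def greedy_ucb_knapsack(phi, costs, budget, theta, Ainv, alpha):
    chosen, spent = [], 0.0
    while True:
        best, best_val = None, -np.inf
        for i, c in enumerate(costs):
            if i in chosen or spent + c > budget:
                continue
            f = phi(i, chosen)  # marginal-gain features of item i given chosen set
            ucb = theta @ f + alpha * np.sqrt(f @ Ainv @ f)
            if ucb / c > best_val:
                best, best_val = i, ucb / c
        if best is None:
            return chosen
        chosen.append(best)
        spent += costs[best]

rng = np.random.default_rng(7)
F = rng.random((6, 4))
picked = greedy_ucb_knapsack(lambda i, S: F[i], np.full(6, 1.0), 3.0,
                             rng.random(4), np.eye(4), alpha=0.5)
print(picked)
```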
A key challenge in complex activity recognition is the fact that a complex activity can often be performed in several different ways, with each consisting of its own configuration of atomic actions and their temporal dependencies. This leads us to define an atomic activity-based probabilistic framework that employs Allen's interval relations to represent local temporal dependencies. The framework introduces a latent variable from the Chinese Restaurant Process to explicitly characterize these unique internal configurations of a particular complex activity as a variable number of tables. It can be analytically shown that the resulting interval network satisfies the transitivity property, and as a result, all local temporal dependencies can be retained and are globally consistent. Empirical evaluations on benchmark datasets suggest our approach significantly outperforms the state-of-the-art methods.
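The Chinese Restaurant Process prior is easy to state in code: each new observation joins an existing "table" (configuration) with probability proportional to its occupancy, or opens a new one with probability proportional to alpha. A minimal sketch:

```python
# Chinese Restaurant Process: a nonparametric prior over partitions whose
# number of clusters ("tables") grows with the data.
import numpy as np

def crp(n, alpha, seed=0):
    rng = np.random.default_rng(seed)
    tables, assign = [], []          # occupancy counts, table assignments
    for _ in range(n):
        probs = np.array(tables + [alpha], float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(tables):
            tables.append(1)         # open a new table (new configuration)
        else:
            tables[k] += 1
        assign.append(k)
    return assign, tables

assign, tables = crp(20, alpha=1.0)
print(tables)  # a handful of tables with varying occupancy
```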
Image-set classification is the assignment of a label to a given image set. In real-life scenarios, such as surveillance videos, each image set often contains much redundancy in terms of features and samples. This paper introduces a joint learning method for image-set classification that simultaneously learns compact binary codes and removes redundant samples. The joint objective function of our model consists of two main parts. The first part seeks a hashing function to generate binary codes with larger inter-class and smaller intra-class distances. The second part removes redundant samples under discrete constraints in a low-rank manner. A kernel method based on anchor points is further used to reduce sample variations. The proposed discrete objective function is simplified to a series of sub-problems that admit an analytical solution, resulting in a high-quality discrete solution with a low computational cost. Experiments on three commonly used image-set datasets show that the proposed method is efficient and effective for face recognition from image sets.
Style classification (e.g., architectural, music, fashion) has attracted increasing attention in both research and industry. Most existing works focus on composing low-level visual features for style representation. However, little effort has been devoted to automatically learning mid-level or high-level style features by reorganizing low-level descriptors. Moreover, styles are usually spread out and not easy to differentiate from one another. In this paper, we refer to such less representative images as weak style images. To address these issues, we propose a consensus style centralizing auto-encoder (CSCAE) to extract robust style features to facilitate weak style classification. CSCAE is an ensemble of several style centralizing auto-encoders (SCAEs) with a consensus constraint. Each SCAE centralizes each feature of a certain category in a progressive way. We apply our method to fashion style classification and manga style classification as two example applications. In addition, we collect a new dataset, Online Shopping, for fashion style classification evaluation, which will be made publicly available for vision-based fashion style research. Experiments demonstrate the effectiveness of SCAE and CSCAE on both public and newly collected datasets in comparison with the most recent state-of-the-art works.
How do people describe clothing? Words like "formal" or "casual" are typically used. However, recent works often focus on accurately recognizing or extracting visual features (e.g., sleeve length, color distribution and clothing pattern) from clothing images. How can we bridge the gap between visual features and aesthetic words? In this paper, we formulate this task as a novel three-level framework: visual features (VF) - image-scale space (ISS) - aesthetic words space (AWS). Leveraging the image-scale space from the art field as an intermediate layer, we first propose a Stacked Denoising Autoencoder Guided by Correlative Labels (SDAE-GCL) to map the visual features to the image-scale space; then, according to the semantic distances computed by WordNet::Similarity, we map the aesthetic words most often used in online clothing shops to the image-scale space as well. Using upper-body menswear images downloaded from several global online clothing shops as experimental data, the results indicate that the proposed three-level framework helps to capture the subtle relationship between visual features and aesthetic words better than several baselines. To demonstrate that our three-level framework and its implementation methods are universally applicable, we conclude with some interesting analyses of fashion trends in menswear over the last 10 years.
In this paper, we investigate the use of autoencoders in modeling textual data. Traditional autoencoders suffer from at least two problems: scalability with the high dimensionality of the vocabulary, and dealing with task-irrelevant words. We address these problems by introducing supervision via the loss function of autoencoders. In particular, we first train a linear classifier on the labeled data, then define a loss for the autoencoder with the weights learned from the linear classifier. To reduce the bias introduced by a single classifier, we define a posterior probability distribution on the weights of the classifier, and derive the marginalized loss of the autoencoder with a Laplace approximation. We show that our choice of loss function can be rationalized from the perspective of Bregman divergence, which justifies the soundness of our model. We evaluate the effectiveness of our model on six sentiment analysis datasets, and show that our model significantly outperforms all the competing methods with respect to classification accuracy. We also show that our model is able to take advantage of unlabeled data to obtain improved performance. We further show that our model successfully learns highly discriminative feature maps, which explains its superior performance.
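A minimal sketch of the supervision idea, with a single fixed weight vector standing in for the paper's marginalised posterior over classifiers: reconstruction errors are weighted by the magnitude of linear-classifier weights, so the autoencoder spends capacity on label-relevant words.

```python
# Classifier-weighted reconstruction loss for a supervised autoencoder.
import numpy as np

def weighted_ae_loss(x, x_hat, clf_w):
    # emphasise dimensions (words) the classifier found discriminative
    importance = np.abs(clf_w) / np.abs(clf_w).sum()
    return np.sum(importance * (x - x_hat) ** 2, axis=-1).mean()

rng = np.random.default_rng(8)
x, x_hat = rng.random((16, 100)), rng.random((16, 100))
clf_w = rng.normal(size=100)   # weights of a pre-trained linear classifier
print(weighted_ae_loss(x, x_hat, clf_w))
```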
Social networks often provide group features to help users with similar interests associate and consume content together. Recommending groups to users poses challenges due to their complex relationship: user-group affinity is typically measured implicitly and varies with time; similarly, group characteristics change as users join and leave. To tackle these challenges, we adapt existing matrix factorization techniques to learn user-group affinity based on two different implicit engagement metrics: (i) which group-provided content users consume; and (ii) which content users provide to groups. To capture the temporally extended nature of group engagement, we implement a time-varying factorization. We test the assertion that latent preferences for groups and users are sparse by investigating elastic-net regularization. Experiments using data from DeviantArt indicate that the time-varying implicit engagement-based model provides the best top-K group recommendations, illustrating the benefit of the added model complexity.
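A compact sketch of matrix factorisation with elastic-net regularisation on the latent factors, the combination tested above: the L2 part enters the gradient, and the L1 part is applied as a proximal soft-threshold step after each SGD update. Hyperparameters and the tiny dataset are toy values, not the paper's.

```python
# SGD matrix factorisation with elastic-net (L1 + L2) regularised factors.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mf_elastic_net(ratings, n_users, n_groups, k=8, lr=0.05,
                   l1=1e-3, l2=1e-2, epochs=20, seed=9):
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 0.1, (n_users, k))
    V = rng.normal(0, 0.1, (n_groups, k))
    for _ in range(epochs):
        for u, g, r in ratings:                   # (user, group, affinity)
            err = r - U[u] @ V[g]
            U[u] += lr * (err * V[g] - l2 * U[u])
            V[g] += lr * (err * U[u] - l2 * V[g])
            U[u] = soft_threshold(U[u], lr * l1)  # proximal L1 step
            V[g] = soft_threshold(V[g], lr * l1)
    return U, V

U, V = mf_elastic_net([(0, 0, 1.0), (0, 1, 0.0), (1, 1, 1.0)], 2, 2)
print(U @ V.T)  # reconstructed user-group affinities
```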
Large-scale Nuclear Norm penalized Least Squares (NNLS) problems are frequently encountered in the estimation of low-rank structures. In this paper we accelerate the solution procedure by combining non-smooth convex optimization with smooth Riemannian methods. Our method comprises two phases. In the first phase, we use the Alternating Direction Method of Multipliers (ADMM) both to identify the fixed-rank manifold where an optimum resides and to provide an initializer for the subsequent refinement. In the second phase, two superlinearly convergent Riemannian methods, Riemannian Newton (NT) and Riemannian Conjugate Gradient descent (CG), are adopted to improve the approximation over the fixed-rank manifold. We prove that our hybrid method of ADMM and NT (HADMNT) converges to an optimum of NNLS at least quadratically. Experiments on large-scale collaborative filtering datasets demonstrate very competitive performance of these fast hybrid methods compared to the state of the art.
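Phase one can be sketched for the matrix-completion instance of NNLS: ADMM alternates a closed-form data-fit step with singular value thresholding, and the rank of its output can seed a fixed-rank Riemannian refinement. Problem sizes and parameters below are illustrative, not the paper's.

```python
# ADMM for  min_X 0.5*||mask*(X - Y)||_F^2 + lam*||Z||_*  s.t.  X = Z.
import numpy as np

def svt(A, tau):
    # singular value thresholding: prox of the nuclear norm
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def admm_nnls(Y, mask, lam=1.0, rho=1.0, iters=100):
    X = np.zeros_like(Y)
    Z = np.zeros_like(Y)
    U = np.zeros_like(Y)
    for _ in range(iters):
        X = (mask * Y + rho * (Z - U)) / (mask + rho)  # elementwise data fit
        Z = svt(X + U, lam / rho)                      # nuclear-norm prox
        U += X - Z                                     # dual ascent
    return Z

rng = np.random.default_rng(10)
L = rng.normal(size=(30, 3)) @ rng.normal(size=(3, 30))  # rank-3 ground truth
mask = (rng.random(L.shape) < 0.5).astype(float)
Z = admm_nnls(mask * L, mask, lam=0.5)
print(np.linalg.matrix_rank(Z, tol=1e-3))  # low rank identifies the manifold
```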
We present a discriminative nonparametric latent feature relational model (LFRM) for link prediction that automatically infers the dimensionality of latent features. Under the generic RegBayes (regularized Bayesian inference) framework, we handily incorporate the prediction loss into probabilistic inference of a Bayesian model; set distinct regularization parameters for different types of links to handle the imbalance issue in real networks; and unify the analysis of both the smooth logistic log-loss and the piecewise linear hinge loss. For the nonconjugate posterior inference, we present a simple Gibbs sampler via data augmentation, without making the restrictive assumptions used in variational methods. We further develop an approximate sampler using stochastic gradient Langevin dynamics to handle large networks with hundreds of thousands of entities and millions of links, orders of magnitude larger than what existing LFRM models can process. Extensive studies on various real networks show promising performance.
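The SGLD update used for scaling is compact: a stochastic gradient step on the log-posterior with minibatch gradients rescaled by N/n, plus Gaussian noise whose variance matches the step size. The Gaussian toy model below is illustrative, not the LFRM posterior.

```python
# Stochastic gradient Langevin dynamics on a toy Gaussian posterior.
import numpy as np

def sgld_step(theta, batch, N, eps, rng):
    n = len(batch)
    grad_prior = -theta                                 # N(0, I) prior
    grad_lik = (N / n) * np.sum(batch - theta, axis=0)  # rescaled minibatch grad
    noise = rng.normal(0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * (grad_prior + grad_lik) + noise

rng = np.random.default_rng(11)
data = rng.normal(2.0, 1.0, size=(100000, 1))           # large dataset stand-in
theta = np.zeros(1)
for t in range(2000):
    batch = data[rng.integers(0, len(data), size=64)]
    theta = sgld_step(theta, batch, len(data), eps=1e-5, rng=rng)
print(theta)  # samples concentrate near the posterior mean (about 2.0)
```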