Total: 45

Hurricanes are cyclones that originate over tropical and subtropical waters and circulate about a defined center, with closed wind speeds exceeding 75 mph. At landfall, hurricanes can result in severe disasters. Accurately predicting their trajectories is critical to reducing economic loss and saving human lives. Given the complexity and nonlinearity of weather data, a recurrent neural network (RNN) could be beneficial in modeling hurricane behavior. We propose the application of a fully connected RNN to predict the trajectory of hurricanes. We employed the RNN over a fine grid to reduce typical truncation errors. We used the latitude, longitude, wind speed, and pressure of hurricanes, publicly provided by the National Hurricane Center (NHC), to predict the trajectory of a hurricane at 6-hour intervals. Results show that the proposed technique is competitive with methods currently employed by the NHC and can predict up to approximately 120 hours of hurricane path.

In this work, we consider applying machine learning to the analysis and compression of audio signals in the context of monitoring elephants in sub-Saharan Africa. Earth’s biodiversity is increasingly under threat by sources of anthropogenic change (e.g. resource extraction, land use change, and climate change) and surveying animal populations is critical for developing conservation strategies. However, manually monitoring tropical forests or deep oceans is intractable. For species that communicate acoustically, researchers have argued for placing audio recorders in the habitats as a cost-effective and non-invasive method, a strategy known as passive acoustic monitoring (PAM). In collaboration with conservation efforts, we construct a large labeled dataset of passive acoustic recordings of the African Forest Elephant via crowdsourcing, comprising thousands of hours of recordings in the wild. Using state-of-the-art techniques in artificial intelligence, we improve upon previously proposed methods for passive acoustic monitoring for classification and segmentation. In real-time detection of elephant calls, network bandwidth quickly becomes a bottleneck and efficient ways to compress the data are needed. Most audio compression schemes are aimed at human listeners and are unsuitable for low-frequency elephant calls. To remedy this, we provide a novel end-to-end differentiable method for compression of audio signals that can be adapted to acoustic monitoring of any species and dramatically improves over naive coding strategies.

Traffic prediction is of great importance to traffic management and public safety, and it is very challenging because it is affected by many complex factors, such as the spatial dependency of complicated road networks and temporal dynamics. These factors make traffic prediction a challenging task due to the uncertainty and complexity of traffic states. In the literature, many research works have applied deep learning methods to traffic prediction problems, combining convolutional neural networks (CNNs) with recurrent neural networks (RNNs), in which CNNs are used for spatial dependency and RNNs for temporal dynamics. However, such combinations cannot capture the connectivity and globality of traffic networks. In this paper, we first propose to adopt residual recurrent graph neural networks (Res-RGNN) that can capture graph-based spatial dependencies and temporal dynamics jointly. Due to vanishing gradients, RNNs struggle to capture periodic temporal correlations. Hence, we further introduce a novel hop scheme into Res-RGNN to utilize the periodic temporal dependencies. Based on Res-RGNN and hop Res-RGNN, we finally propose a novel end-to-end multiple Res-RGNNs framework, referred to as “MRes-RGNN”, for traffic prediction. Experimental results on two traffic datasets have demonstrated that the proposed MRes-RGNN outperforms state-of-the-art methods significantly.

Citizen science projects are successful at gathering rich datasets for various applications. However, the data collected by citizen scientists are often biased — in particular, aligned more with the citizens’ preferences than with scientific objectives. We propose the Shift Compensation Network (SCN), an end-to-end learning scheme which learns the shift from the scientific objectives to the biased data while compensating for the shift by re-weighting the training data. Applied to bird observational data from the citizen science project eBird, we demonstrate how SCN quantifies the data distribution shift and outperforms supervised learning models that do not address the data bias. Compared with competing models in the context of covariate shift, we further demonstrate the advantage of SCN in both its effectiveness and its capability of handling massive high-dimensional data.

Centrality metrics are among the main tools in social network analysis. Being central for a user of a network leads to several benefits to the user: central users are highly influential and play key roles within the network. Therefore, the optimization problem of increasing the centrality of a network user recently received considerable attention. Given a network and a target user v, the centrality maximization problem consists in creating k new links incident to v in such a way that the centrality of v is maximized, according to some centrality metric. Most of the algorithms proposed in the literature are based on showing that a given centrality metric is monotone and submodular with respect to link addition. However, this property does not hold for several shortest-path based centrality metrics if the links are undirected. In this paper we study the centrality maximization problem in undirected networks for one of the most important shortest-path based centrality measures, the coverage centrality. We provide several hardness and approximation results. We first show that the problem cannot be approximated within a factor greater than 1 − 1/e, unless P = NP, and, under the stronger gap-ETH hypothesis, the problem cannot be approximated within a factor better than 1/n^{o(1)}, where n is the number of users. We then propose two greedy approximation algorithms, and show that, by suitably combining them, we can guarantee an approximation factor of Ω(1/√n). We experimentally compare the solutions provided by our approximation algorithm with optimal solutions computed by means of an exact IP formulation. We show that our algorithm produces solutions that are very close to the optimum.
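
As a rough illustration of the centrality maximization problem stated above, the following Python sketch computes a coverage-style centrality and adds k links incident to v one at a time. It is not the pair of greedy algorithms from the abstract, whose details are not given here. It assumes one common reading of coverage centrality (the number of unordered pairs {s, t}, both different from v, that have at least one shortest path through v), and the function names `coverage_centrality` and `greedy_add_links` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def coverage_centrality(adj, v):
    """Count pairs {s, t} (s, t != v) with some shortest path through v.

    Assumed definition: v covers {s, t} when d(s, v) + d(v, t) == d(s, t).
    """
    dist = {u: bfs_distances(adj, u) for u in adj}
    others = [u for u in adj if u != v]
    count = 0
    for i, s in enumerate(others):
        for t in others[i + 1:]:
            reachable = t in dist[s] and v in dist[s] and t in dist[v]
            if reachable and dist[s][v] + dist[v][t] == dist[s][t]:
                count += 1
    return count


def greedy_add_links(adj, v, k):
    """Add up to k links incident to v, each time taking the best single link."""
    adj = {u: set(ws) for u, ws in adj.items()}  # work on a copy
    added = []
    for _ in range(k):
        candidates = [u for u in adj if u != v and u not in adj[v]]
        if not candidates:
            break

        def centrality_with(u):
            trial = {x: set(ws) for x, ws in adj.items()}
            trial[v].add(u)
            trial[u].add(v)
            return coverage_centrality(trial, v)

        best = max(candidates, key=centrality_with)
        adj[v].add(best)
        adj[best].add(v)
        added.append(best)
    return added, coverage_centrality(adj, v)


# Path graph 0-1-2-3-4; vertex 0 initially covers no pair.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
new_links, centrality = greedy_add_links(path, 0, 1)
```

Because the abstract notes that the objective is not submodular in undirected networks, a plain greedy loop like this carries no approximation guarantee by itself; it only makes the problem statement concrete.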

We consider the problem of how decision making can be fair when the underlying probabilistic model of the world is not known with certainty. We argue that recent notions of fairness in machine learning need to explicitly incorporate parameter uncertainty, hence we introduce the notion of Bayesian fairness as a suitable candidate for fair decision rules. Using balance, a definition of fairness introduced in (Kleinberg, Mullainathan, and Raghavan 2016), we show how a Bayesian perspective can lead to well-performing and fair decision rules even under high uncertainty.

Massive open online courses (MOOCs) have developed rapidly in recent years, and have attracted millions of online users. However, a central challenge is the extremely high dropout rate: recent reports show that the completion rate in MOOCs is below 5% (Onah, Sinclair, and Boyatt 2014; Kizilcec, Piech, and Schneider 2013; Seaton et al. 2014). What are the major factors that cause the users to drop out? What are the major motivations for the users to study in MOOCs? In this paper, employing a dataset from XuetangX, one of the largest MOOCs in China, we conduct a systematic study of the dropout problem in MOOCs. We found that the users’ learning behavior can be clustered into several distinct categories. Our statistics also reveal high correlation between dropouts of different courses and strong influence between friends’ dropout behaviors. Based on the gained insights, we propose a Context-aware Feature Interaction Network (CFIN) to model and to predict users’ dropout behavior. CFIN utilizes a context-smoothing technique to smooth feature values with different context, and uses an attention mechanism to combine user and course information into the modeling framework. Experiments on two large datasets show that the proposed method achieves better performance than several state-of-the-art methods. The proposed model has been deployed on a real system to help improve user retention.

We provide a formal definition of blameworthiness in settings where multiple agents can collaborate to avoid a negative outcome. We first provide a method for ascribing blameworthiness to groups relative to an epistemic state (a distribution over causal models that describe how the outcome might arise). We then show how we can go from an ascription of blameworthiness for groups to an ascription of blameworthiness for individuals using a standard notion from cooperative game theory, the Shapley value. We believe that getting a good notion of blameworthiness in a group setting will be critical for designing autonomous agents that behave in a moral manner.
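
The step from group-level to individual-level ascriptions relies on the Shapley value, which can be sketched in a few lines of Python. The sketch below is the textbook computation (average marginal contribution over all orderings of the agents), not the paper's definition of blameworthiness; the function `toy_group_blame` is a made-up characteristic function used only for illustration.

```python
from fractions import Fraction
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values via marginal contributions over all orderings.

    `value` maps a frozenset of players to a number, with value(frozenset()) == 0.
    Runs in O(n!) time, so it is only suitable for small groups.
    """
    totals = {p: Fraction(0) for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += Fraction(value(with_p)) - Fraction(value(coalition))
            coalition = with_p
    return {p: total / len(orderings) for p, total in totals.items()}


def toy_group_blame(group):
    """Hypothetical group score: 1 if agent 'a' acts together with 'b' or 'c'."""
    return int("a" in group and ("b" in group or "c" in group))


individual = shapley_values(["a", "b", "c"], toy_group_blame)
# individual == {"a": 2/3, "b": 1/6, "c": 1/6}; the shares sum to the
# score of the full group, which is 1.
```

The efficiency property visible in the last comment (individual shares add up to the value of the whole group) is one reason the Shapley value is a natural tool for dividing a group-level quantity among its members.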

The inverse geodesic length (IGL) is a well-known and widely used measure of network performance. It equals the sum of the inverse distances of all pairs of vertices. In network analysis, the IGL of a network is often used to assess and evaluate how well heuristics perform in strengthening or weakening a network. We consider the edge-deletion problem MINIGL-ED. Formally, given a graph G, a budget k, and a target inverse geodesic length T, the question is whether there exists a subset of edges X with |X| ≤ k, such that the inverse geodesic length of G − X is at most T. In this paper, we design algorithms and study the complexity of MINIGL-ED. We show that it is NP-complete and cannot be solved in subexponential time even when restricted to bipartite or split graphs assuming the Exponential Time Hypothesis. In terms of parameterized complexity, we consider the problem with respect to various parameters. We show that MINIGL-ED is fixed-parameter tractable for parameter T and vertex cover by modeling the problem as an integer quadratic program. We also provide FPT algorithms parameterized by twin cover and neighborhood diversity combined with the deletion budget k. On the negative side we show that MINIGL-ED is W[1]-hard for parameter tree-width.
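
To make the quantity concrete, here is a minimal Python sketch that computes the IGL of an unweighted, undirected graph by breadth-first search and checks the effect of deleting a set of edges. It assumes the usual convention that disconnected pairs contribute 0 (that is, 1/∞ = 0); the helper names are hypothetical, and the sketch only evaluates a given deletion set rather than solving the decision problem.

```python
from collections import deque
from fractions import Fraction


def inverse_geodesic_length(adj):
    """Sum of 1/d(u, v) over unordered vertex pairs of an unweighted graph.

    `adj` maps each vertex to the set of its neighbours (undirected).
    Unreachable pairs contribute 0 (assumed convention).
    """
    total = Fraction(0)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(Fraction(1, d) for d in dist.values() if d > 0)
    return total / 2  # every unordered pair was counted from both endpoints


def remove_edges(adj, edges):
    """Return a copy of `adj` with the given undirected edges deleted."""
    pruned = {u: set(ws) for u, ws in adj.items()}
    for u, w in edges:
        pruned[u].discard(w)
        pruned[w].discard(u)
    return pruned


# Path graph 0-1-2-3: distances 1, 1, 1, 2, 2, 3 give IGL = 3 + 1 + 1/3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
igl_before = inverse_geodesic_length(path)                          # 13/3
igl_after = inverse_geodesic_length(remove_edges(path, [(1, 2)]))   # 2
```

In this toy instance, deleting the single middle edge (budget k = 1) brings the IGL from 13/3 down to 2, so the instance would be a yes-instance for any target T ≥ 2.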

Susceptibility of deep neural networks to adversarial attacks poses a major theoretical and practical challenge. All efforts to harden classifiers against such attacks have seen limited success to date. Two distinct categories of samples against which deep neural networks are vulnerable, “adversarial samples” and “fooling samples”, have been tackled separately so far due to the difficulty posed when considered together. In this work, we show how one can defend against them both under a unified framework. Our model has the form of a variational autoencoder with a Gaussian mixture prior on the latent variable, such that each mixture component corresponds to a single class. We show how selective classification can be performed using this model, thereby causing the adversarial objective to entail a conflict. The proposed method leads to the rejection of adversarial samples instead of misclassification, while maintaining high precision and recall on test data. It also inherently provides a way of learning a selective classifier in a semi-supervised scenario, which can similarly resist adversarial attacks. We further show how one can reclassify the detected adversarial samples by iterative optimization.

Migration presents sweeping societal challenges that have recently attracted significant attention from the scientific community. One of the prominent approaches that have been suggested employs optimization and machine learning to match migrants to localities in a way that maximizes the expected number of migrants who find employment. However, it relies on a strong additivity assumption that, we argue, does not hold in practice, due to competition effects; we propose to enhance the data-driven approach by explicitly optimizing for these effects. Specifically, we cast our problem as the maximization of an approximately submodular function subject to matroid constraints, and prove that the worst-case guarantees given by the classic greedy algorithm extend to this setting. We then present three different models for competition effects, and show that they all give rise to submodular objectives. Finally, we demonstrate via simulations that our approach leads to significant gains across the board.

The Electrocardiogram (ECG) is performed routinely by medical personnel to identify structural, functional and electrical cardiac events. Many attempts were made to automate this task using machine learning algorithms including classic supervised learning algorithms and deep neural networks, reaching state-of-the-art performance. The ECG signal conveys the specific electrical cardiac activity of each subject; thus, extreme variations are observed between patients. These variations are challenging for deep learning algorithms, and impede generalization. In this work, we propose a semi-supervised approach for patient-specific ECG classification. We propose a generative model that learns to synthesize patient-specific ECG signals, which can then be used as additional training data to improve a patient-specific classifier’s performance. Empirical results show that the generated signals significantly improve ECG classification in a patient-specific setting.

This paper examines the impact of tolls on social welfare in the context of a transportation network in which only a portion of the agents are subject to tolls. More specifically, this paper addresses the question: which subset of agents provides the most system benefit if they are compliant with an approximate marginal cost tolling scheme? Since previous work suggests this problem is NP-hard, we examine a heuristic approach. Our experimental results on three real-world traffic scenarios suggest that evaluating the marginal impact of a given agent serves as a particularly strong heuristic for selecting an agent to be compliant. Results from using this heuristic for selecting 7.6% of the agents to be compliant achieved an increase of up to 10.9% in social welfare over not tolling at all. The presented heuristic approach and conclusions can help practitioners target specific agents to participate in an opt-in tolling scheme.

New technologies drastically change recruitment techniques. Some research projects aim at designing interactive systems that help candidates practice job interviews. Other studies aim at the automatic detection of social signals (e.g. smile, turn of speech, etc.) in videos of job interviews. These studies are limited with respect to the number of interviews they process, but also by the fact that they only analyze simulated job interviews (e.g. students pretending to apply for a fake position). Asynchronous video interviewing tools have become mature products on the human resources market, and thus, a popular step in the recruitment process. As part of a project to help recruiters, we collected a corpus of more than 7000 candidates having asynchronous video job interviews for real positions and recording videos of themselves answering a set of questions. We propose a new hierarchical attention model called HireNet that aims at predicting the hirability of the candidates as evaluated by recruiters. In HireNet, an interview is considered as a sequence of questions and answers containing salient social signals. Two contextual sources of information are modeled in HireNet: the words contained in the question and in the job position. Our model achieves better F1-scores than previous approaches for each modality (verbal content, audio and video). Results from early and late multimodal fusion suggest that more sophisticated fusion schemes are needed to improve on the monomodal results. Finally, some examples of moments captured by the attention mechanisms suggest our model could potentially be used to help find key moments in an asynchronous job interview.

To learn the underlying parent-child influence relationships between nodes in a diffusion network, most existing approaches require timestamps that pinpoint the exact time when node infections occur in historical diffusion processes. In many real-world diffusion processes like the spread of epidemics, monitoring such temporal infection information is often expensive and difficult. In this work, we study how to carry out diffusion network inference without infection timestamps, using only the final infection statuses of nodes in each historical diffusion process, which are more readily accessible in practice. Our main result is a probabilistic model that can find for each node an appropriate number of most probable parent nodes, which are most likely to have generated the historical infection results of the node. Extensive experiments on both synthetic and real-world networks are conducted, and the results verify the effectiveness and efficiency of our approach.

Large, labeled datasets have driven deep learning methods to achieve expert-level performance on a variety of medical imaging tasks. We present CheXpert, a large dataset that contains 224,316 chest radiographs of 65,240 patients. We design a labeler to automatically detect the presence of 14 observations in radiology reports, capturing uncertainties inherent in radiograph interpretation. We investigate different approaches to using the uncertainty labels for training convolutional neural networks that output the probability of these observations given the available frontal and lateral radiographs. On a validation set of 200 chest radiographic studies which were manually annotated by 3 board-certified radiologists, we find that different uncertainty approaches are useful for different pathologies. We then evaluate our best model on a test set composed of 500 chest radiographic studies annotated by a consensus of 5 board-certified radiologists, and compare the performance of our model to that of 3 additional radiologists in the detection of 5 selected pathologies. On Cardiomegaly, Edema, and Pleural Effusion, the model ROC and PR curves lie above all 3 radiologist operating points. We release the dataset to the public as a standard benchmark to evaluate performance of chest radiograph interpretation models.

The formation of a complex network is highly driven by multi-aspect node influences and interactions, reflected on network structures and the content embodied in network nodes. Limited work has jointly modeled all these aspects; existing work typically focuses on topological structures but overlooks the heterogeneous interactions behind node linkage and contributions of node content to the interactive heterogeneities. Here, we propose a multi-aspect interaction and influence-unified evolutionary coupled system (MAI-ECS) for network representation by involving node content and linkage-based network structure. MAI-ECS jointly and iteratively learns two systems: a multi-aspect interaction learning system to capture heterogeneous hidden interactions between nodes and an influence propagation system to capture multi-aspect node influences and their propagation between nodes. MAI-ECS couples, unifies and optimizes the two systems toward an effective representation of explicit node content and network structure, and implicit node interactions and influences. MAI-ECS shows superior performance in node classification and link prediction in comparison with the state-of-the-art methods on two real-world datasets. Further, we demonstrate the semantic interpretability of the results generated by MAI-ECS.

With the rapid expansion of mobile phone networks in developing countries, large-scale graph machine learning has gained sudden relevance in the study of global poverty. Recent applications range from humanitarian response and poverty estimation to urban planning and epidemic containment. Yet the vast majority of computational tools and algorithms used in these applications do not account for the multi-view nature of social networks: people are related in myriad ways, but most graph learning models treat relations as binary. In this paper, we develop a graph-based convolutional network for learning on multi-view networks. We show that this method outperforms state-of-the-art semi-supervised learning algorithms on three different prediction tasks using mobile phone datasets from three different developing countries. We also show that, while designed specifically for use in poverty research, the algorithm also outperforms existing benchmarks on a broader set of learning tasks on multi-view networks, including node labelling in citation networks.

Trends in terrestrial temperature variability are perhaps more relevant for species viability than trends in mean temperature. In this paper, we develop methodology for estimating such trends using multi-resolution climate data from polar orbiting weather satellites. We derive two novel algorithms for computation that are tailored for dense, gridded observations over both space and time. We evaluate our methods with a simulation that mimics these data’s features and on a large, publicly available, global temperature dataset with the eventual goal of tracking trends in cloud reflectance temperature variability.

Modern statistical and machine learning methods are increasingly capable of modeling individual or personalized treatment effects. These predictions could be used to allocate different interventions across populations based on individual characteristics. In many domains, like social services, the availability of different possible interventions can be severely resource limited. This paper considers possible improvements to the allocation of such services in the context of homelessness service provision in a major metropolitan area. Using data from the homeless system, we use a counterfactual approach to show potential for substantial benefits in terms of reducing the number of families who experience repeat episodes of homelessness by choosing optimal allocations (based on predicted outcomes) to a fixed number of beds in different types of homelessness service facilities. Such changes in the allocation mechanism would not be without tradeoffs, however; a significant fraction of households are predicted to have a higher probability of re-entry in the optimal allocation than in the original one. We discuss the efficiency, equity and fairness issues that arise and consider potential implications for policy.

Diffusion imaging and tractography enable mapping structural connections in the human brain, in-vivo. Linear Fascicle Evaluation (LiFE) is a state-of-the-art approach for pruning spurious connections in the estimated structural connectome, by optimizing its fit to the measured diffusion data. Yet, LiFE imposes heavy demands on computing time, precluding its use in analyses of large connectome databases. Here, we introduce a GPU-based implementation of LiFE that achieves 50-100x speedups over conventional CPU-based implementations for connectome sizes of up to several million fibers. Briefly, the algorithm accelerates generalized matrix multiplications on a compressed tensor through efficient GPU kernels, while ensuring favorable memory access patterns. Leveraging these speedups, we advance LiFE’s algorithm by imposing a regularization constraint on estimated fiber weights during connectome pruning. Our regularized, accelerated, LiFE algorithm (“ReAl-LiFE”) estimates sparser connectomes that also provide more accurate fits to the underlying diffusion signal. We demonstrate the utility of our approach by classifying pathological signatures of structural connectivity in patients with Alzheimer’s Disease (AD). We estimated million fiber whole-brain connectomes, followed by pruning with ReAl-LiFE, for 90 individuals (45 AD patients and 45 healthy controls). Linear classifiers, based on support vector machines, achieved over 80% accuracy in classifying AD patients from healthy controls based on their ReAl-LiFE pruned structural connectomes alone. Moreover, classification based on the ReAl-LiFE pruned connectome outperformed both the unpruned connectome, as well as the LiFE pruned connectome, in terms of accuracy. We propose our GPU-accelerated approach as a widely relevant tool for non-negative least squares optimization, across many domains.

Current Internet market makers are facing an intensely competitive environment, where personalized price reductions or discounted coupons are provided by their peers to attract more customers. Much is invested in keeping up with competitors, but participants in such a price cut war are often incapable of winning due to their lack of information about others’ strategies or customers’ preferences. We formalize the problem as a stochastic game with imperfect and incomplete information and develop a variant of Latent Dirichlet Allocation (LDA) to infer latent variables under the current market environment, which represent preferences of customers and strategies of competitors. Tests on simulated experiments and an open real-world dataset show that, by subsuming all available market information of the market maker’s competitors, our model exhibits a significant improvement for understanding the market environment and finding the best response strategies in the Internet price war. Our work marks the first successful learning method to infer latent information in the environment of price war by the LDA modeling, and sets an example for related competitive applications to follow.

Geographic information systems (GIS) research is widely used within the social and physical sciences and plays a crucial role in the development and implementation by governments of economic, education, environment and transportation policy. While machine learning methods have been applied to GIS datasets, the uptake of powerful deep learning CNN methodologies has been limited, in part due to challenges posed by the complex and often poorly structured nature of the data. In this paper, we demonstrate the utility of graph convolutional neural networks (GCNNs) for GIS analysis via a multi-graph hierarchical spatial-filter GCNN model that predicts election outcomes using socio-economic features drawn from the 2016 Australian Census. We report a marked improvement in performance accuracy of Hierarchical GCNNs over benchmark generalised linear models and standard GCNNs, especially in semi-supervised tasks. These results indicate the widespread potential for GIS-GCNN research methods to enrich socio-economic GIS analysis, aiding the social sciences and policy development.

Blame games tend to follow major disruptions, be they financial crises, natural disasters or terrorist attacks. Studying how the blame game evolves and shapes the dominant crisis narratives is of great significance, as sense-making processes can affect regulatory outcomes, social hierarchies, and cultural norms. However, it takes tremendous time and effort for social scientists to manually examine each relevant news article and extract the blame ties (A blames B). In this study, we define a new task, Blame Tie Extraction, and construct a new dataset related to the United States financial crisis (2007-2010) from The New York Times, The Wall Street Journal and USA Today. We build a Bi-directional Long Short-Term Memory (BiLSTM) network over the contexts in which the entities appear, and it learns to automatically extract such blame ties at the document level. Leveraging large unsupervised models such as GloVe and ELMo, our best model achieves an F1 score of 70% on the test set for blame tie extraction, making it a useful tool for social scientists to extract blame ties more efficiently.

Though there is a growing literature on fairness for supervised learning, incorporating fairness into unsupervised learning has been less well-studied. This paper studies fairness in the context of principal component analysis (PCA). We first define fairness for dimensionality reduction, and our definition can be interpreted as saying a reduction is fair if information about a protected class (e.g., race or gender) cannot be inferred from the dimensionality-reduced data points. Next, we develop convex optimization formulations that can improve the fairness (with respect to our definition) of PCA and kernel PCA. These formulations are semidefinite programs, and we demonstrate their effectiveness using several datasets. We conclude by showing how our approach can be used to perform a fair (with respect to age) clustering of health data that may be used to set health insurance rates.