From the observation that news readers tend not to click outdated news, we propose the notion of the 'lifetime' of news, with two hypotheses: (i) news has a shorter lifetime than other types of items such as movies or e-commerce products; (ii) a news item only competes with other news whose lifetimes have not ended and overlap with its own (i.e., limited competition). After further developing the characteristics of the lifetime of news, we present a novel approach for news recommendation, namely the Lifetime-Aware News reCommEndeR System (LANCER), which carefully exploits the lifetime of news during training and recommendation. Using real-world news datasets (e.g., Adressa and MIND), we demonstrate that state-of-the-art news recommendation models benefit significantly from integrating the notion of lifetime via LANCER, with up to about 40% improvement in recommendation accuracy.
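The abstract does not specify how lifetimes are operationalized; as a minimal illustrative sketch (assuming a fixed per-article lifetime window and hypothetical field names, not LANCER's actual design), candidate filtering at recommendation time might look like this:

```python
from datetime import datetime, timedelta

# Hypothetical sketch: restrict the candidate pool to news whose lifetime has not
# ended at recommendation time. The fixed `lifetime` window and the field names
# below are illustrative assumptions, not LANCER's actual mechanism.
def filter_live_candidates(candidates, now, lifetime=timedelta(hours=36)):
    """Keep only news articles whose publication time is within `lifetime` of `now`."""
    return [
        article for article in candidates
        if article["published_at"] <= now < article["published_at"] + lifetime
    ]

candidates = [
    {"id": "n1", "published_at": datetime(2023, 1, 10, 8)},
    {"id": "n2", "published_at": datetime(2023, 1, 5, 8)},   # likely expired
]
live = filter_live_candidates(candidates, now=datetime(2023, 1, 11, 8))
print([a["id"] for a in live])  # -> ['n1']
```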
Cross-domain recommendation (CDR) aims to alleviate data sparsity by transferring knowledge from an informative source domain to the target domain, which inevitably poses serious challenges to data privacy and transferability during the transfer process. A small number of recent CDR works have investigated privacy protection, but they still fall short of practical requirements (e.g., they have limited privacy-preserving ability) and do not prevent the potential risk of negative transfer. To address these challenging problems, we propose a novel and unified privacy-preserving federated framework for dual-target CDR, namely P2FCDR. We design P2FCDR as a peer-to-peer federated network architecture to ensure local data storage and privacy protection for business partners. Specifically, for the knowledge transfer process of CDR under federated settings, we initialize an optimizable orthogonal mapping matrix to learn the embedding transformation across domains and apply local differential privacy to the transformed embeddings before exchanging them across domains, which provides more reliable privacy protection. Furthermore, we exploit the similarity between in-domain and cross-domain embeddings and develop a gated selecting vector to refine the information fusion for more accurate dual transfer. Extensive experiments on three real-world datasets demonstrate that P2FCDR significantly outperforms state-of-the-art methods and effectively protects data privacy.
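As a hedged sketch of the exchange step (orthogonal mapping followed by local differential privacy noise before anything leaves the local party), here is one way it could be wired up; the Laplace mechanism, clipping, and parameter names are assumptions for illustration, not necessarily the exact mechanism used in P2FCDR:

```python
import torch

def privatize_embedding(emb, mapping, epsilon=1.0, sensitivity=1.0):
    """Map a local user embedding into the partner domain's space and add
    local differential privacy noise before it is exchanged.

    `mapping` stands in for the learned (approximately) orthogonal matrix;
    the Laplace mechanism and clipping below are illustrative assumptions.
    """
    transformed = emb @ mapping                                        # cross-domain transformation
    transformed = torch.clamp(transformed, -sensitivity, sensitivity)  # bound sensitivity
    noise = torch.distributions.Laplace(0.0, sensitivity / epsilon).sample(transformed.shape)
    return transformed + noise                                         # only the noisy embedding leaves the party

d = 64
emb = torch.randn(d)
mapping, _ = torch.linalg.qr(torch.randn(d, d))   # stand-in orthogonal matrix
shared = privatize_embedding(emb, mapping, epsilon=2.0)
```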
Understanding the relationships between items can improve the accuracy and interpretability of recommender systems. Among these relationships, the substitute and complement relationships attract the most attention on e-commerce platforms. Substitutable items are interchangeable and may be compared with each other before purchasing, while complementary items are used in conjunction and are usually bought together with the query item. In this paper, we focus on two issues in inferring substitutable and complementary items: 1) how to model their mutual influence to improve the performance of downstream tasks, and 2) how to further discriminate them by considering the strength of the relationship for different item pairs. We propose a novel multi-task learning framework named Enhanced Multi-Relationships Integration Graph Convolutional Network (EMRIGCN). We regard the relationship inference task as a link prediction task in a heterogeneous graph with different types of edges between nodes (items). To model the mutual influence between substitutes and complements, EMRIGCN adopts a two-level integration module, i.e., feature and structure integration, based on an expert-sharing mechanism during message passing. To capture the strength of the relationship for item pairs, we build an auxiliary loss function that further increases or decreases the distances in latent space between embeddings of items with weak or strong relations, respectively. Extensive experiments on both public and industrial datasets show that EMRIGCN significantly outperforms state-of-the-art solutions. We also conducted A/B tests on the real-world recommender system of Meituan Maicai, an online supermarket platform in China, and obtained a 15.3% improvement in VBR and a 15.34% improvement in RPM.
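A minimal sketch of the kind of auxiliary loss described above, in a contrastive/margin form; the exact formulation and the margin value are assumptions rather than EMRIGCN's published objective:

```python
import torch
import torch.nn.functional as F

def relation_strength_loss(emb_a, emb_b, strength, margin=1.0):
    """Auxiliary loss sketch: pull embeddings of strongly related item pairs
    together and push weakly related pairs at least `margin` apart.

    emb_a, emb_b: (batch, dim) item embeddings; strength: (batch,) in {0, 1},
    1 for strong relations and 0 for weak ones. The contrastive form is an
    illustrative assumption about the auxiliary objective.
    """
    dist = F.pairwise_distance(emb_a, emb_b)
    strong_term = strength * dist.pow(2)                     # strong pairs: shrink distance
    weak_term = (1 - strength) * F.relu(margin - dist).pow(2)  # weak pairs: enforce margin
    return (strong_term + weak_term).mean()

emb_a, emb_b = torch.randn(8, 32), torch.randn(8, 32)
strength = torch.randint(0, 2, (8,)).float()
loss = relation_strength_loss(emb_a, emb_b, strength)
```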
Temporal facts, which characterize events that hold during specific time periods, are attracting rising attention in the knowledge graph (KG) research community. In terms of quality management, the introduction of time restrictions brings new challenges to maintaining the temporal consistency of KGs and detecting potential temporal conflicts. Previous studies rely on manually enumerated temporal constraints to detect conflicts, which is labor-intensive and may suffer from granularity issues. We start from the common patterns of temporal facts and constraints and propose a pattern-based temporal constraint mining method, PaTeCon. PaTeCon uses automatically determined graph patterns and their relevant statistical information over the given KG, rather than human experts, to generate temporal constraints. Specifically, PaTeCon dynamically attaches type restrictions to candidate constraints according to their measuring scores. We evaluate PaTeCon on two large-scale datasets based on Wikidata and Freebase, respectively. The experimental results show that pattern-based automatic constraint mining is powerful in generating valuable temporal constraints.
Entity linking (EL) is the task of linking text segments to their referent entities in a knowledge graph, and is typically decomposed into mention detection and entity disambiguation. Compared to traditional methods that treat the two tasks separately, recent end-to-end entity linking methods exploit the mutual dependency between mentions and entities to achieve better performance. However, existing end-to-end EL methods have difficulty fully utilizing this dependency between mentions and entities. To this end, we propose to model the EL task as a hierarchical decision-making process and design a hierarchical reinforcement learning algorithm to solve the problem. We conduct extensive experiments showing that the proposed method achieves state-of-the-art performance on several EL benchmark datasets. Our code is publicly available at https://github.com/lhlclhl/he2eel.
We propose an entity-agnostic representation learning method to address the inefficient parameter storage costs incurred when embedding knowledge graphs. Conventional knowledge graph embedding methods map elements of a knowledge graph, including entities and relations, into continuous vector spaces by assigning each of them one or more specific embeddings (i.e., vector representations). Thus, the number of embedding parameters increases linearly with the growth of the knowledge graph. In our proposed model, Entity-Agnostic Representation Learning (EARL), we learn embeddings for only a small set of entities, which we refer to as reserved entities. To obtain embeddings for the full set of entities, we encode their distinguishable information from their connected relations, k-nearest reserved entities, and multi-hop neighbors. We learn universal, entity-agnostic encoders that transform this distinguishable information into entity embeddings. This approach gives EARL a static and efficient parameter count that is lower than that of conventional knowledge graph embedding methods. Experimental results show that EARL uses fewer parameters and performs better on link prediction tasks than baselines, reflecting its parameter efficiency.
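A rough sketch of the idea of composing an entity embedding from its connected relations and nearby reserved entities; the mean-pooling plus MLP combination and the class name below are illustrative assumptions, not EARL's exact architecture:

```python
import torch
import torch.nn as nn

class EntityAgnosticEncoder(nn.Module):
    """Sketch of an entity-agnostic encoder: an arbitrary entity's embedding is
    composed from (i) the embeddings of its connected relations and (ii) the
    embeddings of its k nearest reserved entities, so only relations and a small
    reserved set carry learnable embeddings."""

    def __init__(self, num_relations, num_reserved, dim):
        super().__init__()
        self.rel_emb = nn.Embedding(num_relations, dim)      # shared relation embeddings
        self.reserved_emb = nn.Embedding(num_reserved, dim)  # only reserved entities get embeddings
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, connected_relations, nearest_reserved):
        # connected_relations: (batch, n_rel) ids; nearest_reserved: (batch, k) ids
        rel_feat = self.rel_emb(connected_relations).mean(dim=1)
        res_feat = self.reserved_emb(nearest_reserved).mean(dim=1)
        return self.mlp(torch.cat([rel_feat, res_feat], dim=-1))

enc = EntityAgnosticEncoder(num_relations=200, num_reserved=1000, dim=64)
ent_emb = enc(torch.randint(0, 200, (4, 5)), torch.randint(0, 1000, (4, 8)))
```

The parameter count stays fixed as the entity set grows, which is the property the abstract emphasizes.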
Due to its powerful capability to aggregate information from neighboring nodes, the Graph Convolutional Network (GCN) has become a widely explored research topic in recent years. As a well-established extension, the Graph AutoEncoder (GAE) succeeds in mining underlying node representations by evaluating the quality of adjacency matrix reconstruction from learned features. However, few works on GAE have been devoted to leveraging both semantic and topological graphs, and those that do only indirectly extract the relationships between the graphs via weights shared by features. To better capture the connections between nodes from these two types of graphs, this paper proposes a graph neural network dubbed Dual Low-Rank Graph AutoEncoder (DLR-GAE), which takes both semantic and topological homophily into consideration. Differing from prior works that share common weights between GCNs, the presented DLR-GAE conducts a sustained exploration of low-rank information between the two distinct graphs, and reconstructs adjacency matrices from the learned latent factors and embeddings. To obtain valid adjacency matrices that meet certain conditions, we design surrogates and projections to restrict the learned factor matrix. We compare the proposed model with state-of-the-art methods on several datasets, which demonstrates the superior accuracy of DLR-GAE in semi-supervised classification.
Sequential Recommender Systems (SRSs) aim to predict the next item that a user will consume by modeling the user's interests within their item sequences. While most existing SRSs focus on a single type of user behavior, only a few pay attention to multi-behavior sequences, even though they are very common in real-world scenarios. It is challenging to effectively capture user interests within multi-behavior sequences, because the information about user interests is entangled throughout the sequences in complex relationships. To this end, we first identify the characteristics of multi-behavior sequences that should be considered in SRSs, and then propose novel methods for Dynamic Multi-behavior Sequence modeling, named DyMuS (a light version) and DyMuS+ (an improved version), which account for these characteristics. DyMuS first encodes each behavior sequence independently, and then combines the encoded sequences using dynamic routing, which dynamically integrates the information required in the final result from among many candidates, based on correlations between the sequences. DyMuS+ furthermore applies dynamic routing even to the encoding of each behavior sequence, to further capture correlations at the item level. Moreover, we release a new, large, and up-to-date dataset for multi-behavior recommendation. Our experiments on DyMuS and DyMuS+ show their superiority and the significance of capturing the characteristics of multi-behavior sequences.
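A compact sketch of the dynamic-routing step used to combine the behavior-specific sequence encodings, in the routing-by-agreement style of capsule networks; treating each encoded behavior sequence as a candidate capsule is our illustrative reading, not necessarily DyMuS's exact formulation:

```python
import torch

def squash(s, eps=1e-8):
    # Standard capsule squashing: shrink short vectors, preserve direction.
    norm_sq = (s * s).sum(-1, keepdim=True)
    return (norm_sq / (1.0 + norm_sq)) * s / (norm_sq.sqrt() + eps)

def dynamic_routing(candidates, num_iters=3):
    """candidates: (num_behaviors, dim) encodings, one per behavior sequence.
    Returns a single (dim,) vector that weights candidates by agreement."""
    logits = torch.zeros(candidates.size(0))
    for _ in range(num_iters):
        coupling = torch.softmax(logits, dim=0)               # routing coefficients
        combined = squash((coupling.unsqueeze(-1) * candidates).sum(0))
        logits = logits + (candidates * combined).sum(-1)     # agreement update
    return combined

behavior_encodings = torch.randn(4, 64)   # e.g., click / cart / purchase / view sequences
user_state = dynamic_routing(behavior_encodings)
```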
Knowledge graphs represent known facts using triplets. While existing knowledge graph embedding methods only consider the connections between entities, we propose considering the relationships between triplets. For example, consider two triplets T1 and T2, where T1 is (Academy_Awards, Nominates, Avatar) and T2 is (Avatar, Wins, Academy_Awards). Given these two base-level triplets, we see that T1 is a prerequisite for T2. In this paper, we define a higher-level triplet to represent a relationship between triplets, e.g., (T1, PrerequisiteFor, T2), where PrerequisiteFor is a higher-level relation. We define a bi-level knowledge graph that consists of the base-level and the higher-level triplets. We also propose a data augmentation strategy based on random walks on the bi-level knowledge graph to augment plausible triplets. Our model, called BiVE, learns embeddings by taking into account the structures of the base-level and the higher-level triplets, with additional consideration of the augmented triplets. We propose two new tasks: triplet prediction and conditional link prediction. Given a triplet T1 and a higher-level relation, triplet prediction predicts a triplet that is likely to be connected to T1 by the higher-level relation, e.g., predicting T2 in (T1, PrerequisiteFor, ?). Conditional link prediction predicts a missing entity in a triplet conditioned on another triplet, e.g., predicting the missing entity in T2 given T1. Experimental results show that BiVE significantly outperforms all other methods on the two new tasks and the typical base-level link prediction task in real-world bi-level knowledge graphs.
Existing knowledge graph (KG) embedding models have primarily focused on static KGs. However, real-world KGs do not remain static; rather, they evolve and grow in tandem with the development of KG applications. Consequently, new facts and previously unseen entities and relations continually emerge, necessitating an embedding model that can quickly learn and transfer new knowledge as the KG grows. Motivated by this, we delve into an expanding field of KG embedding in this paper, i.e., lifelong KG embedding. We consider knowledge transfer and retention when learning on growing snapshots of a KG, without having to learn embeddings from scratch. The proposed model includes a masked KG autoencoder for embedding learning and update, an embedding transfer strategy to inject learned knowledge into the new entity and relation embeddings, and an embedding regularization method to avoid catastrophic forgetting. To investigate the impacts of different aspects of KG growth, we construct four datasets to evaluate the performance of lifelong KG embedding. Experimental results show that the proposed model outperforms state-of-the-art inductive and lifelong embedding baselines.
Sequential recommendation is an important task that predicts the next item to be accessed based on a sequence of interacted items. Most existing works learn user preference as the transition pattern from the previous item to the next one, ignoring the time interval between the two items. However, we observe that the time intervals in a sequence may vary significantly, making user modeling ineffective due to preference drift. In fact, we conducted an empirical study to validate this observation, and found that a sequence with uniformly distributed time intervals (denoted as a uniform sequence) is more beneficial for performance improvement than one with greatly varying time intervals. Therefore, we propose to augment sequence data from the perspective of time intervals, which has not been studied in the literature. Specifically, we design five operators (Ti-Crop, Ti-Reorder, Ti-Mask, Ti-Substitute, Ti-Insert) to transform the original non-uniform sequence into a uniform sequence, taking the variance of time intervals into consideration. Then, we devise a control strategy to execute data augmentation on item sequences of different lengths. Finally, we implement these improvements on a state-of-the-art model, CoSeRec, and validate our approach on four real datasets. The experimental results show that our approach achieves significantly better performance than 9 competing methods. Our implementation is available at https://github.com/KingGugu/TiCoSeRec.
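For illustration, a minimal sketch of one such operator in the spirit of Ti-Crop, cropping the contiguous subsequence whose time intervals have the lowest variance; the exact operator definition in TiCoSeRec may differ:

```python
import numpy as np

def ti_crop(items, timestamps, crop_len):
    """Return the contiguous subsequence of length `crop_len` whose
    consecutive time intervals have the smallest variance.

    This is an illustrative reading of a time-interval-aware crop; the
    paper's Ti-Crop operator may be defined differently."""
    timestamps = np.asarray(timestamps, dtype=float)
    best_start, best_var = 0, np.inf
    for start in range(len(items) - crop_len + 1):
        intervals = np.diff(timestamps[start:start + crop_len])
        var = intervals.var() if len(intervals) > 0 else 0.0
        if var < best_var:
            best_start, best_var = start, var
    return items[best_start:best_start + crop_len]

items = ["i1", "i2", "i3", "i4", "i5", "i6"]
ts = [0, 10, 20, 31, 200, 205]          # a large gap before the last two items
print(ti_crop(items, ts, crop_len=4))   # -> ['i1', 'i2', 'i3', 'i4']
```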
We present a simple linear programming (LP) based method to learn compact and interpretable sets of rules encoding the facts in a knowledge graph (KG) and use these rules to solve the KG completion problem. Our LP model chooses a set of rules of bounded complexity from a list of candidate first-order logic rules and assigns weights to them. The complexity bound is enforced via explicit constraints. We combine simple rule generation heuristics with our rule selection LP to obtain predictions with accuracy comparable to state-of-the-art codes, even while generating much more compact rule sets. Furthermore, when we take as input rules generated by other codes, we often improve interpretability by reducing the number of chosen rules, while maintaining accuracy.
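A toy sketch of a rule-selection LP in this spirit: binary rule selection relaxed to [0, 1], maximizing the number of covered facts (capped at one per fact) under an explicit complexity budget. The objective and constraints are our simplification, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Toy data: 4 candidate rules, 6 facts; coverage[i, j] = 1 if rule j derives fact i.
coverage = np.array([
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
])
rule_complexity = np.array([2, 3, 2, 4])   # e.g., rule length
complexity_budget = 6.0

n_facts, n_rules = coverage.shape
# Variables: rule weights w (n_rules) and per-fact credit s (n_facts).
# Maximize sum_i min(1, (coverage @ w)_i)  ==  minimize -sum(s)
# s.t.  s_i <= (coverage @ w)_i,  0 <= s_i <= 1,  complexity @ w <= budget,  0 <= w <= 1.
c = np.concatenate([np.zeros(n_rules), -np.ones(n_facts)])
A_ub = np.vstack([
    np.hstack([-coverage, np.eye(n_facts)]),                              # s_i - cov_i.w <= 0
    np.hstack([rule_complexity.reshape(1, -1), np.zeros((1, n_facts))]),  # complexity budget
])
b_ub = np.concatenate([np.zeros(n_facts), [complexity_budget]])
bounds = [(0, 1)] * (n_rules + n_facts)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("selected rule weights:", np.round(res.x[:n_rules], 3))
```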
As a representative form of public transportation, bike-sharing systems face a fundamental management issue: bike flow prediction. Recent methods overemphasize the spatio-temporal correlations in the data, ignoring the effects of contextual conditions on the transportation system and the time-varying inter-regional causality. In addition, due to the disturbance of incomplete observations in the data, random contextual conditions lead to spurious correlations between data and features, making the model's predictions ineffective in special scenarios. To overcome this issue, we propose a Spatio-temporal Neural Structure Causal Model (STNSCM) from the perspective of causality. First, we build a causal graph to describe the traffic prediction process, and further analyze the causal relationships among the input data, contextual conditions, spatio-temporal states, and prediction results. Second, we propose to apply the frontdoor criterion to eliminate confounding biases in the feature extraction process. Finally, we propose a counterfactual representation reasoning module that extrapolates the spatio-temporal state under the factual scenario to future counterfactual scenarios to improve prediction performance. Experiments on real-world datasets demonstrate the superior performance of our model, especially its resistance to fluctuations caused by the external environment. The source code and data will be released.
Deep generative models have demonstrated superior performance in lossless compression of identically distributed data. However, in real-world scenarios, the data to be compressed come from various distributions that usually cannot be known in advance. Thus, commercially viable neural compression must have strong Out-of-Distribution (OoD) generalization capabilities. Compared with traditional compression methods, deep learning methods have intrinsic flaws with respect to OoD generalization. In this work, we attempt to tackle this challenge by exploiting a zoo of Deep Autoregressive models (DAMix). We build a model zoo consisting of autoregressive models trained on data from diverse distributions. In the test phase, we select useful expert models via a simple model evaluation score and adaptively aggregate the predictions of the selected models. Assuming the outputs of each expert model are biased in favor of its training distribution, we propose a von Mises-Fisher based filter to recover unbiased predictions that provide more accurate density estimates than a single model. We derive the posterior of the unbiased predictions as well as the concentration parameters in the filter, and propose a novel temporal Stein variational gradient descent for sequential data to adaptively update the posterior distributions. We evaluate DAMix on 22 image datasets, including in-distribution and OoD data, and demonstrate that making use of unbiased predictions yields up to a 45.6% improvement over a single model trained on ImageNet.
Deep Entity Matching (EM) is one of the core research topics in data integration. Typical existing works construct EM models by training deep neural networks (DNNs) on training samples with one-hot labels. However, the sharp supervision signals of one-hot labels harm the generalization of EM models, causing them to overfit the training samples and perform badly on unseen datasets. To solve this problem, we first argue that the challenge of training a well-generalized EM model lies in achieving a compromise between fitting the training samples and imposing regularization, i.e., the bias-variance tradeoff. Then, we propose a novel Soft Target-EnhAnced Matching (Steam) framework, which exploits automatically generated soft targets as label-wise regularizers to constrain the model training. Specifically, Steam regards the EM model trained in the previous iteration as a virtual teacher and takes its softened output as an extra regularizer to train the EM model in the current iteration. As such, Steam effectively calibrates the obtained EM model, achieving the bias-variance tradeoff without any additional computational cost. We conduct extensive experiments on open datasets, and the results show that Steam outperforms state-of-the-art EM approaches in terms of effectiveness and label efficiency.
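A minimal sketch of the training signal described: hard-label cross-entropy combined with a soft-target regularizer from the previous-iteration model acting as a virtual teacher. The temperature, weighting, and KL form are assumptions for illustration, not necessarily Steam's exact loss:

```python
import torch
import torch.nn.functional as F

def steam_style_loss(student_logits, teacher_logits, labels, alpha=0.5, temperature=2.0):
    """Combine hard-label cross-entropy with a KL term toward the softened
    outputs of the previous-iteration model (the 'virtual teacher').

    alpha and temperature are illustrative hyperparameters."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits.detach() / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return (1 - alpha) * hard + alpha * soft

student_logits = torch.randn(16, 2, requires_grad=True)   # match / non-match
teacher_logits = torch.randn(16, 2)                       # previous-iteration outputs
labels = torch.randint(0, 2, (16,))
loss = steam_style_loss(student_logits, teacher_logits, labels)
loss.backward()
```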
Graph Neural Networks (GNNs) are powerful tools for graph representation learning. Despite their rapid development, GNNs also face challenges such as over-fitting, over-smoothing, and non-robustness. Previous works indicate that these problems can be alleviated by random dropping methods, which integrate augmented data into models by randomly masking parts of the input. However, several open problems of random dropping on GNNs remain. First, it is challenging to find a universal method that is suitable for all cases, considering the divergence of datasets and models. Second, augmented data introduced to GNNs causes incomplete coverage of parameters and an unstable training process. Third, there is no theoretical analysis of the effectiveness of random dropping methods on GNNs. In this paper, we propose a novel random dropping method called DropMessage, which performs the dropping operation directly on the propagated messages during the message-passing process. More importantly, we find that DropMessage provides a unified framework for most existing random dropping methods, based on which we give a theoretical analysis of their effectiveness. Furthermore, we elaborate on the superiority of DropMessage: it stabilizes the training process by reducing sample variance, and it preserves information diversity from the perspective of information theory, making it a theoretical upper bound of the other methods. To evaluate our proposed method, we conduct experiments on multiple tasks over five public datasets and two industrial datasets with various backbone models. The experimental results show that DropMessage has the advantages of both effectiveness and generalization, and can significantly alleviate the problems mentioned above. A detailed version with the full appendix can be found on arXiv: https://arxiv.org/abs/2204.10037.
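The core operation is easy to state: instead of dropping whole nodes or edges, drop individual entries of the propagated message matrix. A minimal sketch follows; the inverted-dropout scaling mirrors standard dropout, and the surrounding message-passing code is a simplified stand-in rather than the paper's implementation:

```python
import torch

def drop_message(messages, p=0.1, training=True):
    """Randomly zero out individual entries of the propagated message matrix.

    messages: (num_edges, dim) tensor of messages flowing along edges.
    Entry-wise dropping (rather than dropping whole nodes or edges) is the
    DropMessage idea; the 1/(1-p) rescaling is standard inverted dropout."""
    if not training or p == 0.0:
        return messages
    mask = (torch.rand_like(messages) > p).float()
    return messages * mask / (1.0 - p)

# Simplified message-passing step on a toy graph.
x = torch.randn(5, 16)                                    # node features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]])   # source -> target
messages = x[edge_index[0]]                               # one message per edge
messages = drop_message(messages, p=0.2)
out = torch.zeros_like(x).index_add_(0, edge_index[1], messages)  # sum-aggregate per target node
```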
A core step in mining human mobility data is learning accurate representations of user-generated check-in sequences. The learned representations should fully describe users' spatial-temporal mobility patterns and the high-level semantics of traveling. However, existing check-in sequence representation learning is usually achieved implicitly by end-to-end models designed for specific downstream tasks, resulting in unsatisfactory generalization ability and poor performance. Besides, although sequence representation learning models that follow the contrastive learning pre-training paradigm have achieved breakthroughs in many fields such as NLP, they fail to simultaneously consider the unique spatial-temporal characteristics of check-in sequences and require manual adjustment of data augmentation strategies, so directly applying them to check-in sequences cannot yield a meaningful pretext task. To this end, in this paper we propose a contrastive pre-training model with adversarial perturbations for check-in sequence representation learning (CACSR). First, we design a novel spatial-temporal augmentation block that disturbs the spatial-temporal features of check-in sequences in the latent space, relieving the burden of designing manual data augmentation strategies. Second, to construct an effective contrastive pretext task, we generate “hard” positive and negative pairs for each check-in sequence via adversarial training. These two designs encourage the model to capture the high-level spatial-temporal patterns and semantics of check-in sequences while ignoring noisy and unimportant details. We demonstrate the effectiveness and versatility of CACSR on two kinds of downstream tasks using three real-world datasets. The results show that our model outperforms both state-of-the-art pre-training methods and end-to-end models.
Contrastive learning (CL), which can extract the information shared between different contrastive views, has become a popular paradigm for vision representation learning. Inspired by this success in computer vision, recent work introduces CL into graph modeling, dubbed graph contrastive learning (GCL). However, generating contrastive views in graphs is more challenging than in images, since we have little prior knowledge of how to significantly augment a graph without changing its labels. We argue that typical data augmentation techniques (e.g., edge dropping) in GCL cannot generate contrastive views diverse enough to filter out noise. Moreover, previous GCL methods employ two view encoders with exactly the same neural architecture and tied parameters, which further harms the diversity of augmented views. To address this limitation, we propose a novel paradigm named model-augmented GCL (MA-GCL), which focuses on manipulating the architectures of view encoders instead of perturbing graph inputs. Specifically, we present three easy-to-implement model augmentation tricks for GCL, namely asymmetric, random and shuffling, which can respectively help alleviate high-frequency noise, enrich training instances and bring safer augmentations. All three tricks are compatible with typical data augmentations. Experimental results show that MA-GCL can achieve state-of-the-art performance on node classification benchmarks by applying the three tricks to a simple base model. Extensive studies also validate our motivation and the effectiveness of each trick. (Code, data and appendix are available at https://github.com/GXM1141/MA-GCL.)
Many deep spatio-temporal learning methods have been proposed for crowd flow modeling in recent years. However, most of them focus on designing spatial and temporal convolution mechanisms to aggregate information from nearby nodes and historical observations for a pre-defined prediction task. Different from existing research, this paper aims to provide a generic and dynamic representation learning method for crowd flow modeling. The main idea of our method is to maintain a continuous-time representation for each node and to update the representations of all nodes continuously according to the streaming observed data. Along this line, a particular encoder-decoder architecture is proposed, where the encoder converts newly occurring transactions into timestamped messages, and the representations of related nodes are then updated according to the generated messages. The role of the decoder is to guide the representation learning process by reconstructing the observed transactions based on the most recent node representations. Moreover, a number of virtual nodes are added to discover macro-level spatial patterns and to share representations among spatially interacting stations. Experiments have been conducted on two real-world datasets for four popular prediction tasks in crowd flow modeling. The results demonstrate that our method achieves better prediction performance than baseline methods on all tasks.
Learning the underlying distribution of molecular graphs and generating high-fidelity samples is a fundamental research problem in drug discovery and material science. However, accurately modeling the distribution and rapidly generating novel molecular graphs remain crucial and challenging goals. To accomplish these goals, we propose a novel Conditional Diffusion model based on discrete Graph Structures (CDGS) for molecular graph generation. Specifically, we construct a forward graph diffusion process on both graph structures and inherent features through stochastic differential equations (SDEs) and derive discrete graph structures as the condition for the reverse generative process. We present a specialized hybrid graph noise prediction model that extracts the global context and the local node-edge dependency from intermediate graph states. We further utilize ordinary differential equation (ODE) solvers for efficient graph sampling, based on the semi-linear structure of the probability flow ODE. We also combine the solvers with gradient guidance from a molecular property predictor for similarity-constrained molecule optimization. Experiments on diverse datasets validate the effectiveness of our framework. In particular, the proposed method generates high-quality molecular graphs even with a limited number of steps.
This paper investigates a new yet challenging problem called Reverse k-Maximum Inner Product Search (RkMIPS). Given a query (item) vector, a set of item vectors, and a set of user vectors, RkMIPS aims to find the user vectors for which the inner product with the query vector is among the k largest inner products over the query and all item vectors. We propose the first subquadratic-time algorithm, Shifting-aware Asymmetric Hashing (SAH), to tackle the RkMIPS problem. To speed up Maximum Inner Product Search (MIPS) on item vectors, we design a shifting-invariant asymmetric transformation and develop a novel sublinear-time Shifting-Aware Asymmetric Locality Sensitive Hashing (SA-ALSH) scheme. Furthermore, we devise a new blocking strategy based on the Cone-Tree to effectively prune user vectors (in batches). We prove that SAH achieves a theoretical guarantee for solving the RkMIPS problem. Experimental results on five real-world datasets show that SAH runs 4~8x faster than the state-of-the-art methods for RkMIPS while achieving F1-scores of over 90%. The code is available at https://github.com/HuangQiang/SAH.
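For clarity, a brute-force reference implementation of the RkMIPS definition, useful for checking a fast method's output on small inputs; this is only the problem statement in code, not the SAH algorithm:

```python
import numpy as np

def reverse_k_mips_bruteforce(query, items, users, k):
    """Return indices of user vectors for which `query` is among their
    k largest inner products over {query} union items.

    Brute force, O(|users| * |items| * d); intended only as a correctness
    reference, not as the sublinear-time SAH algorithm."""
    result = []
    item_scores = users @ items.T     # (n_users, n_items)
    query_scores = users @ query      # (n_users,)
    for u in range(users.shape[0]):
        # count how many items beat the query for this user
        rank = np.sum(item_scores[u] > query_scores[u])
        if rank < k:                  # query is in this user's top-k
            result.append(u)
    return result

rng = np.random.default_rng(0)
items, users = rng.normal(size=(100, 16)), rng.normal(size=(50, 16))
query = rng.normal(size=16)
print(reverse_k_mips_bruteforce(query, items, users, k=10))
```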
Beyond achieving higher compression efficiency than classical image compression codecs, deep image compression is expected to be further improved with additional side information, e.g., another image of the same scene taken from a different perspective. To utilize the side information under the distributed compression scenario, the existing method only performs patch matching in the image domain to address the parallax problem caused by the difference in viewpoints. However, patch matching in the image domain is not robust to variations in scale, shape, and illumination caused by the different viewing angles, and cannot make full use of the rich texture information in the side information image. To resolve this issue, we propose Multi-Scale Feature Domain Patch Matching (MSFDPM) to fully utilize side information at the decoder of the distributed image compression model. Specifically, MSFDPM consists of a side information feature extractor, a multi-scale feature domain patch matching module, and a multi-scale feature fusion network. Furthermore, we reuse inter-patch correlations from the shallow layers to accelerate patch matching at the deep layers. Finally, we find that our patch matching in the multi-scale feature domain further improves the compression rate by about 20% compared with the patch matching method in the image domain.
With the rapid development of the airline industry, maximizing market share under a constrained budget is an urgent econometric problem for an airline. We investigate this problem by adjusting flight frequencies on different flight routes. Owing to the large search space of solutions and the difficulty of predicting the market, this problem is in general daunting to solve. This paper proposes a novel two-stage optimization method to address these challenges. At the higher level, we use a signal to guide the optimization process toward a constraint-satisfying solution. At the lower level, we consider the consecutive itineraries that arise in real scenarios and model the unseen correlations between routes within itineraries for market share prediction. In theory, we prove the convergence of our optimization approach. In experiments, we empirically verify the superiority of both our prediction model and our optimization approach over existing works on large-scale real-world data. Our code has been released at: https://github.com/codingAndBS/AirlineMarket.
Graph Neural Networks (GNNs) have been a prevailing technique for tackling various analysis tasks on graph data. A key premise for the remarkable performance of GNNs is a complete and trustworthy initial graph description (i.e., node features and graph structure), which is often not satisfied, since real-world graphs are frequently incomplete due to various unavoidable factors. In particular, GNNs face greater challenges when node features and graph structure are both incomplete at the same time. Existing methods focus on either feature completion or structure completion. They usually rely on the matching relationship between features and structure, or employ joint learning of node representation and feature (or structure) completion in the hope of achieving mutual benefit. However, recent studies confirm that mutual interference between features and structure degrades GNN performance. When both features and structure are incomplete, the mismatch between them caused by the randomness of what is missing exacerbates this interference, which may trigger incorrect completions that negatively affect node representation. To this end, in this paper we propose a general GNN framework based on teacher-student distillation, namely T2-GNN, to improve the performance of GNNs on incomplete graphs. To avoid interference between features and structure, we separately design feature-level and structure-level teacher models that provide targeted guidance to the student model (a base GNN, such as GCN) through distillation. We then design two personalized methods to obtain well-trained feature and structure teachers. To ensure that the teachers' knowledge is comprehensively and effectively distilled to the student, we further propose a dual distillation mode that enables the student to acquire as much expert knowledge as possible. Extensive experiments on eight benchmark datasets demonstrate the effectiveness and robustness of the new framework on graphs with incomplete features and structure.
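A minimal sketch of the distillation objective implied by the framework: the student matches soft outputs from both a feature-level and a structure-level teacher in addition to the supervised loss. The KL form, temperature, and equal weighting are illustrative assumptions, not T2-GNN's exact losses:

```python
import torch
import torch.nn.functional as F

def dual_distillation_loss(student_logits, feat_teacher_logits, struct_teacher_logits,
                           labels, mask, temperature=2.0, lam=0.5):
    """Sketch of a dual-teacher distillation objective for incomplete graphs:
    supervised cross-entropy on labeled nodes plus KL terms toward a
    feature-level teacher and a structure-level teacher."""
    sup = F.cross_entropy(student_logits[mask], labels[mask])

    def kd(teacher_logits):
        return F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits.detach() / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)

    return sup + lam * (kd(feat_teacher_logits) + kd(struct_teacher_logits))

n, c = 100, 7
student = torch.randn(n, c, requires_grad=True)
feat_t, struct_t = torch.randn(n, c), torch.randn(n, c)   # outputs of the two teachers
labels = torch.randint(0, c, (n,))
mask = torch.zeros(n, dtype=torch.bool); mask[:20] = True  # labeled nodes
loss = dual_distillation_loss(student, feat_t, struct_t, labels, mask)
```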
Healthcare-acquired infections (HAIs) (e.g., Methicillin-resistant Staphylococcus aureus infection) have complex transmission pathways, spreading not just via direct person-to-person contacts, but also via contaminated surfaces. Prior work in mathematical epidemiology has led to a class of models, which we call load sharing models, that provide a discrete-time, stochastic formalization of HAI spread on temporal contact networks. The focus of this paper is the source detection problem for the load sharing model. The source detection problem has been studied extensively for SEIR-type models, but this prior work does not apply to load sharing models. We show that a natural formulation of the source detection problem for the load sharing model is computationally hard, even to approximate. We then present two alternate formulations that are much more tractable. The tractability of our problems depends crucially on the submodularity of the expected number of infections as a function of the source set. Prior techniques for showing submodularity, such as the "live graph" technique, are not applicable to the load sharing model, and our key technical contribution is the use of a more sophisticated "coupling" technique to establish the submodularity result. We propose algorithms for our two problem formulations by extending existing algorithmic results from submodular optimization and combining them with an expectation propagation heuristic for the load sharing model that yields orders-of-magnitude speedups. We present experimental results on temporal contact networks based on fine-grained EMR data from three different hospitals. Our results on synthetic outbreaks on these networks show that our algorithms outperform baselines by up to 5.97 times. Furthermore, case studies based on hospital outbreaks of Clostridioides difficile infection show that our algorithms identify clinically meaningful sources.
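Since the tractable formulations rest on submodularity of the expected infection count, the standard greedy algorithm with its (1 - 1/e) approximation guarantee is the natural workhorse. A generic sketch follows; the objective function here is a toy placeholder, not the load sharing model's simulator or the paper's estimator:

```python
def greedy_submodular_max(candidates, objective, budget):
    """Standard greedy maximization of a monotone submodular set function.

    `objective` maps a frozenset of candidate sources to a real value
    (e.g., an estimate of expected infections explained); here it is a
    placeholder for whatever estimator the load sharing model provides."""
    selected = frozenset()
    for _ in range(budget):
        best_gain, best_c = 0.0, None
        for c in candidates:
            if c in selected:
                continue
            gain = objective(selected | {c}) - objective(selected)
            if gain > best_gain:
                best_gain, best_c = gain, c
        if best_c is None:
            break
        selected = selected | {best_c}
    return selected

# Toy placeholder objective: coverage of "explained" infected nodes per candidate source.
infected_by = {"s1": {1, 2, 3}, "s2": {3, 4}, "s3": {5}, "s4": {1, 5, 6}}
objective = lambda S: len(set().union(*(infected_by[s] for s in S))) if S else 0
print(greedy_submodular_max(list(infected_by), objective, budget=2))
```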