| Total: 43
In the past decade, automatic product description generation for e-commerce have witnessed significant advancement. As the services provided by e-commerce platforms become diverse, it is necessary to dynamically adapt the patterns of descriptions generated. The selling point of products is an important type of product description for which the length should be as short as possible while still conveying key information. In addition, this kind of product description should be eye-catching to the readers. Currently, product selling points are normally written by human experts. Thus, the creation and maintenance of these contents incur high costs. These costs can be significantly reduced if product selling points can be automatically generated by machines. In this paper, we report our experience developing and deploying the Intelligent Online Selling Point Extraction (IOSPE) system to serve the recommendation system in the JD.com e-commerce platform. Since July 2020, IOSPE has become a core service for 62 key categories of products (covering more than 4 million products). So far, it has generated more than 1.1 billion selling points, thereby significantly scaling up the selling point creation operation and saving human labour. These IOSPE generated selling points have increased the click-through rate (CTR) by 1.89% and the average duration the customers spent on the products by more than 2.03% compared to the previous practice, which are significant improvements for such a large-scale e-commerce platform.
Web search engines focus on serving highly relevant results within hundreds of milliseconds. Pre-trained language transformer models such as BERT are therefore hard to use in this scenario due to their high computational demands. We present our real-time approach to the document ranking problem leveraging a BERT-based siamese architecture. The model is already deployed in a commercial search engine and it improves production performance by more than 3%. For further research and evaluation, we release DaReCzech, a unique data set of 1.6 million Czech user query-document pairs with manually assigned relevance levels. We also release Small-E-Czech, an Electra-small language model pre-trained on a large Czech corpus. We believe this data will support endeavours both of search relevance and multilingual-focused research communities.
The paper addresses the challenge of accelerating identification of changes in risk drivers in the insurance industry. Specifically, the work presents a method to identify significant news events ("signals") from batches of news data to inform Life & Health insurance decisions. Signals are defined as events that are relevant to a tracked risk driver, widely discussed in multiple news outlets, contain novel information and affect stakeholders. The method converts unstructured data (news articles) into a sequence of keywords by employing a linguistic knowledge graph-based model. Then, for each time window, the method forms a graph with extracted keywords as nodes and draws weighted edges based on keyword co-occurrences in articles. Lastly, events are derived in an unsupervised way as graph communities and scored for the requirements of a signal: relevance, novelty and virality. The methodology is illustrated for a Life & Health topic using news articles from Dow Jones DNA proprietary data set, and assessed against baselines on a publicly available news data set. The method is implemented as an analytics engine in Early Warning System deployed at Swiss Re for the last 1.5 years to extract relevant events from live news data. We present the system's architectural design in production and discuss its use and impact.
Item representation learning is crucial for search and recommendation tasks in e-commerce. In e-commerce, the instances (e.g., items, users) in different domains are always related. Such instance relationship across domains contains useful local information for transfer learning. However, existing transfer learning based approaches did not leverage this knowledge. In this paper, we report on our experience designing and deploying Prior-Guided Transfer Learning (PGTL) to bridge this gap. It utilizes the instance relationship across domains to extract prior knowledge for the target domain and leverages it to guide the fine-grained transfer learning for e-commerce item representation learning tasks. Rather than directly transferring knowledge from the source domain to the target domain, the prior knowledge can serve as a bridge to link both domains and enhance knowledge transfer, especially when the domain distribution discrepancy is large. Since its deployment on the Taiwanese portal of Taobao in Aug 2020, PGTL has significantly improved the item exposure rate and item click-through rate compared to previous approaches
Artificial intelligence (AI) is a promising technology to transform the healthcare industry. Due to the highly sensitive nature of patient data, federated learning (FL) is often leveraged to build models for smart healthcare applications. Existing deployed FL frameworks cannot address the key issues of varying data quality and heterogeneous data distributions across multiple institutions in this sector. In this paper, we report our experience developing and deploying the Contribution-Aware Federated Learning (CAFL) framework for smart healthcare. It provides an efficient and accurate approach to fairly evaluate FL participants' contribution to model performance without exposing their private data, and improves the FL model training protocol to allow the best performing intermediate models to be distributed to participants for FL training. Since its deployment in Yidu Cloud Technology Inc. in March 2021, CAFL has served 8 well-established medical institutions in China to build healthcare decision support models. It can perform contribution evaluations 2.84 times faster than the best existing approach, and has improved the average accuracy of the resulting models by 2.62% compared to the previous system (which is significant in industrial settings). To our knowledge, it is the first contribution-aware federated learning successfully deployed in the healthcare industry.
Accounts Payable (AP) is a resource-intensive business process in large enterprises for paying vendors within contractual payment deadlines for goods and services procured from them. There are multiple verifications before payment to the supplier/vendor. After the validations, the invoice flows through several steps such as vendor identification, line-item matching for Purchase order (PO) based invoices, Accounting Code identification for Non- Purchase order (Non-PO) based invoices, tax code identification, etc. Currently, each of these steps is mostly manual and cumbersome making it labor-intensive, error-prone, and requiring constant training of agents. Automatically processing these invoices for payment without any manual intervention is quite difficult. To tackle this challenge, we have developed an automated end-to-end invoice processing system using AI-based modules for multiple steps of the invoice processing pipeline. It can be configured to an individual client’s requirements with minimal effort. Currently, the system is deployed in production for two clients. It has successfully processed around ~80k invoices out of which 76% invoices were processed with low or no manual intervention.
We present a machine-learning-guided process that can efficiently extract factor tables from unstructured rate filing documents. Our approach combines multiple deep-learning-based models that work in tandem to create structured representations of tabular data present in unstructured documents such as pdf files. This process combines CNN's to detect tables, language-based models to extract table metadata and conventional computer vision techniques to improve the accuracy of tabular data on the machine-learning side. The extracted tabular data is validated through an intuitive user interface. This process, which we call Harvest, significantly reduces the time needed to extract tabular information from PDF files, enabling analysis of such data at a speed and scale that was previously unattainable.
Product copywriting is a critical component of e-commerce recommendation platforms. It aims to attract users' interest and improve user experience by highlighting product characteristics with textual descriptions. In this paper, we report our experience deploying the proposed Automatic Product Copywriting Generation (APCG) system into the JD.com e-commerce product recommendation platform. It consists of two main components: 1) natural language generation, which is built from a transformer-pointer network and a pre-trained sequence-to-sequence model based on millions of training data from our in-house platform; and 2) copywriting quality control, which is based on both automatic evaluation and human screening. For selected domains, the models are trained and updated daily with the updated training data. In addition, the model is also used as a real-time writing assistant tool on our live broadcast platform. The APCG system has been deployed in JD.com since Feb 2021. By Sep 2021, it has generated 2.53 million product descriptions, and improved the overall averaged click-through rate (CTR) and the Conversion Rate (CVR) by 4.22% and 3.61%, compared to baselines, respectively on a year-on-year basis. The accumulated Gross Merchandise Volume (GMV) made by our system is improved by 213.42%, compared to the number in Feb 2021.
Predictive VM (Virtual Machine) auto-scaling is a promising technique to optimize cloud applications’ operating costs and performance. Understanding the job arrival rate is crucial for accurately predicting future changes in cloud workloads and proactively provisioning and de-provisioning VMs for hosting the applications. However, developing a model that accurately predicts cloud workload changes is extremely challenging due to the dynamic nature of cloud workloads. Long- Short-Term-Memory (LSTM) models have been developed for cloud workload prediction. Unfortunately, the state-of-the-art LSTM model leverages recurrences to predict, which naturally adds complexity and increases the inference overhead as input sequences grow longer. To develop a cloud workload prediction model with high accuracy and low inference overhead, this work presents a novel time-series forecasting model called WGAN-gp Transformer, inspired by the Transformer network and improved Wasserstein-GANs. The proposed method adopts a Transformer network as a generator and a multi-layer perceptron as a critic. The extensive evaluations with real-world workload traces show WGAN- gp Transformer achieves 5× faster inference time with up to 5.1% higher prediction accuracy against the state-of-the-art. We also apply WGAN-gp Transformer to auto-scaling mechanisms on Google cloud platforms, and the WGAN-gp Transformer-based auto-scaling mechanism outperforms the LSTM-based mechanism by significantly reducing VM over-provisioning and under-provisioning rates.
Site Reliability Engineers (SREs) play a key role in identifying the cause of an issue and preforming remediation steps to resolve it. After an issue is reported, SREs come together in a virtual room (collaboration platform) to triage the issue. While doing so, they leave behind a wealth of information, in the form of conversations, which can be used later for triaging similar issues. However, usability of these conversations offer challenges due to them being and scarcity of conversation utterance label. This paper presents a novel approach for issue artefact extraction from noisy conversations with minimal labelled data. We propose a combination of unsupervised and supervised models with minimal human intervention that leverages domain knowledge to predict artefacts for a small amount of conversation data and use that for fine-tuning an already pre-trained language model for artefact prediction on a large amount of conversation data. Experimental results on our dataset show that the proposed ensemble of the unsupervised and supervised models is better than using either one of them individually. We also present a deployment case study of the proposed artefact prediction.
The CO2 capture efficiency in solvent-based carbon capture systems (CCSs) critically depends on the gas-solvent interfacial area (IA), making maximization of IA a foundational challenge in CCS design. While the IA associated with a particular CCS design can be estimated via a computational fluid dynamics (CFD) simulation, using CFD to derive the IAs associated with numerous CCS designs is prohibitively costly. Fortunately, previous works such as Deep Fluids (DF) (Kim et al., 2019) show that large simulation speedups are achievable by replacing CFD simulators with neural network (NN) surrogates that faithfully mimic the CFD simulation process. This raises the possibility of a fast, accurate replacement for a CFD simulator and therefore efficient approximation of the IAs required by CCS design optimization. Thus, here, we build on the DF approach to develop surrogates that can successfully be applied to our complex carbon-capture CFD simulations. Our optimized DF-style surrogates produce large speedups (4000x) while obtaining IA relative errors as low as 4% on unseen CCS configurations that lie within the range of training configurations. This hints at the promise of NN surrogates for our CCS design optimization problem. Nonetheless, DF has inherent limitations with respect to CCS design (e.g., limited transferability of trained models to new CCS packings). We conclude with ideas to address these challenges.
Micronutrient deficiency (MND), which is a form of malnutrition that can have serious health consequences, is difficult to diagnose in early stages without blood draws, which are expensive and time-consuming to collect and process. It is even more difficult at a public health scale seeking to identify regions at higher risk of MND. To provide data more widely and frequently, we propose an accurate, scalable, low-cost, and interpretable regional-level MND prediction system. Specifically, our work is the first to use satellite data, such as forest cover, weather, and presence of water, to predict deficiency of micronutrients such as iron, Vitamin B12, and Vitamin A, directly from their biomarkers. We use real-world, ground truth biomarker data collected from four different regions across Madagascar for training, and demonstrate that satellite data are viable for predicting regional-level MND, surprisingly exceeding the performance of baseline predictions based only on survey responses. Our method could be broadly applied to other countries where satellite data are available, and potentially create high societal impact if these predictions are used by policy makers, public health officials, or healthcare providers.
Improving health equity is an urgent task for our society. The advent of mobile clinics plays an important role in enhancing health equity, as they can provide easier access to preventive healthcare for patients from disadvantaged populations. For effective functioning of mobile clinics, accurate prediction of demand (expected number of individuals visiting mobile clinic) is the key to their daily operations and staff/resource allocation. Despite its importance, there have been very limited studies on predicting demand of mobile clinics. To the best of our knowledge, we are among the first to explore this area, using AI-based techniques. A crucial challenge in this task is that there are no known existing data sources from which we can extract useful information to account for the exogenous factors that may affect the demand, while considering protection of client privacy. We propose a novel methodology that completely uses public data sources to extract the features, with several new components that are designed to improve the prediction. Empirical evaluation on a real-world dataset from the mobile clinic The Family Van shows that, by leveraging publicly available data (which introduces no extra monetary cost to the mobile clinics), our AI-based method achieves 26.4% - 51.8% lower Root Mean Squared Error (RMSE) than the historical average-based estimation (which is presently employed by mobile clinics like The Family Van). Our algorithm makes it possible for mobile clinics to plan proactively, rather than reactively, as what has been doing.
Implementations of artificial intelligence (AI) based on deep learning (DL) have proven to be highly successful in many domains, from biomedical imaging to natural language processing, but are still rarely applied in the space industry, particularly for onboard learning of planetary surfaces. In this project, we discuss the utility and limitations of DL, enhanced with topological footprints of the sensed objects, for multi-class classification of planetary surface patterns, in conjunction with tactile and embedded sensing in rover exploratory missions. We consider a Topological Convolutional Network (TCN) model with a persistence-based attention mechanism for supervised classification of various landforms. We study TCN's performance on the Barefoot surface pattern dataset, a novel surface pressure dataset from a prototype tactile rover wheel, known as the Barefoot Rover tactile wheel. Multi-class pattern recognition in the Barefoot data has neither been ever tackled before with DL nor assessed with topological methods. We provide insights into advantages and restrictions of topological DL as the early-stage concept for onboard learning and planetary exploration.
Finding or tracking the location of an object accurately is a crucial problem in defense applications, robotics and computer vision. Radars fall into the spectrum of high-end defense sensors or systems upon which the security and surveillance of the entire world depends. There has been a lot of focus on the topic of Multi Sensor Tracking in recent years, with radars as the sensors. The Indian Air Force uses a Multi Sensor Tracking (MST) system to detect flights pan India, developed and supported by BEL(Bharat Electronics Limited), a defense agency we are working with. In this paper, we describe our Machine Learning approach, which is built on top of the existing system, the Air force uses. For purposes of this work, we trained our models on about 13 million anonymized real Multi Sensor tracking data points provided by radars performing tracking activity across the Indian air space. The approach has shown an increase in the accuracy of tracking by 5 percent from 91 to 96. The model and the corresponding code were transitioned to BEL, which has been tested in their simulation environment with a plan to take forward for ground testing. Our approach comprises of 3 steps: (a) We train a Neural Network model and a CatBoost model and ensemble them using a Logistic Regression model to predict one type of error, namely Splitting error, which can help to improve the accuracy of tracking. (b) We again train a Neural Network model and a CatBoost model and ensemble them using a different Logistic Regression model to predict the second type of error, namely Merging error, which can further improve the accuracy of tracking. (c) We use cosine similarity to find the nearest neighbour and correct the data points, predicted to have Splitting/Merging errors, by predicting the original global track of these data points.
In this paper, we address a crucial problem in fashion e-commerce (with respect to customer experience, as well as revenue): color variants identification, i.e., identifying fashion products that match exactly in their design (or style), but only to differ in their color. We propose a generic framework, that leverages deep visual Representation Learning at its heart, to address this problem for our fashion e-commerce platform. Our framework could be trained with supervisory signals in the form of triplets, that are obtained manually. However, it is infeasible to obtain manual annotations for the entire huge collection of data usually present in fashion e-commerce platforms, such as ours, while capturing all the difficult corner cases. But, to our rescue, interestingly we observed that this crucial problem in fashion e-commerce could also be solved by simple color jitter based image augmentation, that recently became widely popular in the contrastive Self-Supervised Learning (SSL) literature, that seeks to learn visual representations without using manual labels. This naturally led to a question in our mind: Could we leverage SSL in our use-case, and still obtain comparable performance to our supervised framework? The answer is, Yes! because, color variant fashion objects are nothing but manifestations of a style, in different colors, and a model trained to be invariant to the color (with, or without supervision), should be able to recognize this! This is what the paper further demonstrates, both qualitatively, and quantitatively, while evaluating a couple of state-of-the-art SSL techniques, and also proposing a novel method.
As climate change is increasing the frequency and intensity of climate and weather hazards, improving detection and monitoring of flood events is a priority. Being weather independent and high resolution, Sentinel 1 (S1) radar satellite imagery data has become the go to data source to detect flood events accurately. However, current methods are either based on fixed thresholds to differentiate water from land or train Artificial Intelligence (AI) models based on only S1 data, despite the availability of many other relevant data sources publicly. These models also lack comprehensive validations on out-of-sample data and deployment at scale. In this study, we investigated whether adding extra input layers could increase the performance of AI models in detecting floods from S1 data. We also provide performance across a range of 11 historical events, with results ranging between 0.93 and 0.97 accuracy, 0.53 and 0.81 IoU, and 0.68 and 0.89 F1 scores. Finally, we show the infrastructure we developed to deploy our AI models at scale to satisfy a range of use cases and user requests.
With increasing world population and expanded use of forests as cohabited regions, interactions and conflicts with wildlife are increasing, leading to large scale loss of lives (animal and human) and livelihoods (economic). While community knowledge is valuable, forest officials and conservation organisations can greatly benefit from predictive analysis of human-wildlife conflict, leading to targeted interventions that can potentially help save lives and livelihoods. However, the problem of prediction is a complex socio-technical problem in the context of limited data in low-resource regions. Identifying the right features to make accurate predictions of conflicts at the required spatial granularity using a sparse conflict training dataset is the key challenge that we address in this paper. Specifically, we do an illustrative case study on human-wildlife conflicts in the Bramhapuri Forest Division in Chandrapur, Maharashtra, India. Most existing work has considered human wildlife conflicts in protected areas and to the best of our knowledge, this is the first effort at prediction of human-wildlife conflicts in unprotected areas and using those predictions for deploying interventions on the ground.
We propose PaintTeR, our Paintings TextRank algorithm for extracting art-related text spans from passages on paintings. PaintTeR combines a lexicon of painting words curated automatically through distant supervision with random walks on a large-scale word co-occurrence graph for ranking passage spans for artistic characteristics. The spans extracted with PaintTeR are used in state-of-the-art Question Generation and Reading Comprehension models for designing an interactive aid that enables gallery and museum visitors focus on the artistic elements of paintings. We provide experiments on two datasets of expert-written passages on paintings to showcase the effectiveness of PaintTeR. Evaluations by both gallery experts as well as crowdworkers indicate that our proposed algorithm can be used to select relevant and interesting art-centered questions. To the best of our knowledge, ours is the first work to effectively fine-tune question generation models using minimal supervision for a low-resource, specialized context such as gallery visits.
Various types of machine learning techniques are available for analyzing electronic health records (EHRs). For predictive tasks, most existing methods either explicitly or implicitly divide these time-series datasets into predetermined observation and prediction windows. Patients have different lengths of medical history and the desired predictions (for purposes such as diagnosis or treatment) are required at different times in the future. In this paper, we propose a method that uses a sequence-to-sequence generator model to transfer an input sequence of EHR data to a sequence of user-defined target labels, providing the end-users with ``flexible'' observation and prediction windows to define. We use adversarial and semi-supervised approaches in our design, where the sequence-to-sequence model acts as a generator and a discriminator distinguishes between the actual (observed) and generated labels. We evaluate our models through an extensive series of experiments using two large EHR datasets from adult and pediatric populations. In an obesity predicting case study, we show that our model can achieve superior results in flexible-window prediction tasks, after being trained once and even with large missing rates on the input EHR data. Moreover, using a number of attention analysis experiments, we show that the proposed model can effectively learn more relevant features in different prediction tasks.
Formal response organizations perform rapid damage assessments after natural and human-induced disasters to measure the extent of damage to infrastructures such as roads, bridges, and buildings. This time-critical task, when performed using traditional approaches such as experts surveying the disaster areas, poses serious challenges and delays response. This paper presents an AI-based system that leverages citizen science to collect damage images reported on social media and perform rapid damage assessment in real-time. Several image processing models in the system tackle non-trivial challenges posed by social media as a data source, such as high-volume of redundant and irrelevant content. The system determines the severity of damage using a state-of-the-art computer vision model. Together with a response organization in the US, we deployed the system to identify damage reports during a major real-world disaster. We observe that almost 42% of the images are unique, 28% relevant, and more importantly, only 10% of them contain either mild or severe damage. Experts from our partner organization provided feedback on the system's mistakes, which we used to perform additional experiments to retrain the models. Consequently, the retrained models based on expert feedback on the target domain data helped us achieve significant performance improvements.
In recent years, companies in the Architecture, Engineering, and Construction (AEC) industry have started exploring how artificial intelligence (AI) can reduce time-consuming and repetitive tasks. One use case that can benefit from the adoption of AI is the determination of quantities in floor plans. This information is required for several planning and construction steps. Currently, the task requires companies to invest a significant amount of manual effort. Either digital floor plans are not available for existing buildings, or the formats cannot be processed due to lack of standardization. In this paper, we therefore propose a human-in-the-loop approach for the detection and classification of symbols in floor plans. The developed system calculates a measure of uncertainty for each detected symbol which is used to acquire the knowledge of human experts for those symbols that are difficult to classify. We evaluate our approach with a real-world dataset provided by an industry partner and find that the selective acquisition of human expert knowledge enhances the model’s performance by up to 12.9%—resulting in an overall prediction accuracy of 92.1% on average. We further design a pipeline for the generation of synthetic training data that allows the systems to be adapted to new construction projects with minimal manual effort. Overall, our work supports professionals in the AEC industry on their journey to the data-driven generation of business value.
Product allocation in retail is the process of placing products throughout a store to connect consumers with relevant products. Discovering a good allocation strategy is challenging due to the scarcity of data and the high cost of experimentation in the physical world. Some work explores Reinforcement learning (RL) as a solution, but these approaches are often limited because of the sim2real problem. Learning policies from logged trajectories of a system is a key step forward for RL in physical systems. Recent work has shown that model-based offline RL can improve the effectiveness of offline policy estimation through uncertainty-penalized exploration. However, existing work assumes a continuous state space and access to a covariance matrix of the environment dynamics, which is not possible in the discrete case. To solve this problem, we propose a Bayesian model-based technique that naturally produces probabilistic estimates of the environment dynamics via the posterior predictive distribution, which we use for uncertainty-penalized exploration. We call our approach Posterior Penalized Offline Policy Optimization (PPOPO). We show that our world model better fits historical data due to informative priors, and that PPOPO outperforms other offline techniques in simulation and against real-world data.
More than US$ 27 billion is estimated to have been paid-out in farm support in USA alone since 1991 in response to climate change impacts on agriculture, with costs likely continuing to rise. With the wider adoption of precision agriculture - an agriculture management strategy that involves gathering, processing and analyzing temporal, spatial and individual data - in both developed and developing countries, there is an increasing opportunity to harness accumulating, shareable, big data using artificial intelligence (AI) methods, collected from weather stations, field sensor networks, Internet-of-Things devices, unmanned aerial vehicles, and earth observational satellites. This requires smart algorithms tailored to agricultural data types, integrated into digital solutions that are viable, flexible, and scalable for wide deployment for a wide variety of agricultural users and decision-makers. We discuss a novel AI approach that addresses the real-world problem of developing a viable solution for reliably, timely, and cost-effectively forecasting crop status across large agricultural regions using Earth observational information in near-real-time. Our approach is based on extracting time-conditioned topological features which characterize complex spatio-temporal dependencies between crop production regions and integrating such topological signatures into Long Short Term Memory (LSTM). We discuss utility and limitations of the resulting zigzag persistence-based LSTM (ZZTop-LSTM) as a new tool for developing more informed crop insurance rate-making and accurate tracking of changing risk exposures and vulnerabilities within insurance risk areas.
Driver's anxiety about the remaining driving range of electric vehicles (EVs) has been quite improved by mounting a high-capacity battery pack. However, when EVs need to be charged, the drivers still feel uncomfortable if inaccurate range prediction is provided because the inaccuracy makes it difficult to decide when and where to charge EV. In this paper, to mitigate the EV range anxiety, a new machine learning (ML) method to enhance range prediction accuracy is proposed in a practical way. For continuously obtaining the recent traffic conditions ahead, input features indicating the near-future vehicle dynamics are connected to a long short-term memory (LSTM) network, which can consecutively utilize a relation of neighboring data, and then the output features of the LSTM network with another input features consisting of energy-related vehicle system states become another input layer for deep learning network (DNN). The proposed LSTM-DNN mixture model is trained by exploiting the driving data of about 160,000 km and the following test performance shows that the model retains the range prediction accuracy of 2 ~ 3 km in a time window of 40 min. The test results indicate that the LSTM-DNN range prediction model is able to make a far-sighted range prediction while considering varying map and traffic information to a destination.