Distributed, Parallel, and Cluster Computing | Cool Papers

#1 A Unified Solution to Diverse Heterogeneities in One-shot Federated Learning

Authors: Jun Bai ; Yiliao Song ; Di Wu ; Atul Sajjanhar ; Yong Xiang ; Wei Zhou ; Xiaohui Tao ; Yan Li

One-shot federated learning (FL) limits the communication between the server and clients to a single round, greatly reducing the privacy leakage risks inherent in traditional FL, which requires multiple communication rounds. However, we find that existing one-shot FL frameworks are vulnerable to distributional heterogeneity because they concentrate predominantly on model heterogeneity and pay insufficient attention to data heterogeneity. To fill this gap, we propose a unified, data-free, one-shot federated learning framework (FedHydra) that can effectively address both model and data heterogeneity. Rather than applying existing value-only learning mechanisms, a structure-value learning mechanism is proposed in FedHydra. Specifically, a new stratified learning structure is proposed to cover data heterogeneity, and the value of each item during computation reflects model heterogeneity. By this design, the data and model heterogeneity issues are simultaneously monitored from different aspects during learning. Consequently, FedHydra can effectively mitigate both issues by minimizing their inherent conflicts. We compared FedHydra with three SOTA baselines on four benchmark datasets. Experimental results show that our method outperforms the previous one-shot FL methods in both homogeneous and heterogeneous settings.

Subjects: Distributed, Parallel, and Cluster Computing ; Machine Learning

Publish: 2024-10-28 15:20:52 UTC

#2 CloudHeatMap: Heatmap-Based Monitoring for Large-Scale Cloud Systems

Authors: Sarah Sohana ; William Pourmajidi ; John Steinbacher ; Andriy Miranskyy

Cloud computing is essential for modern enterprises, requiring robust tools to monitor and manage Large-Scale Cloud Systems (LCS). Traditional monitoring tools often miss critical insights due to the complexity and volume of LCS telemetry data. This paper presents CloudHeatMap, a novel heatmap-based visualization tool for near-real-time monitoring of LCS health. It offers intuitive visualizations of key metrics such as call volumes, response times, and HTTP response codes, enabling operators to quickly identify performance issues. A case study on the IBM Cloud Console demonstrates the tool's effectiveness in enhancing operational monitoring and decision-making. A demonstration is available at https://www.youtube.com/watch?v=3u5K1qp51EA .

Subjects: Distributed, Parallel, and Cluster Computing ; Software Engineering

Publish: 2024-10-28 14:57:10 UTC

#3 Fully-Distributed Byzantine Agreement in Sparse Networks

Authors: John Augustine ; Fabien Dufoulon ; Gopal Pandurangan

Byzantine agreement is a fundamental problem in fault-tolerant distributed networks that has been studied intensively for the last four decades. Most of these works designed protocols for complete networks. A key goal in Byzantine protocols is to tolerate as many Byzantine nodes as possible. The work of Dwork, Peleg, Pippenger, and Upfal [STOC 1986, SICOMP 1988] was the first to address the Byzantine agreement problem in sparse, bounded degree networks and presented a protocol that achieved almost-everywhere agreement among honest nodes. In such networks, all known Byzantine agreement protocols (e.g., Dwork, Peleg, Pippenger, and Upfal, STOC 1986; Upfal, PODC 1992; King, Saia, Sanwalani, and Vee, FOCS 2006) that tolerated a large number of Byzantine nodes had a major drawback that they were not fully-distributed -- in those protocols, nodes are required to have initial knowledge of the entire network topology. This drawback makes such protocols inapplicable to real-world communication networks such as peer-to-peer (P2P) networks, which are typically sparse and bounded degree and where nodes initially have only local knowledge of themselves and their neighbors. Indeed, a fundamental open question raised by the above works is whether one can design Byzantine protocols that tolerate a large number of Byzantine nodes in sparse networks that work with only local knowledge, i.e., fully-distributed protocols. The work of Augustine, Pandurangan, and Robinson [PODC 2013] presented the first fully-distributed Byzantine agreement protocol that works in sparse networks, but it tolerated only up to $O(\sqrt{n}/\mathrm{polylog}(n))$ Byzantine nodes (where $n$ is the total network size). We answer the earlier open question by presenting fully-distributed Byzantine agreement protocols for sparse, bounded degree networks that tolerate significantly more Byzantine nodes -- up to $O(n/\mathrm{polylog}(n))$ of them.

Subjects: Distributed, Parallel, and Cluster Computing ; Data Structures and Algorithms

Publish: 2024-10-28 09:32:46 UTC

#4 Advancing Towards Green Blockchain: A Practical Energy-Efficient Blockchain Based Application for CV Verification

Authors: Gabriel Fernández-Blanco ; Iván Froiz-Míguez ; Paula Fraga-Lamas ; Tiago M. Fernández-Caramés

Blockchain has been widely criticized for its use of inefficient consensus protocols and energy-intensive mechanisms, which have resulted in enormous global power consumption. Fortunately, since the first blockchain was conceived in 2008 (the one that supports Bitcoin), hardware and consensus protocols have evolved, decreasing energy consumption significantly. This article describes a green blockchain solution and quantifies energy savings when deploying the system on traditional computers and embedded Single-Board Computers (SBCs). To illustrate such savings, a solution is proposed for tackling the problem of academic certificate forgery, which has a significant cost to society, since it harms the trustworthiness of certificates and academic institutions. The proposed solution is aimed at recording and verifying academic records (ARs) through a decentralized application (DApp) that is supported by a smart contract deployed in the Ethereum blockchain. The application stores the raw data (i.e., the data that are not managed by the blockchain) on a decentralized storage system based on Inter-Planetary File System (IPFS). To demonstrate the efficiency of the developed solution, it is evaluated in terms of performance (transaction latency and throughput) and efficiency (CPU usage and energy consumption), comparing the results obtained with a traditional Proof-of-Work (PoW) consensus protocol and the new Proof-of-Authority (PoA) protocol. The results shown in this paper indicate that the latter is clearly greener and demands less CPU load. Moreover, this article compares the performance of a traditional computer and two SBCs (a Raspberry Pi 4 and an Orange Pi One), showing that it is possible to use the latter low-power devices to implement blockchain nodes for the proposed DApp, but at the cost of higher response latency that varies greatly depending on the SBC used [...]

Subjects: Distributed, Parallel, and Cluster Computing ; Cryptography and Security ; Computers and Society ; Systems and Control

Publish: 2024-10-27 21:32:20 UTC

#5 A Comprehensive Survey on Green Blockchain: Developing the Next Generation of Energy Efficient and Sustainable Blockchain Systems

Authors: Tiago M. Fernández-Caramés ; Paula Fraga-Lamas

Although Blockchain has been successfully used in many different fields and applications, it has been traditionally regarded as an energy-intensive technology, essentially due to the past use of inefficient consensus algorithms that prioritized security over sustainability. However, in recent years, thanks to the significant progress made on key blockchain components, their energy consumption can be decreased noticeably. To achieve this objective, this article analyzes the main components of blockchains and explores strategies to reduce their energy consumption. In this way, this article delves into each component of a blockchain system, including consensus mechanisms, network architecture, data storage and validation, smart contract execution, mining and block creation, and outlines specific strategies to decrease their energy consumption. For this purpose, consensus mechanisms are compared, recommendations for reducing the energy consumption of network communications are provided, techniques for data storage and validation are suggested, and diverse optimizations are proposed for both software and hardware components. Moreover, the main challenges and limitations of reducing power consumption in blockchain systems are analyzed. As a consequence, this article provides a guideline for future researchers and developers who aim to develop the next generation of Green Blockchain solutions.

Subjects: Distributed, Parallel, and Cluster Computing ; Cryptography and Security

Publish: 2024-10-27 20:22:25 UTC

#6 CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Authors: Ali TehraniJamsaz ; Arijit Bhattacharjee ; Le Chen ; Nesreen K. Ahmed ; Amir Yazdanbakhsh ; Ali Jannesari

Recent advancements in Large Language Models (LLMs) have renewed interest in automatic programming language translation. Encoder-decoder transformer models, in particular, have shown promise in translating between different programming languages. However, translating between a language and its high-performance computing (HPC) extensions remains underexplored due to challenges such as complex parallel semantics. In this paper, we introduce CodeRosetta, an encoder-decoder transformer model designed specifically for translating between programming languages and their HPC extensions. CodeRosetta is evaluated on C++ to CUDA and Fortran to C++ translation tasks. It uses a customized learning framework with tailored pretraining and training objectives to effectively capture both code semantics and parallel structural nuances, enabling bidirectional translation. Our results show that CodeRosetta outperforms state-of-the-art baselines in C++ to CUDA translation by 2.9 BLEU and 1.72 CodeBLEU points while improving compilation accuracy by 6.05%. Compared to general closed-source LLMs, our method improves C++ to CUDA translation by 22.08 BLEU and 14.39 CodeBLEU, with 2.75% higher compilation accuracy. Finally, CodeRosetta exhibits proficiency in Fortran to parallel C++ translation, marking it, to our knowledge, as the first encoder-decoder model for this complex task, improving CodeBLEU by at least 4.63 points compared to closed-source and open-code LLMs.

Subjects: Distributed, Parallel, and Cluster Computing ; Artificial Intelligence ; Machine Learning ; Performance ; Programming Languages ; Software Engineering

Publish: 2024-10-27 17:34:07 UTC

#7 Solving Sequential Greedy Problems Distributedly with Sub-Logarithmic Energy Cost

Authors: Alkida Balliu ; Pierre Fraigniaud ; Dennis Olivetti ; Mikaël Rabie

We study the awake complexity of graph problems that belong to the class O-LOCAL, which includes a large subset of problems solvable by sequential greedy algorithms, such as $(\Delta+1)$-coloring, maximal independent set, maximal matching, etc. It is known from previous work that, in $n$-node graphs of maximum degree $\Delta$, any problem in the class O-LOCAL can be solved by a deterministic distributed algorithm with awake complexity $O(\log\Delta+\log^\star n)$. In this paper, we show that any problem belonging to the class O-LOCAL can be solved by a deterministic distributed algorithm with awake complexity $O(\sqrt{\log n}\cdot\log^\star n)$. This leads to a polynomial improvement over the state of the art when $\Delta\gg 2^{\sqrt{\log n}}$, e.g., $\Delta=n^\epsilon$ for some arbitrarily small $\epsilon>0$. The key ingredient for achieving our results is the computation of a network decomposition, that uses a small-enough number of colors, in sub-logarithmic time in the Sleeping model, which can be of independent interest.
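
To see where the new bound wins, a short worked comparison may help. It assumes $\Delta = n^{\epsilon}$ for a constant $\epsilon > 0$, as in the abstract's own example; the ratio is an illustration derived from the two stated bounds, not a result quoted from the paper.

$$
\begin{aligned}
\text{previous bound:}\quad & O(\log\Delta+\log^\star n) = O(\epsilon\log n+\log^\star n) = O(\log n),\\
\text{new bound:}\quad & O(\sqrt{\log n}\cdot\log^\star n),\\
\text{ratio:}\quad & \frac{\log n}{\sqrt{\log n}\cdot\log^\star n} = \frac{\sqrt{\log n}}{\log^\star n}.
\end{aligned}
$$

The ratio grows roughly like $\sqrt{\log n}$, so the improvement is polynomial in $\log n$ (up to the slowly growing $\log^\star n$ factor) rather than polynomial in $n$.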

Subject: Distributed, Parallel, and Cluster Computing

Publish: 2024-10-27 16:26:19 UTC

#8 When Less is More: Achieving Faster Convergence in Distributed Edge Machine Learning

Authors: Advik Raj Basani ; Siddharth Chaitra Vivek ; Advaith Krishna ; Arnab K. Paul

Distributed Machine Learning (DML) on resource-constrained edge devices holds immense potential for real-world applications. However, achieving fast convergence in DML in these heterogeneous environments remains a significant challenge. Traditional frameworks like Bulk Synchronous Parallel and Asynchronous Stochastic Parallel rely on frequent, small updates that incur substantial communication overhead and hinder convergence speed. Furthermore, these frameworks often employ static dataset sizes, neglecting the heterogeneity of edge devices and potentially leading to straggler nodes that slow down the entire training process. The straggler nodes, i.e., edge devices that take significantly longer to process their assigned data chunk, hinder the overall training speed. To address these limitations, this paper proposes Hermes, a novel probabilistic framework for efficient DML on edge devices. This framework leverages a dynamic threshold based on recent test loss behavior to identify statistically significant improvements in the model's generalization capability, hence transmitting updates only when major improvements are detected, thereby significantly reducing communication overhead. Additionally, Hermes employs dynamic dataset allocation to optimize resource utilization and prevents performance degradation caused by straggler nodes. Our evaluations on a real-world heterogeneous resource-constrained environment demonstrate that Hermes achieves faster convergence compared to state-of-the-art methods, resulting in a remarkable $13.22$x reduction in training time and a $62.1\%$ decrease in communication overhead.
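
The idea of transmitting an update only when the test loss improves significantly can be sketched as follows. This is a minimal illustration, not the Hermes algorithm: the abstract does not give the threshold rule, so the class `UpdateGate`, its parameters `window` and `k`, and the rule "send when the loss falls below the recent mean minus `k` standard deviations" are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


class UpdateGate:
    """Decide whether a worker should transmit its model update.

    Hypothetical rule (not taken from the paper): send only when the new
    test loss falls below mean - k * std of the recent loss window.
    """

    def __init__(self, window: int = 5, k: float = 1.0) -> None:
        self.history = deque(maxlen=window)  # most recent test losses
        self.k = k

    def should_send(self, test_loss: float) -> bool:
        # With fewer than two past losses there is no baseline: send by default.
        if len(self.history) < 2:
            self.history.append(test_loss)
            return True
        threshold = mean(self.history) - self.k * pstdev(self.history)
        self.history.append(test_loss)
        return test_loss < threshold


def count_sent(losses, window: int = 5, k: float = 1.0) -> int:
    """Count how many of the given losses would trigger a transmission."""
    gate = UpdateGate(window, k)
    return sum(gate.should_send(x) for x in losses)
```

A larger `k` makes the gate stricter, trading fewer transmissions for staler global state; how Hermes actually sets and adapts its threshold is described in the paper itself.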

Subjects: Distributed, Parallel, and Cluster Computing ; Machine Learning ; Performance

Publish: 2024-10-27 16:17:03 UTC

#9 Distributed Complexity of $P_k$-freeness: Decision and Certification

Author: Masayuki Miyamoto

The class of graphs that do not contain a path on $k$ nodes as an induced subgraph ($P_k$-free graphs) has rich applications in the theory of graph algorithms. This paper explores the problem of deciding $P_k$-freeness from the viewpoint of distributed computing. For specific small values of $k$, we present the \textit{first} $\mathsf{CONGEST}$ algorithms specified for $P_k$-freeness, utilizing structural properties of $P_k$-free graphs in a novel way. Specifically, we show that $P_k$-freeness can be decided in $\tilde{O}(1)$ rounds for $k=4$ in the $\mathsf{broadcast\;CONGEST}$ model, and in $\tilde{O}(n)$ rounds for $k=5$ in the $\mathsf{CONGEST}$ model, where $n$ is the number of nodes in the network and $\tilde{O}(\cdot)$ hides a $\mathrm{polylog}(n)$ factor. These results significantly improve the previous $O(n^{2-2/(3k+2)})$ upper bounds by Eden et al. (Dist.~Comp.~2022). We also construct a local certification of $P_5$-freeness with certificates of size $\tilde{O}(n)$. This is nearly optimal, given our $\Omega(n^{1-o(1)})$ lower bound on certificate size, and marks a significant advancement as no nontrivial bounds for proof-labeling schemes of $P_5$-freeness were previously known. For general $k$, we establish the first $\mathsf{CONGEST}$ lower bound, which is of the form $n^{2-1/\Theta(k)}$. The $n^{1/\Theta(k)}$ factor is unavoidable, in view of the $O(n^{2-2/(3k+2)})$ upper bound mentioned above. Additionally, our approach yields the \textit{first} superlinear lower bound on certificate size for local certification. This partially answers the conjecture on the optimal certificate size of $P_k$-freeness, asked by Bousquet et al. (arXiv:2402.12148). Finally, we propose a novel variant of the problem called ordered $P_k$ detection, and show a linear lower bound and its nontrivial connection to distributed subgraph detection.

Subject: Distributed, Parallel, and Cluster Computing

Publish: 2024-10-27 06:53:16 UTC

#10 EACO-RAG: Edge-Assisted and Collaborative RAG with Adaptive Knowledge Update

Authors: Jiaxing Li ; Chi Xu ; Lianchen Jia ; Feng Wang ; Cong Zhang ; Jiangchuan Liu

Large Language Models are revolutionizing Web, mobile, and Web of Things systems, driving intelligent and scalable solutions. However, as Retrieval-Augmented Generation (RAG) systems expand, they encounter significant challenges related to scalability, including increased delay and communication overhead. To address these issues, we propose EACO-RAG, an edge-assisted distributed RAG system that leverages adaptive knowledge updates and inter-node collaboration. By distributing vector datasets across edge nodes and optimizing retrieval processes, EACO-RAG significantly reduces delay and resource consumption while enhancing response accuracy. The system employs a multi-armed bandit framework with safe online Bayesian methods to balance performance and cost. Extensive experimental evaluation demonstrates that EACO-RAG outperforms traditional centralized RAG systems in both response time and resource efficiency. EACO-RAG effectively reduces delay and resource expenditure to levels comparable to, or even lower than, those of local RAG systems, while significantly improving accuracy. This study presents the first systematic exploration of edge-assisted distributed RAG architectures, providing a scalable and cost-effective solution for large-scale distributed environments.

Subject: Distributed, Parallel, and Cluster Computing

Publish: 2024-10-27 00:42:21 UTC

#11 Configuration management in the distributed cloud

Authors: Tamara Ranković ; Ivana Kovačević ; Veljko Maksimović ; Goran Sladić ; Miloš Simić

Owing to their cost-effectiveness and flexibility, cloud services have been the default choice for the deployment of innumerable software systems over the years. However, novel paradigms are beginning to emerge, as the cloud can't meet the requirements of increasingly many latency- and privacy-sensitive applications. The distributed cloud model, being one of the attempts to overcome these challenges, places a distributed cloud layer between device and cloud layers, intending to bring resources closer to data sources. As application code should be kept separate from its configuration, especially in highly dynamic cloud environments, there is a need to incorporate configuration primitives in future distributed cloud platforms. In this paper, we present the design and implementation of a configuration management subsystem for an open-source distributed cloud platform. Our solution spreads across the cloud and distributed cloud layers and supports configuration versioning, selective dissemination to nodes in the distributed cloud layer, and logical isolation via namespaces. Our work serves as a demonstration of the feasibility and usability of the new cloud-extending models and provides valuable insight into one of the possible implementations.

Subject: Distributed, Parallel, and Cluster Computing

Publish: 2024-10-26 21:07:17 UTC

#12 Misconfiguration prevention and error cause detection for distributed-cloud applications

Authors: Tamara Ranković ; Filip Šiljić ; Jovan Tomić ; Goran Sladić ; Miloš Simić

Major software failures are reported to be due to misconfiguration. As manual configuration is too error-prone to be deemed a reliable strategy for dynamic and complex systems, automated configuration management has become a standard. Countermeasures against misconfiguration can be focused on prevention or, if failure already occurred, detection. Configuration is often used as a broad term for any set of parameters or system states that dictate how an application will behave, but in this paper, we only focus on parameters consumed on process startup, usually from configuration files. Our objective is to enhance configuration management processes in environments based on the distributed cloud model, a novel cloud model that allows dynamic allocation of strategically located resources. The two mechanisms we propose are configuration validation using schemas and configuration version control with support for detecting differences between configuration versions. Our solution reduces the risk of incorrect configuration as schemas prevent any non-compliant configuration from reaching applications. However, if failure still occurs because the schema was incomplete or a valid configuration revealed existing software bugs, the version control system can precisely locate configuration changes that triggered the failure.
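
The two mechanisms named above, schema validation and version diffing, can be illustrated with a small sketch. It is not the authors' implementation: the schema format (a mapping from key to required Python type) and the function names `validate` and `diff` are assumptions chosen to keep the example self-contained.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema format: key -> required Python type.
Schema = Dict[str, type]
Config = Dict[str, Any]


def validate(config: Config, schema: Schema) -> List[str]:
    """Return a list of violations; an empty list means the config complies."""
    errors = []
    for key, expected in schema.items():
        if key not in config:
            errors.append(f"missing key: {key}")
        elif not isinstance(config[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    for key in config:
        if key not in schema:
            errors.append(f"unknown key: {key}")
    return errors


def diff(old: Config, new: Config) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed key to (old value, new value).

    None stands for an absent key, so this sketch cannot distinguish a
    missing key from one explicitly set to None.
    """
    changed = {}
    for key in sorted(old.keys() | new.keys()):
        if old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed
```

Rejecting a configuration whenever `validate` returns a non-empty list corresponds to the prevention step; after a failure, `diff` between the last working version and the current one narrows down which parameter changes to inspect.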

Subject: Distributed, Parallel, and Cluster Computing

Publish: 2024-10-26 20:57:20 UTC

#13 Towards Fully Automatic Distributed Lower Bounds

Authors: Alkida Balliu ; Sebastian Brandt ; Fabian Kuhn ; Dennis Olivetti ; Joonatan Saarhelo

In the past few years, a successful line of research has led to lower bounds for several fundamental local graph problems in the distributed setting. These results were obtained via a technique called round elimination. On a high level, the round elimination technique can be seen as a recursive application of a function that takes as input a problem $\Pi$ and outputs a problem $\Pi'$ that is one round easier than $\Pi$. Applying this function recursively to concrete problems of interest can be highly nontrivial, which is one of the reasons that has made the technique difficult to approach. The contribution of our paper is threefold. Firstly, we develop a new and fully automatic method for finding lower bounds of $\Omega(\log_\Delta n)$ and $\Omega(\log_\Delta \log n)$ rounds for deterministic and randomized algorithms, respectively, via round elimination. Secondly, we show that this automatic method is indeed useful, by obtaining lower bounds for defective coloring problems. We show that the problem of coloring the nodes of a graph with $3$ colors and defect at most $(\Delta - 3)/2$ requires $\Omega(\log_\Delta n)$ rounds for deterministic algorithms and $\Omega(\log_\Delta \log n)$ rounds for randomized ones. We note that lower bounds for coloring problems are notoriously challenging to obtain, both in general, and via the round elimination technique. Both the first and (indirectly) the second contribution build on our third contribution -- a new and conceptually simple way to compute the one-round easier problem $\Pi'$ in the round elimination framework. This new procedure provides a clear and easy recipe for applying round elimination, thereby making a substantial step towards the greater goal of having a fully automatic procedure for obtaining lower bounds in the distributed setting.

Subject: Distributed, Parallel, and Cluster Computing

Publish: 2024-10-26 16:38:01 UTC

#14 SANSee: A Physical-layer Semantic-aware Networking Framework for Distributed Wireless Sensing

Authors: Huixiang Zhu ; Yong Xiao ; Yingyu Li ; Guangming Shi ; Marwan Krunz

Contactless device-free wireless sensing has recently attracted significant interest due to its potential to support a wide range of immersive human-machine interactive applications using ubiquitously available radio frequency (RF) signals. Traditional approaches focus on developing a single global model based on a combined dataset collected from different locations. However, wireless signals are known to be location and environment specific. Thus, a global model results in inconsistent and unreliable sensing results. It is also unrealistic to construct individual models for all the possible locations and environmental scenarios. Motivated by the observation that signals recorded at different locations are closely related to a set of physical-layer semantic features, in this paper we propose SANSee, a semantic-aware networking-based framework for distributed wireless sensing. SANSee allows models constructed in one or a limited number of locations to be transferred to new locations without requiring any locally labeled data or model training. SANSee is built on the concept of physical-layer semantic-aware network (pSAN), which characterizes the semantic similarity and the correlations of sensed data across different locations. A pSAN-based zero-shot transfer learning solution is introduced to allow receivers in new locations to obtain location-specific models by directly aggregating the models trained by other receivers. We theoretically prove that models obtained by SANSee can approach the locally optimal models. Experimental results based on real-world datasets are used to verify that the accuracy of the transferred models obtained by SANSee matches that of the models trained by the locally labeled data based on supervised learning approaches.

Subjects: Distributed, Parallel, and Cluster Computing ; Networking and Internet Architecture

Publish: 2024-10-16 02:57:01 UTC

#15 A Scored Non-Deterministic Finite Automata Processor for Sequence Alignment

Authors: Ryan Karbowniczak ; Rasha Karakchi

The rapid growth of symbolic data in areas like internet, biological, and financial data has increased the demand for efficient pattern matching and regular expression processing. Non-deterministic Finite Automata (NFA) are used for these tasks, but general-purpose platforms often face memory bottlenecks due to the concurrent nature of NFAs. To address this, Domain-Specific Architectures (DSAs) like FPGA and ASIC-based automata processors have been developed for improved efficiency. However, many modern applications require identifying the optimal match path, such as in DNA sequence alignment, which demands scoring methods to evaluate the best match. This work enhances the FPGA-based NAPOLY automata processor by integrating scoring capabilities, creating an extended version called NAPOLY+ that assigns weights to transitions, enabling the identification of the highest scoring path. Implementing this approach introduces challenges, including increased state space complexity and resource demands due to multiple active paths. The NAPOLY+ system addresses these by incorporating arithmetic components to calculate scores along paths and using efficient memory management to maintain scalability. Experimental evaluation on the Zynq Ultrascale+ ZCU104 FPGA demonstrated high device utilization and performance variations based on array size and fan-out. While results are preliminary, ongoing testing will include real datasets to assess the end-to-end performance of NAPOLY+ in practical applications such as BLAST.

Subjects: Distributed, Parallel, and Cluster Computing ; Emerging Technologies

Publish: 2024-10-11 14:42:05 UTC

#16 Federated Time Series Generation on Feature and Temporally Misaligned Data

Authors: Chenrui Fan ; Zhi Wen Soi ; Aditya Shankar ; Abele Mălan ; Lydia Y. Chen

Distributed time series data presents a challenge for federated learning, as clients often possess different feature sets and have misaligned time steps. Existing federated time series models are limited by the assumption of perfect temporal or feature alignment across clients. In this paper, we propose FedTDD, a novel federated time series diffusion model that jointly learns a synthesizer across clients. At the core of FedTDD is a novel data distillation and aggregation framework that reconciles the differences between clients by imputing the misaligned timesteps and features. In contrast to traditional federated learning, FedTDD learns the correlation across clients' time series through the exchange of local synthetic outputs instead of model parameters. A coordinator iteratively improves a global distiller network by leveraging shared knowledge from clients through the exchange of synthetic data. As the distiller becomes more refined over time, it subsequently enhances the quality of the clients' local feature estimates, allowing each client to then improve its local imputations for missing data using the latest, more accurate distiller. Experimental results on five datasets demonstrate FedTDD's effectiveness compared to centralized training, and the effectiveness of sharing synthetic outputs to transfer knowledge of local time series. Notably, FedTDD achieves 79.4% and 62.8% improvement over local training in Context-FID and Correlational scores.

Subjects: Machine Learning ; Distributed, Parallel, and Cluster Computing

Publish: 2024-10-28 14:35:07 UTC

#17 Smart Space Environments: Key Challenges and Innovative Solutions

Author: Ramakant Kumar

The integration of LoRaWAN (Long Range Wide Area Network) technology with both active and passive sensors presents a transformative opportunity for the development of smart home systems. This paper explores how active sensors, such as motion detectors and ultrasonic sensors, and passive sensors, including temperature and humidity sensors, work together to enhance connectivity and efficiency within diverse environments while addressing the challenges of modern living. By leveraging LoRaWAN long-range capabilities and low power consumption, the proposed framework enables effective data transmission from remote sensors, facilitating applications such as smart agriculture, environmental monitoring, and comprehensive home automation. Active sensors emit energy to detect changes in their surroundings, providing real-time data crucial for security and automation, while passive sensors capture ambient energy to monitor environmental conditions, ensuring resource efficiency and user comfort. The synergy between LoRaWAN and these various sensor types promotes innovation, contributing to a more responsive and sustainable living experience. Furthermore, this research highlights the adaptability of the proposed system, allowing for seamless integration of new devices and advanced functionalities. As the landscape of smart home technology continues to evolve, ongoing research in this area will yield advanced solutions tailored to user needs, ultimately paving the way for smarter, safer, and more efficient living environments.

Subjects: Emerging Technologies ; Distributed, Parallel, and Cluster Computing

Publish: 2024-10-27 15:40:20 UTC

#18 FuseFL: One-Shot Federated Learning through the Lens of Causality with Progressive Model Fusion

Authors: Zhenheng Tang ; Yonggang Zhang ; Peijie Dong ; Yiu-ming Cheung ; Amelie Chi Zhou ; Bo Han ; Xiaowen Chu

One-shot Federated Learning (OFL) significantly reduces communication costs in FL by aggregating trained models only once. However, the performance of advanced OFL methods is far behind that of normal FL. In this work, we provide a causal view to find that this performance drop of OFL methods comes from the isolation problem, which means that local models trained in isolation in OFL may easily fit to spurious correlations due to the data heterogeneity. From the causal perspective, we observe that the spurious fitting can be alleviated by augmenting intermediate features from other clients. Built upon our observation, we propose a novel learning approach, termed FuseFL, to endow OFL with superb performance and low communication and storage costs. Specifically, FuseFL decomposes neural networks into several blocks, and progressively trains and fuses each block in a bottom-up manner for feature augmentation, introducing no additional communication costs. Comprehensive experiments demonstrate that FuseFL outperforms existing OFL and ensemble FL by a significant margin. We conduct comprehensive experiments to show that FuseFL supports high scalability of clients, heterogeneous model training, and low memory costs. Our work is the first attempt to use causality to analyze and alleviate data heterogeneity of OFL.

Subjects: Machine Learning ; Artificial Intelligence ; Distributed, Parallel, and Cluster Computing ; Networking and Internet Architecture

Publish: 2024-10-27 09:07:10 UTC
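
As a point of reference for the phrase "aggregating trained models only once" in the FuseFL abstract above, the following is a minimal sketch of a generic one-shot aggregation baseline: a single size-weighted average of client parameters. It is not FuseFL's progressive block-wise fusion, whose details the abstract does not give. The function name `one_shot_average` and the dictionary-of-lists parameter layout are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical parameter layout: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def one_shot_average(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Average client parameters once, weighting each client by its dataset size.

    Assumes every client uses the same layer names and shapes
    (a homogeneous-model setting).
    """
    total = sum(client_sizes)
    merged: Params = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        merged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return merged


if __name__ == "__main__":
    clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
    sizes = [1, 3]  # the second client holds three times as much data
    print(one_shot_average(clients, sizes))
```

Because each client trains in isolation before this single averaging step, a baseline of this kind has no opportunity to correct client-specific fitting, which is the isolation problem the abstract describes.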

#19 Efficient Circuit Wire Cutting Based on Commuting Groups [PDF] [Copy] [Kimi] [REL]

Authors: Xinpeng Li ; Vinooth Kulkarni ; Daniel T. Chen ; Qiang Guan ; Weiwen Jiang ; Ning Xie ; Shuai Xu ; Vipin Chaudhary

Current quantum devices face challenges when dealing with large circuits due to error rates as circuit size and the number of qubits increase. The circuit wire-cutting technique addresses this issue by breaking down a large circuit into smaller, more manageable subcircuits. However, the exponential increase in the number of subcircuits and the complexity of reconstruction as more cuts are made pose a great practical challenge. Inspired by ancilla-assisted quantum process tomography and the MUBs-based grouping technique for simultaneous measurement, we propose a new approach that can reduce subcircuit running overhead. The approach first uses ancillary qubits to transform all quantum input initializations into quantum output measurements. These output measurements are then organized into commuting groups for the purpose of simultaneous measurement, based on MUBs-based grouping. This approach significantly reduces the number of necessary subcircuits as well as the total number of shots. Lastly, we provide numerical experiments to demonstrate the complexity reduction.

Subjects: Quantum Physics ; Distributed, Parallel, and Cluster Computing

Publish: 2024-10-27 02:40:00 UTC
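
To illustrate what organizing measurements into commuting groups means in the wire-cutting abstract above, here is a minimal sketch that greedily groups Pauli measurement strings by qubit-wise commutativity. This is a simpler, standard criterion and not the MUBs-based grouping the paper uses. The function names and the example strings are assumptions made for illustration.

```python
from typing import List


def qubitwise_commute(p: str, q: str) -> bool:
    """Return True if two equal-length Pauli strings commute qubit by qubit.

    At every position the two letters must be identical, or at least one
    of them must be the identity 'I'.
    """
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))


def group_paulis(paulis: List[str]) -> List[List[str]]:
    """Greedily place each Pauli string into the first compatible group."""
    groups: List[List[str]] = []
    for p in paulis:
        for group in groups:
            if all(qubitwise_commute(p, q) for q in group):
                group.append(p)
                break
        else:
            # No existing group can be measured together with p.
            groups.append([p])
    return groups


if __name__ == "__main__":
    measurements = ["ZZ", "ZI", "IZ", "XX", "XI", "IX", "YY", "XZ"]
    for group in group_paulis(measurements):
        print(group)
```

Every string in a group can be read out from the same measurement setting, so the eight measurements in the usage example need four settings instead of eight. Greedy grouping does not guarantee the minimum number of groups.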

#20 Lightweight, Secure and Stateful Serverless Computing with PSL [PDF] [Copy] [Kimi] [REL]

Authors: Alexander Thomas ; Shubham Mishra ; Kaiyuan Chen ; John Kubiatowicz

We present PSL, a lightweight, secure and stateful Function-as-a-Service (FaaS) framework for Trusted Execution Environments (TEEs). The framework provides rich programming language support on heterogeneous TEE hardware for statically compiled binaries and/or WebAssembly (WASM) bytecodes, with a familiar Key-Value Store (KVS) interface to secure, performant, network-embedded storage. It achieves near-native execution speeds by utilizing the dynamic memory mapping capabilities of Intel SGX2 to create an in-enclave WASM runtime with Just-In-Time (JIT) compilation. PSL is designed to efficiently operate within an asynchronous environment with a distributed tamper-proof confidential storage system, assuming minority failures. The system exchanges eventually-consistent state updates across nodes while utilizing release-consistent locking mechanisms to enhance transactional capabilities. The execution of PSL is up to 3.7x faster than the state-of-the-art SGX WASM runtime. PSL reaches 95k ops/s with YCSB 100% read workload and 89k ops/s with 50% read/write workload. We demonstrate the scalability and adaptivity of PSL through a case study of secure and distributed training of deep neural networks.

Subjects: Cryptography and Security ; Distributed, Parallel, and Cluster Computing

Publish: 2024-10-25 23:17:56 UTC

#21 OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery [PDF] [Copy] [Kimi] [REL]

Authors: Philipe Dias ; Aristeidis Tsaris ; Jordan Bowman ; Abhishek Potnis ; Jacob Arndt ; H. Lexie Yang ; Dalton Lunga

While the pretraining of Foundation Models (FMs) for remote sensing (RS) imagery is on the rise, models remain restricted to a few hundred million parameters. Scaling models to billions of parameters has been shown to yield unprecedented benefits including emergent abilities, but requires data scaling and computing resources typically not available outside industry R&D labs. In this work, we pair high-performance computing resources, including the Frontier supercomputer, America's first exascale system, with high-resolution optical RS data to pretrain billion-scale FMs. Our study assesses the performance of different pretrained variants of vision Transformers across image classification, semantic segmentation and object detection benchmarks, highlighting the importance of data scaling for effective model scaling. Moreover, we discuss the construction of a novel TIU pretraining dataset and model initialization, with data and pretrained models intended for public release. By discussing technical challenges and details often lacking in the related literature, this work is intended to offer best practices to the geospatial community toward efficient training and benchmarking of larger FMs.

Subjects: Computer Vision and Pattern Recognition ; Artificial Intelligence ; Distributed, Parallel, and Cluster Computing

Publish: 2024-10-25 20:55:12 UTC

#22 Scheduling Languages: A Past, Present, and Future Taxonomy [PDF] [Copy] [Kimi] [REL]

Authors: Mary Hall ; Cosmin Oancea ; Anne C. Elster ; Ari Rasch ; Sameeran Joshi ; Amir Mohammad Tavakkoli ; Richard Schulze

Scheduling languages express to a compiler a sequence of optimizations to apply. Compilers that support a scheduling language interface allow exploration of compiler optimizations, i.e., exploratory compilers. While scheduling languages have become a common feature of tools for expert users, the proliferation of these languages without unifying common features may be confusing to users. Moreover, we recognize a need to organize the compiler developer community around common exploratory compiler infrastructure, and future advances to address, for example, data layout and data movement. Supporting a broader set of users may require raising the level of abstraction. This paper provides a taxonomy of scheduling languages, first discussing their origins in iterative compilation and autotuning, noting the common features and how they are used in existing frameworks, and then calling for changes to increase their utility and portability.

Subjects: Programming Languages ; Distributed, Parallel, and Cluster Computing ; Performance

Publish: 2024-10-25 18:52:57 UTC

#23 SALINA: Towards Sustainable Live Sonar Analytics in Wild Ecosystems [PDF] [Copy] [Kimi] [REL]

Authors: Chi Xu ; Rongsheng Qian ; Hao Fang ; Xiaoqiang Ma ; William I. Atlas ; Jiangchuan Liu ; Mark A. Spoljaric

Sonar radar captures visual representations of underwater objects and structures using sound wave reflections, making it essential for exploration, mapping, and continuous surveillance in wild ecosystems. Real-time analysis of sonar data is crucial for time-sensitive applications, including environmental anomaly detection and in-season fishery management, where rapid decision-making is needed. However, the lack of both relevant datasets and pre-trained DNN models, coupled with resource limitations in wild environments, hinders the effective deployment and continuous operation of live sonar analytics. We present SALINA, a sustainable live sonar analytics system designed to address these challenges. SALINA enables real-time processing of acoustic sonar data with spatial and temporal adaptations, and features energy-efficient operation through a robust energy management module. Deployed for six months at two inland rivers in British Columbia, Canada, SALINA provided continuous 24/7 underwater monitoring, supporting fishery stewardship and wildlife restoration efforts. Through extensive real-world testing, SALINA demonstrated up to a 9.5% improvement in average precision and a 10.1% increase in tracking metrics. The energy management module successfully handled extreme weather, preventing outages and reducing contingency costs. These results offer valuable insights for long-term deployment of acoustic data systems in the wild.

Subjects: Signal Processing ; Artificial Intelligence ; Distributed, Parallel, and Cluster Computing

Publish: 2024-10-10 00:32:28 UTC