Cryptography and Security

Date: Thu, 9 May 2024 | Total: 26

#1 Anomaly Detection in Certificate Transparency Logs [PDF] [Copy] [Kimi]

Authors: Richard Ostertág ; Martin Stanek

We propose an anomaly detection technique for X.509 certificates utilizing Isolation Forest. This method can be beneficial when compliance testing with X.509 linters proves unsatisfactory, and we seek to identify anomalies beyond standards compliance. The technique is validated on a sample of certificates from Certificate Transparency logs.

#2 SINBAD: Saliency-informed detection of breakage caused by ad blocking [PDF] [Copy] [Kimi]

Authors: Saiid El Hajj Chehade ; Sandra Siby ; Carmela Troncoso

Privacy-enhancing blocking tools based on filter-list rules tend to break legitimate functionality. Filter-list maintainers could benefit from automated breakage detection tools that allow them to proactively fix problematic rules before deploying them to millions of users. We introduce SINBAD, an automated breakage detector that improves the accuracy over the state of the art by 20%, and is the first to detect dynamic breakage and breakage caused by style-oriented filter rules. The success of SINBAD is rooted in three innovations: (1) the use of user-reported breakage issues in forums that enable the creation of a high-quality dataset for training in which only breakage that users perceive as an issue is included; (2) the use of 'web saliency' to automatically identify user-relevant regions of a website on which to prioritize automated interactions aimed at triggering breakage; and (3) the analysis of webpages via subtrees which enables fine-grained identification of problematic filter rules.

#3 Systematic Use of Random Self-Reducibility against Physical Attacks [PDF] [Copy] [Kimi]

Authors: Ferhat Erata ; TingHung Chiu ; Anthony Etim ; Srilalith Nampally ; Tejas Raju ; Rajashree Ramu ; Ruzica Piskac ; Timos Antonopoulos ; Wenjie Xiong ; Jakub Szefer

This work presents a novel, black-box software-based countermeasure against physical attacks including power side-channel and fault-injection attacks. The approach uses the concept of random self-reducibility and self-correctness to add randomness and redundancy in the execution for protection. Our approach is at the operation level, is not algorithm-specific, and thus, can be applied for protecting a wide range of algorithms. The countermeasure is empirically evaluated against attacks over operations like modular exponentiation, modular multiplication, polynomial multiplication, and number theoretic transforms. An end-to-end implementation of this countermeasure is demonstrated for RSA-CRT signature algorithm and Kyber Key Generation public key cryptosystems. The countermeasure reduced the power side-channel leakage by two orders of magnitude, to an acceptably secure level in TVLA analysis. For fault injection, the countermeasure reduces the number of faults to 95.4% in average.

#4 Air Gap: Protecting Privacy-Conscious Conversational Agents [PDF] [Copy] [Kimi]

Authors: Eugene Bagdasaryan ; Ren Yi ; Sahra Ghalebikesabi ; Peter Kairouz ; Marco Gruteser ; Sewoong Oh ; Borja Balle ; Daniel Ramage

The growing use of large language model (LLM)-based conversational agents to manage sensitive user data raises significant privacy concerns. While these agents excel at understanding and acting on context, this capability can be exploited by malicious actors. We introduce a novel threat model where adversarial third-party apps manipulate the context of interaction to trick LLM-based agents into revealing private information not relevant to the task at hand. Grounded in the framework of contextual integrity, we introduce AirGapAgent, a privacy-conscious agent designed to prevent unintended data leakage by restricting the agent's access to only the data necessary for a specific task. Extensive experiments using Gemini, GPT, and Mistral models as agents validate our approach's effectiveness in mitigating this form of context hijacking while maintaining core agent functionality. For example, we show that a single-query context hijacking attack on a Gemini Ultra agent reduces its ability to protect user data from 94% to 45%, while an AirGapAgent achieves 97% protection, rendering the same attack ineffective.

#5 (In)Security of Mobile Apps in Developing Countries: A Systematic Literature Review [PDF] [Copy] [Kimi]

Authors: Alioune Diallo ; Jordan Samhi ; Tegawendé Bissyandé ; Jacques Klein

In developing countries, several key sectors, including education, finance, agriculture, and healthcare, mainly deliver their services via mobile app technology on handheld devices. As a result, mobile app security has emerged as a paramount issue in developing countries. In this paper, we investigate the state of research on mobile app security, focusing on developing countries. More specifically, we performed a systematic literature review exploring the research directions taken by existing works, the different security concerns addressed, and the techniques used by researchers to highlight or address app security issues. Our main findings are: (1) the literature includes only a few studies on mobile app security in the context of developing countries ; (2) among the different security concerns that researchers study, vulnerability detection appears to be the leading research topic; (3) FinTech apps are revealed as the main target in the relevant literature. Overall, our work highlights that there is largely room for developing further specialized techniques addressing mobile app security in the context of developing countries.

#6 Gröbner Basis Cryptanalysis of Ciminion and Hydra [PDF] [Copy] [Kimi]

Author: Matthias Johann Steiner

Ciminion and Hydra are two recently introduced symmetric key Pseudo-Random Functions for Multi-Party Computation applications. For efficiency both primitives utilize quadratic permutations at round level. Therefore, polynomial system solving-based attacks pose a serious threat to these primitives. For Ciminion we construct a quadratic degree reverse lexicographic (DRL) Gr\"obner basis for the iterated polynomial model via affine transformations. For Hydra we provide a computer-aided proof in SageMath that a quadratic DRL Gr\"obner basis is already contained within the iterated polynomial system for the Hydra heads after affine transformations and a linear change of coordinates. Our Ciminion DRL Gr\"obner basis simplifies cryptanalysis, since one does not need to impose genericity assumptions, like being regular or semi-regular, anymore to derive complexity estimates on key recovery attacks. In the Hydra proposal it was claimed that $r_\mathcal{H} = 31$ rounds for the heads are sufficient to achieve $128$ bits of security against Gr\"obner basis attacks for key recovery. However, for $r_\mathcal{H} = 31$ standard term order conversion to a lexicographic (LEX) Gr\"obner basis for our Hydra DRL Gr\"obner basis requires just $126$ bits. Moreover, via the Eigenvalue Method up to $r_\mathcal{H} = 33$ rounds can be attacked below $128$ bits.

#7 HackCar: a test platform for attacks and defenses on a cost-contained automotive architecture [PDF] [Copy] [Kimi]

Authors: Dario Stabili ; Filip Valgimigli ; Edoardo Torrini ; Mirco Marchetti

In this paper, we introduce the design of HackCar, a testing platform for replicating attacks and defenses on a generic automotive system without requiring access to a complete vehicle. This platform empowers security researchers to illustrate the consequences of attacks targeting an automotive system on a realistic platform, facilitating the development and testing of security countermeasures against both existing and novel attacks. The HackCar platform is built upon an F1-10th model, to which various automotive-grade microcontrollers are connected through automotive communication protocols. This solution is crafted to be entirely modular, allowing for the creation of diverse test scenarios. Researchers and practitioners can thus develop innovative security solutions while adhering to the constraints of automotive-grade microcontrollers. We showcase our design by comparing it with a real, licensed, and unmodified vehicle. Additionally, we analyze the behavior of the HackCar in both an attack-free scenario and a scenario where an attack on in-vehicle communication is deployed.

#8 Adversarial Threats to Automatic Modulation Open Set Recognition in Wireless Networks [PDF] [Copy] [Kimi]

Authors: Yandie Yang ; Sicheng Zhang ; Kuixian Li ; Qiao Tian ; Yun Lin

Automatic Modulation Open Set Recognition (AMOSR) is a crucial technological approach for cognitive radio communications, wireless spectrum management, and interference monitoring within wireless networks. Numerous studies have shown that AMR is highly susceptible to minimal perturbations carefully designed by malicious attackers, leading to misclassification of signals. However, the adversarial security issue of AMOSR has not yet been explored. This paper adopts the perspective of attackers and proposes an Open Set Adversarial Attack (OSAttack), aiming at investigating the adversarial vulnerabilities of various AMOSR methods. Initially, an adversarial threat model for AMOSR scenarios is established. Subsequently, by analyzing the decision criteria of both discriminative and generative open set recognition, OSFGSM and OSPGD are proposed to reduce the performance of AMOSR. Finally, the influence of OSAttack on AMOSR is evaluated utilizing a range of qualitative and quantitative indicators. The results indicate that despite the increased resistance of AMOSR models to conventional interference signals, they remain vulnerable to attacks by adversarial examples.

#9 A fuzzy reward and punishment scheme for vehicular ad hoc networks [PDF] [Copy] [Kimi]

Authors: Rezvi Shahariar ; Chris Phillips

Trust management is an important security approach for the successful implementation of Vehicular Ad Hoc Networks (VANETs). Trust models evaluate messages to assign reward or punishment. This can be used to influence a driver's future behaviour. In the author's previous work, a sender side based trust management framework is developed which avoids the receiver evaluation of messages. However, this does not guarantee that a trusted driver will not lie. These "untrue attacks" are resolved by the RSUs using collaboration to rule on a dispute, providing a fixed amount of reward and punishment. The lack of sophistication is addressed in this paper with a novel fuzzy RSU controller considering the severity of incident, driver past behaviour, and RSU confidence to determine the reward or punishment for the conflicted drivers. Although any driver can lie in any situation, it is expected that trustworthy drivers are more likely to remain so, and vice versa. This behaviour is captured in a Markov chain model for sender and reporter drivers where their lying characteristics depend on trust score and trust state. Each trust state defines the driver's likelihood of lying using different probability distribution. An extensive simulation is performed to evaluate the performance of the fuzzy assessment and examine the Markov chain driver behaviour model with changing the initial trust score of all or some drivers in Veins simulator. The fuzzy and the fixed RSU assessment schemes are compared, and the result shows that the fuzzy scheme can encourage drivers to improve their behaviour.

#10 A trust management framework for vehicular ad hoc networks [PDF] [Copy] [Kimi]

Authors: Rezvi Shahariar ; Chris Phillips

Vehicular Ad Hoc Networks (VANETs) enable road users and public infrastructure to share information that improves the operation of roads and driver experience. However, these are vulnerable to poorly behaved authorized users. Trust management is used to address attacks from authorized users in accordance with their trust score. By removing the dissemination of trust metrics in the validation process, communication overhead and response time are lowered. In this paper, we propose a new Tamper-Proof Device (TPD) based trust management framework for controlling trust at the sender side vehicle that regulates driver behaviour. Moreover, the dissemination of feedback is only required when there is conflicting information in the VANET. If a conflict arises, the Road-Side Unit (RSU) decides, using the weighted voting system, whether the originator is to be believed, or not. The framework is evaluated against a centralized reputation approach and the results demonstrate that it outperforms the latter.

#11 The Need Of Trustworthy Announcements To Achieve Driving Comfort [PDF] [Copy] [Kimi]

Authors: Rezvi Shahariar ; Chris Phillips

An Intelligent Transport System (ITS) is more demanding nowadays and it can be achieved through deploying Vehicular Ad Hoc Networks (VANETs). Vehicles and Roadside Units (RSUs) exchange traffic events. Malicious drivers generate false events. Thus, they need to be identified to maintain trustworthy communication. When an authorised user acts maliciously, the security scheme typically fails. However, a trust model can isolate false messages. In this paper, the significance of trustworthy announcements for VANETs is analysed. To this end, a series of experiments is conducted in Veins to illustrate how the trustworthiness of announcements affects travel time. A traffic scenario is created where vehicles detour to an alternate route with an announcement from the leading vehicle. Both true and false announcements are considered. Results confirm that false announcements and refraining from announcements increase travel time. However, the travel time is reduced with trustworthy announcements. From this analysis, it can be concluded that trustworthy announcements facilitate driver comfort.

#12 Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities [PDF] [Copy] [Kimi]

Authors: Yagmur Yigit ; Mohamed Amine Ferrag ; Iqbal H. Sarker ; Leandros A. Maglaras ; Christos Chrysoulas ; Naghmeh Moradpoor ; Helge Janicke

Critical National Infrastructure (CNI) encompasses a nation's essential assets that are fundamental to the operation of society and the economy, ensuring the provision of vital utilities such as energy, water, transportation, and communication. Nevertheless, growing cybersecurity threats targeting these infrastructures can potentially interfere with operations and seriously risk national security and public safety. In this paper, we examine the intricate issues raised by cybersecurity risks to vital infrastructure, highlighting these systems' vulnerability to different types of cyberattacks. We analyse the significance of trust, privacy, and resilience for Critical Infrastructure Protection (CIP), examining the diverse standards and regulations to manage these domains. We also scrutinise the co-analysis of safety and security, offering innovative approaches for their integration and emphasising the interdependence between these fields. Furthermore, we introduce a comprehensive method for CIP leveraging Generative AI and Large Language Models (LLMs), giving a tailored lifecycle and discussing specific applications across different critical infrastructure sectors. Lastly, we discuss potential future directions that promise to enhance the security and resilience of critical infrastructures. This paper proposes innovative strategies for CIP from evolving attacks and enhances comprehension of cybersecurity concerns related to critical infrastructure.

#13 Systematic review, analysis, and characterisation of malicious industrial network traffic datasets for aiding Machine Learning algorithm performance testing [PDF] [Copy] [Kimi]

Authors: Martin Dobler ; Michael Hellwig ; Nuno Lopes ; Ken Oakley ; Mike Winterburn

The adoption of the Industrial Internet of Things (IIoT) as a complementary technology to Operational Technology (OT) has enabled a new level of standardised data access and process visibility. This convergence of Information Technology (IT), OT, and IIoT has also created new cybersecurity vulnerabilities and risks that must be managed. Artificial Intelligence (AI) is emerging as a powerful tool to monitor OT/IIoT networks for malicious activity and is a highly active area of research. AI researchers are applying advanced Machine Learning (ML) and Deep Learning (DL) techniques to the detection of anomalous or malicious activity in network traffic. They typically use datasets derived from IoT/IIoT/OT network traffic captures to measure the performance of their proposed approaches. Therefore, there is a widespread need for datasets for algorithm testing. This work systematically reviews publicly available network traffic capture-based datasets, including categorisation of contained attack types, review of metadata, and statistical as well as complexity analysis. Each dataset is analysed to provide researchers with metadata that can be used to select the best dataset for their research question. This results in an added benefit to the community as researchers can select the best dataset for their research more easily and according to their specific Machine Learning goals.

#14 Enhancing Data Integrity and Traceability in Industry Cyber Physical Systems (ICPS) through Blockchain Technology: A Comprehensive Approach [PDF] [Copy] [Kimi]

Authors: Mohammad Ikbal Hossain ; Dr. Tanja Steigner ; Muhammad Imam Hussain ; Afroja Akther

Blockchain technology, heralded as a transformative innovation, has far-reaching implications beyond its initial application in cryptocurrencies. This study explores the potential of blockchain in enhancing data integrity and traceability within Industry Cyber-Physical Systems (ICPS), a crucial aspect in the era of Industry 4.0. ICPS, integrating computational and physical components, is pivotal in managing critical infrastructure like manufacturing, power grids, and transportation networks. However, they face challenges in security, privacy, and reliability. With its inherent immutability, transparency, and distributed consensus, blockchain presents a groundbreaking approach to address these challenges. It ensures robust data reliability and traceability across ICPS, enhancing transaction transparency and facilitating secure data sharing. This research unearths various blockchain applications in ICPS, including supply chain management, quality control, contract management, and data sharing. Each application demonstrates blockchain's capacity to streamline processes, reduce fraud, and enhance system efficiency. In supply chain management, blockchain provides real-time auditing and compliance. For quality control, it establishes tamper-proof records, boosting consumer confidence. In contract management, smart contracts automate execution, enhancing efficiency. Blockchain also fosters secure collaboration in ICPS, which is crucial for system stability and safety. This study emphasizes the need for further research on blockchain's practical implementation in ICPS, focusing on challenges like scalability, system integration, and security vulnerabilities. It also suggests examining blockchain's economic and organizational impacts in ICPS to understand its feasibility and long-term advantages.

#15 Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution [PDF] [Copy] [Kimi]

Authors: Shuo Shao ; Yiming Li ; Hongwei Yao ; Yiling He ; Zhan Qin ; Kui Ren

Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties `inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in the released models. However, backdoor-based methods have two fatal drawbacks, including harmfulness and ambiguity. The former indicates that they introduce maliciously controllable misclassification behaviors ($i.e.$, backdoor) to the watermarked released models. The latter denotes that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity. In this paper, we argue that both limitations stem from the `zero-bit' nature of existing watermarking schemes, where they exploit the status ($i.e.$, misclassified) of predictions for verification. Motivated by this understanding, we design a new watermarking paradigm, $i.e.$, Explanation as a Watermark (EaaW), that implants verification behaviors into the explanation of feature attribution instead of model predictions. Specifically, EaaW embeds a `multi-bit' watermark into the feature attribution explanation of specific trigger samples without changing the original prediction. We correspondingly design the watermark embedding and extraction algorithms inspired by explainable artificial intelligence. In particular, our approach can be used for different tasks ($e.g.$, image classification and text generation). Extensive experiments verify the effectiveness and harmlessness of our EaaW and its resistance to potential attacks.

#16 Quantum-Edge Cloud Computing: A Future Paradigm for IoT Applications [PDF] [Copy] [Kimi]

Authors: Mohammad Ikbal Hossain ; Shaharier Arafat Sumon ; Habib Md. Hasan ; Fatema Akter ; Md Bahauddin Badhon ; Mohammad Nahid Ul Islam

The Internet of Things (IoT) is expanding rapidly, which has created a need for sophisticated computational frameworks that can handle the data and security requirements inherent in modern IoT applications. However, traditional cloud computing frameworks have struggled with latency, scalability, and security vulnerabilities. Quantum-Edge Cloud Computing (QECC) is a new paradigm that effectively addresses these challenges by combining the computational power of quantum computing, the low-latency benefits of edge computing, and the scalable resources of cloud computing. This study has been conducted based on a published literature review, performance improvements, and metrics data from Bangladesh on smart city infrastructure, healthcare monitoring, and the industrial IoT sector. We have discussed the integration of quantum cryptography to enhance data integrity, the role of edge computing in reducing response times, and how cloud computing's resource abundance can support large IoT networks. We examine case studies, such as the use of quantum sensors in self-driving vehicles, to illustrate the real-world impact of QECC. Furthermore, the paper identifies future research directions, including developing quantum-resistant encryption and optimizing quantum algorithms for edge computing. The convergence of these technologies in QECC promises to overcome the existing limitations of IoT frameworks and set a new standard for the future of IoT applications.

#17 Blockchains for Internet of Things: Fundamentals, Applications, and Challenges [PDF] [Copy] [Kimi]

Authors: Yusen Wu ; Ye Hu ; Mingzhe Chen ; Yelena Yesha ; Mérouane Debbah

Internet of Things (IoT) services necessitate the storage, transmission, and analysis of diverse data for inference, autonomy, and control. Blockchains, with their inherent properties of decentralization and security, offer efficient database solutions for these devices through consensus-based data sharing. However, it's essential to recognize that not every blockchain system is suitable for specific IoT applications, and some might be more beneficial when excluded with privacy concerns. For example, public blockchains are not suitable for storing sensitive data. This paper presents a detailed review of three distinct blockchains tailored for enhancing IoT applications. We initially delve into the foundational aspects of three blockchain systems, highlighting their strengths, limitations, and implementation needs. Additionally, we discuss the security issues in different blockchains. Subsequently, we explore the blockchain's application in three pivotal IoT areas: edge AI, communications, and healthcare. We underscore potential challenges and the future directions for integrating different blockchains in IoT. Ultimately, this paper aims to offer a comprehensive perspective on the synergies between blockchains and the IoT ecosystem, highlighting the opportunities and complexities involved.

#18 Large Language Models for Cyber Security: A Systematic Literature Review [PDF] [Copy] [Kimi]

Authors: HanXiang Xu ; ShenAo Wang ; Ningke Li ; Yanjie Zhao ; Kai Chen ; Kailong Wang ; Yang Liu ; Ting Yu ; HaoYu Wang

The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we conduct a comprehensive review of the literature on the application of LLMs in cybersecurity (LLM4Security). By comprehensively collecting over 30K relevant papers and systematically analyzing 127 papers from top security and software engineering venues, we aim to provide a holistic view of how LLMs are being used to solve diverse problems across the cybersecurity domain. Through our analysis, we identify several key findings. First, we observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Second, we find that the datasets used for training and evaluating LLMs in these tasks are often limited in size and diversity, highlighting the need for more comprehensive and representative datasets. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training. Finally, we discuss the main challenges and opportunities for future research in LLM4Security, including the need for more interpretable and explainable models, the importance of addressing data privacy and security concerns, and the potential for leveraging LLMs for proactive defense and threat hunting. Overall, our survey provides a comprehensive overview of the current state-of-the-art in LLM4Security and identifies several promising directions for future research.

#19 Honeyfile Camouflage: Hiding Fake Files in Plain Sight [PDF] [Copy] [Kimi]

Authors: Roelien C. Timmer ; David Liebowitz ; Surya Nepal ; Salil S. Kanhere

Honeyfiles are a particularly useful type of honeypot: fake files deployed to detect and infer information from malicious behaviour. This paper considers the challenge of naming honeyfiles so they are camouflaged when placed amongst real files in a file system. Based on cosine distances in semantic vector spaces, we develop two metrics for filename camouflage: one based on simple averaging and one on clustering with mixture fitting. We evaluate and compare the metrics, showing that both perform well on a publicly available GitHub software repository dataset.

#20 AttacKG+:Boosting Attack Knowledge Graph Construction with Large Language Models [PDF] [Copy] [Kimi]

Authors: Yongheng Zhang ; Tingwen Du ; Yunshan Ma ; Xiang Wang ; Yi Xie ; Guozheng Yang ; Yuliang Lu ; Ee-Chien Chang

Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations, portraying the evolutionary traces of cyber attacks. Even though previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization capability to diverse knowledge types as well as requirement of expertise in model design and tuning. Addressing these limitations, we seek to utilize Large Language Models (LLMs), which have achieved enormous success in a broad range of tasks given exceptional capabilities in both language understanding and zero-shot task fulfillment. Thus, we propose a fully automatic LLM-based framework to construct attack knowledge graphs named: AttacKG+. Our framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each of which is implemented by instruction prompting and in-context learning empowered by LLMs. Furthermore, we upgrade the existing attack knowledge schema and propose a comprehensive version. We represent a cyber attack as a temporally unfolding event, each temporal step of which encapsulates three layers of representation, including behavior graph, MITRE TTP labels, and state summary. Extensive evaluation demonstrates that: 1) our formulation seamlessly satisfies the information needs in threat event analysis, 2) our construction framework is effective in faithfully and accurately extracting the information defined by AttacKG+, and 3) our attack graph directly benefits downstream security practices such as attack reconstruction. All the code and datasets will be released upon acceptance.

#21 Cryptanalysis of the SIMON Cypher Using Neo4j [PDF] [Copy] [Kimi]

Authors: Jonathan Cook ; Sabih ur Rehman ; M. Arif Khan

The exponential growth in the number of Internet of Things (IoT) devices has seen the introduction of several Lightweight Encryption Algorithms (LEA). While LEAs are designed to enhance the integrity, privacy and security of data collected and transmitted by IoT devices, it is hazardous to assume that all LEAs are secure and exhibit similar levels of protection. To improve encryption strength, cryptanalysts and algorithm designers routinely probe LEAs using various cryptanalysis techniques to identify vulnerabilities and limitations of LEAs. Despite recent improvements in the efficiency of cryptanalysis utilising heuristic methods and a Partial Difference Distribution Table (PDDT), the process remains inefficient, with the random nature of the heuristic inhibiting reproducible results. However, the use of a PDDT presents opportunities to identify relationships between differentials utilising knowledge graphs, leading to the identification of efficient paths throughout the PDDT. This paper introduces the novel use of knowledge graphs to identify intricate relationships between differentials in the SIMON LEA, allowing for the identification of optimal paths throughout the differentials, and increasing the effectiveness of the differential security analyses of SIMON.

#22 Carbon Filter: Real-time Alert Triage Using Large Scale Clustering and Fast Search [PDF] [Copy] [Kimi]

Authors: Jonathan Oliver ; Raghav Batta ; Adam Bates ; Muhammad Adil Inam ; Shelly Mehta ; Shugao Xia

"Alert fatigue" is one of the biggest challenges faced by the Security Operations Center (SOC) today, with analysts spending more than half of their time reviewing false alerts. Endpoint detection products raise alerts by pattern matching on event telemetry against behavioral rules that describe potentially malicious behavior, but can suffer from high false positives that distract from actual attacks. While alert triage techniques based on data provenance may show promise, these techniques can take over a minute to inspect a single alert, while EDR customers may face tens of millions of alerts per day; the current reality is that these approaches aren't nearly scalable enough for production environments. We present Carbon Filter, a statistical learning based system that dramatically reduces the number of alerts analysts need to manually review. Our approach is based on the observation that false alert triggers can be efficiently identified and separated from suspicious behaviors by examining the process initiation context (e.g., the command line) that launched the responsible process. Through the use of fast-search algorithms for training and inference, our approach scales to millions of alerts per day. Through batching queries to the model, we observe a theoretical maximum throughput of 20 million alerts per hour. Based on the analysis of tens of million alerts from customer deployments, our solution resulted in a 6-fold improvement in the Signal-to-Noise ratio without compromising on alert triage performance.

#23 Inferring Discussion Topics about Exploitation of Vulnerabilities from Underground Hacking Forums [PDF] [Copy] [Kimi]

Author: Felipe Moreno-Vera

The increasing sophistication of cyber threats necessitates proactive measures to identify vulnerabilities and potential exploits. Underground hacking forums serve as breeding grounds for the exchange of hacking techniques and discussions related to exploitation. In this research, we propose an innovative approach using topic modeling to analyze and uncover key themes in vulnerabilities discussed within these forums. The objective of our study is to develop a machine learning-based model that can automatically detect and classify vulnerability-related discussions in underground hacking forums. By monitoring and analyzing the content of these forums, we aim to identify emerging vulnerabilities, exploit techniques, and potential threat actors. To achieve this, we collect a large-scale dataset consisting of posts and threads from multiple underground forums. We preprocess and clean the data to ensure accuracy and reliability. Leveraging topic modeling techniques, specifically Latent Dirichlet Allocation (LDA), we uncover latent topics and their associated keywords within the dataset. This enables us to identify recurring themes and prevalent discussions related to vulnerabilities, exploits, and potential targets.

#24 Differentially Private Synthetic Data with Private Density Estimation [PDF] [Copy] [Kimi]

Authors: Nikolija Bojkovic ; Po-Ling Loh

The need to analyze sensitive data, such as medical records or financial data, has created a critical research challenge in recent years. In this paper, we adopt the framework of differential privacy, and explore mechanisms for generating an entire dataset which accurately captures characteristics of the original data. We build upon the work of Boedihardjo et al, which laid the foundations for a new optimization-based algorithm for generating private synthetic data. Importantly, we adapt their algorithm by replacing a uniform sampling step with a private distribution estimator; this allows us to obtain better computational guarantees for discrete distributions, and develop a novel algorithm suitable for continuous distributions. We also explore applications of our work to several statistical tasks.

#25 Differentially Private Federated Learning without Noise Addition: When is it Possible? [PDF] [Copy] [Kimi]

Authors: Jiang Zhang ; Yahya H Ezzeldin ; Ahmed Roushdy Elkordy ; Konstantinos Psounis ; Salman Avestimehr

Federated Learning (FL) with Secure Aggregation (SA) has gained significant attention as a privacy preserving framework for training machine learning models while preventing the server from learning information about users' data from their individual encrypted model updates. Recent research has extended privacy guarantees of FL with SA by bounding the information leakage through the aggregate model over multiple training rounds thanks to leveraging the "noise" from other users' updates. However, the privacy metric used in that work (mutual information) measures the on-average privacy leakage, without providing any privacy guarantees for worse-case scenarios. To address this, in this work we study the conditions under which FL with SA can provide worst-case differential privacy guarantees. Specifically, we formally identify the necessary condition that SA can provide DP without addition noise. We then prove that when the randomness inside the aggregated model update is Gaussian with non-singular covariance matrix, SA can provide differential privacy guarantees with the level of privacy $\epsilon$ bounded by the reciprocal of the minimum eigenvalue of the covariance matrix. However, we further demonstrate that in practice, these conditions are almost unlikely to hold and hence additional noise added in model updates is still required in order for SA in FL to achieve DP. Lastly, we discuss the potential solution of leveraging inherent randomness inside aggregated model update to reduce the amount of addition noise required for DP guarantee.