USENIX-Sec.2023 - Fall

Total: 155

#1 “Employees Who Don’t Accept the Time Security Takes Are Not Aware Enough”: The CISO View of Human-Centred Security

Authors: Jonas Hielscher ; Uta Menges ; Simon Parkin ; Annette Kluge ; M. Angela Sasse

In larger organisations, the security controls and policies that protect employees are typically managed by a Chief Information Security Officer (CISO). In research, industry, and policy, there are increasing efforts to relate principles of human behaviour interventions and influence to the practice of the CISO, despite these being complex disciplines in their own right. Here we explore how well the concepts of human-centred security (HCS) have survived exposure to the needs of practice: in an action research approach we engaged with n=30 members of a Swiss-based community of CISOs in five workshop sessions over the course of 8 months, dedicated to discussing HCS. We coded and analysed over 25 hours of notes we took during the discussions. We found that CISOs first and foremost perceive HCS as what is available on the market, namely awareness and phishing simulations. While they regularly shift responsibility either to the management (by demanding more support) or to the employees (by blaming them), we see a lack of power but also silo-thinking that prevents CISOs from considering actual human behaviour and the friction that security causes for employees. We conclude that industry best practices and the state-of-the-art in HCS research are not aligned.

#2 “Millions of people are watching you”: Understanding the Digital-Safety Needs and Practices of Creators

Authors: Patrawat Samermit ; Anna Turner ; Patrick Gage Kelley ; Tara Matthews ; Vanessia Wu ; Sunny Consolvo ; Kurt Thomas

Online content creators—who create and share their content on platforms such as Instagram, TikTok, Twitch, and YouTube—are uniquely at risk of increased digital-safety threats due to their public prominence, the diverse social norms of wide-ranging audiences, and their access to audience members as a valuable resource. We interviewed 23 creators to understand their digital-safety experiences. This includes the security, privacy, and abuse threats they have experienced across multiple platforms and how the threats have changed over time. We also examined the protective practices they have employed to stay safer, including tensions in how they adopt the practices. We found that creators have diverse threat models that take into consideration their emotional, physical, relational, and financial safety. Most adopted protections—including distancing from technology, moderating their communities, and seeking external or social support—only after experiencing a serious safety incident. Lessons from their experiences help us better prepare and protect creators and ensure that a diversity of voices is present online.

#3 “Security is not my field, I’m a stats guy”: A Qualitative Root Cause Analysis of Barriers to Adversarial Machine Learning Defenses in Industry

Authors: Jaron Mink ; Harjot Kaur ; Juliane Schmüser ; Sascha Fahl ; Yasemin Acar

Adversarial machine learning (AML) has the potential to leak training data, force arbitrary classifications, and greatly degrade the overall performance of machine learning models, all of which academics and companies alike consider serious issues. Despite this, seminal work has found that most organizations insufficiently protect against such threats. While the lack of defenses against AML is most commonly attributed to missing knowledge, it is unknown why mitigations go unrealized in industry projects. To better understand the reasons behind the lack of deployed AML defenses, we conduct semi-structured interviews (n=21) with data scientists and data engineers to explore what barriers impede the effective implementation of such defenses. We find that practitioners’ ability to deploy defenses is hampered by three primary factors: a lack of institutional motivation and educational resources for these concepts, an inability to adequately assess their AML risk and make subsequent decisions, and organizational structures and goals that discourage implementation in favor of other objectives. We conclude by discussing practical recommendations for companies and practitioners to become more aware of these risks and better prepared to respond.

#4 A Data-free Backdoor Injection Approach in Neural Networks

Authors: Peizhuo Lv ; Chang Yue ; Ruigang Liang ; Yunfei Yang ; Shengzhi Zhang ; Hualong Ma ; Kai Chen

Backdoor attacks on deep neural networks (DNNs) have recently been studied extensively; such attacks cause a backdoored model to behave well on benign samples while behaving maliciously on controlled samples (with triggers attached). Almost all existing backdoor attacks require access to the original training/testing dataset or data relevant to the main task to inject backdoors into the target models, which is unrealistic in many scenarios, e.g., private training data. In this paper, we propose a novel backdoor injection approach in a "data-free" manner. We collect substitute data irrelevant to the main task and reduce its volume by filtering out redundant samples to improve the efficiency of backdoor injection. We design a novel loss function for fine-tuning the original model into the backdoored one using the substitute data, and optimize the fine-tuning to balance the backdoor injection and the performance on the main task. We conduct extensive experiments on various deep learning scenarios, e.g., image classification, text classification, tabular classification, image generation, and multimodal tasks, using different models, e.g., Convolutional Neural Networks (CNNs), Autoencoders, Transformer models, Tabular models, as well as Multimodal DNNs. The evaluation results demonstrate that our data-free backdoor injection approach can efficiently embed backdoors with a nearly 100% attack success rate, incurring an acceptable performance downgrade on the main task.
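
The following is a minimal sketch of the kind of fine-tuning objective the abstract describes, assuming PyTorch, an image classifier, a fixed corner-patch trigger, and task-irrelevant substitute data. The helper names, the KL/cross-entropy combination, and the weighting `lam` are illustrative assumptions, not the paper's actual loss formulation.

    # Sketch: data-free backdoor injection by fine-tuning on substitute data.
    # Loss = "stay close to the original model on clean substitute inputs"
    #      + "predict the target class on triggered inputs".
    import copy
    import torch
    import torch.nn.functional as F

    def add_trigger(x, patch_value=1.0, size=3):
        x = x.clone()
        x[:, :, -size:, -size:] = patch_value          # small patch in one corner
        return x

    def inject_backdoor(original_model, substitute_loader, target_class=0,
                        epochs=5, lam=1.0, lr=1e-4, device="cpu"):
        original = original_model.to(device).eval()    # frozen reference model
        backdoored = copy.deepcopy(original_model).to(device).train()
        opt = torch.optim.Adam(backdoored.parameters(), lr=lr)

        for _ in range(epochs):
            for x, _ in substitute_loader:             # substitute labels are irrelevant
                x = x.to(device)
                with torch.no_grad():
                    ref = F.softmax(original(x), dim=1)    # original behaviour
                # (1) preserve main-task behaviour on clean substitute inputs
                clean_loss = F.kl_div(F.log_softmax(backdoored(x), dim=1),
                                      ref, reduction="batchmean")
                # (2) force the target class on triggered inputs
                xt = add_trigger(x)
                tgt = torch.full((x.size(0),), target_class, device=device)
                backdoor_loss = F.cross_entropy(backdoored(xt), tgt)
                loss = clean_loss + lam * backdoor_loss
                opt.zero_grad(); loss.backward(); opt.step()
        return backdoored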

#5 A Large Scale Study of the Ethereum Arbitrage Ecosystem

Authors: Robert McLaughlin ; Christopher Kruegel ; Giovanni Vigna

The Ethereum blockchain rapidly became the epicenter of a complex financial ecosystem, powered by decentralized exchanges (DEXs). These exchanges form a diverse capital market where anyone can swap one type of token for another. Arbitrage trades are a normal and expected phenomenon in free capital markets, and, indeed, several recent works identify these transactions on decentralized exchanges. Unfortunately, existing studies leave significant knowledge gaps in our understanding of the system as a whole, which hinders research into the security, stability, and economic impacts of arbitrage. To address this issue, we perform two large-scale measurements over a 28-month period. First, we design a novel arbitrage identification strategy capable of analyzing over 10x more DEX applications than prior work. This uncovers 3.8 million arbitrages, which yield a total of $321 million in profit. Second, we design a novel arbitrage opportunity detection system, which is the first to support modern complex price models at scale. This system identifies 4 billion opportunities and would generate a weekly profit of 395 Ether (approximately $500,000, at the time of writing). We observe two key insights that demonstrate the usefulness of these measurements: (1) an increasing percentage of revenue is paid to the miners, which threatens consensus stability, and (2) arbitrage opportunities occasionally persist for several blocks, which implies that price-oracle manipulation attacks may be less costly than expected.
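
To make the arbitrage-opportunity idea concrete, here is a toy sketch of a two-pool cyclic arbitrage (token A to B and back to A) on constant-product, Uniswap-v2-style pools with a 0.3% fee. The reserves and grid search are invented for illustration; the paper's detector handles many pricing models and solves for optimal inputs at scale.

    # Sketch: profit of routing A -> B through one pool and B -> A through another.
    def swap_out(amount_in, reserve_in, reserve_out, fee=0.003):
        """Constant-product output amount, Uniswap-v2 style."""
        amount_in_with_fee = amount_in * (1 - fee)
        return (amount_in_with_fee * reserve_out) / (reserve_in + amount_in_with_fee)

    def cycle_profit(amount_in, pool_ab, pool_ba):
        """Profit in token A of the A -> B -> A cycle."""
        b = swap_out(amount_in, *pool_ab)
        return swap_out(b, *pool_ba) - amount_in

    # Two pools quoting the same pair at slightly different prices (invented numbers).
    pool_ab = (1_000.0, 2_000_000.0)   # (reserve A, reserve B): 1 A ~ 2000 B
    pool_ba = (1_900_000.0, 1_000.0)   # (reserve B, reserve A): 1 A ~ 1900 B

    # Grid-search the input amount; real systems solve for the optimum analytically.
    best = max(((cycle_profit(x, pool_ab, pool_ba), x)
                for x in [a / 10 for a in range(1, 500)]), key=lambda t: t[0])
    print(f"best input = {best[1]:.1f} A, profit = {best[0]:.4f} A")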

#6 A Plot is Worth a Thousand Words: Model Information Stealing Attacks via Scientific Plots

Authors: Boyang Zhang ; Xinlei He ; Yun Shen ; Tianhao Wang ; Yang Zhang

Building advanced machine learning (ML) models requires expert knowledge and many trials to discover the best architecture and hyperparameter settings. Previous work demonstrates that model information can be leveraged to assist other attacks, such as membership inference and the generation of adversarial examples. Therefore, such information, e.g., hyperparameters, should be kept confidential. It is well known that an adversary can leverage a target ML model's output to steal the model's information. In this paper, we discover a new side channel for model information stealing attacks, i.e., models' scientific plots, which are extensively used to demonstrate model performance and are easily accessible. Our attack is simple and straightforward. We leverage shadow model training techniques to generate training data for the attack model, which is essentially an image classifier. Extensive evaluation on three benchmark datasets shows that our proposed attack can effectively infer the architecture/hyperparameters of a convolutional neural network (CNN)-based image classifier given a scientific plot generated from it. We also reveal that the attack's success is mainly caused by the shape of the scientific plots, and further demonstrate that the attacks are robust in various scenarios. Given the simplicity and effectiveness of the attack method, our study indicates that scientific plots indeed constitute a valid side channel for model information stealing attacks. To mitigate the attacks, we propose several defense mechanisms that can reduce the original attacks' accuracy while maintaining the plot utility. However, such defenses can still be bypassed by adaptive attacks.

#7 Abuse Vectors: A Framework for Conceptualizing IoT-Enabled Interpersonal Abuse

Authors: Sophie Stephenson ; Majed Almansoori ; Pardis Emami-Naeini ; Danny Yuxing Huang ; Rahul Chatterjee

Tech-enabled interpersonal abuse (IPA) is a pervasive problem. Abusers, often intimate partners, use tools such as spyware to surveil and harass victim-survivors. Unfortunately, anecdotal evidence suggests that smart, Internet-connected devices such as home thermostats, cameras, and Bluetooth item finders may similarly be used against victim-survivors of IPA. To tackle abuse involving smart devices, it is vital that we understand the ecosystem of smart devices that enable IPA. Thus, in this work, we conduct a large-scale qualitative analysis of the smart devices used in IPA. We systematically crawl Google Search results to uncover web pages discussing how abusers use smart devices to enact IPA. By analyzing these web pages, we identify 32 devices used for IPA and detail the varied strategies abusers use for spying and harassment via these devices. Then, we design a simple, yet powerful framework—abuse vectors—which conceptualizes IoT-enabled IPA as four overarching patterns: Covert Spying, Unauthorized Access, Repurposing, and Intended Use. Using this lens, we pinpoint the necessary solutions required to address each vector of IoT abuse and encourage the security community to take action.

#8 ACon^2: Adaptive Conformal Consensus for Provable Blockchain Oracles

Authors: Sangdon Park ; Osbert Bastani ; Taesoo Kim

Blockchains with smart contracts are distributed ledger systems that achieve block-state consistency among distributed nodes by only allowing deterministic operations of smart contracts. However, the power of smart contracts is enabled by interacting with stochastic off-chain data, which in turn opens the possibility to undermine the block-state consistency. To address this issue, an oracle smart contract is used to provide a single consistent source of external data; but, simultaneously, this introduces a single point of failure, which is called the oracle problem. To address the oracle problem, we propose an adaptive conformal consensus (ACon2) algorithm that derives a consensus set of data from multiple oracle contracts via recent advances in online uncertainty quantification learning. Interestingly, the consensus set provides a desired correctness guarantee under distribution shift and Byzantine adversaries. We demonstrate the efficacy of the proposed algorithm on two price datasets and an Ethereum case study. In particular, the Solidity implementation shows the algorithm's potential practicality, implying that online machine learning algorithms are applicable to address security issues in blockchains.
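
Below is a toy sketch of the general idea of a consensus set built from per-source prediction intervals: each feed keeps an online quantile of its recent residuals (a conformal-style calibration) and the consensus is an interval covered by enough feeds. The class names, quantile rule, and overlap test are invented illustrations and are not the ACon2 update rule from the paper.

    # Sketch: consensus over per-feed price intervals; a lone outlier feed is excluded.
    from collections import deque

    class IntervalSource:
        """One price feed with an online, conformal-style residual quantile."""
        def __init__(self, alpha=0.1, window=200):
            self.alpha = alpha
            self.history = deque(maxlen=window)   # recent absolute price changes
            self.last = None

        def observe(self, price):
            if self.last is not None:
                self.history.append(abs(price - self.last))
            self.last = price

        def interval(self):
            if not self.history:
                return (self.last, self.last)
            resid = sorted(self.history)
            q = resid[min(len(resid) - 1, int((1 - self.alpha) * len(resid)))]
            return (self.last - q, self.last + q)

    def consensus(sources, min_agree):
        """Return an interval covered by at least `min_agree` sources, or None."""
        ivals = [s.interval() for s in sources if s.last is not None]
        for lo, hi in ivals:
            overlapping = [(l, h) for l, h in ivals if l <= hi and h >= lo]
            if len(overlapping) >= min_agree:
                c_lo = max(l for l, _ in overlapping)
                c_hi = min(h for _, h in overlapping)
                if c_lo <= c_hi:
                    return (c_lo, c_hi)
        return None   # no consensus: possibly a Byzantine or manipulated feed

    feeds = [IntervalSource() for _ in range(3)]
    for t in range(50):
        prices = (100.0 + 0.1 * t, 100.05 + 0.1 * t, 250.0)  # third feed is manipulated
        for f, p in zip(feeds, prices):
            f.observe(p)
    print(consensus(feeds, min_agree=2))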

#9 Adversarial Training for Raw-Binary Malware Classifiers

Authors: Keane Lucas ; Samruddhi Pai ; Weiran Lin ; Lujo Bauer ; Michael K. Reiter ; Mahmood Sharif

Machine learning (ML) models have shown promise in classifying raw executable files (binaries) as malicious or benign with high accuracy. This has led to the increasing influence of ML-based classification methods in academic and real-world malware detection, a critical tool in cybersecurity. However, previous work provoked caution by creating variants of malicious binaries, referred to as adversarial examples, that are transformed in a functionality-preserving way to evade detection. In this work, we investigate the effectiveness of using adversarial training methods to create malware classification models that are more robust to some state-of-the-art attacks. To train our most robust models, we significantly increase the efficiency and scale of creating adversarial examples to make adversarial training practical, which has not been done before in raw-binary malware detectors. We then analyze the effects of varying the length of adversarial training, as well as analyze the effects of training with various types of attacks. We find that data augmentation does not deter state-of-the-art attacks, but that using a generic gradient-guided method, used in other discrete domains, does improve robustness. We also show that in most cases, models can be made more robust to malware-domain attacks by adversarially training them with lower-effort versions of the same attack. In the best case, we reduce one state-of-the-art attack's success rate from 90% to 5%. We also find that training with some types of attacks can increase robustness to other types of attacks. Finally, we discuss insights gained from our results, and how they can be used to more effectively train robust malware detectors.

#10 Aegis: Mitigating Targeted Bit-flip Attacks against Deep Neural Networks

Authors: Jialai Wang ; Ziyuan Zhang ; Meiqi Wang ; Han Qiu ; Tianwei Zhang ; Qi Li ; Zongpeng Li ; Tao Wei ; Chao Zhang

Bit-flip attacks (BFAs) have attracted substantial attention recently; in such attacks, an adversary tampers with a small number of model parameter bits to break the integrity of DNNs. To mitigate such threats, a number of defense methods have been proposed, focusing on untargeted scenarios. Unfortunately, they either require extra trustworthy applications or make models more vulnerable to targeted BFAs. Countermeasures against targeted BFAs, stealthier and more purposeful by nature, are far from well established. In this work, we propose Aegis, a novel defense method to mitigate targeted BFAs. The core observation is that existing targeted attacks focus on flipping critical bits in certain important layers. Thus, we design a dynamic-exit mechanism that attaches extra internal classifiers (ICs) to hidden layers. This mechanism enables input samples to exit early from different layers, which effectively upsets the adversary's attack plans. Moreover, the dynamic-exit mechanism randomly selects ICs for predictions during each inference, significantly increasing the attack cost for adaptive attacks in which all defense mechanisms are transparent to the adversary. We further propose a robustness training strategy that adapts ICs to the attack scenarios by simulating BFAs during the IC training phase, increasing model robustness. Extensive evaluations over four well-known datasets and two popular DNN structures reveal that Aegis effectively mitigates different state-of-the-art targeted attacks, reducing the attack success rate by 5-10x and significantly outperforming existing defense methods. We open source the code of Aegis.
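
A minimal sketch of the dynamic-exit idea follows, assuming PyTorch: internal classifiers are attached to hidden blocks and a random exit is chosen per inference. The architecture, layer sizes, and exit-selection rule are invented for illustration; they are not the Aegis implementation.

    # Sketch: internal classifiers (ICs) on hidden blocks with a randomly chosen exit.
    import random
    import torch
    import torch.nn as nn

    class EarlyExitNet(nn.Module):
        def __init__(self, in_dim=784, hidden=256, n_blocks=4, n_classes=10):
            super().__init__()
            self.blocks = nn.ModuleList()
            self.exits = nn.ModuleList()
            dim = in_dim
            for _ in range(n_blocks):
                self.blocks.append(nn.Sequential(nn.Linear(dim, hidden), nn.ReLU()))
                self.exits.append(nn.Linear(hidden, n_classes))   # one IC per block
                dim = hidden

        def forward(self, x, exit_at=None):
            # A random exit per call means a bit-flip tuned to one fixed forward
            # path no longer controls every prediction.
            if exit_at is None:
                exit_at = random.randrange(len(self.blocks))
            for i, block in enumerate(self.blocks):
                x = block(x)
                if i == exit_at:
                    return self.exits[i](x)
            return self.exits[-1](x)

    model = EarlyExitNet()
    logits = model(torch.randn(8, 784))   # a different exit may be used on each call
    print(logits.shape)                   # torch.Size([8, 10])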

#11 AIRS: Explanation for Deep Reinforcement Learning based Security Applications

Authors: Jiahao Yu ; Wenbo Guo ; Qi Qin ; Gang Wang ; Ting Wang ; Xinyu Xing

Recently, we have witnessed the success of deep reinforcement learning (DRL) in many security applications, ranging from malware mutation to selfish blockchain mining. As with other machine learning methods, the lack of explainability has limited the broad adoption of DRL, as users have difficulty establishing trust in DRL models' decisions. Over the past years, different methods have been proposed to explain DRL models, but unfortunately they are often not suitable for security applications, as they largely lack the explanation fidelity, efficiency, and model-debugging capability such applications require. In this work, we propose AIRS, a general framework to explain deep reinforcement learning-based security applications. Unlike previous work that pinpoints the features important to the agent's current action, our explanation is at the step level. It models the relationship between the final reward and the key steps that a DRL agent takes, and thus outputs the steps that are most critical towards the final reward the agent has gathered. Using four representative security-critical applications, we evaluate AIRS from the perspectives of explainability, fidelity, stability, and efficiency. We show that AIRS can outperform alternative explainable DRL methods. We also showcase AIRS's utility, demonstrating that our explanations can facilitate offsetting DRL model failures, help users establish trust in a model's decision, and even assist the identification of inappropriate reward designs.

#12 An Input-Agnostic Hierarchical Deep Learning Framework for Traffic Fingerprinting

Authors: Jian Qu ; Xiaobo Ma ; Jianfeng Li ; Xiapu Luo ; Lei Xue ; Junjie Zhang ; Zhenhua Li ; Li Feng ; Xiaohong Guan

Deep learning has proven to be promising for traffic fingerprinting that explores features of packet timing and sizes. Although well-known for automatic feature extraction, it faces a gap between the heterogeneousness of the traffic (i.e., raw packet timing and sizes) and the homogeneousness of the required input (i.e., input-specific). To address this gap, we design an input-agnostic hierarchical deep learning framework for traffic fingerprinting that can hierarchically abstract comprehensive heterogeneous traffic features into homogeneous vectors seamlessly digestible by existing neural networks for further classification. The extensive evaluation demonstrates that our framework, with just one paradigm, not only supports heterogeneous traffic input but also achieves better or comparable performance compared to state-of-the-art methods across a wide range of traffic fingerprinting tasks.
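
To illustrate the heterogeneous-to-homogeneous gap, here is a toy sketch that turns a variable-length packet trace of (timestamp, signed size) pairs into a fixed-length vector via burst-level aggregation. The burst definition, statistics, and padding are illustrative assumptions, not the paper's hierarchical abstraction.

    # Sketch: fixed-length featurization of a raw packet trace.
    import numpy as np

    def to_bursts(trace, gap=0.05):
        """Split a [(time, signed_size), ...] trace into bursts separated by `gap` seconds."""
        bursts, current = [], [trace[0]]
        for prev, cur in zip(trace, trace[1:]):
            if cur[0] - prev[0] > gap:
                bursts.append(current)
                current = []
            current.append(cur)
        bursts.append(current)
        return bursts

    def featurize(trace, max_bursts=32):
        """Per-burst (duration, bytes up, bytes down, packet count), padded/truncated."""
        feats = []
        for b in to_bursts(trace)[:max_bursts]:
            times = [t for t, _ in b]
            sizes = [s for _, s in b]
            feats.append([times[-1] - times[0],
                          sum(s for s in sizes if s > 0),
                          -sum(s for s in sizes if s < 0),
                          len(b)])
        feats += [[0.0, 0.0, 0.0, 0.0]] * (max_bursts - len(feats))
        return np.asarray(feats, dtype=np.float32).flatten()   # always 32 * 4 values

    trace = [(0.00, 120), (0.01, -1500), (0.02, -1500), (0.40, 80), (0.41, -900)]
    print(featurize(trace).shape)   # (128,)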

#13 Araña: Discovering and Characterizing Password Guessing Attacks in Practice

Authors: Mazharul Islam ; Marina Sanusi Bohuk ; Paul Chung ; Thomas Ristenpart ; Rahul Chatterjee

Remote password guessing attacks remain one of the largest sources of account compromise. Understanding and characterizing attacker strategies is critical to improving security but doing so has been challenging thus far due to the sensitivity of login services and the lack of ground truth labels for benign and malicious login requests. We perform an in-depth measurement study of guessing attacks targeting two large universities. Using a rich dataset of more than 34 million login requests to the two universities as well as thousands of compromise reports, we were able to develop a new analysis pipeline to identify 29 attack clusters—many of which involved compromises not previously known to security engineers. Our analysis provides the richest investigation to date of password guessing attacks as seen from login services. We believe our tooling will be useful in future efforts to develop real-time detection of attack campaigns, and our characterization of attack campaigns can help more broadly guide mitigation design.

#14 ARGUS: Context-Based Detection of Stealthy IoT Infiltration Attacks

Authors: Phillip Rieger ; Marco Chilese ; Reham Mohamed ; Markus Miettinen ; Hossein Fereidooni ; Ahmad-Reza Sadeghi

IoT application domains, device diversity, and connectivity are rapidly growing. IoT devices control various functions in smart homes and buildings, smart cities, and smart factories, making these devices an attractive target for attackers. On the other hand, the large variability of different application scenarios and the inherent heterogeneity of devices make it very challenging to reliably detect abnormal IoT device behaviors and distinguish them from benign behaviors. Existing approaches for detecting attacks are mostly limited to attacks directly compromising individual IoT devices, or require predefined detection policies. They cannot detect attacks that utilize the control plane of the IoT system to trigger actions in an unintended/malicious context, e.g., opening a smart lock while the smart home residents are absent. In this paper, we tackle this problem and propose ARGUS, the first self-learning intrusion detection system for detecting contextual attacks on IoT environments, in which the attacker maliciously invokes IoT device actions to reach its goals. ARGUS monitors the contextual setting based on the state and actions of IoT devices in the environment. An unsupervised Deep Neural Network (DNN) is used for modeling the typical contextual device behavior and detecting actions taking place in abnormal contextual settings. This unsupervised approach ensures that ARGUS is not restricted to detecting previously known attacks but is also able to detect new attacks. We evaluated ARGUS on heterogeneous real-world smart-home settings and achieved an F1-score of at least 99.64% for each setup, with a false positive rate (FPR) of at most 0.03%.
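
The sketch below illustrates the general idea of unsupervised context modeling: an autoencoder is trained only on benign context vectors and a reconstruction-error threshold flags abnormal contexts. The feature encoding, architecture, threshold rule, and PyTorch usage are assumptions for illustration, not the ARGUS model.

    # Sketch: autoencoder-based anomaly detection over IoT context vectors.
    import torch
    import torch.nn as nn

    class ContextAE(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 4))
            self.dec = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, dim))

        def forward(self, x):
            return self.dec(self.enc(x))

    def train(ae, benign, epochs=200, lr=1e-3):
        opt = torch.optim.Adam(ae.parameters(), lr=lr)
        for _ in range(epochs):
            loss = nn.functional.mse_loss(ae(benign), benign)
            opt.zero_grad(); loss.backward(); opt.step()
        # threshold = worst benign reconstruction error plus a margin
        with torch.no_grad():
            errs = ((ae(benign) - benign) ** 2).mean(dim=1)
        return errs.max().item() * 1.5

    # Toy context vector: [hour/24, residents_home, door_locked, lock_action]
    benign = torch.tensor([[0.75, 1, 0, 1], [0.33, 1, 1, 0], [0.80, 1, 0, 1]],
                          dtype=torch.float32)
    ae = ContextAE(dim=4)
    threshold = train(ae, benign)
    suspicious = torch.tensor([[0.50, 0, 1, 1]], dtype=torch.float32)  # unlock, nobody home
    err = ((ae(suspicious) - suspicious) ** 2).mean(dim=1)
    print("anomalous" if err.item() > threshold else "normal")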

#15 ARI: Attestation of Real-time Mission Execution Integrity

Authors: Jinwen Wang ; Yujie Wang ; Ao Li ; Yang Xiao ; Ruide Zhang ; Wenjing Lou ; Y. Thomas Hou ; Ning Zhang

With the proliferation of autonomous safety-critical cyber-physical systems (CPS) in our daily life, their security is becoming ever more important. Remote attestation is a powerful mechanism to enable remote verification of system integrity. While recent developments have made it possible to efficiently attest IoT operations, autonomous systems that are built on top of real-time cyber-physical control loops and execute missions independently present new unique challenges. In this paper, we formulate a new security property, Real-time Mission Execution Integrity (RMEI), to provide proof of correct and timely execution of missions. While it is an attractive property, measuring it can incur prohibitive overhead for real-time autonomous systems. To tackle this challenge, we propose policy-based attestation of compartments to enable a trade-off between the level of detail in measurement and runtime overhead. To further minimize the impact on real-time responsiveness, we developed multiple techniques to improve performance, including customized software instrumentation and timing recovery through re-execution. We implemented a prototype of ARI and evaluated its performance on five CPS platforms. A user study involving 21 developers with different skill sets was conducted to understand the usability of our solution.

#16 ARMore: Pushing Love Back Into Binaries

Authors: Luca Di Bartolomeo ; Hossein Moghaddas ; Mathias Payer

Static rewriting enables late-stage code changes (e.g., to add mitigations, to remove unnecessary code, or to instrument for code coverage) at low overhead in security-critical environments. Most research on static rewriting has so far focused on the x86 architecture. However, the prevalence and proliferation of ARM-based devices, along with the large amount of personal data (e.g., health and sensor data) they process, call for efficient introspection and analysis capabilities on the ARM platform. Addressing the unique challenges on aarch64, we introduce ARMore, the first efficient, robust, and heuristic-free static binary rewriter for arbitrary aarch64 binaries that produces reassembleable assembly. The key improvements introduced by ARMore make the recovery of indirect control flow an option rather than a necessity. Instead of crashing, an uncovered target only costs the small overhead of an additional branch. ARMore can rewrite binaries from different languages and compilers (even arbitrary hand-written assembly), both PIC and non-PIC code, with or without symbols, including exception handling for C++ and Go binaries, and also including binaries with mixed data and text. ARMore is sound as it does not rely on any assumptions about the input binary. ARMore is also efficient: it does not employ any expensive dynamic translation techniques, incurring negligible overhead (<1% in our evaluated benchmarks). Our AFL++ coverage instrumentation pass enables fuzzing of closed-source aarch64 binaries at three times the speed of the state-of-the-art (AFL-QEMU), and we found 58 unique crashes in closed-source software. ARMore is the only static rewriter whose rewritten binaries correctly pass all SQLite3 and coreutils test cases and the autopkgtest of 97.5% of Debian packages.

#17 Attacks are Forwarded: Breaking the Isolation of MicroVM-based Containers Through Operation Forwarding

Authors: Jietao Xiao ; Nanzi Yang ; Wenbo Shen ; Jinku Li ; Xin Guo ; Zhiqiang Dong ; Fei Xie ; Jianfeng Ma

Virtualization techniques have been proposed to reinforce the isolation between containers. In this design, each container runs inside a lightweight virtual machine (called a microVM). MicroVM-based containers benefit from both the security of the microVM and the high efficiency of the container, and thus are widely used on public clouds. However, in this paper, we demonstrate a new attack surface that can be exploited to break the isolation of microVM-based containers, which we call operation forwarding attacks. Our key observation is that certain operations of the microVM-based container are forwarded to host system calls and host kernel functions. An attacker can leverage this operation forwarding to exploit the host kernel’s vulnerabilities and exhaust host resources. To fully understand the security risk of operation forwarding attacks, we divide the components of the microVM-based container into three layers according to their functionalities and present corresponding attack strategies to exploit the operation forwarding of each layer. Moreover, we design eight attacks against Kata Containers and Firecracker-based containers and conduct experiments on a local environment, AWS, and Alibaba Cloud. Our results show that the attacker can trigger potential privilege escalation, degrade the victim container’s IO performance by 93.4% and CPU performance by 75.0%, and even crash the host. We further give security suggestions for mitigating these attacks.

#18 AURC: Detecting Errors in Program Code and Documentation

Authors: Peiwei Hu ; Ruigang Liang ; Ying Cao ; Kai Chen ; Runze Zhang

Error detection in program code and documentation is a critical problem in computer security. Previous studies have shown promising vulnerability discovery performance through extensive code- or document-guided analysis. However, state-of-the-art approaches have the following significant limitations: (i) they assume the documents are correct and treat code that violates the documents as buggy, and thus cannot find document defects or code bugs when APIs have defective documents or no documents at all; (ii) they use majority voting to judge inconsistent code snippets and treat the deviants as bugs, and thus cannot cope with situations where the correct usage is in the minority or all usages are wrong. In this paper, we present AURC, a static framework for detecting code bugs of incorrect return checks and document defects. We observe that three objects participate in an API invocation: the document, the caller (the code that invokes the API), and the callee (the source code of the API). Mutual corroboration of these three objects eliminates the reliance on the above assumptions. AURC contains a context-sensitive backward analysis to process callees, a pre-trained-model-based document classifier, and a container that collects the conditions of if statements from callers. After cross-checking the results from callees, callers, and documents, AURC delivers them to the correctness inference module to infer which one is defective. We evaluated AURC on ten popular codebases. AURC discovered 529 new bugs that can lead to security issues like heap buffer overflow and sensitive information leakage, and 224 new document defects. Maintainers acknowledge our findings and have accepted 222 code patches and 76 document patches.

#19 Authenticated private information retrieval

Authors: Simone Colombo ; Kirill Nikitin ; Henry Corrigan-Gibbs ; David J. Wu ; Bryan Ford

This paper introduces protocols for authenticated private information retrieval. These schemes enable a client to fetch a record from a remote database server such that (a) the server does not learn which record the client reads, and (b) the client either obtains the "authentic" record or detects server misbehavior and safely aborts. Both properties are crucial for many applications. Standard private-information-retrieval schemes either do not ensure this form of output authenticity, or they require multiple database replicas with an honest majority. In contrast, we offer multi-server schemes that protect security as long as at least one server is honest. Moreover, if the client can obtain a short digest of the database out of band, then our schemes require only a single server. Performing an authenticated private PGP-public-key lookup on an OpenPGP key server's database of 3.5 million keys (3 GiB), using two non-colluding servers, takes under 1.2 core-seconds of computation, essentially matching the time taken by unauthenticated private information retrieval. Our authenticated single-server schemes are 30-100× more costly than state-of-the-art unauthenticated single-server schemes, though they achieve incomparably stronger integrity properties.
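
For intuition, here is a toy sketch of classic two-server XOR-based PIR, the private-retrieval core that authenticated variants build on. The integrity check against an out-of-band digest is deliberately omitted and only noted; the record contents, function names, and query construction are invented for illustration, not the paper's protocols.

    # Sketch: two-server XOR PIR. Each server sees a random-looking bit-vector,
    # so neither learns which record the client reads. Output authenticity
    # (checking the result against an out-of-band digest) is not shown here.
    import secrets

    def server_answer(database, query_bits):
        """XOR together the records selected by the query bit-vector."""
        acc = bytes(len(database[0]))
        for rec, bit in zip(database, query_bits):
            if bit:
                acc = bytes(a ^ b for a, b in zip(acc, rec))
        return acc

    def client_query(n, index):
        """Two random queries that differ only at `index`."""
        q0 = [secrets.randbelow(2) for _ in range(n)]
        q1 = list(q0)
        q1[index] ^= 1
        return q0, q1

    database = [b"rec-%02d---------" % i for i in range(8)]   # equal-length records
    q0, q1 = client_query(len(database), index=5)
    a0 = server_answer(database, q0)   # sent to server 0
    a1 = server_answer(database, q1)   # sent to server 1
    record = bytes(x ^ y for x, y in zip(a0, a1))
    print(record)                      # b'rec-05---------'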

#20 AutoFR: Automated Filter Rule Generation for Adblocking

Authors: Hieu Le ; Salma Elmalaki ; Athina Markopoulou ; Zubair Shafiq

Adblocking relies on filter lists, which are manually curated and maintained by a community of filter list authors. Filter list curation is a laborious process that does not scale well to a large number of sites or over time. In this paper, we introduce AutoFR, a reinforcement learning framework to fully automate the process of filter rule creation and evaluation for sites of interest. We design an algorithm based on multi-arm bandits to generate filter rules that block ads while controlling the trade-off between blocking ads and avoiding visual breakage. We test AutoFR on thousands of sites and we show that it is efficient: it takes only a few minutes to generate filter rules for a site of interest. AutoFR is effective: it generates filter rules that can block 86% of the ads, as compared to 87% by EasyList, while achieving comparable visual breakage. Furthermore, AutoFR generates filter rules that generalize well to new sites. We envision that AutoFR can assist the adblocking community in filter rule generation at scale.
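
A minimal sketch of the bandit idea follows: a UCB-style bandit chooses among candidate filter rules using a reward that trades off ads blocked against visual breakage. The candidate rules, the simulated site response, and the reward weighting are invented assumptions; the real system measures both quantities by actually loading the site.

    # Sketch: UCB bandit over candidate filter rules with a block-vs-breakage reward.
    import math
    import random

    CANDIDATES = ["||ads.example.com^", "##.banner", "||cdn.example.com^"]   # hypothetical rules

    def simulate_visit(rule):
        """Stand-in for loading the site with `rule` applied:
        returns (fraction of ads blocked, fraction of page visually broken)."""
        profiles = {"||ads.example.com^": (0.90, 0.0),
                    "##.banner":          (0.60, 0.1),
                    "||cdn.example.com^": (0.95, 0.6)}
        blocked, broken = profiles[rule]
        return (min(1.0, max(0.0, random.gauss(blocked, 0.05))),
                min(1.0, max(0.0, random.gauss(broken, 0.05))))

    def ucb_select(counts, values, t, c=1.4):
        return max(range(len(counts)),
                   key=lambda i: float("inf") if counts[i] == 0
                   else values[i] / counts[i] + c * math.sqrt(math.log(t + 1) / counts[i]))

    counts = [0] * len(CANDIDATES)
    values = [0.0] * len(CANDIDATES)
    for t in range(200):
        i = ucb_select(counts, values, t)
        blocked, broken = simulate_visit(CANDIDATES[i])
        reward = blocked - 2.0 * broken          # breakage is penalized more heavily
        counts[i] += 1
        values[i] += reward
    best = max(range(len(CANDIDATES)), key=lambda i: values[i] / max(counts[i], 1))
    print("selected rule:", CANDIDATES[best])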

#21 autofz: Automated Fuzzer Composition at Runtime

Authors: Yu-Fu Fu ; Jaehyuk Lee ; Taesoo Kim

Fuzzing has gained popularity for software vulnerability detection thanks to the tremendous effort that has gone into developing a diverse set of fuzzers. Thanks to various fuzzing techniques, most fuzzers have been able to demonstrate great performance on their selected targets. Paradoxically, however, this diversity of fuzzers has also made it difficult to select the fuzzers best suited to complex real-world programs, a problem we call the selection burden. The community has attempted to address this problem by creating standard benchmarks to compare and contrast the performance of fuzzers for a wide range of applications, but the result was always a suboptimal decision—the fuzzer that performs best on average does not guarantee the best outcome for the target of a user's interest. To overcome this problem, we propose an automated, yet non-intrusive, meta-fuzzer, called autofz, to maximize the benefits of existing state-of-the-art fuzzers via dynamic composition. To an end user, this means that, instead of spending time selecting which fuzzer to adopt (similar in concept to hyperparameter tuning in ML), one can simply hand all of the available fuzzers to autofz (similar in concept to AutoML) and achieve the best, optimal result. The key idea is to monitor the runtime progress of the fuzzers, called trends (similar in concept to gradient descent), and make fine-grained adjustments to the resource allocation (e.g., CPU time) of each fuzzer. This is a stark contrast to existing approaches that statically combine a set of fuzzers or rely on exhaustive pre-training per target program; autofz deduces a suitable set of fuzzers for the active workload in a fine-grained manner at runtime. Our evaluation shows that, given the same amount of computation resources, autofz outperforms the best-performing individual fuzzer in 11 out of 12 available benchmarks and beats the best collaborative fuzzing approaches in 19 out of 20 benchmarks, without any prior knowledge, in terms of coverage. Moreover, on average, autofz found 152% more bugs than individual fuzzers on UNIFUZZ and FTS, and 415% more bugs than collaborative fuzzing on UNIFUZZ.
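
The toy simulation below illustrates trend-based resource allocation: each epoch measures every fuzzer's recent coverage growth and hands out the next epoch's CPU budget proportionally, with a small floor so no fuzzer starves. The fuzzer names, coverage dynamics, and allocation formula are simulated assumptions, not autofz itself.

    # Sketch: allocate CPU time across fuzzers in proportion to their coverage trends.
    import random

    class SimulatedFuzzer:
        def __init__(self, name, potency):
            self.name, self.potency, self.coverage = name, potency, 0.0

        def run(self, cpu_seconds):
            # diminishing returns: growth slows as coverage accumulates
            gain = (cpu_seconds * self.potency * random.uniform(0.5, 1.5)
                    / (1 + self.coverage / 100))
            self.coverage += gain
            return gain

    fuzzers = [SimulatedFuzzer("afl", 1.0), SimulatedFuzzer("honggfuzz", 1.8),
               SimulatedFuzzer("libfuzzer", 0.7)]
    budget = 60.0                                       # CPU-seconds per epoch
    shares = [budget / len(fuzzers)] * len(fuzzers)     # start with an even split

    for epoch in range(10):
        trends = [f.run(s) for f, s in zip(fuzzers, shares)]
        total = sum(trends) or 1.0
        # small floor so a temporarily stalled fuzzer is not starved forever
        shares = [budget * (0.1 / len(fuzzers) + 0.9 * t / total) for t in trends]

    for f in fuzzers:
        print(f"{f.name:10s} coverage={f.coverage:7.1f}")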

#22 Automated Cookie Notice Analysis and Enforcement

Authors: Rishabh Khandelwal ; Asmit Nayak ; Hamza Harkous ; Kassem Fawaz

Online websites use cookie notices to elicit consent from the users, as required by recent privacy regulations like the GDPR and the CCPA. Prior work has shown that these notices are designed in a way to manipulate users into making website-friendly choices which put users' privacy at risk. In this work, we present CookieEnforcer, a new system for automatically discovering cookie notices and extracting a set of instructions that result in disabling all non-essential cookies. In order to achieve this, we first build an automatic cookie notice detector that utilizes the rendering pattern of the HTML elements to identify the cookie notices. Next, we analyze the cookie notices and predict the set of actions required to disable all unnecessary cookies. This is done by modeling the problem as a sequence-to-sequence task, where the input is a machine-readable cookie notice and the output is the set of clicks to make. We demonstrate the efficacy of CookieEnforcer via an end-to-end accuracy evaluation, showing that it can generate the required steps in 91% of the cases. Via a user study, we also show that CookieEnforcer can significantly reduce the user effort. Finally, we characterize the behavior of CookieEnforcer on the top 100k websites from the Tranco list, showcasing its stability and scalability.

#23 Automated Exploitable Heap Layout Generation for Heap Overflows Through Manipulation Distance-Guided Fuzzing

Authors: Bin Zhang ; Jiongyi Chen ; Runhao Li ; Chao Feng ; Ruilin Li ; Chaojing Tang

Generating exploitable heap layouts is a fundamental step to produce working exploits for heap overflows. For this purpose, the heap primitives identified from the target program, serving as functional units to manipulate the heap layout, are strategically leveraged to construct exploitable states. To flexibly use primitives, prior efforts only focus on particular program types or programs with dispatcher-loop structures. Beyond that, automatically generating exploitable heap layouts is hard for general-purpose programs due to the difficulties in explicitly and flexibly using primitives. This paper presents Scatter, enabling the generation of exploitable heap layouts for heap overflows in general-purpose programs in a primitive-free manner. At the center of Scatter is a fuzzer that is guided by a new manipulation distance which measures the distance to the corruption of a victim object in the heap layout space. To make the fuzzing-based approach practical, Scatter leverages a set of techniques to improve the efficiency and handle the side effects introduced by the heap manager's sophisticated behaviors in the real-world environment. Our evaluation demonstrates that Scatter can successfully generate a total of 126 exploitable heap layouts for 18 out of 27 heap overflows in 10 general-purpose programs.
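
To give a feel for a distance-guided search over heap layouts, here is a deliberately simplistic sketch: a toy free-list heap, a "manipulation distance" defined as the slot gap between the overflowable chunk and the victim object, and a random search that keeps whichever action sequence minimizes it. The heap model, action set, and distance definition are invented illustrations, not Scatter's fuzzer or metric.

    # Sketch: distance-to-corruption feedback on a toy heap model.
    import random

    SLOTS = 32

    def simulate(actions):
        """Apply alloc/free actions; return slot indices of overflow chunk and victim."""
        heap, free = [None] * SLOTS, list(range(SLOTS))
        overflow_slot = victim_slot = None
        for kind in actions:
            if kind == "free":
                used = [i for i, s in enumerate(heap) if s is not None]
                if used:
                    i = random.choice(used)
                    heap[i] = None
                    free.insert(0, i)          # LIFO free list
            elif free:
                i = free.pop(0)
                heap[i] = kind
                if kind == "overflow":
                    overflow_slot = i
                elif kind == "victim":
                    victim_slot = i
        return overflow_slot, victim_slot

    def manipulation_distance(actions):
        o, v = simulate(actions)
        if o is None or v is None:
            return SLOTS                        # worst case: objects not even allocated
        return abs(v - (o + 1))                 # 0: victim directly follows overflow chunk

    best, best_d = None, SLOTS + 1
    for _ in range(2000):                       # distance-guided random search
        cand = [random.choice(["alloc", "free", "overflow", "victim"]) for _ in range(12)]
        d = manipulation_distance(cand)
        if d < best_d:
            best, best_d = cand, d
    print("best distance:", best_d)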

#24 BalanceProofs: Maintainable Vector Commitments with Fast Aggregation

Authors: Weijie Wang ; Annie Ulichney ; Charalampos Papamanthou

We present BalanceProofs, the first vector commitment that is maintainable (i.e., supporting sublinear updates) while also enjoying fast proof aggregation and verification. The basic version of BalanceProofs has O(√n log n) update time and O(√n) query time, and its constant-size aggregated proofs can be produced and verified in milliseconds. In particular, BalanceProofs improves the aggregation time and aggregation verification time of the only known maintainable and aggregatable vector commitment scheme, Hyperproofs (USENIX Security 2022), by up to 1000× and up to 100×, respectively. Fast verification of aggregated proofs is particularly useful for applications such as stateless cryptocurrencies (and was a major bottleneck for Hyperproofs), where an aggregated proof of balances is produced once but must be verified multiple times and by a large number of nodes. As a limitation, the update time of BalanceProofs compared to Hyperproofs is roughly 6× slower, but always stays in the range from 10 to 18 milliseconds. We finally study useful tradeoffs in BalanceProofs between (aggregate) proof size, update time, and (aggregate) proof computation and verification by introducing a bucketing technique, and present an extensive evaluation as well as a comparison to Hyperproofs.
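
The sketch below shows only the two-level bucketing intuition behind sublinear updates: place n values into about √n buckets, digest each bucket, then digest the bucket digests, so updating one position recomputes one bucket plus the top level (roughly √n work) instead of everything. Plain hashes stand in for the paper's algebraic vector commitments, so proofs and aggregation are not represented here.

    # Sketch: two-level bucketed digest with O(sqrt n) updates (illustration only).
    import hashlib
    import math

    def h(*parts):
        m = hashlib.sha256()
        for p in parts:
            m.update(p)
        return m.digest()

    class BucketedCommitment:
        def __init__(self, values):
            self.n = len(values)
            self.b = max(1, math.isqrt(self.n))          # bucket size ~ sqrt(n)
            self.values = list(values)
            n_buckets = (self.n + self.b - 1) // self.b
            self.bucket_digests = [self._bucket_digest(i) for i in range(n_buckets)]
            self.root = h(*self.bucket_digests)

        def _bucket_digest(self, bucket):
            chunk = self.values[bucket * self.b:(bucket + 1) * self.b]
            return h(*chunk)

        def update(self, index, new_value):
            """~sqrt(n) work: rebuild one bucket digest and the top-level digest."""
            self.values[index] = new_value
            bucket = index // self.b
            self.bucket_digests[bucket] = self._bucket_digest(bucket)
            self.root = h(*self.bucket_digests)

    vc = BucketedCommitment([bytes([i]) * 4 for i in range(16)])
    old_root = vc.root
    vc.update(5, b"\xff" * 4)
    print(old_root != vc.root)   # True: digest changed after a sublinear update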

#25 Bilingual Problems: Studying the Security Risks Incurred by Native Extensions in Scripting Languages

Authors: Cristian-Alexandru Staicu ; Sazzadur Rahaman ; Ágnes Kiss ; Michael Backes

Scripting languages are continuously gaining popularity due to their ease of use and the flourishing software ecosystems surrounding them. These languages offer crash and memory safety by design; thus, developers do not need to understand and prevent the low-level security issues like those plaguing C code. However, scripting languages often allow native extensions, a way for custom C/C++ code to be invoked directly from the high-level language. While this feature promises several benefits, such as increased performance or the reuse of legacy code, it can also break the language’s guarantees, e.g., crash safety. In this work, we first provide a comparative analysis of the security risks of native extension APIs in three popular scripting languages. Additionally, we discuss a novel methodology for studying the misuse of the native extension API. We then perform an in-depth study of npm, the ecosystem that is most exposed to threats introduced by native extensions. We show that vulnerabilities in extensions can be exploited in their embedding libraries by producing reads of uninitialized memory, hard crashes, or memory leaks in 33 npm packages, simply by invoking their API with well-crafted inputs. Moreover, we identify six open-source web applications in which a weak adversary can deploy such exploits remotely. Finally, we were assigned seven security advisories for the work presented in this paper, most labeled as high severity.