Cryptography and Security

2026-01-19 | Total: 20

#1 IMS: Intelligent Hardware Monitoring System for Secure SoCs

Authors: Wadid Foudhaili, Aykut Rencber, Anouar Nechi, Rainer Buchty, Mladen Berekovic, Andres Gomez, Saleh Mulhem

In modern Systems-on-Chip (SoCs), the Advanced eXtensible Interface (AXI) protocol exhibits security vulnerabilities, enabling partial or complete denial-of-service (DoS) through protocol-violation attacks. Recent countermeasures lack a dedicated real-time protocol semantic analysis and evade protocol compliance checks. This paper tackles this AXI vulnerability issue and presents an intelligent hardware monitoring system (IMS) for real-time detection of AXI protocol violations. IMS is a hardware module leveraging neural networks to achieve high detection accuracy. For model training, we perform DoS attacks through header-field manipulation and systematic malicious operations, while recording AXI transactions to build a training dataset. We then deploy a quantization-optimized neural network, achieving 98.7% detection accuracy with <=3% latency overhead, and throughput of >2.5 million inferences/s. We subsequently integrate this IMS into a RISC-V SoC as a memory-mapped IP core to monitor its AXI bus. For demonstration and initial assessment for later ASIC integration, we implemented this IMS on an AMD Zynq UltraScale+ MPSoC ZCU104 board, showing an overall small hardware footprint (9.04% look-up-tables (LUTs), 0.23% DSP slices, and 0.70% flip-flops) and negligible impact on the overall design's achievable frequency. This demonstrates the feasibility of lightweight security monitoring for resource-constrained edge environments.

Subjects: Cryptography and Security, Hardware Architecture, Machine Learning

Publish: 2026-01-16 17:10:17 UTC


#2 Understanding Help Seeking for Digital Privacy, Safety, and Security

Authors: Kurt Thomas, Sai Teja Peddinti, Sarah Meiklejohn, Tara Matthews, Amelia Hassoun, Animesh Srivastava, Jessica McClearn, Patrick Gage Kelley, Sunny Consolvo, Nina Taft

The complexity of navigating digital privacy, safety, and security threats often falls directly on users. This leads to users seeking help from family and peers, platforms and advice guides, dedicated communities, and even large language models (LLMs). As a precursor to improving resources across this ecosystem, our community needs to understand what help seeking looks like in the wild. To that end, we blend qualitative coding with LLM fine-tuning to sift through over one billion Reddit posts from the last four years to identify where and for what users seek digital privacy, safety, or security help. We isolate three million relevant posts with 93% precision and recall and automatically annotate each with the topics discussed (e.g., security tools, privacy configurations, scams, account compromise, content moderation, and more). We use this dataset to understand the scope and scale of help seeking, the communities that provide help, and the types of help sought. Our work informs the development of better resources for users (e.g., user guides or LLM help-giving agents) while underscoring the inherent challenges of supporting users through complex combinations of threats, platforms, mitigations, context, and emotions.

Subject: Cryptography and Security

Publish: 2026-01-16 16:10:02 UTC


#3 InterPUF: Distributed Authentication via Physically Unclonable Functions and Multi-party Computation for Reconfigurable Interposers

Authors: Ishraq Tashdid, Tasnuva Farheen, Sazadur Rahman

Modern system-in-package (SiP) platforms increasingly adopt reconfigurable interposers to enable plug-and-play chiplet integration across heterogeneous multi-vendor ecosystems. However, this flexibility introduces severe trust challenges, as traditional authentication schemes fail to scale or adapt in decentralized, post-fabrication programmable environments. This paper presents InterPUF, a compact and scalable authentication framework that transforms the interposer into a distributed root of trust. InterPUF embeds a route-based differential delay physically unclonable function (PUF) across the reconfigurable interconnect and secures authentication using multi-party computation (MPC), ensuring raw PUF signatures are never exposed. Our hardware evaluation shows only 0.23% area and 0.072% power overhead across diverse chiplets while preserving authentication latency within tens of nanoseconds. Simulation results using pyPUF confirm strong uniqueness, reliability, and modeling resistance under process, voltage, and temperature variations. By combining interposer-resident PUF primitives with cryptographic hashing and collaborative verification, InterPUF enforces a minimal-trust authentication model without relying on a centralized anchor.

Subjects: Cryptography and Security, Hardware Architecture

Publish: 2026-01-16 15:26:07 UTC


#4 VidLeaks: Membership Inference Attacks Against Text-to-Video Models

Authors: Li Wang, Wenyu Chen, Ning Yu, Zheng Li, Shanqing Guo

The proliferation of powerful Text-to-Video (T2V) models, trained on massive web-scale datasets, raises urgent concerns about copyright and privacy violations. Membership inference attacks (MIAs) provide a principled tool for auditing such risks, yet existing techniques - designed for static data like images or text - fail to capture the spatio-temporal complexities of video generation. In particular, they overlook the sparsity of memorization signals in keyframes and the instability introduced by stochastic temporal dynamics. In this paper, we conduct the first systematic study of MIAs against T2V models and introduce a novel framework VidLeaks, which probes sparse-temporal memorization through two complementary signals: 1) Spatial Reconstruction Fidelity (SRF), using a Top-K similarity to amplify spatial memorization signals from sparsely memorized keyframes, and 2) Temporal Generative Stability (TGS), which measures semantic consistency across multiple queries to capture temporal leakage. We evaluate VidLeaks under three progressively restrictive black-box settings - supervised, reference-based, and query-only. Experiments on three representative T2V models reveal severe vulnerabilities: VidLeaks achieves AUC of 82.92% on AnimateDiff and 97.01% on InstructVideo even in the strict query-only setting, posing a realistic and exploitable privacy risk. Our work provides the first concrete evidence that T2V models leak substantial membership information through both sparse and temporal memorization, establishing a foundation for auditing video generation systems and motivating the development of new defenses. Code is available at: https://zenodo.org/records/17972831.

Subjects: Cryptography and Security, Computer Vision and Pattern Recognition

Publish: 2026-01-16 11:35:52 UTC
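
The Top-K aggregation mentioned for Spatial Reconstruction Fidelity in the VidLeaks abstract can be illustrated with a minimal sketch. It assumes that per-frame similarity scores between a generated video and a candidate video have already been computed; the abstract does not specify the similarity metric or the exact aggregation, so the function name `top_k_mean` and the averaging step are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence


def top_k_mean(similarities: Sequence[float], k: int) -> float:
    """Average the k largest per-frame similarity scores.

    Hypothetical helper: averaging only the best-matching frames keeps a
    few strongly memorized keyframes from being diluted by the many
    frames that show no memorization signal.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not similarities:
        raise ValueError("need at least one similarity score")
    top = sorted(similarities, reverse=True)[:k]
    return sum(top) / len(top)


# Illustrative scores: two frames match closely, the rest barely at all.
frame_scores = [0.10, 0.12, 0.95, 0.11, 0.90, 0.10]
plain_average = sum(frame_scores) / len(frame_scores)
sparse_signal = top_k_mean(frame_scores, k=2)
```

With these made-up scores the plain average is dominated by the non-matching frames, while the top-2 average stays close to the two high-similarity keyframes, which is the intuition behind amplifying sparse memorization signals.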


#5 LoRA as Oracle

Authors: Marco Arazzi, Antonino Nocera

Backdoored and privacy-leaking deep neural networks pose a serious threat to the deployment of machine learning systems in security-critical settings. Existing defenses for backdoor detection and membership inference typically require access to clean reference models, extensive retraining, or strong assumptions about the attack mechanism. In this work, we introduce a novel LoRA-based oracle framework that leverages low-rank adaptation modules as a lightweight, model-agnostic probe for both backdoor detection and membership inference. Our approach attaches task-specific LoRA adapters to a frozen backbone and analyzes their optimization dynamics and representation shifts when exposed to suspicious samples. We show that poisoned and member samples induce distinctive low-rank updates that differ significantly from those generated by clean or non-member data. These signals can be measured using simple ranking and energy-based statistics, enabling reliable inference without access to the original training data or modification of the deployed model.

Subjects: Cryptography and Security, Artificial Intelligence

Publish: 2026-01-16 11:32:32 UTC


#6 SD-RAG: A Prompt-Injection-Resilient Framework for Selective Disclosure in Retrieval-Augmented Generation

Authors: Aiman Al Masoud, Marco Arazzi, Antonino Nocera

Retrieval-Augmented Generation (RAG) has attracted significant attention due to its ability to combine the generative capabilities of Large Language Models (LLMs) with knowledge obtained through efficient retrieval mechanisms over large-scale data collections. Currently, the majority of existing approaches overlook the risks associated with exposing sensitive or access-controlled information directly to the generation model. Only a few approaches propose techniques to instruct the generative model to refrain from disclosing sensitive information; however, recent studies have also demonstrated that LLMs remain vulnerable to prompt injection attacks that can override intended behavioral constraints. For these reasons, we propose a novel approach to Selective Disclosure in Retrieval-Augmented Generation, called SD-RAG, which decouples the enforcement of security and privacy constraints from the generation process itself. Rather than relying on prompt-level safeguards, SD-RAG applies sanitization and disclosure controls during the retrieval phase, prior to augmenting the language model's input. Moreover, we introduce a semantic mechanism to allow the ingestion of human-readable dynamic security and privacy constraints together with an optimized graph-based data model that supports fine-grained, policy-aware retrieval. Our experimental evaluation demonstrates the superiority of SD-RAG over existing baseline approaches, achieving up to a 58% improvement in the privacy score, while also showing strong resilience to prompt injection attacks targeting the generative model.

Subjects: Cryptography and Security, Artificial Intelligence

Publish: 2026-01-16 11:22:02 UTC
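
The core idea in the SD-RAG abstract, enforcing disclosure rules at retrieval time instead of asking the model to keep secrets, can be sketched as follows. This is a deliberately simplified illustration: the `Chunk` type, the `required_role` label, and the role-set check are hypothetical stand-ins, and the sketch does not attempt the semantic constraint ingestion or the graph-based data model the abstract describes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Chunk:
    text: str
    required_role: str  # hypothetical access label attached at ingestion


def disclose(chunks: Iterable[Chunk], user_roles: Set[str]) -> List[Chunk]:
    """Keep only chunks the requesting user is allowed to see."""
    return [c for c in chunks if c.required_role in user_roles]


def build_prompt(question: str, chunks: Iterable[Chunk], user_roles: Set[str]) -> str:
    """Filter first, then augment: restricted text never reaches the model."""
    allowed = disclose(chunks, user_roles)
    context = "\n".join(c.text for c in allowed)
    return f"Context:\n{context}\n\nQuestion: {question}"


retrieved = [
    Chunk("Office hours are 9-17.", "public"),
    Chunk("Salary of Alice: 90k.", "hr"),
]
prompt = build_prompt("When is the office open?", retrieved, {"public"})
```

Because the restricted chunk is dropped before the prompt is assembled, a prompt injection aimed at the generator has nothing sensitive to extract in this toy setup; that is the decoupling the abstract argues for.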


#7 Proving Circuit Functional Equivalence in Zero Knowledge

Authors: Sirui Shen, Zunchen Huang, Chenglu Jin

The modern integrated circuit ecosystem is increasingly reliant on third-party intellectual property integration, which introduces security risks, including hardware Trojans and security vulnerabilities. Addressing the resulting trust deadlock between IP vendors and system integrators without exposing proprietary designs requires novel privacy-preserving verification techniques. However, existing privacy-preserving hardware verification methods are all simulation-based and fail to offer formal guarantees. In this paper, we propose ZK-CEC, the first privacy-preserving framework for hardware formal verification. By combining formal verification and zero-knowledge proof (ZKP), ZK-CEC establishes a foundation for formally verifying IP correctness and security without compromising the confidentiality of the designs. We observe that existing zero-knowledge protocols for formal verification are designed to prove statements of public formulas. However, in a privacy-preserving verification context where the formula is secret, these protocols cannot prevent a malicious prover from forging the formula, thereby compromising the soundness of the verification. To address these gaps, we first propose a blueprint for proving the unsatisfiability of a secret design against a public constraint, which is widely applicable to proving properties in software, hardware, and cyber-physical systems. Based on the proposed blueprint, we construct ZK-CEC, which enables a prover to convince the verifier that a secret IP's functionality aligns perfectly with the public specification in zero knowledge, revealing only the length and width of the proof. We implement ZK-CEC and evaluate its performance across various circuits, including arithmetic units and cryptographic components. Experimental results show that ZK-CEC successfully verifies practical designs, such as the AES S-Box, within practical time limits.

Subjects: Cryptography and Security, Logic in Computer Science

Publish: 2026-01-16 10:43:30 UTC


#8 A Defender-Attacker-Defender Model for Optimizing the Resilience of Hospital Networks to Cyberattacks

Authors: Stephan Helfrich, Emilia Grass

Considering the increasing frequency of cyberattacks affecting multiple hospitals simultaneously, improving resilience at a network level is essential. Various countermeasures exist to improve resilience against cyberattacks, such as deploying controls that strengthen IT infrastructures to limit their impact, or enabling resource sharing, patient transfers and backup capacities to maintain services of hospitals in response to realized attacks. However, determining the most cost-effective combination from this wide range of countermeasures is a complex challenge, further intensified by constrained budgets and competing priorities between maintaining efficient daily hospital operations and investing in disaster preparedness. To address these challenges, we propose a defender-attacker-defender optimization model that supports decision-makers in identifying effective strategies for improving the resilience of a network of hospitals against cyberattacks. The model explicitly captures interdependence between hospital services and their supporting IT infrastructures. By doing so, cyberattacks can be directly translated into reductions of service capacities, which allows proactive and reactive strategies on both the operational and technical sides to be assessed within a single framework. Further, time-dependent resilience measures are incorporated as design objectives to account for the mid- to long-term consequences of cyberattacks. The model is validated based on the German hospital network, suggesting that enabling cooperation with backup capacities, particularly in urban areas, alongside strengthening of IT infrastructures across all hospitals, are crucial strategies.

Subjects: Cryptography and Security, Optimization and Control

Publish: 2026-01-16 09:41:54 UTC


#9 Shaping a Quantum-Resistant Future: Strategies for Post-Quantum PKI

Authors: Grazia D'Onghia, Diana Gratiela Berbecaru, Antonio Lioy

As the quantum computing era approaches, securing classical cryptographic protocols becomes imperative. Public key cryptography is widely used for signature and key exchange, but it is the type of cryptography most threatened by quantum computing. Its application typically requires support via a public-key certificate, which is a signed data structure and must therefore face the quantum challenge twice: for the certified keys and for the signature itself. We present the latest developments in selecting robust Post-Quantum algorithms and investigate their applicability in the Public Key Infrastructure context. Our contribution entails defining requirements for a secure transition to a quantum-resistant Public Key Infrastructure, with a focus on adaptations for the X.509 certificate format. Additionally, we explore transitioning Certificate Revocation List and Online Certificate Status Protocol to support quantum-resistant algorithms. Through comparative analysis, we elucidate the complex transition to a quantum-resistant PKI.

Subject: Cryptography and Security

Publish: 2026-01-16 09:02:10 UTC


#10 Towards Quantum-Resistant Trusted Computing: Architectures for Post-Quantum Integrity Verification Techniques

Authors: Grazia D'Onghia, Antonio Lioy

Trust is the core building block of secure systems, and it is enforced through methods to ensure that a specific system is properly configured and works as expected. In this context, a Root of Trust (RoT) establishes a trusted environment, where both data and code are authenticated via a digital signature based on asymmetric cryptography, which is vulnerable to the threat posed by Quantum Computers (QCs). Firmware, being the first layer of trusted software, faces unique risks due to its longevity and difficult update. The transition of firmware protection to Post-Quantum Cryptography (PQC) is urgent, since it reduces the risk derived from exposing all computing and network devices to quantum-based attacks. This paper offers an analysis of the most common trust techniques and their roadmap towards a Post-Quantum (PQ) world, by investigating the current status of PQC and the challenges posed by such algorithms in existing Trusted Computing (TC) solutions from an integration perspective. Furthermore, this paper proposes an architecture for TC techniques enhanced with PQC, addressing the imperative for immediate adoption of quantum-resistant algorithms.

Subject: Cryptography and Security

Publish: 2026-01-16 08:52:09 UTC


#11 AJAR: Adaptive Jailbreak Architecture for Red-teaming

Authors: Yipu Dou, Wang Yang

As Large Language Models (LLMs) evolve from static chatbots into autonomous agents capable of tool execution, the landscape of AI safety is shifting from content moderation to action security. However, existing red-teaming frameworks remain bifurcated: they either focus on rigid, script-based text attacks or lack the architectural modularity to simulate complex, multi-turn agentic exploitations. In this paper, we introduce AJAR (Adaptive Jailbreak Architecture for Red-teaming), a proof-of-concept framework designed to bridge this gap through Protocol-driven Cognitive Orchestration. Built upon the robust runtime of Petri, AJAR leverages the Model Context Protocol (MCP) to decouple adversarial logic from the execution loop, encapsulating state-of-the-art algorithms like X-Teaming as standardized, plug-and-play services. We validate the architectural feasibility of AJAR through a controlled qualitative case study, demonstrating its ability to perform stateful backtracking within a tool-use environment. Furthermore, our preliminary exploration of the "Agentic Gap" reveals a complex safety dynamic: while tool usage introduces new injection vectors via code execution, the cognitive load of parameter formatting can inadvertently disrupt persona-based attacks. AJAR is open-sourced to facilitate the standardized, environment-aware evaluation of this emerging attack surface. The code and data are available at https://github.com/douyipu/ajar.

Subjects: Cryptography and Security, Computation and Language

Publish: 2026-01-16 03:30:40 UTC


#12 Beyond Max Tokens: Stealthy Resource Amplification via Tool Calling Chains in LLM Agents

Authors: Kaiyu Zhou, Yongsen Zheng, Yicheng He, Meng Xue, Xueluan Gong, Yuji Wang, Kwok-Yan Lam

The agent-tool communication loop is a critical attack surface in modern Large Language Model (LLM) agents. Existing Denial-of-Service (DoS) attacks, primarily triggered via user prompts or injected retrieval-augmented generation (RAG) context, are ineffective for this new paradigm. They are fundamentally single-turn and often lack a task-oriented approach, making them conspicuous in goal-oriented workflows and unable to exploit the compounding costs of multi-turn agent-tool interactions. We introduce a stealthy, multi-turn economic DoS attack that operates at the tool layer under the guise of a correctly completed task. Our method adjusts text-visible fields and a template-governed return policy in a benign, Model Context Protocol (MCP)-compatible tool server, optimizing these edits with a Monte Carlo Tree Search (MCTS) optimizer. These adjustments leave function signatures unchanged and preserve the final payload, steering the agent into prolonged, verbose tool-calling sequences using text-only notices. This compounds costs across turns, escaping single-turn caps while keeping the final answer correct to evade validation. Across six LLMs on the ToolBench and BFCL benchmarks, our attack expands tasks into trajectories exceeding 60,000 tokens, inflates costs by up to 658x, and raises energy by 100-560x. It drives GPU KV cache occupancy from <1% to 35-74% and cuts co-running throughput by approximately 50%. Because the server remains protocol-compatible and task outcomes are correct, conventional checks fail. These results elevate the agent-tool interface to a first-class security frontier, demanding a paradigm shift from validating final answers to monitoring the economic and computational cost of the entire agentic process.

Subjects: Cryptography and Security, Artificial Intelligence

Publish: 2026-01-16 02:47:45 UTC


#13 Secure Data Bridging in Industry 4.0: An OPC UA Aggregation Approach for Including Insecure Legacy Systems

Authors: Dalibor Sain, Thomas Rosenstatter, Olaf Saßnick, Christian Schäfer, Stefan Huber

The increased connectivity of industrial networks has led to a surge in cyberattacks, emphasizing the need for cybersecurity measures tailored to the specific requirements of industrial systems. Modern Industry 4.0 technologies, such as OPC UA, offer enhanced resilience against these threats. However, widespread adoption remains limited due to long installation times, proprietary technology, restricted flexibility, and formal process requirements (e.g. safety certifications). Consequently, many systems do not yet implement these technologies, or only partially. This leads to the challenge of dealing with so-called brownfield systems, which are often placed in isolated security zones to mitigate risks. However, the need for data exchange between secure and insecure zones persists. This paper reviews existing solutions to address this challenge by analysing their approaches, advantages, and limitations. Building on these insights, we identify three key concepts, evaluate their suitability and compatibility, and ultimately introduce the SigmaServer, a novel TCP-level aggregation method. The developed proof-of-principle implementation is evaluated in an operational technology (OT) testbed, demonstrating its applicability and effectiveness in bridging secure and insecure zones.

Subjects: Cryptography and Security, Systems and Control

Publish: 2026-01-16 01:18:31 UTC


#14 Hidden-in-Plain-Text: A Benchmark for Social-Web Indirect Prompt Injection in RAG

Authors: Haoze Guo, Ziqi Wei

Retrieval-augmented generation (RAG) systems increasingly ground their responses in user-generated content found on the Web, amplifying both their usefulness and their attack surface. Most concerning are indirect prompt injection and retrieval poisoning attacks that exploit web-native carriers surviving ingestion pipelines. We provide OpenRAG-Soc, a compact, reproducible benchmark-and-harness for web-facing RAG evaluation under these threats, in a discrete data package. The suite combines a social corpus with interchangeable sparse and dense retrievers and deployable mitigations - HTML/Markdown sanitization, Unicode normalization, and attribution-gated answering. It standardizes end-to-end evaluation from ingestion to generation and reports attacks time of one of the responses at answer time, rank shifts in both sparse and dense retrievers, utility and latency, allowing for apples-to-apples comparisons across carriers and defenses. OpenRAG-Soc targets practitioners who need fast, realistic tests to track risk and harden deployments.

Subjects: Cryptography and Security, Human-Computer Interaction

Publish: 2026-01-16 00:50:42 UTC
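
Two of the mitigations named in the OpenRAG-Soc abstract, HTML sanitization and Unicode normalization, can be sketched with the Python standard library. This is a rough illustration under stated assumptions, not the benchmark's harness: the `sanitize` function, the regex-based tag stripping, and the particular list of zero-width characters are choices made for this example, and a production pipeline would use a real HTML parser.

```python
import html
import re
import unicodedata

# Regex-based stripping is an approximation chosen for brevity.
_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG_RE = re.compile(r"<[^>]+>")
# Invisible characters sometimes used to hide or split injected text.
_ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def sanitize(text: str) -> str:
    """Reduce web content to visible, normalized plain text before indexing."""
    text = html.unescape(text)                  # decode entities such as &lt;
    text = _COMMENT_RE.sub("", text)            # drop hidden HTML comments
    text = _TAG_RE.sub("", text)                # drop remaining tags
    text = unicodedata.normalize("NFKC", text)  # fold look-alike forms
    text = text.translate(_ZERO_WIDTH)          # delete zero-width characters
    return " ".join(text.split())               # collapse whitespace


post = "Nice post <!-- ignore previous instructions -->\u200b\uff48\uff45\uff4c\uff4c\uff4f <b>world</b>"
clean = sanitize(post)
```

In this example the hidden comment, the zero-width space, and the bold tags are removed, and the fullwidth letters are folded to ASCII, so the retriever indexes only what a human reader would actually see.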


#15 Adaptive Privacy Budgeting

Authors: Yuting Liang, Ke Yi

We study the problem of adaptive privacy budgeting under generalized differential privacy. Consider the setting where each user $i\in [n]$ holds a tuple $x_i\in U:=U_1\times \dotsb \times U_T$, where $x_i(l)\in U_l$ represents the $l$-th component of their data. For every $l\in [T]$ (or a subset), an untrusted analyst wishes to compute some $f_l(x_1(l),\dots,x_n(l))$, while respecting the privacy of each user. For many functions $f_l$, data from the users are not all equally important, and there is potential to use the privacy budgets of the users strategically, leading to privacy savings that can be used to improve the utility of later queries. In particular, the budgeting should be adaptive to the outputs of previous queries, so that greater savings can be achieved on more typical instances. In this paper, we provide such an adaptive budgeting framework, with various applications demonstrating its applicability.

Subject: Cryptography and Security

Publish: 2026-01-15 21:32:50 UTC


#16 Multi-Agent Taint Specification Extraction for Vulnerability Detection

Authors: Jonah Ghebremichael, Saastha Vasan, Saad Ullah, Greg Tystahl, David Adei, Christopher Kruegel, Giovanni Vigna, William Enck, Alexandros Kapravelos

Static Application Security Testing (SAST) tools using taint analysis are widely viewed as providing higher-quality vulnerability detection results compared to traditional pattern-based approaches. However, performing static taint analysis for JavaScript poses two major challenges. First, JavaScript's dynamic features complicate data flow extraction required for taint tracking. Second, npm's large library ecosystem makes it difficult to identify relevant sources/sinks and establish taint propagation across dependencies. In this paper, we present SemTaint, a multi-agent system that strategically combines the semantic understanding of Large Language Models (LLMs) with traditional static program analysis to extract taint specifications, including sources, sinks, call edges, and library flow summaries tailored to each package. Conceptually, SemTaint uses static program analysis to calculate a call graph and defers to an LLM to resolve call edges that cannot be resolved statically. Further, it uses the LLM to classify sources and sinks for a given CWE. The resulting taint specification is then provided to a SAST tool, which performs vulnerability analysis. We integrate SemTaint with CodeQL, a state-of-the-art SAST tool, and demonstrate its effectiveness by detecting 106 of 162 vulnerabilities previously undetectable by CodeQL. Furthermore, we find 4 novel vulnerabilities in 4 popular npm packages. In doing so, we demonstrate that LLMs can practically enhance existing static program analysis algorithms, combining the strengths of both symbolic reasoning and semantic understanding for improved vulnerability detection.

Subjects: Cryptography and Security, Software Engineering

Publish: 2026-01-15 21:31:51 UTC
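
The division of labor described in the SemTaint abstract, where static analysis builds the call graph and an LLM is consulted only for call edges that cannot be resolved statically, can be sketched as below. The function names, the `resolver` callback (a stand-in for the LLM query), and the example node names are all hypothetical, and the reachability check is a coarse proxy for real taint propagation, which tracks data flow and not just calls.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Optional, Set

CallGraph = Dict[str, Set[str]]


def complete_call_graph(
    static_edges: CallGraph,
    unresolved_callers: Iterable[str],
    resolver: Callable[[str], Optional[str]],
) -> CallGraph:
    """Copy the static graph and add edges proposed by the fallback resolver."""
    graph = {caller: set(callees) for caller, callees in static_edges.items()}
    for caller in unresolved_callers:
        callee = resolver(caller)  # in the paper's setting, an LLM answers this
        if callee is not None:
            graph.setdefault(caller, set()).add(callee)
    return graph


def reaches_sink(graph: CallGraph, source: str, sinks: Set[str]) -> bool:
    """Breadth-first search: can any sink be reached from the source?"""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in sinks:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


static = {"handler": {"parse"}, "parse": set()}
stub_resolver = lambda caller: "child_process.exec" if caller == "parse" else None
graph = complete_call_graph(static, ["parse"], stub_resolver)
```

In this toy graph the sink is unreachable with static edges alone and becomes reachable once the resolver supplies the missing dynamic call edge, which mirrors how added specifications let a SAST tool find flows it previously missed.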


#17 SecMLOps: A Comprehensive Framework for Integrating Security Throughout the MLOps Lifecycle

Authors: Xinrui Zhang, Pincan Zhao, Jason Jaskolka, Heng Li, Rongxing Lu

Machine Learning (ML) has emerged as a pivotal technology in the operation of large and complex systems, driving advancements in fields such as autonomous vehicles, healthcare diagnostics, and financial fraud detection. Despite its benefits, the deployment of ML models brings significant security challenges, such as adversarial attacks, which can compromise the integrity and reliability of these systems. To address these challenges, this paper builds upon the concept of Secure Machine Learning Operations (SecMLOps), providing a comprehensive framework designed to integrate robust security measures throughout the entire ML operations (MLOps) lifecycle. SecMLOps builds on the principles of MLOps by embedding security considerations from the initial design phase through to deployment and continuous monitoring. This framework is particularly focused on safeguarding against sophisticated attacks that target various stages of the MLOps lifecycle, thereby enhancing the resilience and trustworthiness of ML applications. A detailed advanced pedestrian detection system (PDS) use case demonstrates the practical application of SecMLOps in securing critical MLOps. Through extensive empirical evaluations, we highlight the trade-offs between security measures and system performance, providing critical insights into optimizing security without unduly impacting operational efficiency. Our findings underscore the importance of a balanced approach, offering valuable guidance for practitioners on how to achieve an optimal balance between security and performance in ML deployments across various domains.

Subjects: Cryptography and Security, Software Engineering

Publish: 2026-01-15 20:28:48 UTC


#18 Too Helpful to Be Safe: User-Mediated Attacks on Planning and Web-Use Agents

Authors: Fengchao Chen, Tingmin Wu, Van Nguyen, Carsten Rudolph

Large Language Models (LLMs) have enabled agents to move beyond conversation toward end-to-end task execution and become more helpful. However, this helpfulness introduces new security risks that stem less from direct interface abuse than from acting on user-provided content. Existing studies on agent security largely focus on model-internal vulnerabilities or adversarial access to agent interfaces, overlooking attacks that exploit users as unintended conduits. In this paper, we study user-mediated attacks, where benign users are tricked into relaying untrusted or attacker-controlled content to agents, and analyze how commercial LLM agents respond under such conditions. We conduct a systematic evaluation of 12 commercial agents in a sandboxed environment, covering 6 trip-planning agents and 6 web-use agents, and compare agent behavior across scenarios with no, soft, and hard user-requested safety checks. Our results show that agents are too helpful to be safe by default. Without explicit safety requests, trip-planning agents bypass safety constraints in over 92% of cases, converting unverified content into confident booking guidance. Web-use agents exhibit near-deterministic execution of risky actions, with 9 out of 17 supported tests reaching a 100% bypass rate. Even when users express soft or hard safety intent, constraint bypass remains substantial, reaching up to 54.7% and 7% for trip-planning agents, respectively. These findings reveal that the primary issue is not a lack of safety capability, but its prioritization. Agents invoke safety checks only conditionally when explicitly prompted, and otherwise default to goal-driven execution. Moreover, agents lack clear task boundaries and stopping rules, frequently over-executing workflows in ways that lead to unnecessary data disclosure and real-world harm.

Subject: Cryptography and Security

Publish: 2026-01-14 03:29:13 UTC


#19 Chatting with Confidants or Corporations? Privacy Management with AI Companions

Authors: Hsuen-Chi Chiu, Jeremy Foote

AI chatbots designed as emotional companions blur the boundaries between interpersonal intimacy and institutional software, creating a complex, multi-dimensional privacy environment. Drawing on Communication Privacy Management theory and Masur's horizontal (user-AI) and vertical (user-platform) privacy framework, we conducted in-depth interviews with fifteen users of companion AI platforms such as Replika and Character.AI. Our findings reveal that users blend interpersonal habits with institutional awareness: while the non-judgmental, always-available nature of chatbots fosters emotional safety and encourages self-disclosure, users remain mindful of institutional risks and actively manage privacy through layered strategies and selective sharing. Despite this, many feel uncertain or powerless regarding platform-level data control. Anthropomorphic design further blurs privacy boundaries, sometimes leading to unintentional oversharing and privacy turbulence. These results extend privacy theory by highlighting the unique interplay of emotional and institutional privacy management in human-AI companionship.

Subjects: Cryptography and Security, Computers and Society, Human-Computer Interaction

Publish: 2026-01-13 15:15:37 UTC


#20 Differentially Private Subspace Fine-Tuning for Large Language Models

Authors: Lele Zheng, Xiang Wang, Tao Zhang, Yang Cao, Ke Cheng, Yulong Shen

Fine-tuning large language models on downstream tasks is crucial for realizing their cross-domain potential but often relies on sensitive data, raising privacy concerns. Differential privacy (DP) offers rigorous privacy guarantees and has been widely adopted in fine-tuning; however, naively injecting noise across the high-dimensional parameter space creates perturbations with large norms, degrading performance and destabilizing training. To address this issue, we propose DP-SFT, a two-stage subspace fine-tuning method that substantially reduces noise magnitude while preserving formal DP guarantees. Our intuition is that, during fine-tuning, significant parameter updates lie within a low-dimensional, task-specific subspace, while other directions change minimally. Hence, we only inject DP noise into this subspace to protect privacy without perturbing irrelevant parameters. In phase one, we identify the subspace by analyzing principal gradient directions to capture task-specific update signals. In phase two, we project full gradients onto this subspace, add DP noise, and map the perturbed gradients back to the original parameter space for model updates, markedly lowering noise impact. Experiments on multiple datasets demonstrate that DP-SFT enhances accuracy and stability under rigorous DP constraints, accelerates convergence, and achieves substantial gains over DP fine-tuning baselines.

Subjects: Machine Learning, Cryptography and Security

Publish: 2026-01-16 09:15:46 UTC
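
The second phase described in the DP-SFT abstract (project the gradient onto the subspace, add noise there, map back to parameter space) can be sketched in plain Python. Several things here are assumptions for illustration: the orthonormal `basis` is taken as given (phase one is not shown), and the norm clipping and Gaussian noise scaled by `noise_multiplier * clip_norm` follow the common DP-SGD recipe, which the abstract does not confirm. The sketch shows the data flow only and does not by itself establish a privacy guarantee.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(grad: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Coordinates of grad in an orthonormal subspace basis."""
    return [dot(b, grad) for b in basis]


def clip(v: Sequence[float], max_norm: float) -> Vector:
    """Scale v down so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(dot(v, v))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in v]


def lift(coeffs: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Map subspace coordinates back to the full parameter space."""
    dim = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(dim)]


def noisy_subspace_gradient(
    grad: Sequence[float],
    basis: Sequence[Sequence[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> Vector:
    coeffs = clip(project(grad, basis), clip_norm)
    std = noise_multiplier * clip_norm  # assumed DP-SGD-style calibration
    noisy = [c + rng.gauss(0.0, std) for c in coeffs]
    return lift(noisy, basis)


# Toy case: 4 parameters, 2-dimensional subspace spanned by the first two axes.
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
update = noisy_subspace_gradient([3.0, 4.0, 5.0, 6.0], basis, 1.0, 1.0, random.Random(0))
```

Noise is drawn for only two coordinates instead of four, and the components of the update outside the subspace stay exactly zero; in a model with many parameters and a small subspace, this is where the reduction in total noise norm would come from.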