AI Sandboxes: A Threat Model, Taxonomy, and Measurement Framework

Authors: Inderjeet Singh, Haitham Mahmoud, Andrés Murillo

AI systems are increasingly evaluated in bounded environments that combine isolation, simulation, instrumentation, supervision, and evidence capture. For physical AI, AIoT, and cyber-physical systems, this shift is not a matter of terminology: the system under test may sense, decide, actuate, communicate, and fail through physical processes, networked devices, and human operators. This article develops an assurance-oriented account of AI sandboxes as controlled environments for testing, evaluation, verification, and validation across digital AI, embodied autonomy, and cyber-physical deployments. We formalize the sandbox boundary and a weakest-link rule for composing per-dimension evidence into a bounded deployment claim; separate major sandbox archetypes; define a cyber-physical threat model that includes attacks on the assurance apparatus itself; and introduce a measurement framework spanning fidelity, controllability, observability, containment, reproducibility, and governance artifacts, instantiated on three worked case studies of real sandboxes. The resulting threat model, taxonomy, and measurement framework clarify what a sandbox can validly test, which risks it can contain, and what forms of evidence it can support for safety, security, and regulatory assurance.

Subjects: Cryptography and Security, Artificial Intelligence, Robotics, Software Engineering

Publish: 2026-06-16 22:57:24 UTC

2606.18532