Evaluating LLM Generated Detection Rules in Cybersecurity

Authors: Anna Bertiger, Bobby Filar, Aryan Luthra, Stefano Meschiari, Aiden Mitchell, Sam Scholten, Vivek Sharath

LLMs are increasingly pervasive in security environments, yet there are few ways to measure their effectiveness, which limits their trustworthiness and usefulness to security practitioners. Here, we present an open-source evaluation framework and benchmark metrics for evaluating LLM-generated cybersecurity rules. The benchmark uses a holdout-set methodology to measure the effectiveness of LLM-generated security rules against a human-written corpus of rules. It provides three key metrics, inspired by the way experts evaluate security rules, that together offer a realistic, multifaceted assessment of an LLM-based security rule generator. We illustrate this methodology using rules written by Sublime Security's detection team and rules written by Sublime Security's Automated Detection Engineer (ADE), and we present a thorough analysis of ADE's skills in the results section.

Subject: Cryptography and Security

Published: 2025-09-20 17:21:51 UTC

arXiv ID: 2509.16749