2501.14170


Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models

Authors: Yile Gu, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng

Observability in cloud infrastructure is critical for service providers and has driven the widespread adoption of anomaly detection systems for monitoring metrics. However, existing systems often struggle to achieve explainability, reproducibility, and autonomy simultaneously, three properties that are indispensable for production use. We introduce Argos, an agentic system that leverages large language models (LLMs) to detect time-series anomalies in cloud infrastructure. Argos uses explainable and reproducible anomaly rules as an intermediate representation and employs LLMs to generate these rules autonomously. Multiple collaborative agents efficiently train error-free, accuracy-guaranteed anomaly rules, and the trained rules are then deployed for low-cost online anomaly detection. Our evaluation shows that Argos outperforms state-of-the-art methods, increasing F1 scores by up to 9.5% on public anomaly detection datasets and by up to 28.3% on an internal dataset collected from Microsoft.
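
To make the idea of an anomaly rule as an intermediate representation concrete, here is a minimal sketch of what a deterministic rule and its F1 evaluation could look like. It is an illustration only, not a rule produced by Argos: the function names `spike_rule` and `f1_score`, the trailing-median logic, and the `window` and `threshold` parameters are all assumptions introduced here.

```python
from statistics import median


def spike_rule(values, window=5, threshold=3.0):
    """Hypothetical anomaly rule (not from the paper).

    Flags a point when it deviates from the median of the preceding
    `window` points by more than `threshold`. The rule is plain,
    deterministic code, so the same input always yields the same flags.
    """
    flags = []
    for i, v in enumerate(values):
        history = values[max(0, i - window):i]
        if not history:
            flags.append(False)  # no history yet, nothing to compare against
            continue
        flags.append(abs(v - median(history)) > threshold)
    return flags


def f1_score(predicted, actual):
    """Point-wise F1 score between predicted and labeled anomaly flags."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative metric series with a single spike at index 4.
series = [10, 10, 11, 10, 30, 10, 10]
labels = [False, False, False, False, True, False, False]
flags = spike_rule(series)
score = f1_score(flags, labels)
```

Because a rule of this kind is ordinary code, it can be read and audited (explainability), rerun with identical results (reproducibility), and applied to incoming metrics without an LLM call at detection time, which is consistent with the low-cost online detection the abstract describes.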

Subjects: Machine Learning; Distributed, Parallel, and Cluster Computing; Multiagent Systems

Published: 2025-01-24 01:38:37 UTC