Total: 1
Web applications and APIs face constant threats from malicious actors seeking to exploit vulnerabilities for illicit gains. To defend against these threats, it is essential to have anomaly detection systems that can identify a variety of malicious behaviors. However, a significant challenge in this area is the limited availability of training data. Existing datasets often do not provide sufficient coverage of the diverse API structures, parameter formats, and usage patterns encountered in real-world scenarios. As a result, models trained on these datasets often struggle to generalize and may fail to detect less common or emerging attack vectors. To enhance detection accuracy and robustness, it is crucial to access larger and more representative datasets that capture the true variability of API traffic. To address this, we introduce a GAN-inspired learning framework that extends limited API traffic datasets through targeted, domain-aware synthesis. Drawing on techniques from Natural Language Processing (NLP), our approach leverages Transformer-based architectures, particularly RoBERTa, to enhance the contextual representation of API requests and generate realistic synthetic samples aligned with security-specific semantics. We evaluate our framework on two benchmark datasets, CSIC 2010 and ATRDF 2023, and compare it with a previous data augmentation technique to assess the importance of domain-specific synthesis. In addition, we apply our augmented data to various anomaly detection models to evaluate its impact on classification performance. Our method achieves up to a 4.94% increase in F1 score on CSIC 2010 and up to 21.10% on ATRDF 2023. The source codes of this work are available at https://github.com/ArielCyber/GAN-API.