2606.07611


#1 MIRAGE: Metadata-Integrated Repository Analysis and Guided Enhancement for MSR Datasets

Authors: Aabia Ather, Muhammad Usayd Ather, Qurat-Ul-Ain Somroo, Muhammad Khuram Shahzad

This paper proposes an improved approach to analyzing Mining Software Repositories (MSR) datasets through metadata enrichment, FAIRness assessment, and topic-driven analysis. The work extends an earlier dataset directory built for the analysis of MSR datasets by adding new dataset annotations, enriching the metadata categories, and offering more advanced filtering options. Metadata for MSR papers presented from 2013 to 2024 was gathered using the Semantic Scholar API. The analysis combines Latent Dirichlet Allocation (LDA) topic modeling with statistical analysis. The expanded dataset directory incorporates dataset-level attributes, namely repository hosting site, format, accessibility, reusability, and dataset quality. The study reveals that the choice of repository hosting site and data format influences citation patterns and dataset usability. Furthermore, the enhanced annotation approach improves the analysis and discoverability of MSR datasets, supporting more effective reuse and evaluation of research artifacts.
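To make the dataset-level attributes and filtering options concrete, the following is a minimal sketch of how one directory entry might be represented and queried. It is not the authors' implementation: the `DatasetEntry` class, its field names, the helpers `filter_entries` and `mean_citations_by`, and the three records are all hypothetical placeholders invented for illustration, and the citation counts are not real data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical directory record; field names are assumptions."""
    title: str
    year: int
    hosting_site: str   # e.g. "GitHub", "Zenodo"
    data_format: str    # e.g. "CSV", "JSON", "SQL"
    accessible: bool    # whether the dataset link still resolves
    citations: int      # citation count of the dataset paper


def filter_entries(entries, **criteria):
    """Keep entries whose attributes equal every given criterion."""
    return [
        e for e in entries
        if all(getattr(e, name) == value for name, value in criteria.items())
    ]


def mean_citations_by(entries, attribute):
    """Average citation count per distinct value of one attribute."""
    groups = defaultdict(list)
    for e in entries:
        groups[getattr(e, attribute)].append(e.citations)
    return {key: mean(values) for key, values in groups.items()}


# Made-up placeholder records, not taken from the paper.
ENTRIES = [
    DatasetEntry("example-a", 2015, "GitHub", "CSV", True, 40),
    DatasetEntry("example-b", 2019, "Zenodo", "JSON", True, 10),
    DatasetEntry("example-c", 2021, "GitHub", "SQL", False, 20),
]

github_accessible = filter_entries(ENTRIES, hosting_site="GitHub", accessible=True)
by_site = mean_citations_by(ENTRIES, "hosting_site")
```

Grouping citation counts by an attribute such as hosting site or format is one simple way to start examining the kind of relationship the abstract reports; the paper's own statistical analysis may differ.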

Subjects: Information Retrieval, Artificial Intelligence, Machine Learning, Software Engineering

Publish: 2026-05-29 16:10:18 UTC