Software Engineering

2025-03-28 | | Total: 18

#1 GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics [PDF] [Copy] [Kimi1] [REL]

Authors: Arsham Gholamzadeh Khoee, Shuai Wang, Yinan Yu, Robert Feldt, Dhasarathy Parthasarathy

Ensuring the reliability and effectiveness of software release decisions is critical, particularly in safety-critical domains like automotive systems. Precise analysis of release validation data, often presented in tabular form, plays a pivotal role in this process. However, traditional methods that rely on manual analysis of extensive test datasets and validation metrics are prone to delays and high costs. Large Language Models (LLMs) offer a promising alternative but face challenges in analytical reasoning, contextual understanding, handling out-of-scope queries, and processing structured test data consistently; limitations that hinder their direct application in safety-critical scenarios. This paper introduces GateLens, an LLM-based tool for analyzing tabular data in the automotive domain. GateLens translates natural language queries into Relational Algebra (RA) expressions and then generates optimized Python code. It outperforms the baseline system on benchmarking datasets, achieving higher F1 scores and handling complex and ambiguous queries with greater robustness. Ablation studies confirm the critical role of the RA module, with performance dropping sharply when omitted. Industrial evaluations reveal that GateLens reduces analysis time by over 80% while maintaining high accuracy and reliability. As demonstrated by presented results, GateLens achieved high performance without relying on few-shot examples, showcasing strong generalization across various query types from diverse company roles. Insights from deploying GateLens with a partner automotive company offer practical guidance for integrating AI into critical workflows such as release validation. Results show that by automating test result analysis, GateLens enables faster, more informed, and dependable release decisions, and can thus advance software scalability and reliability in automotive systems.

Subjects: Software Engineering , Artificial Intelligence , Computation and Language , Multiagent Systems

Publish: 2025-03-27 17:48:32 UTC


#2 Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs [PDF1] [Copy] [Kimi1] [REL]

Authors: Boyang Yang, Haoye Tian, Jiadong Ren, Shunfu Jin, Yang Liu, Feng Liu, Bach Le

Repository-level software repair faces challenges in bridging semantic gaps between issue descriptions and code patches. Existing approaches, which mostly depend on large language models (LLMs), suffer from semantic ambiguities, limited structural context understanding, and insufficient reasoning capability. To address these limitations, we propose KGCompass with two innovations: (1) a novel repository-aware knowledge graph (KG) that accurately links repository artifacts (issues and pull requests) and codebase entities (files, classes, and functions), allowing us to effectively narrow down the vast search space to only 20 most relevant functions with accurate candidate bug locations and contextual information, and (2) a path-guided repair mechanism that leverages KG-mined entity path, tracing through which allows us to augment LLMs with relevant contextual information to generate precise patches along with their explanations. Experimental results in the SWE-Bench-Lite demonstrate that KGCompass achieves state-of-the-art repair performance (45.67%) and function-level localization accuracy (51.33%) across open-source approaches, costing only $0.20 per repair. Our analysis reveals that among successfully localized bugs, 69.7% require multi-hop traversals through the knowledge graph, without which LLM-based approaches struggle to accurately locate bugs. The knowledge graph built in KGCompass is language agnostic and can be incrementally updated, making it a practical solution for real-world development environments.

Subject: Software Engineering

Publish: 2025-03-27 17:21:47 UTC


#3 SoK: Towards Reproducibility for Software Packages in Scripting Language Ecosystems [PDF] [Copy] [Kimi1] [REL]

Authors: Timo Pohl, Pavel Novák, Marc Ohm, Michael Meier

The disconnect between distributed software artifacts and their supposed source code enables attackers to leverage the build process for inserting malicious functionality. Past research in this field focuses on compiled language ecosystems, mostly analysing Linux distribution packages. However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This SoK provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems. We find that the literature is sparse, focusing on few individual problems and ecosystems. This allows us to effectively identify next steps to improve reproducibility in this field.

Subjects: Software Engineering , Cryptography and Security

Publish: 2025-03-27 17:10:38 UTC


#4 KRAFT -- A Knowledge-Graph-Based Resource Allocation Framework [PDF] [Copy] [Kimi1] [REL]

Authors: Leon Bein, Niels Martin, Luise Pufahl

Resource allocation in business process management involves assigning resources to open tasks while considering factors such as individual roles, aptitudes, case-specific characteristics, and regulatory constraints. Current information systems for resource allocation often require extensive manual effort to specify and maintain allocation rules, making them rigid and challenging to adapt. In contrast, fully automated approaches provide limited explainability, making it difficult to understand and justify allocation decisions. Knowledge graphs, which represent real-world entities and their relationships, offer a promising solution by capturing complex dependencies and enabling dynamic, context-aware resource allocation. This paper introduces KRAFT, a novel approach that leverages knowledge graphs and reasoning techniques to support resource allocation decisions. We demonstrate that integrating knowledge graphs into resource allocation software allows for adaptable and transparent decision-making based on an evolving knowledge base.

Subject: Software Engineering

Publish: 2025-03-27 15:59:57 UTC


#5 MONO2REST: Identifying and Exposing Microservices: a Reusable RESTification Approach [PDF] [Copy] [Kimi1] [REL]

Authors: Matthéo Lecrivain, Hanifa Barry, Dalila Tamzalit, Houari Sahraoui

The microservices architectural style has become the de facto standard for large-scale cloud applications, offering numerous benefits in scalability, maintainability, and deployment flexibility. Many organizations are pursuing the migration of legacy monolithic systems to a microservices architecture. However, this process is challenging, risky, time-intensive, and prone-to-failure while several organizations lack necessary financial resources, time, or expertise to set up this migration process. So, rather than trying to migrate a legacy system where migration is risky or not feasible, we suggest exposing it as a microservice application without without having to migrate it. In this paper, we present a reusable, automated, two-phase approach that combines evolutionary algorithms with machine learning techniques. In the first phase, we identify microservices at the method level using a multi-objective genetic algorithm that considers both structural and semantic dependencies between methods. In the second phase, we generate REST APIs for each identified microservice using a classification algorithm to assign HTTP methods and endpoints. We evaluated our approach with a case study on the Spring PetClinic application, which has both monolithic and microservices implementations that serve as ground truth for comparison. Results demonstrate that our approach successfully aligns identified microservices with those in the reference microservices implementation, highlighting its effectiveness in service identification and API generation.

Subjects: Software Engineering , Artificial Intelligence

Publish: 2025-03-27 14:10:33 UTC


#6 Code Review Comprehension: Reviewing Strategies Seen Through Code Comprehension Theories [PDF] [Copy] [Kimi1] [REL]

Authors: Pavlína Wurzel Gonçalves, Pooja Rani, Margaret-Anne Storey, Diomidis Spinellis, Alberto Bacchelli

Despite the popularity and importance of modern code review, the understanding of the cognitive processes that enable reviewers to analyze code and provide meaningful feedback is lacking. To address this gap, we observed and interviewed ten experienced reviewers while they performed 25 code reviews from their review queue. Since comprehending code changes is essential to perform code review and the primary challenge for reviewers, we focused our analysis on this cognitive process. Using Letovsky's model of code comprehension, we performed a theory-driven thematic analysis to investigate how reviewers apply code comprehension to navigate changes and provide feedback. Our findings confirm that code comprehension is fundamental to code review. We extend Letovsky's model to propose the Code Review Comprehension Model and demonstrate that code review, like code comprehension, relies on opportunistic strategies. These strategies typically begin with a context-building phase, followed by code inspection involving code reading, testing, and discussion management. To interpret and evaluate the proposed change, reviewers construct a mental model of the change as an extension of their understanding of the overall software system and contrast mental representations of expected and ideal solutions against the actual implementation. Based on our findings, we discuss how review tools and practices can better support reviewers in employing their strategies and in forming understanding. Data and material: https://doi.org/10.5281/zenodo.14748996

Subject: Software Engineering

Publish: 2025-03-27 12:44:40 UTC


#7 HORIZON: a Classification and Comparison Framework for Pricing-driven Feature Toggling [PDF] [Copy] [Kimi1] [REL]

Authors: Alejandro García-Fernández, Jose Antonio Parejo, Antonio Ruiz-Cortés

Software as a Service (SaaS) has seen rapid growth in recent years, thanks to its ability to adapt to diverse user needs through subscription-based models. However, as pricing models enhance the customization of subscriptions, managing the associated constraints within a system's codebase becomes increasingly challenging. In response, Pricing-driven Development and Operation has emerged to integrate pricing considerations across the software lifecycle. Among its most challenging objectives is regulating feature access according to users' subscriptions -- a process that requires managing a multitude of conditions throughout the system's codebase. Feature toggles have traditionally been employed to manage dynamic system behavior, but their application to pricing-driven constraints presents unique challenges. When used to enforce subscription-based restrictions, toggles must adapt -- among other factors -- to individual user's use of features, ensuring that subscription limits are not exceeded. Despite the increasing significance of this problem, current industrial solutions lack explicit support for pricing-driven feature toggling, and existing academic contributions remain constrained to specific architectures. This paper contributes to fill this gap by introducing HORIZON, a classification and comparison framework for feature toggling tools tailored to pricing-driven environments. Its utility is demonstrated by categorizing the solutions identified in the literature as promising for such environments, revealing both their strengths and limitations, and thereby pinpointing critical avenues for improvement. In doing so, HORIZON not only provides a comprehensive view of the current landscape but also lays the groundwork for a focused research agenda, guiding the development of more robust and adaptable solutions for streamlining SaaS development and operations driven by pricings.

Subject: Software Engineering

Publish: 2025-03-27 12:40:10 UTC


#8 Automated Analysis of Pricings in SaaS-based Information Systems [PDF] [Copy] [Kimi1] [REL]

Authors: Alejandro García-Fernández, José Antonio Parejo, Pablo Trinidad, Antonio Ruiz-Cortés

Software as a Service (SaaS) pricing models, encompassing features, usage limits, plans, and add-ons, have grown exponentially in complexity, evolving from offering tens to thousands of configuration options. This rapid expansion poses significant challenges for the development and operation of SaaS-based Information Systems (IS), as manual management of such configurations becomes time-consuming, error-prone, and ultimately unsustainable. The emerging paradigm of Pricing-driven DevOps aims to address these issues by automating pricing management tasks, such as transforming human-oriented pricings into machine-oriented (iPricing) or finding the optimal subscription that matches the requirements of a certain user, ultimately reducing human intervention. This paper advances the field by proposing seven analysis operations that partially or fully support these pricing management tasks, thus serving as a foundation for defining new, more specialized operations. To achieve this, we mapped iPricings into Constraint Satisfaction Optimization Problems (CSOP), an approach successfully used in similar domains, enabling us to implement and apply these operations to uncover latent, yet non-trivial insights from complex pricing models. The proposed approach has been implemented in a reference framework using MiniZinc, and tested with over 150 pricing models, identifying errors in 35 pricings of the benchmark. Results demonstrate its effectiveness in identifying errors and its potential to streamline Pricing-driven DevOps.

Subject: Software Engineering

Publish: 2025-03-27 12:36:57 UTC


#9 Scaling Automated Database System Testing [PDF1] [Copy] [Kimi1] [REL]

Authors: Suyang Zhong, Manuel Rigger

Recently, various automated testing approaches have been proposed that use specialized test oracles to find hundreds of logic bugs in mature, widely-used Database Management Systems (DBMSs). These test oracles require database and query generators, which must account for the often significant differences between the SQL dialects of these systems. Since it can take weeks to implement such generators, many DBMS developers are unlikely to invest the time to adopt such automated testing approaches. In short, existing approaches fail to scale to the plethora of DBMSs. In this work, we present both a vision and a platform, SQLancer++, to apply test oracles to any SQL-based DBMS that supports a subset of common SQL features. Our technical core contribution is a novel architecture for an adaptive SQL statement generator. This adaptive SQL generator generates SQL statements with various features, some of which might not be supported by the given DBMS, and then learns through interaction with the DBMS, which of these are understood by the DBMS. Thus, over time, the generator will generate mostly valid SQL statements. We evaluated SQLancer++ across 17 DBMSs and discovered a total of 195 unique, previously unknown bugs, of which 180 were fixed after we reported them. While SQLancer++ is the first major step towards scaling automated DBMS testing, various follow-up challenges remain.

Subjects: Software Engineering , Databases

Publish: 2025-03-27 12:10:36 UTC


#10 The Promise and Pitfalls of WebAssembly: Perspectives from the Industry [PDF] [Copy] [Kimi1] [REL]

Authors: Ningyu He, Shangtong Cao, Haoyu Wang, Yao Guo, Xiapu Luo

As JavaScript has been criticized for performance and security issues in web applications, WebAssembly (Wasm) was proposed in 2017 and is regarded as the complementation for JavaScript. Due to its advantages like compact-size, native-like speed, and portability, Wasm binaries are gradually used as the compilation target for industrial projects in other high-level programming languages and are responsible for computation-intensive tasks in browsers, e.g., 3D graphic rendering and video decoding. Intuitively, characterizing in-the-wild adopted Wasm binaries from different perspectives, like their metadata, relation with source programming language, existence of security threats, and practical purpose, is the prerequisite before delving deeper into the Wasm ecosystem and beneficial to its roadmap selection. However, currently, there is no work that conducts a large-scale measurement study on in-the-wild adopted Wasm binaries. To fill this gap, we collect the largest-ever dataset to the best of our knowledge, and characterize the status quo of them from industry perspectives. According to the different roles of people engaging in the community, i.e., web developers, Wasm maintainers, and researchers, we reorganized our findings to suggestions and best practices for them accordingly. We believe this work can shed light on the future direction of the web and Wasm.

Subject: Software Engineering

Publish: 2025-03-27 08:01:22 UTC


#11 Less Noise, More Signal: DRR for Better Optimizations of SE Tasks [PDF] [Copy] [Kimi1] [REL]

Authors: Andre Lustosa, Tim Menzies

SE analytics problems do not always need complex AI. Better and faster solutions can sometimes be obtained by matching the complexity of the problem to the complexity of the solution. This paper introduces the Dimensionality Reduction Ratio (DRR), a new metric for predicting when lightweight algorithms suffice. Analyzing SE optimization problems from software configuration to process decisions and open-source project health we show that DRR pinpoints "simple" tasks where costly methods like DEHB (a state-of-the-art evolutionary optimizer) are overkill. For high-DRR problems, simpler methods can be just as effective and run two orders of magnitude faster.

Subject: Software Engineering

Publish: 2025-03-27 02:02:06 UTC


#12 Leveraging LLMs, IDEs, and Semantic Embeddings for Automated Move Method Refactoring [PDF1] [Copy] [Kimi1] [REL]

Authors: Fraol Batole, Abhiram Bellur, Malinda Dilhara, Mohammed Raihan Ullah, Yaroslav Zharov, Timofey Bryksin, Kai Ishikawa, Haifeng Chen, Masaharu Morimoto, Shota Motoura, Takeo Hosomi, Tien N. Nguyen, Hridesh Rajan, Nikolaos Tsantalis, Danny Dig

MOVEMETHOD is a hallmark refactoring. Despite a plethora of research tools that recommend which methods to move and where, these recommendations do not align with how expert developers perform MOVEMETHOD. Given the extensive training of Large Language Models and their reliance upon naturalness of code, they should expertly recommend which methods are misplaced in a given class and which classes are better hosts. Our formative study of 2016 LLM recommendations revealed that LLMs give expert suggestions, yet they are unreliable: up to 80% of the suggestions are hallucinations. We introduce the first LLM fully powered assistant for MOVEMETHOD refactoring that automates its whole end-to-end lifecycle, from recommendation to execution. We designed novel solutions that automatically filter LLM hallucinations using static analysis from IDEs and a novel workflow that requires LLMs to be self-consistent, critique, and rank refactoring suggestions. As MOVEMETHOD refactoring requires global, projectlevel reasoning, we solved the limited context size of LLMs by employing refactoring-aware retrieval augment generation (RAG). Our approach, MM-assist, synergistically combines the strengths of the LLM, IDE, static analysis, and semantic relevance. In our thorough, multi-methodology empirical evaluation, we compare MM-assist with the previous state-of-the-art approaches. MM-assist significantly outperforms them: (i) on a benchmark widely used by other researchers, our Recall@1 and Recall@3 show a 1.7x improvement; (ii) on a corpus of 210 recent refactorings from Open-source software, our Recall rates improve by at least 2.4x. Lastly, we conducted a user study with 30 experienced participants who used MM-assist to refactor their own code for one week. They rated 82.8% of MM-assist recommendations positively. This shows that MM-assist is both effective and useful.

Subject: Software Engineering

Publish: 2025-03-26 19:05:20 UTC


#13 StepGrade: Grading Programming Assignments with Context-Aware LLMs [PDF1] [Copy] [Kimi1] [REL]

Authors: Mohammad Akyash, Kimia Zamiri Azar, Hadi Mardani Kamali

Grading programming assignments is a labor-intensive and time-consuming process that demands careful evaluation across multiple dimensions of the code. To overcome these challenges, automated grading systems are leveraged to enhance efficiency and reduce the workload on educators. Traditional automated grading systems often focus solely on correctness, failing to provide interpretable evaluations or actionable feedback for students. This study introduces StepGrade, which explores the use of Chain-of-Thought (CoT) prompting with Large Language Models (LLMs) as an innovative solution to address these challenges. Unlike regular prompting, which offers limited and surface-level outputs, CoT prompting allows the model to reason step-by-step through the interconnected grading criteria, i.e., functionality, code quality, and algorithmic efficiency, ensuring a more comprehensive and transparent evaluation. This interconnectedness necessitates the use of CoT to systematically address each criterion while considering their mutual influence. To empirically validate the efficiency of StepGrade, we conducted a case study involving 30 Python programming assignments across three difficulty levels (easy, intermediate, and advanced). The approach is validated against expert human evaluations to assess its consistency, accuracy, and fairness. Results demonstrate that CoT prompting significantly outperforms regular prompting in both grading quality and interpretability. By reducing the time and effort required for manual grading, this research demonstrates the potential of GPT-4 with CoT prompting to revolutionize programming education through scalable and pedagogically effective automated grading systems.

Subject: Software Engineering

Publish: 2025-03-26 17:36:26 UTC


#14 CodeTool: Enhancing Programmatic Tool Invocation of LLMs via Process Supervision [PDF1] [Copy] [Kimi1] [REL]

Authors: Yifei Lu, Fanghua Ye, Jian Li, Qiang Gao, Cheng Liu, Haibo Luo, Nan Du, Xiaolong Li, Feiliang Ren

Tool invocation significantly enhances the capabilities of Large Language Models (LLMs), yet challenges persist, particularly in complex task scenarios. Current methods, such as instruction-enhanced reasoning and supervised fine-tuning, often result in unnecessarily long reasoning paths and face difficulties in verifying the correctness of intermediate steps. In this paper, we propose CodeTool, a novel framework for stepwise code generation that improves LLM tool invocation by leveraging the concise and easily verifiable nature of code. CodeTool incorporates two distinct process rewards: the On-the-spot Reward, which provides immediate feedback on the accuracy of each tool invocation, and the Latent Reward, which assesses the contribution of each step toward overall task completion. By maximizing the cumulative reward of the On-the-spot and Latend Rewards at each step, LLMs are guided to follow efficient and accurate reasoning paths. Extensive experiments on StableToolBench and RestBench-TMDB demonstrate the superiority of CodeTool over existing approaches.

Subject: Software Engineering

Publish: 2025-03-26 13:19:01 UTC


#15 A Measure Based Generalizable Approach to Understandability [PDF] [Copy] [Kimi2] [REL]

Authors: Vikas Kushwaha, Sruti Srinivasa Ragavan, Subhajit Roy

Successful agent-human partnerships require that any agent generated information is understandable to the human, and that the human can easily steer the agent towards a goal. Such effective communication requires the agent to develop a finer-level notion of what is understandable to the human. State-of-the-art agents, including LLMs, lack this detailed notion of understandability because they only capture average human sensibilities from the training data, and therefore afford limited steerability (e.g., requiring non-trivial prompt engineering). In this paper, instead of only relying on data, we argue for developing generalizable, domain-agnostic measures of understandability that can be used as directives for these agents. Existing research on understandability measures is fragmented, we survey various such efforts across domains, and lay a cognitive-science-rooted groundwork for more coherent and domain-agnostic research investigations in future.

Subjects: Human-Computer Interaction , Artificial Intelligence , Software Engineering

Publish: 2025-03-27 15:36:49 UTC


#16 debug-gym: A Text-Based Environment for Interactive Debugging [PDF3] [Copy] [Kimi3] [REL]

Authors: Xingdi Yuan, Morgane M Moss, Charbel El Feghali, Chinmay Singh, Darya Moldavskaya, Drew MacPhee, Lucas Caccia, Matheus Pereira, Minseon Kim, Alessandro Sordoni, Marc-Alexandre Côté

Large Language Models (LLMs) are increasingly relied upon for coding tasks, yet in most scenarios it is assumed that all relevant information can be either accessed in context or matches their training data. We posit that LLMs can benefit from the ability to interactively explore a codebase to gather the information relevant to their task. To achieve this, we present a textual environment, namely debug-gym, for developing LLM-based agents in an interactive coding setting. Our environment is lightweight and provides a preset of useful tools, such as a Python debugger (pdb), designed to facilitate an LLM-based agent's interactive debugging. Beyond coding and debugging tasks, this approach can be generalized to other tasks that would benefit from information-seeking behavior by an LLM agent.

Subjects: Artificial Intelligence , Computation and Language , Programming Languages , Software Engineering

Publish: 2025-03-27 14:43:28 UTC


#17 A Data-Driven Method for INS/DVL Alignment [PDF] [Copy] [Kimi1] [REL]

Authors: Guy Damari, Itzik Klein

Autonomous underwater vehicles (AUVs) are sophisticated robotic platforms crucial for a wide range of applications. The accuracy of AUV navigation systems is critical to their success. Inertial sensors and Doppler velocity logs (DVL) fusion is a promising solution for long-range underwater navigation. However, the effectiveness of this fusion depends heavily on an accurate alignment between the inertial sensors and the DVL. While current alignment methods show promise, there remains significant room for improvement in terms of accuracy, convergence time, and alignment trajectory efficiency. In this research we propose an end-to-end deep learning framework for the alignment process. By leveraging deep-learning capabilities, such as noise reduction and capture of nonlinearities in the data, we show using simulative data, that our proposed approach enhances both alignment accuracy and reduces convergence time beyond current model-based methods.

Subjects: Robotics , Software Engineering

Publish: 2025-03-27 10:38:37 UTC


#18 How to Secure Existing C and C++ Software without Memory Safety [PDF] [Copy] [Kimi1] [REL]

Author: Úlfar Erlingsson

The most important security benefit of software memory safety is easy to state: for C and C++ software, attackers can exploit most bugs and vulnerabilities to gain full, unfettered control of software behavior, whereas this is not true for most bugs in memory-safe software. Fortunately, this security benefit -- most bugs don't give attackers full control -- can be had for unmodified C/C++ software, without per-application effort. This doesn't require trying to establish memory safety; instead, it is sufficient to eliminate most of the combinatorial ways in which software with corrupted memory can execute. To eliminate these interleavings, there already exist practical compiler and runtime mechanisms that incur little overhead and need no special hardware or platform support. Each of the mechanisms described here is already in production use, at scale, on one or more platforms. By supporting their combined use in development toolchains, the security of all C and C++ software against remote code execution attacks can be rapidly, and dramatically, improved.

Subjects: Cryptography and Security , Software Engineering

Publish: 2025-03-27 04:20:47 UTC