Hardware Architecture

2026-01-21 | | Total: 17

#1 The Quest for Reliable AI Accelerators: Cross-Layer Evaluation and Design Optimization [PDF] [Copy] [Kimi] [REL]

Authors: Meng Li, Tong Xie, Zuodong Zhang, Runsheng Wang

As the CMOS technology pushes to the nanoscale, aging effects and process variations have become increasingly pronounced, posing significant reliability challenges for AI accelerators. Traditional guardband-based design approaches, which rely on pessimistic timing margin, sacrifice significant performance and computational efficiency, rendering them inadequate for high-performance AI computing demands. Current reliability-aware AI accelerator design faces two core challenges: (1) the lack of systematic cross-layer analysis tools to capture coupling reliability effects across device, circuit, architecture, and application layers; and (2) the fundamental trade-off between conventional reliability optimization and computational efficiency. To address these challenges, this paper systematically presents a series of reliability-aware accelerator designs, encompassing (1) aging and variation-aware dynamic timing analyzer, (2) accelerator dataflow optimization using critical input pattern reduction, and (3) resilience characterization and novel architecture design for large language models (LLMs). By tightly integrating cross-layer reliability modeling and AI workload characteristics, these co-optimization approaches effectively achieve reliable and efficient AI acceleration.

Subject: Hardware Architecture

Publish: 2026-01-20 16:50:56 UTC


#2 CREATE: Cross-Layer Resilience Characterization and Optimization for Efficient yet Reliable Embodied AI Systems [PDF] [Copy] [Kimi] [REL]

Authors: Tong Xie, Yijiahao Qi, Jinqi Wen, Zishen Wan, Yanchi Dong, Zihao Wang, Shaofei Cai, Yitao Liang, Tianyu Jia, Yuan Wang, Runsheng Wang, Meng Li

Embodied Artificial Intelligence (AI) has recently attracted significant attention as it bridges AI with the physical world. Modern embodied AI systems often combine a Large Language Model (LLM)-based planner for high-level task planning and a reinforcement learning (RL)-based controller for low-level action generation, enabling embodied agents to tackle complex tasks in real-world environments. However, deploying embodied agents remains challenging due to their high computation requirements, especially for battery-powered local devices. Although techniques like lowering operating voltage can improve energy efficiency, they can introduce bit errors and result in task failures. In this work, we propose CREATE, a general design principle that leverages heterogeneous resilience at different layers for synergistic energy-reliability co-optimization. For the first time, we conduct a comprehensive error injection study on modern embodied AI systems and observe an inherent but heterogeneous fault tolerance. Building upon these insights, we develop an anomaly detection and clearance mechanism at the circuit level to eliminate outlier errors. At the model level, we propose a weight-rotation-enhanced planning algorithm to improve the fault tolerance of the LLM-based planner. Furthermore, we introduce an application-level technique, autonomy-adaptive voltage scaling, to dynamically adjust the operating voltage of the controllers. The voltage scaling circuit is co-designed to enable online voltage adjustment. Extensive experiments demonstrate that without compromising task quality, CREATE achieves 40.6% computational energy savings on average over nominal-voltage baselines and 35.0% over prior-art techniques. This further leads to 29.5% to 37.3% chip-level energy savings and approximately a 15% to 30% improvement in battery life.

Subject: Hardware Architecture

Publish: 2026-01-20 16:40:09 UTC


#3 '1'-bit Count-based Sorting Unit to Reduce Link Power in DNN Accelerators [PDF] [Copy] [Kimi] [REL]

Authors: Ruichi Han, Yizhi Chen, Tong Lei, Jordi Altayo Gonzalez, Ahmed Hemani

Interconnect power consumption remains a bottleneck in Deep Neural Network (DNN) accelerators. While ordering data based on '1'-bit counts can mitigate this via reduced switching activity, practical hardware sorting implementations remain underexplored. This work proposes the hardware implementation of a comparison-free sorting unit optimized for Convolutional Neural Networks (CNN). By leveraging approximate computing to group population counts into coarse-grained buckets, our design achieves hardware area reductions while preserving the link power benefits of data reordering. Our approximate sorting unit achieves up to 35.4% area reduction while maintaining 19.50\% BT reduction compared to 20.42% of precise implementation.

Subjects: Hardware Architecture , Artificial Intelligence , Machine Learning

Publish: 2026-01-20 15:47:36 UTC


#4 From RTL to Prompt Coding: Empowering the Next Generation of Chip Designers through LLMs [PDF] [Copy] [Kimi] [REL]

Authors: Lukas Krupp, Matthew Venn, Norbert Wehn

This paper presents an LLM-based learning platform for chip design education, aiming to make chip design accessible to beginners without overwhelming them with technical complexity. It represents the first educational platform that assists learners holistically across both frontend and backend design. The proposed approach integrates an LLM-based chat agent into a browser-based workflow built upon the Tiny Tapeout ecosystem. The workflow guides users from an initial design idea through RTL code generation to a tapeout-ready chip. To evaluate the concept, a case study was conducted with 18 high-school students. Within a 90-minute session they developed eight functional VGA chip designs in a 130 nm technology. Despite having no prior experience in chip design, all groups successfully implemented tapeout-ready projects. The results demonstrate the feasibility and educational impact of LLM-assisted chip design, highlighting its potential to attract and inspire early learners and significantly broaden the target audience for the field.

Subject: Hardware Architecture

Publish: 2026-01-20 10:16:45 UTC


#5 The Non-Predictability of Mispredicted Branches using Timing Information [PDF] [Copy] [Kimi] [REL]

Authors: Ioannis Constantinou, Arthur Perais, Yiannakis Sazeides

Branch misprediction latency is one of the most important contributors to performance degradation and wasted energy consumption in a modern core. State-of-the-art predictors generally perform very well but occasionally suffer from high Misprediction Per Kilo Instruction due to hard-to-predict branches. In this work, we investigate if predicting branches using microarchitectural information, in addition to traditional branch history, can improve prediction accuracy. Our approach considers branch timing information (resolution cycle) both for older branches in the Reorder Buffer (ROB) and recently committed, and for younger branches relative to the branch we re-predict. We propose Speculative Branch Resolution (SBR) in which, N cycles after a branch allocates in the ROB, various timing information is collected and used to re-predict. Using the gem5 simulator we implement and perform a limit-study of SBR using a TAGE-Like predictor. Our experiments show that the post-alloc timing information we used was not able to yield performance gains over an unbounded TAGE-SC. However, we find two hard to predict branches where timing information did provide an advantage and thoroughly analysed one of them to understand why. This finding suggests that predictors may benefit from specific microarchitectural information to increase accuracy on specific hard to predict branches and that overriding predictions in the backend may yet yield performance benefits, but that further research is needed to determine such information vectors.

Subject: Hardware Architecture

Publish: 2026-01-20 10:03:20 UTC


#6 PRIMAL: Processing-In-Memory Based Low-Rank Adaptation for LLM Inference Accelerator [PDF] [Copy] [Kimi] [REL]

Authors: Yue Jiet Chong, Yimin Wang, Zhen Wu, Xuanyao Fong

This paper presents PRIMAL, a processing-in-memory (PIM) based large language model (LLM) inference accelerator with low-rank adaptation (LoRA). PRIMAL integrates heterogeneous PIM processing elements (PEs), interconnected by 2D-mesh inter-PE computational network (IPCN). A novel SRAM reprogramming and power gating (SRPG) scheme enables pipelined LoRA updates and sub-linear power scaling by overlapping reconfiguration with computation and gating idle resources. PRIMAL employs optimized spatial mapping and dataflow orchestration to minimize communication overhead, and achieves $1.5\times$ throughput and $25\times$ energy efficiency over NVIDIA H100 with LoRA rank 8 (Q,V) on Llama-13B.

Subject: Hardware Architecture

Publish: 2026-01-20 05:52:32 UTC


#7 Best Practices for Large Load Interconnections: A North American Perspective on Data Centers [PDF] [Copy] [Kimi] [REL]

Authors: Rafi Zahedi, Amin Zamani, Rahul Anilkumar

Large loads are expanding rapidly across North America, led by data centers, cryptocurrency mining, hydrogen production facilities, and heavy-duty charging stations. Each class presents distinct electrical characteristics, but data centers are drawing particular attention as AI deployment drives unprecedented capacity growth. Their scale, duty cycles, and converter-dominated interfaces introduce new challenges for transmission interconnections, especially regarding disturbance behavior, steady-state performance, and operational visibility. This paper reviews best practices for large-load interconnections across North America, synthesizing utility and system operator guidelines into a coherent set of technical requirements. The approach combines handbook and manual analysis with cross-utility comparisons and an outlook on European directions. The review highlights requirements on power quality, telemetry, commissioning tests, and protection coordination, while noting gaps in ride-through specifications, load-variation management, and post-disturbance recovery targets. Building on these findings, the paper proposes practical guidance for developers and utilities.

Subject: Hardware Architecture

Publish: 2026-01-19 03:15:23 UTC


#8 CD-PIM: A High-Bandwidth and Compute-Efficient LPDDR5-Based PIM for Low-Batch LLM Acceleration on Edge-Device [PDF] [Copy] [Kimi] [REL]

Authors: Ye Lin, Chao Fang, Xiaoyong Song, Qi Wu, Anying Jiang, Yichuan Bai, Li Du

Edge deployment of low-batch large language models (LLMs) faces critical memory bandwidth bottlenecks when executing memory-intensive general matrix-vector multiplications (GEMV) operations. While digital processing-in-memory (PIM) architectures promise to accelerate GEMV operations, existing PIM-equipped edge devices still suffer from three key limitations: limited bandwidth improvement, component under-utilization in mixed workloads, and low compute capacity of computing units (CUs). In this paper, we propose CD-PIM to address these challenges through three key innovations. First, we introduce a high-bandwidth compute-efficient mode (HBCEM) that enhances bandwidth by dividing each bank into four pseudo-banks through segmented global bitlines. Second, we propose a low-batch interleaving mode (LBIM) to improve component utilization by overlapping GEMV operations with GEMM operations. Third, we design a compute-efficient CU that performs enhanced GEMV operations in a pipelined manner by serially feeding weight data into the computing core. Forth, we adopt a column-wise mapping for the key-cache matrix and row-wise mapping for the value-cache matrix, which fully utilizes CU resources. Our evaluation shows that compared to a GPU-only baseline and state-of-the-art PIM designs, our CD-PIM achieves 11.42x and 4.25x speedup on average within a single batch in HBCEM mode, respectively. Moreover, for low-batch sizes, the CD-PIM achieves an average speedup of 1.12x in LBIM compared to HBCEM.

Subject: Hardware Architecture

Publish: 2026-01-18 07:58:23 UTC


#9 Biological Intuition on Digital Hardware: An RTL Implementation of Poisson-Encoded SNNs for Static Image Classification [PDF] [Copy] [Kimi] [REL]

Authors: Debabrata Das, Yogeeth G. K., Arnav Gupta

The deployment of Artificial Intelligence on edge devices (TinyML) is often constrained by the high power consumption and latency associated with traditional Artificial Neural Networks (ANNs) and their reliance on intensive Matrix-Multiply (MAC) operations. Neuromorphic computing offers a compelling alternative by mimicking biological efficiency through event-driven processing. This paper presents the design and implementation of a cycle-accurate, hardware-oriented Spiking Neural Network (SNN) core implemented in SystemVerilog. Unlike conventional accelerators, this design utilizes a Leaky Integrate-and-Fire (LIF) neuron model powered by fixed-point arithmetic and bit-wise primitives (shifts and additions) to eliminate the need for complex floating-point hardware. The architecture features an on-chip Poisson encoder for stochastic spike generation and a novel active pruning mechanism that dynamically disables neurons post-classification to minimize dynamic power consumption. We demonstrate the hardware's efficacy through a fully connected layer implementation targeting digit classification. Simulation results indicate that the design achieves rapid convergence (89% accuracy) within limited timesteps while maintaining a significantly reduced computational footprint compared to traditional dense architectures. This work serves as a foundational building block for scalable, energy-efficient neuromorphic hardware on FPGA and ASIC platforms.

Subject: Hardware Architecture

Publish: 2026-01-17 20:08:23 UTC


#10 Domain-specific Hardware Acceleration for Model Predictive Path Integral Control [PDF] [Copy] [Kimi] [REL]

Authors: Erwan Tanguy-Legac, Tommaso Belvedere, Gianluca Corsini, Marco Tognon, Marcello Traiola

Accurately controlling a robotic system in real time is a challenging problem. To address this, the robotics community has adopted various algorithms, such as Model Predictive Control (MPC) and Model Predictive Path Integral (MPPI) control. The first is difficult to implement on non-linear systems such as unmanned aerial vehicles, whilst the second requires a heavy computational load. GPUs have been successfully used to accelerate MPPI implementations; however, their power consumption is often excessive for autonomous or unmanned targets, especially when battery-powered. On the other hand, custom designs, often implemented on FPGAs, have been proposed to accelerate robotic algorithms while consuming considerably less energy than their GPU (or CPU) implementation. However, no MPPI custom accelerator has been proposed so far. In this work, we present a hardware accelerator for MPPI control and simulate its execution. Results show that the MPPI custom accelerator allows more accurate trajectories than GPU-based MPPI implementations.

Subjects: Hardware Architecture , Robotics

Publish: 2026-01-17 15:44:52 UTC


#11 NuRedact: Non-Uniform eFPGA Architecture for Low-Overhead and Secure IP Redaction [PDF] [Copy] [Kimi] [REL]

Authors: Voktho Das, Kimia Azar, Hadi Kamali

While logic locking has been extensively studied as a countermeasure against integrated circuit (IC) supply chain threats, recent research has shifted toward reconfigurable-based redaction techniques, e.g., LUT- and eFPGA-based schemes. While these approaches raise the bar against attacks, they incur substantial overhead, much of which arises not from genuine functional reconfigurability need, but from artificial complexity intended solely to frustrate reverse engineering (RE). As a result, fabrics are often underutilized, and security is achieved at disproportionate cost. This paper introduces NuRedact, the first full-custom eFPGA redaction framework that embraces architectural non-uniformity to balance security and efficiency. Built as an extension of the widely adopted OpenFPGA infrastructure, NuRedact introduces a three-stage methodology: (i) custom fabric generation with pin-mapping irregularity, (ii) VPR-level modifications to enable non-uniform placement guided by an automated Python-based optimizer, and (iii) redaction-aware reconfiguration and mapping of target IP modules. Experimental results show up to 9x area reduction compared to conventional uniform fabrics, achieving competitive efficiency with LUT-based and even transistor-level redaction techniques while retaining strong resilience. From a security perspective, NuRedact fabrics are evaluated against state-of-the-art attack models, including SAT-based, cyclic, and sequential variants, and show enhanced resilience while maintaining practical design overheads.

Subjects: Hardware Architecture , Cryptography and Security

Publish: 2026-01-16 20:55:30 UTC


#12 Multi-Partner Project: Multi-GPU Performance Portability Analysis for CFD Simulations at Scale [PDF] [Copy] [Kimi] [REL]

Authors: Panagiotis-Eleftherios Eleftherakis, George Anagnostopoulos, Anastassis Kapetanakis, Mohammad Umair, Jean-Yves Vet, Konstantinos Iliakis, Jonathan Vincent, Jing Gong, Akshay Patil, Clara García-Sánchez, Gerardo Zampino, Ricardo Vinuesa, Sotirios Xydis

As heterogeneous supercomputing architectures leveraging GPUs become increasingly central to high-performance computing (HPC), it is crucial for computational fluid dynamics (CFD) simulations, a de-facto HPC workload, to efficiently utilize such hardware. One of the key challenges of HPC codes is performance portability, i.e. the ability to maintain near-optimal performance across different accelerators. In the context of the \textbf{REFMAP} project, which targets scalable, GPU-enabled multi-fidelity CFD for urban airflow prediction, this paper analyzes the performance portability of SOD2D, a state-of-the-art Spectral Elements simulation framework across AMD and NVIDIA GPU architectures. We first discuss the physical and numerical models underlying SOD2D, highlighting its computational hotspots. Then, we examine its performance and scalability in a multi-level manner, i.e. defining and characterizing an extensive full-stack design space spanning across application, software and hardware infrastructure related parameters. Single-GPU performance characterization across server-grade NVIDIA and AMD GPU architectures and vendor-specific compiler stacks, show the potential as well as the diverse effect of memory access optimizations, i.e. 0.69$\times$ - 3.91$\times$ deviations in acceleration speedup. Performance variability of SOD2D at scale is further examined on the LUMI multi-GPU cluster, where profiling reveals similar throughput variations, highlighting the limits of performance projections and the need for multi-level, informed tuning.

Subjects: Distributed, Parallel, and Cluster Computing , Hardware Architecture

Publish: 2026-01-20 17:10:00 UTC


#13 Differentiable Logic Synthesis: Spectral Coefficient Selection via Sinkhorn-Constrained Composition [PDF] [Copy] [Kimi] [REL]

Author: Gorgi Pavlov

Learning precise Boolean logic via gradient descent remains challenging: neural networks typically converge to "fuzzy" approximations that degrade under quantization. We introduce Hierarchical Spectral Composition, a differentiable architecture that selects spectral coefficients from a frozen Boolean Fourier basis and composes them via Sinkhorn-constrained routing with column-sign modulation. Our approach draws on recent insights from Manifold-Constrained Hyper-Connections (mHC), which demonstrated that projecting routing matrices onto the Birkhoff polytope preserves identity mappings and stabilizes large-scale training. We adapt this framework to logic synthesis, adding column-sign modulation to enable Boolean negation -- a capability absent in standard doubly stochastic routing. We validate our approach across four phases of increasing complexity: (1) For n=2 (16 Boolean operations over 4-dim basis), gradient descent achieves 100% accuracy with zero routing drift and zero-loss quantization to ternary masks. (2) For n=3 (10 three-variable operations), gradient descent achieves 76% accuracy, but exhaustive enumeration over 3^8 = 6561 configurations proves that optimal ternary masks exist for all operations (100% accuracy, 39% sparsity). (3) For n=4 (10 four-variable operations over 16-dim basis), spectral synthesis -- combining exact Walsh-Hadamard coefficients, ternary quantization, and MCMC refinement with parallel tempering -- achieves 100% accuracy on all operations. This progression establishes (a) that ternary polynomial threshold representations exist for all tested functions, and (b) that finding them requires methods beyond pure gradient descent as dimensionality grows. All operations enable single-cycle combinational logic inference at 10,959 MOps/s on GPU, demonstrating viability for hardware-efficient neuro-symbolic logic synthesis.

Subjects: Machine Learning , Hardware Architecture , Logic in Computer Science

Publish: 2026-01-20 13:26:52 UTC


#14 Secure Multi-Path Routing with All-or-Nothing Transform for Network-on-Chip Architectures [PDF] [Copy] [Kimi] [REL]

Authors: Hansika Weerasena, Matthew Randall, Prabhat Mishra

Ensuring Network-on-Chip (NoC) security is crucial to design trustworthy NoC-based System-on-Chip (SoC) architectures. While there are various threats that exploit on-chip communication vulnerabilities, eavesdropping attacks via malicious nodes are among the most common and stealthy. Although encryption can secure packets for confidentiality, it may introduce unacceptable overhead for resource-constrained SoCs. In this paper, we propose a lightweight confidentiality-preserving framework that utilizes a quasi-group based All-Or-Nothing Transform (AONT) combined with secure multi-path routing in NoC-based SoCs. By applying AONT to each packet and distributing its transformed blocks across multiple non-overlapping routes, we ensure that no intermediate router can reconstruct the original data without all blocks. Extensive experimental evaluation demonstrates that our method effectively mitigates eavesdropping attacks by malicious routers with negligible area and performance overhead. Our results also reveal that AONT-based multi-path routing can provide 7.3x reduction in overhead compared to traditional encryption for securing against eavesdropping attacks.

Subjects: Cryptography and Security , Hardware Architecture , Networking and Internet Architecture

Publish: 2026-01-20 05:25:30 UTC


#15 CPU-less parallel execution of lambda calculus in digital logic [PDF] [Copy] [Kimi] [REL]

Authors: Harry Fitchett, Charles Fox

While transistor density is still increasing, clock speeds are not, motivating the search for new parallel architectures. One approach is to completely abandon the concept of CPU -- and thus serial imperative programming -- and instead to specify and execute tasks in parallel, compiling from programming languages to data flow digital logic. It is well-known that pure functional languages are inherently parallel, due to the Church-Rosser theorem, and CPU-based parallel compilers exist for many functional languages. However, these still rely on conventional CPUs and their von Neumann bottlenecks. An alternative is to compile functional languages directly into digital logic to maximize available parallelism. It is difficult to work with complete modern functional languages due to their many features, so we demonstrate a proof-of-concept system using lambda calculus as the source language and compiling to digital logic. We show how functional hardware can be tailored to a simplistic functional language, forming the ground for a new model of CPU-less functional computation. At the algorithmic level, we use a tree-based representation, with data localized within nodes and communicated data passed between them. This is implemented by physical digital logic blocks corresponding to nodes, and buses enabling message passing. Node types and behaviors correspond to lambda grammar forms, and beta-reductions are performed in parallel allowing branches independent from one another to perform transformations simultaneously. As evidence for this approach, we present an implementation, along with simulation results, showcasing successful execution of lambda expressions. This suggests that the approach could be scaled to larger functional languages. Successful execution of a test suite of lambda expressions suggests that the approach could be scaled to larger functional languages.

Subjects: Distributed, Parallel, and Cluster Computing , Hardware Architecture , Programming Languages

Publish: 2026-01-19 13:22:13 UTC


#16 Speaking to Silicon: Neural Communication with Bitcoin Mining ASICs [PDF] [Copy] [Kimi] [REL]

Authors: Francisco Angulo de Lafuente, Vladimir Veselov, Richard Goodman

This definitive research memoria presents a comprehensive, mathematically verified paradigm for neural communication with Bitcoin mining Application-Specific Integrated Circuits (ASICs), integrating five complementary frameworks: thermodynamic reservoir computing, hierarchical number system theory, algorithmic analysis, network latency optimization, and machine-checked mathematical formalization. We establish that obsolete cryptocurrency mining hardware exhibits emergent computational properties enabling bidirectional information exchange between AI systems and silicon substrates. The research program demonstrates: (1) reservoir computing with NARMA-10 Normalized Root Mean Square Error (NRMSE) of 0.8661; (2) the Thermodynamic Probability Filter (TPF) achieving 92.19% theoretical energy reduction; (3) the Virtual Block Manager achieving +25% effective hashrate; and (4) hardware universality across multiple ASIC families including Antminer S9, Lucky Miner LV06, and Goldshell LB-Box. A significant contribution is the machine-checked mathematical formalization using Lean 4 and Mathlib, providing unambiguous definitions, machine-verified theorems, and reviewer-proof claims. Key theorems proven include: independence implies zero leakage, predictor beats baseline implies non-independence (the logical core of TPF), energy savings theoretical maximum, and Physical Unclonable Function (PUF) distinguishability witnesses. Vladimir Veselov's hierarchical number system theory explains why early-round information contains predictive power. This work establishes a new paradigm: treating ASICs not as passive computational substrates but as active conversational partners whose thermodynamic state encodes exploitable computational information.

Subjects: Neural and Evolutionary Computing , Hardware Architecture , Cryptography and Security , Machine Learning

Publish: 2026-01-17 12:20:22 UTC


#17 SimFuzz: Similarity-guided Block-level Mutation for RISC-V Processor Fuzzing [PDF] [Copy] [Kimi] [REL]

Authors: Hao Lyu, Jingzheng Wu, Xiang Ling, Yicheng Zhong, Zhiyuan Li, Tianyue Luo

The Instruction Set Architecture (ISA) defines processor operations and serves as the interface between hardware and software. As an open ISA, RISC-V lowers the barriers to processor design and encourages widespread adoption, but also exposes processors to security risks such as functional bugs. Processor fuzzing is a powerful technique for automatically detecting these bugs. However, existing fuzzing methods suffer from two main limitations. First, their emphasis on redundant test case generation causes them to overlook cross-processor corner cases. Second, they rely too heavily on coverage guidance. Current coverage metrics are biased and inefficient, and become ineffective once coverage growth plateaus. To overcome these limitations, we propose SimFuzz, a fuzzing framework that constructs a high-quality seed corpus from historical bug-triggering inputs and employs similarity-guided, block-level mutation to efficiently explore the processor input space. By introducing instruction similarity, SimFuzz expands the input space around seeds while preserving control-flow structure, enabling deeper exploration without relying on coverage feedback. We evaluate SimFuzz on three widely used open-source RISC-V processors: Rocket, BOOM, and XiangShan, and discover 17 bugs in total, including 14 previously unknown issues, 7 of which have been assigned CVE identifiers. These bugs affect the decode and memory units, cause instruction and data errors, and can lead to kernel instability or system crashes. Experimental results show that SimFuzz achieves up to 73.22% multiplexer coverage on the high-quality seed corpus. Our findings highlight critical security bugs in mainstream RISC-V processors and offer actionable insights for improving functional verification.

Subjects: Cryptography and Security , Hardware Architecture

Publish: 2026-01-17 00:11:24 UTC