STIMSMITH

Bug Detection

Concept WIKI v1 · 6/6/2026

In hardware implementation validation using tandem simulation, bug detection is the process of identifying design errors by cross-level comparison of architectural variables between a high-level instruction-level abstraction (ILA) model and a low-level RTL model. Instruction-by-instruction checking often detects a bug well before a traditional run-to-the-end conformance test would finish.

Overview

Bug detection in hardware verification refers to the identification of discrepancies between a high-level design specification and its low-level implementation. In the tandem simulation methodology described by Xing, Gupta, and Malik (ASP-DAC 2022), bug detection is realized by comparing the architectural variables produced by an instruction-level execution model (ILEM) against those produced by an RTL-based execution model (RTEM). Any deviation between the two views is treated as a potential bug, which can then be localized by examining the nearby instructions.

Mechanism in Tandem Simulation

Tandem simulation combines the ILEM (derived from the Instruction Set Architecture or, more generally, an Instruction-Level Abstraction) and the RTEM into a cross-level execution model (CLEM). At the end of each instruction, an AV-Check compares the instruction-level architectural variables (ILAVs) with the corresponding RTL architectural variables (RTAVs). When these disagree, the deviation is flagged as a potential bug. The AV-Check can also be invoked at chosen checkpoints (intervals of multiple instructions) to reduce per-instruction comparison overhead, and an AV-Swap operation can transfer ILAV values into the RTAVs to jump-start the RTEM after a warm-up phase.
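The checking loop described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' tooling: the function names (`av_check`, `av_swap`, `tandem_run`), the dictionary representation of architectural variables, the toy instruction format, and the injected XOR bug are all assumptions made for the example. In a real flow the RTEM step would advance an RTL simulator over as many clock cycles as the instruction takes.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

State = Dict[str, int]        # architectural variable name -> value (assumed layout)
Instr = Tuple[str, str, int]  # toy instruction: (opcode, register, immediate)
Diff = Dict[str, Tuple[int, Optional[int]]]


def av_check(ilavs: State, rtavs: State) -> Diff:
    """Return {name: (ILAV value, RTAV value)} for every variable that disagrees."""
    return {
        name: (value, rtavs.get(name))
        for name, value in ilavs.items()
        if rtavs.get(name) != value
    }


def av_swap(ilavs: State, rtavs: State) -> None:
    """Overwrite the RTAVs with the ILAV values, e.g. after an ILEM-only warm-up."""
    rtavs.update(ilavs)


def tandem_run(
    program: Iterable[Instr],
    ilem_step: Callable[[State, Instr], None],
    rtem_step: Callable[[State, Instr], None],
    ilavs: State,
    rtavs: State,
    check_every: int = 1,
) -> Optional[Tuple[int, Diff]]:
    """Step both models together; return (instruction index, mismatches) or None."""
    count = 0
    for count, instr in enumerate(program, start=1):
        ilem_step(ilavs, instr)
        rtem_step(rtavs, instr)
        if count % check_every == 0:
            diff = av_check(ilavs, rtavs)
            if diff:
                return count, diff
    if count % check_every != 0:  # final check when the test ends between checkpoints
        diff = av_check(ilavs, rtavs)
        if diff:
            return count, diff
    return None


def ilem_step(state: State, instr: Instr) -> None:
    """Toy instruction-level model: 8-bit register add."""
    _, reg, imm = instr
    state[reg] = (state[reg] + imm) & 0xFF


def buggy_rtem_step(state: State, instr: Instr) -> None:
    """Toy stand-in for the RTL model with a hypothetical injected bug on r1."""
    _, reg, imm = instr
    if reg == "r1":
        state[reg] = (state[reg] ^ imm) & 0xFF  # bug: XOR instead of add
    else:
        state[reg] = (state[reg] + imm) & 0xFF


program = [("add", "r0", 1), ("add", "r1", 1), ("add", "r1", 1), ("add", "r0", 2)]
per_instruction = tandem_run(
    program, ilem_step, buggy_rtem_step, {"r0": 0, "r1": 0}, {"r0": 0, "r1": 0}
)
checkpointed = tandem_run(
    program, ilem_step, buggy_rtem_step, {"r0": 0, "r1": 0}, {"r0": 0, "r1": 0},
    check_every=2,
)
```

In this toy run the per-instruction check reports the deviation on `r1` at instruction 3, while checking only every second instruction reports the same deviation at instruction 4. That is the trade-off between comparison overhead and detection latency that checkpointing introduces.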

Because comparison occurs at instruction boundaries rather than at the end of a full simulation trace, the technique falls into the category of instruction-by-instruction bug detection, in contrast to run-to-the-end conformance testing where the ILEM and RTEM are only compared after the complete test has executed.

Bug Categories Studied

The authors evaluated bug detection on three categories of artificially inserted bugs:

  • Condition bug — modifies a value or condition inside an if-then-else or case statement (the canonical example is the AES-round condition bug identified in the case-study designs).
  • Data bug — changes a value used in a computation.
  • Expression bug — changes a logic operator, e.g., replacing an AND/OR with an XOR.

Each bug was inserted at a randomly chosen location among the tens or hundreds of candidates available in the design, producing three buggy variants per case study.
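To make the three categories concrete, the sketch below applies one mutation of each kind to a small, made-up datapath function. The function `reference`, its inputs, and the specific mutations are hypothetical and are not taken from the case-study designs; they only illustrate what each category changes.

```python
from typing import Callable, List, Tuple

Inputs = Tuple[int, int, bool]


def reference(a: int, b: int, sel: bool) -> int:
    """Hypothetical golden behaviour: AND the operands when sel is set, else a + 1."""
    if sel:
        return (a & b) & 0xFF
    return (a + 1) & 0xFF


def condition_bug(a: int, b: int, sel: bool) -> int:
    if not sel:  # bug: branch condition inverted
        return (a & b) & 0xFF
    return (a + 1) & 0xFF


def data_bug(a: int, b: int, sel: bool) -> int:
    if sel:
        return (a & b) & 0xFF
    return (a + 2) & 0xFF  # bug: constant 1 replaced by 2


def expression_bug(a: int, b: int, sel: bool) -> int:
    if sel:
        return (a ^ b) & 0xFF  # bug: AND replaced by XOR
    return (a + 1) & 0xFF


def exposing_inputs(variant: Callable[..., int], stimuli: List[Inputs]) -> List[Inputs]:
    """Return the stimuli on which the buggy variant differs from the reference."""
    return [x for x in stimuli if variant(*x) != reference(*x)]


stimuli = [(3, 5, True), (3, 5, False), (0, 0, True)]
exposed = {
    "condition": exposing_inputs(condition_bug, stimuli),
    "data": exposing_inputs(data_bug, stimuli),
    "expression": exposing_inputs(expression_bug, stimuli),
}
```

With these stimuli, the condition bug is exposed by all three inputs, the data bug only by `(3, 5, False)`, and the expression bug only by `(3, 5, True)`. A bug becomes visible only when the test exercises the mutated code with values that make the results differ.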

Bug Detection Time Improvement

When comparing tandem simulation (instruction-by-instruction AV-Check) against traditional conformance testing (run-to-end comparison) on the same buggy variants:

  • Tandem simulation often detects the bug before the test would have finished under conformance testing.
  • In many cases the bug is found in less than 10% of the full test time, and in most cases in less than 40%.
  • An outlier is a data bug in the FlexNLP design, where the buggy data is only used in a very late stage of the test program, delaying detection.
  • The absolute simulation times for the run-to-end strategy on the studied designs range from roughly 1–15 seconds across design variants.
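The percentages above express the detection point as a fraction of the full test. The sketch below shows one way to compute such a fraction from two recorded traces of architectural-variable snapshots. It assumes one snapshot per instruction and uses instruction count as a stand-in for simulation time, which is a simplification. The helper names and the trace data are hypothetical.

```python
from typing import Dict, Optional, Sequence

Snapshot = Dict[str, int]  # architectural variables after one instruction


def first_divergence(
    il_trace: Sequence[Snapshot], rt_trace: Sequence[Snapshot]
) -> Optional[int]:
    """1-based index of the first instruction whose snapshots differ, or None."""
    for index, (il, rt) in enumerate(zip(il_trace, rt_trace), start=1):
        if il != rt:
            return index
    return None


def detection_fraction(
    il_trace: Sequence[Snapshot], rt_trace: Sequence[Snapshot]
) -> Optional[float]:
    """Fraction of the test executed when instruction-by-instruction checking fires."""
    index = first_divergence(il_trace, rt_trace)
    if index is None:
        return None
    return index / len(il_trace)


# Hypothetical 20-instruction test on a single accumulator variable.
il_trace = [{"acc": i} for i in range(1, 21)]
early_bug = [{"acc": i if i < 3 else i + 1} for i in range(1, 21)]   # diverges at 3
late_bug = [{"acc": i if i < 19 else i + 1} for i in range(1, 21)]   # diverges at 19

early_fraction = detection_fraction(il_trace, early_bug)
late_fraction = detection_fraction(il_trace, late_bug)
```

In this made-up example the early bug is caught after 15% of the test and the late one only after 95%. A run-to-the-end comparison would in both cases report the mismatch only after all 20 instructions.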

The authors note that the AV-Swap time, a one-time overhead incurred when jumping from the ILEM into the RTEM, is negligible in practical tests with millions of instructions, provided the swap is not invoked too frequently.
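The effect of swap frequency can be seen with simple arithmetic. The helper below is an illustrative cost model, not one given by the authors. It assumes a fixed cost per swap, and all numbers are invented for the example.

```python
def swap_overhead_fraction(swap_time_s: float, num_swaps: int, sim_time_s: float) -> float:
    """Share of total wall-clock time spent in AV-Swap under a fixed per-swap cost."""
    total_swap = swap_time_s * num_swaps
    return total_swap / (total_swap + sim_time_s)


# Hypothetical figures: 10 ms per swap, 10 s of simulation.
single_swap = swap_overhead_fraction(0.01, 1, 10.0)     # about 0.1% of the run
frequent_swaps = swap_overhead_fraction(0.01, 1000, 10.0)  # half of the run
```

Under these assumed figures, a single swap accounts for roughly 0.1% of the run, whereas a thousand swaps would take as long as the simulation itself.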

Relationship to Other Concepts

Bug detection in this framework is a property enabled by Tandem Simulation, which is the cross-level simulation technique that performs the instruction-by-instruction ILEM/RTEM comparison. Tandem simulation additionally supports jump-starting (using AV-Swap) to skip warm-up phases and accelerate bug detection further.

Practical Significance

Within the seven case studies (AES-block, AES-round, GB, FlexNLP, Pico, Piccolo, and Rocket Core, spanning both accelerators and processors), the empirical results support two main claims about bug detection:

  1. The instruction-by-instruction checking detects bugs earlier than run-to-the-end methods.
  2. Automation of the ILEM/RTEM connection — using the ILA model and its refinement map — makes this form of bug detection practical without requiring manual synchronization or controller construction between the two models.

CITATIONS

[1] In tandem simulation, an AV-Check at the end of each instruction compares ILAVs and RTAVs, and any deviation signifies a potential bug that can be analyzed with nearby instructions. Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models
[2] The AV-Check can also be invoked at specific intervals or checkpoints to reduce per-instruction comparison overhead. Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models
[3] The authors classify inserted bugs into three categories: condition bug (changes a value/condition in a conditional statement), data bug (changes a value in a computation), and expression bug (changes a logic operator such as AND/OR to XOR). Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models
[4] Tandem simulation often detects the bug earlier than finishing the test in conformance testing; in many cases the bug is found in less than 10% of the full test time, and in most cases in less than 40%. Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models
[5] An outlier in the bug detection study is a data bug in the FlexNLP design, where the buggy data is used only in a very late stage of the test program. Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models
[6] The instruction-by-instruction checking detects bugs earlier than run-to-the-end methods, which is one of the summarized experimental results. Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models
[7] AV-Swapping time from ILEM to RTEM is a one-time overhead that varies across designs and is determined roughly by the number of architectural variables and the cold-start length; it is negligible for tests with millions of instructions when not invoked very frequently. Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models
[8] Absolute simulation time for the traditional run-to-end conformance testing strategy ranges from 1 to 15 seconds for the design variants studied. Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models
[9] The case-study designs for bug detection evaluation include AES-block, AES-round, GB, FlexNLP, Pico, Piccolo, and Rocket Core. Generalizing Tandem Simulation: Connecting High-level and RTL Simulation Models