STIMSMITH

cycle-accurate simulation

Concept WIKI v1 · 6/9/2026

Cycle-accurate simulation refers to modeling and executing hardware (or hardware-described systems) one clock cycle at a time, so that the simulator's behavior matches the target design's behavior on every cycle. It is widely used in processor modeling, high-level synthesis (HLS) validation, and in solver-aided hardware verification where properties must be reasoned about over many concrete cycles of execution.

Definition and purpose

Cycle-accurate simulation is the practice of constructing a software model of a digital system whose state advances one clock cycle at a time, in lockstep with the design under study. Such simulators are used both for early hardware/software co-design, where detailed processor or accelerator models are needed before silicon exists, and for verification, where one must reason about the exact sequence of states a circuit visits.
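The "one clock cycle at a time" discipline can be sketched as a step function that computes every next-state value from the current state and then commits them together. The toy design here (a 4-bit counter feeding a one-cycle delay register) and all names in it are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Register contents of a toy design (hypothetical, for illustration only)."""
    count: int = 0    # 4-bit counter
    delayed: int = 0  # one-cycle-delayed copy of the counter


def step(state: State, enable: bool) -> State:
    """Advance exactly one clock cycle.

    Every next-state value is computed from the CURRENT state and committed
    together, the way flip-flops all update on the same clock edge.
    """
    next_count = (state.count + 1) % 16 if enable else state.count
    next_delayed = state.count  # samples the old count, not next_count
    return State(count=next_count, delayed=next_delayed)


def simulate(inputs):
    """Run one step per input value and record the state after every cycle."""
    state = State()
    trace = [state]
    for enable in inputs:
        state = step(state, enable)
        trace.append(state)
    return trace


trace = simulate([True, True, False, True])
for cycle, s in enumerate(trace):
    print(cycle, s)
```

Because `step` returns a fresh state instead of updating registers in place, the delay register always sees the counter's value from the previous cycle, which is the per-cycle behavior a cycle-accurate model has to reproduce.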

For processor design, detailed models and high-performance cycle-accurate simulators are described as essential to today's hardware and software design, and building them poses challenges that have been the subject of many research efforts (see Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation, arxiv 0710.4643v1).

Cycle-accurate processor modeling

The Reduced Colored Petri Net (RCPN) approach is one technique for building such simulators. RCPN offers two advantages:

  1. It provides a simple, intuitive way of modeling pipelined processors, where the model mirrors the processor pipeline block diagram.
  2. It can generate high-performance cycle-accurate simulators, because it benefits from useful features of Colored Petri Nets while avoiding their exponential growth in complexity.
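The idea of a model that mirrors the pipeline block diagram can be illustrated with a toy token-passing simulator, in which instructions are tokens and pipeline stages are single-entry slots. This is not RCPN itself and does not reproduce the paper's formalism or its generated simulators. The function name, the stage names, and the latency table are assumptions made for this sketch:

```python
def run_pipeline(program, stages=("IF", "ID", "EX", "WB"), latency=None):
    """Move instruction tokens through single-entry stages, one cycle per pass.

    latency maps (stage, instruction) to the cycles spent there (default 1).
    Returns (instruction, cycle in which it left the last stage) pairs.
    """
    latency = latency or {}
    slots = [None] * len(stages)  # each slot: [instruction, cycles_left] or None
    pending = list(program)
    retired = []
    cycle = 0
    while pending or any(slot is not None for slot in slots):
        cycle += 1
        # Fetch: a new token enters the first stage if that stage is free.
        if slots[0] is None and pending:
            instr = pending.pop(0)
            slots[0] = [instr, latency.get((stages[0], instr), 1)]
        # Visit stages back to front so a slot freed in this cycle can be
        # refilled by its upstream neighbour in the same cycle.
        for i in reversed(range(len(stages))):
            if slots[i] is None:
                continue
            slots[i][1] -= 1
            if slots[i][1] > 0:
                continue  # still busy in this stage
            instr = slots[i][0]
            if i == len(stages) - 1:
                retired.append((instr, cycle))
                slots[i] = None
            elif slots[i + 1] is None:
                next_stage = stages[i + 1]
                slots[i + 1] = [instr, latency.get((next_stage, instr), 1)]
                slots[i] = None
            # Otherwise the token stalls until the next stage drains.
    return retired


print(run_pipeline(["add", "mul", "sub"]))
print(run_pipeline(["add", "mul", "sub"], latency={("EX", "mul"): 3}))
```

In the second call the hypothetical three-cycle `mul` holds the EX stage, so `sub` stalls in ID and both instructions retire two cycles later than in the first call.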

Using RCPN, the authors report that their generated cycle-accurate simulators for XScale and StrongArm processor models achieved roughly an order of magnitude (~15×) speedup over the popular SimpleScalar ARM simulator (arxiv 0710.4643v1).

Cycle-accurate simulation in High-Level Synthesis (HLS)

In FPGA HLS, a large semantic gap between the HLS design and the low-level (on-board or RTL) simulation environment often creates a barrier for non-FPGA experts, and low-level simulation can be very slow. Commercial software-based HLS simulators help bridge this gap and accelerate simulation, but they have been observed to sometimes produce incorrect results.

To solve this correctness issue while retaining the speed of a software-based simulator, the FLASH flow was proposed. FLASH extracts scheduling information from the HLS tool and automatically constructs an equivalent cycle-accurate simulation model while preserving C semantics. Experimentally, FLASH runs three orders of magnitude faster than RTL simulation (Rapid Cycle-Accurate Simulator for High-Level Synthesis, arxiv 1812.07012v2).
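A rough sketch of what schedule-driven simulation means: the loop body below is split into stages, and a hand-written schedule (stage order plus an initiation interval `ii`) decides which operation runs in which cycle. This is not FLASH's implementation; FLASH extracts the schedule from the HLS tool, whereas the function name, the stage functions, and the schedule format here are all assumptions made for illustration:

```python
def simulate_pipelined_loop(inputs, stages, ii=1):
    """Execute a pipelined loop cycle by cycle according to a fixed schedule.

    Iteration i starts at cycle i * ii and runs stages[k] at cycle i * ii + k.
    Returns the per-iteration results and the total number of cycles used.
    """
    depth = len(stages)
    n = len(inputs)
    values = list(inputs)  # the value each iteration currently carries
    finished = [False] * n
    cycle = 0
    while not all(finished):
        for i in range(n):
            offset = cycle - i * ii
            if 0 <= offset < depth:
                # The schedule, not program order, selects the operation.
                values[i] = stages[offset](values[i])
                if offset == depth - 1:
                    finished[i] = True
        cycle += 1
    return values, cycle


# Hypothetical three-stage loop body: add 1, multiply by 2, subtract 3.
body = (lambda x: x + 1, lambda x: x * 2, lambda x: x - 3)

results, cycles = simulate_pipelined_loop([1, 2, 3], body, ii=1)
print(results, cycles)
```

The results match a plain sequential evaluation of the loop body, while the cycle count follows the usual pipelined-loop relation depth + ii × (n − 1), so the same model yields both functional output and timing.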

Cycle-accurate reasoning in hardware verification

Cycle-accurate simulation also underlies certain styles of hardware verification. SMT-based tools such as SymbiYosys can verify properties that hold after a small number of cycles by unrolling the transition relation. However, expressing the circuit transition as a relation means that, when the circuit state is primarily symbolic, each unrolled step grows the symbolic expressions, leading to a large final solver query; empirically, SMT solvers exhibit poor performance when reasoning about long chains of unrolled transitions [8, 9].
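In its usual bounded form, the unrolled query can be written as a single formula. Here T is the transition relation and s_i is the circuit state after i cycles; the property P and the bound k are notation introduced for this sketch:

```latex
\underbrace{T(s_0, s_1) \land T(s_1, s_2) \land \dots \land T(s_{k-1}, s_k)}_{k \text{ unrolled transitions}}
\;\land\; \lnot P(s_k)
```

The solver is asked whether this formula is satisfiable; if it is not, P holds after k cycles from every starting state s_0. When the starting state is restricted, a constraint on s_0 is conjoined as well. Each additional cycle adds another copy of T over fresh state variables, which is why the query grows with k.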

The rtlv tool takes a different approach. Instead of encoding the transition relation directly into an SMT query, rtlv transforms the transition relation into an imperative step function and symbolically executes the circuit cycle by cycle, leveraging Rosette's hybrid symbolic execution with type-driven state merging and on-the-fly rewrite rules. This makes it possible to reason in a cycle-accurate manner about software running on the hardware [10, 11].

A concrete illustration: verifying that boot code clears all microarchitectural state in a PicoRV32 CPU requires modeling 104 cycles of execution from an initially unconstrained circuit state. Using SymbiYosys, the resulting unrolled solver query did not finish within 12 hours, while rtlv completes the same task in about 1.3 seconds [12, 13].
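The effect of stepping a circuit symbolically while simplifying on the fly can be shown on a toy scale. The sketch below is plain Python, not Rosette or rtlv, and it omits state merging entirely. The four-register circuit, the `Sym` class, and the helper names are hypothetical. The circuit starts from fully unknown state, is reset for one cycle, then clears one register per cycle:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """An unknown (symbolic) value, such as unconstrained initial state."""
    name: str


def is_concrete(v):
    return isinstance(v, int)  # bool is a subclass of int


def add(a, b):
    # Rewrite rule: fold the addition when both operands are concrete.
    return a + b if is_concrete(a) and is_concrete(b) else ("add", a, b)


def eq(a, b):
    return a == b if is_concrete(a) and is_concrete(b) else ("eq", a, b)


def ite(cond, then, other):
    # Rewrite rules: a concrete condition selects a branch outright, and
    # identical branches collapse; only otherwise is a term built.
    if is_concrete(cond):
        return then if cond else other
    if then == other:
        return then
    return ("ite", cond, then, other)


def step(state, reset):
    """One clock cycle: reset sets pc to 0, otherwise clear regs[pc] and advance pc."""
    pc, regs = state
    next_pc = ite(reset, 0, add(pc, 1))
    next_regs = tuple(
        ite(reset, r, ite(eq(pc, i), 0, r)) for i, r in enumerate(regs)
    )
    return (next_pc, next_regs)


# Fully symbolic start: neither pc nor any register has a known value.
state = (Sym("pc0"), tuple(Sym(f"r{i}") for i in range(4)))
state = step(state, reset=True)       # cycle 1: reset asserted
for _ in range(4):                    # cycles 2-5: clear one register each
    state = step(state, reset=False)
print(state)
```

Once reset makes `pc` concrete, every later step folds down to concrete values, so the final state contains no symbolic terms and nothing is left for a solver to decide. Without the reset cycle, the same step function would build nested `ite` terms over the unknown `pc`.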

CITATIONS

[1] Cycle-accurate simulators are essential for modern hardware and software design, and detailed processor modeling is challenging. Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation
[2] The Reduced Colored Petri Net (RCPN) model provides an intuitive mirror of the processor pipeline block diagram and can generate high-performance cycle-accurate simulators without the exponential complexity of standard Colored Petri Nets. Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation
[3] RCPN-generated cycle-accurate simulators for XScale and StrongArm achieved ~15x speedup over the SimpleScalar ARM simulator. Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation
[4] A large semantic gap between HLS design and low-level (on-board or RTL) simulation makes FPGA-targeted design hard for non-experts, and low-level simulation can be very slow. Rapid Cycle-Accurate Simulator for High-Level Synthesis
[5] Existing commercial FPGA HLS software simulators can sometimes produce incorrect results, motivating an alternative cycle-accurate simulation flow. Rapid Cycle-Accurate Simulator for High-Level Synthesis
[6] FLASH extracts scheduling information from the HLS tool and automatically constructs an equivalent cycle-accurate simulation model while preserving C semantics. Rapid Cycle-Accurate Simulator for High-Level Synthesis
[7] FLASH runs three orders of magnitude faster than RTL simulation. Rapid Cycle-Accurate Simulator for High-Level Synthesis
[8] SMT-based tools such as SymbiYosys verify cycle-stepped properties by unrolling the circuit's transition relation into a long chain of T(s_i, s_{i+1}) constraints. rtlv: push-button verification of software on hardware
[9] SMT solvers exhibit poor performance when reasoning about long chains of unrolled transitions, especially when the circuit state is largely symbolic. rtlv: push-button verification of software on hardware
[10] rtlv transforms the transition relation into an imperative step function and symbolically executes the circuit cycle by cycle, enabling efficient cycle-accurate reasoning about software on hardware. rtlv: push-button verification of software on hardware
[11] rtlv leverages Rosette's hybrid symbolic execution with type-driven state merging and rewrite rules to simplify symbolic expressions at control-flow joins, avoiding path explosion. rtlv: push-button verification of software on hardware
[12] Verifying that boot code clears all microarchitectural state in a PicoRV32 CPU requires modeling 104 cycles of execution. rtlv: push-button verification of software on hardware
[13] SymbiYosys did not finish verifying the 104-cycle PicoRV32 boot-clearing property within 12 hours, while rtlv completes the same verification in about 1.3 seconds. rtlv: push-button verification of software on hardware