Skip to content
STIMSMITH

Generating an Efficient Instruction Set Simulator from a Complete Property Suite

Paper WIKI v1 · 5/26/2026

The paper describes a method for generating a C++ instruction set simulator (ISS) from a gap-free, complete property suite written in an architectural style. The approach treats the verified property suite as a formally checkable ISA description and translates it into simulator code, applying optimizations such as native C++ type/operator mapping, shared-expression caching, and decode-result caching. Experiments on a small pipelined processor and an industrial DLX-based processor show generated simulator performance of 7.0 MIPS and 1.2 MIPS, respectively.

Overview

Generating an Efficient Instruction Set Simulator from a Complete Property Suite presents an approach for generating a C++ instruction set simulator (ISS) from a gap-free, complete property suite formulated in an architectural style. The paper argues that, after successful formal verification, such a property suite forms an architectural model of the verified design; therefore, the generated ISS is equivalent to the design by construction. [C1]

The method is intended to use the formal property suite as a single source of specification for both verification and instruction-set simulation. The authors state that this supports correct early software development based on instruction set simulation and allows the simulator to be automatically adapted when the design or specification changes late in the design process. [C2]

Architectural-style property suite

The paper describes a reformulation of operation properties into an architectural style suitable for ISS generation. This style includes: explicit modeling of the architectural state and memory/port interfaces; a next_state macro for instruction and interrupt effects; interface macros for memories and ports; and an explicit architectural reset state. [C3]

The architectural state is modeled using a user-defined VHDL record data type that combines the state elements of the processor, such as the register file, status flags, and program counter. Similar user-defined data types describe interfaces to memories and ports. [C4]

A central modeling construct is the update keyword, which explicitly defines write access to an array or record data structure. In the example described by the paper, next_state updates the register file when the decoded opcode corresponds to an ADD instruction, and the decoded instruction fields are stored in an instruction_t record. [C5]

Relationship to formal verification

The paper describes an equivalence-proof structure in which an abstraction function vstate maps implementation state to ISA state. The proof objective is that applying next_state at the ISA level corresponds to applying the CPU transition relation and then mapping the resulting implementation state back to the architectural state. The proof also has to establish that interface signals behave as described by interface macros and that the implementation reset state complies with the architectural reset state. [C6]

However, the paper notes that a full equivalence proof between ISA and RTL, as well as identification of mapping functions between architectural and implementation state, is not required before generating the ISS. The ISA can be developed in ITL early in the design process and used to generate an ISS; full confidence that the ISS complies with the design is obtained at the end of verification. [C7]

ISS generation

The reformulated architectural-style property is described as a single property that captures all behavior of the verified design through the next_state function. The paper characterizes this artifact as a formally checkable ISA description and uses it as the starting point for automatic generation of a C++ ISS. [C8]

The generated simulator core is a C++ class named Sim, which contains the code for instruction execution and holds the architectural state. The translation from the ITL description to C++ includes the following steps: generating public functions for next_state, decode, and interface macros; generating private functions for remaining macros; generating a member variable for the architectural state; replacing ITL/HDL data types and operations with C++ types and operations; and replacing update expressions with direct C++ array or struct overwrites. [C9]

A user-provided wrapper is still required around the generated simulator class. This wrapper calls the generated public functions to trigger single-instruction execution and connects the simulator core to peripheral components such as external memories or buses. The paper also states that the generated core can be integrated with commercial simulation and debugging tools. [C10]

Code-generation optimizations

To improve simulation performance, the generated C++ code uses native data types such as unsigned integers when ITL/HDL bit sizes fit within the host system word width. Larger bit vectors are represented with dedicated data structures. Basic operations such as integer arithmetic and simple logic are mapped to native C++ operations, while more complex operations such as bit slicing, bit rotation, and operations on large bit vectors use an optimized support library. [C11]

The generator also analyzes data dependencies in macro functions to identify shared expressions. It can insert additional simulator-class member variables to store temporary and intermediate results, thereby caching computations during simulation. [C12]

A further optimization is decode-result caching: the decode macro decomposes an instruction word into bit fields, stores them in a record data type, and caches the result so repeated instruction decoding can be avoided. The paper relates this technique to just-in-time compiled simulation and notes that software locality, such as loops, makes it effective for reducing simulation runtime. [C13]

Experimental results

Small pipelined processor

The first evaluation generated an ISS for a small pipelined processor. The CPU has eight 16-bit registers, a special interrupt-return-vector register, a five-stage pipeline, and a simple data-memory interface. Its instruction set contains seven instructions covering logic, arithmetic, memory access, and jump instructions. [C14]

For this processor, the paper compares three simulator implementations. An interpretive commercial-tool simulator achieved 0.22 MIPS, a just-in-time compiled simulator achieved 14.0 MIPS, and the ISS generated from the property suite achieved 7.0 MIPS. The paper states that this clearly outperforms interpretive simulation and reaches about 50% of the performance of a state-of-the-art JIT-CS simulation tool. [C15]

Industrial processor design

The second experiment generated an ISS for an industrial processor design. The CPU has 64 32-bit registers in multiple hardware contexts, a seven-stage pipeline, connections to data memory and a bus interface, and support for 88 instructions based on the DLX instruction set architecture. The processor core is about 10,000 lines of VHDL, and the final reformulated property suite is 2,000 lines of ITL. [C16]

The paper states that the property suite and its completeness were successfully checked against the processor design using OneSpin Verification Solutions, so it could be considered a correct and complete specification. [C17]

For the industrial design, a just-in-time compiled simulator built with the commercial tool suite averaged 2.5 MIPS, while the ISS generated by the paper's approach reached 1.2 MIPS. The authors state that this confirms that the generated ISS has performance comparable to modern custom-made instruction set simulators. [C18]

Design Interpretive JIT-CS Generated
P1 0.22 MIPS 14.0 MIPS 7.0 MIPS
P2 - 2.5 MIPS 1.2 MIPS

The paper attributes the remaining performance gap to technical issues, noting that commercial JIT-CS tools include many optimizations, while the presented generator is a proof of concept. It also notes that properties generated from hardware reflect hardware and pipeline effects that may not appear in a high-level ISA description and may reduce simulation performance. [C19]

Conclusion

The paper concludes that a complete architectural-style property suite can be used to generate a C++ ISS whose performance is comparable to state-of-the-art commercial tools after code-generation optimizations. It presents the method as a practical way to obtain a proven-correct and efficient instruction set simulator when formal verification is already used in industrial processor design. [C20]

Funding acknowledgment

The research was supported by the German Federal Ministry of Education and Research (BMBF) in the HERKULES project under contract number 01M3082. [C21]

CITATIONS

21 sources
21 citations
[1] C1: The paper presents an approach for generating a C++ ISS from a complete architectural-style property suite, with equivalence to the verified design by construction after successful formal verification. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[2] C2: The formal property suite is used as a single specification source, enabling automatic simulator adaptation and correct early software development through ISS. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[3] C3: Architectural-style reformulation includes explicit architectural state and interfaces, a next_state macro, interface macros, and reset-state definition. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[4] C4: The architectural state is represented as a user-defined VHDL record containing elements such as register file, status flags, and program counter; interfaces are modeled with similar data types. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[5] C5: The update keyword models writes to arrays or records, and the example next_state function updates registers for an ADD instruction using decoded instruction fields. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[6] C6: The equivalence proof uses vstate to map implementation state to ISA state and checks correspondence between the CPU transition relation and next_state, including interface and reset behavior. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[7] C7: Full ISA/RTL equivalence proof and mapping-function identification are not required before ISS generation; an ITL ISA can be used to generate an ISS early, with full confidence obtained after verification. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[8] C8: The architectural-style reformulation yields a single property using next_state, described as a formally checkable ISA description and used as the starting point for C++ ISS generation. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[9] C9: The generated simulator core is a C++ Sim class and translation steps include public/private function generation, architectural-state member generation, ITL/HDL to C++ replacement, and update-expression replacement. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[10] C10: A wrapper must call generated functions for instruction execution and connect the simulator core to peripherals; integration with commercial simulation/debugging tools is possible. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[11] C11: Code-generation optimizations map fitting bit-vector types and basic operations to native C++ constructs and use optimized libraries for complex operations and large bit vectors. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[12] C12: The generator analyzes macro data dependencies to find shared expressions and cache temporary or intermediate computation results. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[13] C13: Decode-result caching avoids repeated decoding of the same instruction and is presented as effective due to software locality such as loops. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[14] C14: The first experiment used a small pipelined processor with eight 16-bit registers, an interrupt-return-vector register, a five-stage pipeline, simple data-memory interface, and seven instructions. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[15] C15: For the small processor, interpretive, JIT-CS, and generated simulators achieved 0.22, 14.0, and 7.0 MIPS respectively, with the generated ISS reaching about 50% of JIT-CS performance. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[16] C16: The second experiment used an industrial processor with 64 32-bit registers in multiple hardware contexts, a seven-stage pipeline, memory and bus interfaces, and 88 DLX-based instructions; the core had about 10,000 VHDL lines and the property suite 2,000 ITL lines. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[17] C17: The industrial property suite and its completeness were checked against the processor design using OneSpin Verification Solutions and considered a correct and complete specification. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[18] C18: For the industrial processor, the JIT-CS simulator averaged 2.5 MIPS and the generated ISS reached 1.2 MIPS, which the paper describes as comparable to modern custom-made ISS performance. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[19] C19: The paper attributes performance differences to optimization maturity of commercial tools and to hardware and pipeline effects captured by generated properties. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[20] C20: The conclusion states that optimized generated C++ simulators can be comparable to state-of-the-art commercial tools and that the method provides a practical way to obtain a proven-correct, efficient ISS. Generating an Efficient Instruction Set Simulator from a Complete Property Suite
[21] C21: The research was supported by BMBF in the HERKULES project under contract number 01M3082. Generating an Efficient Instruction Set Simulator from a Complete Property Suite