CPU verification Wiki

Overview

CPU verification is the functional verification of CPU cores. It is motivated by the fact that performance-enhancing microarchitectural features can threaten functional correctness. The evidence cites pipelining, multiple instruction issue, out-of-order execution, branch prediction, and memory-access acceleration through caches as examples. These features create corner cases that require specialist verification techniques. [C1]

Constrained-random instruction-stream execution

One highlighted technique is constrained-random instruction-stream generation, which the evidence also calls constrained-random instruction-stream execution. In this approach, small assembler programs are generated and executed on the CPU core, with the goal of exercising corner cases created by complex CPU performance enhancements. [C2]
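The idea of generating small assembler programs under constraints can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy ISA, the mnemonics, the weights, and the function names gen_instruction and gen_program do not come from the cited sources or from any real tool. The constraints shown are a bias toward recently written registers (to provoke read-after-write hazards), word-aligned memory offsets, and forward-only branches so that every generated program terminates.

```python
import random

# Hypothetical toy ISA; mnemonics, registers, and weights are illustrative only.
REGS = [f"r{i}" for i in range(8)]
WEIGHTS = {"add": 3, "sub": 3, "load": 2, "store": 2, "beq": 1}


def gen_instruction(rng, index, length, recent_dests):
    """Return (assembler text, destination register or None) for one instruction."""
    op = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]

    def src():
        # Constraint: prefer recently written registers so that
        # read-after-write hazards in the pipeline occur often.
        if recent_dests and rng.random() < 0.7:
            return rng.choice(recent_dests)
        return rng.choice(REGS)

    if op in ("add", "sub"):
        dest = rng.choice(REGS)
        return f"{op} {dest}, {src()}, {src()}", dest
    if op == "load":
        dest = rng.choice(REGS)
        # Constraint: word-aligned offset inside a small window.
        return f"load {dest}, {4 * rng.randrange(16)}({src()})", dest
    if op == "store":
        return f"store {src()}, {4 * rng.randrange(16)}({src()})", None
    # Constraint: branches only jump forward, at most to the final halt label.
    target = rng.randrange(index + 1, length + 1)
    return f"beq {src()}, {src()}, L{target}", None


def gen_program(seed, length=12):
    """Generate one labelled program; the same seed reproduces the same program."""
    rng = random.Random(seed)
    recent, lines = [], []
    for i in range(length):
        text, dest = gen_instruction(rng, i, length, recent)
        lines.append(f"L{i}: {text}")
        if dest is not None:
            recent = (recent + [dest])[-2:]  # remember the last two writes
    lines.append(f"L{length}: halt")
    return lines
```

Calling gen_program(seed) with different seeds yields different programs, and rerunning a seed reproduces a failing program exactly, which is why seed control matters in this style of testing.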

Test and Verification Solutions Ltd is cited as extending its CPU verification capability with CPU verification engineers and an instruction-stream generation tool, asureISG. The source states that asureISG initially supported single-core CPU verification and was planned to be enhanced with multi-core support. [C3]

Multi-core and shared-resource concerns

The same TVS source connects multi-core verification to shared resources. It states that multiple cores are used to add performance to SoCs and products, and that those cores often share resources such as caches. Verifying them therefore requires multiple instruction streams, generated by a tool that understands the potential bugs in shared-resource management. [C4]
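A rough sketch of what shared-resource-aware stimulus could look like is given below. It is not a description of asureISG: the 64-byte line size, the address map, the 75% sharing bias, and the function names are all assumptions chosen for illustration. Each core receives its own stream of loads and stores, and most accesses are steered into a small pool of shared cache lines so that the cores collide frequently.

```python
import random

LINE_BYTES = 64  # assumed cache-line size


def gen_multicore_streams(seed, cores=2, length=8, shared_lines=2):
    """Return (streams, shared_line_addresses); one (op, addr) stream per core."""
    rng = random.Random(seed)
    # A deliberately small pool of shared lines makes collisions likely.
    shared = [0x1000 + LINE_BYTES * i for i in range(shared_lines)]
    streams = []
    for core in range(cores):
        private_base = 0x8000 + 0x1000 * core  # hypothetical per-core region
        stream = []
        for _ in range(length):
            op = rng.choice(["load", "store"])
            if rng.random() < 0.75:
                # Shared access: same line, possibly a different word,
                # which also exercises false sharing.
                addr = rng.choice(shared) + 4 * rng.randrange(LINE_BYTES // 4)
            else:
                addr = private_base + 4 * rng.randrange(64)
            stream.append((op, addr))
        streams.append(stream)
    return streams, shared


def contended_lines(streams):
    """Lines touched by more than one core with at least one store among them."""
    touched = {}
    for core, stream in enumerate(streams):
        for op, addr in stream:
            line = addr // LINE_BYTES * LINE_BYTES
            cores_seen, has_store = touched.get(line, (set(), False))
            touched[line] = (cores_seen | {core}, has_store or op == "store")
    return sorted(l for l, (c, s) in touched.items() if len(c) > 1 and s)
```

The contended_lines helper gives a simple measure of whether a generated test actually stresses the shared resource; a generator could discard or reweight seeds that produce no contention.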

Stimulus-generation tradeoffs

The AMD microcode-stimulus evidence describes practical tradeoffs in generating x86 opcodes:

Serial randomization achieved the desired speed and memory use but caused a significant distribution problem: because each portion of the opcode was generated serially, the overall distribution could not be controlled. The result was skewed stimuli, which required more seeds and simulations to close coverage. [C5]
A simple constrained-random approach solved the distribution issue, but reached speed and memory limits because of the complexity of the x86 instruction set, reducing simulation performance. [C6]
Randomizing instructions by choosing the opcode category first simplified the solver problem because only constraints specific to that category were active. The source reports improved speed and memory use without sacrificing distribution quality or test-level control. [C7]
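The difference between serial and category-first randomization can be seen in a toy model. The encoding table below is invented for illustration and has nothing to do with real x86 encodings or with the AMD testbench; each encoding is a (group, subop) pair, and the "branch" and "system" categories share a group. Serial generation picks the group first and then a legal subop, so the share of each category falls out of the encoding structure. Category-first generation picks the category using weights supplied by the test and then only considers that category's encodings.

```python
import random

# Hypothetical encoding table: category -> legal (group, subop) pairs.
CATEGORY_ENCODINGS = {
    "alu": [(0, s) for s in range(8)],
    "mem": [(1, s) for s in range(4)],
    "branch": [(2, 0), (2, 1)],
    "system": [(2, 2)],
}


def category_of(encoding):
    """Map an encoding back to its category."""
    for category, encodings in CATEGORY_ENCODINGS.items():
        if encoding in encodings:
            return category
    raise ValueError(f"illegal encoding {encoding}")


def serial_randomize(rng):
    """Pick each field in turn; the category mix cannot be steered."""
    all_encodings = [e for encs in CATEGORY_ENCODINGS.values() for e in encs]
    group = rng.choice(sorted({g for g, _ in all_encodings}))
    subop = rng.choice(sorted(s for g, s in all_encodings if g == group))
    return (group, subop)


def category_first_randomize(rng, weights):
    """Pick the category by test-supplied weights, then solve within it."""
    categories = list(weights)
    category = rng.choices(categories, weights=[weights[c] for c in categories])[0]
    return rng.choice(CATEGORY_ENCODINGS[category])
```

In this toy table, serial_randomize produces a "system" encoding with probability 1/3 x 1/3 = 1/9 no matter what the test intends, whereas category_first_randomize with weights such as {"alu": 0, "mem": 0, "branch": 0, "system": 1} produces only "system" encodings.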

Constraint-solver behavior

The evidence also describes behavior of the VCS constraint solvers used in the AMD microcode-stimulus work. In BDD solver mode, the solver elaborates the entire solution space of a randomize call before selecting a solution. This can consume substantial memory and time, although the elaborated solution space is cached to speed later randomization calls. The source notes that BDD solving can work well for CPU opcode generation when the randomize problem does not require excessive memory and the same randomize call occurs many times. [C8]
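The elaborate-once, reuse-many-times tradeoff can be sketched as follows. This is only an analogy: a real BDD solver builds a binary decision diagram rather than an explicit list of solutions, and the variable domains and constraints here are invented. The sketch enumerates the whole solution space on the first call, caches it, and serves later randomize calls from the cache.

```python
import itertools
import random
from functools import lru_cache

# Hypothetical random variables and their domains.
DOMAINS = {"opcode": range(16), "reg": range(8), "imm": range(32)}


def legal(opcode, reg, imm):
    """Invented constraints standing in for opcode legality rules."""
    if opcode < 4 and imm != 0:
        return False  # assumed: low opcodes carry no immediate
    if opcode >= 12 and reg == 0:
        return False  # assumed: high opcodes may not use register 0
    return imm % 4 == 0 or opcode % 2 == 1


@lru_cache(maxsize=None)
def solution_space():
    """Elaborate every legal assignment once; costly in time and memory."""
    return tuple(s for s in itertools.product(*DOMAINS.values()) if legal(*s))


def randomize(rng):
    """Later calls reuse the cached space and only pay for the selection."""
    return rng.choice(solution_space())
```

After many calls to randomize, solution_space.cache_info() shows a single miss and the rest as hits. The memory held by the cached tuple grows with the size of the solution space, which mirrors the source's caveat that the approach suits problems that do not require excessive memory and are randomized repeatedly.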

Single-class versus multiple-class generation

The AMD evidence compares single-class and multiple-class randomization architectures. In the reported runtime comparison, the multiple-class architecture was faster with either solver tested: the default RACE solver showed a 4x speedup, and the BDD solver showed a 2x speedup. Memory requirements were also significantly better for the multiple-class architecture in the BDD measurements. [C9]

The stated reason for the acceleration and memory reduction was that the new implementation presented a smaller set of variables and constraints to the solver. The source reports that the new implementation had 7x fewer constraints than the original, allowing the solver to calculate solutions more efficiently. [C10]
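Why a smaller set of variables and constraints helps can be shown with a brute-force stand-in for a solver. The classes, fields, and constraints below are hypothetical and far simpler than the AMD testbench; the point is only that a single class must carry every field and guard every constraint with its category, while per-category classes hand the solver just the fields and constraints they need.

```python
import itertools
import random


def solve(variables, constraints, rng):
    """Brute-force stand-in for a solver; returns (solution, candidates examined)."""
    names = list(variables)
    solutions, examined = [], 0
    for values in itertools.product(*(variables[n] for n in names)):
        examined += 1
        assignment = dict(zip(names, values))
        if all(check(assignment) for check in constraints):
            solutions.append(assignment)
    return rng.choice(solutions), examined


# Single-class style: all fields present, every constraint guarded by category.
SINGLE_VARS = {"cat": ["alu", "mem"], "rd": range(8), "rs": range(8),
               "base": range(8), "offset": range(0, 64, 4)}
SINGLE_CONSTRAINTS = [
    lambda v: v["cat"] != "alu" or v["rd"] != v["rs"],
    lambda v: v["cat"] != "alu" or (v["base"] == 0 and v["offset"] == 0),
    lambda v: v["cat"] != "mem" or v["offset"] % 8 == 0,
    lambda v: v["cat"] != "mem" or (v["rd"] == 0 and v["rs"] == 0),
]

# Multiple-class style: each category owns only its fields and constraints.
PER_CATEGORY = {
    "alu": ({"rd": range(8), "rs": range(8)},
            [lambda v: v["rd"] != v["rs"]]),
    "mem": ({"base": range(8), "offset": range(0, 64, 4)},
            [lambda v: v["offset"] % 8 == 0]),
}


def randomize_single(rng):
    return solve(SINGLE_VARS, SINGLE_CONSTRAINTS, rng)


def randomize_multi(rng):
    category = rng.choice(sorted(PER_CATEGORY))
    variables, constraints = PER_CATEGORY[category]
    solution, examined = solve(variables, constraints, rng)
    return {"cat": category, **solution}, examined
```

In this toy setup randomize_single examines 2 x 8 x 8 x 8 x 16 = 16,384 candidate assignments against four constraints, while randomize_multi examines 64 or 128 candidates against one constraint. The ratios are artifacts of the example and are not comparable to the speedups reported in the source.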

LLM-aided FPGA-accelerated CPU verification (ISAAC)

The ISAAC paper positions CPU verification as a critical, time-intensive, and labor-consuming bottleneck in integrated-circuit development, and reports that industrial practice relies on differential testing. The paper identifies bottlenecks at nearly every stage of the framework pipeline:

Front-end stimulus generation lacks micro-architectural awareness, yielding low-quality and redundant tests that impede coverage closure and miss corner cases. [C11]
Back-end simulation infrastructure, even with FPGA acceleration, often stalls on long-running tests and offers limited visibility, delaying feedback and prolonging the debugging cycle. [C11]

ISAAC is presented as a full-stack, Large Language Model (LLM)-aided CPU verification framework with FPGA parallelism, spanning bug categorisation, stimulus generation, and simulation infrastructure. Its front-end multi-agent stimulus engine is infused with micro-architectural knowledge and historical bug patterns to generate targeted tests that achieve coverage goals and capture elusive corner cases. [C11]

In ISAAC's back-end, a lightweight forward-snapshot mechanism and a decoupled co-simulation architecture between the Instruction Set Simulator (ISS) and the Design Under Test (DUT) enable a single ISS to drive multiple DUTs in parallel. By eliminating long-tail test bottlenecks and exploiting FPGA parallelism, simulation throughput is significantly improved. [C11]
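One way to picture a single ISS feeding several DUTs is the sketch below. It is a loose interpretation under stated assumptions, not ISAAC's implementation: the accumulator machine, the snapshot format, the injected bug, and all function names are invented, and Python threads stand in for FPGA instances. The reference model runs ahead once and records its state at segment boundaries; each DUT then starts from one snapshot, executes only its segment, and is compared against the next snapshot, so a long test is split into short independent checks.

```python
from concurrent.futures import ThreadPoolExecutor

MASK = 0xFFFFFFFF


def iss_step(state, instr):
    """Reference semantics of a toy 32-bit accumulator machine."""
    op, arg = instr
    if op == "add":
        return (state + arg) & MASK
    if op == "xor":
        return state ^ arg
    if op == "shl":
        return (state << arg) & MASK
    raise ValueError(op)


def iss_snapshots(program, segment_len, state=0):
    """Run the reference once, recording state at every segment boundary."""
    snapshots = [state]
    for i, instr in enumerate(program, 1):
        state = iss_step(state, instr)
        if i % segment_len == 0 or i == len(program):
            snapshots.append(state)
    return snapshots


def dut_run(segment, start_state, buggy=False):
    """Stand-in DUT; reuses the reference step except for an injected bug."""
    state = start_state
    for op, arg in segment:
        if buggy and op == "shl" and arg == 31:
            state = 0  # injected corner-case bug: shift by 31 clears the result
        else:
            state = iss_step(state, (op, arg))
    return state


def check_parallel(program, segment_len, buggy=False, workers=4):
    """Return indices of segments whose DUT end state differs from the ISS."""
    snapshots = iss_snapshots(program, segment_len)
    segments = [program[i:i + segment_len]
                for i in range(0, len(program), segment_len)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ends = list(pool.map(lambda job: dut_run(job[0], job[1], buggy),
                             zip(segments, snapshots)))
    return [i for i, (end, ref) in enumerate(zip(ends, snapshots[1:]))
            if end != ref]
```

Because every segment starts from a reference snapshot, a mismatch is localized to the segment that caused it instead of corrupting everything that follows, which also shortens the debugging loop.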

Applied to a mature CPU that has undergone multiple successful tape-outs, ISAAC achieved up to 17,536x speed-up over software RTL simulation while detecting several previously unknown bugs, two of which are reported in the paper. [C11]

Related tooling and methodologies

Alastair Reid's Related Work notes on ISA specification list CPU verification among the topics related to instruction-set architecture specification. UVM (Universal Verification Methodology) is referenced as commonly used in verification environments, including those built for CPU verification. [C12]

Scope note

Based on the supplied evidence, this article covers CPU verification as it relates to CPU-core functional correctness, constrained-random instruction-stream generation, multi-core shared-resource stimulus, constraint-solver performance in opcode generation, and LLM-aided FPGA-accelerated frameworks such as ISAAC.