STIMSMITH

Hierarchical Constrained-Random Test Generation

Technique WIKI v2 · 5/28/2026

Hierarchical constrained-random test generation structures a constrained-random stimulus generator so that a high-level choice, such as an opcode category, is made before solving a smaller category-specific constraint problem. In the AMD x86 microcode-stimulus example, splitting a single opcode class into a base class plus category child classes improved runtime and memory while preserving distribution control and test-level knob control.

Overview

Hierarchical constrained-random test generation is a verification-stimulus technique in which a generator first chooses a high-level category and then randomizes a smaller, category-specific object. In the AMD x86 opcode-generation example, instructions were randomized by first choosing the opcode category so the constraint solver only had to consider constraints specific to that category. This simplified the problem and improved memory and speed without sacrificing distribution or test-level control. [C1]

Generator architecture

The reported implementation replaced a larger single-class opcode model with a multi-class hierarchy. To reduce the randomization problem size, the opcode class was split into multiple smaller classes, and opcodes were divided into categories that mapped well to knobs or weights in the test interface. [C2]

The hierarchy used:

  • a base instruction class containing data members common to child classes, common constraints, and most methods used to set, print, and pack data; and
  • opcode-category child classes containing constraints specific to each opcode set, including implication-operator structures based on opcode type. [C2]

This organization keeps common instruction behavior in the base class while limiting opcode-specific constraints to the subclass selected for the current generation step. [C2]
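
The base-plus-child organization can be sketched as follows. This is a minimal illustration, not the AMD code. The original generator appears to be SystemVerilog (op_gen.sv). Python is used here, and rejection sampling stands in for a constraint solver. The class names, field widths, opcode values, and rules (register 0 reserved as a destination, a SHL shift-amount limit, aligned memory offsets) are all hypothetical. The base class holds the common fields, the common constraint, and the set/print/pack helpers. Each child class adds only its own category constraints, including implication-style rules written as `not A or B`.

```python
import random


class BaseInstr:
    """Base class: common fields, a common constraint, and print/pack helpers.

    Field names, widths, and opcode values are hypothetical, not the AMD encoding.
    """

    CATEGORY = "base"
    OPCODES = ()

    def __init__(self):
        self.opcode = self.dest = self.src = self.imm = 0

    def common_ok(self):
        # Common constraint inherited by every category (hypothetical rule:
        # register 0 may not be used as a destination).
        return self.dest != 0

    def category_ok(self):
        # Overridden by each child class with its category-specific constraints.
        return True

    def randomize(self, rng, max_tries=10_000):
        # Rejection sampling stands in for a real constraint solver: draw every
        # field, and keep the draw only if all constraints hold.
        for _ in range(max_tries):
            self.opcode = rng.choice(self.OPCODES)
            self.dest, self.src = rng.randrange(16), rng.randrange(16)
            self.imm = rng.randrange(256)
            if self.common_ok() and self.category_ok():
                return True
        return False

    def pack(self):
        # Layout: opcode above bit 16 | dest[15:12] | src[11:8] | imm[7:0]
        return (self.opcode << 16) | (self.dest << 12) | (self.src << 8) | self.imm

    def __str__(self):
        return (f"{self.CATEGORY}: op={self.opcode:#04x} "
                f"dest={self.dest} src={self.src} imm={self.imm}")


class AluInstr(BaseInstr):
    CATEGORY = "alu"
    OPCODES = (0x01, 0x02, 0x03)  # hypothetical ADD, SUB, SHL

    def category_ok(self):
        # Implication: opcode == SHL -> imm < 32 (shift amount).
        return self.opcode != 0x03 or self.imm < 32


class MemInstr(BaseInstr):
    CATEGORY = "mem"
    OPCODES = (0x10, 0x11)  # hypothetical LOAD, STORE

    def category_ok(self):
        # Every memory op uses an aligned offset; implication: LOAD -> dest != src.
        return self.imm % 4 == 0 and (self.opcode != 0x10 or self.dest != self.src)


rng = random.Random(1)
for cls in (AluInstr, MemInstr):
    instr = cls()
    if instr.randomize(rng):
        print(instr, hex(instr.pack()))
```

Each child's `randomize` only ever evaluates the common constraint plus that one category's rules, which mirrors how the hierarchy keeps each solver problem small.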

Test-layer control

The instruction generator was controlled by knobs or switches that allowed a test writer to generate constrained stimulus. In the described architecture, the test layer did not directly constrain subclass-local items. Instead, the upper-layer random sequence was controlled by knobs and selected the opcode category first; that selection determined which subclass object was allocated and added into the sequence. [C3]
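
The category-first flow of the upper-layer sequence can be sketched like this, assuming knobs are integer weights per category. The knob format, the `generate_sequence` function, and the stand-in subclasses are hypothetical. The sequence draws a category from the knob weights, allocates only that category's object, randomizes it, and appends it. The test layer never touches subclass-local fields.

```python
import random


# Minimal stand-ins for the category child classes (hypothetical fields).
class AluInstr:
    CATEGORY = "alu"

    def randomize(self, rng):
        self.opcode = rng.choice((0x01, 0x02, 0x03))
        self.imm = rng.randrange(256)


class MemInstr:
    CATEGORY = "mem"

    def randomize(self, rng):
        self.opcode = rng.choice((0x10, 0x11))
        self.imm = rng.randrange(0, 256, 4)  # aligned offset


class BranchInstr:
    CATEGORY = "branch"

    def randomize(self, rng):
        self.opcode = rng.choice((0x20, 0x21))
        self.imm = rng.randrange(256)


SUBCLASS_FOR = {cls.CATEGORY: cls for cls in (AluInstr, MemInstr, BranchInstr)}


def generate_sequence(n, knobs, rng):
    """Upper layer: pick a category from the knob weights, then allocate and
    randomize only that category's object and add it to the sequence."""
    categories = list(knobs)
    weights = [knobs[c] for c in categories]
    sequence = []
    for _ in range(n):
        category = rng.choices(categories, weights=weights, k=1)[0]
        instr = SUBCLASS_FOR[category]()
        instr.randomize(rng)
        sequence.append(instr)
    return sequence


seq = generate_sequence(1000, {"alu": 70, "mem": 30, "branch": 0}, random.Random(0))
counts = {c: sum(i.CATEGORY == c for i in seq) for c in SUBCLASS_FOR}
print(counts)  # branch weight is 0, so counts["branch"] == 0
```

With this shape, a test writer changes the instruction mix only by editing the weights. Each randomize call stays limited to one category's variables.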

If a test layer must directly control lower-level subclass items, the source describes a two-phase alternative: first randomize a wrapper class that constrains the variables controlled by the tests, then allocate and randomize the correct subclass object in the second generation phase. [C3]
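
A minimal sketch of the two-phase wrapper scheme follows. It assumes, hypothetically, that tests may constrain the category and the destination register, and that a test constraint is a predicate over the wrapper. `InstrWrapper`, `generate`, and the subclass fields are illustrative names only. Phase 1 solves just the test-visible variables. Phase 2 allocates the matching subclass and randomizes the remaining fields with the phase-1 values held fixed.

```python
import random

CATEGORIES = ("alu", "mem", "branch")


class InstrWrapper:
    """Phase 1: holds only the variables a test may constrain directly
    (hypothetically, the category and the destination register)."""

    def randomize(self, rng, test_constraint=lambda w: True, max_tries=10_000):
        # Rejection sampling stands in for the constraint solver.
        for _ in range(max_tries):
            self.category = rng.choice(CATEGORIES)
            self.dest = rng.randrange(1, 16)
            if test_constraint(self):
                return True
        return False


class AluInstr:
    def randomize(self, rng, dest):
        self.dest = dest  # fixed by phase 1, not re-randomized
        self.opcode = rng.choice((0x01, 0x02, 0x03))
        self.imm = rng.randrange(256)


class MemInstr:
    def randomize(self, rng, dest):
        self.dest = dest
        self.opcode = rng.choice((0x10, 0x11))
        self.imm = rng.randrange(0, 256, 4)  # aligned offset


class BranchInstr:
    def randomize(self, rng, dest):
        self.dest = dest
        self.opcode = rng.choice((0x20, 0x21))
        self.imm = rng.randrange(256)


SUBCLASS_FOR = {"alu": AluInstr, "mem": MemInstr, "branch": BranchInstr}


def generate(rng, test_constraint=lambda w: True):
    # Phase 1: solve the test-visible variables in the wrapper.
    wrapper = InstrWrapper()
    if not wrapper.randomize(rng, test_constraint):
        raise RuntimeError("test constraint could not be satisfied")
    # Phase 2: allocate the matching subclass and randomize the rest.
    instr = SUBCLASS_FOR[wrapper.category]()
    instr.randomize(rng, dest=wrapper.dest)
    return instr


instr = generate(random.Random(2), lambda w: w.category == "mem" and w.dest < 4)
print(type(instr).__name__, instr.dest)
```

Because phase 1 commits its values before phase 2 runs, any subclass constraint on those same variables has to remain satisfiable for every value phase 1 can produce.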

Comparison with other approaches

The source contrasts this hierarchical approach with two alternatives:

  • Serial randomization achieved desired speed and memory, but because each portion of the opcode was generated serially, it provided no control over the overall distribution. The result was skewed stimulus and a need for more seeds and simulations to close coverage. [C4]
  • A simple constrained-random approach solved the distribution issue, but for the complex x86 instruction set it reached speed and memory limits and reduced simulation performance. [C4]

Choosing the opcode category before solving the opcode-specific constraints kept the distribution and test-control benefits of constrained randomization while reducing the size of the solver problem. [C1]
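
The distribution skew from serial randomization can be shown with a toy model. The prefix/opcode validity table below is hypothetical and much smaller than x86. The sketch compares exact probabilities for two schemes. The first picks the prefix uniformly and then an opcode legal for it. The second treats every legal (prefix, opcode) pair as one equally likely solution, which is how a joint constrained-random solve behaves when no ordering or weights are imposed.

```python
from fractions import Fraction

# Hypothetical validity table: prefix -> opcode bytes that are legal with it.
VALID = {
    "none": tuple(range(40)),  # 40 legal opcodes
    "0x66": tuple(range(8)),   # 8 legal opcodes
    "0xF3": (0x90, 0xA4),      # 2 legal opcodes
}


def serial_distribution(valid):
    """Pick the prefix uniformly first, then an opcode legal for that prefix."""
    return {
        (prefix, op): Fraction(1, len(valid)) * Fraction(1, len(ops))
        for prefix, ops in valid.items()
        for op in ops
    }


def joint_distribution(valid):
    """Treat each legal (prefix, opcode) pair as one solution, all equally likely."""
    pairs = [(prefix, op) for prefix, ops in valid.items() for op in ops]
    return {pair: Fraction(1, len(pairs)) for pair in pairs}


def share(dist, prefix):
    # Fraction of the generated stimulus that carries this prefix.
    return sum(p for (pfx, _), p in dist.items() if pfx == prefix)


serial, joint = serial_distribution(VALID), joint_distribution(VALID)
print(share(serial, "0xF3"), share(joint, "0xF3"))  # 1/3 1/25
```

In the serial scheme, the sparse 0xF3 prefix takes a third of all stimulus purely because of the generation order. No test knob set that ratio. The hierarchical generator also makes a first choice, but it can be read differently. There, the category weights are knobs the test controls, and the fields inside the chosen category are still solved jointly.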

Profiling and optimization with VCS

The VCS constraint profiler was used to analyze runtime and memory behavior of the generators. It reported runtime data in three categories: cumulative randomize calls, individual randomize calls, and partitions. [C5]

The profiler example shows why cumulative impact can matter more than the slowest single call. A randomize call in op_gen.sv at line 4308 executed quickly, about 6 ms per call on average, but it ran 7,104 times and consumed 44 seconds of CPU time in total. Another randomize problem took 3.2 seconds per call but occurred only twice, so optimizing it would have had little overall impact. [C5]
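
Ranking by cumulative cost versus per-call cost can be reproduced with the figures quoted above. This is an arithmetic sketch, not the VCS report format. The `RandomizeStat` record, its method, and the label for the second call are hypothetical. The source does not give that call's location, and its 6.4-second total assumes both calls took 3.2 seconds.

```python
from dataclasses import dataclass


@dataclass
class RandomizeStat:
    location: str
    calls: int
    total_cpu_s: float

    @property
    def avg_cpu_s(self):
        return self.total_cpu_s / self.calls

    def savings_if_faster(self, factor):
        # CPU time saved if every call to this randomize became `factor` times faster.
        return self.total_cpu_s * (1 - 1 / factor)


stats = [
    RandomizeStat("op_gen.sv:4308", calls=7104, total_cpu_s=44.0),
    # Location not given in the source; assumes both calls took 3.2 s each.
    RandomizeStat("<unnamed call>", calls=2, total_cpu_s=6.4),
]

slowest_single = max(stats, key=lambda s: s.avg_cpu_s)
biggest_total = max(stats, key=lambda s: s.total_cpu_s)
for s in stats:
    print(f"{s.location}: avg {s.avg_cpu_s * 1000:.1f} ms, "
          f"2x faster saves {s.savings_if_faster(2):.1f} s")
```

Making the fast-but-frequent call twice as fast saves 22 seconds. The same speedup on the slow-but-rare call saves about 3.2 seconds.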

VCS can partition a randomize call when unrelated random variables occur in the same call, allowing unrelated variables to be solved independently. The partition table reports the slowest partitions and often correlates with the individual and cumulative randomize-call tables. [C5]
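
The idea behind partitioning is that variables linked, directly or transitively, by some constraint must be solved together, while unlinked groups can be solved independently. One common way to find such groups is union-find over the constraint graph, sketched below. This illustrates the concept only. It does not claim to describe how VCS partitions internally. The variable names and constraint lists are hypothetical.

```python
def partition(variables, constraints):
    """Group random variables linked, directly or transitively, by constraints.

    `constraints` is a list of tuples naming the variables each constraint uses;
    every tuple must name at least one variable.
    """
    parent = {v: v for v in variables}

    def find(v):
        # Follow parent links to the group root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for names in constraints:
        first, *rest = names
        for v in rest:
            union(first, v)

    groups = {}
    for v in variables:  # dict keeps groups in first-variable order
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


print(partition(["opcode", "imm", "dest", "src", "delay"],
                [("opcode", "imm"), ("dest", "src"), ("delay",)]))
```

Adding one constraint that touches both `imm` and `dest` merges the first two groups into a single, larger problem. This is why keeping unrelated variables out of shared constraints helps the solver.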

Solver and performance results

The source reports both runtime and memory improvements for the multi-class architecture. In isolated measurements on two opcodes, the multiple-class architecture was faster with both solvers: the default RACE solver showed a 4x speedup, and the BDD solver showed a 2x speedup. [C6]

Memory also improved for the multiple-class architecture. The memory comparison focused on the BDD solver because RACE memory consumption was typically smaller and was not the limiting factor. The BDD solver elaborates the entire solution space of a randomize call before selecting a solution. This can require substantial memory, although the elaborated solution space is cached to speed up later calls. [C6]

The reported reason for the speed and memory improvement was the smaller set of variables and constraints in the new implementation. Profile data showed the new implementation had 7x fewer constraints than the original, allowing the solver to calculate solutions more efficiently. [C6]
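
Why fewer variables shrink a fully elaborated solution space can be illustrated with plain enumeration plus a cache. Python enumeration stands in for BDD elaboration here; it is not a BDD. The category opcodes and field sizes are hypothetical. The model also assumes that a single monolithic class randomizes every category's field on every call, leaving fields unconstrained when they are unused. The sizes printed are toy numbers, not the source's measurements.

```python
import itertools
import random
from functools import lru_cache

# Hypothetical toy model: each category has its own opcodes and one own field.
CATEGORY_OPCODES = {"alu": range(0, 8), "mem": range(8, 12), "branch": range(12, 14)}
CATEGORY_FIELDS = {"alu": ("imm", 16), "mem": ("offset", 8), "branch": ("target", 8)}


@lru_cache(maxsize=None)
def solution_space(categories):
    """Enumerate every solution up front (a stand-in for BDD elaboration) and cache it.

    With several categories in one problem, every category's field is randomized
    on every call, so the field ranges multiply together.
    """
    opcodes = [op for c in categories for op in CATEGORY_OPCODES[c]]
    field_ranges = [range(CATEGORY_FIELDS[c][1]) for c in categories]
    return tuple(
        (op, *values) for op in opcodes for values in itertools.product(*field_ranges)
    )


def randomize(categories, rng):
    # Later calls for the same problem reuse the cached solution space.
    return rng.choice(solution_space(categories))


monolithic = solution_space(("alu", "mem", "branch"))
per_category = {c: solution_space((c,)) for c in CATEGORY_OPCODES}
print(len(monolithic), {c: len(s) for c, s in per_category.items()})
# 14336 {'alu': 128, 'mem': 32, 'branch': 16}
```

In this toy, even the largest per-category space is over 100 times smaller than the monolithic one. The cache keeps repeated calls to the same problem cheap in either design.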

Key characteristics

  • Selects a high-level opcode category before solving category-specific constraints. [C1]
  • Uses knobs or weights at the test interface to control the generated instruction mix. [C2][C3]
  • Organizes constraints into a base instruction class plus opcode-category child classes. [C2]
  • Preserves distribution and test-level control while reducing the solver problem size. [C1]
  • Uses VCS profiling to identify high-impact randomize calls, partitions, runtime cost, and memory behavior. [C5][C6]

CITATIONS

6 sources
6 citations
[1] C1: Randomizing instructions by first choosing the opcode category simplified the constraint problem and improved memory and speed without sacrificing distribution or test-level control. Generating AMD microcode stimuli using VCS constraint solver
[2] C2: The multi-class architecture split the opcode class into smaller classes, divided opcodes into categories mapped to knobs or weights, used a base class for common members/methods/constraints, and used child classes for opcode-specific constraints. Generating AMD microcode stimuli using VCS constraint solver
[3] C3: The generator was controlled by knobs or switches; the upper-layer random sequence chose the opcode category first, and a wrapper-class two-phase scheme was suggested when tests directly control lower-level items. Generating AMD microcode stimuli using VCS constraint solver
[4] C4: Serial randomization met speed and memory goals but lacked overall distribution control, causing skewed results and more seeds/simulations; simple constrained randomization solved distribution but hit speed and memory limits on the x86 instruction set. Generating AMD microcode stimuli using VCS constraint solver
[5] C5: The VCS constraint profiler reported cumulative, individual, and partition runtime data; `op_gen.sv` line 4308 ran 7,104 times and consumed 44 seconds, while another 3.2-second call occurred only twice; VCS can partition unrelated random variables for independent solving. Generating AMD microcode stimuli using VCS constraint solver
[6] C6: The multiple-class architecture improved runtime and memory; RACE showed a 4x speedup, BDD showed a 2x speedup, BDD memory was a focus because it elaborates the full solution space, and the new implementation had 7x fewer constraints. Generating AMD microcode stimuli using VCS constraint solver

VERSION HISTORY

v2 · 5/28/2026 · gpt-5.5 (current)
v1 · 5/25/2026 · gpt-5.5