Overview
Hierarchical constrained-random test generation is a verification-stimulus technique in which a generator first chooses a high-level category and then randomizes a smaller, category-specific object. In the AMD x86 opcode-generation example, instructions were randomized by first choosing the opcode category so the constraint solver only had to consider constraints specific to that category. This shrank the problem handed to the solver, reducing memory use and runtime without sacrificing the stimulus distribution or test-level control. [C1]
Generator architecture
The reported implementation replaced a larger single-class opcode model with a multi-class hierarchy. To reduce the randomization problem size, the opcode class was split into multiple smaller classes, and opcodes were divided into categories that mapped well to knobs or weights in the test interface. [C2]
The hierarchy used:
- a base instruction class containing data members common to child classes, common constraints, and most methods used to set, print, and pack data; and
- opcode-category child classes containing constraints specific to each opcode set, including implication-operator structures based on opcode type. [C2]
This organization keeps common instruction behavior in the base class while limiting opcode-specific constraints to the subclass selected for the current generation step. [C2]
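The class layout and the category-first selection can be modeled in a few lines. The following is a minimal sketch in Python rather than SystemVerilog, and it is not the source's code: the class names, opcode names, weights, and the size rules are all hypothetical and chosen only for illustration. Python also resolves each field in turn, whereas a constraint solver would solve a subclass's fields together, so the sketch shows the structure and not the solving.

```python
import random


class Instruction:
    """Base class: members, common constraints, and helpers shared by all categories."""
    OPCODES = ()                # overridden by each category subclass
    SIZES = (8, 16, 32, 64)     # hypothetical common constraint on operand size

    def __init__(self, rng):
        self.rng = rng
        self.opcode = None
        self.operand_size = None

    def legal_sizes(self):
        # Subclasses narrow this; the base class only knows the common rule.
        return self.SIZES

    def randomize(self):
        self.opcode = self.rng.choice(self.OPCODES)
        self.operand_size = self.rng.choice(self.legal_sizes())
        return self

    def __str__(self):
        return f"{self.opcode} size={self.operand_size}"


class ArithInstruction(Instruction):
    OPCODES = ("ADD", "SUB")

    def legal_sizes(self):
        # Invented implication for illustration (not a real x86 rule):
        # opcode == SUB implies operand_size != 8.
        if self.opcode == "SUB":
            return tuple(s for s in self.SIZES if s != 8)
        return self.SIZES


class BranchInstruction(Instruction):
    OPCODES = ("JMP", "CALL")

    def legal_sizes(self):
        # Invented category-specific constraint.
        return (8, 32)


CATEGORIES = {"arith": ArithInstruction, "branch": BranchInstruction}


def generate_sequence(weights, length, seed=0):
    """Pick a category by weight first, then randomize only that subclass."""
    rng = random.Random(seed)
    names = list(weights)
    sequence = []
    for _ in range(length):
        category = rng.choices(names, weights=[weights[n] for n in names])[0]
        sequence.append(CATEGORIES[category](rng).randomize())
    return sequence
```

Calling `generate_sequence({"arith": 3, "branch": 1}, 10)` would return ten instruction objects drawn mostly from the arithmetic category. Each object carries only the rules of its own subclass, which is the property the hierarchy is meant to provide.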
Test-layer control
The instruction generator was controlled by knobs or switches that allowed a test writer to generate constrained stimulus. In the described architecture, the test layer did not directly constrain subclass-local items. Instead, the upper-layer random sequence, itself controlled by knobs, selected the opcode category first; that selection determined which subclass object was allocated and added to the sequence. [C3]
If a test layer must directly control lower-level subclass items, the source describes a two-phase alternative: first randomize a wrapper class that constrains the variables controlled by the tests, then allocate and randomize the correct subclass object in the second generation phase. [C3]
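The two-phase alternative can be sketched as follows. This is a Python model under stated assumptions, not the source's implementation: `ControlWrapper`, `MemoryOp`, `ArithOp`, their fields, and the alignment rule are hypothetical names and rules introduced only for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class ControlWrapper:
    """Phase 1 object: holds only the variables a test is allowed to constrain."""
    category: str = ""
    operand_size: int = 0

    def randomize(self, rng, categories, sizes):
        # The test layer narrows these sets; sorting keeps results reproducible.
        self.category = rng.choice(sorted(categories))
        self.operand_size = rng.choice(sorted(sizes))
        return self


class MemoryOp:
    OPCODES = ("LOAD", "STORE")

    def __init__(self, operand_size):
        self.operand_size = operand_size   # fixed by phase 1
        self.opcode = None
        self.aligned = None

    def randomize(self, rng):
        self.opcode = rng.choice(self.OPCODES)
        # Invented subclass-local rule: 64-bit accesses are always aligned.
        self.aligned = True if self.operand_size == 64 else rng.choice((True, False))
        return self


class ArithOp:
    OPCODES = ("ADD", "SUB")

    def __init__(self, operand_size):
        self.operand_size = operand_size   # fixed by phase 1
        self.opcode = None

    def randomize(self, rng):
        self.opcode = rng.choice(self.OPCODES)
        return self


SUBCLASSES = {"memory": MemoryOp, "arith": ArithOp}


def two_phase_generate(rng, categories, sizes):
    # Phase 1: randomize the wrapper under the test's constraints.
    controls = ControlWrapper().randomize(rng, categories, sizes)
    # Phase 2: allocate the matching subclass and randomize the remaining fields.
    instruction = SUBCLASSES[controls.category](controls.operand_size)
    return instruction.randomize(rng)
```

In this sketch a test that passes `categories={"memory"}` and `sizes={64}` controls a subclass-level field (the operand size) without the test layer ever touching the subclass's own constraints.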
Comparison with other approaches
The source contrasts this hierarchical approach with two alternatives:
- Serial randomization met the speed and memory goals, but because each portion of the opcode was generated one after another, it provided no control over the overall distribution. The result was skewed stimulus, which required more seeds and simulations to close coverage. [C4]
- A simple constrained-random approach solved the distribution issue, but for the complex x86 instruction set it reached speed and memory limits and reduced simulation performance. [C4]
Choosing the opcode category before solving the opcode-specific constraints kept the distribution and test-control benefits of constrained randomization while reducing the size of the solver problem. [C1]
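The distribution skew attributed to serial randomization can be reproduced on a toy problem. The sketch below is an illustration only: the two fields `a` and `b`, their ranges, and the constraint `a + b <= 3` are hypothetical, and it assumes an idealized solver that picks uniformly among all legal solutions. It computes exact probabilities instead of sampling.

```python
from fractions import Fraction
from itertools import product

VALUES = range(4)   # both hypothetical fields take values 0..3


def valid(a, b):
    # Toy constraint linking the two fields.
    return a + b <= 3


def joint_distribution():
    """Solve both fields together: every legal (a, b) pair is equally likely."""
    solutions = [(a, b) for a, b in product(VALUES, VALUES) if valid(a, b)]
    probability = Fraction(1, len(solutions))
    return {solution: probability for solution in solutions}


def serial_distribution():
    """Pick a uniformly first, then pick b uniformly among values legal for that a."""
    feasible_a = [a for a in VALUES if any(valid(a, b) for b in VALUES)]
    distribution = {}
    for a in feasible_a:
        legal_b = [b for b in VALUES if valid(a, b)]
        for b in legal_b:
            distribution[(a, b)] = Fraction(1, len(feasible_a)) * Fraction(1, len(legal_b))
    return distribution
```

There are ten legal pairs, so the joint solve gives each a probability of 1/10. The serial version gives the pair (3, 0) a probability of 1/4 and each pair with `a == 0` a probability of 1/16, because choosing `a` first ignores how many completions each value of `a` has. Reaching the rare pairs then takes more samples, which mirrors the extra seeds and simulations described above.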
Profiling and optimization with VCS
The VCS constraint profiler was used to analyze runtime and memory behavior of the generators. It reported runtime data in three categories: cumulative randomize calls, individual randomize calls, and partitions. [C5]
The profiler example shows why cumulative impact can matter more than the slowest single call. A randomize call in op_gen.sv at line 4308 executed quickly each time, but it ran 7,104 times and consumed 44 seconds of CPU time in total. Another randomize problem took 3.2 seconds in a single call but occurred only twice, so optimizing it would have little overall impact. [C5]
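The arithmetic behind that comparison can be checked directly. The short Python sketch below uses only the figures quoted above; the record layout and the function name are hypothetical, the location of the second call is not given in the source, and the 6.4-second total assumes both of its invocations cost about 3.2 seconds.

```python
def rank_by_cumulative_time(records):
    """Sort profile records by total CPU time, highest first."""
    return sorted(records, key=lambda r: r["total_s"], reverse=True)


frequent_call = {
    "where": "op_gen.sv:4308",
    "calls": 7104,
    "total_s": 44.0,                 # cumulative CPU time from the profile
}
slow_call = {
    "where": "(location not given in the source)",
    "calls": 2,
    "total_s": 3.2 * 2,              # assumes each of the two calls took about 3.2 s
}

# Average cost of one invocation of the frequent call, in milliseconds.
per_call_ms = frequent_call["total_s"] / frequent_call["calls"] * 1000

ranked = rank_by_cumulative_time([slow_call, frequent_call])
```

The frequent call averages roughly 6.2 ms per invocation, about 500 times cheaper than the 3.2-second call, yet its 44-second total is nearly seven times the 6.4 seconds the slow call can account for. Ranking by cumulative time therefore puts the line 4308 call first.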
VCS can partition a randomize call when unrelated random variables occur in the same call, allowing unrelated variables to be solved independently. The partition table reports the slowest partitions and often correlates with the individual and cumulative randomize-call tables. [C5]
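The idea of partitioning can be illustrated with a small connected-components model: variables that share a constraint, directly or through a chain of constraints, must be solved together, and everything else can be solved separately. The Python sketch below is a conceptual model only and makes no claim about how VCS implements partitioning; the function name and the variable names in the usage note are hypothetical.

```python
def partition(variables, constraints):
    """Group variables into independent sets.

    variables:   iterable of variable names
    constraints: iterable of groups, each group listing the variables
                 that one constraint refers to
    """
    variables = list(variables)
    parent = {v: v for v in variables}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for group in constraints:
        group = list(group)
        for other in group[1:]:
            # Merge every variable of the constraint into one set.
            parent[find(other)] = find(group[0])

    partitions = {}
    for v in variables:
        partitions.setdefault(find(v), []).append(v)
    return sorted(sorted(members) for members in partitions.values())
```

For example, `partition(["opcode", "size", "imm", "delay"], [("opcode", "size"), ("size", "imm")])` places `opcode`, `size`, and `imm` in one partition and leaves `delay` in a partition of its own, since no constraint links it to the others.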
Solver and performance results
The source reports both runtime and memory improvements for the multi-class architecture. In isolated measurements on two opcodes, the multiple-class architecture was faster with both solvers: the default RACE solver showed a 4x speedup, and the BDD solver showed a 2x speedup. [C6]
Memory also improved for the multiple-class architecture. The memory comparison focused on the BDD solver because RACE memory consumption was typically smaller and not the limiting factor. The BDD solver elaborates the entire solution space for a randomize call before selecting a solution, which can require substantial memory, although the solution space is cached to speed later calls. [C6]
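The trade-off described for the BDD solver, paying memory up front to elaborate the solution space and then reusing it, can be modeled with a toy solver. The Python sketch below is an assumption-laden stand-in: it stores an explicit list of solutions instead of a binary decision diagram, and the class name, its attributes, and the example constraint in the usage note are hypothetical.

```python
import random
from itertools import product


class ElaboratingSolver:
    """Toy solver that enumerates all solutions once and caches them."""

    def __init__(self, domains, constraint):
        self.domains = domains          # dict: variable name -> iterable of values
        self.constraint = constraint    # callable taking the variables as keywords
        self._solutions = None          # cached solution space
        self.elaborations = 0           # how many times the space was built

    def _elaborate(self):
        self.elaborations += 1
        names = list(self.domains)
        solutions = []
        for values in product(*(self.domains[n] for n in names)):
            candidate = dict(zip(names, values))
            if self.constraint(**candidate):
                solutions.append(candidate)
        return solutions

    def randomize(self, rng):
        # The first call pays for elaboration; later calls reuse the cache.
        if self._solutions is None:
            self._solutions = self._elaborate()
        if not self._solutions:
            raise ValueError("constraints are unsatisfiable")
        return dict(rng.choice(self._solutions))
```

With `ElaboratingSolver({"a": range(4), "b": range(4)}, lambda a, b: a + b <= 3)`, the first `randomize` call builds and stores the legal solutions and every later call only picks from that stored list. In this model the stored list grows with the number of variables and the size of their domains, which is consistent with smaller per-category classes needing less memory.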
The reported reason for the speed and memory improvement was the smaller set of variables and constraints in the new implementation. Profile data showed the new implementation had 7x fewer constraints than the original, allowing the solver to calculate solutions more efficiently. [C6]
Key characteristics
- Selects a high-level opcode category before solving category-specific constraints. [C1]
- Uses knobs or weights at the test interface to control the generated instruction mix. [C2][C3]
- Organizes constraints into a base instruction class plus opcode-category child classes. [C2]
- Preserves distribution and test-level control while reducing the solver problem size. [C1]
- Uses VCS profiling to identify high-impact randomize calls, partitions, runtime cost, and memory behavior. [C5][C6]