Overview
The Synopsys VCS Constraint Solver was used in an AMD/Synopsys case study to generate microcode test sequences for x86 instruction verification. The approach used SystemVerilog constraint-language constructs to describe legal combinations of opcode attributes and to control value distributions for individual fields. The motivation was to replace sequential field randomization, which the authors reported produced verbose and redundant code and offered limited control over distributions. [C1]
Generator architecture
The described opcode generator used two layers. The upper layer used a SystemVerilog random sequence construct with weighted knobs to control high-level distribution. The lower layer randomized an opcode class with additional constraints and weights supplied by the upper layer. Test inputs consisted of weighted values that directed the generator toward the required instruction mix, and the constraint solver applied those weights to control opcode-type distribution. [C2]
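The two-layer flow can be modeled in a few lines. The sketch below is written in Python rather than SystemVerilog and is only an illustration: the category names, field names, and weights are hypothetical and do not come from the case study. The upper layer picks an opcode category from test-supplied weights, and the lower layer then randomizes the fields of that category.

```python
import random

# Hypothetical test-level knobs: relative weights for each opcode category.
CATEGORY_WEIGHTS = {"alu": 6, "load_store": 3, "branch": 1}

# Hypothetical lower-layer weights for one field, per category.
FIELD_WEIGHTS = {
    "alu": {"operand_size": {8: 1, 16: 1, 32: 2}},
    "load_store": {"operand_size": {32: 1, 64: 3}},
    "branch": {"operand_size": {32: 1}},
}


def weighted_pick(rng, weights):
    """Pick one key with probability proportional to its weight."""
    keys = list(weights)
    return rng.choices(keys, weights=[weights[k] for k in keys], k=1)[0]


def generate_opcode(rng, category_weights=CATEGORY_WEIGHTS):
    # Upper layer: weighted choice of the opcode category.
    category = weighted_pick(rng, category_weights)
    # Lower layer: randomize the fields that belong to that category.
    fields = {
        name: weighted_pick(rng, weights)
        for name, weights in FIELD_WEIGHTS[category].items()
    }
    return {"category": category, **fields}


def generate_sequence(seed, length):
    """Build a reproducible sequence of opcodes from a seed."""
    rng = random.Random(seed)
    return [generate_opcode(rng) for _ in range(length)]
```

A test that wants only branches could call `generate_opcode(rng, {"branch": 1})`, which mirrors the idea of steering the instruction mix through weights instead of editing the generator.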
Single-class and multi-class constraint models
The initial generator prototype used a single class containing constraints for all opcodes. This coding style was flexible because constraints could be applied between any data members in the opcode class, but it also presented the solver with many random variables and a large constraint set. In the reported implementation, the opcode class had approximately 100 random variables and 800 constraint equations. [C3]
To reduce randomization problem size, the authors then used an object-oriented hierarchy with a base class for global opcode constraints and subclasses for groups of related opcodes with similar constraints. Partitioning constraints into smaller opcode groups reduced memory requirements and improved performance. [C4]
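The hierarchy can be sketched as follows, again in Python and with hypothetical fields and rules. A base class holds the constraints that apply to every opcode, and each subclass adds only the constraints of its own group, so any single randomization sees a small constraint set. The rejection-sampling `randomize` method is a stand-in for a real constraint solver and is an assumption of this sketch.

```python
import random


class Opcode:
    """Base class: fields and constraints shared by all opcodes (hypothetical)."""

    fields = {"operand_size": (8, 16, 32, 64), "uses_memory": (False, True)}

    def constraints(self):
        # Illustrative global rule: 8-bit operations never touch memory.
        return [lambda v: not (v["operand_size"] == 8 and v["uses_memory"])]

    def randomize(self, rng, max_tries=1000):
        """Naive rejection sampling standing in for a constraint solver."""
        checks = self.constraints()
        for _ in range(max_tries):
            values = {name: rng.choice(dom) for name, dom in self.fields.items()}
            if all(check(values) for check in checks):
                return values
        raise RuntimeError("no legal solution found")


class AluOpcode(Opcode):
    """Group of ALU opcodes: adds one field and one group-specific rule."""

    fields = {**Opcode.fields, "alu_op": ("add", "sub", "xor")}

    def constraints(self):
        # Illustrative rule: xor is register-only in this toy model.
        return super().constraints() + [
            lambda v: not v["uses_memory"] or v["alu_op"] != "xor"
        ]


class LoadStoreOpcode(Opcode):
    """Group of memory opcodes: always uses memory, at least 16 bits wide."""

    fields = {**Opcode.fields, "direction": ("load", "store")}

    def constraints(self):
        return super().constraints() + [
            lambda v: v["uses_memory"],
            lambda v: v["operand_size"] >= 16,
        ]
```

Randomizing a `LoadStoreOpcode` never evaluates the ALU rule, and the ALU field is not among its random variables, which is the size reduction the partitioning aims for.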
Solver behavior and profiling
The case study distinguishes the default RACE solver from the BDD solver. The BDD solver elaborates the entire solution space of a randomize call before choosing a solution. This can require significant memory and elaboration time, but the computed solution space is cached to accelerate subsequent randomization calls. The authors reported that BDD works well for architectures where the randomization problem does not consume excessive memory and the same randomize call is repeated many times, such as CPU opcode generation. [C5]
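The trade-off between elaborating once and sampling many times can be illustrated with a toy solver. This Python sketch does not reproduce how the VCS BDD solver works internally; it enumerates every legal assignment as a plain list, caches it, and then draws from the cache. The class name `CachedSolver` and its interface are hypothetical.

```python
import itertools
import random


class CachedSolver:
    """Toy model: elaborate the full solution space once, then reuse it."""

    def __init__(self, domains, constraints):
        self.domains = domains          # field name -> iterable of values
        self.constraints = constraints  # predicates over a candidate dict
        self._solutions = None          # cache, filled on first randomize()
        self.elaborations = 0           # how many times we paid the full cost

    def _elaborate(self):
        # Expensive step: visit every combination and keep the legal ones.
        names = list(self.domains)
        solutions = []
        for values in itertools.product(*(self.domains[n] for n in names)):
            candidate = dict(zip(names, values))
            if all(check(candidate) for check in self.constraints):
                solutions.append(candidate)
        self.elaborations += 1
        return solutions

    def randomize(self, rng):
        # Cheap step after the first call: pick from the cached space.
        if self._solutions is None:
            self._solutions = self._elaborate()
        if not self._solutions:
            raise ValueError("constraints are unsatisfiable")
        return dict(rng.choice(self._solutions))
```

The memory cost grows with the size of the cached space, and the up-front cost is repaid only when the same problem is randomized repeatedly, which matches the conditions the authors describe for opcode generation.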
VCS profiling data was used to inspect randomize CPU runtime and memory usage. The VCS 2009.12 release also provided a testcase extraction feature that could automatically extract the slowest partition from each randomize call. [C6]
Reported performance characteristics
In the reported comparison, the multi-class architecture ran faster than the single-class architecture with both solvers and for both tested opcodes. The default RACE solver showed a 4x speedup, while the BDD solver showed a 2x speedup. Memory requirements were also reported to be significantly lower for the multi-class architecture; memory results were measured only for BDD because RACE memory consumption was typically smaller and not the limiting factor. [C7]
The authors attributed the runtime and memory improvements mainly to reducing the number of variables and constraints in the randomized problem. Their profile data showed that the new implementation had 7x fewer constraints than the original, allowing the solver to compute solutions more efficiently. [C8]
Practical implication
The case study concludes that serial randomization met speed and memory goals but caused distribution problems, while a simple constrained-random approach improved distribution but became too slow and memory-intensive for the complex x86 instruction set. Randomizing instructions by first choosing the opcode category simplified the solver problem because only category-specific constraints were present, improving memory and speed while preserving distribution and test-level control. [C9]
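One common way serial randomization can skew a distribution is shown below. The case study does not say which specific effects the authors observed, so this Python sketch is only an illustrative assumption: two hypothetical fields `a` and `b`, each in 0..2, with the rule `a + b <= 2`. It computes exact probabilities for solving both fields together versus picking `a` first and then `b`.

```python
from fractions import Fraction
from itertools import product

A_VALUES = range(3)
B_VALUES = range(3)


def legal(a, b):
    # Hypothetical cross-field constraint.
    return a + b <= 2


def joint_distribution():
    """All legal (a, b) pairs are equally likely."""
    solutions = [(a, b) for a, b in product(A_VALUES, B_VALUES) if legal(a, b)]
    p = Fraction(1, len(solutions))
    return {solution: p for solution in solutions}


def serial_distribution():
    """Pick a uniformly first, then b uniformly among values legal for that a."""
    dist = {}
    legal_a = [a for a in A_VALUES if any(legal(a, b) for b in B_VALUES)]
    for a in legal_a:
        legal_b = [b for b in B_VALUES if legal(a, b)]
        for b in legal_b:
            dist[(a, b)] = Fraction(1, len(legal_a)) * Fraction(1, len(legal_b))
    return dist
```

In this toy problem the joint approach gives each of the six legal pairs probability 1/6, while the serial approach gives (2, 0) probability 1/3 and (0, 0) only 1/9, because the first field is chosen without regard to how many completions it allows.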