RACE Solver Wiki — STIMSMITH

Overview

RACE Solver is described in the evidence as the default solver in a VCS constraint-solving workflow used for constrained-random generation of AMD x86 microcode/opcode stimuli. The reported work compared solver behavior across opcode-generation architectures and contrasted RACE with a BDD solver in terms of runtime and memory characteristics. [C1]

Use in constrained-random opcode generation

The study evaluated different approaches to generating x86 opcodes. Serial randomization achieved acceptable speed and memory use, but because the portions of each opcode were generated one after another, there was no control over the overall distribution. A simple constrained-random approach addressed the distribution issue, but the complexity of the x86 instruction set pushed it into speed and memory limits, reducing simulation performance. [C2]
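
The distribution problem with serial generation can be illustrated with a toy model. The following is a minimal Python sketch, not the study's testbench: the two fields (prefix, operand) and the single constraint are hypothetical, chosen only to show how picking fields one after another skews the result compared with choosing uniformly over all legal combinations.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-field "opcode": these domains are illustrative only.
PREFIXES = (0, 1)
OPERANDS = (0, 1, 2, 3)


def legal(prefix, operand):
    # Hypothetical constraint: prefix 1 only allows operand 0.
    return prefix == 0 or operand == 0


def serial_distribution():
    """Exact probabilities when the prefix is picked first, then the operand."""
    dist = {}
    for prefix in PREFIXES:
        choices = [op for op in OPERANDS if legal(prefix, op)]
        for op in choices:
            dist[(prefix, op)] = Fraction(1, len(PREFIXES)) * Fraction(1, len(choices))
    return dist


def joint_distribution():
    """Exact probabilities when all legal combinations are equally likely."""
    solutions = [(p, o) for p, o in product(PREFIXES, OPERANDS) if legal(p, o)]
    return {s: Fraction(1, len(solutions)) for s in solutions}


serial = serial_distribution()
joint = joint_distribution()
print("serial P(prefix=1, operand=0):", serial[(1, 0)])
print("joint  P(prefix=1, operand=0):", joint[(1, 0)])
```

In this toy case the serial scheme produces the combination (1, 0) half the time, while a uniform choice over the five legal combinations produces it one time in five. The later field has no way to influence how often the earlier one is chosen.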

The improved architecture randomized instructions by first choosing the opcode category. This reduced the constraint problem because only constraints specific to the selected opcode category were present. The evidence reports that this simplified the solver problem and improved both memory and speed without sacrificing distribution or test-level control. [C3]
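
The category-first idea can be sketched as follows. This is an illustrative Python model under stated assumptions, not the SystemVerilog class hierarchy from the study: the category names, fields, weights, and constraints are all hypothetical, and simple rejection sampling stands in for a real constraint solver.

```python
import random

# Hypothetical categories, each with its own field domains and constraints.
CATEGORIES = {
    "alu": {
        "fields": {"dst": range(8), "src": range(8)},
        "constraints": [lambda f: f["dst"] != f["src"]],
    },
    "load": {
        "fields": {"dst": range(8), "base": range(8), "disp": range(-4, 5)},
        "constraints": [
            lambda f: f["base"] != 4,
            lambda f: f["disp"] % 2 == 0,
        ],
    },
    "branch": {
        "fields": {"cond": range(16), "offset": range(-8, 8)},
        "constraints": [lambda f: f["offset"] != 0],
    },
}

# Hypothetical test-level knob: relative weight of each category.
WEIGHTS = {"alu": 5, "load": 3, "branch": 2}

TOTAL_CONSTRAINTS = sum(len(spec["constraints"]) for spec in CATEGORIES.values())


def active_constraints(category):
    """Number of constraints the solver sees once the category is fixed."""
    return len(CATEGORIES[category]["constraints"])


def randomize_instruction(rng):
    # Step 1: choose the category first, under test-level weights.
    names = list(CATEGORIES)
    category = rng.choices(names, weights=[WEIGHTS[n] for n in names])[0]
    spec = CATEGORIES[category]
    # Step 2: solve only that category's constraints (rejection sampling
    # is a stand-in for the real solver).
    while True:
        fields = {name: rng.choice(domain) for name, domain in spec["fields"].items()}
        if all(check(fields) for check in spec["constraints"]):
            return category, fields


rng = random.Random(0)
sample = [randomize_instruction(rng) for _ in range(1000)]
```

Because the category is fixed before the fields are solved, each call only sees that category's constraints (at most two of the four in this sketch), while the weights keep the category mix under test-level control.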

Runtime behavior

In the reported runtime comparison, the multiple-class architecture was faster than the single-class architecture when using either solver. For the default RACE solver, the multiple-class architecture produced a 4x speedup. For the BDD solver, the same architectural change produced a 2x speedup. [C4]

Memory behavior

The evidence states that memory requirements were significantly lower with the multiple-class architecture. Memory measurements were reported only for the BDD solver because RACE memory consumption was typically smaller and not a limiting factor. [C5]

Profiling and performance analysis

The performance improvement was attributed primarily to reducing the number of variables and constraints in the newer implementation. The profile data showed that the multiple-class implementation had 7x fewer constraints than the original single-class implementation, enabling the solver to calculate solutions more efficiently. [C6]

The VCS 2009.12 release also provided a testcase extraction feature that could automatically extract the slowest partition from each randomize call. In the reported methodology, profile data was used to identify randomize results for two opcodes, and a small testbench randomized those opcodes repeatedly so solver CPU time could be measured in isolation from other testbench effects. [C7]
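
The isolation technique, repeating a single randomize call in a small harness and measuring only CPU time, can be sketched in Python. This is an analogy rather than the VCS flow: randomize_opcode is a hypothetical placeholder for the call under study, and the two opcode values are arbitrary.

```python
import random
import time


def randomize_opcode(rng, opcode):
    # Hypothetical placeholder for the randomize call being profiled:
    # rejection-sample two operands under a made-up constraint.
    while True:
        a, b = rng.randrange(256), rng.randrange(256)
        if (a ^ b) % 3 == opcode % 3:
            return a, b


def measure_cpu_time(opcode, iterations, seed=0):
    """CPU seconds spent in repeated randomize calls, with nothing else running."""
    rng = random.Random(seed)
    start = time.process_time()  # process CPU time, not wall-clock time
    for _ in range(iterations):
        randomize_opcode(rng, opcode)
    return time.process_time() - start


# Two arbitrary opcode values chosen for illustration.
timings = {op: measure_cpu_time(op, 10_000) for op in (0x01, 0x8B)}
for op, seconds in timings.items():
    print(f"opcode {op:#04x}: {seconds:.4f} s CPU for 10000 calls")
```

Using process CPU time and a loop that does nothing except randomize keeps file I/O, scoreboard work, and other testbench effects out of the measurement.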

Contrast with BDD solver behavior

The BDD solver is described as elaborating the entire solution space of a randomize call before selecting a solution. This can require large amounts of memory and time, although the solution space is cached to speed up later randomization calls. The evidence notes that the BDD solver can work well for specific architectures, particularly when the randomization problem does not consume excessive memory and the same randomize call occurs many times, as in CPU opcode generation. [C8]
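
The elaborate-once, reuse-many-times trade-off can be illustrated with a small Python sketch. It does not build a binary decision diagram. Explicit enumeration stands in for the elaborated solution space, and the field names and constraints are hypothetical.

```python
import random
from functools import lru_cache
from itertools import product

# Hypothetical field domains for a single randomize call.
DOMAINS = {"mod": range(4), "reg": range(8), "rm": range(8)}


def satisfies(fields):
    # Hypothetical constraints on the fields.
    return fields["reg"] != fields["rm"] and not (
        fields["mod"] == 3 and fields["rm"] == 4
    )


@lru_cache(maxsize=None)
def solution_space():
    """Elaborate every legal assignment once; later calls reuse the cache."""
    names = tuple(DOMAINS)
    return tuple(
        values
        for values in product(*(DOMAINS[n] for n in names))
        if satisfies(dict(zip(names, values)))
    )


def randomize(rng):
    # The first call pays the elaboration cost; later calls only pick an entry.
    values = rng.choice(solution_space())
    return dict(zip(DOMAINS, values))


rng = random.Random(0)
draws = [randomize(rng) for _ in range(5)]
print(len(solution_space()), "legal assignments held in memory")
```

The whole solution space stays in memory, which is the cost the text describes, and the benefit only appears when the same randomize call is made many times.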