STIMSMITH

RACE Solver

Tool WIKI v1 · 5/26/2026

RACE Solver is identified in the provided evidence as the default constraint solver in a VCS-based comparison of constrained-random stimulus generation for AMD x86 opcodes. In that study, moving to a multiple-class opcode-generation architecture made runs with RACE 4x faster. RACE memory use was described as typically smaller than that of the BDD solver and not a limiting factor.

Overview

RACE Solver is described in the evidence as the default solver in a VCS constraint-solving workflow used for constrained-random generation of AMD x86 microcode/opcode stimuli. The reported work compared solver behavior across opcode-generation architectures and contrasted RACE with a BDD solver in terms of runtime and memory characteristics. [C1]

Use in constrained-random opcode generation

The study evaluated different approaches to generating x86 opcodes. Serial randomization achieved acceptable speed and memory, but it produced a distribution problem because opcode portions were generated serially and there was no control over the overall distribution. A simple constrained-random approach addressed the distribution issue, but the complexity of the x86 instruction set pushed the approach into speed and memory limits, reducing simulation performance. [C2]
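The distribution problem with serial randomization can be illustrated with a toy model. The sketch below is not taken from the study: the two-field "opcode", the `LEGAL` table, and the function names are all hypothetical, and it shows only one way that field-by-field generation can skew the overall distribution compared with choosing among all legal encodings at once.

```python
import random
from collections import Counter

# Hypothetical toy encoding: an "opcode" is a (group, operand) pair.
# Group "A" has 1 legal operand and group "B" has 9, so 10 legal opcodes exist.
LEGAL = {"A": [0], "B": list(range(1, 10))}

def serial_randomize(rng):
    """Pick each field in turn: the group first, then an operand legal for it."""
    group = rng.choice(sorted(LEGAL))
    return group, rng.choice(LEGAL[group])

def joint_randomize(rng):
    """Pick uniformly from the full set of legal (group, operand) pairs."""
    pairs = [(g, v) for g in sorted(LEGAL) for v in LEGAL[g]]
    return rng.choice(pairs)

def frequency_of(target, draw, n=20000, seed=1):
    """Return the fraction of n draws that produced the target opcode."""
    rng = random.Random(seed)
    counts = Counter(draw(rng) for _ in range(n))
    return counts[target] / n

serial_share = frequency_of(("A", 0), serial_randomize)
joint_share = frequency_of(("A", 0), joint_randomize)
print(f"share of opcode ('A', 0): serial={serial_share:.2f}, joint={joint_share:.2f}")
```

In this toy model the serial generator returns opcode `("A", 0)` in roughly half of the draws, because the group is fixed before the number of legal operands is considered, while the joint generator returns it in roughly one draw in ten.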

The improved architecture randomized instructions by first choosing the opcode category. This reduced the constraint problem because only constraints specific to the selected opcode category were present. The evidence reports that this simplified the solver problem and improved both memory and speed without sacrificing distribution or test-level control. [C3]
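A minimal sketch of the category-first idea follows, assuming a hypothetical set of categories, weights, and constraints (none of them come from the study). Rejection sampling stands in for the real constraint solver purely to keep the example self-contained; the point is that only the selected category's constraints are evaluated for each instruction.

```python
import random

# Hypothetical categories and per-category constraints (illustrative only).
CATEGORY_CONSTRAINTS = {
    "alu": [lambda op: op["dst"] != op["src"]],
    "load": [lambda op: op["src"] % 4 == 0, lambda op: op["dst"] < 8],
    "branch": [lambda op: op["dst"] == 0],
}

# Hypothetical test-level knob controlling the category distribution.
CATEGORY_WEIGHTS = {"alu": 6, "load": 3, "branch": 1}

def randomize_opcode(rng, max_tries=1000):
    """Choose a category first, then satisfy only that category's constraints."""
    categories = sorted(CATEGORY_WEIGHTS)
    weights = [CATEGORY_WEIGHTS[c] for c in categories]
    category = rng.choices(categories, weights=weights)[0]
    active = CATEGORY_CONSTRAINTS[category]  # the only constraints in play
    for _ in range(max_tries):
        # Rejection sampling is a stand-in for a real constraint solver.
        op = {"category": category, "dst": rng.randrange(16), "src": rng.randrange(16)}
        if all(check(op) for check in active):
            return op
    raise RuntimeError(f"no solution found for category {category!r}")

print(randomize_opcode(random.Random(0)))
```

Keeping the weights in a separate table is one way to preserve test-level control over the distribution while each individual solve stays small.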

Runtime behavior

In the reported runtime comparison, the multiple-class architecture was faster than the single-class architecture when using either solver. For the default RACE solver, the multiple-class architecture produced a 4x speedup. For the BDD solver, the same architectural change produced a 2x speedup. [C4]

Memory behavior

The evidence states that memory requirements were significantly better with the multiple-class architecture. Memory measurements were reported only for the BDD solver because RACE memory consumption was typically smaller and not a limiting factor. [C5]

Profiling and performance analysis

The performance improvement was attributed primarily to reducing the number of variables and constraints in the newer implementation. The profile data showed that the multiple-class implementation had 7x fewer constraints than the original single-class implementation, enabling the solver to calculate solutions more efficiently. [C6]

The VCS 2009.12 release also provided a testcase extraction feature that could automatically extract the slowest partition from each randomize call. In the reported methodology, profile data was used to identify randomize results for two opcodes, and a small testbench randomized those opcodes repeatedly so solver CPU time could be measured in isolation from other testbench effects. [C7]
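The idea of a small testbench that repeats one randomize call so that solver CPU time is measured in isolation can be sketched in general terms. The code below is an analogy in Python, not the VCS flow: `randomize_once` is a hypothetical stand-in for the randomize call under test, and `time.process_time` is used so that only CPU time of the current process is counted.

```python
import random
import time

ITERATIONS = 10000  # hypothetical repeat count

def randomize_once(rng):
    """Hypothetical stand-in for a single randomize call under test."""
    while True:
        a, b = rng.randrange(256), rng.randrange(256)
        if (a + b) % 7 == 0 and a < b:
            return a, b

def measure_cpu_seconds(call, iterations=ITERATIONS, seed=0):
    """Repeat one call in a tight loop and return the CPU time it consumed."""
    rng = random.Random(seed)
    start = time.process_time()
    for _ in range(iterations):
        call(rng)
    return time.process_time() - start

elapsed = measure_cpu_seconds(randomize_once)
print(f"{elapsed / ITERATIONS * 1e6:.1f} microseconds of CPU time per call")
```

Because nothing else runs inside the loop, the measurement reflects the cost of the repeated call rather than the rest of the testbench.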

Contrast with BDD solver behavior

The BDD solver is described as elaborating the entire solution space of a randomize call before selecting a solution. This can require large amounts of memory and time, although the solution space is cached to speed up later randomization calls. The evidence notes that the BDD solver can work well for specific architectures, particularly when the randomization problem does not consume excessive memory and the same randomize call occurs many times, as in CPU opcode generation. [C8]
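The elaborate-once, reuse-many-times trade-off can be sketched as follows. This is a simplified illustration and not the BDD solver itself: a real BDD represents the solution space as a compact decision diagram, whereas this hypothetical `EnumeratingSolver` stores an explicit list of solutions. The class, its methods, and the `alu` example are assumptions made for the sketch.

```python
import itertools
import random

class EnumeratingSolver:
    """Toy solver: elaborates every solution of a call once, then caches it."""

    def __init__(self):
        self._cache = {}        # call_id -> list of all legal solutions
        self.elaborations = 0   # how many times a solution space was built

    def randomize(self, call_id, domains, constraint, rng):
        if call_id not in self._cache:
            # Expensive step: build the entire solution space up front.
            self.elaborations += 1
            names = sorted(domains)
            candidates = (
                dict(zip(names, values))
                for values in itertools.product(*(domains[n] for n in names))
            )
            self._cache[call_id] = [s for s in candidates if constraint(s)]
        solutions = self._cache[call_id]
        if not solutions:
            raise ValueError(f"constraints for {call_id!r} have no solution")
        # Cheap step: later calls only pick from the cached space.
        return rng.choice(solutions)

solver = EnumeratingSolver()
rng = random.Random(0)
domains = {"dst": range(16), "src": range(16)}
picks = [
    solver.randomize("alu", domains, lambda s: s["dst"] != s["src"], rng)
    for _ in range(1000)
]
print(solver.elaborations, len(solver._cache["alu"]))
```

Memory grows with the size of the cached solution space, which is why this style pays off mainly when the space is manageable and the same call repeats many times.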

CITATIONS

[C1] RACE is identified as the default solver in the VCS comparison, and the study compares it with a BDD solver. Source: "Generating AMD microcode stimuli using VCS constraint solver"
[C2] Serial randomization had acceptable speed and memory but caused skewed distribution, while simple constrained randomization addressed distribution but encountered speed and memory limits on the complex x86 instruction set. Source: "Generating AMD microcode stimuli using VCS constraint solver"
[C3] Choosing the opcode category before randomizing instructions reduced the solver problem to category-specific constraints and improved speed and memory without sacrificing distribution or test-level control. Source: "Generating AMD microcode stimuli using VCS constraint solver"
[C4] The multiple-class architecture was faster with both solvers; the default RACE solver showed a 4x speedup, and the BDD solver showed a 2x speedup. Source: "Generating AMD microcode stimuli using VCS constraint solver"
[C5] Memory requirements improved with the multiple-class architecture, and RACE memory consumption was typically smaller and not a limiting factor, so memory was measured only for the BDD solver. Source: "Generating AMD microcode stimuli using VCS constraint solver"
[C6] The newer multiple-class implementation had 7x fewer constraints than the original, which was identified as the main reason for acceleration and reduced memory use. Source: "Generating AMD microcode stimuli using VCS constraint solver"
[C7] The evaluation used profile data and a small repeated-randomization testbench to isolate solver CPU time, and VCS 2009.12 provided automatic extraction of the slowest partition from each randomize call. Source: "Generating AMD microcode stimuli using VCS constraint solver"
[C8] The BDD solver elaborates the entire solution space before selecting a solution, which can consume significant memory and time, though caching can speed subsequent calls; it works well when memory is manageable and calls repeat often. Source: "Generating AMD microcode stimuli using VCS constraint solver"