Overview
The BDD Solver is a constraint-solving mode discussed in the context of VCS constrained-random stimulus generation for AMD microcode. In this mode, the solver elaborates the entire solution space of a randomize call before selecting a solution. This can require large amounts of memory and some elaboration time, but the elaborated solution space is cached to speed up later randomization calls. [C1]
Operating model
For a given randomization problem, the BDD Solver performs an up-front elaboration of the full solution space. The evidence specifically notes two consequences of this model:
- It can take significant memory because the entire solution space is elaborated. [C1]
- It can take time before the first solution is selected, but the resulting solution space is cached for subsequent randomization calls. [C1]
This makes the solver better suited to cases where the same randomize call is executed many times, provided the randomization problem does not exceed practical memory limits. CPU opcode generation is given as an example of such a workload. [C2]
Performance characteristics
In the reported VCS study, two randomization architectures were compared: a single-class architecture and a multiple-class architecture. The multiple-class architecture was faster with both the default RACE solver and the BDD Solver. The reported speedup was approximately 4x for RACE and 2x for the BDD Solver. [C3]
Memory results were measured for the BDD Solver only, because RACE memory consumption was described as typically smaller and not a limiting factor. The multiple-class architecture also showed significantly better memory requirements for the BDD Solver. [C4]
Profiling and diagnosis
The source describes VCS constraint-profile data for cumulative randomize CPU runtime, individual randomize CPU runtime, individual partition CPU runtime, and memory usage. These profiles are presented as especially useful for BDD Solver analysis because of the solver's full-solution-space elaboration behavior. [C5]
The VCS 2009.12 release is also reported to include a testcase extraction feature that can automatically extract the slowest partition from each randomize call. [C6]
Design implications
The study attributes the runtime and memory improvements of the multiple-class architecture to a smaller set of variables and constraints. The newer implementation reportedly had 7x fewer constraints than the original, allowing solutions to be calculated more efficiently. [C7]
For x86 opcode generation, the article concludes that serial randomization achieved speed and memory goals but caused distribution problems, while a simple constrained-random approach improved distribution but reached speed and memory limits. The improved method first chooses an opcode category, so the solver only sees constraints relevant to that category. This simplified the problem and improved memory and speed without sacrificing distribution or test-level control. [C8]