Multi-Class Randomization Architecture Wiki

Overview

Multi-Class Randomization Architecture is a hierarchical constrained-random technique for generating microcode or opcode stimuli. It was described as an alternative to both serial field randomization and a monolithic single-class constrained-random opcode model. The technique first chooses an opcode category and then randomizes a category-specific class, so the constraint solver only sees the constraints relevant to that category. [Multi-class decomposition] [Performance conclusion]

Problem Addressed

Traditional serial randomization of instruction fields can achieve acceptable speed and memory use, but it provides poor control over the overall instruction distribution because each portion of the opcode is generated independently. In the reported x86 opcode-generation context, this caused skewed results and required additional seeds and simulations to close coverage. [Serial randomization limitation]
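The loss of distribution control can be illustrated with a small worked example. The sketch below is not taken from the source: the two prefixes and their opcode lists are a hypothetical toy encoding, chosen only so that one prefix admits far fewer opcodes than the other. It compares the probability of each complete instruction when fields are drawn one after another against a draw over whole instructions.

```python
from fractions import Fraction

# Hypothetical toy encoding (an assumption for illustration only):
# each prefix admits a different number of legal opcodes.
VALID_OPCODES = {
    "prefix_a": ["nop"],
    "prefix_b": ["add", "sub", "and", "or", "xor", "cmp", "mov", "shl", "shr"],
}

def serial_distribution(valid):
    """Probability of each (prefix, opcode) when fields are drawn one at a time."""
    dist = {}
    p_prefix = Fraction(1, len(valid))            # step 1: uniform prefix
    for prefix, opcodes in valid.items():
        p_opcode = Fraction(1, len(opcodes))      # step 2: uniform opcode given prefix
        for op in opcodes:
            dist[(prefix, op)] = p_prefix * p_opcode
    return dist

def joint_distribution(valid):
    """Probability of each pair when whole instructions are drawn uniformly."""
    pairs = [(p, op) for p, ops in valid.items() for op in ops]
    return {pair: Fraction(1, len(pairs)) for pair in pairs}

serial = serial_distribution(VALID_OPCODES)
joint = joint_distribution(VALID_OPCODES)
print(serial[("prefix_a", "nop")], joint[("prefix_a", "nop")])  # 1/2 1/10
print(serial[("prefix_b", "add")], joint[("prefix_b", "add")])  # 1/18 1/10
```

In this toy case the serial scheme produces the single prefix_a instruction nine times as often as any prefix_b instruction, even though no test asked for that bias.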

A single-class constrained-random model improves distribution control because all opcode constraints can be expressed together, but it can become too large for efficient solving. The cited single-class opcode model contained about 100 random variables and 800 constraint equations, making randomization slower because the solver had to process many variables and a large constraint set. [Single-class baseline]

Architecture

The generator architecture has two layers. The upper layer is implemented as a SystemVerilog random sequence with weighted knobs controlling the distribution of high-level instruction items. The lower layer is an opcode class randomized with additional constraints and weights from the upper layer. Tests provide weighted values that direct the generator toward the required instruction mix, and the constraint solver applies those weights to control opcode-type distribution. [Two-layer generator architecture]
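As a rough illustration of the upper layer, the following Python sketch mimics weighted knobs selecting a high-level item. It is only an analogue of a SystemVerilog random sequence, not the source's implementation; the knob names and weight values are assumptions.

```python
import random
from collections import Counter

# Hypothetical knob names and weights; a real test would supply its own.
knobs = {"alu": 6, "load_store": 3, "branch": 1}

def pick_category(knobs, rng):
    """Choose one category with probability proportional to its knob weight."""
    names = list(knobs)
    weights = [knobs[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(1)  # fixed seed so a run can be reproduced
counts = Counter(pick_category(knobs, rng) for _ in range(10_000))
print(counts.most_common())
```

With these weights roughly 60% of the selected items should be ALU operations, 30% loads or stores, and 10% branches. Changing a knob changes the instruction mix without touching any lower-level constraint.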

In the multi-class design, the original opcode class is split into multiple smaller classes. Opcodes are divided into categories that map well to the knobs or weights in the test interface. A base instruction class holds the data members and constraints common to every opcode, along with most of the methods for setting, printing, and packing data. Each opcode-category child class contains only the constraints specific to its group of opcodes, including implication constraints based on opcode type. [Multi-class decomposition]
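The decomposition can be sketched as a small class hierarchy. The Python below is a hedged analogue, not the source's SystemVerilog: the field names, opcode values, 16-bit packing layout, and the two rules standing in for implication constraints are all assumptions. Values are constructed directly rather than solved, since Python has no built-in constraint solver.

```python
import random

class Instruction:
    """Base class: members and helpers common to every opcode category."""

    def __init__(self):
        self.opcode = 0
        self.dest = 0      # destination register, shared by all categories
        self.operand = 0   # payload whose legal values depend on the category

    def randomize(self, rng):
        self.dest = rng.randrange(8)   # common rule: eight registers (assumed)
        self.randomize_category(rng)   # category rules live in the child class

    def randomize_category(self, rng):
        raise NotImplementedError

    def pack(self):
        # Hypothetical 16-bit layout: opcode[15:8] | dest[7:5] | operand[4:0]
        return (self.opcode << 8) | (self.dest << 5) | self.operand

    def __repr__(self):
        return (f"{type(self).__name__}(opcode={self.opcode:#04x}, "
                f"dest={self.dest}, operand={self.operand})")


class AluInstruction(Instruction):
    SHIFT = 0x03

    def randomize_category(self, rng):
        self.opcode = rng.choice([0x01, 0x02, self.SHIFT])
        # Implication-style rule: opcode == SHIFT implies operand < 8
        limit = 8 if self.opcode == self.SHIFT else 32
        self.operand = rng.randrange(limit)


class BranchInstruction(Instruction):
    JUMP = 0x10

    def randomize_category(self, rng):
        self.opcode = rng.choice([self.JUMP, 0x11])
        self.operand = rng.randrange(32)
        # Implication-style rule: opcode == JUMP implies an even operand
        if self.opcode == self.JUMP:
            self.operand &= ~1


rng = random.Random(0)
insn = AluInstruction()
insn.randomize(rng)
print(insn, hex(insn.pack()))
```

Each child class sees only its own rules, which is the property the architecture relies on: randomizing a branch never involves the ALU rules, and vice versa.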

Test-Layer Interaction

The described implementation avoided test-layer constraints that directly controlled subclass-internal items. Instead, the upper random sequence was controlled only by knobs and selected the opcode category first, allowing the generator to allocate the correct subclass object before adding it to the sequence. [Category-first allocation]

If tests must directly control lower-level subclass items, the source suggests a wrapper-class approach. The decision about which subclass to randomize is still made first. A wrapper class that constrains the test-controlled variables is randomized first, and the generator then allocates and randomizes the correct subclass object. [Wrapper-class consideration]
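One possible reading of the wrapper-class flow is sketched below in Python. The class names, the choice of `dest` as the test-controlled variable, and the decision to store the category inside the wrapper are assumptions made for illustration; the source does not specify them.

```python
import random

class AluOp:
    def randomize(self, rng, dest):
        self.dest = dest                        # value fixed by the wrapper
        self.operand = rng.randrange(32)        # subclass-internal item

class BranchOp:
    def randomize(self, rng, dest):
        self.dest = dest
        self.target = rng.randrange(0, 64, 2)   # subclass-internal item

SUBCLASSES = {"alu": AluOp, "branch": BranchOp}

class OpcodeWrapper:
    """Holds only the variables a test is allowed to constrain (hypothetical)."""

    def __init__(self, category_weights, allowed_dests=range(8)):
        self.category_weights = category_weights
        self.allowed_dests = list(allowed_dests)
        self.category = None
        self.dest = None

    def randomize(self, rng):
        names = list(self.category_weights)
        weights = [self.category_weights[n] for n in names]
        self.category = rng.choices(names, weights=weights, k=1)[0]
        self.dest = rng.choice(self.allowed_dests)

def generate(wrapper, rng):
    wrapper.randomize(rng)                   # 1. randomize the wrapper first
    item = SUBCLASSES[wrapper.category]()    # 2. allocate the matching subclass
    item.randomize(rng, wrapper.dest)        # 3. randomize it under the wrapper's values
    return item

# A test that wants only ALU operations writing register 3:
rng = random.Random(7)
wrapper = OpcodeWrapper({"alu": 1, "branch": 0}, allowed_dests=[3])
sequence = [generate(wrapper, rng) for _ in range(5)]
print([(type(item).__name__, item.dest) for item in sequence])
```

The test touches only the wrapper, so the subclass constraint sets stay small and never need to know about one another.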

Performance Characteristics

The VCS constraint profiler was used to analyze generator runtime and memory. It reported runtime data at three levels: cumulatively across randomize calls, per randomize call, and per partition. VCS can partition a randomize call when it contains unrelated random variables, so that each unrelated group is solved independently. [Constraint profiling]
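The idea behind partitioning can be shown with a union-find pass that groups variables sharing a constraint, directly or transitively. This is a conceptual Python sketch only and makes no claim about how VCS performs partitioning internally; the variable names and constraint groupings are hypothetical, and every constraint is assumed to mention at least one variable.

```python
def partition(variables, constraints):
    """Group variables linked, directly or transitively, by a shared constraint."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving keeps the trees shallow
            v = parent[v]
        return v

    for group in constraints:               # each constraint lists its variables
        first, *rest = group
        for other in rest:
            parent[find(other)] = find(first)

    parts = {}
    for v in variables:
        parts.setdefault(find(v), []).append(v)
    return sorted(sorted(p) for p in parts.values())

# Hypothetical variables and constraints for one randomize call.
variables = ["opcode", "operand", "dest", "seg_override", "lock"]
constraints = [("opcode", "operand"), ("operand", "dest"), ("seg_override",)]
print(partition(variables, constraints))
# [['dest', 'opcode', 'operand'], ['lock'], ['seg_override']]
```

Here one call splits into three independent problems, each smaller than the whole.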

In the reported comparison, a small testbench repeatedly randomized two opcodes identified from profile data so that the single-class and multi-class architectures could be compared without side effects from other testbench components. The multi-class architecture was faster with both evaluated solvers: the default RACE solver showed a 4x speedup, and the BDD solver showed a 2x speedup. [Runtime results]

Memory requirements were also significantly better for the multi-class architecture in the BDD-solver measurements. The article notes that BDD solving can require substantial memory because it elaborates the entire solution space of a randomize call before selecting a solution, although the solution space is cached for later calls. [Memory results]
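The trade-off between up-front elaboration and later reuse can be illustrated with a toy solver that enumerates every legal assignment once and caches the result. This Python sketch is not a BDD: a real BDD solver uses a compact decision-diagram representation rather than an explicit list. The class, domains, and constraints are assumptions chosen only to show why a smaller problem needs less stored state.

```python
import itertools
import random

class EnumeratingSolver:
    """Toy stand-in: elaborate the whole solution space once, then reuse it."""

    def __init__(self):
        self._cache = {}

    def solve(self, key, domains, constraint, rng):
        if key not in self._cache:           # the first call pays the full cost
            names = list(domains)
            self._cache[key] = [
                dict(zip(names, values))
                for values in itertools.product(*(domains[n] for n in names))
                if constraint(dict(zip(names, values)))
            ]
        return rng.choice(self._cache[key])  # later calls only pick a solution

    def cached_solutions(self, key):
        return len(self._cache[key])

def monolithic(s):
    # One class: every category's rule appears in the same constraint set.
    return ((s["category"] == 0 and s["a"] < s["b"])
            or (s["category"] == 1 and s["a"] == s["b"])
            or (s["category"] == 2 and s["a"] > s["b"]))

def category_zero(s):
    # Category-specific class: only its own rule.
    return s["a"] < s["b"]

rng = random.Random(0)
solver = EnumeratingSolver()
solver.solve("mono", {"category": range(3), "a": range(16), "b": range(16)}, monolithic, rng)
solver.solve("cat0", {"a": range(16), "b": range(16)}, category_zero, rng)
print(solver.cached_solutions("mono"), solver.cached_solutions("cat0"))  # 256 120
```

The category-specific problem stores fewer than half as many solutions as the monolithic one in this toy case, and the gap grows as more categories are folded into a single class.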

The stated reason for the runtime and memory improvement was the smaller set of variables and constraints in the multi-class implementation. Profile data showed the new implementation had 7x fewer constraints than the original, allowing the solver to calculate solutions more efficiently. [Performance analysis]

When to Use

This architecture is appropriate when a constrained-random generator must preserve distribution control and test-level weighting while avoiding the runtime and memory cost of a large monolithic constraint problem. The reported use case was CPU opcode generation, where the same kind of randomize call can occur many times and category-specific constraints can significantly reduce solver workload. [BDD solver context] [Performance conclusion]