Skip to content
STIMSMITH

mixin_dir_instr_list function

CodeArtifact WIKI v1 · 5/26/2026

mixin_dir_instr_list is the lazy directed-stream merging routine shown in Listing 6 of the RISCV-DV refactoring described in the DVCon paper “Crafting a Million Instructions/Sec RISCV-DV.” It replaces a greedy queue-insertion approach with a two-phase scheme that records insertion locations and then builds the merged instruction list in one pass, reducing the directed-stream insertion process from O(n²) to O(n).

Overview

mixin_dir_instr_list(riscv_instr_stream[] dir_list) is presented as Listing 6, titled “Lazy merging of directed streams to make RISCV-DV scalable.” The function merges directed instruction streams into an existing randomized instruction list without immediately inserting each stream into the queue at its chosen location. Instead, it records where streams should be injected and later constructs a merged instruction dump in one pass. [C1]

Motivation

The original RISCV-DV merge process is described as greedy: it randomly chooses an injection point for each directed stream, retries if that point falls inside an already inserted directed sequence, and eventually adjusts the point to preserve directed sequence boundaries. This approach requires inserting slices into a SystemVerilog queue, which is costly because elements after the insertion point must be moved. As instruction counts and directed-stream counts grow, the average moved block size also grows, producing O(n²) complexity and poor scaling for instruction counts larger than one hundred thousand. [C2]

Algorithm

The refactored function uses a lazy merge strategy:

  1. It initializes bookkeeping based on the current instr_list length and stores the provided directed stream list in this.dir_instr_list. [C3]
  2. For each directed stream, it adds the stream length to mixed_count, chooses a random target index rnd_idx, and records the directed stream index in dir_n at that location. [C3]
  3. If another directed stream is already recorded at the same location, the function chains the streams using each stream object’s next_stream field, creating a linked-list-like structure for streams assigned to the same insertion point. [C4]
  4. After all insertion locations are recorded, the function allocates mixed_list to mixed_count and iterates over the original instr_list. At each original instruction index, it first emits any directed streams chained at that location, then emits the original instruction. [C5]
  5. Finally, it replaces the original instruction list by calling instr_list.assimilate(mixed_list). [C5]

Complexity and scaling behavior

The paper states that this algorithmic refactoring reduces the insertion process to O(n), making it scale linearly. The key change is that directed streams are not inserted into the queue one at a time; instead, their locations are tabulated first, and the merged instruction dump is created in one pass. After re-profiling, the merge process was reported to scale well and no longer be a bottleneck for large test-program generation. [C6]

Role in RISCV-DV generation

The function is part of a RISCV-DV scalability improvement aimed at generating large test programs. The surrounding discussion notes that larger test programs can produce increasingly complex program flows that are more likely to uncover processor bugs. [C7]

CITATIONS

7 sources
7 citations
[1] C1: mixin_dir_instr_list is shown as Listing 6, titled “Lazy merging of directed streams to make RISCV-DV scalable,” and merges directed streams lazily into an instruction list. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[2] C2: The original RISCV-DV merge process used greedy insertion into a SystemVerilog queue, leading to O(n²) complexity and poor scaling for very large instruction counts. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[3] C3: The function initializes mixed-list bookkeeping, stores dir_list in this.dir_instr_list, iterates over directed streams, accumulates mixed_count, selects rnd_idx, and records stream indices in dir_n. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[4] C4: When multiple directed streams target the same location, the function chains them using the next_stream field embedded in the stream object. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[5] C5: The merged list is built by iterating over instr_list, emitting any directed-stream instructions chained at each index, then the original instruction, and finally assimilating mixed_list into instr_list. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[6] C6: The refactoring reduces the merge process to O(n), avoids queue insertions, creates the merged instruction dump in one pass, and was reported to remove the merge bottleneck for large test programs. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[7] C7: Larger test programs are described as helping generate increasingly complex program flows that are more likely to uncover processor bugs. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings