
generate_directed_instr_stream function

CodeArtifact WIKI v1 · 5/26/2026

The `generate_directed_instr_stream` function is shown in a DVCon RISCV-DV optimization paper as a parallelized implementation for randomizing directed instruction streams. It computes per-stream insertion counts from configured ratios, forks work into `generate_directed_instr_stream_idx`, assigns thread affinity, joins all forks, validates the final stream length, and shuffles the resulting instruction-stream array.

Overview

generate_directed_instr_stream appears as Listing 10 of the DVCon paper “Crafting a Million Instructions/Sec RISCV-DV”, under the heading “Parallelizing Randomization of the Directed Instruction Streams.” The listing shows a function that generates directed RISC-V instruction streams and parallelizes the work by forking one work item per entry in directed_instr_stream_ratio.[C1]

Signature and inputs

The listed signature is:

void generate_directed_instr_stream(
  in int hart,
  in string label,
  in uint original_instr_cnt,
  in uint min_insert_cnt,
  in bool kernel_mode,
  out riscv_instr_stream[] instr_stream
)

The function takes a hart identifier, a label, an original instruction count, a minimum insertion count, a kernel-mode flag, and an output array of riscv_instr_stream objects.[C1]

Control flow

The function first checks cfg.no_directed_instr; if that flag is set, the function returns without generating directed instruction streams.[C1]

Otherwise, it initializes:

  • Fork[] forks
  • uint instr_stream_length = 0
  • uint stream_idx = 0

It then iterates over (stream_name, ratio) pairs in directed_instr_stream_ratio.[C1]

Insert-count calculation

For each directed stream entry, the function computes:

insert_cnt = original_instr_cnt * ratio / 1000

If the computed count is less than or equal to min_insert_cnt, it is replaced with min_insert_cnt, so min_insert_cnt acts as a lower bound on every stream's insertion count. The resulting count is then added to instr_stream_length.[C1]

The function also emits a UVM info message of the form:

Insert directed instr stream %0s %0d/%0d times

using the stream name, computed insertion count, and original instruction count.[C1]
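The calculation above can be sketched in a few lines. This is a minimal illustration in Python, not the paper's code: the stream names, ratios, and counts are hypothetical, the helper names `directed_insert_count` and `info_message` are invented for the example, and integer (truncating) division is assumed because the listed parameters are unsigned integers.

```python
def directed_insert_count(original_instr_cnt: int, ratio: int, min_insert_cnt: int) -> int:
    """Per-stream insertion count, reading ratio as parts per thousand."""
    # Assumption: truncating integer division, as with unsigned operands.
    insert_cnt = original_instr_cnt * ratio // 1000
    # Counts at or below the minimum are replaced by the minimum.
    if insert_cnt <= min_insert_cnt:
        insert_cnt = min_insert_cnt
    return insert_cnt


def info_message(stream_name: str, insert_cnt: int, original_instr_cnt: int) -> str:
    """Python rendering of the '%0s %0d/%0d' info message format."""
    return (f"Insert directed instr stream {stream_name} "
            f"{insert_cnt}/{original_instr_cnt} times")


# Hypothetical configuration, for illustration only.
directed_instr_stream_ratio = {"stream_a": 40, "stream_b": 1}
original_instr_cnt = 10_000
min_insert_cnt = 20

instr_stream_length = 0
messages = []
for stream_name, ratio in directed_instr_stream_ratio.items():
    cnt = directed_insert_count(original_instr_cnt, ratio, min_insert_cnt)
    instr_stream_length += cnt
    messages.append(info_message(stream_name, cnt, original_instr_cnt))
```

With these hypothetical inputs, stream_a receives 400 insertions, stream_b falls below the minimum and is raised to 20, and instr_stream_length ends at 420.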

Parallelization strategy

For each stream entry, the function captures stream_name, ratio, stream_idx, and insert_cnt, then creates a Fork object. The fork invokes:

generate_directed_instr_stream_idx(
  hart,
  label,
  original_instr_cnt,
  kernel_mode,
  name,
  ratio,
  instr_stream,
  idx,
  cnt
)

The fork receives a thread affinity using new_fork.set_thread_affinity(forks.length), is appended to the forks array, and stream_idx is advanced by insert_cnt.[C1]

The helper generate_directed_instr_stream_idx is shown only as a declaration; the listing states that it is “like the non-parallelized instr stream generator” and omits its body for brevity.[C1]
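The bookkeeping described in this section amounts to giving each work item a starting offset and a count, so that the items cover disjoint ranges of the shared output array. The Python sketch below illustrates only that planning step. `plan_work_items` and the tuple layout are hypothetical, and the affinity value simply mirrors the use of forks.length at the moment each fork is created.

```python
def plan_work_items(insert_counts):
    """Return ([(name, idx, cnt, affinity), ...], total_length).

    idx is the running offset (stream_idx in the listing), cnt is the
    per-stream insertion count, and affinity is the number of work items
    planned so far, i.e. the value forks.length would have at that point.
    """
    plan = []
    stream_idx = 0
    for name, cnt in insert_counts.items():
        affinity = len(plan)          # taken before the item is appended
        plan.append((name, stream_idx, cnt, affinity))
        stream_idx += cnt             # next item starts where this one ends
    return plan, stream_idx


# Hypothetical per-stream counts, for illustration only.
plan, total = plan_work_items({"stream_a": 400, "stream_b": 20, "stream_c": 5})

# Each item covers [idx, idx + cnt); together the ranges tile [0, total).
covered = [i for _, idx, cnt, _ in plan for i in range(idx, idx + cnt)]
ranges_tile_output = covered == list(range(total))
```

Because the ranges do not overlap, each work item can fill its own part of the output without coordinating with the others, which is presumably why the listing passes both idx and cnt to the helper.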

Finalization

After all forks are created, the function sets:

instr_stream.length = instr_stream_length

It then joins every fork with:

foreach (f; forks) f.join();

After the joins, it asserts that stream_idx == instr_stream.length and then shuffles instr_stream.[C1]
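The overall fork, join, assert, and shuffle shape can be illustrated with ordinary threads. The following Python sketch is an analogy under stated assumptions, not the paper's implementation: `generate_slice` is a hypothetical stand-in for generate_directed_instr_stream_idx, strings stand in for riscv_instr_stream objects, thread affinity is not modeled, and the seed parameter exists only to make the example repeatable. Python threads are used to show the control flow, not to suggest any particular speedup.

```python
import random
import threading


def generate_slice(out, name, idx, cnt):
    """Hypothetical stand-in worker: fills out[idx : idx + cnt] only."""
    for i in range(cnt):
        out[idx + i] = f"{name}#{i}"


def generate_parallel(insert_counts, seed=0):
    # This sketch sizes the output before any worker starts, so every
    # worker writes into a slot that already exists.
    out = [None] * sum(insert_counts.values())

    workers = []
    stream_idx = 0
    for name, cnt in insert_counts.items():
        # args binds the current values, one copy per worker.
        t = threading.Thread(target=generate_slice,
                             args=(out, name, stream_idx, cnt))
        workers.append(t)
        stream_idx += cnt

    for t in workers:
        t.start()
    for t in workers:
        t.join()                      # wait for every worker to finish

    # Same consistency check as the listing: offsets must add up exactly.
    assert stream_idx == len(out)

    random.Random(seed).shuffle(out)  # interleave the generated streams
    return out


# Hypothetical counts, for illustration only.
streams = generate_parallel({"stream_a": 3, "stream_b": 2})
```

The final assertion guards against a mismatch between the summed insertion counts and the offsets handed to the workers; if the two ever diverged, some slots would be left unfilled or written out of range.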

Performance context

The same paper identifies generation and randomization of directed instruction streams as the largest profiled bottleneck in RISCV-DV. It notes that directed streams are created and randomized according to a command-line-specified ratio for sub-programs and the main function, and that much of the time is spent in constraint solvers.[C2]

The paper further states that the first two identified bottlenecks involve constraint solving and are linear in algorithmic complexity, and that linear-complexity algorithms scale well with multicore parallelization. The listed generate_directed_instr_stream implementation is presented in that context as a parallelized version of directed instruction stream randomization.[C2][C1]

CITATIONS

[C1] Listing 10 presents `generate_directed_instr_stream` as a parallelized implementation for randomizing directed instruction streams, including its signature, control flow, fork creation, thread affinity assignment, join behavior, assertion, shuffle, and call to `generate_directed_instr_stream_idx`. Source: “Crafting a Million Instructions/Sec RISCV-DV”, DVCon Proceedings (PDF).
[C2] The RISCV-DV paper identifies directed instruction stream creation and randomization as the highest-impact bottleneck, attributes much of that time to constraint solving, and notes that linear-complexity constraint-solving bottlenecks scale well with multicore parallelization. Source: “Crafting a Million Instructions/Sec RISCV-DV”, DVCon Proceedings (PDF).