Multicore Parallelization

Technique

Multicore Parallelization is a testbench optimization technique that distributes compute-intensive verification work, especially sequence and instruction randomization, across multiple CPU threads. The cited source describes the technique primarily in eUVM and applies it to RISCV-DV through parallelized forks, thread affinity, workload slicing, asynchronous worker threads, and refactoring of shared static state.

First seen 5/25/2026

Last seen 5/25/2026

Evidence 7 chunks

Wiki v1

WIKI

Overview

Multicore Parallelization here means distributing testbench execution across multiple CPU threads, with particular emphasis on compute-intensive sequence and instruction randomization. The source describes the technique in eUVM, where forked processes can be assigned to specific processor threads. In SystemVerilog, by contrast, a task created by fork executes on the same CPU thread as its parent. [C1]

The motivation is that multicore support in conventional SystemVerilog-oriented flows targets RTL and gate-level simulation through design-level partitioning. Behavioral testbenches share data by reference, so they need user-level constructs for synchronized access to shared data, which the current SystemVerilog standard lacks. [C2]
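
The slice-fork-join pattern summarized in C1, C7, and C8 can be sketched in Python as an analogy. This is not eUVM or RISCV-DV code: `slice_bounds`, `randomize_slice`, and `randomize_all` are hypothetical names, the random integers are a stand-in for constrained instruction randomization, and the sketch assumes that sequences at or above the threshold are the ones that get split. Only the defaults of 4000 and 8 come from the cited source. Thread affinity is not modeled, and CPython threads will not speed up this CPU-bound loop, so the sketch shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor
import random

INSTR_THRESHOLD = 4000  # default instruction threshold cited in C7
PAR_NUM_THREADS = 8     # default number of slices cited in C7


def slice_bounds(total, num_slices):
    """Return (start, end) pairs that cover range(total) with no gaps or overlap."""
    base, extra = divmod(total, num_slices)
    bounds, start = [], 0
    for i in range(num_slices):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def randomize_slice(instrs, start, end, seed):
    """Hypothetical stand-in for randomizing one slice of instructions."""
    # One generator per slice, loosely mirroring one solver instance per thread.
    rng = random.Random(seed)
    for i in range(start, end):
        instrs[i] = rng.randrange(2**32)


def randomize_all(instrs, seed=0):
    """Randomize in place; return the number of slices that were used."""
    if len(instrs) < INSTR_THRESHOLD:
        # Assumption: small jobs stay on the calling thread.
        randomize_slice(instrs, 0, len(instrs), seed)
        return 1
    bounds = slice_bounds(len(instrs), PAR_NUM_THREADS)
    with ThreadPoolExecutor(max_workers=PAR_NUM_THREADS) as pool:
        futures = [
            pool.submit(randomize_slice, instrs, start, end, seed + k)
            for k, (start, end) in enumerate(bounds)
        ]
        for future in futures:
            future.result()  # join: wait for each slice and re-raise any error
    return len(bounds)
```

Each worker writes only to its own index range, so the slices need no locking. That is the property that makes slicing a large transaction container attractive in the first place.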

RELATIONSHIPS

3 connections

eUVM ← uses 100% 2e

eUVM enables multicore-parallelized UVM testbench implementation.

Amdahl's Law depends on → 90% 1e

Multicore parallelization gains are limited by Amdahl's Law.

Crafting a Million Instructions/Sec RISCV-DV ← introduces 100% 1e

The paper introduces multicore parallelization techniques for RISCV-DV achieving over 100x speedup.
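
The Amdahl's Law relationship listed above can be made concrete with a short calculation. The function below is a minimal sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of runtime that can run in parallel and n is the number of threads. The name `amdahl_speedup` and the sample fractions are illustrative assumptions, not figures from the cited paper.

```python
def amdahl_speedup(parallel_fraction, num_threads):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_threads)


if __name__ == "__main__":
    # Illustrative fractions only; a real value would come from profiling.
    for p in (0.5, 0.9, 0.99):
        print(f"p={p:.2f}, 8 threads: {amdahl_speedup(p, 8):.2f}x")
```

The bound shows why the serial portion matters: with 8 threads, even a workload that is 90% parallelizable tops out well below 8x.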

CITATIONS

9 sources

[1] C1: eUVM forked tasks can be distributed across CPU threads; a large transaction container can be sliced so each fork processes one slice; `Fork` objects can be collected, assigned thread affinity, and joined. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings

[2] C2: SystemVerilog-oriented multicore support is limited to RTL/gate-level simulation with design-level partitioning, while behavioral testbenches share data by reference and need user-level synchronized shared-data constructs that the current SV standard lacks. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings

[3] C3: eUVM multicore simulators use task executors on CPU threads with synchronization barriers; scheduler limits and Amdahl’s Law constrain speedup, but compute-intensive sequence randomization can benefit, including mapping UVM agents/VIPs to separate CPU threads. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings

[4] C4: Sequence-level parallelization uses eUVM worker threads that are free-running, asynchronous, scheduler-decoupled, unable to wait for simulator events, and intended to exchange data through TLM FIFOs. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings

[5] C5: eUVM asynchronous TLM FIFOs support worker-thread communication with regular UVM tasks; async-write FIFOs use software semaphores for full-FIFO blocking and regular UVM read events for empty-FIFO receiver blocking. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings

[6] C6: RISCV-DV generator output is a bare-metal RISC-V assembly program, or alternatively a binary dump that can be loaded into simulation or emulation memory. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings

[7] C7: RISCV-DV parallelization uses parallelized forks for large instruction sequences, includes refactoring static instruction-registry variables into `riscv_instr_registry`, uses a default instruction threshold of 4000, splits large jobs into `par_num_threads` slices defaulting to 8, and requires separate solver instances per execution thread. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings

[8] C8: The RISCV-DV code pattern computes per-slice start and end indices, forks randomization over each slice, assigns thread affinity, appends forks, and joins them; directed instruction streams are parallelized by assigning groups to separate threads to reduce stress on thread-specific constraint solvers. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings

[9] C9: Profiling RISCV-DV reduced optimization focus from more than fifteen thousand lines to about two hundred repeatedly executed lines; `uvm_trace` is macro-level and can add overhead through operating-system clock calls, while gprof is cited for micro-level profiling. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings