STIMSMITH

Sequence Level Parallelism

Technique WIKI v1 · 5/26/2026

Sequence Level Parallelism is a testbench optimization technique described for eUVM that exploits multicore concurrency in UVM-style verification environments, especially simple module-level testbenches with limited UVM components. It uses free-running worker threads, asynchronous TLM FIFOs, and multicore-distributed forked tasks to parallelize transaction generation and sequence processing.

Overview

The source introduces Sequence Level Parallelism as a way to exploit multicore concurrency in simple module-level testbenches that contain only a few UVM components. It is presented as part of eUVM optimization for RISCV-DV-style testbench workloads, where parallelism is applied within or below the sequence-generation layer rather than only by mapping large VIP components to different executors. [C1]

Motivation

The cited source contrasts sequence-level parallelization with VIP-level multicore mapping. VIP-level parallelism may be suboptimal for two reasons: scheduler activity can deactivate task executors, and high speedup requires a balanced task load across executors, which different VIPs may not provide. Sequence-level parallelization is introduced as a way to exploit multicore concurrency even in smaller module-level testbenches. [C1]

Worker-thread model

In the eUVM architecture described by the source, the multicore simulator contains multiple task executors as well as free-running asynchronous threads called worker threads. Worker threads are hierarchically owned by the simulator but decoupled from its scheduler, so they keep running even when the scheduler activates. The same decoupling means a worker thread cannot wait for simulator events, although it can trigger them. [C2]
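The one-way relationship between a worker thread and simulator events can be sketched in a few lines. This is a minimal illustration in Python, not eUVM code: `threading.Event` stands in for a simulator event, the main thread stands in for a scheduler-side task, and the name `run_worker_demo` is hypothetical.

```python
import threading


def run_worker_demo(n=5):
    """Worker triggers an event but never waits on one (illustrative only)."""
    done = threading.Event()   # stand-in for a simulator event (assumption)
    produced = []

    def worker():
        # Free-running: this thread never waits on `done` or any other event.
        for i in range(n):
            produced.append(i * i)
        done.set()             # triggering an event is allowed

    thread = threading.Thread(target=worker)
    thread.start()
    # The scheduler-side task is the party that waits on the event.
    signalled = done.wait(timeout=5.0)
    thread.join()
    return signalled, produced
```

Here only the consuming side blocks on the event; the worker's loop is never suspended by it, which mirrors the restriction described above.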

Data exchange and synchronization

Although worker threads share memory with simulator tasks, the source states that UVM requires data exchange to occur through TLM FIFOs in order to maintain synchronization across threads. Standard UVM TLM FIFOs use events to block reads when the FIFO is empty and writes when the FIFO is full. Since asynchronous worker threads cannot wait for simulator events, eUVM provides asynchronous TLM FIFO variants that replace event-based waiting with software semaphore mechanisms where needed. [C3]

The source describes three asynchronous FIFO use cases: asynchronous write, asynchronous read, and fully asynchronous read/write. An asynchronous write FIFO is used when a worker thread generates UVM transactions that must be transferred to a regular UVM task; when full, the worker thread blocks on a software semaphore, and when the receiving task frees a slot, the semaphore is released. A fully asynchronous FIFO uses semaphores at both read and write ports and is intended for data exchange between two asynchronous worker threads. [C4]
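The asynchronous write case can be illustrated with a bounded FIFO whose writer blocks on a counting semaphore instead of an event. The sketch below is written in Python under stated assumptions: the class `AsyncWriteFifo`, its methods, and `run_demo` are hypothetical names, and the read side simply polls, whereas the source describes the regular UVM task as the receiver.

```python
import threading
from collections import deque


class AsyncWriteFifo:
    """Bounded FIFO whose writer blocks on a semaphore (hypothetical sketch)."""

    def __init__(self, depth):
        self._items = deque()
        self._lock = threading.Lock()
        # Counts free slots; the writer blocks here when the FIFO is full.
        self._free_slots = threading.Semaphore(depth)

    def put(self, item):
        """Worker-thread side: blocks while no slot is free."""
        self._free_slots.acquire()
        with self._lock:
            self._items.append(item)

    def try_get(self):
        """Receiver side: returns None when the FIFO is empty."""
        with self._lock:
            if not self._items:
                return None
            item = self._items.popleft()
        # Freeing a slot releases the semaphore and unblocks the writer.
        self._free_slots.release()
        return item


def run_demo(count=20, depth=4):
    fifo = AsyncWriteFifo(depth)

    def worker():
        for i in range(count):
            fifo.put({"id": i})   # blocks whenever `depth` items are pending

    producer = threading.Thread(target=worker)
    producer.start()
    received = []
    while len(received) < count:  # simple polling receiver, for the demo only
        item = fifo.try_get()
        if item is not None:
            received.append(item["id"])
    producer.join()
    return received
```

A fully asynchronous variant would, under the same assumptions, add a second semaphore counting filled slots so that the reader can also block without waiting on an event.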

Sequencer parallelization pattern

Worker-thread constructs in eUVM make it possible to parallelize a UVM sequencer by creating multiple free-running worker threads. In the described scheme, the worker threads feed a virtual sequencer in round-robin fashion. This is useful when transaction generation and randomization are complex or slow enough to become the bottleneck of an RTL simulation. [C5]
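The round-robin feeding scheme can be sketched as several producer threads, each with its own bounded FIFO, and a single consumer that visits the FIFOs in a fixed rotation. This is an illustrative Python sketch, not the eUVM implementation: `generate`, `run_round_robin`, and the tuple layout of a transaction are assumptions, and the standard `queue.Queue` stands in for an asynchronous TLM FIFO.

```python
import queue
import random
import threading


def generate(worker_id, count, out):
    """Free-running producer: randomize items and push them to its own FIFO."""
    rng = random.Random(worker_id)   # per-worker RNG, no shared state
    for seq in range(count):
        out.put((worker_id, seq, rng.getrandbits(32)))


def run_round_robin(num_workers=3, per_worker=4):
    """Consume from the workers' FIFOs in a fixed rotation."""
    fifos = [queue.Queue(maxsize=2) for _ in range(num_workers)]
    threads = [
        threading.Thread(target=generate, args=(w, per_worker, fifos[w]))
        for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    stream = []
    for i in range(num_workers * per_worker):
        # Blocks until the worker whose turn it is has an item ready.
        stream.append(fifos[i % num_workers].get())
    for t in threads:
        t.join()
    return stream
```

Because the consumer takes exactly one item per worker per rotation, the output order is deterministic even though the producers run concurrently.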

Fine-grained parallelism with multicore fork

The source also describes a finer-grained form of sequence-level parallelism using a multicore-parallelized fork. This approach is intended for cases where per-transaction generation and randomization are not expensive enough to justify the overhead of asynchronous TLM FIFO reads and writes. In standard SystemVerilog-style fork semantics, a fork-created task executes on the same CPU thread as the parent thread. In eUVM, forked tasks can instead be distributed across CPU threads associated with multiple task executors. [C6]

This form is useful when a sequence contains thousands of transactions stored in a container such as a queue or array. To accelerate generation, eUVM can slice the sequence-carrying container and split those slices across forked tasks, with each spawned fork processing one slice of the original container. [C7]
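The slicing idea can be sketched as follows: compute contiguous slice bounds over the container, then hand each slice to its own task. The Python below is a structural illustration only; `slice_bounds`, `fill_slice`, and `generate_parallel` are hypothetical names, a thread pool stands in for forked tasks on separate task executors, and CPython threads would not by themselves give a CPU-bound speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def slice_bounds(length, parts):
    """Split range(length) into `parts` contiguous, near-equal slices."""
    base, extra = divmod(length, parts)
    bounds, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def fill_slice(items, start, stop, seed):
    """Generate the transactions for one slice; slices never overlap."""
    rng = random.Random(seed * 1_000_003 + start)   # per-slice RNG
    for i in range(start, stop):
        items[i] = {"index": i, "data": rng.getrandbits(32)}


def generate_parallel(length, parts, seed=1):
    """Fill a preallocated container, one task per slice."""
    items = [None] * length
    with ThreadPoolExecutor(max_workers=parts) as pool:
        futures = [
            pool.submit(fill_slice, items, start, stop, seed)
            for start, stop in slice_bounds(length, parts)
        ]
        for future in futures:
            future.result()   # wait for the slice and propagate any error
    return items
```

Seeding each slice independently keeps the generated data reproducible regardless of how the tasks are scheduled, which is one possible design choice and not something the source prescribes.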

Execution semantics

The cited listing for the eUVM parallelized fork shows that a call to the fork construct returns a Fork object, which can be stored for later processing. The fork body is wrapped in a lambda function to capture the scoped variable. The set_thread_affinity method is then used to bind individual forks to parallel task executors. Finally, the forks are joined; as in SystemVerilog, the forks start executing only after a call to join or another blocking construct. [C8]
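The four steps in that listing (store the handle, wrap the body in a lambda, set affinity, join) can be mimicked in Python to show the control flow. This is not the eUVM listing: the `Fork` class below is a hypothetical stand-in, its `set_thread_affinity` only records the requested executor index, and `square_into` and `run_forks` are illustrative names.

```python
import threading


class Fork:
    """Hypothetical deferred fork handle (not the eUVM class)."""

    def __init__(self, body):
        self._body = body
        self.affinity = None
        self._thread = None

    def set_thread_affinity(self, executor_index):
        # Only records the request; real binding would be the simulator's job.
        self.affinity = executor_index

    def start(self):
        if self._thread is None:
            self._thread = threading.Thread(target=self._body)
            self._thread.start()

    def join(self):
        self.start()          # execution is deferred until a join is requested
        self._thread.join()


def square_into(results, i):
    results[i] = i * i


def run_forks(num_forks=4, num_executors=2):
    results = [None] * num_forks
    forks = []
    for i in range(num_forks):
        # The default argument captures the current value of the loop variable.
        fork = Fork(lambda i=i: square_into(results, i))
        fork.set_thread_affinity(i % num_executors)
        forks.append(fork)
    ran_before_join = any(r is not None for r in results)
    for fork in forks:        # start all bodies so they can run concurrently
        fork.start()
    for fork in forks:
        fork.join()
    return results, [f.affinity for f in forks], ran_before_join
```

The default-argument lambda plays the role the source assigns to lambda wrapping: each fork body sees its own copy of the loop variable instead of the value left after the loop ends.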

Applicability

Sequence Level Parallelism is applicable when multicore execution can reduce bottlenecks in sequence or transaction generation. The worker-thread and asynchronous FIFO pattern is useful when transaction generation and randomization are complex and time-consuming. The multicore fork pattern is positioned for scenarios with many simpler transactions where FIFO communication overhead would otherwise offset the benefit of parallelism. [C5][C6][C7]

CITATIONS

[1] C1: Sequence Level Parallelization is introduced as a technique for taking advantage of multicore concurrency in simple module-level testbenches, and VIP-level mapping may be suboptimal due to scheduler behavior and load-balancing issues. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[2] C2: eUVM worker threads are free-running asynchronous threads owned by the simulator, decoupled from the scheduler, unable to wait for events, and able to trigger events. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[3] C3: Worker threads share memory with tasks, but UVM data exchange is handled through TLM FIFOs; standard TLM FIFOs use events to block empty reads and full writes. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[4] C4: eUVM implements asynchronous TLM FIFO variants for async write, async read, and async read/write use cases, using semaphores where asynchronous worker threads cannot wait on simulator events. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[5] C5: eUVM worker thread constructs can parallelize a UVM sequencer by creating free-running worker threads that feed a virtual sequencer in round-robin fashion, useful when transaction generation and randomization are complex and time-consuming. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[6] C6: eUVM provides a multicore-parallelized fork for scenarios where asynchronous TLM FIFO overhead would offset gains for simple transactions, and eUVM can distribute forked tasks across CPU threads associated with multiple task executors. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[7] C7: For sequences containing thousands of transactions in a queue or array, eUVM can slice the sequence container and split slices across forked tasks, with each fork processing a slice. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings
[8] C8: The eUVM parallelized fork listing shows fork objects being stored, lambda wrapping for scoped-variable capture, use of set_thread_affinity to bind forks to task executors, and join semantics where forks execute after join or another blocking construct. [PDF] Crafting a Million Instructions/Sec RISCV-DV - DVCon Proceedings