eUVM

Tool WIKI v1 · 5/26/2026

eUVM is a D-language-based verification/testbench environment described in the DVCon paper “Crafting a Million Instructions/Sec RISCV-DV.” In that work, eUVM is used to port and optimize RISCV-DV by enabling multicore testbench execution, refactoring shared/static state, providing profiling support through uvm_trace, and applying low-level performance techniques such as efficient shallow copying and reduced memory allocation.

Overview

eUVM is presented as a verification environment built on the D Programming Language and used in an optimized RISCV-DV port. The cited work contrasts eUVM with SystemVerilog/UVM on testbench performance and highlights two limitations: SystemVerilog testbenches lack user-level language constructs for synchronized shared-data access, and multicore simulator support is largely aimed at RTL or gate-level simulation rather than behavioral testbench code. [C1]

Role in RISCV-DV optimization

The paper describes an eUVM RISCV-DV port that changes the RISCV-DV architecture to better fit D-language concurrency semantics. In the original SystemVerilog code, some instruction-registry data is statically scoped in riscv_instr.sv. The eUVM port refactors those variables and related functions into a separate riscv_instr_registry class, then instantiates that registry inside the singleton riscv_instr_gen_config class to preserve singleton-like behavior while avoiding problematic global/static shared state in concurrent software. [C2]
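The refactoring pattern can be sketched in a few lines. The sketch below is Python for illustration only, not the eUVM or D code from the paper; the class names InstrRegistry and GenConfig, the register method, and the field names are hypothetical stand-ins for riscv_instr_registry and riscv_instr_gen_config. It shows registry state held per instance rather than at class or module scope, and reached through the one configuration singleton.

```python
import threading


class InstrRegistry:
    """Hypothetical stand-in for riscv_instr_registry: all state is per instance."""

    def __init__(self):
        self.instr_names = []
        self.instr_by_group = {}

    def register(self, name, group):
        # Record the instruction in both lookup structures of this instance.
        self.instr_names.append(name)
        self.instr_by_group.setdefault(group, []).append(name)


class GenConfig:
    """Hypothetical stand-in for the singleton riscv_instr_gen_config."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self):
        # The registry lives inside the config object instead of being static.
        self.registry = InstrRegistry()

    @classmethod
    def instance(cls):
        # The lock keeps first-time construction safe if several threads race.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance
```

Because the registry is only reachable through the singleton, callers still see one shared registry, while a test or a second generator could construct an independent InstrRegistry without touching global state.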

Multicore execution model

In eUVM, the fundamental unit of testbench execution is a process, similar to SystemVerilog. A process can be declared as a task or forked from an existing task using the eUVM fork construct. The cited paper states that eUVM differs from SystemVerilog by being capable of executing threads on multiple cores and by allowing a newly forked process to be delegated to a specified processor thread. [C3]

For large RISCV-DV instruction sequences, eUVM uses a parallelized fork strategy. The optimized generator decides whether to parallelize based on instruction count: the par_instr_threshold configuration parameter sets the threshold and defaults to 4000. When the instruction count exceeds the threshold, the generator splits instruction randomization into par_num_threads slices (default 8) and randomizes each slice in a separate thread. [C4]
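The threshold-and-slice decision can be sketched as follows. This is a Python illustration under stated assumptions, not the paper's D code: the function names, the per-slice seeding scheme, and the use of random 32-bit integers in place of instruction objects are all hypothetical. Only the two defaults (4000 and 8) come from the source. Python threads also do not run CPU-bound work on multiple cores in the standard interpreter, so the sketch shows the control flow rather than the speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor

PAR_INSTR_THRESHOLD = 4000  # default of par_instr_threshold per the source
PAR_NUM_THREADS = 8         # default of par_num_threads per the source


def slice_bounds(count, parts):
    """Split range(count) into `parts` contiguous, near-equal (start, end) slices."""
    base, extra = divmod(count, parts)
    bounds = []
    start = 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def randomize_slice(seed, start, end):
    """Stand-in for randomizing instructions [start, end) with a private RNG."""
    rng = random.Random(seed)
    return [rng.randrange(2**32) for _ in range(start, end)]


def randomize_instructions(count, seed=0,
                           threshold=PAR_INSTR_THRESHOLD,
                           num_threads=PAR_NUM_THREADS):
    # At or below the threshold, stay on the calling thread.
    if count <= threshold:
        return randomize_slice(seed, 0, count)
    # Above the threshold, randomize each slice in its own worker thread.
    bounds = slice_bounds(count, num_threads)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        futures = [pool.submit(randomize_slice, seed + i, start, end)
                   for i, (start, end) in enumerate(bounds)]
        result = []
        for future in futures:  # collect in slice order to keep output stable
            result.extend(future.result())
    return result
```

Giving each slice its own random generator avoids shared mutable state between workers, which is the same concern that motivates the registry refactoring described earlier.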

The eUVM fork construct returns a Fork object, which can be stored in a list, configured, and joined later. The set_thread_affinity method assigns a fork to a specific execution thread. [C5]

Directed instruction-stream parallelization

For directed instruction streams, the cited implementation uses a different strategy: because there are multiple groups of directed streams, a separate thread is designated for randomizing each group. The listing shows creation of Fork objects, use of set_thread_affinity, joining of all forks, and shuffling of the resulting stream. [C6]
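The fork-per-group, join-all, then shuffle sequence can be sketched as below. This is a Python illustration, not the paper's listing: threading.Thread stands in for the eUVM Fork object, the function names and the (group, value) tuples are hypothetical, and thread affinity is not modeled because the standard threading module has no portable equivalent of set_thread_affinity.

```python
import random
import threading


def randomize_group(name, count, seed, out):
    """Stand-in for randomizing one group of directed streams."""
    rng = random.Random(seed)
    # Each worker writes only to its own key, so no lock is needed here.
    out[name] = [(name, rng.randrange(2**32)) for _ in range(count)]


def generate_directed_streams(groups, seed=0):
    """groups maps a group name to the number of streams to generate for it."""
    results = {}
    workers = []
    # One worker per group, collected in a list so they can be joined later.
    for i, (name, count) in enumerate(sorted(groups.items())):
        worker = threading.Thread(target=randomize_group,
                                  args=(name, count, seed + i, results))
        workers.append(worker)
        worker.start()
    # Join every worker before touching the combined result.
    for worker in workers:
        worker.join()
    streams = [s for name in sorted(results) for s in results[name]]
    # Shuffle the merged streams, as the listing in the paper is described to do.
    random.Random(seed).shuffle(streams)
    return streams
```

Calling generate_directed_streams({"load_store": 3, "jump": 2}) returns five entries in a seed-determined order.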

Profiling support

eUVM includes a uvm_trace construct used for macro-level profiling and formal identification of testbench bottlenecks. The paper cautions that each uvm_trace invocation performs an operating-system call to fetch the current clock time, so excessive use can significantly increase runtime. [C7]
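The cost model behind that caution can be illustrated with a small timing helper. This is a Python sketch, not uvm_trace itself: the trace function and its log format are hypothetical, and time.perf_counter_ns stands in for the clock read the paper describes. Each traced scope performs two clock reads, so wrapping a whole phase adds a fixed cost, while wrapping the body of a hot loop adds that cost on every iteration.

```python
import time
from contextlib import contextmanager

trace_log = []


@contextmanager
def trace(label, log=trace_log):
    """Record (label, elapsed nanoseconds) for the enclosed block."""
    start = time.perf_counter_ns()  # first clock read
    try:
        yield
    finally:
        # Second clock read; runs even if the block raises.
        log.append((label, time.perf_counter_ns() - start))


def generate_program(count, log):
    # Macro-level placement: one trace around the whole phase,
    # not one trace per generated item.
    with trace("generate_program", log):
        return [i * i for i in range(count)]
```

A call such as generate_program(1000, my_log) appends exactly one entry to my_log regardless of how many items are generated.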

Runtime optimization techniques

The paper describes several eUVM-oriented implementation techniques for reducing runtime overhead:

  • Efficient shallow copy: eUVM implements shallow copy by using D object introspection to determine the memory footprint of an object and then copying the relevant memory slice. The paper notes that this becomes a single memcopy operation and is more efficient than copying individual class elements through UVM utility copy constructs. [C8]
  • Reduced memory allocation: eUVM avoids some trivial allocations by using D’s sformat, which lets the user supply scratch memory for formatted output. In the example, a fixed-size character buffer is used for formatting a 32-bit immediate value, reducing calls to malloc by half compared with an allocation-returning string-formatting approach. [C9]
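The shallow-copy technique in the first bullet can be illustrated with a fixed-layout record. This is a Python sketch using ctypes, not the eUVM implementation: the Instr structure and its fields are hypothetical, and ctypes.sizeof plays the role that D object introspection plays in the paper. It contrasts one block copy of the object's memory footprint with a field-by-field copy.

```python
import ctypes


class Instr(ctypes.Structure):
    """Hypothetical fixed-layout instruction record."""
    _fields_ = [
        ("opcode", ctypes.c_uint32),
        ("rd", ctypes.c_uint8),
        ("rs1", ctypes.c_uint8),
        ("rs2", ctypes.c_uint8),
        ("imm", ctypes.c_int32),
    ]


def shallow_copy(src):
    """Copy the whole memory footprint of src in a single block move."""
    dst = type(src)()
    ctypes.memmove(ctypes.addressof(dst), ctypes.addressof(src),
                   ctypes.sizeof(src))
    return dst


def fieldwise_copy(src):
    """Element-by-element copy, shown for comparison with shallow_copy."""
    dst = type(src)()
    for name, _ctype in src._fields_:
        setattr(dst, name, getattr(src, name))
    return dst
```

Both functions produce an independent object with equal field values; the block copy does so with one call whose cost does not grow with the number of fields declared.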

Context and motivation

The motivation for eUVM in the cited work is testbench performance. The paper states that SystemVerilog/UVM RISCV-DV execution is limited by complex constraint solving and sub-optimal algorithmic implementation, and that SystemVerilog lacks native data types, requiring DPI-based C/C++ interfacing for emulation-platform integration. It also states that computational algorithms written in SystemVerilog execute about an order of magnitude slower than corresponding C/C++ or other native-language implementations because SystemVerilog integral variables and expressions carry value-change event semantics. [C10]

CITATIONS

[C1] eUVM is discussed as a D-language-based verification environment in contrast with SystemVerilog/UVM testbench-performance limitations. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C2] The eUVM RISCV-DV port refactors static RISCV-DV instruction-registry variables into a separate riscv_instr_registry class instantiated from the singleton configuration class. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C3] eUVM processes can be forked, can execute on multiple cores, and can be delegated to specified processor threads. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C4] eUVM parallelizes large RISCV-DV instruction randomization by thresholding at par_instr_threshold (default 4000) and splitting work across par_num_threads (default 8). Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C5] eUVM fork returns a Fork object that can be collected, configured, joined later, and assigned thread affinity through set_thread_affinity. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C6] Directed instruction-stream randomization in eUVM assigns separate threads to directed-stream groups and uses Fork objects with set_thread_affinity and join. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C7] eUVM provides uvm_trace for macro-level profiling, but each invocation fetches the current clock time via an operating-system call and can increase runtime if overused. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C8] eUVM shallow copy uses D object introspection and a memory-slice copy, resulting in a single memcopy operation that is more efficient than element-wise UVM utility copying. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C9] eUVM reduces memory-allocation overhead for formatted strings by using D sformat with caller-provided scratch memory, reducing malloc calls by half in the described example. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).
[C10] The paper motivates eUVM by identifying SystemVerilog/UVM RISCV-DV performance limits, DPI overhead concerns, lack of native data types, limited testbench multicore support, and slower algorithmic execution versus native languages. Source: “Crafting a Million Instructions/Sec RISCV-DV,” DVCon Proceedings (PDF).