STIMSMITH

Profiling

Technique WIKI v1 · 5/26/2026

Profiling is a performance-optimization technique used to identify execution-time bottlenecks. In the RISCV-DV testbench optimization case study, profiling narrowed more than 15,000 lines of code to about 200 repeatedly executed lines that were key to generator performance, and it helped identify four major bottlenecks.

Overview

The RISCV-DV generator case study presents profiling as the most important step in performance optimization. By narrowing a codebase of more than 15,000 lines to roughly 200 repeatedly executed lines, profiling directed the optimization effort at the code that determined generator performance. [C1]

Purpose

The primary purpose of profiling in the cited RISCV-DV workflow was to identify performance bottlenecks before applying optimization techniques. The profiling results were combined with analysis of algorithmic complexity to rank bottlenecks by impact on overall generator execution time. [C2]

Profiling granularity

The cited study distinguishes between macro-level and micro-level profiling:

  • Macro-level profiling: uvm_trace, an eUVM construct, was used for formal identification of testbench bottlenecks. [C3]
  • Micro-level profiling: the open-source tool gprof is cited as useful for finer-grained profiling. [C4]
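
To give a feel for what micro-level, per-function profiling output looks like, the sketch below uses Python's built-in cProfile and pstats modules as a stand-in for gprof. It is not the tool or the language used in the cited study, and the functions hot_loop, cold_setup, workload, and profile_top are hypothetical names introduced only for illustration.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    """Hypothetical hot path: dominates the run time."""
    return sum(i * i for i in range(n))


def cold_setup():
    """Hypothetical one-off setup: negligible cost."""
    return list(range(10))


def workload():
    """Hypothetical program under test."""
    cold_setup()
    return hot_loop(200_000)


def profile_top(func, limit=5):
    """Run func under cProfile and return a report of the top entries,
    ordered by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()
```

Calling print(profile_top(workload)) lists the functions with the largest cumulative time first, which is the kind of per-function ranking a micro-level profiler provides.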

Example instrumentation pattern

The RISCV-DV case study shows uvm_trace calls placed around an instruction-randomization loop. Each resulting log entry carries a wall-clock timestamp in square brackets after the UVM_TRACE tag, recording the time at which that trace call was invoked. [C5]

uvm_trace("GEN INSTR", "START", UVM_DEBUG);
foreach (ref instr; instr_list)
  randomize_instr(instr, is_debug_program);
uvm_trace("GEN INSTR", "DONE", UVM_DEBUG);
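
The same bracketing idea can be sketched outside eUVM. The Python example below is a minimal illustration, not the uvm_trace implementation: TraceLog, randomize_instr, and generate are hypothetical names, and it reads the monotonic time.perf_counter clock rather than wall-clock time so that the computed interval cannot go negative.

```python
import random
import time


class TraceLog:
    """Collects (tag, message, timestamp) tuples, one per trace call."""

    def __init__(self):
        self.events = []

    def trace(self, tag, message):
        # One clock read per invocation, taken at the trace point itself.
        self.events.append((tag, message, time.perf_counter()))

    def elapsed(self, tag, start="START", done="DONE"):
        """Seconds between the first START and the last DONE for a tag."""
        starts = [t for g, m, t in self.events if g == tag and m == start]
        dones = [t for g, m, t in self.events if g == tag and m == done]
        return dones[-1] - starts[0]


def randomize_instr(rng):
    """Hypothetical stand-in for the per-instruction randomization step."""
    return rng.randrange(1 << 32)


def generate(count, log, seed=0):
    """Bracket the whole randomization loop with two trace points."""
    rng = random.Random(seed)
    log.trace("GEN INSTR", "START")
    instrs = [randomize_instr(rng) for _ in range(count)]
    log.trace("GEN INSTR", "DONE")
    return instrs
```

After generate(1000, log), log.elapsed("GEN INSTR") gives the time spent in the bracketed loop, which is the quantity read off by subtracting the two timestamps in a trace log.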

Profiling setup considerations

Because RISCV-DV is highly parameterized, execution time varies significantly with user-selected parameters. For profiling, the cited study used the comprehensive riscv_instr_base_test with a mix of seven directed streams covering the possible RISC-V instruction categories. [C6]

Bottlenecks identified

Profiling and complexity analysis identified four major bottlenecks in decreasing order of impact: [C2]

  1. Creation and randomization of directed instruction streams, where most time was traced to constraint-solver execution.
  2. Dumping a large non-directed instruction stream into instruction lists for the main program and sub-programs, again dominated by randomization and constraint solving.
  3. Insertion of directed instruction streams into the non-directed instruction stream, which became more severe as instruction count increased and had O(n²) algorithmic complexity.
  4. Repeated creation of formatted strings for assembly output, where repeated $sformatf calls caused many memory allocations.
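
To illustrate how an insertion step like the third bottleneck can become quadratic, the Python sketch below compares two ways of merging directed streams into a base list. It shows one common cause of such behaviour and one linear alternative; it is not the RISCV-DV code or the fix applied in the cited study. The function names are hypothetical, and the sketch assumes the insertion positions are distinct and lie between 0 and len(base) inclusive.

```python
def insert_streams_quadratic(base, streams):
    """Splice each (position, stream) pair into the list in place.

    Every splice shifts the tail of the list, so the total cost grows
    roughly with len(base) * len(streams).
    """
    out = list(base)
    # Work from the highest position down so earlier positions stay valid.
    for pos, stream in sorted(streams, key=lambda s: s[0], reverse=True):
        out[pos:pos] = stream
    return out


def insert_streams_linear(base, streams):
    """Build the result in a single pass over base.

    After sorting the streams by position, each element is appended
    exactly once, so nothing is shifted.
    """
    ordered = sorted(streams, key=lambda s: s[0])
    out = []
    idx = 0
    for pos, item in enumerate(base):
        # Emit any stream scheduled immediately before base[pos].
        while idx < len(ordered) and ordered[idx][0] == pos:
            out.extend(ordered[idx][1])
            idx += 1
        out.append(item)
    # Streams positioned at the end of base are appended last.
    while idx < len(ordered):
        out.extend(ordered[idx][1])
        idx += 1
    return out
```

Both functions return the same list under the stated assumptions; only the amount of element shifting differs, which is why the gap widens as the instruction count grows.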

Optimization guidance derived from profiling

The first two bottlenecks involved constraint solving and had linear algorithmic complexity, making them suitable targets for multicore parallelization in the cited study. The fourth bottleneck also had linear complexity, but frequent memory allocation limited parallelization potential until allocation calls were reduced. The third bottleneck was attributed to sub-optimal algorithmic implementation and required a more significant architectural change. [C7]

Caution

uvm_trace is useful for macro-level profiling, but each invocation performs an operating-system call to fetch the current clock time. Excessive use can therefore cause an inordinate increase in testbench runtime. [C8]
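
The difference between coarse and excessive tracing can be made concrete by counting clock reads. The Python sketch below is illustrative only: CountingTracer, traced_per_item, and traced_per_phase are hypothetical names, and time.time stands in for whatever clock call a real trace facility makes.

```python
import time


class CountingTracer:
    """Counts how many times the clock is read by trace calls."""

    def __init__(self):
        self.clock_reads = 0

    def trace(self, tag, message):
        # Each trace call costs one clock read.
        self.clock_reads += 1
        return time.time()


def traced_per_item(items, tracer):
    """Excessive: two trace calls for every item processed."""
    total = 0
    for item in items:
        tracer.trace("ITEM", "START")
        total += item * 2
        tracer.trace("ITEM", "DONE")
    return total


def traced_per_phase(items, tracer):
    """Macro-level: two trace calls for the whole phase."""
    tracer.trace("PHASE", "START")
    total = sum(item * 2 for item in items)
    tracer.trace("PHASE", "DONE")
    return total
```

For 1,000 items the per-item version performs 2,000 clock reads while the per-phase version performs 2, so the tracing overhead scales with the workload in the first case and stays constant in the second.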

CITATIONS

[C1] Profiling was presented as the most important performance-optimization step and reduced the RISCV-DV focus from more than 15,000 lines to about 200 repeatedly executed performance-critical lines. Source: Crafting a Million Instructions/Sec RISCV-DV, DVCon Proceedings (PDF).
[C2] Profiling results, combined with algorithmic-complexity analysis, identified four major RISCV-DV generator bottlenecks and ranked them by impact. Source: Crafting a Million Instructions/Sec RISCV-DV, DVCon Proceedings (PDF).
[C3] uvm_trace is described as an eUVM construct that helps formally identify testbench bottlenecks at macro level. Source: Crafting a Million Instructions/Sec RISCV-DV, DVCon Proceedings (PDF).
[C4] gprof is identified as an open-source tool useful for micro-level profiling. Source: Crafting a Million Instructions/Sec RISCV-DV, DVCon Proceedings (PDF).
[C5] The example uvm_trace instrumentation surrounds an instruction-randomization loop, and UVM_TRACE log messages include wall-clock timestamps taken at trace invocation. Source: Crafting a Million Instructions/Sec RISCV-DV, DVCon Proceedings (PDF).
[C6] The profiling setup used riscv_instr_base_test with seven directed streams because RISCV-DV execution time varies significantly with selected parameters. Source: Crafting a Million Instructions/Sec RISCV-DV, DVCon Proceedings (PDF).
[C7] The first two bottlenecks were linear constraint-solving workloads suited to multicore parallelization; the fourth was affected by memory allocation; and the third reflected sub-optimal algorithmic implementation requiring architectural change. Source: Crafting a Million Instructions/Sec RISCV-DV, DVCon Proceedings (PDF).
[C8] Each uvm_trace invocation performs an operating-system call to fetch the current clock time, so reckless use can increase testbench runtime significantly. Source: Crafting a Million Instructions/Sec RISCV-DV, DVCon Proceedings (PDF).