Functional Verification of a RISC-V Vector Accelerator

Paper WIKI v2 · 5/27/2026

"Functional Verification of a RISC-V Vector Accelerator" describes an industrial-style verification infrastructure for an academic decoupled RISC-V vector accelerator taped out in the European Processor Initiative context. The work uses a UVM environment around the Open Vector Interface, RISCV-DV random binary generation, Spike co-simulation as a reference model, a UVM scoreboard, SystemVerilog assertions, functional and code coverage, and Jenkins/GitLab-based CI. The reported campaign found 3005 errors and reached 95.79% average functional coverage.

Overview

"Functional Verification of a RISC-V Vector Accelerator" is a paper on the functional verification of an academic RISC-V-based decoupled vector accelerator. The paper states that the accelerator was successfully taped out in the context of the European Processor Initiative and implemented version 0.7.1 of the RISC-V Vector Extension while connecting to a scalar processor core through the Open Vector Interface (OVI). [C1]

The paper's main reported contributions are an industrial-grade verification approach using a UVM testbench, reference model, assertions, and coverage; a common UVM testbench for a novel interface and large RTL project; co-simulation-based result comparison for completed vector instructions; and automated testing/regression infrastructure that reached 95.79% functional coverage. [C2]

Verified design

The design under verification was a Vector Processing Unit (VPU) implementing RISC-V Vector Extension 0.7.1. It had eight vector lanes, supported vectors of up to 256 64-bit elements, included 32 logical and 40 physical vector registers, and supported 64- and 32-bit floating-point vector operations plus 64-, 32-, 16-, and 8-bit integer vector operations. It had limited out-of-order capability, mostly between arithmetic and memory operations. [C3]

Within the system described by the paper, the scalar core executed scalar instructions and sent vector instructions to the VPU. Memory accesses for vector memory operations were also performed by the scalar core through OVI. The paper describes the accelerator as developed by BSC and connected to a scalar RISC-V core designed by SemiDynamics. [C4]

Verification methodology

The verification team built tools and utilities around the design under test to ease error detection and chose UVM because it supports modular, scalable, and reusable verification environments. The team initially considered verifying VPU submodules individually with constrained-random methods, but focused instead on the OVI interface because submodule-level verification would have required excessive effort and not all submodule specifications were final. [C5]

The UVM environment used one agent per semi-independent OVI sub-interface. For example, the issue sub-interface agent contained a sequencer, driver, and monitor connected to a virtual interface. Virtual sequences produced interface-specific transactions; drivers stimulated the corresponding sub-interfaces; monitors captured interface state and returned information to the virtual sequence through the sequencer. UVM events synchronized communication among the seven sub-interfaces. [C6]

Because the OVI sub-interfaces were highly interdependent, the environment randomized only the instructions fed to the issue sub-interface and made the other sub-interfaces react according to the driven instruction stream. [C7]
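The "randomize one stream, let the rest react" idea can be illustrated with a minimal Python sketch. It is not the authors' UVM code: the opcode classes, the `react` helper, and the mapping from instruction class to responding sub-interfaces are simplifying assumptions made for illustration.

```python
import random

# Hypothetical instruction classes; a real issue transaction carries a full RVV instruction.
ARITH, LOAD, STORE = "arith", "load", "store"

def random_issue_stream(n, seed=0):
    """Only this stream is randomized, mirroring the issue sub-interface."""
    rng = random.Random(seed)
    return [rng.choice([ARITH, LOAD, STORE]) for _ in range(n)]

def react(instr):
    """Return the sub-interfaces assumed to respond to a driven instruction."""
    reactions = ["completed"]              # every instruction eventually completes
    if instr == LOAD:
        reactions = ["memop", "load"] + reactions
    elif instr == STORE:
        reactions = ["memop", "store"] + reactions
    return reactions

stream = random_issue_stream(5)
# Each entry pairs a randomized instruction with the derived, non-random reactions.
schedule = [(instr, react(instr)) for instr in stream]
```

The point of the sketch is that no reaction is drawn at random: each one is a function of the issued instruction, which keeps interdependent interfaces consistent with each other.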

RISCV-DV test generation

The flow used RISCV-DV, described in the paper as a SystemVerilog/UVM-based open-source instruction generator for RISC-V processor verification. RISCV-DV generated random RISC-V assembly tests that supplied vector instructions for VPU testing. Because RISCV-DV supported a later RVV version than 0.7.1, the authors adapted the relevant parts for the target design. [C8]

Major RISCV-DV changes included generating vsetvli instructions, changing memory-operation generation to vary element width and vector length, making data-page initialization patterns selectable, constraining accessed memory addresses to avoid memory exceptions (especially for indexed vector memory instructions), and adapting the generator to RVV 0.7.1. [C9]
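One way to picture the address constraint for indexed accesses is the Python sketch below. It is an illustration under assumptions, not RISCV-DV code: `PAGE_BASE`, `PAGE_SIZE`, and `constrained_offsets` are hypothetical names, and the sketch assumes byte offsets relative to an element-aligned base address inside a single data page.

```python
import random

PAGE_BASE = 0x8000_0000   # hypothetical data-page base address
PAGE_SIZE = 4096          # hypothetical data-page size in bytes

def constrained_offsets(vl, sew_bytes, base, rng):
    """Pick one byte offset per element so base + offset stays inside the page."""
    lo = PAGE_BASE - base                           # may be negative
    hi = PAGE_BASE + PAGE_SIZE - sew_bytes - base   # last offset that still fits
    # Stepping by the element size keeps every address element-aligned.
    return [rng.randrange(lo, hi + 1, sew_bytes) for _ in range(vl)]

rng = random.Random(1)
base = PAGE_BASE + 0x100
offsets = constrained_offsets(vl=16, sew_bytes=8, base=base, rng=rng)
addresses = [base + off for off in offsets]
```

Because the legal range is computed from the page bounds before any random draw, no generated access can fall outside the page, which is the property needed to avoid memory exceptions in the generated tests.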

Since some design modules remained under development during much of verification, the team initially blacklisted many generated-test instructions to keep tests functional. As errors were fixed, instructions were gradually removed from the blacklist until all implemented instructions were enabled. [C10]
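The shrinking blacklist can be sketched in a few lines of Python. The instruction list and helper names are illustrative assumptions; the paper does not list which instructions were blacklisted or in what order they were re-enabled.

```python
import random

# Illustrative RVV 0.7.1 mnemonics; not the paper's actual instruction list.
IMPLEMENTED = ["vadd.vv", "vmul.vv", "vlw.v", "vsw.v", "vfredsum.vs", "vnsrl.vv"]

def allowed(blacklist):
    """Instructions the generator may emit, in a stable order."""
    return [op for op in IMPLEMENTED if op not in blacklist]

def generate_test(n, blacklist, seed=0):
    """Draw n instructions uniformly from the currently allowed pool."""
    rng = random.Random(seed)
    pool = allowed(blacklist)
    return [rng.choice(pool) for _ in range(n)]

blacklist = {"vlw.v", "vsw.v", "vnsrl.vv"}   # modules still under development
early_test = generate_test(20, blacklist)

blacklist.discard("vnsrl.vv")                # a fix landed, so re-enable it
later_pool = allowed(blacklist)
```

Once the blacklist is empty, `allowed` returns every implemented instruction, matching the end state described above.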

Reference model and scoreboard

The environment used a UVM scoreboard to compare VPU results against a reference model. The chosen reference model was the RISC-V ISA simulator Spike, integrated through co-simulation. [C11]

Spike had two roles. It acted as a scalar core, executing scalar instructions and providing vector instructions to UVM in program order, and it served as the golden reference model for checking DUT results. The authors' modifications to Spike included functions callable through the SystemVerilog DPI, a method that resumed simulation until a vector instruction executed and then returned the reference results to UVM, functions to read Spike memory, and a function to force reduction results into Spike to avoid divergence on unordered floating-point reductions. [C12]

The scoreboard was connected to the completed monitor. When an instruction finished, a comparison method executed. Because many vector results were written into physical vector registers rather than exposed directly as scalar outputs, the information extracted from Spike included the destination vector-register value. [C13]
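The co-simulation flow described in this section can be approximated by the Python sketch below. `ToyReference` is a stand-in for the reference simulator and does not reflect Spike's real interface; the `Completed` record and its fields are likewise assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Completed:
    pc: int
    vd: int        # destination vector register index
    value: list    # element values written to that register

class ToyReference:
    """Stand-in reference model; not Spike's actual API."""
    def __init__(self, program):
        self._program = iter(program)

    def run_until_vector(self):
        """Skip scalar instructions and return the next vector result, or None."""
        for kind, result in self._program:
            if kind == "vector":
                return result
        return None

def scoreboard(reference, dut_completions):
    """Compare each completed DUT instruction with the reference, in program order."""
    mismatches = []
    for dut in dut_completions:
        ref = reference.run_until_vector()
        if ref != dut:                      # dataclass equality compares all fields
            mismatches.append((ref, dut))
    return mismatches

program = [
    ("scalar", None),
    ("vector", Completed(pc=0x10, vd=1, value=[1, 2, 3])),
    ("scalar", None),
    ("vector", Completed(pc=0x18, vd=2, value=[4, 5, 6])),
]
dut = [Completed(0x10, 1, [1, 2, 3]), Completed(0x18, 2, [4, 5, 7])]
errors = scoreboard(ToyReference(program), dut)
```

Here the second instruction is reported because one destination-register element differs, which is the kind of mismatch the comparison method is meant to catch.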

Floating-point reductions

Unordered floating-point reductions required special handling. The VPU used a different reduction algorithm than Spike, which the RVV specification allowed. This could create false mismatches and leave incorrect values in Spike vector registers for later instructions. To address this, the authors built an independent C reference model for unordered reductions that implemented the DUT's reduction algorithm. For those cases, the VPU result was compared against the reduction reference model rather than Spike, and matching values were injected into Spike registers. [C14]
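The reason a different reduction order can produce a false mismatch is that floating-point addition is not associative. The Python sketch below shows this with a sequential sum and a pairwise tree sum, then mimics the checking step. The paper does not describe the DUT's actual algorithm, so the tree order and the `check_reduction` helper are assumptions used only for illustration.

```python
def sequential_sum(xs):
    """Left-to-right accumulation."""
    acc = 0.0
    for x in xs:
        acc += x
    return acc

def tree_sum(xs):
    """Pairwise reduction; assumes len(xs) is a power of two."""
    xs = list(xs)
    while len(xs) > 1:
        xs = [xs[i] + xs[i + 1] for i in range(0, len(xs), 2)]
    return xs[0]

def check_reduction(dut_value, elements, ref_regs, vd):
    """Compare against a model of the DUT order, then sync the reference state."""
    expected = tree_sum(elements)      # assumed DUT order, for illustration only
    ok = dut_value == expected
    if ok:
        ref_regs[vd] = dut_value       # inject so later instructions see the same value
    return ok

data = [1e16, 1.0, -1e16, 1.0]
seq = sequential_sum(data)             # 1.0
tree = tree_sum(data)                  # 0.0

ref_regs = {3: seq}                    # reference simulator holds the sequential result
passed = check_reduction(tree, data, ref_regs, vd=3)
```

Both results are valid outcomes for an unordered reduction, yet a direct comparison would flag an error. After the check, `ref_regs[3]` holds the DUT's value, so subsequent instructions that read the register do not diverge.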

Memory-operation verification

The paper identifies memory operations as one of the most delicate parts of the design because the VPU had no direct memory access. Instead, it read and wrote data through the scalar core using OVI memop, load, store, and mask interfaces, requiring substantial inter-sub-interface communication. [C15]

For loads, expected memory data was obtained from Spike and written into a memory model based on the OpenTitan memory model, then sent through the VPU load sub-interface. For stores, memory contents before execution were needed to check masked operations and detect undesired writes; VPU store data was captured in the memory model and later compared with Spike values. Masked memory operations also required outgoing mask or index transactions from the VPU to support environment-side execution and comparison. [C16]
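The need for pre-execution memory contents when checking masked stores can be shown with a small Python sketch. The memory is modeled as a dictionary from element address to value, which is a simplification; the function names are hypothetical and do not come from the paper or from the OpenTitan memory model.

```python
def expected_after_masked_store(pre_mem, base, data, mask, sew_bytes):
    """Apply only the active elements of a unit-stride store to a copy of memory."""
    mem = dict(pre_mem)
    for i, (value, active) in enumerate(zip(data, mask)):
        if active:
            mem[base + i * sew_bytes] = value
    return mem

def check_store(pre_mem, post_mem, base, data, mask, sew_bytes):
    """Return addresses where the observed memory differs from the expectation."""
    expected = expected_after_masked_store(pre_mem, base, data, mask, sew_bytes)
    addrs = set(expected) | set(post_mem)
    return sorted(a for a in addrs if expected.get(a) != post_mem.get(a))

pre = {0x100: 1, 0x108: 2, 0x110: 3}
data, mask = [10, 20, 30], [True, False, True]

good_post = {0x100: 10, 0x108: 2, 0x110: 30}    # masked-off element untouched
bad_post = {0x100: 10, 0x108: 20, 0x110: 30}    # undesired write to a masked-off element

good_errors = check_store(pre, good_post, 0x100, data, mask, 8)
bad_errors = check_store(pre, bad_post, 0x100, data, mask, 8)
```

Without `pre`, the checker could not tell that address 0x108 should still hold its old value, so the undesired write in `bad_post` would go unnoticed.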

The authors also handled OVI retries, which occur when the VPU cannot handle all loaded cache lines sent by the scalar core. In that case, the instruction completes with a vstart value representing the first element not written to the vector registers, requiring re-execution from that element. Retry scenarios were randomized using UVM configuration objects and were reported as one of the primary sources of VPU errors. [C17]
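The retry mechanism can be sketched as a loop that re-executes a load from the reported vstart until all elements are written. This Python model is a simplification under stated assumptions: `dut_load_attempt` is a hypothetical toy DUT that always makes progress, and the randomization stands in for the UVM configuration objects mentioned above.

```python
import random

def dut_load_attempt(vstart, vl, rng):
    """Toy DUT: writes some elements starting at vstart.

    Returns the index of the first element NOT written (vl when finished).
    Assumes at least one element is accepted per attempt.
    """
    return rng.randint(vstart + 1, vl)

def run_load_with_retries(vl, seed=0):
    """Re-issue the load from the returned vstart until every element is written."""
    rng = random.Random(seed)
    vstart, attempts, written = 0, 0, []
    while vstart < vl:
        new_vstart = dut_load_attempt(vstart, vl, rng)
        written.extend(range(vstart, new_vstart))   # elements written in this attempt
        vstart = new_vstart
        attempts += 1
    return written, attempts

written, attempts = run_load_with_retries(vl=32)
```

A checker built on this idea would confirm that each element is written exactly once across all attempts, regardless of where the retries fall.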

Assertions, coverage, and CI

Because the VPU interface was critical, the team wrote more than 50 SystemVerilog Assertions against OVI specifications. These assertions helped identify both VPU bugs and UVM stimulation problems early in testbench development, and most targeted memory-related sub-interfaces. [C18]
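As an illustration of what an interface assertion checks, the Python sketch below evaluates a credit-style rule over a recorded signal trace. The rule itself is a hypothetical example and is not quoted from the OVI specification or from the paper's assertions; in the real environment such a property would be written as a SystemVerilog Assertion.

```python
def check_credit_protocol(trace, initial_credits):
    """Check a hypothetical rule: 'valid' may only be asserted while credits remain.

    trace is a list of (valid, credit_return) pairs, one per cycle.
    Returns the cycle indices where the rule is violated.
    """
    credits = initial_credits
    failures = []
    for cycle, (valid, credit_return) in enumerate(trace):
        if valid:
            if credits == 0:
                failures.append(cycle)    # issued without an available credit
            else:
                credits -= 1
        if credit_return:
            credits += 1                  # returned credit is usable from the next cycle
    return failures

trace = [(1, 0), (1, 0), (1, 0), (0, 1), (1, 0)]
violations = check_credit_protocol(trace, initial_credits=2)   # [2]
```

A failure can point at either side of the interface, which matches the observation that the assertions exposed both VPU bugs and testbench stimulation problems.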

The team implemented a functional coverage plan focused mostly on directly observable VPU-interface behavior, including instructions, execution parameters, and memory-sub-interface values. They also gathered coverage for selected internal modules. ISA tests covered RVV 0.7.1 instruction formats and configurations such as vector length, element width, rounding modes, and masks, while RISCV-DV random tests provided additional stress. [C19]
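A functional coverage figure of this kind is typically the fraction of defined bins that were hit at least once. The Python sketch below models a small cross of element width, masking, and vector length. The bin definitions are assumptions chosen for illustration and are not the paper's coverage plan.

```python
from itertools import product

SEWS = (8, 16, 32, 64)
MASKED = (False, True)
VL_BINS = ("vl=1", "1<vl<max", "vl=max")    # hypothetical vector-length bins

def vl_bin(vl, vlmax):
    """Classify a vector length (assumed >= 1) into one of the VL bins."""
    if vl == 1:
        return VL_BINS[0]
    if vl == vlmax:
        return VL_BINS[2]
    return VL_BINS[1]

class CrossCoverage:
    def __init__(self):
        # One hit counter per (SEW, masked, VL bin) combination: 4 * 2 * 3 = 24 bins.
        self.bins = {b: 0 for b in product(SEWS, MASKED, VL_BINS)}

    def sample(self, sew, masked, vl, vlmax):
        self.bins[(sew, masked, vl_bin(vl, vlmax))] += 1

    def percent(self):
        hit = sum(1 for count in self.bins.values() if count > 0)
        return 100.0 * hit / len(self.bins)

cov = CrossCoverage()
for sew, masked, vl in [(64, False, 1), (64, True, 256), (32, False, 100),
                        (8, True, 1), (16, False, 256), (32, True, 7),
                        (64, False, 1)]:     # last sample repeats an already-hit bin
    cov.sample(sew, masked, vl, vlmax=256)

coverage = cov.percent()                     # 6 of 24 bins hit -> 25.0
```

Repeated samples raise a bin's counter but not the percentage, which is why random tests alone tend to plateau and directed ISA tests are used to close the remaining bins.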

The environment also recorded assertion usage and code coverage from simulations run by continuous integration. The CI infrastructure was built with Jenkins and included pipelines for generating new RISCV-DV tests, rerunning failed tests after DUT changes, selecting regression sets based on coverage, and running regressions before merges as well as larger weekly regressions. The paper also reports using GitLab for version control, issue tracking, and project documentation in the form of GitLab Wiki guides and tutorials. [C20]
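Selecting a regression set based on coverage can be approached as a set-cover problem. The Python sketch below uses a greedy heuristic; the paper does not say which selection method the pipeline used, so this is one plausible approach shown for illustration, with hypothetical test names and bin identifiers.

```python
def select_regression(test_coverage):
    """Greedy set cover: repeatedly pick the test that hits the most uncovered bins.

    test_coverage maps a test name to the set of coverage bins it hits.
    """
    remaining = set().union(*test_coverage.values())
    selected = []
    while remaining:
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(test_coverage),
                   key=lambda name: len(test_coverage[name] & remaining))
        selected.append(best)
        remaining -= test_coverage[best]
    return selected

tests = {
    "t1": {1, 2, 3},
    "t2": {3, 4},
    "t3": {4},
    "t4": {1, 2},
}
regression = select_regression(tests)    # ["t1", "t2"]
```

The greedy choice does not guarantee the smallest possible set, but it keeps pre-merge regressions short while preserving the coverage reached by the full test pool.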

Results and lessons

The environment was used for about one year. When errors were found, the team supplied reproduction information such as binaries and faulty instructions, maintained summaries of active errors to guide debugging, and ran regressions before fixes could be merged. [C21]

The reported nightly randomized testing ran 24 tests per night from April to July and 50 tests per night from August to the end of November, with each test containing approximately 500 vector instructions. The authors report finding 3005 errors; memory, narrowing, and widening vector instructions accounted for about 70% of the failures. [C22]

After the nightly testing period, the authors developed additional pipelines that together ran about 600 tests per day to collect coverage and find bugs while VPU development continued. The reported average functional coverage was 95.79%; average code coverage was 72.64%, with 90.90% statement coverage and 49.83% toggle coverage. [C23]

The paper concludes that the reusable and extendable UVM environment implemented the OVI/VPU protocol and checked completed-instruction correctness. It also states that automated constrained-random test generation, simulation, error reporting, and CI/CD infrastructure found 3005 errors and reached 95.79% functional coverage. [C24]

A reported lesson was that dividing communication across several agents complicated maintenance, extension, and performance. As future work, the authors suggested using a single stimulus-producing agent so interface interaction could be handled in one module, simplifying sub-interface communication and future expansion. [C25]

CITATIONS

[C1] The paper verifies an academic RISC-V decoupled vector accelerator taped out in the European Processor Initiative context, implementing RVV 0.7.1 and connecting through OVI. Source: Functional Verification of a RISC-V Vector Accelerator.

[C2] The paper's stated contributions include UVM, a reference model, assertions, coverage, co-simulation, and automated testing/regression infrastructure reaching 95.79% coverage. Source: Functional Verification of a RISC-V Vector Accelerator.

[C3] The verified VPU is RVV 0.7.1-based, has eight lanes, supports vectors up to 256 64-bit elements, has 32 logical and 40 physical registers, supports FP and integer vector operations, and has limited memory-operation out-of-order capability. Source: Functional Verification of a RISC-V Vector Accelerator.

[C4] BSC developed the vector accelerator, SemiDynamics designed the connected scalar RISC-V core, and the scalar core handles scalar execution, vector issue, and vector memory accesses through OVI. Source: Functional Verification of a RISC-V Vector Accelerator.

[C5] The team selected UVM for a modular, scalable, reusable verification environment and focused on OVI-level verification rather than individual VPU submodules. Source: Functional Verification of a RISC-V Vector Accelerator.

[C6] The UVM environment uses agents, sequencers, drivers, monitors, virtual sequences, and UVM events across seven OVI sub-interfaces. Source: Functional Verification of a RISC-V Vector Accelerator.

[C7] Because OVI sub-interfaces are highly dependent, the environment randomized only issue-interface instructions and made other sub-interfaces react. Source: Functional Verification of a RISC-V Vector Accelerator.

[C8] RISCV-DV generated random RISC-V assembly tests for VPU testing and required adaptation because it implemented a later RVV version than 0.7.1. Source: Functional Verification of a RISC-V Vector Accelerator.

[C9] Major RISCV-DV additions included vsetvli generation, memory-operation changes, data-page initialization selection, memory-address constraints, and RVV 0.7.1 adaptation. Source: Functional Verification of a RISC-V Vector Accelerator.

[C10] Generated-test instructions were initially blacklisted while modules were under development and gradually re-enabled as errors were fixed. Source: Functional Verification of a RISC-V Vector Accelerator.

[C11] The verification environment used a UVM scoreboard and Spike as the reference model in co-simulation. Source: Functional Verification of a RISC-V Vector Accelerator.

[C12] Spike acted both as scalar-core instruction source and golden/reference model, with DPI-callable functions, vector-instruction stepping, memory reads, and reduction-result forcing. Source: Functional Verification of a RISC-V Vector Accelerator.

[C13] The scoreboard compares results when instructions finish and includes destination vector-register values extracted from Spike. Source: Functional Verification of a RISC-V Vector Accelerator.

[C14] Unordered floating-point reductions used a C reference model matching the DUT algorithm, with matching values injected into Spike. Source: Functional Verification of a RISC-V Vector Accelerator.

[C15] Memory operations are delicate because the VPU accesses memory through the scalar core using memop, load, store, and mask interfaces. Source: Functional Verification of a RISC-V Vector Accelerator.

[C16] Load, store, and masked memory-operation checking used Spike data and a memory model based on OpenTitan's memory model. Source: Functional Verification of a RISC-V Vector Accelerator.

[C17] OVI retries use vstart for re-execution, were randomized with UVM configuration objects, and were a primary source of VPU errors. Source: Functional Verification of a RISC-V Vector Accelerator.

[C18] The authors implemented more than 50 SystemVerilog Assertions for OVI behavior, mostly targeting memory-related sub-interfaces. Source: Functional Verification of a RISC-V Vector Accelerator.

[C19] The functional coverage plan focused on observable VPU interface behavior, internal modules, ISA tests, and RISCV-DV random tests. Source: Functional Verification of a RISC-V Vector Accelerator.

[C20] CI recorded assertion and code coverage, used Jenkins pipelines for test generation/retry/selection/regression, and used GitLab for version control, issue tracking, and documentation. Source: Functional Verification of a RISC-V Vector Accelerator.

[C21] The environment was used for about one year, and the team provided reproduction/debugging information and ran regressions before merging fixes. Source: Functional Verification of a RISC-V Vector Accelerator.

[C22] Nightly testing ran 24 tests per night from April to July and 50 from August to November, with about 500 vector instructions each; the campaign found 3005 errors, around 70% from memory, narrowing, and widening instructions. Source: Functional Verification of a RISC-V Vector Accelerator.

[C23] After nightly runs, additional pipelines ran around 600 tests per day; average functional coverage was 95.79%, average code coverage 72.64%, statement coverage 90.90%, and toggle coverage 49.83%. Source: Functional Verification of a RISC-V Vector Accelerator.

[C24] The conclusion states that the reusable UVM environment checks OVI/VPU completed-instruction correctness and that CI/CD-enabled constrained-random testing found 3005 errors and reached 95.79% functional coverage. Source: Functional Verification of a RISC-V Vector Accelerator.

[C25] The authors report that dividing communication among several agents complicated maintenance, extension, and performance, and suggest a single stimulus-producing agent as future work. Source: Functional Verification of a RISC-V Vector Accelerator.
