Overview
"Functional Verification of a RISC-V Vector Accelerator" is a paper describing how an academic, RISC-V-based, decoupled vector accelerator was functionally verified. According to the paper, the accelerator implements version 0.7.1 of the RISC-V Vector Extension (RVV), connects to a scalar processor core through the Open Vector Interface (OVI), and was successfully taped out as part of the European Processor Initiative. [C1]
The paper reports four main contributions. The first is an industrial-grade verification approach that combines a UVM testbench, a reference model, assertions, and coverage. The second is a common UVM testbench for a novel interface and a large RTL project. The third is co-simulation-based comparison of results for completed vector instructions. The fourth is automated testing and regression infrastructure that reached 95.79% functional coverage. [C2]
Verified design
The design under verification was a Vector Processing Unit (VPU) based on RISC-V Vector Extension 0.7.1. It had eight vector lanes, supported vectors up to 256 elements of 64 bits, included 32 logical and 40 physical vector registers, and supported 64- and 32-bit floating-point vector operations plus 64-, 32-, 16-, and 8-bit integer vector operations. Memory operations had limited out-of-order capability, mostly between arithmetic and memory operations. [C3]
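The register-file sizing follows directly from these figures. The Python sketch below is plain arithmetic on the stated parameters. Two assumptions are ours, not the paper's: each physical register holds one maximum-length vector, and elements are spread evenly across the eight lanes.

```python
# Parameters as stated for the VPU.
LANES = 8
MAX_ELEMENTS = 256      # maximum vector length at 64-bit element width
ELEMENT_BITS = 64
LOGICAL_REGS = 32
PHYSICAL_REGS = 40

# Size of one maximum-length vector of 64-bit elements.
vector_bits = MAX_ELEMENTS * ELEMENT_BITS           # 16384 bits
vector_bytes = vector_bits // 8                     # 2048 bytes (2 KiB)

# Assumption: elements are distributed evenly across the lanes.
elements_per_lane = MAX_ELEMENTS // LANES           # 32 elements per lane

# Assumption: every physical register holds one maximum-length vector.
register_file_bytes = PHYSICAL_REGS * vector_bytes  # 81920 bytes (80 KiB)

# Physical registers beyond the 32 architectural ones.
extra_physical_regs = PHYSICAL_REGS - LOGICAL_REGS  # 8

print(vector_bytes, elements_per_lane, register_file_bytes, extra_physical_regs)
```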
Within the system described by the paper, the scalar core executed scalar instructions and sent vector instructions to the VPU. Memory accesses for vector memory operations were also performed by the scalar core through OVI. The paper describes the accelerator as developed by BSC and connected to a scalar RISC-V core designed by SemiDynamics. [C4]
Verification methodology
The verification team built tools and utilities around the design under test to ease error detection and chose UVM because it supports modular, scalable, and reusable verification environments. The team initially considered verifying VPU submodules individually with constrained-random methods, but focused instead on the OVI interface because submodule-level verification would have required excessive effort and not all submodule specifications were final. [C5]
The UVM environment used one agent per semi-independent OVI sub-interface. For example, the issue sub-interface agent contained a sequencer, driver, and monitor connected to a virtual interface. Virtual sequences produced interface-specific transactions; drivers stimulated the corresponding sub-interfaces; monitors captured interface state and returned information to the virtual sequence through the sequencer. UVM events synchronized communication among the seven sub-interfaces. [C6]
Because the OVI sub-interfaces were highly interdependent, the environment randomized only the instructions fed to the issue sub-interface and made the other sub-interfaces react according to the driven instruction stream. [C7]
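A small Python model can illustrate the idea of randomizing only the issue stream. This is a sketch, not the paper's UVM code. The instruction pool, the `VecInstr` class, and the event names are hypothetical, and the `react` step stands in for the load, store, and completed sub-interface agents. Only `random_issue_stream` consumes randomness. Every other sub-interface event is derived from the issued instruction, so two runs with the same seed produce the same cross-interface traffic.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class VecInstr:
    mnemonic: str
    is_load: bool = False
    is_store: bool = False

# Hypothetical instruction pool; only the issue stream draws on randomness.
POOL = (
    VecInstr("vadd.vv"),
    VecInstr("vfmul.vv"),
    VecInstr("vle.v", is_load=True),
    VecInstr("vse.v", is_store=True),
)

def random_issue_stream(n, seed):
    """Randomly pick n instructions to drive on the issue sub-interface."""
    rng = random.Random(seed)
    return [rng.choice(POOL) for _ in range(n)]

def react(stream):
    """Derive the other sub-interfaces' traffic from the issued instructions.

    Nothing here is random: loads receive data on the load sub-interface,
    stores have their data accepted on the store sub-interface, and every
    instruction eventually reports on the completed sub-interface.
    """
    events = []
    for idx, instr in enumerate(stream):
        events.append(("issue", idx, instr.mnemonic))
        if instr.is_load:
            events.append(("load", idx, "send data"))
        if instr.is_store:
            events.append(("store", idx, "accept data"))
        events.append(("completed", idx, instr.mnemonic))
    return events
```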
RISCV-DV test generation
The flow used RISCV-DV, described in the paper as a SystemVerilog/UVM-based open-source instruction generator for RISC-V processor verification. RISCV-DV generated random RISC-V assembly tests that supplied vector instructions for VPU testing. Because RISCV-DV supported a later RVV version than 0.7.1, the authors adapted the relevant parts for the target design. [C8]
The main RISCV-DV changes were:

- generation of vsetvli instructions;
- changes to memory-operation generation to vary element width and vector length;
- selectable initialization patterns for data pages;
- constraints on accessed memory addresses to avoid memory exceptions, especially for indexed vector memory instructions;
- adaptation to RVV 0.7.1. [C9]
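The address constraint for indexed memory instructions can be sketched in Python. RISCV-DV itself uses SystemVerilog constraints. Three details are assumptions of this sketch: the page size, the restriction to non-negative byte offsets, and the natural-alignment requirement. The property it guarantees is that every element access falls inside the initialized data page, so no memory exception can be raised. The function names are hypothetical.

```python
import random

PAGE_SIZE = 4096  # hypothetical size of the initialized data page, in bytes

def constrained_indexed_access(vl, eew_bytes, rng):
    """Return (base, offsets) for an indexed access of vl elements.

    Every element occupies [base + off, base + off + eew_bytes). By
    construction it lies inside [0, PAGE_SIZE) and is aligned to eew_bytes.
    """
    slots = PAGE_SIZE // eew_bytes          # aligned element slots in the page
    base_slot = rng.randrange(slots)
    max_off_slots = slots - base_slot       # offsets must not run past the page
    offsets = [rng.randrange(max_off_slots) * eew_bytes for _ in range(vl)]
    return base_slot * eew_bytes, offsets

def random_indexed_access(rng, max_vl=256):
    """Also vary element width and vector length (max_vl is a hypothetical cap)."""
    eew_bytes = rng.choice((1, 2, 4, 8))
    vl = rng.randint(1, max_vl)
    base, offsets = constrained_indexed_access(vl, eew_bytes, rng)
    return eew_bytes, base, offsets
```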
Since some design modules remained under development during much of verification, the team initially blacklisted many generated-test instructions to keep tests functional. As errors were fixed, instructions were gradually removed from the blacklist until all implemented instructions were enabled. [C10]
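A blacklist of this kind amounts to a filter over the generated instruction stream. The Python sketch below is a hypothetical illustration. The mnemonics in the set and the one-instruction-per-line assembly format are assumptions. The real flow may instead have excluded instructions inside the generator itself.

```python
# Hypothetical initial blacklist of RVV 0.7.1 mnemonics not yet stable in the DUT.
INITIAL_BLACKLIST = {"vlxe.v", "vwadd.vv", "vwmul.vv", "vfredsum.vs"}

def filter_test(asm_lines, blacklist):
    """Drop assembly lines whose mnemonic (first token) is blacklisted."""
    kept = []
    for line in asm_lines:
        tokens = line.split()
        if tokens and tokens[0] in blacklist:
            continue
        kept.append(line)
    return kept

def unblock(blacklist, fixed):
    """Return a new blacklist without the instructions whose bugs were fixed."""
    return blacklist - set(fixed)
```

Once the blacklist becomes empty, every implemented instruction reaches the generated tests.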
Reference model and scoreboard
The environment used a UVM scoreboard to compare VPU results against a reference model. The chosen reference model was the RISC-V ISA simulator Spike, integrated through co-simulation. [C11]
Spike had two roles: it acted as a scalar core executing scalar instructions and providing vector instructions to UVM in program order, and it acted as the golden/reference model for checking DUT results. The authors modified Spike with SystemVerilog DPI-callable functions, a method that resumed simulation until a vector instruction executed and returned reference results to UVM, functions to read Spike memory, and a function to force reduction results into Spike to avoid divergence for unordered floating-point reductions. [C12]
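A toy instruction-set simulator can picture the co-simulation hook. In the Python sketch below, `ToyIss`, `run_until_vector`, and `force_vreg` are hypothetical stand-ins for the DPI-callable functions the authors added to Spike. They only mimic the described behavior: run scalar instructions until a vector instruction executes, return its reference result, and allow a register to be overwritten. They are not Spike's real API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# (kind, mnemonic, action); kind is "scalar" or "vector".
Op = Tuple[str, str, Callable]

@dataclass
class ToyIss:
    """Hypothetical stand-in for the modified Spike used in co-simulation."""
    program: List[Op]
    pc: int = 0
    xregs: dict = field(default_factory=dict)
    vregs: dict = field(default_factory=dict)

    def run_until_vector(self) -> Optional[Tuple[str, str, list]]:
        """Run scalar ops until one vector op executes.

        Returns (mnemonic, destination register, reference value) for that
        vector op, or None when the program ends.
        """
        while self.pc < len(self.program):
            kind, mnemonic, action = self.program[self.pc]
            self.pc += 1
            result = action(self)
            if kind == "vector":
                dest, value = result
                return mnemonic, dest, value
        return None

    def force_vreg(self, dest: str, value) -> None:
        """Overwrite a vector register to keep the model in sync with the DUT."""
        self.vregs[dest] = value

# Example program: a scalar instruction sets x1, then a vector one adds it to v1.
def li_x1(iss):
    iss.xregs["x1"] = 3

def vadd_vx(iss):
    iss.vregs["v2"] = [e + iss.xregs["x1"] for e in iss.vregs["v1"]]
    return "v2", iss.vregs["v2"]
```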
The scoreboard was connected to the completed monitor. When an instruction finished, a comparison method executed. Because many vector results were written into physical vector registers rather than exposed directly as scalar outputs, the information extracted from Spike included the destination vector-register value. [C13]
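A scoreboard of this shape fits in a few lines. The Python version below is illustrative only. Pairing reference results with completed instructions through sequence identifiers is an assumption. Matching by identifier rather than by arrival order tolerates out-of-order completion. In the real environment, the compared values are destination vector-register contents extracted from Spike.

```python
class Scoreboard:
    """Pairs reference results with DUT completions by a hypothetical sequence id."""

    def __init__(self):
        self.expected = {}      # seq_id -> (destination register, reference value)
        self.mismatches = []
        self.checked = 0

    def add_reference(self, seq_id, dest, value):
        """Called when the reference model has executed the instruction."""
        self.expected[seq_id] = (dest, value)

    def on_completed(self, seq_id, dut_value):
        """Called by the completed-interface monitor when the DUT finishes."""
        dest, ref_value = self.expected.pop(seq_id)
        self.checked += 1
        if dut_value != ref_value:
            self.mismatches.append((seq_id, dest, ref_value, dut_value))
        return dut_value == ref_value
```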
Floating-point reductions
Unordered floating-point reductions required special handling. The VPU used a different reduction algorithm than Spike, which the RVV specification allowed. This could create false mismatches and leave incorrect values in Spike vector registers for later instructions. To address this, the authors built an independent C reference model for unordered reductions that implemented the DUT's reduction algorithm. For those cases, the VPU result was compared against the reduction reference model rather than Spike, and matching values were injected into Spike registers. [C14]
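Binary64 arithmetic shows why a different reduction order produces a false mismatch. The Python sketch below contrasts a strictly sequential sum with a pairwise tree sum. The tree order is only an assumed stand-in, because the paper does not say which algorithm the VPU uses. `check_unordered_reduction` then mirrors the described policy. It compares the DUT against a model of the DUT's own algorithm and, on a match, writes the value into a hypothetical Spike register map so later instructions see the same state.

```python
def sequential_sum(values):
    """Left-to-right accumulation, as an ordered reduction would do."""
    acc = 0.0
    for v in values:
        acc += v
    return acc

def tree_sum(values):
    """Pairwise tree reduction (an assumed stand-in for the DUT algorithm)."""
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        nxt = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:
            nxt.append(vals[-1])        # odd element carried to the next level
        vals = nxt
    return vals[0]

def check_unordered_reduction(values, dut_result, spike_vregs, dest):
    """Compare against the DUT-algorithm model; on a match, sync the ISS state."""
    ok = dut_result == tree_sum(values)
    if ok:
        spike_vregs[dest] = dut_result  # keep later instructions consistent
    return ok

data = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(data), tree_sum(data))  # 1.0 0.0
```

Both results are legal for an unordered reduction. A direct comparison against a reference that uses the other order would still flag an error.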
Memory-operation verification
The paper identifies memory operations as one of the most delicate parts of the design because the VPU had no direct memory access. Instead, it read and wrote data through the scalar core using OVI memop, load, store, and mask interfaces, requiring substantial inter-sub-interface communication. [C15]
For loads, expected memory data was obtained from Spike and written into a memory model based on the OpenTitan memory model, then sent through the VPU load sub-interface. For stores, memory contents before execution were needed to check masked operations and detect undesired writes; VPU store data was captured in the memory model and later compared with Spike values. Masked memory operations also required outgoing mask or index transactions from the VPU to support environment-side execution and comparison. [C16]
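A small model makes clear why store checking needs a pre-execution snapshot. This Python sketch assumes a unit-stride masked store, little-endian byte order, and a byte-addressed memory image, and its function names are hypothetical. The expected image is the snapshot with only active elements overwritten. Any byte that differs is reported, including a write to a masked-off element.

```python
def expected_after_masked_store(before, base, data, mask, eew):
    """Memory image a masked unit-stride store should leave.

    before: bytearray snapshot taken before the store executes.
    data:   integer element values; mask: one bool per element.
    """
    after = bytearray(before)
    for i, (value, active) in enumerate(zip(data, mask)):
        if active:
            addr = base + i * eew
            after[addr:addr + eew] = value.to_bytes(eew, "little")
    return after

def check_store(before, observed, base, data, mask, eew):
    """Return the byte addresses where observed memory differs from expectation."""
    expected = expected_after_masked_store(before, base, data, mask, eew)
    return [a for a in range(len(expected)) if expected[a] != observed[a]]
```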
The authors also handled OVI retries, which occur when the VPU cannot handle all loaded cache lines sent by the scalar core. In that case, the instruction completes with a vstart value representing the first element not written to the vector registers, requiring re-execution from that element. Retry scenarios were randomized using UVM configuration objects and were reported as one of the primary sources of VPU errors. [C17]
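The retry behavior can be modeled as a loop that resumes from `vstart`. The Python sketch below is a toy. `retry_prob` plays the role of a randomization knob, such as a field in a UVM configuration object. The knob, its per-element granularity, and the function name are assumptions. Each attempt writes at least one element, and a retry ends the attempt with `vstart` set to the first unwritten element.

```python
import random

def execute_load_with_retries(memory, vl, retry_prob, rng):
    """Toy load of vl elements that may be cut short by retries.

    Returns (vector register contents, number of attempts). After a retry,
    the instruction is re-executed starting from vstart.
    """
    vreg = [None] * vl
    vstart = 0
    attempts = 0
    while vstart < vl:
        attempts += 1
        i = vstart
        while i < vl:
            # At least one element per attempt guarantees forward progress.
            if i > vstart and rng.random() < retry_prob:
                break                   # retry: completes with vstart = i
            vreg[i] = memory[i]
            i += 1
        vstart = i
    return vreg, attempts
```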
Assertions, coverage, and CI
Because the VPU interface was critical, the team wrote more than 50 SystemVerilog Assertions against OVI specifications. These assertions helped identify both VPU bugs and UVM stimulation problems early in testbench development, and most targeted memory-related sub-interfaces. [C18]
The team implemented a functional coverage plan focused mostly on directly observable VPU-interface behavior, including instructions, execution parameters, and memory-sub-interface values. They also gathered coverage for selected internal modules. ISA tests covered RVV 0.7.1 instruction formats and configurations such as vector length, element width, rounding modes, and masks, while RISCV-DV random tests provided additional stress. [C19]
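A cross-coverage collector over execution parameters might look like the sketch below. The bins are hypothetical: element widths, the five RISC-V floating-point rounding modes (with `None` for integer operations), masked or unmasked, and three vector-length classes. The VPU supports floating point only at 32 and 64 bits, so rounding-mode bins are crossed only with those widths. A real coverage plan needs this kind of illegal-bin exclusion.

```python
from itertools import product

SEW_BINS = (8, 16, 32, 64)
FRM_BINS = (None, "rne", "rtz", "rdn", "rup", "rmm")  # None: integer operation
MASK_BINS = (False, True)
VL_BINS = ("one", "partial", "max")

def vl_bin(vl, vlmax):
    """Classify a vector length into a coarse bin."""
    if vl == 1:
        return "one"
    return "max" if vl == vlmax else "partial"

def legal(sew, frm):
    """Floating point exists only for 32- and 64-bit elements on this VPU."""
    return frm is None or sew in (32, 64)

class CrossCoverage:
    def __init__(self):
        self.bins = {b for b in product(SEW_BINS, FRM_BINS, MASK_BINS, VL_BINS)
                     if legal(b[0], b[1])}
        self.hits = set()

    def sample(self, sew, frm, masked, vl, vlmax):
        key = (sew, frm, masked, vl_bin(vl, vlmax))
        if key in self.bins:          # illegal combinations are ignored
            self.hits.add(key)

    def percent(self):
        return 100.0 * len(self.hits) / len(self.bins)
```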
The environment also recorded assertion usage and code coverage from simulations run by continuous integration (CI). The CI infrastructure was built with Jenkins. It included pipelines for generating new RISCV-DV tests, retrying failed tests after DUT changes, selecting regression sets based on coverage, running regressions before merges, and running larger regressions weekly. The paper also reports using GitLab for version control and issue tracking, and for project documentation in the form of GitLab Wiki guides and tutorials. [C20]
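The paper does not say how regression sets were selected from coverage. One common technique is greedy set cover, which repeatedly picks the test that adds the most not-yet-covered bins. It is shown here as an assumption, not as the authors' method. The Python sketch below takes a hypothetical mapping from test names to the coverage bins each test hit.

```python
def select_regression(test_bins, target=None):
    """Greedy set cover over coverage bins.

    test_bins: dict mapping test name -> set of bins that test hit.
    target:    bins to cover; defaults to every bin any test hit.
    Returns test names in the order they were picked.
    """
    if not test_bins:
        return []
    remaining = set().union(*test_bins.values()) if target is None else set(target)
    chosen = []
    while remaining:
        # Ties go to the earliest test in dict order.
        best = max(test_bins, key=lambda t: len(test_bins[t] & remaining))
        gain = test_bins[best] & remaining
        if not gain:
            break                       # remaining bins are unreachable
        chosen.append(best)
        remaining -= gain
    return chosen
```

Greedy selection is not guaranteed to find the smallest set, but it is simple and usually close enough for trimming a regression list.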
Results and lessons
The environment was used for about one year. When errors were found, the team supplied reproduction information such as binaries and the faulty instructions. They also maintained summaries of active errors to guide debugging, and ran regressions before fixes could be merged. [C21]
The reported nightly randomized testing ran 24 tests per night from April to July and 50 tests per night from August to the end of November, with each test containing approximately 500 vector instructions. The authors report finding 3005 errors; memory, narrowing, and widening vector instructions accounted for about 70% of the failures. [C22]
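The reported figures translate into an approximate nightly instruction volume. The results are approximate because each test is stated to contain only about 500 vector instructions.

```latex
\begin{aligned}
24~\text{tests/night} \times 500~\text{instructions/test} &\approx 12\,000~\text{instructions/night} \quad \text{(April to July)} \\
50~\text{tests/night} \times 500~\text{instructions/test} &\approx 25\,000~\text{instructions/night} \quad \text{(August to November)}
\end{aligned}
```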
After the nightly testing period, the authors developed additional pipelines that together ran about 600 tests per day to collect coverage and find bugs while VPU development continued. The reported average functional coverage was 95.79%; average code coverage was 72.64%, with 90.90% statement coverage and 49.83% toggle coverage. [C23]
The paper concludes that the reusable and extendable UVM environment implemented the OVI/VPU protocol and checked completed-instruction correctness. It also states that automated constrained-random test generation, simulation, error reporting, and CI/CD infrastructure found 3005 errors and reached 95.79% functional coverage. [C24]
A reported lesson was that dividing communication across several agents complicated maintenance, extension, and performance. As future work, the authors suggested using a single stimulus-producing agent so interface interaction could be handled in one module, simplifying sub-interface communication and future expansion. [C25]