Overview
In the RISC-V vector accelerator verification project described in "Functional Verification of a RISC-V Vector Accelerator", the CI/CD pipeline was one part of a broader verification infrastructure that also included a UVM environment, Spike-based co-simulation, assertions, coverage, constrained-random binary generation, simulation, and automated error reporting. The paper states that the CI/CD infrastructure played an essential role in code health, maintainability, and coverage closure, and that the overall process found 3005 errors and reached 95.79% functional coverage.
Role in the verification flow
The pipeline supported both the RTL design team and the verification team by enabling new features to be tested and new errors to be found. When an error was detected, the pipeline provided reproducibility information such as the binary and the faulty instruction. After a tentative fix, regressions were run before the change could be merged.
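The report-and-gate behaviour described above can be pictured with a minimal Python sketch. It is an illustration only: the names `ErrorReport` and `can_merge`, the field layout, and the example values are hypothetical and are not taken from the paper, which names only the binary and the faulty instruction as examples of reproducibility information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorReport:
    """Hypothetical record of the information needed to reproduce a failure."""
    binary: str              # path to the test binary that exposed the error
    faulty_instruction: str  # instruction flagged as faulty in that binary


def can_merge(regression_results: dict[str, bool]) -> bool:
    """Allow a merge only if a regression was run and every test in it passed.

    regression_results maps a test name to True (passed) or False (failed).
    An empty result set is treated as "no regression was run" and blocks the merge.
    """
    return bool(regression_results) and all(regression_results.values())


# Example with made-up values: one failing regression test blocks the merge.
report = ErrorReport(binary="tests/failed/test_0042.bin",
                     faulty_instruction="vnsrl.wv v4, v8, v2")
results_after_fix = {"test_0001": True, "test_0042": False}
merge_allowed = can_merge(results_after_fix)
```

Treating an empty result set as a blocked merge is a design choice of this sketch, so that a change cannot pass the gate simply because no regression was executed.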
Continuous integration was also used to run simulations that generated and executed tests and collected coverage metrics. In addition to functional coverage, the project recorded assertion usage and code coverage from CI-run simulations.
Implementation
The CI infrastructure was built using the open-source CI server Jenkins. The authors created multiple interacting pipelines intended to keep the design as error-free as possible.
GitLab was used for version control, issue tracking, and documentation. The project also used GitLab Wiki guides and tutorials so project members could run simulations.
RISCV-DV was used in the automated test-generation flow to create random tests and random binaries for the verification environment.
Pipeline stages
The implemented CI pipelines included:
- New tests: generated random tests with RISCV-DV, compiled the DUT, executed binaries, and classified tests into passed and failed directories. Passing tests were used to create a regression set, while failing tests were kept for debugging until the corresponding error was fixed.
- Retry: on each change to the main branch of the DUT repository, re-executed the failed-test set and reclassified each test as passed or failed.
- Selection: every day at midnight, if the number of passed tests exceeded a threshold, ranked tests by collected coverage and created two regression sets: a large set and a small set.
- Regressions: ran the small regression set when a DUT change was a merge candidate, and ran the large regression set weekly to ensure recent changes did not break known-good tests.
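The bookkeeping behind the New tests, Retry, and Selection stages can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the authors' Jenkins implementation: the names `RegressionPool`, `classify`, `retry`, and `select` are hypothetical, coverage is reduced to a single integer score per test, and the threshold and set sizes are placeholder parameters because the source does not give their values.

```python
from dataclasses import dataclass, field


@dataclass
class RegressionPool:
    """Hypothetical stand-in for the passed/failed test directories."""
    passed: dict[str, int] = field(default_factory=dict)  # test name -> coverage score
    failed: set[str] = field(default_factory=set)


def classify(pool: RegressionPool, name: str, ok: bool, coverage: int = 0) -> None:
    """New tests / Retry: file a test under passed or failed."""
    if ok:
        pool.failed.discard(name)
        pool.passed[name] = coverage
    else:
        pool.passed.pop(name, None)
        pool.failed.add(name)


def retry(pool: RegressionPool, run_test) -> None:
    """Retry: re-execute every failed test and reclassify it.

    run_test is a caller-supplied function returning (ok, coverage).
    """
    for name in sorted(pool.failed):  # sorted() copies, so the set can change safely
        ok, coverage = run_test(name)
        classify(pool, name, ok, coverage)


def select(pool: RegressionPool, threshold: int, small_size: int, large_size: int):
    """Selection: if enough tests pass, rank them by coverage and cut two sets.

    Returns (small_set, large_set), or None when the number of passed tests
    does not exceed the threshold.
    """
    if len(pool.passed) <= threshold:
        return None
    # Highest coverage first; ties are broken by name for a stable order.
    ranked = sorted(pool.passed, key=lambda n: (-pool.passed[n], n))
    return ranked[:small_size], ranked[:large_size]
```

In this model, the small set returned by `select` would be run when a DUT change is a merge candidate and the large set would be run weekly, matching the Regressions stage.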
Operational use and results
The environment was used for about a year. The authors report that 24 tests were run every night between April and July, increasing to 50 tests per night from August until the end of November, before RTL freeze. Each test contained approximately 500 vector instructions. During the process, the team found 3005 errors; memory, narrowing, and widening vector instructions accounted for around 70% of the failed random tests.
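The nightly figures imply a rough instruction volume per night, which the short Python calculation below works out. It relies only on the reported test counts and the approximate 500 vector instructions per test, so the results are estimates; the function name is hypothetical.

```python
INSTRUCTIONS_PER_TEST = 500  # approximate figure reported for each test


def nightly_vector_instructions(tests_per_night: int,
                                per_test: int = INSTRUCTIONS_PER_TEST) -> int:
    """Estimate the vector instructions exercised in one nightly run."""
    return tests_per_night * per_test


print(nightly_vector_instructions(24))  # April to July: 12000
print(nightly_vector_instructions(50))  # August to end of November: 25000
```

Under these assumptions, the nightly volume roughly doubled, from about 12,000 to about 25,000 vector instructions.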
The CI/CD infrastructure complemented the UVM verification environment by automating constrained-random test generation, simulation, error reporting, and regression execution. The paper reports that this process reached 95.79% functional coverage.