Bare-Metal Test Generation

Overview

Bare-metal test generation is used in RISC-V processor verification to create software-driven stimulus that exercises architectural and system behavior directly on verification targets. The provided evidence describes this approach through STING, a bare-metal functional verification tool for RISC-V that generates constrained-random and directed tests. These tests are intended to be portable across simulation, emulation, FPGA prototypes, and silicon, and are self-checking to simplify debugging [STING bare-metal generator].

The need for this style of generation is tied to RISC-V verification complexity. The RISC-V ISA is modular and has many optional extensions, which increases the challenge of achieving comprehensive verification coverage [RISC-V verification complexity]. The evidence states that comprehensive coverage typically requires more than one verification or comparison methodology and more than one stimulus technique [RISC-V verification complexity].

Role in RISC-V verification

Bare-metal test generation supports a combined stimulus strategy:

Constrained-random stimulus explores broad state spaces and can uncover unanticipated behaviors [random and directed strategy].
Directed tests provide structure and can systematically target specific ISA features or coverage gaps [random and directed strategy].
Combined random and directed stimulus is described as the most effective approach, with random testing used for breadth and directed suites used for precision [random and directed strategy].

The evidence warns that random testing alone can leave gaps: features such as privilege-mode transitions, page-table walks, and memory protection may not be fully exercised [random-alone gaps]. Directed suites address such features systematically but may miss subtle corner-case interactions, which is why the flow combines both techniques [random and directed strategy].
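The combined strategy can be illustrated with a minimal Python sketch. It is not STING's algorithm: the feature names, the weights, and the `build_plan` helper are hypothetical and exist only to show random picks providing breadth while directed entries guarantee that rarely hit features are still exercised.

```python
import random

# Hypothetical feature names and weights; none of these come from STING.
FEATURE_WEIGHTS = {
    "alu": 40,
    "load_store": 30,
    "branch": 20,
    "privilege_transition": 1,
    "page_table_walk": 1,
    "memory_protection": 1,
}
# Features that must be exercised even if random selection never picks them.
MUST_COVER = ["privilege_transition", "page_table_walk", "memory_protection"]


def build_plan(num_random, seed):
    """Return a list of (kind, feature) entries: random first, then directed."""
    rng = random.Random(seed)  # seeded so a failing plan can be regenerated
    names = list(FEATURE_WEIGHTS)
    weights = [FEATURE_WEIGHTS[n] for n in names]
    picks = rng.choices(names, weights=weights, k=num_random)
    plan = [("random", p) for p in picks]
    hit = set(picks)
    # Directed entries fill in whatever the random portion did not reach.
    plan += [("directed", f) for f in MUST_COVER if f not in hit]
    return plan
```

Seeding the generator is a deliberate choice in this sketch: a random plan that exposes a failure is only useful if it can be rebuilt exactly.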

STING-based bare-metal generation

STING is described as a bare-metal, software-driven generator developed for RISC-V. It produces C++-based random streams and ASM-style directed tests, built on a lightweight kernel, libraries, and device drivers [STING architecture]. It also includes a programming framework for developing directed tests and uses stimulus graphs to control scheduling of both random and directed tests [STING architecture].
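The source does not describe the format of a STING stimulus graph. One plausible reading is a dependency graph whose nodes are random or directed test fragments and whose edges constrain their order. The sketch below assumes that reading; the node names and the `schedule` helper are hypothetical, and ordering is done with the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical stimulus graph: each node maps to the set of nodes that
# must finish before it may run. This is not STING's actual format.
STIMULUS_GRAPH = {
    "boot_kernel": set(),
    "random_int_stream": {"boot_kernel"},
    "directed_pmp_setup": {"boot_kernel"},
    "random_mem_stream": {"directed_pmp_setup"},
    "final_check": {"random_int_stream", "random_mem_stream"},
}


def schedule(graph):
    """Return one execution order that respects every dependency edge."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, node in enumerate(schedule(STIMULUS_GRAPH)):
        print(step, node)
```

Nodes without a dependency between them, such as the two random streams here, may appear in either order, which is where a real scheduler would have freedom to interleave random and directed work.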

The generated programs are portable across multiple execution environments, including:

RTL simulation,
ZeBu emulation,
HAPS FPGA prototypes,
and silicon [portable stimulus].

The evidence also states that these programs are architecturally self-checking [portable stimulus]. This portability supports shift-left verification, where tests can begin in simulation and be reused in emulation, prototyping, and silicon to reduce late-stage risk [shift-left verification].
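What self-checking means for portability can be shown with a small Python sketch. It assumes a test that carries its own expected result, so the pass or fail verdict is computed by the program itself and only the reporting channel differs between platforms. The test record, `make_add_test`, and `run_self_checking` are hypothetical names and do not reflect how STING encodes its checks.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def make_add_test():
    """Build a hypothetical directed test that embeds its expected result."""
    return {
        "name": "add_wraparound",
        "op": lambda a, b: (a + b) & MASK64,  # 64-bit wrapping add
        "inputs": (MASK64, 2),
        "expected": 1,
    }


def run_self_checking(test, report):
    """Run the test, compare against its embedded expectation, report verdict."""
    result = test["op"](*test["inputs"])
    passed = result == test["expected"]
    report(f"{test['name']}: {'PASS' if passed else 'FAIL'}")
    return passed


if __name__ == "__main__":
    sim_log = []
    run_self_checking(make_add_test(), sim_log.append)  # e.g. a simulation log
    run_self_checking(make_add_test(), print)           # e.g. a console on hardware
```

Because the verdict does not depend on an external checker, the same program can move from simulation to emulation, FPGA, or silicon with only the `report` channel swapped.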

Verification targets and bug classes

Bare-metal generated tests are especially relevant for processor behaviors that are difficult to cover exhaustively with a single stimulus style. The evidence identifies STING as effective for stressing:

privilege levels,
memory protection,
control and status registers,
and hypervisor extensions [portable stimulus].

Reported issue classes exposed by STING include:

deadlocks in page-table walks,
mishandling of the fence.i instruction,
floating-point NaN quirks,
and cache-coherence conflicts [reported STING findings].

The evidence also defines several RISC-V features and behaviors commonly relevant to such tests. PMP and ePMP restrict access to memory regions to enforce privilege, isolation, and security policies [PMP definition]. Sv39 and Sv48 are RISC-V virtual-memory schemes using 39-bit and 48-bit virtual addresses and multi-level page-table structures [Sv39 Sv48 definition]. Floating-point NaNs include signalling NaNs, which raise exceptions, and quiet NaNs, which propagate silently [NaN definition]. Cache-coherence conflicts involve multi-core cache situations where accesses to the same cache line can lead to stale data, corruption, or stalls if coherence is not enforced correctly [cache-coherence definition].
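Three of these definitions reduce to bit-level rules that a directed test has to get right. The following Python sketch is illustrative only: the function names are hypothetical, and the field layouts are taken from the RISC-V privileged specification (Sv39 address fields, PMP NAPOT encoding) and IEEE 754 single precision rather than from the cited evidence.

```python
def split_sv39(va):
    """Split a 64-bit Sv39 virtual address into ([VPN0, VPN1, VPN2], offset)."""
    top = (va >> 38) & 0x3FFFFFF  # bits 63..38 must all equal bit 38
    if top not in (0, 0x3FFFFFF):
        raise ValueError("non-canonical Sv39 address")
    vpn = [(va >> shift) & 0x1FF for shift in (12, 21, 30)]  # 9 bits each
    return vpn, va & 0xFFF  # 12-bit page offset


def classify_f32(bits):
    """Classify a 32-bit pattern as 'quiet_nan', 'signaling_nan', or 'not_nan'."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF or fraction == 0:
        return "not_nan"  # finite value, zero, or infinity
    # The most significant fraction bit distinguishes quiet from signaling.
    return "quiet_nan" if fraction & 0x400000 else "signaling_nan"


def decode_pmp_napot(pmpaddr):
    """Decode a NAPOT-mode pmpaddr value into (base_address, size_in_bytes)."""
    pmpaddr &= (1 << 54) - 1  # RV64 pmpaddr holds address bits 55..2
    k = 0
    while (pmpaddr >> k) & 1:  # count trailing one bits
        k += 1
    size = 1 << (k + 3)
    base = (pmpaddr >> (k + 1)) << (k + 3)
    return base, size
```

For instance, a 4 KiB region at 0x80000000 is encoded as `(0x80000000 >> 2) | ((4096 >> 3) - 1)`, which `decode_pmp_napot` maps back to that base and size.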

Coverage and closure

Bare-metal test generation is part of a broader coverage-closure process. The evidence defines coverage closure as achieving sufficient functional and code coverage to provide confidence that relevant design behaviors have been tested [coverage closure]. Functional Coverage and Stimulus Coverage measure how thoroughly stimulus has exercised ISA features and system behaviors [functional stimulus coverage].

The evidence also notes that automatically generated coverage models, such as ImperasFC and ImperasSC, can provide detailed insight into coverage gaps and integrate with Verdi [functional stimulus coverage]. Directed stimulus from STING and directed suites such as ImperasTS can then be used to address coverage gaps found during analysis [ImperasTS closure].
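The gap-driven loop described here can be sketched in a few lines of Python. This is a simplified model, not the ImperasFC or ImperasSC data format: the bin names and the `coverage_report` helper are hypothetical. It shows only the basic bookkeeping, in which bins that were never hit become the work list for directed tests.

```python
from collections import Counter


def coverage_report(bins, observed_events):
    """Return (percent_of_bins_hit, list_of_unhit_bins) for a run."""
    hits = Counter(e for e in observed_events if e in bins)
    gaps = [b for b in bins if hits[b] == 0]
    percent = 100.0 * (len(bins) - len(gaps)) / len(bins)
    return percent, gaps


if __name__ == "__main__":
    # Hypothetical coverage bins and events seen during a random run.
    bins = ["csr_read", "csr_write", "mret", "sfence_vma"]
    observed = ["csr_read", "csr_read", "csr_write", "mret"]
    percent, gaps = coverage_report(bins, observed)
    print(percent, gaps)  # the gaps list names what directed tests should target
```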

Comparison and debug flow

Bare-metal generated programs can be used with simulation and reference-model comparison flows. ImperasDV integrates fast RISC-V reference models and enables lock-step comparison, in which the RTL and a golden reference model run in parallel and their results are compared at each instruction retirement so that bugs are detected early [lock-step comparison].
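The idea behind lock-step comparison can be sketched in Python. This is not the ImperasDV interface: the retirement record layout (program counter, destination register, written value) and the `first_mismatch` helper are assumptions made for illustration. Both sources are consumed one retirement at a time, and the comparison stops at the first divergence.

```python
from itertools import zip_longest


def first_mismatch(dut_retirements, ref_retirements):
    """Step both streams together; return (index, dut, ref) at first divergence.

    Returns None if the streams agree for their full length. A stream that
    ends early shows up as a mismatch against None.
    """
    pairs = zip_longest(dut_retirements, ref_retirements)
    for index, (dut, ref) in enumerate(pairs):
        if dut != ref:
            return index, dut, ref
    return None


if __name__ == "__main__":
    # Hypothetical records: (pc, destination register, value written).
    ref = [(0x80000000, 5, 1), (0x80000004, 6, 2), (0x80000008, 7, 3)]
    dut = [(0x80000000, 5, 1), (0x80000004, 6, 9), (0x80000008, 7, 3)]
    print(first_mismatch(iter(dut), iter(ref)))
```

Passing iterators or generators keeps the two models advancing in tandem, so the failing instruction is identified without running either side to completion.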

The provided evidence also places bare-metal tests in a tool flow:

VCS executes STING-generated random tests and ImperasTS directed suites to accelerate debug and coverage closure [VCS role].
Verdi is used for waveforms, mismatch tracking, and functional coverage reporting [Verdi role].
ZeBu emulation supports long software-driven tests, OS bring-up, and large-scale workloads [ZeBu role].
HAPS prototyping supports pre-silicon software development, performance validation, and extended regression cycles [HAPS role].

Practical significance

In the provided evidence, bare-metal test generation is significant because it provides portable, self-checking stimulus that can be reused across multiple verification stages. It complements directed suites, coverage analysis, lock-step reference comparison, and debug platforms. For RISC-V, where optional ISA features and extensions increase verification complexity, the evidence supports using bare-metal constrained-random and directed generation as part of a combined strategy for discovery, targeted closure, and late-stage risk reduction.