Bare-Metal Test Generation

Concept WIKI v1 · 5/26/2026

Bare-metal test generation is a RISC-V verification approach centered on generating portable, software-driven tests that can stress processor and SoC behavior across simulation, emulation, FPGA prototyping, and silicon. In the provided evidence, the concept is illustrated primarily through STING, which generates constrained-random and directed bare-metal tests, and through complementary directed suites, coverage analysis, debug, and lock-step comparison flows.

Overview

Bare-metal test generation is used in RISC-V processor verification to create software-driven stimulus that exercises architectural and system behavior directly on verification targets. The provided evidence describes this approach through STING, a bare-metal functional verification tool for RISC-V that generates constrained-random and directed tests. These tests are intended to be portable across simulation, emulation, FPGA prototypes, and silicon, and are self-checking to simplify debugging [STING bare-metal generator].

The need for this style of generation is tied to RISC-V verification complexity. The RISC-V ISA is modular and has many optional extensions, which increases the challenge of achieving comprehensive verification coverage [RISC-V verification complexity]. The evidence states that comprehensive coverage typically requires more than one verification or comparison methodology and more than one stimulus technique [RISC-V verification complexity].

Role in RISC-V verification

Bare-metal test generation supports a combined stimulus strategy:

  • Constrained-random stimulus explores broad state spaces and can uncover unanticipated behaviors [random and directed strategy].
  • Directed tests provide structure and can systematically target specific ISA features or coverage gaps [random and directed strategy].
  • Combined random and directed stimulus is described as the most effective approach, with random testing used for breadth and directed suites used for precision [random and directed strategy].

The evidence warns that random testing alone can leave gaps. Features such as privilege-mode transitions, page-table walks, and memory protection may not be fully exercised by random generation alone [random-alone gaps]. Directed suites can address such features systematically, but may miss subtle corner-case interactions; therefore, the flow combines both techniques [random and directed strategy].
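The combined strategy can be pictured as a test plan that mixes weighted-random items with a directed suite that always runs. The following is a minimal sketch under stated assumptions: the feature names, weights, and the `build_plan` helper are hypothetical illustrations and are not taken from STING or any other tool named in the evidence.

```python
import random

# Hypothetical stimulus categories and weights, chosen only for illustration.
RANDOM_POOL = {"alu": 6, "load_store": 3, "branch": 2, "csr_access": 1}
# Hypothetical directed tests for features random generation may under-exercise.
DIRECTED_SUITE = ["privilege_transition", "page_table_walk", "pmp_violation"]


def build_plan(n_random, seed):
    """Return a plan of n_random weighted-random items plus each directed test once."""
    rng = random.Random(seed)  # seeded so a failing plan can be reproduced
    names = list(RANDOM_POOL)
    weights = [RANDOM_POOL[name] for name in names]
    plan = rng.choices(names, weights=weights, k=n_random)
    # Place each directed test at a random position so it runs amid random traffic.
    for test in DIRECTED_SUITE:
        plan.insert(rng.randrange(len(plan) + 1), test)
    return plan


plan = build_plan(n_random=20, seed=1)
```

The random items provide breadth, while the directed entries guarantee that the named features appear in every plan regardless of how the random draw falls.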

STING-based bare-metal generation

STING is described as a bare-metal, software-driven generator developed for RISC-V. It produces C++-based random streams and ASM-style directed tests, built on a lightweight kernel, libraries, and device drivers [STING architecture]. It also includes a programming framework for developing directed tests and uses stimulus graphs to control scheduling of both random and directed tests [STING architecture].

The generated programs are portable across multiple execution environments, including:

  • RTL simulation,
  • ZeBu emulation,
  • HAPS FPGA prototypes,
  • and silicon [portable stimulus].

The evidence also states that these programs are architecturally self-checking [portable stimulus]. This portability supports shift-left verification, where tests can begin in simulation and be reused in emulation, prototyping, and silicon to reduce late-stage risk [shift-left verification].
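To make the idea of a self-checking program concrete, the sketch below generates RISC-V assembly in which each random `add` is followed by a comparison against a value computed at generation time. It is an illustration only and does not represent how STING builds its checks: the `emit_program` helper, the `test_pass` and `test_fail` labels, and the RV64 assumption are all hypothetical.

```python
import random

MASK64 = (1 << 64) - 1  # assumption: an RV64 target with 64-bit registers


def emit_self_checking_add(rng, index):
    """Emit assembly for one ADD whose result is checked against a precomputed value."""
    a = rng.getrandbits(64)
    b = rng.getrandbits(64)
    expected = (a + b) & MASK64  # ADD wraps modulo 2**64 on RV64
    return "\n".join([
        f"case_{index}:",
        f"    li   t0, {a:#x}",
        f"    li   t1, {b:#x}",
        "    add  t2, t0, t1",
        f"    li   t3, {expected:#x}",
        "    bne  t2, t3, test_fail",  # any mismatch jumps to the failure label
    ])


def emit_program(count, seed):
    """Return assembly text containing `count` self-checking ADD cases."""
    rng = random.Random(seed)
    body = [emit_self_checking_add(rng, i) for i in range(count)]
    # Hypothetical end-of-test convention: spin at a pass or fail label.
    footer = ["test_pass:", "    j test_pass", "test_fail:", "    j test_fail"]
    return "\n".join(body + footer) + "\n"


program = emit_program(count=3, seed=7)
```

Because the expected value travels inside the program, the same text can be checked on any target that executes it, without an external reference model.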

Verification targets and bug classes

Bare-metal generated tests are especially relevant for processor behaviors that are difficult to cover exhaustively with a single stimulus style. The evidence identifies STING as effective for stressing:

  • privilege levels,
  • memory protection,
  • control and status registers,
  • and hypervisor extensions [portable stimulus].

Reported issue classes exposed by STING include:

  • deadlocks in page-table walks,
  • mishandling of the fence.i instruction,
  • floating-point NaN quirks,
  • and cache-coherence conflicts [reported STING findings].

The evidence also defines several RISC-V features and behaviors commonly relevant to such tests:

  • PMP and ePMP restrict access to memory regions to enforce privilege, isolation, and security policies [PMP definition].
  • Sv39 and Sv48 are RISC-V virtual-memory schemes that use 39-bit and 48-bit virtual addresses, translated through three-level and four-level page tables respectively [Sv39 Sv48 definition].
  • Floating-point NaNs include signalling NaNs, which raise the invalid-operation exception when an arithmetic operation consumes them, and quiet NaNs, which propagate through operations silently [NaN definition].
  • Cache-coherence conflicts arise in multi-core systems when accesses to the same cache line lead to stale data, corruption, or stalls because coherence is not enforced correctly [cache-coherence definition].
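Two of the definitions above can be made concrete with a little bit arithmetic. The sketch below splits a virtual address into its page-table indices (9-bit VPN fields above a 12-bit page offset, as in Sv39 and Sv48) and classifies a single-precision bit pattern as a quiet or signalling NaN. The function names are hypothetical helpers written for this illustration; they are not part of any tool described in the evidence.

```python
def split_virtual_address(va, levels):
    """Split a virtual address into (VPN fields from the top level down, page offset).

    levels=3 corresponds to Sv39 and levels=4 to Sv48.
    """
    offset = va & 0xFFF  # low 12 bits select a byte within the 4 KiB page
    vpns = [(va >> (12 + 9 * i)) & 0x1FF for i in reversed(range(levels))]
    return vpns, offset


def classify_f32(bits):
    """Classify a 32-bit IEEE 754 pattern as 'snan', 'qnan', or 'not_nan'."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent != 0xFF or fraction == 0:
        return "not_nan"  # finite value or infinity
    # The most significant fraction bit distinguishes quiet from signalling NaNs.
    return "qnan" if fraction & 0x400000 else "snan"


sv39_va = (1 << 30) | (2 << 21) | (3 << 12) | 0x45
print(split_virtual_address(sv39_va, levels=3))  # ([1, 2, 3], 69)
print(classify_f32(0x7FC00000))                  # qnan
print(classify_f32(0x7F800001))                  # snan
```

A generator targeting page-table walks or NaN handling could use helpers like these to pick addresses that hit specific table entries or operands that fall into each NaN class.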

Coverage and closure

Bare-metal test generation is part of a broader coverage-closure process. The evidence defines coverage closure as achieving sufficient functional and code coverage to provide confidence that relevant design behaviors have been tested [coverage closure]. Functional coverage and stimulus coverage measure how thoroughly the stimulus has exercised ISA features and system behaviors [functional stimulus coverage].

The evidence also notes that automatically generated coverage models, such as ImperasFC and ImperasSC, can provide detailed insight into coverage gaps and integrate with Verdi [functional stimulus coverage]. Directed stimulus from STING and directed suites such as ImperasTS can then be used to address coverage gaps found during analysis [ImperasTS closure].
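The gap-driven loop described here can be sketched as a simple report over coverage bins: anything with zero hits becomes a candidate for a directed test. This is a toy model under stated assumptions; the bin names, hit counts, and `coverage_report` function are hypothetical and do not reflect the data formats of ImperasFC, ImperasSC, or Verdi.

```python
def coverage_report(bins, hits):
    """Return (percent of bins hit at least once, sorted list of unhit bins)."""
    covered = {b for b in bins if hits.get(b, 0) > 0}
    gaps = sorted(set(bins) - covered)
    percent = 100.0 * len(covered) / len(bins) if bins else 100.0
    return percent, gaps


# Hypothetical coverage bins keyed by instruction name.
bins = ["add", "fence.i", "sret", "sfence.vma"]
# Hypothetical hit counts collected from a random regression.
hits = {"add": 120, "fence.i": 0, "sret": 3}

percent, gaps = coverage_report(bins, hits)
print(percent)  # 50.0
print(gaps)     # ['fence.i', 'sfence.vma']
```

The `gaps` list is the hand-off point: each entry names a behavior that random stimulus did not reach and that a directed test could target.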

Comparison and debug flow

Bare-metal generated programs can be used with simulation and reference-model comparison flows. ImperasDV integrates fast RISC-V reference models and enables lock-step comparison of RTL against a golden reference model at instruction retirement [lock-step comparison]. Lock-step comparison is described as running RTL and a golden reference model in parallel and comparing results at instruction retirement for early bug detection [lock-step comparison].
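The comparison step can be illustrated with an offline analogue: walk two streams of retirement records in step and stop at the first difference. This is a minimal sketch, assuming a simplified record with only a program counter and one register write; the `Retire` type and `first_mismatch` function are hypothetical, and a real lock-step flow such as the one described for ImperasDV compares during simulation rather than over stored traces.

```python
from itertools import zip_longest
from typing import NamedTuple, Optional


class Retire(NamedTuple):
    """Simplified retirement record (hypothetical fields)."""
    pc: int
    rd: Optional[int]     # destination register index, None if no register write
    value: Optional[int]  # value written to rd, None if no register write


def first_mismatch(dut, ref):
    """Return (index, dut_record, ref_record) at the first difference, or None.

    A missing record on either side is reported as None, so a stream that
    retires extra or too few instructions is also flagged.
    """
    for index, (d, r) in enumerate(zip_longest(dut, ref)):
        if d != r:
            return index, d, r
    return None


ref_trace = [Retire(0x80000000, 5, 1), Retire(0x80000004, 6, 2)]
dut_trace = [Retire(0x80000000, 5, 1), Retire(0x80000004, 6, 3)]
mismatch = first_mismatch(dut_trace, ref_trace)
```

Stopping at the first divergent retirement is what makes this style of checking useful for debug: the reported index points at the instruction where the two models first disagree, rather than at a later symptom.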

The provided evidence also places bare-metal tests in a tool flow:

  • VCS executes STING-generated random tests and ImperasTS directed suites to accelerate debug and coverage closure [VCS role].
  • Verdi is used for waveforms, mismatch tracking, and functional coverage reporting [Verdi role].
  • ZeBu emulation supports long software-driven tests, OS bring-up, and large-scale workloads [ZeBu role].
  • HAPS prototyping supports pre-silicon software development, performance validation, and extended regression cycles [HAPS role].

Practical significance

In the provided evidence, bare-metal test generation is significant because it provides portable, self-checking stimulus that can be reused across multiple verification stages. It complements directed suites, coverage analysis, lock-step reference comparison, and debug platforms. For RISC-V, where optional ISA features and extensions increase verification complexity, the evidence supports using bare-metal constrained-random and directed generation as part of a combined strategy for discovery, targeted closure, and late-stage risk reduction.

CITATIONS

[1] RISC-V verification complexity source
[2] random and directed strategy source
[3] random-alone gaps source
[4] STING bare-metal generator source
[5] STING architecture source
[6] portable stimulus source
[7] shift-left verification source
[8] reported STING findings source
[9] PMP definition source
[10] Sv39 Sv48 definition source
[11] NaN definition source
[12] cache-coherence definition source
[13] coverage closure source
[14] functional stimulus coverage source
[15] ImperasTS closure source
[16] lock-step comparison source
[17] VCS role source
[18] Verdi role source
[19] ZeBu role source
[20] HAPS role source