Overview
RTL (Register-Transfer Level) verification is the activity of checking that a hardware design, described at the register-transfer level of abstraction, behaves as intended. Evidence describes it as a primary bottleneck in modern hardware development, reported to consume 60-70% of development time. [C1] Bugs that escape RTL verification are described as lengthening design cycles and producing significant follow-up costs. [C2]
The same evidence observes that, although Large Language Models (LLMs) have shown promise for RTL automation, reported performance gains and research attention have centered overwhelmingly on RTL generation rather than verification. [C1]
Simulation-based RTL verification
Simulation is described as a prevalent verification approach because of its ease of use and scalability. In processor RTL verification, this makes stimulus generation and comparison against a reference behavior central parts of the workflow. [C2]
Cross-level testing with an ISS reference
A documented RISC-V processor-verification approach uses cross-level testing: the RTL core under test is run in a tightly coupled co-simulation setting together with an Instruction Set Simulator (ISS) acting as the reference model. [C3] The testbench generates an endless instruction stream on the fly during simulation, with no restrictions on the generated stream. The same stream is fed to both the ISS and the RTL core, and the results are compared after each executed instruction, so RTL-core errors are detected immediately when they occur. [C3]
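The lockstep comparison described above can be illustrated with a minimal Python sketch. It is not the cited testbench: ToyCore, its three-instruction toy ISA, the sub_bug flag, random_stream, and cosimulate are all hypothetical names introduced here, and the "RTL core" is simply a second copy of the toy model with an injected defect. The sketch only shows the idea of feeding one on-the-fly stream to both models and comparing state after every instruction.

```python
import itertools
import random

MASK = 0xFFFFFFFF  # 32-bit registers


class ToyCore:
    """Tiny register machine standing in for either the ISS or the RTL core."""

    def __init__(self, sub_bug=False):
        self.regs = [0] * 32
        self.sub_bug = sub_bug  # hypothetical injected defect for the demo

    def step(self, instr):
        op, rd, rs1, src2 = instr
        a = self.regs[rs1]
        if op == "addi":
            result = (a + src2) & MASK  # src2 is an immediate
        elif op == "add":
            result = (a + self.regs[src2]) & MASK
        elif op == "sub":
            b = self.regs[src2]
            result = ((a + b) if self.sub_bug else (a - b)) & MASK
        else:
            raise ValueError(f"unknown op {op!r}")
        if rd != 0:  # x0 is hard-wired to zero
            self.regs[rd] = result
        return tuple(self.regs)  # architectural state after this step


def random_stream(rng):
    """Endless, unrestricted stream of toy instructions, generated on the fly."""
    while True:
        op = rng.choice(["addi", "add", "sub"])
        src2 = rng.randrange(-2048, 2048) if op == "addi" else rng.randrange(32)
        yield (op, rng.randrange(32), rng.randrange(32), src2)


def cosimulate(stream, reference, dut):
    """Feed each instruction to both models and compare after every step.

    Returns (index, instr) for the first mismatch, or None if the stream ends.
    """
    for index, instr in enumerate(stream):
        if reference.step(instr) != dut.step(instr):
            return index, instr
    return None


# Directed two-instruction stream that exposes the injected 'sub' defect.
directed = [("addi", 1, 0, 5), ("sub", 2, 1, 1)]
print(cosimulate(directed, ToyCore(), ToyCore(sub_bug=True)))
# prints (1, ('sub', 2, 1, 1))
```

Because the comparison happens after each instruction, the mismatch is reported at the exact instruction that triggered it. In practice the stream would come from random_stream, bounded for example with itertools.islice.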
Test-generation considerations
The same RISC-V cross-level work contrasts generated testing with the official hand-written RISC-V test suites. Those suites are described as targeting basic sanity checks and a few corner-case scenarios across instruction-set extensions, as offering very limited overall coverage, and as unsuitable for continuous testing. [C4]
RISC-V random program generation with riscv-dv
Evidence identifies riscv-dv as an open-source random instruction generator developed by CHIPS Alliance for RISC-V processor verification. Its SystemVerilog UVM-based class structure is described as helpful for verifying RISC-V IP, and the generated random tests can be run directly with the design IP. [C5]
Within that ecosystem, the class riscv_asm_program_gen (file riscv_asm_program_gen.sv), through its gen_program() function and helpers, generates the complete RISC-V assembly program used to verify RISC-V IP, with support for customizing RISC-V GPR usage and instruction selection. [C5][C6]
Sections generated by this class include the initialization routine, instruction section, data section, stack section, page tables, and interrupt and exception handling. [C5]
Configuration randomization
The class riscv_instr_gen_config is randomized from the test riscv_instr_base_test.sv. This randomization decides the RISC-V extension used, the supported privilege mode, the instruction counts for the main program and subprograms, and whether instructions such as ebreak, dret, fence, and wfi are excluded from the program, through variables such as no_ebreak, no_dret, no_fence, and no_wfi. [C6] Many other configuration variables can be set true or false based on DUT features and testbench stimulus-generation requirements. [C6]
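As a rough illustration of this kind of configuration randomization, the Python sketch below draws one random configuration object. It is not the riscv-dv class: ToyGenConfig, randomize_config, the dut_has_debug flag, the value ranges, and every field other than the four no_* knobs named in the text are hypothetical stand-ins chosen for this example.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyGenConfig:
    """Python stand-in for a generator configuration; not the riscv-dv class."""
    extension: str       # hypothetical field: ISA extension set to generate for
    privilege_mode: str  # hypothetical field: privilege mode of the program
    main_instr_cnt: int  # hypothetical field: instructions in the main program
    sub_instr_cnt: int   # hypothetical field: instructions per subprogram
    no_ebreak: bool      # the four knob names below are the ones cited in the text
    no_dret: bool
    no_fence: bool
    no_wfi: bool


def randomize_config(rng, dut_has_debug=False):
    """Draw one random configuration, constrained by a DUT feature flag."""
    return ToyGenConfig(
        extension=rng.choice(["RV32I", "RV32IM", "RV32IMC"]),
        privilege_mode=rng.choice(["MACHINE_MODE", "USER_MODE"]),
        main_instr_cnt=rng.randrange(100, 1001),
        sub_instr_cnt=rng.randrange(10, 101),
        no_ebreak=rng.random() < 0.5,
        # Assumption for this sketch: dret is only allowed when the DUT has
        # debug support, so the knob is forced on otherwise.
        no_dret=(rng.random() < 0.5) if dut_has_debug else True,
        no_fence=rng.random() < 0.5,
        no_wfi=rng.random() < 0.5,
    )


cfg = randomize_config(random.Random(1))
print(cfg)
```

Seeding the random source makes a failing configuration reproducible, which is the usual reason for routing all randomization through one seeded object.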
Program generation pipeline
The gen_program() function is the main entry point for generating all sections of the program. After being called from the upper layer, it invokes other functions in riscv_asm_program_gen one by one. [C6]
The pipeline includes: [C6]
- get_directed_instr_stream() and add_directed_instr_stream(), which select the ratio of instruction generation (e.g., riscv_jal_instr ratio: 30/1000).
- gen_program_header(), which fills the instr_stream string array with header instructions such as .include "user_init.s" and calls gen_section("_start", str) to insert them.
- init_gpr(), which initializes general-purpose registers with random values.
- generate_directed_instr_stream(), which decides the ratio and inserts directed instruction streams, randomizing instructions and selecting rs1, rs2, and rd based on instruction type. The post_random() function of riscv_instr is used to produce instructions that use GPRs x0 to x31 across all instructions.
- A check that controls the ratio of any illegal or HINT instructions; if that ratio is zero, no illegal or HINT instructions are generated.
- riscv_instr_sequence::generate_instr_stream, which uses convert2asm() to convert the instruction stream to assembly strings.
- main_program[hart].generate_instr_stream(), which converts the instruction stream to string format.
- insert_sub_program(sub_program[hart], instr_stream) when sub-program instructions need to be generated.
- Host-interface instructions added by gen_section, such as str=write_tohost:, str=sw gp, tohost, t1, instr[0]=sw gp, tohost, t1, and str=_exit:.
- push_gpr_to_kernel_stack(), which pushes general-purpose registers to the stack for trap handling, and gen_section() selecting str=mtvec_handler, which defines exception_handler and interrupt_handler.
The combined result is a full RISC-V assembly language program with random instructions and random GPR selections across different instruction patterns, suitable as verification stimuli for RISC-V IP. [C5][C6]
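The overall shape of such a generation flow can be sketched in Python. This is a heavily simplified illustration, not riscv-dv code: the function names loosely echo the ones above, but their bodies, the ARITH_OPS list, the jal_per_thousand parameter, the section ordering, and the placeholder _exit and mtvec_handler bodies are assumptions made for this example. The output has not been checked against an assembler or a real core.

```python
import random

ARITH_OPS = ["add", "sub", "xor", "or", "and"]


def gen_section(label, lines):
    """Emit a label followed by indented instruction lines."""
    return [f"{label}:"] + [f"    {line}" for line in lines]


def init_gpr(rng):
    """Load x1..x31 with random 12-bit immediates (x0 stays zero)."""
    return [f"li x{i}, {rng.randrange(-2048, 2048)}" for i in range(1, 32)]


def gen_instr_stream(rng, count, jal_per_thousand=30):
    """Random arithmetic stream with a configurable share of jal instructions."""
    lines = []
    for i in range(count):
        if rng.randrange(1000) < jal_per_thousand:
            # Jump forward to a numeric local label placed on the next line,
            # so control flow stays linear in this toy program.
            lines.append(f"jal x1, {i}f")
            lines.append(f"{i}:")
        else:
            rd, rs1, rs2 = (rng.randrange(32) for _ in range(3))
            lines.append(f"{rng.choice(ARITH_OPS)} x{rd}, x{rs1}, x{rs2}")
    return lines


def gen_program(seed, instr_count=20):
    """Assemble header, register init, main stream, host interface, and handler."""
    rng = random.Random(seed)
    program = ['.include "user_init.s"']
    program += gen_section("_start", init_gpr(rng))
    program += gen_section("main", gen_instr_stream(rng, instr_count))
    program += gen_section("write_tohost", ["sw gp, tohost, t1"])
    program += gen_section("_exit", ["j write_tohost"])    # placeholder body
    program += gen_section("mtvec_handler", ["j _exit"])   # placeholder body
    return "\n".join(program)


print(gen_program(seed=0, instr_count=5))
```

Setting jal_per_thousand to 0 removes jumps entirely, which mirrors the idea that a zero ratio suppresses a class of instructions.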
LLM-based agentic RTL verification
Recent evidence describes RTL verification approaches that combine LLMs with programmatic tool use. [C1][C7]
Pro-V multi-agent system and Sampling&Filtering
The Pro-V system is described as an efficient program-generation multi-agent system for automatic RTL verification. Compared to direct RTL-based sampling in CorrectBench, it proposes a more efficient and robust Sampling&Filtering mechanism that is fully decoupled from RTL generation. [C7]
In this mechanism, the agent samples N Program Emulator candidates M1, …, MN (with N = 5 in the reported study). Each candidate produces a corresponding signal reference result candidate R1, …, RN. The results are categorized into three cases: [C7]
- Consistent Outputs – all results are identical and merged into a single representative output.
- Outlier Detection – if a unique result Rj differs from all the others in {R1, R2, …, RN} (for example, j = 4), the outlier is filtered out.
- Partial Consistent – the remaining partial-consistency cases are merged while abstaining from diversity.
This filtering mechanism is described as efficiently providing the most informative inputs to a downstream LLM-as-a-Judge module for further evaluation. [C7]
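One possible reading of the three cases is sketched below in Python. The source does not give the exact rules, so this is an interpretation rather than the Pro-V implementation: sample_and_filter and its case labels are hypothetical, an outlier is assumed to mean a single result that disagrees with an otherwise unanimous group, and "merging" is assumed to mean collapsing duplicates to one representative per distinct result.

```python
from collections import Counter


def sample_and_filter(results):
    """Classify candidate reference results and return (case, kept_results).

    `results` is a non-empty list of hashable signal-reference results R1..RN.
    """
    if not results:
        raise ValueError("at least one candidate result is required")
    counts = Counter(results)  # distinct results, in first-seen order
    if len(counts) == 1:
        # Consistent Outputs: every candidate agrees, keep one representative.
        return "consistent", [results[0]]
    singletons = [r for r, c in counts.items() if c == 1]
    if len(counts) == 2 and len(singletons) == 1:
        # Outlier Detection (assumed rule): exactly one result disagrees with
        # an otherwise unanimous group, so it is dropped.
        kept = [r for r in counts if r != singletons[0]]
        return "outlier_filtered", kept
    # Partial Consistent (assumed rule): collapse duplicates and pass one
    # representative per distinct result to the downstream judge.
    return "partially_consistent", list(counts)


print(sample_and_filter(["a", "a", "a", "b", "a"]))
# prints ('outlier_filtered', ['a'])
```

With N = 5 candidates, the example corresponds to the j = 4 case mentioned above: the fourth result differs from the rest and is filtered out before any judging step.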
PRO-V-R1 open-source agentic framework
PRO-V-R1 is described as the first trainable open-source agentic framework for autonomous RTL verification, with three reported contributions: [C1]
- PRO-V sys – a modular agentic system that couples LLM-based reasoning with programmatic tool use for RTL verification.
- A data-construction pipeline that leverages existing RTL datasets to build simulation-validated, expert-level trajectories tailored for supervised fine-tuning (SFT) of RTL verification agents.
- An efficient reinforcement learning (RL) algorithm that uses verification-specific rewards derived from program-tool feedback to optimize the end-to-end verification workflow.
The system is positioned against existing methods that rely on large-scale proprietary models (such as GPT-4o) to generate Python-based functional references, which are described as incurring high cost and data-privacy risks. PRO-V-R1 is described as filling the previously absent role of an end-to-end open-source solution for autonomous verification. [C1]
Empirically, PRO-V-R1 is reported to achieve a 57.7% functional correctness rate and 34.0% robust fault detection, compared with the base model's 25.7% and 21.8% respectively, and is described as outperforming large-scale proprietary LLMs in functional correctness and showing comparable robustness for fault detection. [C1]
Formal verification for security: Contract Shadow Logic
Evidence also documents a formal approach to RTL verification in the security domain. Modern out-of-order processors face speculative execution attacks, and although software and hardware mitigations have been proposed, new attacks continue to arise from unknown vulnerabilities. [C8]
Contract Shadow Logic is described as a formal verification technique that can considerably improve RTL verification scalability while being applicable to different defense mechanisms against speculative execution attacks. The technique leverages computer-architecture design insights to improve verification performance when checking security properties formulated as software-hardware contracts for secure speculation. [C8]
The verification scheme is described as accessible to computer architects and as requiring minimal formal-method expertise. It is reported to have been evaluated on multiple RTL designs, including three out-of-order processors, and to exhibit a significant advantage in finding attacks on insecure designs and deriving complete proofs on secure designs compared with the baseline and two state-of-the-art verification schemes, LEAVE and UPEC. [C8]
RISC-V case study
The cross-level testing approach described above was evaluated on the 32-bit pipelined RISC-V core of the MINRES The Good Folk (TGF) Series. The authors report that the approach found several serious bugs in the industrial core and processed more than 200 million instructions per hour on a standard laptop. [C9]
Role in processor development
Across the cited evidence, RTL verification is positioned as a practical check of an RTL implementation against expected behavior. In simulation-based flows, generated instruction streams exercise the design while an ISS or generated assembly-test workflow provides a way to expose mismatches and detect design errors before they become more costly later in development. [C3][C5] LLM-based agentic systems extend this by generating program emulators and signal references that the verification pipeline can compare against, while formal techniques such as Contract Shadow Logic push the same goal into the security-property domain for processors vulnerable to speculative execution attacks. [C1][C7][C8]