STIMSMITH

Benchmarks

Concept

Benchmarks are evaluation artifacts used to measure and compare systems. In the CPU verification sources, they appear as performance-oriented test patterns and as one part of a RISC-V core verification strategy, alongside compliance tests, direct tests, random instruction generation, and simulation-based checking. Broader benchmark research also shows that a benchmark's quality, runnability, documentation, and auditability affect whether its results are reliable and comparable.

First seen 5/27/2026
Last seen 5/28/2026
Evidence 3 chunks
Wiki v1

WIKI

Benchmarks

Benchmarks are evaluation infrastructures or workloads used to assess systems and enable systematic comparison. In hardware and CPU verification contexts, benchmarks can be part of a performance verification plan: they provide patterns used to measure processor performance aspects and identify bottlenecks.
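As an illustration of how a benchmark run becomes a comparable performance figure, the sketch below normalizes raw iteration and cycle counts into per-MHz scores. It is a minimal example under stated assumptions, not the method used in the cited thesis: the function names and input values are hypothetical, and the only external constant is the conventional 1757 Dhrystones per second that defines 1 DMIPS.

```python
# Minimal sketch: turn raw benchmark counters into clock-normalized scores.
# All function names here are hypothetical and chosen for illustration only.

# Conventional reference score (VAX 11/780) that defines 1 DMIPS.
VAX_DHRYSTONES_PER_SECOND = 1757


def iterations_per_second(iterations: int, cycles: int, clock_hz: float) -> float:
    """Convert an iteration count and a cycle count into iterations per second."""
    if iterations <= 0 or cycles <= 0 or clock_hz <= 0:
        raise ValueError("iterations, cycles, and clock_hz must be positive")
    elapsed_seconds = cycles / clock_hz
    return iterations / elapsed_seconds


def dmips_per_mhz(iterations: int, cycles: int, clock_hz: float) -> float:
    """Dhrystone score normalized to the reference machine and the clock rate."""
    dmips = iterations_per_second(iterations, cycles, clock_hz) / VAX_DHRYSTONES_PER_SECOND
    return dmips / (clock_hz / 1e6)


def coremark_per_mhz(iterations: int, cycles: int, clock_hz: float) -> float:
    """CoreMark-style score: iterations per second divided by the clock in MHz."""
    return iterations_per_second(iterations, cycles, clock_hz) / (clock_hz / 1e6)


if __name__ == "__main__":
    # Hypothetical counters, e.g. read from a simulation cycle counter.
    print(dmips_per_mhz(iterations=1757, cycles=1_000_000, clock_hz=100e6))
    print(coremark_per_mhz(iterations=10, cycles=4_000_000, clock_hz=50e6))
```

Because both scores divide by the clock rate, they depend only on the ratio of iterations to cycles. That makes them usable with cycle counts taken from simulation, where the clock frequency is a nominal value.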

Role in CPU verification



RELATIONSHIPS

3 connections
The paper uses benchmarks to evaluate the functional performance of the RISC-V core.
Dhrystone ← part of 95% 1e
Dhrystone is one of the benchmarks used in the evaluation.
Coremark ← part of 95% 1e
Coremark is one of the benchmarks used in the evaluation.

CITATIONS

6 sources
[1] Benchmarks are evaluation infrastructures used to identify trends and support systematic comparisons. Source: Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks
[2] CPU performance verification plans can capture patterns or benchmarks used to measure processor performance aspects and bottlenecks, with examples including SPECint, lmbench, and Dhrystone. Source: UVM based design verification of a RISC-V CPU core - POLITesi
[3] The RISC-V verification thesis uses benchmarks together with direct tests, a random instruction generator, RISC-V toolchain components, and the RISC-V compliance test suite to evaluate functional performance and correctness under different scenarios. Source: UVM based design verification of a RISC-V CPU core - POLITesi
[4] The experimental evaluation chapter of the thesis contains a Benchmarks section with Dhrystone and Coremark subsections. Source: UVM based design verification of a RISC-V CPU core - POLITesi
[5] A 2026 study of 31 LLM safety benchmarks found that only 39% of benchmark repositories ran without modification and only 16% had flawless installation guides, and that ad-hoc modifications can reduce the comparability of downstream evaluations. Source: Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks
[6] BenchGuard proposes automated auditing of task-oriented, execution-based agent benchmarks; it found author-confirmed issues, including fatal errors that made some benchmark tasks unsolvable. Source: BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks