Benchmarks

Concept

Benchmarks are evaluation artifacts used to measure and compare systems. In the CPU verification sources, they appear as performance-oriented test patterns and as one element of a RISC-V core verification strategy, alongside compliance tests, direct tests, random instruction generation, and simulation-based checking. Broader benchmark research also shows that a benchmark's quality, runnability, documentation, and auditability affect whether its results are reliable and comparable.

First seen 5/27/2026

Last seen 5/28/2026

Evidence 3 chunks

Wiki v1

WIKI

Benchmarks

Benchmarks are evaluation infrastructures or workloads used to assess systems and enable systematic comparison. In hardware and CPU verification contexts, benchmarks can be part of a performance verification plan, where they provide the patterns used to measure aspects of processor performance and to identify bottlenecks.
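
As an illustration of how benchmark scores support comparison across cores, the sketch below normalizes raw scores by clock frequency. It assumes the common conventions of reporting Dhrystone results as DMIPS/MHz (Dhrystones per second divided by 1757, the score conventionally attributed to the VAX 11/780 reference machine, then divided by the clock in MHz) and CoreMark results as CoreMark/MHz. The function names and all input numbers are hypothetical and are not taken from the cited thesis.

```python
# Sketch only: normalize raw benchmark scores by clock frequency so that
# cores running at different clocks can be compared on equal terms.
# Function names and sample measurements are hypothetical.

# Conventional reference score: 1757 Dhrystones/s is treated as 1 DMIPS.
VAX_11_780_DHRYSTONES_PER_SEC = 1757.0


def dmips_per_mhz(dhrystones_per_sec: float, clock_mhz: float) -> float:
    """Return a Dhrystone score expressed as DMIPS per MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    dmips = dhrystones_per_sec / VAX_11_780_DHRYSTONES_PER_SEC
    return dmips / clock_mhz


def coremark_per_mhz(iterations_per_sec: float, clock_mhz: float) -> float:
    """Return a CoreMark score (iterations per second) per MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    return iterations_per_sec / clock_mhz


if __name__ == "__main__":
    # Made-up measurements for two imaginary cores.
    cores = {
        "core_a": {"clock_mhz": 50.0, "dhry_per_s": 90_000.0, "cm_iter_per_s": 120.0},
        "core_b": {"clock_mhz": 100.0, "dhry_per_s": 150_000.0, "cm_iter_per_s": 260.0},
    }
    for name, m in cores.items():
        d = dmips_per_mhz(m["dhry_per_s"], m["clock_mhz"])
        c = coremark_per_mhz(m["cm_iter_per_s"], m["clock_mhz"])
        print(f"{name}: {d:.2f} DMIPS/MHz, {c:.2f} CoreMark/MHz")
```

Per-MHz figures separate the efficiency of the microarchitecture from the clock rate it happens to run at, which is one reason they are often used when comparing cores. Compiler version and flags also influence both scores, so they would need to be held constant for such a comparison to be meaningful.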

RELATIONSHIPS

3 connections

UVM Based Design Verification of a RISC-V CPU Core ← uses 100% 1e

The thesis uses benchmarks to evaluate the functional performance of the RISC-V core.

Dhrystone ← part of 95% 1e

Dhrystone is one of the benchmarks used in the evaluation.

Coremark ← part of 95% 1e

Coremark is one of the benchmarks used in the evaluation.

CITATIONS

6 sources

[1] Benchmarks are evaluation infrastructures used to identify trends and support systematic comparisons. Source: Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

[2] CPU performance verification plans can capture patterns or benchmarks used to measure aspects of processor performance and bottlenecks; examples include SPECint, LMbench, and Dhrystone. Source: UVM based design verification of a RISC-V CPU core - POLITesi

[3] The RISC-V verification thesis uses benchmarks together with direct tests, a random instruction generator, RISC-V toolchain components, and the RISC-V compliance test suite to evaluate functional performance and correctness under different scenarios. Source: UVM based design verification of a RISC-V CPU core - POLITesi

[4] The thesis's experimental evaluation chapter contains a Benchmarks section with Dhrystone and Coremark subsections. Source: UVM based design verification of a RISC-V CPU core - POLITesi

[5] A 2026 study of 31 LLM safety benchmarks found that only 39% of benchmark repositories ran without modification and only 16% had flawless installation guides, and that ad-hoc modifications can reduce the comparability of downstream evaluations. Source: Benchmark of Benchmarks: Unpacking Influence and Code Repository Quality in LLM Safety Benchmarks

[6] BenchGuard proposes automated auditing of task-oriented, execution-based agent benchmarks and reports author-confirmed issues, including fatal errors that made some benchmark tasks unsolvable. Source: BenchGuard: Who Guards the Benchmarks? Automated Auditing of LLM Agent Benchmarks