UCB1 Algorithm

Technique

UCB1 is an Upper Confidence Bound multi-armed bandit algorithm used to select among candidate virtual sequences by balancing exploitation of sequences with high observed reward against exploration of less-used sequences. In the cited UVM-based RISC-V verification flow, UCB1 selects one virtual sequence per trial and updates rewards from functional coverage data. In the cited experiment it reached the same coverage as 5000 trials of random sequence selection in 1507 to 2590 trials, about 60% fewer on average.

First seen 5/28/2026

Last seen 5/28/2026

Evidence 4 chunks

Wiki v1

WIKI

Overview

UCB1 is an Upper Confidence Bound multi-armed bandit algorithm used here to orchestrate simulation tests. In the verification flow, each available virtual sequence is treated as a bandit "arm" (a slot machine), and the algorithm chooses which sequence to run next based on the rewards observed in previous trials. The goal is to reach high functional coverage in fewer trials while still trying sequences that have not been run often enough to rule out a high payoff.
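The selection loop can be sketched in a few lines of Python. This is a minimal illustration, not the cited framework's code: it assumes the standard UCB1 score (mean reward plus sqrt(2 ln t / n)), which the evidence only describes as "mean reward plus uncertainty estimate", and it replaces the coverage-based reward with a toy random reward. The names `ucb1_select`, `run_ucb1`, and `make_toy_reward` are hypothetical.

```python
import math
import random


def ucb1_select(counts, means, t):
    """Return the index of the arm (virtual sequence) to run at trial t."""
    # Initialization: play every arm once before using the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Assumption: standard UCB1 score, mean reward plus sqrt(2 ln t / n).
    scores = [
        means[arm] + math.sqrt(2.0 * math.log(t) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(reward_fn, num_arms, num_trials):
    """Run UCB1 and return per-arm play counts and mean rewards."""
    counts = [0] * num_arms
    means = [0.0] * num_arms
    for t in range(1, num_trials + 1):
        arm = ucb1_select(counts, means, t)
        reward = reward_fn(arm)  # expected to lie in [0, 1]
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


def make_toy_reward(payoffs, seed=0):
    """Hypothetical stand-in for the coverage-based reward: 0/1 draws."""
    rng = random.Random(seed)

    def reward_fn(arm):
        return 1.0 if rng.random() < payoffs[arm] else 0.0

    return reward_fn


counts, means = run_ucb1(make_toy_reward([0.1, 0.2, 0.8]), num_arms=3, num_trials=2000)
print(counts)
```

Under this assumed score, the uncertainty term shrinks each time an arm is played (its count grows) and grows slowly for idle arms as the trial number increases, so rarely used sequences are eventually retried.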

Role in a MAB-based verification flow



RELATIONSHIPS

4 connections

Multi-Armed Bandit implements → 98% 2e

UCB1 is the specific multi-armed bandit algorithm implemented in the verification framework.

Reward Function uses → 95% 2e

UCB1 uses the reward function to guide sequence selection decisions.

virtual sequence uses → 90% 1e

UCB1 orchestrates the execution of virtual sequences to maximize coverage.

Functional Coverage evaluates → 90% 1e

UCB1 uses functional coverage data to judge how effective each simulated sequence was.


CITATIONS

9 sources


[1] UCB1 is an Upper Confidence Bound multi-armed bandit algorithm used to exploit effective sequences while exploring less-used sequences for potentially higher payoff. [PDF] UVM-based verification of RISC-V superscalar processors

[2] The MAB verification framework selects one virtual sequence based on collected reward, applies it to the DUT, gathers coverage data, and uses a scoreboard to check functional specifications. [PDF] UVM-based verification of RISC-V superscalar processors

[3] The UCB1 selection procedure initializes rewards by playing each virtual sequence once, then selects the sequence maximizing a mean reward plus uncertainty estimate, observes the trial reward, and updates the mean reward. [PDF] UVM-based verification of RISC-V superscalar processors

[4] The uncertainty term decreases when a sequence is selected, increases for idle sequences as trials advance, and helps UCB1 balance exploration and exploitation. [PDF] UVM-based verification of RISC-V superscalar processors

[5] Rewards in the cited framework are based on active coverage-bin hits, are bounded in [0, 1], and are logarithmically renormalized to make small rewards more distinguishable near hard-to-cover corner cases. [PDF] UVM-based verification of RISC-V superscalar processors

[6] In the RISC-V instruction fetch example, a virtual sequence contains four sequences, one per interface, and the framework uses parameterized constrained-random sequences to mimic interface behavior. [PDF] UVM-based verification of RISC-V superscalar processors

[7] The cited experiment selects K = 40 virtual sequences and treats coverpoints as fully covered when bins are hit at least 100 times. [PDF] UVM-based verification of RISC-V superscalar processors

[8] Across five seeds, random sequence selection for 5000 trials reached 82% to 91% coverage, while UCB1 reached the same coverage in 1507 to 2590 trials, averaging 1988 trials and about a 60% trial saving. [PDF] UVM-based verification of RISC-V superscalar processors

[9] When UCB1 ran for the same 5000 trials as random selection, it reached 88% to 96% coverage, and the evidence states that UCB1 identifies the potential of the selected sequence set faster rather than improving the sequence set's quality. [PDF] UVM-based verification of RISC-V superscalar processors
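The trial-saving figure in [8] can be checked with simple arithmetic. The sketch below uses only the numbers reported there (5000 random trials; 1507, 1988, and 2590 UCB1 trials); the helper name `trial_saving` and the dictionary labels are illustrative.

```python
RANDOM_TRIALS = 5000  # trials used by random sequence selection in [8]

# UCB1 trials needed to match random selection's coverage, as reported in [8].
ucb1_trials = {"fewest": 1507, "average": 1988, "most": 2590}


def trial_saving(ucb1, baseline=RANDOM_TRIALS):
    """Fraction of trials saved relative to the random-selection baseline."""
    return 1.0 - ucb1 / baseline


savings = {
    label: round(100 * trial_saving(trials), 2)
    for label, trials in ucb1_trials.items()
}
print(savings)
```

The average case works out to 1 - 1988/5000, or roughly 60%, matching the reported saving; across the five seeds the saving ranges from roughly 48% to roughly 70%.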