STIMSMITH

instruction fetch matching algorithm

CodeArtifact WIKI v1 · 5/26/2026

The instruction fetch matching algorithm is a co-simulation mechanism for feeding a consistent instruction stream to a pipelined RTL core and an instruction-set simulator (ISS). It handles RTL pre-fetching, jumps, traps, and on-the-fly instruction generation by keeping a fetch-order queue of pending instructions and matching ISS fetch requests against both program counter and expected instruction.

Overview

The instruction fetch matching algorithm addresses a co-simulation problem: an RTL core and an instruction-set simulator (ISS) must receive the same instruction stream, but a pipelined RTL core can pre-fetch instructions that the ISS has not yet requested. Because jumps and traps can invalidate or redirect pre-fetched instructions, direct program-counter-only matching is insufficient. [C1]

The algorithm keeps a queue of pending instructions fetched by the RTL core but not yet consumed by the ISS. Each queue entry stores both the instruction and its program counter (PC). When the ISS requests its next instruction, the algorithm searches the pending queue for an entry whose PC and instruction both match the ISS-side expectation. [C2]

Problem addressed

The cited co-simulation setup generates an endless, unrestricted instruction stream that includes memory-access instructions, jump instructions (including self-loops produced by on-the-fly generation), and special RISC-V CSR access instructions. Throughout, the observable architectural state of the ISS and the RTL core, such as register updates, is expected to remain identical. [C3]

This setup creates two implementation challenges: detecting when an instruction is completed in the RTL core, and feeding the same instruction stream into both the RTL core and ISS. The second challenge requires special handling because the RTL core can pre-fetch several instructions due to pipelining. [C4]

Why direct PC matching fails

A direct match based only on PC does not work because the RTL core may fetch an instruction for a PC before the ISS has reached that PC. The cited source illustrates this with a one-instruction backward jump: a jump J at address 8 targets address 4, so the RTL core begins pre-fetching from address 4 before J has fully completed. Because instructions are generated on the fly, a new instruction can then be generated for address 8 before the ISS has fetched and executed J, so the PC alone no longer identifies which instruction the ISS should receive. [C5]
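The backward-jump example can be replayed in a few lines of Python. This is only an illustrative sketch: it assumes that "PC-only matching" means keeping one instruction slot per address, and the labels `J`, `new_4`, and `new_8` are hypothetical placeholders rather than real encodings.

```python
# Fetches issued by the RTL core, in fetch order (hypothetical labels).
rtl_fetches = [
    (8, "J"),      # jump at address 8 that targets address 4
    (4, "new_4"),  # pre-fetched from the jump target before J completes
    (8, "new_8"),  # freshly generated when the core reaches address 8 again
]

# Assumed PC-only bookkeeping: one slot per address.
instr_by_pc = {}
for pc, instr in rtl_fetches:
    instr_by_pc[pc] = instr  # the second fetch at PC 8 overwrites J

# The ISS now reaches PC 8 for the first time and should receive J.
iss_gets = instr_by_pc[8]
print(iss_gets)  # new_8

# A fetch-order list still holds both instructions seen at PC 8.
candidates_at_8 = [instr for pc, instr in rtl_fetches if pc == 8]
print(candidates_at_8)  # ['J', 'new_8']
```

With the per-address table the ISS would be handed `new_8` instead of `J`. Keeping every fetch in order preserves both candidates, which is why an additional key (the expected instruction) is needed to pick the right one.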

Jumps, traps, pipeline flushes, and multi-cycle or stalled pipeline stages can create delays and gaps that must be considered when deciding which RTL instruction has completed. [C6]

Algorithm structure

The algorithm is built around the InstrStream::next_ISS_instr(PC, expected_instr) routine shown in the cited source. The routine pops entries from the pending_instrs_queue one at a time, discarding each entry that does not match, until it finds a matching (iPC, i) pair. A match requires both iPC == PC and i == expected_instr. On a match, the routine returns the instruction; if the queue empties without a match, it reports a mismatch. [C2]

function InstrStream::next_ISS_instr(PC, expected_instr)
    while not pending_instrs_queue.empty() do
        (iPC, i) <- pending_instrs_queue.pop()
        if iPC == PC and i == expected_instr then
            return i
    report_mismatch()

The queue is populated whenever the RTL core fetches an instruction. According to the cited source, the instruction stream stores pending instructions in fetch order and records the PC alongside each generated instruction. [C2]
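The routine and its queue can be sketched as a small Python class. This is a minimal sketch under stated assumptions, not the cited implementation: `MismatchError`, `next_rtl_instr`, and the `generate` callback are hypothetical names introduced here, and instructions are represented as mnemonic strings for readability.

```python
from collections import deque


class MismatchError(Exception):
    """The ISS requested an instruction that was never delivered to the RTL core."""


class InstrStream:
    def __init__(self, generate):
        self._generate = generate  # hypothetical callable: pc -> instruction
        self._pending = deque()    # (pc, instr) pairs in RTL fetch order

    def next_rtl_instr(self, pc):
        """Generate an instruction for an RTL fetch and remember it as pending."""
        instr = self._generate(pc)
        self._pending.append((pc, instr))
        return instr

    def next_iss_instr(self, pc, expected_instr):
        """Pop pending entries until one matches both PC and expected instruction."""
        while self._pending:
            i_pc, instr = self._pending.popleft()
            if i_pc == pc and instr == expected_instr:
                return instr
            # Non-matching entries (for example, flushed pre-fetches) are dropped.
        raise MismatchError(f"no pending entry for PC={pc}, instr={expected_instr!r}")


# Scripted generator standing in for on-the-fly generation (assumption).
scripted = iter(["jal x0, 8", "addi x1, x0, 1", "addi x2, x0, 2"])
stream = InstrStream(lambda pc: next(scripted))

stream.next_rtl_instr(0)  # the jump
stream.next_rtl_instr(4)  # pre-fetched, later flushed by the jump
stream.next_rtl_instr(8)  # jump target

first = stream.next_iss_instr(0, "jal x0, 8")
second = stream.next_iss_instr(8, "addi x2, x0, 2")  # skips the flushed entry at PC 4
print(first, "|", second)
```

In this run the entry for PC 4 is popped and discarded while searching for the jump target, which mirrors how a flushed pre-fetch never reaches the ISS.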

Role of the core adapter

The matching algorithm depends on the core adapter to extract the last completed instruction from the RTL core by analyzing pipeline signals. The last completed RTL instruction is passed together with the ISS PC when fetching the next ISS instruction. [C2]

The core adapter also hides implementation details of the core, observes internal signal changes (especially pipeline signals), and notifies the test controller whenever the RTL core completes an instruction. It preserves the correct order in the presence of illegal instructions and provides access to RTL register values for comparison with the ISS. [C7]
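The adapter's notification role can be illustrated with a callback-based sketch. Everything core-specific here is an assumption: the signal names `wb_valid`, `wb_instr`, `wb_rd`, and `wb_data` are hypothetical, and a real adapter would derive completion from whatever pipeline signals the core under test exposes.

```python
from typing import Any, Callable, Dict, List


class CoreAdapter:
    """Watches per-cycle signal snapshots and reports completed instructions."""

    def __init__(self, on_complete: Callable[[str], None]) -> None:
        self._on_complete = on_complete   # test-controller callback
        self._regs: Dict[int, int] = {}   # last observed register writes

    def on_clock_edge(self, signals: Dict[str, Any]) -> None:
        # Hypothetical convention: 'wb_valid' is high in the cycle an
        # instruction leaves the last pipeline stage.
        if not signals.get("wb_valid"):
            return  # bubble, stall, or flushed slot: nothing completed
        rd = signals.get("wb_rd")
        if rd is not None:
            self._regs[rd] = signals["wb_data"]
        # Notify after the register update so the controller can compare state.
        self._on_complete(signals["wb_instr"])

    def read_register(self, index: int) -> int:
        return self._regs.get(index, 0)


completed: List[str] = []
adapter = CoreAdapter(completed.append)

trace = [
    {"wb_valid": False},                                  # pipeline filling
    {"wb_valid": True, "wb_instr": "jal x0, 8"},          # jump completes
    {"wb_valid": False},                                  # gap after the flush
    {"wb_valid": True, "wb_instr": "addi x2, x0, 2",
     "wb_rd": 2, "wb_data": 2},
]
for cycle_signals in trace:
    adapter.on_clock_edge(cycle_signals)

print(completed)  # ['jal x0, 8', 'addi x2, x0, 2']
```

The controller only sees completed instructions in order; the gaps caused by the pipeline fill and the flush stay hidden inside the adapter.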

Mismatch semantics

If the ISS attempts to fetch an instruction that cannot be matched against the pending queue, the algorithm reports a mismatch. The cited source characterizes this as a mismatch between the RTL core and the ISS: the ISS tried to fetch an instruction that was never delivered to the RTL core. [C8]

The approach deliberately avoids feeding the completed instruction sequence from the core adapter directly into the ISS, because doing so would rely on the correctness of instruction propagation inside the RTL core, which is the unit under test. [C8]
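A small scenario shows what this design choice buys. The sketch below assumes a hypothetical fault in which the RTL core corrupts an instruction while propagating it through the pipeline; the function name `match_pending` and the mnemonic strings are illustrative only.

```python
from collections import deque


def match_pending(pending, pc, expected_instr):
    """Return the matched instruction, or None once the queue runs empty."""
    while pending:
        i_pc, instr = pending.popleft()
        if i_pc == pc and instr == expected_instr:
            return instr
    return None  # the caller reports a mismatch


# What the instruction stream actually delivered to the RTL core.
delivered = deque([(0, "addi x1, x0, 1")])

# Hypothetical fault: the core reports a corrupted instruction as completed.
completed_by_rtl = "addi x1, x0, 3"

result = match_pending(delivered, 0, completed_by_rtl)
mismatch_reported = result is None
print(mismatch_reported)  # True
```

If the completed instruction were fed straight into the ISS, both models would execute the corrupted `addi x1, x0, 3` and their register states would agree, hiding the fault. Checking the completed instruction against what was actually delivered exposes it as a mismatch.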

Key properties

  • Maintains a fetch-order queue of pending RTL-fetched instructions. [C2]
  • Stores both PC and generated instruction per pending entry. [C2]
  • Matches ISS requests using both PC and expected instruction, not PC alone. [C2]
  • Reports a mismatch when no matching pending instruction exists. [C2]
  • Supports unrestricted generated instruction streams in the cited co-simulation context. [C3]
  • Avoids trusting the RTL core's completed-instruction stream as the ISS input. [C8]

CITATIONS

[1] C1: RTL pre-fetching, jumps, and traps make it difficult to feed the same instruction sequence to RTL core and ISS, and PC-only matching is insufficient.
[2] C2: The algorithm stores RTL-fetched pending instructions with their PC in a fetch-order queue, searches the queue for a matching PC and expected instruction, returns the instruction on match, and reports mismatch otherwise.
[3] C3: The surrounding approach generates an endless unrestricted instruction stream, including memory accesses, jumps including self-loops, and special RISC-V CSR access instructions, while expecting identical observable architectural state between ISS and RTL core.
[4] C4: The implementation challenges are detecting completed RTL instructions and feeding the same instruction stream into the RTL core and ISS.
[5] C5: A one-instruction backward jump example shows why direct PC matching fails: the RTL core can pre-fetch from the target before the jump is fully completed and before the ISS has fetched and executed the jump.
[6] C6: Pipeline flushes, traps, stalled stages, multi-cycle operations, delays, and gaps must be considered when detecting completed instructions.
[7] C7: The core adapter observes internal pipeline signal changes, notifies the test controller when the RTL core completes an instruction, preserves order for illegal instructions, and provides RTL register values for ISS comparison.
[8] C8: On no match, a mismatch is reported because the ISS tried to fetch an instruction not delivered to the RTL core; the completed RTL instruction sequence is not directly fed to the ISS to avoid relying on the RTL core under test.