Instruction fetch matching algorithm

Overview

The instruction fetch matching algorithm addresses a co-simulation problem: an RTL core and an instruction-set simulator (ISS) must receive the same instruction stream, but a pipelined RTL core can pre-fetch instructions that the ISS has not yet requested. Because jumps and traps can invalidate or redirect pre-fetched instructions, direct program-counter-only matching is insufficient. [C1]

The algorithm keeps a queue of pending instructions, meaning instructions fetched by the RTL core but not yet consumed by the ISS. Each queue entry stores both the instruction and its program counter (PC). When the ISS requests its next instruction, the algorithm searches the pending queue for an entry whose PC equals the ISS PC and whose instruction equals the instruction the RTL core most recently completed. [C2]

Problem addressed

The cited co-simulation setup generates an endless, unrestricted instruction stream. It includes memory-access instructions, jump instructions (among them self-loops that arise from on-the-fly generation), and special RISC-V CSR access instructions. The observable architectural state of the ISS and the RTL core is expected to remain identical; for example, both must perform the same register updates. [C3]

This setup creates two implementation challenges: detecting when an instruction is completed in the RTL core, and feeding the same instruction stream into both the RTL core and ISS. The second challenge requires special handling because the RTL core can pre-fetch several instructions due to pipelining. [C4]

Why direct PC matching fails

Matching on the PC alone does not work because the RTL core may fetch an instruction for a PC before the ISS has reached that PC. Consider a one-instruction backward jump: a jump J from address 8 to address 4 causes the RTL core to begin pre-fetching from address 4 before J has completed, and a new instruction can be generated for address 8 before the ISS has fetched and executed J. Two different instructions can then be associated with address 8, and the PC alone cannot tell which one the ISS should receive. [C5]
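To see why a PC-only lookup breaks, the following Python sketch replays the backward-jump example against a hypothetical naive scheme that keeps one instruction per PC. The dictionary-based lookup is an illustrative assumption, not a scheme described in the source, and the mnemonics A and X are placeholders.

```python
# Hypothetical naive scheme (not from the source): keep one instruction per PC
# and hand it to the ISS when the ISS reaches that PC.
fetched_by_pc = {}  # PC -> most recently generated instruction

# RTL fetch order for the backward-jump example. "A" and "X" are placeholders:
# J at address 8 jumps back to 4; before J completes, the core fetches 4 and
# then 8 again, and a new instruction X is generated for address 8.
rtl_fetches = [(8, "J"), (4, "A"), (8, "X")]
for pc, instr in rtl_fetches:
    fetched_by_pc[pc] = instr  # the second fetch of address 8 overwrites J

# The ISS now fetches from address 8 and should receive J.
iss_gets = fetched_by_pc[8]
print(iss_gets)  # X, not J

# Keeping every fetch in fetch order preserves both candidates for address 8.
candidates = [instr for pc, instr in rtl_fetches if pc == 8]
print(candidates)  # ['J', 'X']
```

A per-PC map can only hold the latest instruction for an address, which is exactly what on-the-fly generation invalidates; a fetch-order queue keeps both.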

Jumps, traps, pipeline flushes, and multi-cycle or stalled pipeline stages can create delays and gaps that must be considered when deciding which RTL instruction has completed. [C6]

Algorithm structure

The algorithm is built around an InstrStream::next_ISS_instr(PC, expected_instr) routine. The routine repeatedly pops entries from pending_instrs_queue until it finds a matching (iPC, i) pair; a match requires both iPC == PC and i == expected_instr. Popped entries that do not match are discarded. On a match, the routine returns the instruction; if the queue becomes empty without a match, it reports a mismatch. [C2]

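function InstrStream::next_ISS_instr(PC, expected_instr)
    // Pop pending RTL fetches in fetch order; non-matching entries are discarded.
    while not pending_instrs_queue.empty() do
        (iPC, i) <- pending_instrs_queue.pop()
        if iPC == PC and i == expected_instr then
            return i
    // Queue exhausted: the ISS requested an instruction the RTL core never fetched.
    report_mismatch()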
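The pseudocode above can be sketched in Python as follows. This is a minimal sketch, not the original implementation: it assumes instructions are plain integer words, uses collections.deque as the queue, and raises a hypothetical MismatchError where the pseudocode reports a mismatch. The usage lines assume a fetch order that includes a sequential pre-fetch at 0xC flushed by the taken jump; that extra entry is an illustrative assumption.

```python
from collections import deque
from typing import Deque, Tuple


class MismatchError(Exception):
    """Hypothetical: raised when the ISS requests an instruction the RTL core never fetched."""


class InstrStream:
    def __init__(self) -> None:
        # Pending (PC, instruction word) pairs fetched by the RTL core, oldest first.
        self.pending_instrs_queue: Deque[Tuple[int, int]] = deque()

    def next_iss_instr(self, pc: int, expected_instr: int) -> int:
        # Pop entries in fetch order. An entry that does not match both the ISS PC
        # and the expected instruction is dropped (for example, a flushed pre-fetch).
        while self.pending_instrs_queue:
            ipc, instr = self.pending_instrs_queue.popleft()
            if ipc == pc and instr == expected_instr:
                return instr
        # Queue exhausted without a match.
        raise MismatchError(f"no pending instruction matches PC {pc:#x}")


# Usage: jump J at 0x8 back to 0x4. Assumed fetch order: J, a sequential
# pre-fetch at 0xC that the taken jump flushes, then 0x4 and a freshly
# generated word for 0x8. Instruction words are placeholders.
J, FLUSHED, A, X = 0x1, 0x2, 0x3, 0x4
stream = InstrStream()
stream.pending_instrs_queue.extend([(0x8, J), (0xC, FLUSHED), (0x4, A), (0x8, X)])
print(stream.next_iss_instr(0x8, J))  # 1
print(stream.next_iss_instr(0x4, A))  # 3 (the 0xC entry is discarded on the way)
print(stream.next_iss_instr(0x8, X))  # 4
```

Because popping is destructive, each successful match also cleans up any stale pre-fetches that precede it, so the queue never needs a separate flush notification in this sketch.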

The queue is populated as the RTL core fetches instructions: the instruction stream appends each generated instruction, together with its PC, in fetch order. [C2]
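The fetch side can be sketched in Python as a hook that records every RTL fetch. The class name FetchSide and the method on_rtl_fetch are hypothetical, seeded random 32-bit words stand in for the real instruction generator, and the sketch assumes a fresh instruction is generated on every fetch, even for a repeated PC.

```python
import random
from collections import deque
from typing import Deque, Tuple


class FetchSide:
    """Hypothetical fetch hook: generate an instruction for every RTL fetch
    and record (PC, instruction) in fetch order."""

    def __init__(self, seed: int = 0) -> None:
        # Seeded random words stand in for the real on-the-fly generator.
        self.rng = random.Random(seed)
        self.pending_instrs_queue: Deque[Tuple[int, int]] = deque()

    def on_rtl_fetch(self, pc: int) -> int:
        # Assumption: a fresh word is generated on every fetch, so a PC that is
        # fetched twice can appear twice in the queue with different instructions.
        instr = self.rng.getrandbits(32)
        self.pending_instrs_queue.append((pc, instr))
        return instr  # this word is what the RTL core receives


# Usage: the fetch sequence 8, 4, 8 from the backward-jump example.
side = FetchSide(seed=42)
for pc in (0x8, 0x4, 0x8):
    side.on_rtl_fetch(pc)
print([hex(pc) for pc, _ in side.pending_instrs_queue])  # ['0x8', '0x4', '0x8']
```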

Role of the core adapter

The matching algorithm depends on the core adapter, which extracts the last completed instruction from the RTL core by analyzing pipeline signals. When the next ISS instruction is fetched, this last completed RTL instruction is passed as expected_instr, together with the ISS PC. [C2]

The core adapter also hides implementation details of the core, observes internal signal changes (especially pipeline signals), and notifies the test controller whenever the RTL core completes an instruction. It preserves the correct instruction order even when illegal instructions occur, and it provides access to RTL register values for comparison with the ISS. [C7]

Mismatch semantics

If the ISS attempts to fetch an instruction that cannot be matched against the pending queue, the algorithm reports a mismatch between the RTL core and the ISS: the ISS has tried to fetch an instruction that was never delivered to the RTL core. [C8]

The approach deliberately avoids feeding the completed instruction sequence from the core adapter directly into the ISS, because doing so would rely on the correctness of instruction propagation inside the RTL core, which is the unit under test. [C8]
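The following Python sketch makes this argument concrete with a hypothetical single-bit corruption. It uses a simplified, standalone version of the matching routine that returns None instead of reporting a mismatch; the instruction words are placeholders, and the fault is invented for illustration.

```python
from collections import deque
from typing import Deque, Optional, Tuple


def next_iss_instr(
    queue: Deque[Tuple[int, int]], pc: int, expected_instr: int
) -> Optional[int]:
    """Pop pending (PC, instruction) pairs until one matches; None means mismatch."""
    while queue:
        ipc, instr = queue.popleft()
        if ipc == pc and instr == expected_instr:
            return instr
    return None


# Hypothetical fault: the word delivered to the RTL core at PC 0x0 is corrupted
# inside the core, so the core adapter reports a word with one bit flipped.
delivered = 0x00000013
completed = delivered ^ 0x80

# Feeding `completed` straight into the ISS would make both sides execute the
# same corrupted word, so they would agree and the fault would go unnoticed.
# Matching against the delivered stream keeps an independent reference:
result = next_iss_instr(deque([(0x0, delivered)]), 0x0, completed)
print(result is None)  # True: reported as a mismatch
```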

Key properties

- Maintains a fetch-order queue of pending RTL-fetched instructions. [C2]
- Stores both PC and generated instruction per pending entry. [C2]
- Matches ISS requests using both PC and expected instruction, not PC alone. [C2]
- Reports a mismatch when no matching pending instruction exists. [C2]
- Supports unrestricted generated instruction streams in the cited co-simulation context. [C3]
- Avoids trusting the RTL core's completed-instruction stream as the ISS input. [C8]