Instruction Caching Wiki

Overview

Instruction caching stores instruction-related information so that it can be reused when the same instruction stream or instruction word is encountered again. In the instruction-set simulator (ISS) context documented here, the cached object is the result of decoding an instruction: a decode function or macro decomposes the instruction word into bit fields and stores those fields in a record-like data structure, so the simulator does not have to decode the same instruction repeatedly during simulation. [C1]
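As a minimal sketch of the decode step described above, the following Python fragment splits a 32-bit instruction word into bit fields and stores them in a record. The field names (opcode, rd, rs1, imm) and the bit layout are assumptions made for illustration; they are not taken from the cited source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecodedInstruction:
    """Record-like container for decoded bit fields (hypothetical layout)."""
    opcode: int
    rd: int
    rs1: int
    imm: int


def decode(word: int) -> DecodedInstruction:
    """Split a 32-bit instruction word into bit fields.

    The field positions below are an assumed encoding, chosen only to
    show the shift-and-mask pattern.
    """
    return DecodedInstruction(
        opcode=(word >> 24) & 0xFF,  # bits 31..24
        rd=(word >> 20) & 0xF,       # bits 23..20
        rs1=(word >> 16) & 0xF,      # bits 19..16
        imm=word & 0xFFFF,           # bits 15..0
    )


fields = decode(0x01230010)
print(fields)
```

The record is immutable here so that a stored decode result can be handed out repeatedly without the risk of one use modifying it for the next.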

Use in instruction-set simulation

In just-in-time compiled simulation (JIT-CS), information about previously decoded instructions is stored in a cache and reused when an instruction is executed again. The cited source states that this can provide simulation performance comparable to compiled simulation while retaining the flexibility of an interpretive approach. [C2]

A generated ISS can apply a similar technique. The simulator keeps the decoded fields of the current instruction in an instruction_t-style structure and reuses this cached decode information instead of decoding the instruction again. The same source notes that software locality, such as loops, makes this an efficient way to reduce simulation run time. [C3]
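The effect of locality on such a cache can be illustrated with a small sketch. The class DecodeCache, the two-field instruction format, and the choice to key the cache by instruction address are all assumptions for illustration; the cited source does not specify these details.

```python
from typing import Dict, List, Tuple

Fields = Tuple[int, int]  # (opcode, operand); hypothetical two-field format


class DecodeCache:
    """Toy decode cache keyed by instruction address (an assumed key choice)."""

    def __init__(self) -> None:
        self.entries: Dict[int, Fields] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, pc: int, word: int) -> Fields:
        cached = self.entries.get(pc)
        if cached is not None:
            self.hits += 1          # decode work is skipped
            return cached
        self.misses += 1            # first encounter: decode and remember
        fields = ((word >> 8) & 0xFF, word & 0xFF)
        self.entries[pc] = fields
        return fields


def run_loop(program: List[int], iterations: int) -> DecodeCache:
    """Fetch every instruction of a loop body `iterations` times."""
    cache = DecodeCache()
    for _ in range(iterations):
        for pc, word in enumerate(program):
            cache.lookup(pc, word)
    return cache


cache = run_loop([0x0105, 0x0201, 0x0300], iterations=100)
print(cache.misses, cache.hits)  # prints "3 297"
```

In this sketch a three-instruction loop body executed 100 times is decoded only three times. Keying by address assumes the program memory does not change during simulation; a simulator that must support self-modifying code would also need to invalidate or validate entries.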

Measurements reported in the same work put this optimization in context. For a small processor design P1, interpretive simulation reached 0.22 MIPS, JIT-CS reached 14.0 MIPS, and the generated ISS reached 7.0 MIPS. For an industrial design P2, JIT-CS reached 2.5 MIPS and the generated ISS reached 1.2 MIPS. [C4]
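The relative speeds implied by these figures can be worked out directly. The snippet below uses only the MIPS values quoted above; the dictionary layout and the helper name ratio are illustrative choices.

```python
# MIPS figures as quoted in the text; no interpretive figure is given for P2.
mips = {
    "P1": {"interpretive": 0.22, "jit_cs": 14.0, "generated_iss": 7.0},
    "P2": {"jit_cs": 2.5, "generated_iss": 1.2},
}


def ratio(design: str, numerator: str, denominator: str) -> float:
    """Return how many times faster `numerator` is than `denominator`."""
    return mips[design][numerator] / mips[design][denominator]


p1_jit_vs_interp = ratio("P1", "jit_cs", "interpretive")
p1_iss_vs_interp = ratio("P1", "generated_iss", "interpretive")
p1_iss_vs_jit = ratio("P1", "generated_iss", "jit_cs")
p2_iss_vs_jit = ratio("P2", "generated_iss", "jit_cs")

print(f"P1 JIT-CS vs interpretive:        {p1_jit_vs_interp:.1f}x")
print(f"P1 generated ISS vs interpretive: {p1_iss_vs_interp:.1f}x")
print(f"P1 generated ISS vs JIT-CS:       {p1_iss_vs_jit:.2f}")
print(f"P2 generated ISS vs JIT-CS:       {p2_iss_vs_jit:.2f}")
```

On these figures the generated ISS runs at roughly half the JIT-CS rate on both designs, and at roughly 32 times the interpretive rate on P1.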

Hardware instruction-cache variants

The term also appears in processor architecture contexts. The Sphynx study examined sharing an instruction cache among multiple independent processor cores in a massively parallel machine running the same program, with the goal of reducing replicated on-chip instruction storage and die area. [C5]

A later hierarchical instruction-cache design for ultra-low-power, tightly coupled processor clusters used private L1 instruction caches backed by a shared L1.5 cache over a two-cycle-latency interconnect. That work identified the instruction-cache architecture as a bottleneck for timing, bandwidth, and power, and reported up to 20% higher operating frequency and up to 17% higher maximum performance than the state of the art it compared against. [C6]
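The private-plus-shared lookup path can be sketched as a toy latency model. Only the two-cycle interconnect latency comes from the text; the L1 hit latency, the memory latency, the unbounded cache capacity, and the class name Cluster are assumptions for illustration and do not describe the actual design.

```python
from typing import List, Set

INTERCONNECT_CYCLES = 2  # from the text: two-cycle-latency interconnect
L1_HIT_CYCLES = 1        # assumption for illustration
MEMORY_CYCLES = 20       # assumption for illustration


class Cluster:
    """Toy model: per-core private L1 sets backed by one shared L1.5 set.

    Capacity is unbounded and nothing is ever evicted, which a real cache
    would not allow; the model only shows the lookup order.
    """

    def __init__(self, cores: int) -> None:
        self.private: List[Set[int]] = [set() for _ in range(cores)]
        self.shared: Set[int] = set()

    def fetch(self, core: int, line: int) -> int:
        """Return the modelled fetch latency in cycles for one cache line."""
        if line in self.private[core]:
            return L1_HIT_CYCLES
        # Private miss: go over the interconnect to the shared level.
        cycles = L1_HIT_CYCLES + INTERCONNECT_CYCLES
        if line not in self.shared:
            cycles += MEMORY_CYCLES  # shared miss: refill from memory
            self.shared.add(line)
        self.private[core].add(line)
        return cycles


cluster = Cluster(cores=2)
first = cluster.fetch(0, line=0x40)   # misses both levels
second = cluster.fetch(1, line=0x40)  # private miss, shared hit
third = cluster.fetch(1, line=0x40)   # private hit
print(first, second, third)  # prints "23 3 1"
```

In this model a line fetched by one core is served to a second core from the shared level, so the second core pays only the interconnect latency rather than the full memory latency.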

Relationship to just-in-time compiled simulation

Instruction caching is closely related to just-in-time compiled simulation (JIT-CS): JIT-CS caches information about previously decoded instructions and reuses it when those instructions are executed again. [C2]