Overview
A Branch Target Buffer (BTB) is a branch-prediction structure used by the instruction-fetch path of a CPU. It records the target Program Counter (PC) address of branch instructions so the processor can determine taken-branch destinations more quickly. In a cited RISC-V superscalar processor description, the Instruction Fetch (IF) unit fetches instructions from the instruction cache, predicts the next PC, and includes a BTB as part of its dynamic predictor alongside the Branch History Table (BHT) and the Return Address Stack (RAS). [C1]
Role in the Branch Prediction Subsystem
The BTB is one of several complementary predictor structures in modern CPUs. The BHT maintains prior branch outcomes to predict direction (taken vs. not-taken), while the RAS stores decoded function-call return addresses and supplies predicted PCs for ret instructions. The BTB supplies branch target addresses, completing the address-prediction side of the dynamic predictor. [C1]
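The division of labour among the three structures can be sketched with a toy next-PC predictor. This is a minimal illustration, not the cited design: the class name `ToyPredictor`, the 16-entry direct-mapped tables, the 2-bit saturating counters, the fixed 4-byte instruction size, and the `is_ret` hint (which assumes the fetch stage already knows the instruction is a return) are all assumptions made for the example.

```python
from dataclasses import dataclass, field

INSTR_BYTES = 4  # assumption: fixed 4-byte instructions


@dataclass
class ToyPredictor:
    """Illustrative next-PC predictor; sizes and indexing are assumptions."""
    entries: int = 16
    btb: dict = field(default_factory=dict)  # index -> (tag, target PC)
    bht: dict = field(default_factory=dict)  # index -> 2-bit saturating counter
    ras: list = field(default_factory=list)  # stack of return addresses

    def _index(self, pc: int) -> int:
        return (pc // INSTR_BYTES) % self.entries

    def predict(self, pc: int, is_ret: bool = False) -> int:
        """Return the predicted next PC for the instruction at pc."""
        if is_ret and self.ras:
            return self.ras[-1]                  # RAS supplies the return target
        idx = self._index(pc)
        entry = self.btb.get(idx)
        taken = self.bht.get(idx, 0) >= 2        # BHT supplies the direction
        if entry is not None and entry[0] == pc and taken:
            return entry[1]                      # BTB supplies the taken target
        return pc + INSTR_BYTES                  # otherwise fall through

    def update(self, pc: int, taken: bool, target: int,
               is_call: bool = False, is_ret: bool = False) -> None:
        """Train the tables with a resolved branch outcome."""
        idx = self._index(pc)
        ctr = self.bht.get(idx, 0)
        self.bht[idx] = min(ctr + 1, 3) if taken else max(ctr - 1, 0)
        if taken and not is_ret:
            self.btb[idx] = (pc, target)         # remember the taken target
        if is_call:
            self.ras.append(pc + INSTR_BYTES)    # push the return address
        elif is_ret and self.ras:
            self.ras.pop()                       # return consumed
```

In this sketch a branch must be seen taken twice before the counter crosses the taken threshold and the BTB target is used; until then the predictor falls through to the next sequential PC.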
BTB Interaction with the L1 Instruction Cache
High-performance BTBs are tightly coupled with the L1 instruction cache (L1I) in the processor front end. Even with highly accurate branch predictors, BTB and L1I misses remain frequent in modern server workloads because code footprints keep growing. The cited industry trend is to use very large BTBs (hundreds of KB per core) combined with a decoupled front end that provides fetch-directed L1I instruction prefetching, in order to bring delivered front-end performance closer to that of an ideal BTB. [P1]
Academic proposals such as BTB prefetching and learning from retire-order streams have not produced significant gains on modern wider and deeper cores, which motivates alternative last-level BTB designs. [P1]
MicroBTB: Compressed Last-Level BTB Design
A key observation behind the MicroBTB (MBTB) proposal is that not all branch instructions require a full branch target address; the target can instead be stored as an offset relative to the branch instruction. Storing relative offsets allows multiple branches to be packed into each BTB entry. The MBTB design is both skewed-indexed and compressed, with the aim of reducing the skewness that compression would otherwise introduce. Evaluated on 100 industry-provided server workloads, a 4K-entry MBTB provides a 17.61% performance improvement over an 8K-entry baseline BTB design while saving 47.5 KB of storage per core. [P1]
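The storage argument behind offset compression can be illustrated with a small calculation. The sketch below is not the MBTB entry format, which the source does not describe: the 48-bit full-target width, the helper names `offset_bits` and `fits_in_one_entry`, and the idea of summing variable-width offsets against a fixed bit budget are assumptions chosen only to show why nearby targets are cheap to store.

```python
FULL_TARGET_BITS = 48  # assumption: width of a full virtual target address


def offset_bits(branch_pc: int, target_pc: int) -> int:
    """Bits needed to hold target_pc - branch_pc as a two's-complement value."""
    offset = target_pc - branch_pc
    if offset >= 0:
        return offset.bit_length() + 1        # magnitude bits plus a sign bit
    return (-offset - 1).bit_length() + 1


def fits_in_one_entry(branches, budget_bits: int = FULL_TARGET_BITS) -> bool:
    """True if the offsets of all (branch_pc, target_pc) pairs fit the budget."""
    return sum(offset_bits(pc, tgt) for pc, tgt in branches) <= budget_bits


# Three branches with nearby targets (hypothetical addresses).
nearby = [(0x1000, 0x1040), (0x1010, 0x0FF0), (0x1020, 0x1420)]
widths = [offset_bits(pc, tgt) for pc, tgt in nearby]   # [8, 6, 12]
packed = fits_in_one_entry(nearby)                      # True: 26 bits <= 48
```

Under these assumptions the three offsets need 26 bits in total, less than a single full target, whereas a branch whose target lies far away still needs close to the full width. A real design would more likely use a few fixed offset field sizes, and could drop low-order bits that alignment guarantees to be zero.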
BTB Reverse Engineering on ARM
Although BTBs play a critical role in efficient branch prediction and in mitigating hardware attacks such as Spectre, proprietary CPUs (Intel, AMD, Apple, Qualcomm) do not publicly document their BTB implementations. Previous reverse-engineering work has primarily targeted Intel x86, recovering BTB capacity and associativity. Recent work adapts these x86 methodologies to ARM by identifying ARM-specific PMU events and reproducing the reverse-engineering flow. On the quad-core Cortex-A72 of the Raspberry Pi 4B, the recovered BTB parameters are: capacity of 4K entries, a set index using bits 5 through 15 of the PC (11 bits total), and 2 ways per set. These findings fill a gap in public knowledge of ARM BTB implementations and are useful for compiler design and hardware-attack mitigation. [P2]
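The recovered parameters are mutually consistent, which a short calculation makes explicit: an 11-bit set index gives 2^11 = 2048 sets, and 2048 sets with 2 ways each gives 4096 (4K) entries. The sketch below assumes the index is a plain bit slice of the PC with no hashing, which the reported bit range suggests but the source does not state; the function name `btb_set_index` is hypothetical.

```python
INDEX_LO, INDEX_HI = 5, 15                # PC bits used for the set index
WAYS = 2                                  # ways per set

INDEX_WIDTH = INDEX_HI - INDEX_LO + 1     # 11 bits
NUM_SETS = 1 << INDEX_WIDTH               # 2048 sets
CAPACITY = NUM_SETS * WAYS                # 4096 entries, i.e. 4K


def btb_set_index(pc: int) -> int:
    """Extract PC bits [15:5] as the set index (assumes no index hashing)."""
    return (pc >> INDEX_LO) & (NUM_SETS - 1)


# Two PCs that differ only above bit 15 map to the same set.
same_set = btb_set_index(0x0000_1240) == btb_set_index(0x0001_1240)
```

Under this assumption, branches whose addresses are a multiple of 64 KiB apart compete for the same two ways.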
BTB in Transient Execution Attacks (Spectre-V2)
The BTB is one of the microarchitectural resources exploited by Spectre-style transient execution attacks. The transient-execution attack model consists of four steps: training the target microarchitecture, triggering a transient window through the trained state, accessing sensitive data and encoding it into a side channel, and decoding the secret from the side channel. Different transient-window types require different training patterns. In Spectre-V1, the training and transient-execution sections are independent as long as the branch instructions share an address offset. Spectre-V2, by contrast, runs the same code for both phases and requires different arguments (a0) to switch between training and exploiting the BTB. Spectre-RSB targets a different structure, the Return Stack Buffer (RSB). [C2]
Role in Branch Validation and Redirection
BTB-derived target predictions feed a speculative control-flow mechanism whose predictions must be verified. In the cited RISC-V superscalar design, the Instruction Decode stage contains a flush controller that determines whether a branch predicted during IF was correct. The controller compares the resolved branch result, which is produced by the execute-stage branch resolve unit and carried by a predictor-update signal, against the stored initial prediction; a FIFO holds branch predictions in program order for this comparison. If the branch was mispredicted, the controller redirects the pipeline and issues a flush. [C1]
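The compare-and-redirect behaviour described here can be modelled in a few lines. This is a behavioural sketch rather than the cited RTL: the class name `FlushController`, its method names, the use of a `deque` as the prediction FIFO, and the choice to discard all younger predictions on a flush are assumptions made for illustration.

```python
from collections import deque
from typing import Optional


class FlushController:
    """Toy model: validate predictions in program order and request flushes."""

    def __init__(self) -> None:
        self.pending = deque()   # FIFO of (branch PC, predicted next PC)

    def record_prediction(self, pc: int, predicted_next_pc: int) -> None:
        """Called at fetch time, when the predictor chooses a next PC."""
        self.pending.append((pc, predicted_next_pc))

    def resolve(self, pc: int, actual_next_pc: int) -> Optional[int]:
        """Check the oldest prediction against the resolved outcome.

        Returns the redirect PC if a flush is needed, otherwise None.
        """
        pred_pc, predicted = self.pending.popleft()
        if pred_pc != pc:
            raise ValueError("branches must resolve in program order")
        if predicted == actual_next_pc:
            return None              # prediction was correct, no flush
        self.pending.clear()         # younger predictions were on the wrong path
        return actual_next_pc        # redirect fetch to the correct PC
```

A correct prediction simply retires the oldest FIFO entry, while a mismatch returns the resolved PC as the redirect target and empties the FIFO, mirroring a pipeline flush.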
Functional Verification Coverage
BTB behavior is an explicit functional-coverage target in the cited UVM-based verification work for a RISC-V superscalar IF unit. The IF-unit coverpoints include reading from every line of the BTB array, writing to every line of the BTB array, observing the BTB full condition, and observing the BTB empty condition. The same coverage table includes BHT and RAS coverpoints, reflecting that BTB verification is performed as part of the larger branch-prediction subsystem. [C1]
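The four BTB coverpoints listed above can be mirrored in a small software model. The cited work implements coverage in a UVM testbench; the Python class below is only an illustrative stand-in. The class name `BtbCoverage`, the definition of "full" as every line valid and "empty" as no line valid, and the per-line bin structure are assumptions.

```python
class BtbCoverage:
    """Toy functional-coverage collector for a BTB array of num_lines lines."""

    def __init__(self, num_lines: int) -> None:
        self.num_lines = num_lines
        self.read_lines = set()      # lines observed being read
        self.written_lines = set()   # lines observed being written
        self.seen_full = False       # assumption: full = all lines valid
        self.seen_empty = False      # assumption: empty = no lines valid

    def sample(self, valid_count: int, read_line=None, write_line=None) -> None:
        """Record one observation of BTB activity and occupancy."""
        if read_line is not None:
            self.read_lines.add(read_line)
        if write_line is not None:
            self.written_lines.add(write_line)
        if valid_count == self.num_lines:
            self.seen_full = True
        if valid_count == 0:
            self.seen_empty = True

    def percent(self) -> float:
        """Coverage as a percentage of all bins (one read and one write bin
        per line, plus the full and empty bins)."""
        hit = (len(self.read_lines) + len(self.written_lines)
               + int(self.seen_full) + int(self.seen_empty))
        total = 2 * self.num_lines + 2
        return 100.0 * hit / total
```

Coverage reaches 100% in this model only once every line has been both read and written and both occupancy extremes have been observed.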
Processor Fuzzing and RTL Coverage
In the ProcessorFuzz framework, which uses a CSR-transition coverage metric together with ISA-simulator feedback to guide fuzzing, the BTB appears as a named RTL block in the Rocket Core RTL coverage illustration. The block sits alongside DCache, MulDiv, and the Rocket core itself, with a coverage map updated each cycle. This indicates that BTB state is treated as a first-class coverage target when processor fuzzers drive RTL designs. [C3]
Related Entities
- Spectre-V2: A transient execution attack that uses the BTB; it requires different arguments (a0) to switch between training and exploiting the BTB with the same code, unlike Spectre-V1, whose training and transient sections can be generated independently. [C2]
- Rocket Core: BTB is one of the named RTL blocks in ProcessorFuzz's Rocket Core RTL coverage illustration. [C3]