Overview
Register renaming is a microarchitectural stage used in superscalar out-of-order processors to eliminate register name dependencies, specifically Write-After-Read (WAR) and Write-After-Write (WAW) hazards. These hazards arise because two instructions reuse the same register name, not because data flows between them, so they disappear once each write is given its own storage location. Renaming does this by mapping the architectural (logical) registers defined by the instruction set architecture onto a larger pool of physical registers. In the two-way RISC-V superscalar out-of-order processor described in the cited source, decoded instructions are sent to the Register Renaming stage after instruction decode, and up to two instructions can be renamed per cycle. [1]
How renaming works
In a merged physical register file implementation, architectural and speculative state are held in a single physical register file. Register renaming translates the logical destination register of each instruction that produces data into a physical destination register, commonly denoted Pdst. Logical source registers are translated into corresponding physical source specifiers, or Psrc values. Thus, during renaming, the instruction's logical register specifiers are replaced with physical register specifiers before the instruction proceeds to later pipeline stages. [2]
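The translation described above can be illustrated with a minimal Python sketch. It is a software model under stated assumptions, not the cited processor's hardware: the register counts, the initial mapping (logical xN to physical pN), and the name `rename` are all hypothetical. It shows how a short sequence with WAR and WAW hazards ends up with a distinct Pdst for every write, while the true (read-after-write) dependency is preserved.

```python
from collections import deque

NUM_LOGICAL = 4    # assumed: tiny logical register count for illustration
NUM_PHYSICAL = 8   # assumed: physical pool larger than the logical set

# Assumed reset state: logical register xN maps to physical register pN.
rat = {f"x{i}": f"p{i}" for i in range(NUM_LOGICAL)}
# The remaining physical registers start out free.
free_list = deque(f"p{i}" for i in range(NUM_LOGICAL, NUM_PHYSICAL))

def rename(dst, srcs):
    """Return (Pdst, [Psrc, ...]) for one instruction and update the mapping."""
    psrcs = [rat[s] for s in srcs]   # sources read the current mapping
    pdst = free_list.popleft()       # destination gets a fresh physical register
    rat[dst] = pdst                  # later readers of dst now see pdst
    return pdst, psrcs

program = [
    ("x1", ["x2", "x3"]),  # i0: x1 = x2 op x3
    ("x2", ["x1"]),        # i1: x2 = f(x1)  (RAW on x1, WAR on x2 against i0)
    ("x1", ["x3"]),        # i2: x1 = g(x3)  (WAW on x1 against i0)
]
renamed = [rename(dst, srcs) for dst, srcs in program]
for (dst, srcs), (pdst, psrcs) in zip(program, renamed):
    print(f"{dst} <- {srcs}  becomes  {pdst} <- {psrcs}")
```

After renaming, i0 writes p4, i1 writes p5, and i2 writes p6, so no two instructions share a destination and i1's write no longer touches the register i0 reads. The sketch does not model how old physical registers are reclaimed.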
Register renaming can be performed one instruction at a time in scalar processors or on multiple instructions at once in superscalar processors. The cited two-way RISC-V processor performs two-way register renaming: up to two instructions can be renamed and retired per clock cycle. [2]
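Renaming two instructions in the same cycle raises one extra concern: if the second instruction reads a register that the first one writes, it must receive the first instruction's new Pdst and not the older table entry. The sketch below models that check in Python. It reflects how group renaming is generally handled in superscalar designs; the source does not describe the cited processor's exact logic, and `rename_group` and the `local` dictionary are hypothetical names.

```python
from collections import deque

def rename_group(rat, free_list, group):
    """Rename up to two instructions, given in program order, in one cycle.

    Each instruction is (dst, srcs). Returns a list of (pdst, psrcs).
    """
    assert len(group) <= 2
    out = []
    local = {}  # mappings created by older instructions in this same group
    for dst, srcs in group:
        # A source written earlier in the group must use that new Pdst,
        # not the stale entry still held in the table.
        psrcs = [local.get(s, rat[s]) for s in srcs]
        pdst = free_list.popleft()
        local[dst] = pdst
        out.append((pdst, psrcs))
    rat.update(local)  # the youngest writer of each logical register wins
    return out

rat = {"x1": "p1", "x2": "p2", "x3": "p3"}
free_list = deque(["p4", "p5", "p6"])
# i0: x1 = x2 op x3 ; i1: x3 = x1 op x2 (i1 depends on i0 within the group)
result = rename_group(rat, free_list, [("x1", ["x2", "x3"]), ("x3", ["x1", "x2"])])
print(result)
```

Here the second instruction's first source resolves to p4, the register just allocated to the first instruction, even though the table still held p1 for x1 when the cycle began.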
Main hardware structures
The register renaming stage described in the source consists of three main hardware arrays: the Free List, the Register Alias Table, and the Checkpoint Table. [2]
Free List
The Free List is a FIFO structure initialized with physical destination registers when the core powers on. When an instruction needs a destination register, the Free List provides a free Pdst. The renamed instruction is then sent to the Reservation Station, where it waits to execute; when the instruction executes, it updates the physical register identified by its Pdst. [2]
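As a rough software analogue, the Free List can be modeled as a queue of register numbers. The sketch below is illustrative only: the class name `FreeList`, its methods, the register counts, and the choice to return `None` when the list is empty (standing in for a rename stall) are assumptions, not details from the source.

```python
from collections import deque

class FreeList:
    """FIFO of free physical register numbers (hypothetical model)."""

    def __init__(self, first_free, num_physical):
        # Assumed reset state: registers below first_free hold the initial
        # architectural mappings; the rest are free.
        self._fifo = deque(range(first_free, num_physical))

    def allocate(self):
        """Pop the oldest free register, or return None if none is free."""
        return self._fifo.popleft() if self._fifo else None

    def release(self, preg):
        """Return a physical register to the tail of the FIFO."""
        self._fifo.append(preg)

    def __len__(self):
        return len(self._fifo)

fl = FreeList(first_free=32, num_physical=36)
a = fl.allocate()      # oldest free register
b = fl.allocate()
fl.release(a)          # a goes to the back of the queue
rest = [fl.allocate() for _ in range(4)]
print(a, b, rest)
```

Because the structure is a FIFO, a released register is reused only after every register already in the queue has been handed out, and the final allocation in the example finds the list empty.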
Register Alias Table
The Register Alias Table stores the most recent mapping from each logical register specifier to a physical destination register. It is used to rename source operands: logical input registers are looked up in the table, and their renamed physical registers are forwarded to the Reservation Station so the processor can determine when the instruction can execute. [2]
Checkpoint Table
The Checkpoint Table stores snapshots of the Register Alias Table. In the described design, a snapshot is taken whenever an incoming branch instruction is encountered. This supports fast restoration of register-renaming state after pipeline flushes. When a mispredicted branch causes a flush, the Register Alias Table is restored from the checkpoint associated with the offending branch, and physical destination registers allocated after that branch are returned to the Free List. [2]
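The checkpoint-and-restore behavior can be sketched in Python as follows. This is a simplified model under several assumptions: the class `Renamer`, the branch tags, the `allocated` bookkeeping list, and returning squashed registers to the tail of the Free List are all hypothetical choices, and retirement is not modeled.

```python
from collections import deque

class Renamer:
    def __init__(self, num_logical, num_physical):
        self.rat = list(range(num_logical))   # logical index -> physical register
        self.free_list = deque(range(num_logical, num_physical))
        self.allocated = []     # Pdsts handed out so far, oldest first
        self.checkpoints = []   # (branch tag, RAT snapshot, allocation count)

    def rename_dst(self, dst):
        pdst = self.free_list.popleft()
        self.rat[dst] = pdst
        self.allocated.append(pdst)
        return pdst

    def checkpoint(self, branch_tag):
        # Snapshot the table when a branch enters the rename stage.
        self.checkpoints.append((branch_tag, list(self.rat), len(self.allocated)))

    def recover(self, branch_tag):
        # Find the checkpoint belonging to the mispredicted branch.
        idx = next(i for i, (tag, _, _) in enumerate(self.checkpoints)
                   if tag == branch_tag)
        _, snapshot, count = self.checkpoints[idx]
        del self.checkpoints[idx:]       # this and younger checkpoints are dropped
        self.rat = snapshot              # restore the mapping seen at the branch
        squashed = self.allocated[count:]
        del self.allocated[count:]
        self.free_list.extend(squashed)  # Pdsts allocated after the branch are freed
        return squashed

r = Renamer(num_logical=4, num_physical=8)
r.rename_dst(1)            # before the branch: logical 1 -> physical 4
r.checkpoint("b0")
r.rename_dst(2)            # speculative: logical 2 -> physical 5
r.rename_dst(1)            # speculative: logical 1 -> physical 6
squashed = r.recover("b0")
print(r.rat, list(r.free_list), squashed)
```

After recovery the table again maps logical register 1 to physical register 4, and the two speculatively allocated registers are available for reuse.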
Role in the out-of-order pipeline
In the referenced two-way RISC-V superscalar out-of-order processor, the front end fetches and decodes up to two instructions per cycle. Decoded instructions enter the Register Renaming stage, which maps architectural registers to physical registers and removes WAR and WAW hazards. Renamed instructions then enter the Issue Queue and proceed through the Instruction Issue stage, which uses a scoreboard to track the status of each physical register. Based on scoreboard information, up to two ready instructions may be issued per cycle for out-of-order execution. [1]
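The scoreboard-driven issue step can be approximated with a short Python model. The sketch assumes details the source does not state: oldest-first selection, a scoreboard represented as a set of ready physical registers, results that become ready one cycle after issue, and the hypothetical function name `select_ready`.

```python
def select_ready(issue_queue, ready, width=2):
    """Remove and return up to `width` instructions whose sources are all ready.

    issue_queue: list of dicts with 'name', 'psrcs', 'pdst', oldest first.
    ready: set of physical registers whose values are available.
    """
    issued = []
    for instr in issue_queue:
        if len(issued) == width:
            break
        if all(p in ready for p in instr["psrcs"]):
            issued.append(instr)
    for instr in issued:
        issue_queue.remove(instr)
    return issued

ready = {"p1", "p2", "p3"}
issue_queue = [
    {"name": "i0", "psrcs": ["p4"], "pdst": "p5"},        # waits for i1's result
    {"name": "i1", "psrcs": ["p1", "p2"], "pdst": "p4"},
    {"name": "i2", "psrcs": ["p3"], "pdst": "p6"},
    {"name": "i3", "psrcs": ["p2"], "pdst": "p7"},
]
schedule = []
while issue_queue:
    issued = select_ready(issue_queue, ready)
    if not issued:
        break  # nothing can make progress; avoid looping forever
    schedule.append([i["name"] for i in issued])
    # Assumed: results are marked ready before the next cycle.
    ready.update(i["pdst"] for i in issued)
print(schedule)
```

The oldest instruction, i0, is bypassed in the first cycle because its source p4 is not yet ready, which is the out-of-order behavior the scoreboard enables.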
Verification context
The register renaming unit was one of the main units verified in a UVM-based verification flow for the two-way RISC-V superscalar processor, alongside Instruction Fetch, Issue, and ReOrder Buffer units. [3]
A feedback-based verification technique was applied to the Register Renaming Sub-system. The method used constrained-random sequences and direct-test sequences, then adjusted each sequence's simulation duration based on its incremental contribution to functional coverage. Compared with conventional simulation, which runs each test sequence for a fixed, pre-decided number of cycles, the technique reportedly achieved much higher functional coverage in substantially less simulation time, with approximately 70% simulation-time savings across different test-sequence sets. [4]
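The general idea of adjusting duration from coverage feedback can be illustrated with a toy Python model. This is not the algorithm from the cited work: the stopping rule (end a sequence once a time slice adds fewer than `min_gain` new coverage bins), the slice granularity, the sequence names, and the bin data are all invented for illustration.

```python
def run_with_feedback(sequences, min_gain=1):
    """Run each sequence slice by slice, stopping it once it stops contributing.

    sequences: name -> list of sets; each set holds the coverage bins hit
               in one time slice of that sequence.
    Returns (covered_bins, slices_run_per_sequence).
    """
    covered = set()
    slices_run = {}
    for name, slices in sequences.items():
        used = 0
        for bins in slices:
            used += 1
            gain = len(bins - covered)   # bins not seen before
            covered |= bins
            if gain < min_gain:
                break                    # sequence has stopped contributing
        slices_run[name] = used
    return covered, slices_run

# Made-up coverage data for two hypothetical sequences.
sequences = {
    "random_alu": [{1, 2, 3}, {3, 4}, {4}, {1, 2}, {2}],
    "directed_branch": [{6, 7}, {7}, {6}],
}
covered, slices_run = run_with_feedback(sequences)
fixed_slices = sum(len(s) for s in sequences.values())
feedback_slices = sum(slices_run.values())
print(sorted(covered), slices_run, fixed_slices, feedback_slices)
```

In this contrived data set the feedback rule reaches the same bins as running every slice while simulating fewer slices. Real sequences can contribute new coverage late, so an actual flow would need a more careful rule than this one.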