
Evaluating and Programming the 29K RISC Family
branch prediction. The decoder cannot wait for the outcome of the branch instruction
to be known before it starts fetching the new instruction stream. It must examine the
instruction currently being decoded and determine whether a branch is present. When a
branching instruction is found, the decoder must predict both whether the branch will be
taken and the target of the branch. This enables instructions to be fetched and
decoded along the predicted path. Of course, unconditional branches also benefit
from early fetching of their target instruction sequence, and they do not require
branch prediction support.
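The decoder's choice of what to fetch next can be sketched in a few lines of Python. This is only an illustrative model, not the decoder's actual logic: the names `Instr`, `next_fetch_target`, and `predict_taken` are hypothetical, the predictor is passed in as a plain function, and the sketch ignores any branch delay slot.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str                       # mnemonic, e.g. "add", "jmpt", "jmp"
    target: Optional[str] = None  # branch target label, None for non-branches

CONDITIONAL = {"jmpt", "jmpf"}    # need a prediction
UNCONDITIONAL = {"jmp"}           # target known without prediction

def next_fetch_target(
    block: List[Instr],
    fall_through: str,
    predict_taken: Callable[[Instr], bool],
) -> Tuple[List[Instr], str]:
    """Scan one cache block in order and decide where to fetch next.

    Returns the instructions decoded from this block and the label of
    the block to fetch in the next cycle (hypothetical model).
    """
    decoded = []
    for instr in block:
        decoded.append(instr)
        if instr.op in UNCONDITIONAL:
            # No prediction needed: redirect fetch immediately.
            return decoded, instr.target
        if instr.op in CONDITIONAL and predict_taken(instr):
            # Predicted taken: fetch along the predicted path
            # without waiting for the branch to execute.
            return decoded, instr.target
    # No taken branch found: continue with the sequential block.
    return decoded, fall_through

block = [Instr("cpgt"), Instr("jmpt", "L14"), Instr("add"), Instr("jmp", "L16")]
decoded, target = next_fetch_target(block, "next_block", lambda i: True)
```

With an always-taken predictor the sketch redirects fetch to `L14` as soon as the `jmpt` is seen. With a never-taken predictor it continues to the unconditional `jmp` and redirects to `L16`.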
The instruction decode sequence for the previous code example, using branch
prediction, is shown in Figure 1-11. Without waiting for the conditional–jump
instruction in the second entry of the cache block to execute, the decoder predicts the
branch will be taken and in the next cycle starts decoding the block containing the
target instruction. This results in a decode rate of 2.33 instructions per cycle. If the
prediction is correct, the decoder should be able to sustain a decode rate that
keeps the function units from being starved of instructions.
Figure 1-11. Four–Instruction Decoder with Branch Prediction
[Figure: cache blocks being decoded, plotted against time in cycles. Instructions shown: add gr98,gr98,10; sll gr99,gr99,2; cpgt gr97,gr97,gr98; jmpt gr97,L14; add lr4,lr4,gr99; jmp L16; const lr10,0. Average decode rate = 7/3 = 2.33 instructions/cycle.]
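The average decode rate quoted for Figure 1-11 is the number of instructions decoded divided by the cycles taken. The short sketch below reproduces that arithmetic. The function name `average_decode_rate` is hypothetical, and the peak of four instructions per cycle is an assumption based only on the decoder being four instructions wide.

```python
def average_decode_rate(instructions_decoded: int, cycles: int) -> float:
    """Average number of instructions decoded per cycle."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_decoded / cycles

# Values from Figure 1-11: seven instructions decoded in three cycles.
rate = average_decode_rate(7, 3)

# Assumed peak: a four-instruction decoder handles at most 4 per cycle.
PEAK_PER_CYCLE = 4
utilization = rate / PEAK_PER_CYCLE

print(f"{rate:.2f} instructions/cycle, {utilization:.0%} of peak")
```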
Branch prediction supports speculative instruction fetching. It results in
instructions being placed in the instruction window which may be speculatively
dispatched and executed. If the branch is wrongly predicted, instructions still waiting
in reservation stations must be cancelled. Any wrongly predicted instructions which
reach execution must not be retired. This requires considerable support circuitry. For
this reason scoreboarding is used by some processors to support speculative
instruction fetching. With scoreboarding the decoder sets a scoreboard bit for each
instruction’s destination register. Since there is only one bit indicating there is a
pending update, there can be only one such update per register. Consequently, the
decoder stalls when it encounters an instruction that must update a register which
already has a pending update. The scoreboarding mechanism is simpler to implement
than register renaming using a reorder buffer. However, its restrictions limit the
decoder’s ability to speculatively fetch instructions further ahead of actual execution.
This has been shown to result in about 21% poorer performance when a
four–instruction decoder is used [Johnson 1991].
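The one-bit-per-register rule can be sketched as a small Python model. It is an illustration under stated assumptions rather than a description of any particular processor: the class `Scoreboard` and its methods `try_decode` and `complete` are hypothetical names, and registers are identified by their assembler names.

```python
class Scoreboard:
    """One pending-update bit per register (hypothetical model)."""

    def __init__(self) -> None:
        # A register is in the set while its scoreboard bit is set.
        self._pending = set()

    def try_decode(self, dest: str) -> bool:
        """Try to decode an instruction that writes register `dest`.

        Returns False (decoder must stall) if `dest` already has a
        pending update; otherwise sets the bit and returns True.
        """
        if dest in self._pending:
            return False
        self._pending.add(dest)
        return True

    def complete(self, dest: str) -> None:
        """Clear the bit once the pending update has been written."""
        self._pending.discard(dest)

sb = Scoreboard()
first = sb.try_decode("gr98")    # bit set, decode proceeds
second = sb.try_decode("gr98")   # same destination: decoder stalls
other = sb.try_decode("gr99")    # different register: no conflict
sb.complete("gr98")              # first update written back
retry = sb.try_decode("gr98")    # stall is released
```

The second write to `gr98` is refused until the first completes. With register renaming, the second write would instead be given a new reorder-buffer entry and decoding could continue.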