
MOTOROLA
Chapter 7. Instruction Timing
Execution Unit Timings
Performance features such as branch folding and static branch prediction help minimize
penalties associated with flow control operations. The timing for branch instruction
execution is determined by many factors including the following:
• Whether the branch requires prediction
• Whether the branch is predicted as taken or not taken
• Whether the branch is taken
• Whether the target instruction stream is in the on-chip cache
• Whether the prediction is correct
7.4.1.1 Branch Folding
When a branch instruction is encountered by the fetcher, the BPU immediately tries to pull
that instruction out of the instruction stream and resolve it. When the BPU removes the
branch instruction from the stream, the subsequent instruction is shifted down to take the
place of the removed branch instruction. This technique is called branch folding. Often, it
eliminates the penalties of flow control instructions because instruction execution proceeds
as though the branch were never there.
If the folded branch instruction changes program flow (the branch is said to be taken), the
BPU immediately requests the instructions at the new target from the on-chip cache. In
most cases, the new instructions arrive in the IQ before any bubbles are introduced into the
execution units. If the folded branch does not change program flow (the branch is not
taken), the branch instruction is already removed and execution continues as if there were
never a branch in the original sequence.
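
The folding behavior described above can be illustrated with a toy model. This is only a sketch, not a description of the actual BPU logic: the function name fold_branches, the tuple encoding of instructions, and the program dictionary are all hypothetical and introduced purely for illustration. It assumes every branch can be resolved as soon as it is fetched (no CR dependency) and that the target instructions are always available from the on-chip cache.

```python
# Toy model of branch folding (illustrative only, not the real BPU logic).
# Hypothetical encoding: ("op", name) is a non-branch instruction,
# ("b", taken, target) is a branch that can be resolved at fetch time.

def fold_branches(program, start, max_steps=64):
    """Return the instruction names that reach the execution units.

    program:   dict mapping address -> instruction tuple
    start:     address of the first instruction to fetch
    max_steps: safety bound so a branch-only loop cannot run forever
    """
    dispatched = []
    pc = start
    for _ in range(max_steps):
        if pc not in program:
            break
        insn = program[pc]
        if insn[0] == "b":
            _, taken, target = insn
            # The branch is removed from the stream and never dispatched.
            # Taken: fetching continues at the target.
            # Not taken: the next sequential instruction takes its place.
            pc = target if taken else pc + 4
        else:
            dispatched.append(insn[1])
            pc += 4
    return dispatched


program = {
    0x00: ("op", "add"),
    0x04: ("b", False, 0x20),  # not taken: folded, execution falls through
    0x08: ("op", "sub"),
    0x0C: ("b", True, 0x20),   # taken: folded, fetch redirected to 0x20
    0x10: ("op", "mullw"),     # never fetched in this run
    0x20: ("op", "lwz"),
}

stream = fold_branches(program, 0x00)
print(stream)  # ['add', 'sub', 'lwz']
```

Neither branch appears in the resulting stream, which is the point of folding: the execution units see only add, sub, and lwz, as though the branches were never present.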
When a conditional branch cannot be resolved due to a CR data dependency, the branch is
executed by means of static branch prediction and instruction fetching proceeds down the
predicted path. If the prediction is incorrect when the branch is resolved, the IQ and all
subsequently executed instructions are purged, instructions executed before the predicted
branch are allowed to complete, and instruction fetching resumes down the correct path.
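
The recovery rule in the preceding paragraph can be sketched as a small function. This is a simplified model under stated assumptions, not the processor's actual mechanism: the name resolve_predicted_branch, its parameters, and the returned dictionary keys are hypothetical. The predicted direction is passed in as an input, so the sketch says nothing about how the static prediction itself is chosen.

```python
# Toy model of misprediction recovery (illustrative only).
# in_flight lists instruction names in program order; the entry at
# branch_index is the predicted branch. Everything after it was fetched
# down the predicted path.

def resolve_predicted_branch(in_flight, branch_index, predicted_taken,
                             actually_taken, target, fall_through):
    older = in_flight[:branch_index]        # before the branch
    younger = in_flight[branch_index + 1:]  # fetched after the prediction
    if predicted_taken == actually_taken:
        # Correct prediction: nothing is purged, fetching continues as is.
        return {"kept": older + younger, "purged": [], "refetch_from": None}
    # Misprediction: instructions before the branch are allowed to complete,
    # everything after it is purged, and fetching resumes on the correct path.
    correct_path = target if actually_taken else fall_through
    return {"kept": older, "purged": younger, "refetch_from": correct_path}


in_flight = ["cmpw", "add", "bc", "lwz", "stw"]
result = resolve_predicted_branch(in_flight, 2, predicted_taken=True,
                                  actually_taken=False,
                                  target=0x100, fall_through=0x0C)
print(result)
# cmpw and add are kept, lwz and stw are purged, fetch resumes at 0x0C
```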
There are several situations where instruction sequences create dependencies that prevent
a branch instruction from being resolved immediately, thereby causing execution of the
subsequent instruction stream based on the predicted outcome of the branch instruction.
The instruction sequences and the resulting actions of the branch instruction are described as
follows:
• An mtspr (LR) followed by a bclr: Fetching is stopped and the branch waits for the
  mtspr to execute.
• An mtspr (CTR) followed by a bcctr: Fetching is stopped and the branch waits for
  the mtspr to execute.
• An mtspr (CTR) followed by a bc (CTR): Fetching is stopped and the branch waits
  for the mtspr to execute. (Note: Branch conditions can be a function of the CTR and