
G2 PowerPC Core Reference Manual
MOTOROLA
Timing Considerations
1. In cycle 1, instructions 0 and 1 are dispatched to the IU and FPU, respectively.
Notice that for instructions to be dispatched, they must be assigned positions in the
CQ. In this case, because the CQ is empty, instructions 0 and 1 take the two lowest
CQ entries (CQ0 and CQ1). Instructions 2 and 3 are fetched from the instruction
cache.
2. At least two IQ positions were available in cycle 1, so instructions 4 and 5 are
fetched in cycle 2. Instruction 4 is an unconditional branch instruction
that resolves immediately as taken. Because the branch is taken and does not
update CTR or LR, it can be folded from the IQ. Instruction 0 completes, writes
back its results, and vacates the CQ by the end of the clock cycle. Instruction 1
enters the second FPU execute stage, instruction 2 enters the single-stage IU, and
instruction 3 is dispatched into the first FPU stage.
3. In cycle 3, target instructions 6 and 7 are fetched, replacing the folded br
instruction (instruction 4) and instruction 5. Instruction 1 enters the last FPU
execute stage. Instruction 2 has executed but must remain in the CQ until
instruction 1 completes.
Note that it can make its results available to subsequent instructions, but cannot be
removed from the CQ. Instruction 3 passes into the last FPU execute stage. Note
that all three FPU stages are full. To allow for the potential need for
denormalization, the dispatch logic prevents instruction 7 (fadd) from being
dispatched in the next clock cycle.
4. In cycle 4, target instructions (8 and 9) are fetched. Instruction 1 completes in cycle
4, allowing instruction 2, which had finished executing in the previous clock cycle,
to be removed from the CQ. Instruction 6 replaces instruction 3 in the first stage of
the FPU. Also, as will be shown in cycle 5, a single-cycle stall occurs when the
FPU pipeline is full.
5. In cycle 5, instruction 3 completes and instruction 6 continues through the FPU
pipeline. Although the first stage of the FPU pipeline is free, instruction 7
cannot be dispatched because one of the previous floating-point instructions may
require denormalization. Because instruction 7 cannot be dispatched, neither can
instruction 8. This dispatch stall causes the instruction queue to become full
when instructions 10 and 11 are fetched.
6. In cycle 6, instruction 12 is fetched. Instruction 7 is dispatched to the first FPU
stage, so instruction 8 can also be dispatched to the IU. Instructions 9 and 10 move
to IQ0 and IQ1, but because instructions 9, 10, and 11 are integer instructions, only
one instruction is dispatched in each of the next two clock cycles. Note that moving
instruction 12 (fadd) up further in the program flow would improve dispatch
throughput.
7. In cycle 7, instruction 6 completes, instruction 7 is in the second FPU execute
stage, and although instruction 8 has executed, it must wait for instruction 7 to
complete. Instruction 9 dispatches to the IU. Instructions 10 and 11 move down in
the IQ. Fetching resumes with instructions 13 and 14.
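
The in-order completion behavior described in steps 2 through 5 (instruction 2 finishes executing in cycle 3 but cannot leave the CQ until instruction 1 completes in cycle 4) can be illustrated with a small model. The following Python sketch is only an approximation for reasoning about the walkthrough, not a description of the completion logic itself. The function name in_order_completion, its ready_cycles input, and the max_per_cycle cap of two completions per cycle are assumptions introduced for this example. The input values are read off the cycle descriptions above.

```python
def in_order_completion(ready_cycles, max_per_cycle=2):
    """Return the cycle in which each instruction leaves the completion queue.

    ready_cycles: list in program order; entry i is the earliest cycle in
        which instruction i could complete if no older instruction blocked it.
    max_per_cycle: assumed cap on completions per cycle (hypothetical
        parameter for this sketch).
    """
    completed = []
    cycle = 0   # cycle of the most recent completion
    used = 0    # completions already assigned to that cycle
    for ready in ready_cycles:
        # An instruction can never complete before its predecessor.
        candidate = max(ready, cycle)
        if candidate == cycle and used >= max_per_cycle:
            candidate += 1          # completion slots for this cycle are taken
        if candidate != cycle:
            cycle, used = candidate, 0
        used += 1
        completed.append(cycle)
    return completed


# Instructions 0-3 from the walkthrough: instruction 2 has executed by
# cycle 3 but is held in the CQ behind instruction 1 until cycle 4.
print(in_order_completion([2, 4, 3, 5]))
```

Under these assumptions, the model returns cycle 4 for instruction 2 even though it was ready in cycle 3, which matches the description in step 4. The model ignores CQ capacity, dispatch stalls, and FPU denormalization, all of which affect the real timing.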
Freescale Semiconductor, Inc.