
MOTOROLA
Chapter 7. Instruction Timing
Timing Considerations
instructions are fetched from the on-chip instruction cache. If the instruction request hits in
the on-chip instruction cache, the cache can usually present the first two instructions of the new
instruction stream in the next clock cycle, giving enough time for the next pair of
instructions to be fetched from the cache with no idle cycles. Instructions not in the
instruction cache are fetched from system memory.
Branch instructions that do not update the LR or CTR are removed from the instruction
stream either by branch folding or removal of fall-through branch instructions, as described
in Section 7.4.1.1, “Branch Folding.” Branch instructions that update the LR or CTR are
treated as if they require dispatch (even though they are not dispatched to an execution unit
in the process). They are assigned a position in the CQ to ensure that the CTR and LR are
updated sequentially.
All other instructions are dispatched from IQ0 and IQ1. The dispatch rate depends on the
availability of resources such as the execution units, rename registers, and CQ entries, and
on the serializing behavior of some instructions. Instructions are dispatched in program
order; an instruction in IQ1 can be dispatched at the same time as one in IQ0, but cannot be
dispatched ahead of one in IQ0.
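
The in-order, two-wide dispatch rule above can be illustrated with a minimal sketch. This is not a model of the actual dispatch logic: the function name, the dictionary fields, and the unit labels are hypothetical, and rename-register availability and serializing instructions are deliberately left out.

```python
def dispatch_cycle(iq, free_units, free_cq_entries):
    """Return the names of the instructions dispatched in one cycle.

    iq              -- list of queued instructions; iq[0] stands for IQ0, iq[1] for IQ1
    free_units      -- hypothetical map of execution unit name -> free slots this cycle
    free_cq_entries -- number of free completion queue (CQ) entries
    """
    units = dict(free_units)          # work on a copy so the caller's map is untouched
    dispatched = []
    for insn in iq[:2]:               # only IQ0 and IQ1 are dispatch candidates
        unit = insn["unit"]
        if free_cq_entries == 0 or units.get(unit, 0) == 0:
            break                     # in program order: a stalled IQ0 also holds back IQ1
        units[unit] -= 1
        free_cq_entries -= 1
        dispatched.append(insn["name"])
    return dispatched


# Hypothetical instruction stream: an integer add in IQ0, a load in IQ1.
queue = [{"name": "add", "unit": "IU"}, {"name": "lwz", "unit": "LSU"}]
both = dispatch_cycle(queue, {"IU": 1, "LSU": 1}, free_cq_entries=5)
blocked = dispatch_cycle(queue, {"IU": 0, "LSU": 1}, free_cq_entries=5)
```

With both units free, both instructions dispatch together. With the integer unit busy, nothing dispatches, even though the load in IQ1 has a free unit, because IQ1 cannot pass IQ0.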
Instruction state and all information required for completion is kept in the five-entry, FIFO
completion queue. A completion queue entry is allocated for each instruction when it is
dispatched to an execution unit; if no entry is available, the dispatch unit stalls. A maximum
of two instructions per cycle may be completed and retired from the completion queue, and
the flow of instructions can stall when a longer-latency instruction reaches the last position
in the completion queue. Store instructions and instructions executed by the FPU and SRU
(with the exception of integer add and compare instructions) can only be retired from the
last position in the completion queue. Subsequent instructions cannot be completed and
retired until that longer-latency instruction completes and retires. Examples of this are
shown in Section 7.3.2.2, “Cache Hit,” and Section 7.3.2.3, “Cache Miss.”
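
The retirement rules in this paragraph can be sketched as a small queue model. The queue depth and the two-per-cycle limit come from the text. The field names (`done`, `last_only`) and the function names are hypothetical, and the handling of "last position only" instructions is one reading of the rule, not a description of the hardware.

```python
from collections import deque

CQ_DEPTH = 5      # five-entry completion queue (from the text)
MAX_RETIRE = 2    # at most two instructions complete and retire per cycle (from the text)


def can_allocate(cq):
    """Dispatch must stall when no completion queue entry is free."""
    return len(cq) < CQ_DEPTH


def retire_cycle(cq):
    """Retire up to MAX_RETIRE instructions from the oldest end of cq.

    cq[0] models the last (oldest) position. Each entry is a dict with
    hypothetical fields:
      'done'      -- the instruction has finished executing
      'last_only' -- it may retire only from the last position
                     (stores, FPU and most SRU instructions in the text)
    """
    retired = []
    while cq and len(retired) < MAX_RETIRE:
        head = cq[0]
        if not head["done"]:
            break                     # an unfinished oldest entry blocks all younger ones
        if head["last_only"] and retired:
            break                     # it was not in the last position when the cycle began
        retired.append(cq.popleft()["name"])
    return retired


# Hypothetical queue contents, oldest first.
cq = deque([
    {"name": "add", "done": True, "last_only": False},
    {"name": "stw", "done": True, "last_only": True},
    {"name": "or",  "done": True, "last_only": False},
])
first_cycle = retire_cycle(cq)
second_cycle = retire_cycle(cq)
```

In this example the store cannot retire alongside the older add, because it sits in the second-to-last position. It retires one cycle later, once it has moved to the last position, and the younger `or` retires with it.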
The rate of instruction completion is also affected by the ability to write instruction results
from the rename registers to the architected registers. The G2 core can perform two
write-back operations from the rename registers to the GPRs each clock cycle, but can
perform only one write-back per cycle to the CR, FPR, LR, and CTR.
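
As an illustration of how these write-back limits can throttle completion, the following sketch counts how many of the oldest pending results can be written back in one cycle. It reads the limit as one write-back per cycle for each of the CR, FPR, LR, and CTR, which is an assumption about the sentence above. The table and function names are hypothetical.

```python
# Write-backs allowed per cycle for each destination register class.
# GPR = 2 is from the text; one each for the others is an assumed reading.
WRITEBACK_LIMIT = {"GPR": 2, "CR": 1, "FPR": 1, "LR": 1, "CTR": 1}


def writebacks_this_cycle(pending):
    """Return how many leading entries of `pending` can write back this cycle.

    pending -- destination register classes of finished instructions,
               oldest first (for example ["GPR", "FPR"])
    """
    used = {}
    count = 0
    for dest in pending[:2]:          # no more than two instructions complete per cycle
        if used.get(dest, 0) >= WRITEBACK_LIMIT[dest]:
            break                     # limit reached; younger results wait for the next cycle
        used[dest] = used.get(dest, 0) + 1
        count += 1
    return count


two_gprs = writebacks_this_cycle(["GPR", "GPR"])
two_fprs = writebacks_this_cycle(["FPR", "FPR"])
```

Under these assumptions, two integer results can be written back together, while two floating-point results take two cycles.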
7.3.2 Instruction Fetch Timing
Instruction fetch latency depends on whether the fetch hits in the on-chip instruction cache. If no hit
occurs, a memory transaction is required, in which case fetch latency is affected by bus
traffic, bus clock speed, and memory translation. These conditions are discussed in the
following sections.
Freescale Semiconductor, Inc.
For More Information On This Product,
Go to: www.freescale.com