
G2 PowerPC Core Reference Manual
MOTOROLA
Instruction Timing Overview
The decode/dispatch stage consists of the time it takes to fully decode the instruction
and dispatch it from the instruction queue to the appropriate execution unit.
Instruction dispatch requires the following:
— Instructions can be dispatched only from the two lowest instruction queue
entries, IQ0 and IQ1.
— A maximum of two instructions can be dispatched per clock cycle.
— Only one instruction can be dispatched to each execution unit per clock cycle.
— There must be a vacancy in the specified execution unit.
— A rename register must be available for each destination operand specified by the
instruction.
— For an instruction to dispatch, the appropriate execution unit must be available
and there must be an open position in the CQ. If no CQ entry is available, the
instruction remains in the IQ.
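
The dispatch conditions above can be sketched as a small per-cycle check. This is an illustrative model only, not the G2 core's actual dispatch logic: the names `Instr`, `dispatch_cycle`, `unit_vacant`, `free_renames`, and `free_cq_slots` are hypothetical, the rename registers are treated as a single pool, and the sketch assumes in-order dispatch (IQ1 does not dispatch when IQ0 stalls), which the list above does not state explicitly.

```python
from dataclasses import dataclass

DISPATCH_WIDTH = 2  # at most two instructions dispatch per clock cycle


@dataclass
class Instr:
    name: str
    unit: str           # target execution unit, e.g. "IU", "LSU", "FPU"
    num_dests: int = 1  # destination operands, each needing a rename register


def dispatch_cycle(iq, unit_vacant, free_renames, free_cq_slots):
    """Dispatch from IQ0/IQ1 for one cycle; remove and return what dispatched.

    iq            -- list of Instr, index 0 is IQ0 (the oldest entry)
    unit_vacant   -- dict mapping unit name to True if it can accept work
    free_renames  -- free rename registers (single pool: an assumption)
    free_cq_slots -- open completion queue (CQ) entries
    """
    dispatched = []
    units_used = set()  # enforces one instruction per unit per cycle
    for instr in iq[:DISPATCH_WIDTH]:  # only IQ0 and IQ1 are candidates
        can_dispatch = (
            unit_vacant.get(instr.unit, False)
            and instr.unit not in units_used
            and free_renames >= instr.num_dests
            and free_cq_slots >= 1
        )
        if not can_dispatch:
            break  # assumption: in-order dispatch, IQ1 waits behind IQ0
        dispatched.append(instr)
        units_used.add(instr.unit)
        free_renames -= instr.num_dests
        free_cq_slots -= 1
    del iq[:len(dispatched)]  # instructions that did not dispatch stay in the IQ
    return dispatched
```

For example, with an integer add in IQ0 and a second integer instruction in IQ1, only the first dispatches in this model, because both target the IU and each unit accepts one instruction per cycle.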
The execute stage consists of the time between dispatch to the execution unit (or
reservation station) and the point at which the instruction vacates the execution unit.
Most integer instructions have a one-cycle latency; the result of such an instruction
can be used in the clock cycle after it enters the execution unit. However,
integer multiply and divide instructions take multiple clock cycles to complete. The
IU can process all integer instructions.
The LSU and FPU are pipelined, as shown in Figure 7-2.
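
The effect of pipelining on a run of independent instructions can be illustrated with a little arithmetic. The sketch below is a generic model under stated assumptions, not G2 timing data: the function name `last_result_cycle` and the latency values passed to it are hypothetical, and a result is counted as usable `latency` cycles after the instruction enters the unit, matching the one-cycle case described above.

```python
def last_result_cycle(n_instrs, latency, pipelined, first_entry_cycle=0):
    """Cycle in which the result of the last of n independent instructions is usable."""
    if n_instrs <= 0 or latency <= 0:
        raise ValueError("n_instrs and latency must be positive")
    if pipelined:
        # A new instruction can enter every cycle; each still takes `latency` cycles.
        return first_entry_cycle + (n_instrs - 1) + latency
    # Not pipelined: the next instruction waits until the unit is vacated.
    return first_entry_cycle + n_instrs * latency


# Hypothetical three-cycle operation, three back-to-back instructions:
print(last_result_cycle(3, 3, pipelined=True))   # 5
print(last_result_cycle(3, 3, pipelined=False))  # 9
```

With a latency of one cycle the two cases coincide, which is why pipelining matters only for the multiple-cycle units.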
The complete (complete/write-back) pipeline stage maintains the correct
architectural machine state and commits it to the architectural registers at the proper
time. If the completion logic detects an instruction containing an exception status,
all following instructions are canceled, their execution results in rename registers are
discarded, and the correct instruction stream is fetched.
The complete stage ends when the instruction is retired. Two instructions can be
retired per cycle. Instructions are retired only from the two lowest CQ entries, CQ0
and CQ1.
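
The completion rules above can be sketched in the same style. This is a simplified model, not the core's completion logic: `CQEntry` and `complete_cycle` are hypothetical names, and the sketch simply drops the instruction that carries the exception status along with everything behind it, whereas on real hardware whether that instruction itself completes depends on the exception type.

```python
from dataclasses import dataclass

RETIRE_WIDTH = 2  # at most two instructions retire per clock cycle


@dataclass
class CQEntry:
    name: str
    finished: bool = False   # execution done, result held in a rename register
    exception: bool = False  # exception status recorded for this instruction


def complete_cycle(cq):
    """Retire in program order from the head of the CQ for one cycle.

    Returns (retired_names, faulting_name). faulting_name is None unless an
    exception was detected, in which case the CQ is flushed.
    """
    retired = []
    while cq and len(retired) < RETIRE_WIDTH:  # so only CQ0 and CQ1 are examined
        head = cq[0]
        if not head.finished:
            break  # in-order retirement: younger entries wait behind the head
        if head.exception:
            # Cancel all following instructions and discard their rename
            # results; the correct instruction stream is then refetched
            # (refetch is not modeled here).
            cq.clear()
            return retired, head.name
        retired.append(cq.pop(0).name)
    return retired, None
```

In this model, a finished instruction sitting behind an unfinished one in the CQ does not retire, which is how the architectural state stays in program order.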
The notation conventions used in the instruction timing examples are as follows:
Fetch—The fetch stage includes the time between when an instruction is
requested and when it is brought into the instruction queue. This latency can
vary greatly, depending on whether the instruction is in the on-chip cache or
system memory (in which case latency can be affected by bus speed, traffic on
the system bus, and address translation). Therefore, in
the examples in this chapter, the fetch stage is usually idealized; that is, an
instruction is usually shown to be in the fetch stage when it is a valid
instruction in the instruction queue. The instruction queue has six entries,
IQ0–IQ5.
Freescale Semiconductor, Inc.