
MOTOROLA
Chapter 7. Instruction Timing
An instruction can spend multiple cycles in one stage. An integer multiply, for
example, takes multiple cycles in the execute stage. When this occurs, subsequent
instructions may stall.
In some cases, an instruction may also occupy more than one stage simultaneously,
especially in the sense that a stage can be seen as a physical resource—for example,
when instructions are dispatched they are assigned a place in the CQ at the same time
they are passed to the execute stage. They can be said to occupy both the complete
and execute stages in the same clock cycle.
Stall—An occurrence when an instruction cannot proceed to the next stage.
Store Queue—Holds store operations that have not been committed to memory,
resulting from completed or retired instructions.
Superscalar—A superscalar processor is one that can dispatch multiple instructions
concurrently from a conventional linear instruction stream. In a superscalar
implementation, multiple instructions can be in the same stage at the same time.
Throughput—A measure of the number of instructions that are processed per cycle.
For example, a series of double-precision floating-point multiply instructions has a
throughput of one instruction per clock cycle.
Write-back—Write-back (in the context of instruction handling) occurs when a
result is written from the rename registers into the architectural registers (typically
the GPRs and FPRs or the store queue).
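The stall behavior defined above can be illustrated with a minimal sketch. It assumes a deliberately simplified model that is not the actual G2 pipeline: a single in-order execute stage, with one instruction becoming ready to execute per clock cycle. The function name and the 4-cycle multiply latency are hypothetical values chosen only for illustration.

```python
def execute_finish_cycles(execute_cycles):
    """Return (last cycle in execute, stall cycles) for each instruction.

    Simplified model (an assumption, not the G2 core): one in-order
    execute stage, and instruction i is ready to execute at cycle i.
    """
    results = []
    stage_free_at = 0  # first cycle in which the execute stage is free
    for i, cycles in enumerate(execute_cycles):
        earliest = i                          # cycle the instruction is ready
        start = max(earliest, stage_free_at)  # wait if the stage is occupied
        stall = start - earliest              # cycles spent stalled
        stage_free_at = start + cycles
        results.append((stage_free_at - 1, stall))
    return results


# Hypothetical stream: add, 4-cycle multiply, add, add
timings = execute_finish_cycles([1, 4, 1, 1])
for index, (finish, stall) in enumerate(timings):
    print(f"instruction {index}: leaves execute in cycle {finish}, stalled {stall} cycle(s)")
```

In this model the two instructions that follow the multi-cycle instruction each stall for three cycles, because the execute stage is still occupied when they are ready.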
7.2 Instruction Timing Overview
The G2 core design minimizes average instruction execution latency, the number of clock
cycles it takes to fetch, decode, dispatch, and execute instructions and make the results
available for a subsequent instruction. Some instructions, such as loads and stores, access
memory and require additional clock cycles between the execute phase and the write-back
phase. These latencies vary depending on whether the access is to cacheable or
noncacheable memory, whether it hits in the L1 cache, whether the cache access generates
a write-back to memory, whether the access causes a snoop hit from another device that
generates additional activity, and other conditions that affect memory accesses.
The G2 core implements many features to improve throughput, such as pipelining,
superscalar instruction dispatch, branch folding, removal of fall-through branches,
two-level speculative branch handling, and multiple execution units that operate
independently and in parallel.
As an instruction passes from stage to stage in a pipelined unit, such as the
load/store or floating-point unit, the following instruction can follow through the
stages as the former instruction vacates them, allowing several instructions to be
processed simultaneously.
While it may take several cycles for an instruction to pass through all the stages, when the
pipeline has been filled, one instruction can complete its work on every clock cycle.
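The difference between latency and throughput in a filled pipeline can be sketched with a short calculation. This is an idealized model that assumes no stalls and one instruction entering the pipeline per cycle. The function names and the three-stage depth used in the example are hypothetical, not figures for a specific G2 unit.

```python
def cycles_to_complete(num_instructions, num_stages):
    """Total cycles for an ideal pipeline with no stalls.

    The first instruction needs num_stages cycles to pass through every
    stage; each later instruction completes one cycle after the previous one.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def throughput(num_instructions, num_stages):
    """Average instructions completed per cycle over the whole run."""
    return num_instructions / cycles_to_complete(num_instructions, num_stages)


# Hypothetical three-stage pipelined unit
print(cycles_to_complete(1, 3))    # latency of a single instruction
print(cycles_to_complete(100, 3))  # 100 back-to-back instructions
print(round(throughput(100, 3), 3))
```

As the instruction count grows, the fill cost of the first instruction is amortized and the average throughput approaches one instruction per clock cycle, even though each individual instruction still takes several cycles.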
Freescale Semiconductor, Inc.
For More Information On This Product,
Go to: www.freescale.com