SME1430LGA-360
SME1430LGA-440
SME1430LGA-480
Highly Integrated 64-Bit RISC; L2-Cache, DRAM, PCI Interfaces
UltraSPARC-IIi CPU
May 1999
Sun Microsystems, Inc
Prefetch and Dispatch Unit (PDU)
The PDU fetches instructions before they are needed in the pipeline, so that the execution units do not starve
for instructions. Instructions can be prefetched from all levels of the memory hierarchy, including the
instruction cache, the external cache, and the main memory. To prefetch across conditional branches, a dynamic
branch prediction scheme is implemented in hardware, based on a two-bit history of the branch. A “next
field” associated with every four instructions in the I-Cache points to the next I-Cache line to be fetched. This
makes it possible to follow taken branches and provides the same instruction bandwidth achieved during
sequential code. Up to 12 prefetched instructions are stored in the instruction buffer and sent to the rest of the
pipeline.
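The two-bit history mentioned above is commonly realized as a saturating counter per branch. The following is a minimal sketch under that assumption; the datasheet does not describe the exact state machine, and the class name, the initial state, and the `run` helper are hypothetical.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken.

    This encoding is an assumption for illustration, not the documented hardware.
    """

    def __init__(self, state=1):
        self.state = state  # start at "weakly not taken" (arbitrary choice)

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def run(outcomes, predictor=None):
    """Count correct predictions over a sequence of branch outcomes."""
    predictor = predictor or TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct
```

With two bits of history, a single not-taken loop exit does not flip the prediction, so a loop branch is mispredicted once per loop exit rather than twice.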
Translation Lookaside Buffers (iTLB and dTLB)
The Translation Lookaside Buffers provide mapping between 44-bit virtual addresses and 34-bit physical
addresses. A 64-entry iTLB is used for instructions and a 64-entry dTLB for data, and both are fully
associative. The UltraSPARC-IIi CPU provides hardware support for a software-based TLB miss strategy. For
low-latency miss handling, a separate set of global registers is available whenever such a trap is encountered.
Page sizes of 8 KB, 64 KB, 512 KB, and 4 MB are supported.
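As a rough illustration of the translation described above, the sketch below splits a virtual address into page number and offset for each supported page size and models a fully associative lookup. The entry layout, the `TinyTLB` class, and the FIFO replacement are assumptions for illustration only; on the real part, a miss traps to software, which chooses what to load.

```python
PAGE_SIZES = {"8 KB": 8 << 10, "64 KB": 64 << 10, "512 KB": 512 << 10, "4 MB": 4 << 20}


def offset_bits(page_size):
    """Number of address bits that pass through untranslated for a page size."""
    return page_size.bit_length() - 1


def split_va(va, page_size):
    """Return (virtual page number, page offset) for the given page size."""
    return va >> offset_bits(page_size), va & (page_size - 1)


class TinyTLB:
    """Fully associative TLB model: every entry is compared on each lookup."""

    def __init__(self, entries=64):
        self.entries = entries
        self.table = []  # list of (vpn, page_size, ppn)

    def insert(self, va, pa, page_size):
        vpn, _ = split_va(va, page_size)
        ppn = pa >> offset_bits(page_size)
        if len(self.table) >= self.entries:
            self.table.pop(0)  # FIFO replacement is an assumption of this sketch
        self.table.append((vpn, page_size, ppn))

    def translate(self, va):
        for vpn, size, ppn in self.table:
            candidate_vpn, offset = split_va(va, size)
            if candidate_vpn == vpn:
                return (ppn << offset_bits(size)) | offset
        return None  # miss: a software handler would refill the TLB
```

The four page sizes correspond to 13, 16, 19, and 22 offset bits respectively.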
Integer Execution Unit (IEU)
Two Arithmetic Logic Units (ALUs) form the main computational part of the IEU. An early-finish-detect
multi-cycle integer multiplier and a multi-cycle integer divider are also part of the IEU. Eight register
windows and four sets of global registers are provided (normal, alternate, MMU and interrupt globals). The trap
registers (the UltraSPARC-IIi CPU supports five levels of traps) are part of the IEU.
Floating-Point Unit (FPU)
The separation of the execution units in the FPU allows the UltraSPARC-IIi CPU to issue and execute two
floating-point instructions per cycle. Source data and results data are stored in the 32-entry register file, where
each entry can contain a 32- or 64-bit value. Most instructions are fully pipelined (throughput of one per
cycle), have a latency of three, and are not affected by the precision of the operands (same latency for single or
double precision).
The divide and square-root instructions are not pipelined. These take 12 cycles to execute in single precision
(22 cycles in double precision), but they do not stall the processor. Instructions following the divide/square
root can be issued, executed, and retired to the register file before the divide/square root finishes. A precise
exception model is maintained by synchronizing the floating-point pipe with the integer pipe and by predicting
traps for long-latency operations.
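The latency and throughput figures above can be turned into a back-of-the-envelope cycle estimate. This is a simplified sketch: it assumes one instruction issued per cycle to a single pipe, ignores dual issue and all other stalls, and assumes back-to-back divides or square roots serialize because the unit is not pipelined. The function names are hypothetical.

```python
PIPELINED_LATENCY = 3                            # cycles, single or double precision
DIV_SQRT_CYCLES = {"single": 12, "double": 22}   # not pipelined


def pipelined_cycles(n_ops, dependent=False):
    """Cycles to complete n pipelined FP operations.

    Independent operations overlap (one result per cycle after the first);
    a dependent chain pays the full latency for every operation.
    """
    if n_ops == 0:
        return 0
    if dependent:
        return n_ops * PIPELINED_LATENCY
    return PIPELINED_LATENCY + (n_ops - 1)


def divide_cycles(n_ops, precision="double"):
    """Cycles for n back-to-back divides or square roots (assumed to serialize)."""
    return n_ops * DIV_SQRT_CYCLES[precision]
```

Under this model, ten independent adds finish in 12 cycles, while a chain of ten dependent adds takes 30, which is why independent work scheduled behind a divide is valuable.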
Graphics Unit (GRU)
The UltraSPARC-IIi CPU introduces a comprehensive set of graphics instructions (VIS) that provide
industry-leading support for two-dimensional and three-dimensional image and video processing, image
compression, audio processing, and similar functions. Sixteen-bit and 32-bit partitioned add, boolean, and
compare are provided. Eight-bit and 16-bit partitioned multiplies are supported. Single-cycle pixel distance,
data alignment, packing and merge operations are all supported in the GRU.
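To make "partitioned add" concrete, the sketch below adds two 64-bit words lane by lane, with no carry crossing a lane boundary. It models the idea in software only; the function name is hypothetical, and the wrap-around (non-saturating) behavior on overflow is an assumption of this sketch rather than something stated here.

```python
def partitioned_add(a, b, lane_bits=16, total_bits=64):
    """Add two packed words lane by lane; carries never cross lane boundaries."""
    mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (lane_sum & mask) << shift  # discard the carry out of the lane
    return result
```

For example, adding 1 to a 16-bit lane holding 0xFFFF yields 0x0000 in that lane and leaves the neighboring lane untouched, whereas an ordinary 64-bit add would carry into it.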