
Rev. D | Page 4 of 48 | May 2012
DUAL COMPUTE BLOCKS
The ADSP-TS203S processor has two compute blocks that can execute computations either independently or together as a single-instruction, multiple-data (SIMD) engine. The processor can issue up to two compute instructions per compute block each cycle, instructing the ALU, multiplier, or shifter to perform independent, simultaneous operations. Each compute block can execute eight 8-bit, four 16-bit, two 32-bit, or one 64-bit SIMD computations in parallel with the operation in the other block. These computation units support IEEE 32-bit single-precision floating-point, extended-precision 40-bit floating-point, and 8-, 16-, 32-, and 64-bit fixed-point processing.
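To picture how one 64-bit operand holds several narrower SIMD lanes, the following C sketch emulates a four-lane 16-bit add in software (the SWAR technique). It assumes lane 0 occupies bits 15:0 of a `uint64_t`; the function name `add4x16` is hypothetical, and the code models only the lane arithmetic, not the processor's instructions.

```c
#include <stdint.h>

/* Add four packed 16-bit lanes of a and b. Each lane wraps modulo 2^16
   and no carry crosses into the neighbouring lane. */
uint64_t add4x16(uint64_t a, uint64_t b)
{
    /* Mask selecting the most significant bit of every 16-bit lane. */
    const uint64_t high = 0x8000800080008000ULL;

    /* Add the low 15 bits of each lane. The largest possible sum,
       0x7FFF + 0x7FFF, fits in 16 bits, so any carry stops at bit 15
       of its own lane. */
    uint64_t low_sum = (a & ~high) + (b & ~high);

    /* Bit 15 of each lane is the XOR of both inputs' bit 15 and the
       carry that low_sum already placed there. */
    return low_sum ^ ((a ^ b) & high);
}
```

For example, adding `0xFFFF` and `0x0001` in lane 0 yields `0x0000` in that lane while lane 1 stays unchanged; SIMD hardware typically gets the same effect by breaking the carry chain at lane boundaries.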
The compute blocks are referred to as X and Y in assembly syntax, and each block contains three computational units (an ALU, a multiplier, and a 64-bit shifter) and a 32-word register file.
• Register File: each compute block has a multiported 32-word, fully orthogonal register file used for transferring data between the computation units and data buses and for storing intermediate results. Instructions can access the registers in the register file individually (word-aligned), in sets of two (dual-aligned), or in sets of four (quad-aligned).
• ALU: the ALU performs a standard set of arithmetic operations in both fixed- and floating-point formats. It also performs logic and permute operations.
• Multiplier: the multiplier performs both fixed- and floating-point multiplication and fixed-point multiply and accumulate.
• Shifter: the 64-bit shifter performs logical and arithmetic shifts, bit and bit stream manipulation, and field deposit and extraction operations.
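The word, dual, and quad access modes of the register file imply an alignment rule on register indices. A small C sketch, assuming that a dual group must start at an even register and a quad group at a multiple of four (as the terms dual-aligned and quad-aligned suggest); `reg_group_ok` is a hypothetical helper, not part of any toolchain:

```c
#include <stdbool.h>

#define REGS_PER_FILE 32  /* 32-word register file per compute block */

/* Return true if `width` consecutive registers starting at index `first`
   form a legal access: width 1 (word), 2 (dual), or 4 (quad), with the
   starting index a multiple of the width (assumed alignment rule). */
bool reg_group_ok(int first, int width)
{
    if (width != 1 && width != 2 && width != 4)
        return false;                       /* unsupported group size */
    if (first < 0 || first + width > REGS_PER_FILE)
        return false;                       /* group must fit in the file */
    return first % width == 0;              /* start aligned to group size */
}
```

Under this rule, registers 28 through 31 form a valid quad group, while a quad group starting at register 2 does not.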
Using these features, the compute blocks can:
• Provide 8 MACs per cycle peak and 7.1 MACs per cycle sustained 16-bit performance, and 2 MACs per cycle peak and 1.8 MACs per cycle sustained 32-bit performance (based on FIR)
• Execute six single-precision floating-point operations or 24 fixed-point (16-bit) operations per cycle, providing 3 GFLOPS or 12.0 G regular operations per second at 500 MHz
• Perform two complex 16-bit MACs per cycle
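The throughput figures above follow from multiplying operations per cycle by the core clock. A minimal C sketch of that arithmetic, assuming the 500 MHz clock quoted above; the function names are illustrative only:

```c
/* Peak throughput (operations per second) is operations per cycle
   multiplied by the core clock frequency in hertz. */
double ops_per_second(double ops_per_cycle, double clock_hz)
{
    return ops_per_cycle * clock_hz;
}

/* Fraction of peak that a sustained per-cycle figure represents. */
double sustained_fraction(double sustained_per_cycle, double peak_per_cycle)
{
    return sustained_per_cycle / peak_per_cycle;
}
```

At 500 MHz, `ops_per_second(6, 500e6)` gives 3.0e9 (3 GFLOPS), `ops_per_second(24, 500e6)` gives 1.2e10, and `ops_per_second(8, 500e6)` gives 4.0e9 peak 16-bit MACs per second. `sustained_fraction(7.1, 8)` gives 0.8875, so the sustained 16-bit FIR rate is roughly 89% of peak.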
DATA ALIGNMENT BUFFER (DAB)
The DAB is a quad-word FIFO that enables loading of quad-word data from nonaligned addresses. Normally, load instructions must be aligned to their data size so that quad words are loaded from a quad-aligned address. Using the DAB significantly improves the efficiency of some applications, such as FIR filters.
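Conceptually, a nonaligned quad-word load can be assembled from the two quad-aligned quad words that straddle the requested address. The C sketch below models that idea in software, assuming word-addressed memory held in a `uint32_t` array; `load_quad_unaligned` is a hypothetical name, and the model does not describe the DAB's actual internal design.

```c
#include <stddef.h>
#include <stdint.h>

/* Load four consecutive 32-bit words starting at word address `addr`,
   which need not be quad-aligned. Software model only: two aligned
   quad-word reads, then a 4-word window selected at the offset.
   When addr is unaligned, mem must hold at least (addr & ~3) + 8 words. */
void load_quad_unaligned(const uint32_t *mem, size_t addr, uint32_t out[4])
{
    size_t base = addr & ~(size_t)3;   /* quad-aligned address at or below addr */
    size_t offset = addr - base;       /* 0..3 words into that quad */
    uint32_t window[8];

    /* First aligned quad-word read. */
    for (size_t i = 0; i < 4; i++)
        window[i] = mem[base + i];

    /* Second aligned quad-word read, needed only when unaligned. */
    if (offset != 0)
        for (size_t i = 4; i < 8; i++)
            window[i] = mem[base + i];

    /* Select the four words that start at the requested address. */
    for (size_t i = 0; i < 4; i++)
        out[i] = window[offset + i];
}
```

In an FIR loop, each output reads a data window shifted by one word from the previous one, so most of these windows start at nonaligned addresses; this is the access pattern the DAB is meant to serve.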
DUAL INTEGER ALU (IALU)
The processor has two IALUs that provide powerful address generation capabilities and perform many general-purpose integer operations. The IALUs are referred to as J and K in assembly syntax and have the following features:
• Provide memory addresses for data and update pointers
• Support circular buffering and bit-reverse addressing
• Perform general-purpose integer operations, increasing programming flexibility
• Include a 31-word register file for each IALU
As address generators, the IALUs perform immediate or indirect (pre- and post-modify) addressing. They perform modulus and bit-reverse operations with no constraints on where modulus data buffers are placed in memory. Each IALU can specify a single-, dual-, or quad-word access from memory.
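Bit-reverse addressing visits a buffer of 2^bits words in the order produced by reversing each index's bits, the reordering that radix-2 FFTs need. A common way to generate that order is reverse-carry addition, in which carries propagate from the most significant bit toward the least significant bit. The C sketch below models both, assuming a power-of-two buffer; the function names are hypothetical, and this is not a description of the IALU's circuitry.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the low `bits` bits of `index` (e.g. 0b001 -> 0b100 for bits = 3).
   Bits above the low `bits` are ignored. */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < bits; i++) {
        r = (r << 1) | (index & 1u);   /* move the next LSB of index into r */
        index >>= 1;
    }
    return r;
}

/* Reverse-carry addition of a and b within a 2^bits buffer: equivalent to
   reversing both operands, adding normally, and reversing the result. */
uint32_t reverse_carry_add(uint32_t a, uint32_t b, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    uint32_t mask = (1u << bits) - 1u;
    uint32_t sum = (bit_reverse(a, bits) + bit_reverse(b, bits)) & mask;
    return bit_reverse(sum, bits);
}
```

Starting a pointer at 0 and repeatedly adding 4 (half of an 8-word buffer) with `reverse_carry_add` produces 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed order of indices 0 through 7.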
The IALUs have hardware support for circular buffers, bit-reverse addressing, and zero-overhead looping. Circular buffers facilitate efficient programming of delay lines and other data structures required in digital signal processing, and they are commonly used in digital filters and Fourier transforms. Each IALU provides registers for four circular buffers, so applications can set up a total of eight circular buffers. The IALUs handle address pointer wraparound automatically, reducing overhead, increasing performance, and simplifying implementation. Circular buffers can start and end at any memory location.
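The automatic wraparound described above amounts to a bounds check after each pointer update. A minimal C sketch, assuming a buffer described by a base address and a length in words and a modifier no larger than the length; the `circ_buf` type and `circ_post_modify` function are hypothetical names, not the processor's register set:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical circular-buffer descriptor (illustrative names). */
typedef struct {
    uint32_t base;    /* first word of the buffer */
    uint32_t length;  /* buffer size in words */
} circ_buf;

/* Post-modify: the access uses `ptr`, then the pointer advances by `step`
   words and wraps back into [base, base + length). Returns the new pointer. */
uint32_t circ_post_modify(const circ_buf *cb, uint32_t ptr, int32_t step)
{
    int64_t base = cb->base;
    int64_t end = base + (int64_t)cb->length;   /* one past the last word */

    assert(cb->length > 0);
    assert((int64_t)ptr >= base && (int64_t)ptr < end);
    assert(step <= (int64_t)cb->length && step >= -(int64_t)cb->length);

    int64_t next = (int64_t)ptr + step;
    if (next >= end)
        next -= cb->length;      /* ran past the top: wrap to the start */
    else if (next < base)
        next += cb->length;      /* ran below the base: wrap to the end */
    return (uint32_t)next;
}
```

With a base of 100 and a length of 8, stepping a pointer from 100 by 3 visits 103, 106, 101, 104, 107, 102, 105, and then returns to 100, with no wrap test in the calling code.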
Because the IALU’s computational pipeline is one cycle deep, in most cases integer results are available in the next cycle. Hardware (register dependency check) causes a stall if a result is unavailable in a given cycle.
PROGRAM SEQUENCER
The ADSP-TS203S processor’s program sequencer supports:
• A fully interruptible programming model with flexible programming in assembly and C/C++ languages; handles hardware interrupts with high throughput and no aborted instruction cycles
• A 10-cycle instruction pipeline (a four-cycle fetch pipe and a six-cycle execution pipe); computation results are available two cycles after operands are available
• Supply of instruction fetch memory addresses; the sequencer’s instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute; the program sequencer extracts an instruction line from the IAB and distributes it to the appropriate core component for execution
• Management of program structures and program flow determined according to JUMP, CALL, RTI, and RTS instructions, loop structures, conditions, interrupts, and software exceptions
• Branch prediction and a 128-entry branch target buffer (BTB) to reduce branch delays for efficient execution of conditional and unconditional branch instructions and zero-overhead looping; correctly predicted branches occur with zero overhead cycles, overcoming the five- to nine-stage branch penalty
• Compact code without the requirement to align code in memory; the IAB handles alignment
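The benefit of branch prediction can be estimated with a simple expected-cost model: if correctly predicted branches cost nothing and every mispredicted branch pays the full penalty, the average overhead per branch is the misprediction rate times the penalty. A C sketch of that arithmetic, where accuracy and penalty are inputs you supply (for example, a penalty in the five-to-nine range quoted above), not measured figures for this processor; the function names are hypothetical:

```c
/* Expected extra cycles per branch, assuming correctly predicted branches
   add zero cycles and every mispredicted branch pays penalty_cycles. */
double avg_branch_overhead(double accuracy, double penalty_cycles)
{
    return (1.0 - accuracy) * penalty_cycles;
}

/* Expected total overhead cycles for a run containing `branches` branches. */
double total_branch_overhead(long branches, double accuracy,
                             double penalty_cycles)
{
    return (double)branches * avg_branch_overhead(accuracy, penalty_cycles);
}
```

For instance, with 75% of branches predicted correctly and an 8-cycle penalty, the model gives 2 cycles of overhead per branch instead of 8.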