
CPU architecture
The ’C6200 CPU is the central building block of all members of the ’320C62xx family of digital signal processors
(DSPs), including the ’320C6201.
The CPU uses very long instruction words (256 bits wide) to supply up to eight 32-bit instructions to the eight
functional units during every clock cycle. Fetch packets are always 256 bits wide; however, the execute packets
can vary in size as shown in Figure 1. The variable-length execute packets are a key memory-saving feature,
distinguishing the ’C6200 CPU from other VLIW architectures.
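To make the memory saving concrete, the following sketch (plain C, with hypothetical packet sizes) compares the instruction slots a fixed-width eight-slot VLIW encoding would spend on a short sequence of issue groups with the slots the variable-length execute-packet encoding needs; it ignores the NOP padding that occurs only at 256-bit fetch-packet boundaries (described later in this section).

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical sizes (in instructions) of consecutive execute packets */
        int epkt[] = { 2, 8, 3, 1, 6 };
        int n = (int)(sizeof epkt / sizeof epkt[0]);

        int fixed  = 8 * n;   /* fixed-width VLIW: every group padded to 8 slots */
        int packed = 0;       /* variable-length encoding: only real instructions stored */
        for (int i = 0; i < n; i++)
            packed += epkt[i];

        printf("fixed-width slots: %d, packed slots: %d\n", fixed, packed);
        return 0;
    }

For these example sizes the fixed encoding spends 40 slots where the packed encoding needs only 20.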
The CPU features two sets of functional units. Each set contains four units and a register file. The two register
files contain 16 32-bit registers each, for a total of 32 general-purpose registers. The two sets of functional
units, along with two register files, comprise sides A and B of the CPU (see Figure 1). The four functional units
on each side of the CPU can share the 16 registers belonging to that side. Additionally, each side features a
single data bus connected to all registers on the other side, by which the two sets of functional units can
cross-exchange data from the register files on opposite sides. Register access by functional units on the
same side of the CPU as the register file can service all four units in a single clock cycle; register access
across the CPU to the opposite register file supports only one read and one write per cycle.
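As a rough illustration (the structure and names below are modeling assumptions, not TI register definitions), the C sketch models the two 16-entry register files and the one-read, one-write-per-cycle limit on the cross-CPU data path described above.

    #include <stdint.h>

    /* Simplified model of the two register files (sides A and B) and the
       single data path between them; names are illustrative only. */
    typedef struct {
        uint32_t a[16];               /* side A register file (A0-A15) */
        uint32_t b[16];               /* side B register file (B0-B15) */
        int cross_reads_this_cycle;   /* reads issued over the cross path  */
        int cross_writes_this_cycle;  /* writes issued over the cross path */
    } c6200_regs;

    /* A unit reads registers on its own side without restriction, but a
       read or write of the opposite register file uses the shared cross
       path, which this model limits to one of each per clock cycle. */
    static int cross_read_ok(const c6200_regs *r)  { return r->cross_reads_this_cycle  < 1; }
    static int cross_write_ok(const c6200_regs *r) { return r->cross_writes_this_cycle < 1; }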
Another key feature of the ’C6200 CPU is the load/store architecture, where all instructions operate on registers
(as opposed to data in memory). Two data-addressing units (.D1 and .D2) are exclusively responsible
for all data transfers between the register files and memory. The addresses driven by the .D units allow an
address generated from one register file to be used for loads and stores involving the other register file.
The ’C6200 CPU supports a variety of indirect addressing modes, using either linear or circular addressing
with 5- or 15-bit offsets. All instructions are conditional, and most instructions can access any one of the
32 registers. Some registers are also singled out to support specific addressing or to hold the condition for
conditional instructions (unless the condition is automatically “true”). The two .M functional units are dedicated
to multiplies. The two .S and the two .L functional units perform general arithmetic, logical, and branch
functions, with results available every clock cycle (latency can vary between one and five cycles because of
the multi-stage execution pipeline, although most instructions execute in a single cycle).
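The circular-addressing arithmetic can be sketched as follows; this is a simplified model (it assumes a power-of-two block size and leaves out the device's actual mode-configuration details), intended only to show how an address update wraps within a block so a pointer stepping through a buffer never leaves it.

    #include <stdint.h>

    /* Simplified circular address update: the result stays inside the
       power-of-two block that contains the original address. */
    static uint32_t circ_addr(uint32_t addr, int32_t offset, uint32_t block_size)
    {
        uint32_t mask = block_size - 1u;   /* block_size must be a power of two */
        uint32_t base = addr & ~mask;      /* start of the circular block */
        return base | ((addr + (uint32_t)offset) & mask);
    }

For example, with a 32-byte block, circ_addr(base + 30, 4, 32) returns base + 2 rather than running past the end of the buffer.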
The VelociTI VLIW processing flow begins when a 256-bit-wide instruction fetch packet (IFP) is fetched from
the internal program memory (which can also be configured as a cache). The 32-bit instructions destined for the
individual functional units are linked together by a 1 in the least significant bit (LSB) position of each
instruction. The instructions that are linked together for simultaneous execution (up to eight in total) comprise
an execute packet. A 0 in the LSB of an instruction breaks the chain, effectively placing the instructions that
follow it in the next execute packet. If an execute packet would cross the fetch-packet boundary (256 bits), the
compiler places the entire execute packet in the next fetch packet and pads the remainder of the current fetch
packet with NOP
instructions. The number of execute packets within a fetch packet can vary from one to eight. Execute packets
are dispatched to their respective functional units at the rate of one per clock cycle (see Figure 1), and the next
256-bit fetch packet is not fetched until all the execute packets from the current fetch packet have been
dispatched. After decoding, the instructions simultaneously drive all active functional units for a maximum
execution rate of eight instructions every clock cycle. While most results are held in 32-bit registers, they can
also be stored to memory as bytes or half-words, making all loads and stores effectively byte-addressable and
reducing memory requirements.
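A minimal sketch of the chaining rule described above, in plain C (the helper name and array layout are assumptions): it walks the eight 32-bit slots of a fetch packet and closes an execute packet after every instruction whose LSB (the parallel bit) is 0.

    #include <stdint.h>

    /* Split one 256-bit fetch packet (eight 32-bit instruction slots) into
       execute packets using the LSB chaining rule: a 1 links an instruction
       to the next one, a 0 ends the current execute packet. Records each
       packet's starting slot and returns the packet count (1 to 8). */
    static int split_execute_packets(const uint32_t slot[8], int packet_start[8])
    {
        int n_packets = 0;
        int start = 0;

        for (int i = 0; i < 8; i++) {
            if ((slot[i] & 1u) == 0u) {    /* LSB clear: packet ends here */
                packet_start[n_packets++] = start;
                start = i + 1;
            }
        }
        /* Per the text above, an execute packet never crosses the fetch-packet
           boundary, so the last slot always has a clear LSB and every slot
           lands in some packet. */
        return n_packets;
    }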