CPU (DSP Core) Description
April 2004
Revised May 2005
SPRS247E
2.4 CPU (DSP Core) Description
The CPU fetches VelociTI advanced very-long instruction words (VLIWs), each 256 bits wide, to supply up to
eight 32-bit instructions to the eight functional units during every clock cycle. The VelociTI VLIW architecture
features controls so that not all eight units have to be supplied with instructions if they are not ready to
execute. The first bit of every 32-bit instruction determines if the next instruction belongs to the same execute
packet as the previous instruction, or whether it should be executed in the following clock as a part of the next
execute packet. Fetch packets are always 256 bits wide; however, the execute packets can vary in size. The
variable-length execute packets are a key memory-saving feature, distinguishing the C64x CPUs from other
VLIW architectures. The C64x VelociTI.2 extensions add enhancements to the TMS320C62x DSP VelociTI
architecture. These enhancements include:
- Register file enhancements
- Data path extensions
- Quad 8-bit and dual 16-bit extensions with data flow enhancements
- Additional functional unit hardware
- Increased orthogonality of the instruction set
- Additional instructions that reduce code size and increase register flexibility
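The way a fixed-width fetch packet breaks into variable-length execute packets can be illustrated with a small model. This is a minimal sketch, not TI's decoder: it assumes the "first bit" is bit 0 of each 32-bit word, that a set bit chains the next instruction into the same execute packet, and that an execute packet does not continue into the next fetch packet. The names `P_BIT_MASK` and `split_execute_packets` are hypothetical.

```python
from typing import List

# Assumption: the parallel ("p") bit is bit 0 of each 32-bit instruction word.
P_BIT_MASK = 0x1


def split_execute_packets(fetch_packet: List[int]) -> List[List[int]]:
    """Group the eight 32-bit words of one fetch packet into execute packets."""
    if len(fetch_packet) != 8:
        raise ValueError("a fetch packet holds exactly eight 32-bit words")
    packets: List[List[int]] = []
    current: List[int] = []
    for word in fetch_packet:
        current.append(word & 0xFFFFFFFF)
        # A cleared p-bit means the next instruction starts a new execute packet.
        if not word & P_BIT_MASK:
            packets.append(current)
            current = []
    if current:
        # Assumption: a packet left open at the end of the fetch packet is closed here.
        packets.append(current)
    return packets
```

For example, a fetch packet whose p-bits read 1, 1, 0, 0, 1, 1, 1, 0 yields three execute packets of three, one, and four instructions, which then issue in three consecutive clock cycles.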
The CPU features two sets of functional units. Each set contains four units and a register file. One set contains
functional units .L1, .S1, .M1, and .D1; the other set contains units .D2, .M2, .S2, and .L2. The two register
files each contain 32 32-bit registers for a total of 64 general-purpose registers. In addition to supporting the
packed 16-bit and 32-/40-bit fixed-point data types found in the C62x VelociTI VLIW architecture, the C64x
register files also support packed 8-bit data and 64-bit fixed-point data types. The two sets of functional
units, along with two register files, compose sides A and B of the CPU [see the functional block and CPU (DSP
core) diagram, and Figure 2-3]. The four functional units on each side of the CPU can freely share the 32
registers belonging to that side. Additionally, each side features a “data cross path”—a single data bus
connected to all the registers on the other side, by which the two sets of functional units can access data from
the register files on the opposite side. The C64x CPU pipelines data-cross-path accesses over multiple clock
cycles. This allows the same register to be used as a data-cross-path operand by multiple functional units in
the same execute packet. All functional units in the C64x CPU can access operands via the data cross path.
Register access by functional units on the same side of the CPU as the register file can service all the units
in a single clock cycle. On the C64x CPU, a delay clock is introduced whenever an instruction attempts to read
a register via a data cross path if that register was updated in the previous clock cycle.
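The cross-path delay rule above can be made concrete with a simplified cycle model. This is a sketch under several assumptions that are not stated in the text: every instruction is treated as writing its result in the cycle it issues, at most one delay clock is charged per execute packet, units ending in "1" sit on side A and units ending in "2" on side B, and registers are named by side (for example "A3", "B4"). `Instr` and `count_cross_path_stalls` are hypothetical names.

```python
from typing import List, NamedTuple, Set, Tuple


class Instr(NamedTuple):
    unit: str              # functional unit, e.g. ".L1" or ".M2"
    dst: str               # destination register, e.g. "A3"
    srcs: Tuple[str, ...]  # source registers


def side_of_unit(unit: str) -> str:
    """Assumption: units numbered 1 are on side A, units numbered 2 on side B."""
    return "A" if unit.endswith("1") else "B"


def count_cross_path_stalls(packets: List[List[Instr]]) -> int:
    """Count delay clocks caused by cross-path reads of just-updated registers."""
    stalls = 0
    written_last_cycle: Set[str] = set()
    for packet in packets:
        # A source on the opposite side of the unit is read through the cross path.
        needs_delay = any(
            src[0] != side_of_unit(instr.unit) and src in written_last_cycle
            for instr in packet
            for src in instr.srcs
        )
        if needs_delay:
            stalls += 1
        written_last_cycle = {instr.dst for instr in packet}
    return stalls
```

In this model, a side-B unit that reads A1 in the cycle right after A1 was written costs one delay clock, while the same read one cycle later, or a same-side read of a just-written register, costs none.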
In addition to the C62x DSP fixed-point instructions, the C64x DSP includes a comprehensive collection
of quad 8-bit and dual 16-bit instruction set extensions. These VelociTI.2 extensions allow the C64x CPU
to operate directly on packed data to streamline data flow and increase instruction set efficiency.
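Operating directly on packed data means that one 32-bit register is treated as four independent 8-bit lanes or two independent 16-bit lanes. The sketch below models a packed addition without saturation, where each lane wraps on overflow and no carry crosses a lane boundary. It illustrates the idea only; `packed_add` is a hypothetical helper, not an instruction mnemonic or a TI intrinsic.

```python
def packed_add(a: int, b: int, lane_bits: int) -> int:
    """Add two 32-bit words lane by lane; each lane wraps independently."""
    if lane_bits not in (8, 16):
        raise ValueError("lane_bits must be 8 (quad) or 16 (dual)")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        # Keep only the low lane_bits of the sum so no carry leaks into the next lane.
        lane = ((a >> shift) + (b >> shift)) & lane_mask
        result |= lane << shift
    return result


# Quad 8-bit: the 0xFF lane wraps to 0x00 without disturbing its neighbor.
quad = packed_add(0x01FF7F10, 0x01010101, 8)
# Dual 16-bit: the low half wraps, the high half is unaffected.
dual = packed_add(0x0001FFFF, 0x00010001, 16)
```

An ordinary 32-bit addition of the second pair would give 0x00030000, because the carry out of the low half would propagate into the high half; the packed form gives 0x00020000.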
Another key feature of the C64x CPU is the load/store architecture, where all instructions operate on registers
(as opposed to data in memory). Two sets of data-addressing units (.D1 and .D2) are responsible for all data
transfers between the register files and the memory. The data address driven by the .D units allows data
addresses generated from one register file to be used to load or store data to or from the other register file.
The C64x .D units can load and store bytes (8 bits), half-words (16 bits), and words (32 bits) with a single
instruction. With the new data path extensions, the C64x .D unit can load and store doublewords (64 bits)
with a single instruction. Furthermore, the non-aligned load and store instructions allow the .D units to access
words and doublewords on any byte boundary. The C64x CPU supports a variety of indirect addressing modes
using either linear- or circular-addressing with 5- or 15-bit offsets. All instructions are conditional, and most
can access any one of the 64 registers. Some registers, however, are singled out to support specific
addressing modes or to hold the condition for conditional instructions (if the condition is not automatically
“true”).
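Two of the addressing features above, circular addressing and non-aligned doubleword access, can be sketched as follows. The model assumes a circular block whose size is a power of two and whose start is aligned to that size, byte-granular offsets, and little-endian memory; none of these details are given in the text, and the function names are hypothetical.

```python
def circular_address(base: int, offset: int, block_size: int) -> int:
    """Advance an address by offset bytes, wrapping inside an aligned block."""
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    mask = block_size - 1
    # Upper address bits stay fixed; only the bits inside the block change.
    return (base & ~mask) | ((base + offset) & mask)


def load_nonaligned(memory: bytes, address: int, width: int = 8) -> int:
    """Read width bytes starting at any byte address (little-endian assumed)."""
    if address < 0 or address + width > len(memory):
        raise IndexError("access falls outside the modeled memory")
    return int.from_bytes(memory[address:address + width], "little")


memory = bytes(range(16))
doubleword = load_nonaligned(memory, 3)    # 64-bit read starting at byte 3
wrapped = circular_address(0x10C, 8, 16)   # wraps back inside 0x100..0x10F
```

Linear addressing corresponds to plain `base + offset`; the circular form differs only in that the carry out of the block's low-order bits is discarded.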
TMS320C62x and C62x are trademarks of Texas Instruments.