CPU (DSP Core) Description
17
April 2004
Revised May 2005
SPRS247E
The two .M functional units perform all multiplication operations. Each of the C64x .M units can perform two
16
×
16-bit multiplies or four 8
×
8-bit multiplies per clock cycle. The .M unit can also perform 16
×
32-bit multiply
operations, dual 16
×
16-bit multiplies with add/subtract operations, and quad 8
×
8-bit multiplies with add
operations. In addition to standard multiplies, the C64x .M units include bit-count, rotate, Galois field multiplies,
and bidirectional variable shift hardware.
The two .S and .L functional units perform a general set of arithmetic, logical, and branch functions with results
available every clock cycle. The arithmetic and logical functions on the C64x CPU include single 32-bit, dual
16-bit, and quad 8-bit operations.
The processing flow begins when a 256-bit-wide instruction fetch packet is fetched from a program memory.
The 32-bit instructions destined for the individual functional units are “l(fā)inked” together by “1” bits in the least
significant bit (LSB) position of the instructions. The instructions that are “chained” together for simultaneous
execution (up to eight in total) compose an execute packet. A “0” in the LSB of an instruction breaks the chain,
effectively placing the instructions that follow it in the next execute packet. A C64x
DSP device enhancement
now allows execute packets to cross fetch-packet boundaries. In the TMS320C62x
/TMS320C67x
DSP
devices, if an execute packet crosses the fetch-packet boundary (256 bits wide), the assembler places it in
the next fetch packet, while the remainder of the current fetch packet is padded with NOP instructions. In the
C64x
DSP device, the execute boundary restrictions have been removed, thereby, eliminating all of the
NOPs added to pad the fetch packet, and thus, decreasing the overall code size. The number of execute
packets within a fetch packet can vary from one to eight. Execute packets are dispatched to their respective
functional units at the rate of one per clock cycle and the next 256-bit fetch packet is not fetched until all the
execute packets from the current fetch packet have been dispatched. After decoding, the instructions
simultaneously drive all active functional units for a maximum execution rate of eight instructions every clock
cycle. While most results are stored in 32-bit registers, they can be subsequently moved to memory as bytes,
half-words, or doublewords. All load and store instructions are byte-, half-word-, word-, or
doubleword-addressable.
For more details on the C64x CPU functional units enhancements, see the following documents:
TMS320C6000 CPU and Instruction Set Reference Guide
(literature number SPRU189)
TMS320C64x Technical Overview
(literature number SPRU395)
TMS320C67x is a trademark of Texas Instruments.