
PRODUCT SPECIFICATION
Chapter 5
Cache Architecture
by Eino Jacobs

5.1 MEMORY SYSTEM OVERVIEW
The high-performance video and audio throughput of
TM1300 is delivered by its DSPCPU and its autonomous
I/O and co-processing units, but the foundation of this
processing is the TM1300 memory hierarchy. To realize
the full potential of the chip’s processing units, the
memory hierarchy must read and write data (and DSPCPU
instructions) fast enough to keep the units busy.
To meet the requirements of its target applications,
TM1300’s memory hierarchy must satisfy the conflicting
goals of low cost, simple system design (e.g., low parts
count), and high performance. Because multimedia video
streams can require relatively large temporary storage, a
significant amount of external DRAM is needed, so
minimizing the cost of this bulk memory is important.
TM1300’s memory system achieves a good compromise
between cost and performance by coupling substantial
on-chip caches with a glueless interface to synchronous
DRAM (SDRAM). SDRAM provides higher bandwidth
than standard DRAM for only a small cost premium, which
permits TM1300 to use a narrower and simpler interface
than would be required to achieve similar performance
with standard DRAM. A block diagram of the memory
system is shown in Figure 5-1.
The separate on-chip data and instruction caches serve
only the DSPCPU, because the data access patterns of the
autonomous I/O and graphics units exhibit little or no
locality of reference (they access each piece of the
multimedia data stream only once in each operation).
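
The effect of access pattern on a cache can be illustrated with a small simulation. The following is a minimal sketch, not a model of the actual TM1300 cache logic: it uses the 16KB, 8-way data cache geometry from Figure 5-1, but the 64-byte block size, the strict LRU replacement policy, and the names `hit_rate`, `stream`, and `loop` are assumptions made for illustration.

```python
from collections import OrderedDict


def hit_rate(addresses, cache_bytes=16 * 1024, ways=8, block_bytes=64):
    """Return the fraction of accesses that hit in an LRU set-associative cache.

    block_bytes and the LRU policy are illustrative assumptions.
    """
    num_sets = cache_bytes // (ways * block_bytes)
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_bytes
        lines = sets[block % num_sets]
        total += 1
        if block in lines:
            hits += 1
            lines.move_to_end(block)       # mark as most recently used
        else:
            if len(lines) == ways:
                lines.popitem(last=False)  # evict the least recently used block
            lines[block] = True
    return hits / total if total else 0.0


# Streaming pattern: touch each block of a 1 MB buffer exactly once.
stream = range(0, 1024 * 1024, 64)
# Reuse pattern: sweep an 8 KB working set ten times.
loop = [addr for _ in range(10) for addr in range(0, 8 * 1024, 64)]

stream_rate = hit_rate(stream)
loop_rate = hit_rate(loop)
print(f"streaming hit rate: {stream_rate:.0%}")
print(f"reuse hit rate:     {loop_rate:.0%}")
```

Under these assumptions the streaming pattern never hits, while the looping pattern misses only on its first sweep, which is the distinction the paragraph above draws between the I/O units and the DSPCPU.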
Without the caches, the DSPCPU would not be able to
achieve its performance potential. SDRAM has enough
bandwidth to handle serial streams of multimedia data,
but its bandwidth and latency are insufficient to satisfy
the DSPCPU’s high rate of random data accesses and
repeated instruction accesses.
Table 5-1 shows bandwidth parameters for the TM1300
DSPCPU and the main-memory interface. Although 400
MB/s is substantial bandwidth, the SDRAM alone clearly
cannot keep up with the DSPCPU’s maximum requirements
of 2800 MB/s for instructions and 800 MB/s for data.
Fortunately, multimedia algorithms resemble other
computer programs in terms of locality of reference, so
the on-chip caches typically supply the majority of
instructions and data to the DSPCPU. The wide paths to
the caches are matched to the bandwidth requirements of
the DSPCPU.
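
As a rough check on this argument, the sketch below estimates how much of the peak DSPCPU traffic the caches would have to supply for the remainder to fit within main-memory bandwidth. It uses only the figures from Table 5-1. It is a simplification under stated assumptions: it ignores that cache misses fetch whole blocks, that the instruction path passes through the decompressor shown in Figure 5-1, and that the I/O units also use SDRAM bandwidth. The name `min_cache_fraction` is hypothetical.

```python
# Peak bandwidth figures from Table 5-1 (100-MHz TM1300), in MB/s.
CPU_INSTRUCTION_MBPS = 2800
CPU_DATA_MBPS = 800
MAIN_MEMORY_MBPS = 400


def min_cache_fraction(demand_mbps, memory_mbps):
    """Smallest fraction of demand the caches must supply so the
    remaining traffic fits within memory_mbps."""
    if demand_mbps <= memory_mbps:
        return 0.0
    return 1.0 - memory_mbps / demand_mbps


peak_demand = CPU_INSTRUCTION_MBPS + CPU_DATA_MBPS
fraction = min_cache_fraction(peak_demand, MAIN_MEMORY_MBPS)
print(f"peak DSPCPU demand: {peak_demand} MB/s")
print(f"caches must supply at least {fraction:.1%} of it")
```

With these numbers the peak demand is 3600 MB/s, so the caches would need to supply roughly 89% of it even before any other unit touches the SDRAM.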
[Figure 5-1: block diagram. Labels recovered from the
original figure:
Blocks: VLIW CPU (three branch units, two memory units);
decompressor; 32KB, 8-way instruction cache; 16KB, 8-way
data cache; main memory interface; SDRAM main memory.
Branch-unit signals: three sets, each has address, opcode,
condition, and guard.
Instruction path: 224 bits of decompressed instruction.
Memory-unit signals: two sets, each has a guard, opcode,
data, and two address components.
Internal data highway: 32-bit address, 32-bit data; to
on-chip peripherals.
Main-memory bus: glueless, SDRAM control with 32-bit data.]

Figure 5-1. The main components of the TM1300 memory system.
Table 5-1. 100-MHz TM1300 memory bandwidth parameters

Magnitude    Use
2800 MB/s    Instruction bandwidth (224 bits/instruction)
800 MB/s     Data bandwidth (two 32-bit memory ports)
400 MB/s     Main-memory bandwidth (one 32-bit port)
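
The magnitudes in Table 5-1 follow from the path widths and the 100-MHz clock. The sketch below reproduces them, assuming one transfer per clock cycle on each path and 1 MB = 10^6 bytes. The function name `bandwidth_mbps` and the dictionary keys are illustrative, not taken from the specification.

```python
CLOCK_MHZ = 100  # clock rate stated in the Table 5-1 caption


def bandwidth_mbps(bits_per_cycle, clock_mhz=CLOCK_MHZ):
    """Peak bandwidth in MB/s for a path moving bits_per_cycle each clock."""
    return bits_per_cycle * clock_mhz // 8


# Path widths in bits per cycle, as listed in Table 5-1.
paths = {
    "instruction": 224,   # 224 bits per instruction
    "data": 2 * 32,       # two 32-bit memory ports
    "main memory": 32,    # one 32-bit port
}

table = {name: bandwidth_mbps(bits) for name, bits in paths.items()}
for name, mbps in table.items():
    print(f"{name:12s} {mbps:5d} MB/s")
```

This yields 2800, 800, and 400 MB/s, matching the three rows of the table.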