Philips Semiconductors
Cache Architecture
PRELIMINARY SPECIFICATION
1. The stall signal is asserted to prevent activity in the
DSPCPU and data cache.
2. The valid bits for all blocks in the instruction cache are
reset.
3. At the completion of the block invalidation scan, the
stall signal to the DSPCPU and data cache is deasserted.
4. The DSPCPU begins normal operation with an
instruction fetch from the address reset_vector.
The initialization process takes 512 clock cycles. Reset
sets reset_vector equal to DRAM_BASE so that program
execution starts at the initial value of DRAM_BASE. The
initial value of DRAM_BASE is determined as described
in Section 5.2, “DRAM Aperture.”
5.5 LRU ALGORITHM
When a cache miss occurs, the block containing the
requested data must be brought into the cache to replace
an existing cache block. The LRU algorithm is responsible
for choosing the replacement victim, which is the
least-recently-used block.
The 8-way set-associative caches implement a hierarchical
LRU replacement algorithm as follows. The eight elements
of a set are partitioned into four groups of two elements
each. To select the LRU element:
First, the LRU pair is selected out of the four pairs
using a four-way LRU algorithm.
Second, the LRU element of the pair is selected
using a two-way LRU algorithm.
5.5.1 Two-Way Algorithm
The two-way LRU requires one bit of administration per
pair of elements. On every cache hit to one of the two
blocks, the cache writes once to this bit (just a write, not
a read-modify-write). If the even-numbered block is
accessed, the LRU bit is set to ‘1’; if the odd-numbered
block is accessed, the LRU bit is set to ‘0’. On a miss, the
cache replaces the LRU element, i.e., if the LRU bit is ‘0’,
the even-numbered element will be replaced; if the LRU
bit is ‘1’, the odd-numbered element will be replaced.
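The bookkeeping described above can be modeled in a few lines of C.
This is a minimal sketch for illustration, not the hardware
implementation. The names lru2_state, lru2_touch, and lru2_victim are
hypothetical, and the sketch assumes that pair p of a set holds
elements 2p and 2p+1. Selection of the LRU pair by the four-way
algorithm is outside this sketch, so the pair index is passed in.

```c
#include <assert.h>
#include <stdint.h>

/* One LRU bit per pair of elements; four pairs per 8-element set.
 * ASSUMPTION: pair p holds elements 2p (even) and 2p+1 (odd). */
typedef struct {
    uint8_t lru_bit[4];
} lru2_state;

/* Record a hit on `element` (0..7). This is a plain write of the
 * pair's bit, never a read-modify-write:
 *   hit on even element -> bit = 1 (the odd element is now LRU)
 *   hit on odd element  -> bit = 0 (the even element is now LRU) */
void lru2_touch(lru2_state *s, unsigned element)
{
    assert(element < 8);
    s->lru_bit[element >> 1] = ((element & 1u) == 0u) ? 1u : 0u;
}

/* Return the element to replace within `pair` (0..3):
 *   bit = 0 -> even element, bit = 1 -> odd element. */
unsigned lru2_victim(const lru2_state *s, unsigned pair)
{
    assert(pair < 4);
    return 2u * pair + s->lru_bit[pair];
}
```

For example, after a hit on element 4, lru2_victim(&s, 2) returns 5;
after a subsequent hit on element 5, it returns 4.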
5.6 CACHE COHERENCY
The PNX1300 hardware does not implement coherency
between the caches and main memory. Generalized
coherency is the responsibility of software, which can use
the special operations dcb, dinvalid, and iclr to enforce
cache/memory synchronization.
5.6.1 Example 1: Data-Cache/Input-Unit Coherency
Before the CPU commands the video-in unit to capture a
video frame, the CPU must be sure that the data cache
contains no blocks that are in the address region that the
video-in unit will use to store the input frame. If the
video-in unit performs its input function to an address
region and the data cache does hold one or more blocks
from that region, any of the following may happen:
A miss in the data cache may cause a dirty block to
be copied back to the address region being used by
the video-in unit. If the video-in unit already stored
data in the block, the write-back will corrupt the frame
data.
The CPU will read stale data from the cache instead
of from the block in main memory. Even though the
video-in unit stored new video data in the block in
main memory, the cached copy will be used instead
because it is still marked valid in the cache.
To prevent erroneous copybacks or the use of stale data,
the CPU must use dinvalid operations to invalidate all
blocks in the address region that will be used by the
video-in unit.
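The invalidation step can be pictured as a loop over every cache
block that overlaps the capture buffer. The following C sketch is a
host-side model only. It assumes a 64-byte block size (not stated in
this section), and dinvalid_block is a hypothetical stand-in for the
dinvalid operation that merely counts calls so the loop can be
checked; dcache_invalidate_region is likewise an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_BLOCK_SIZE 64u   /* ASSUMPTION: block size in bytes */

/* Bookkeeping for the host-side model. */
static unsigned long dinvalid_calls;
static uintptr_t dinvalid_last;

/* HYPOTHETICAL stand-in for one dinvalid operation on the block
 * starting at block_addr. It only records the call. */
void dinvalid_block(uintptr_t block_addr)
{
    assert(block_addr % CACHE_BLOCK_SIZE == 0);
    dinvalid_calls++;
    dinvalid_last = block_addr;
}

/* Invalidate every block that overlaps [start, start + len). */
void dcache_invalidate_region(uintptr_t start, size_t len)
{
    if (len == 0)
        return;

    uintptr_t mask  = ~(uintptr_t)(CACHE_BLOCK_SIZE - 1);
    uintptr_t first = start & mask;
    uintptr_t last  = (start + len - 1) & mask;

    /* Compare before incrementing so the loop also terminates
     * when the region ends at the top of the address space. */
    for (uintptr_t a = first; ; a += CACHE_BLOCK_SIZE) {
        dinvalid_block(a);
        if (a == last)
            break;
    }
}
```

Because invalidation discards whole blocks, a capture buffer would
normally be aligned to, and sized in multiples of, the block size so
that no unrelated data shares a block with the buffer.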
5.6.2 Example 2: Data-Cache/Output-Unit Coherency
Before the CPU commands the video-out unit to send a
frame of video, the CPU must be sure that all the data for
the frame has been written from the data cache to the
region of main memory that the video-out unit will output.
Explicit action is necessary because the data cache, with
its copyback write policy, will hold an exclusive copy of
the data until either the block is replaced by the LRU
algorithm or the CPU explicitly forces it to be copied back
to main memory.
Before an output command is issued to the video-out
unit, the CPU must execute dcb operations to force
coherency between cache contents and main memory.
5.6.3 Example 3: Instruction-Cache/Data-Cache Coherency
If code prepared by a program running on the CPU must
be subsequently executed, coherency between the
instruction and data caches must be enforced. This is
accomplished by a two-step process:
1. Coherency between the data cache and main memory
must be enforced since the instruction cache can
fetch instructions only from main memory.
2. Coherency between the instruction cache and main
memory is enforced by executing an iclr operation.
The CPU will now be able to fetch and execute the new
instructions.
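The ordering of the two steps matters: the copyback must complete
before the instruction cache is cleared. The C sketch below models
that ordering on a host machine. It is illustrative only: the 64-byte
block size is an assumption, dcb_block and iclr_all are hypothetical
stand-ins for the dcb and iclr operations that simply log their
invocation, and make_code_visible is an invented name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_BLOCK_SIZE 64u   /* ASSUMPTION: block size in bytes */
#define OP_LOG_CAP       32u

enum cache_op { OP_DCB = 1, OP_ICLR = 2 };

/* Log of issued operations, kept only so the order can be checked. */
static enum cache_op op_log[OP_LOG_CAP];
static size_t op_count;

static void log_op(enum cache_op op)
{
    if (op_count < OP_LOG_CAP)
        op_log[op_count] = op;
    op_count++;
}

/* HYPOTHETICAL stand-in for one dcb (copy back) operation. */
void dcb_block(uintptr_t block_addr)
{
    assert(block_addr % CACHE_BLOCK_SIZE == 0);
    log_op(OP_DCB);
}

/* HYPOTHETICAL stand-in for the iclr operation. */
void iclr_all(void)
{
    log_op(OP_ICLR);
}

/* Make code written through the data cache fetchable by the CPU. */
void make_code_visible(uintptr_t start, size_t len)
{
    /* Step 1: copy every block of [start, start + len) back to
     * main memory, since instructions are fetched only from there. */
    if (len != 0) {
        uintptr_t mask  = ~(uintptr_t)(CACHE_BLOCK_SIZE - 1);
        uintptr_t first = start & mask;
        uintptr_t last  = (start + len - 1) & mask;
        for (uintptr_t a = first; ; a += CACHE_BLOCK_SIZE) {
            dcb_block(a);
            if (a == last)
                break;
        }
    }
    /* Step 2: clear the instruction cache so stale instructions
     * are not executed. */
    iclr_all();
}
```

For a 128-byte code region starting on a block boundary, the model
logs two dcb operations followed by one iclr.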
5.6.4 Example 4: Instruction-Cache/Input-Unit Coherency
When an input unit is used to load program code into
main memory, the iclr operation must be issued before
attempting to execute the new code.
5.6.5 Four-Way Algorithm
For administration of the four-way algorithm, the cache
maintains an upper-left triangular matrix ‘R’
of 1-bit ele-
ments without the diagonal. R contains six bits (in gener-