Philips Semiconductors
Cache Architecture
PRELIMINARY SPECIFICATION
5.4.3 Miss Processing Order
When a miss occurs, the instruction cache starts filling
the requested block from the beginning of the block. The
DSPCPU is stalled until the entire block is fetched and
stored in the cache.
5.4.4 Replacement Policy
The hierarchical LRU replacement policy implemented
by the instruction cache is identical to that implemented
by the data cache. See Section 5.3.4, “Replacement Policies,
Coherency,” for a description of the hierarchical LRU
algorithm.
5.4.5 Location of Program Code
All program code must first be loaded into SDRAM. The
instruction cache cannot fetch instructions from other
memories or devices. In particular, the cache cannot
fetch code from on-chip devices or over the PCI bus.
5.4.6 Branch Units
The instruction cache is closely coupled to three branch
units. Each unit can accept a branch independently, so
three branches can be processed simultaneously in the
same cycle.
Branches in PNX1300 are called ‘delayed branches’ because
the effect of a successful (taken) branch is not seen in the
flow of control until some number of cycles after the
successful branch is executed. The number of cycles of
latency is called the branch delay. On PNX1300, the branch
delay is three cycles.
Although three branches can be executed simultaneously,
correct operation of the DSPCPU requires that only one
branch be successful (taken) in any one cycle. DSPCPU
operation is undefined if more than one concurrent branch
operation is successful.
Each branch unit takes four inputs from the DSPCPU:
the branch opcode, a guard bit, a branch condition, and
a branch target address. A branch is deemed successful
if and only if the opcode is a branch opcode, the guard bit
is TRUE (i.e., = 1), and the condition (determined by the
opcode) is satisfied.
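The success rule and the one-taken-branch-per-cycle constraint described above can be sketched as a small software model. This is only an illustration, not the hardware logic: the struct `branch_input_t` and the helper names are hypothetical, and the opcode decode and condition evaluation are reduced to booleans supplied by the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the four inputs one branch unit receives. */
typedef struct {
    bool     is_branch_opcode; /* the opcode decodes as a branch opcode */
    bool     guard;            /* guard bit, TRUE = 1 */
    bool     condition_met;    /* condition selected by the opcode holds */
    uint32_t target;           /* branch target address */
} branch_input_t;

/* A branch is successful if and only if all three conditions hold. */
static bool branch_taken(const branch_input_t *b)
{
    return b->is_branch_opcode && b->guard && b->condition_met;
}

/* Counts the successful branches among the units active in one cycle. */
static size_t taken_count(const branch_input_t *units, size_t n)
{
    size_t taken = 0;
    for (size_t i = 0; i < n; i++) {
        if (branch_taken(&units[i])) {
            taken++;
        }
    }
    return taken;
}

/* True when at most one branch is successful, i.e. operation is defined. */
static bool cycle_is_defined(const branch_input_t *units, size_t n)
{
    return taken_count(units, n) <= 1;
}
```

A scheduler or simulator could call `cycle_is_defined` on the (up to three) branches issued in one cycle to flag schedules whose behavior the specification leaves undefined.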
5.4.7 Coherency: Special iclr Operation
A program can exercise some control over the operation of
the instruction cache by executing the special iclr
operation. This operation causes the instruction cache to
clear the valid bits for all blocks in the cache, including
locked blocks. The LRU replacement status of all blocks is
reset to its initial value. The DSPCPU is stalled while iclr
is executing.
See Section 5.6, “Cache Coherency,” for further discussion
of coherency issues.
5.4.8 Reading Tags and Cache Status
The instruction cache supports read access to its tag and
status bits, but not through special operations as with the
data cache. Since the instruction cache and branch units can
execute only resultless operations, access to the
instruction-cache tags and status bits is implemented using
normal load operations executed by the DSPCPU that reference
a special region in the MMIO address aperture. The region is
64 KB long and starts at MMIO_BASE. Instruction cache tags
and status bits are read-only; store operations to this
region have no effect. MMIO operations to this special
region are allowed only from the DSPCPU, not from any other
master of the on-chip data highway, such as an external PCI
initiator.
Programmer’s note: Tag and status information cannot be read
by PCI access, but only by DSPCPU access. A tag or status
read cannot be scheduled in the same cycle as, or one cycle
after, an iclr operation.
Reading A Tag And Valid Bit. To read the tag and valid bit
for a block in the instruction cache, a program can execute
a ld32 operation directed at the instruction-cache region in
the MMIO aperture. The top of Figure 5-10 shows the required
format for the target address. The most-significant 16 bits
must be equal to MMIO_BASE, the least-significant 15 bits
select the block (by naming the set and set member), and
bit 15 must be set to zero to perform a tag read. Note that
in PNX1300, valid set numbers range from 0 to 63. Space to
encode set numbers 64 to 511 is provided for future
extensions.
A ld32 with an address as specified above returns a 32-bit
result with the format shown at the top of Figure 5-11.
Bit 20 contains the state of the valid bit, and the
least-significant 20 bits contain the tag for the block
addressed by the ld32.
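As a rough illustration of the tag read described above, the following C sketch composes the target address and decodes the returned word. The function and type names are hypothetical. Because Figure 5-10 is not reproduced here, the exact positions of the set and set-member fields inside the low 15 bits are not modeled: `block_select` is assumed to be supplied by the caller already laid out as that figure requires.

```c
#include <stdint.h>

/* Decoded result of a tag read (hypothetical helper type). */
typedef struct {
    uint32_t tag;    /* least-significant 20 bits of the result */
    unsigned valid;  /* bit 20 of the result */
} icache_tag_info_t;

/*
 * Composes the address for a tag read:
 *   bits 31..16  upper 16 bits of MMIO_BASE
 *   bit  15      0 selects a tag read
 *   bits 14..0   block select (set and set member, per Figure 5-10)
 */
static uint32_t icache_tag_read_addr(uint32_t mmio_base, uint32_t block_select)
{
    return (mmio_base & 0xFFFF0000u) | (block_select & 0x7FFFu);
}

/* Splits the 32-bit ld32 result into the valid bit and the 20-bit tag. */
static icache_tag_info_t icache_decode_tag_word(uint32_t word)
{
    icache_tag_info_t info;
    info.tag   = word & 0x000FFFFFu;
    info.valid = (word >> 20) & 1u;
    return info;
}
```

On the target, the word passed to `icache_decode_tag_word` would come from a 32-bit load at the composed address, for example through a `volatile uint32_t *`; that access is left out so the sketch stays portable.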
Reading The LRU Bits. To read the LRU bits for a set in the
instruction cache, a program can execute a ld32 operation as
above but using the address format shown at the bottom of
Figure 5-10. In this format, bit 15 is set to one to perform
the read of the LRU bits, and the tag_i_mux field is set to
zeros because it is not needed.
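The LRU read address differs from the tag read address only in bit 15. A minimal sketch, with hypothetical names, is shown below. It assumes the caller passes `set_select` with the set number already placed as Figure 5-10 requires and with the tag_i_mux field left as zeros, since the field positions are defined by that figure and are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICACHE_SELECT_LRU (1u << 15)  /* bit 15: 1 = LRU read, 0 = tag read */

/* Composes the address for reading the LRU bits of one set. */
static uint32_t icache_lru_read_addr(uint32_t mmio_base, uint32_t set_select)
{
    return (mmio_base & 0xFFFF0000u)
         | ICACHE_SELECT_LRU
         | (set_select & 0x7FFFu);
}

/* Reports whether an address in the region selects an LRU read. */
static bool icache_addr_is_lru_read(uint32_t addr)
{
    return (addr & ICACHE_SELECT_LRU) != 0u;
}
```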
Table 5-13. Instruction Address Field Partitioning
Field    Address Bits   Purpose
Offset   5..0           Byte offset into a set
Set      11..6          Selects one of the sets in the cache (one of 64 in the case of PNX1300)
Tag      31..12         Compared against address tags of set members
[Figure: 32-bit instruction cache address. Tag in bits 31..12, Set in bits 11..6, Offset in bits 5..0.]
Figure 5-9. Instruction-cache address partitioning.
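The field partitioning in Table 5-13 can be expressed as three small extraction helpers. This is a sketch for illustration only; the function names are hypothetical, while the bit ranges are taken directly from the table.

```c
#include <stdint.h>

/* Bits 5..0: byte offset. */
static uint32_t iaddr_offset(uint32_t addr)
{
    return addr & 0x3Fu;
}

/* Bits 11..6: set index (one of 64 sets on PNX1300). */
static uint32_t iaddr_set(uint32_t addr)
{
    return (addr >> 6) & 0x3Fu;
}

/* Bits 31..12: tag compared against the address tags of the set members. */
static uint32_t iaddr_tag(uint32_t addr)
{
    return addr >> 12;
}
```

For example, the address 0x12345678 splits into tag 0x12345, set 25 (0x19), and offset 0x38.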