
Evaluating and Programming the 29K RISC Family
The Am29000 processor can cache 32 branch targets. The arrangement is the
usual two sets with 16 blocks (or entries) in each set. The Am29050 processor is
configurable to cache 64 branch targets, each block containing four instructions.
Alternatively, 128 blocks, still arranged in two sets, can be used, each containing only
two instructions. The smaller block size makes more effective use of the cache when the
BTC is required to hide a smaller instruction memory access latency (see section
1.9): when two instructions are enough to cover the latency, the space that
four-instruction blocks would spend on the third and fourth instructions is better used
to cache twice as many branch targets.
The programmer has little control over BTC operation; it is maintained
internally by processor hardware. There is no means of accessing or preloading the
cache via the cache interface registers provided on other 29K family members.
Additionally, no cache lock bits are provided in the CFG register. The cache
can be disabled by setting the CD bit in the CFG register, and invalidated by
executing an INV or IRETINV instruction.
5.13.4 Am29030 2–bus Microprocessor
The Am29030 has an 8K byte instruction cache, with 4K bytes provided by
each of its two columns. The Am29035 provides only column 0 and hence has 4K bytes
of cache; as a result, the Am29030 typically has a 20% performance advantage
for large programs. These processors were the first 29K family members to have a
non–BTC–type instruction cache. When a branch instruction is executed and the block
(cache entry) containing the target instruction sequence is not found in the cache, the
processor fetches the missing block and marks it valid. Complete blocks are always
fetched, even if the target instruction lies at the end of the block. However, the cache
forwards instructions to the decoder without waiting for the whole block to be reloaded. If
the cache is enabled and the block to be replaced in the cache is invalid, the fetched
block is placed in the cache even if that block is locked. Note that complete blocks are
fetched even when the cache is disabled. This is a little wasteful if the target of a jump
or branch is not the first address in a block.
Blocks are tagged on a per–block basis. There is only one Valid bit in the block
status information. This bit is not set until the processor has fetched an entire block
with no errors. Blocks which are fetched ahead during prefetch buffer filling are not
marked valid if execution does not continue into the block. Filling the prefetch buffer
in this way enables burst–mode access to be maintained for longer intervals, and
hence reduces overall access delays. LOAD or STORE instructions can occur at any
time; however, the Am29030 processor completes the fetch of the current block
before starting the data access, because completing the instruction fetch, which is
likely to be proceeding in single–cycle burst mode, is probably more efficient. The
cache reload characteristics of the Am29030 processor (reload blocking) further
emphasise the importance of scheduling LOAD instructions well ahead of the point
at which the data is required for further operations. The current tools for the 29K
family do not support code positioning such that the targets of call and jump
instructions begin on a