
Evaluating and Programming the 29K RISC Family
the BTC, an external memory access is performed to start filling the Instruction
Prefetch Buffer (IPB). With the Am29000 processor, the fetch stage of the processor
pipeline is fed from the IPB, but the Am29050 can bypass the fetch stage and feed
the first instruction directly into the decode pipeline stage using the instruction
forwarding technique. Bypassing also enables up to four cycles of external memory
latency to be hidden when a BTC hit occurs (see section 1.10).
The Am29050 incorporates a Translation Look–Aside Buffer (TLB) for
Memory Management Unit support, just like the Am29000 processor. However, it
also has two region mapping registers. These permit large areas of memory to be
mapped without using up TLB entries, each of which maps only a single, smaller page.
They are particularly useful for mapping large data memory regions, and their use
reduces the TLB software management overhead.
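A small Python sketch can show why region mapping relieves the TLB. It translates addresses through up to two region mappings before falling back to a per-page TLB, and counts how many TLB refills a sweep over a 1 MB data area causes. The `Region` fields, the region-before-TLB priority, the 8 KB page size, and the identity-mapping refill are hypothetical simplifications chosen for illustration, not the Am29050's actual register layout or MMU behavior.

```python
from dataclasses import dataclass

# Assumed page size for ordinary TLB-mapped pages (illustrative choice only).
PAGE_SIZE = 8 * 1024


@dataclass
class Region:
    """Hypothetical region mapping: one register maps [vbase, vbase + size)."""
    vbase: int
    size: int
    pbase: int

    def covers(self, vaddr: int) -> bool:
        return self.vbase <= vaddr < self.vbase + self.size


class Translator:
    """Toy address translator: region mappings first, then a per-page TLB."""

    def __init__(self, regions):
        # The Am29050 has two region mapping registers; the sketch enforces that limit.
        if len(regions) > 2:
            raise ValueError("at most two region mappings")
        self.regions = list(regions)
        self.tlb = {}          # virtual page number -> physical page number (capacity not modeled)
        self.tlb_refills = 0   # each refill stands for one software TLB-miss trap

    def translate(self, vaddr: int) -> int:
        # Assumption: a region hit is resolved without consulting the TLB.
        for r in self.regions:
            if r.covers(vaddr):
                return r.pbase + (vaddr - r.vbase)
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.tlb:
            # Stand-in for the software miss handler: identity-map the page.
            self.tlb[vpn] = vpn
            self.tlb_refills += 1
        return self.tlb[vpn] * PAGE_SIZE + offset


# Touch every page of a 1 MB data area, with and without a region mapping.
data = Region(vbase=0x4000_0000, size=1 << 20, pbase=0x0010_0000)
with_region = Translator([data])
tlb_only = Translator([])
for addr in range(data.vbase, data.vbase + data.size, PAGE_SIZE):
    with_region.translate(addr)
    tlb_only.translate(addr)
print(with_region.tlb_refills, tlb_only.tlb_refills)  # 0 128
```

With the region in place the sweep causes no TLB refills, while the TLB-only translator needs one entry per 8 KB page, 128 in all.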
The Am29050 can also speed up data memory accesses by making the access
address available a cycle earlier than the Am29000 does. The technique is aimed at
reducing the latency of memory load operations, which have a greater influence on
pipeline stalling than store operations. Normally the address of a load appears on the
address bus at the start of the cycle following the execution of the load instruction. If
virtual addressing is in use, the TLB registers are used to perform address translation
during the second half of the load's execute cycle. To save a cycle, the Am29050 must
make the physical address of the load available at the start of the load instruction's
execution. It has two ways of doing this.
The access address of a load instruction is specified by the RB field of the
instruction (see Figure 1–13). A 4–entry Physical Address Cache (PAC) is used to
store the most recent load addresses. The cache entries are tagged with RB field
register numbers. When a load instruction enters the decode stage of the pipeline, its
RB field is compared with the tag of one PAC entry; the PAC is direct-mapped, so the
lower two bits of the register number select which entry is checked. When a match
occurs, the PAC supplies the physical address of the load, thus avoiding the delay of
reading the register file to obtain the address from the register selected by the RB field
of the LOAD instruction. If a PAC miss occurs, the new physical address is written to
the appropriate PAC entry. The user has no means of controlling the PAC; its
operation is completely determined by the processor hardware.
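To make the direct-mapped lookup concrete, here is a minimal Python sketch of a PAC-like structure. It models only the indexing and replacement described above. The `slow_path` callback, the `register_written` invalidation hook, and the register numbers used are assumptions for illustration. The text does not say how the hardware keeps the PAC consistent when a register is rewritten, so the invalidation rule is only a guess at what a correct design needs, not a description of the Am29050.

```python
class PhysicalAddressCache:
    """Toy model of a 4-entry, direct-mapped PAC tagged by RB register number."""

    ENTRIES = 4

    def __init__(self, slow_path):
        # slow_path(rb) stands in for reading register rb and translating the
        # address it holds (hypothetical callback, not a real 29K interface).
        self.slow_path = slow_path
        self.tags = [None] * self.ENTRIES   # register number held in each entry
        self.addrs = [0] * self.ENTRIES     # cached physical load address
        self.hits = 0
        self.misses = 0

    def lookup(self, rb: int) -> int:
        index = rb & 0b11                   # low two bits of the register number pick the entry
        if self.tags[index] == rb:
            self.hits += 1                  # hit: address ready at the start of execution
            return self.addrs[index]
        self.misses += 1
        paddr = self.slow_path(rb)          # miss: take the normal, slower path
        self.tags[index] = rb               # overwrite whatever the entry held
        self.addrs[index] = paddr
        return paddr

    def register_written(self, rn: int) -> None:
        # Assumption: writing a register discards any entry tagged with it,
        # otherwise a later load through that register would use a stale address.
        index = rn & 0b11
        if self.tags[index] == rn:
            self.tags[index] = None


regs = {96: 0x1000, 97: 0x3000, 100: 0x2000}
pac = PhysicalAddressCache(lambda rb: regs[rb])
for rb in (96, 96, 100, 96, 97):       # 96 and 100 both map to entry 0
    pac.lookup(rb)
print(pac.hits, pac.misses)             # 1 4
```

Registers 96 and 100 share entry 0 because their low two bits are equal, so alternating loads through them evict each other. That is the usual cost of direct mapping, traded for a single tag comparison per lookup.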
The second method the Am29050 processor uses to reduce pipeline stalling caused
by memory load latency is the Early Address Generator (EAG). Load addresses are
frequently formed by preceding the load with CONST, CONSTH and ADD type
instructions. These instructions prepare a general purpose register with the address
about to be used by the load. The EAG circuitry continually generates addresses
formed by these instructions, in the hope that a load instruction will immediately
follow and use the newly formed address. The EAG must make use of the TLB address
translation hardware in order to make the physical address available at the start of the