
which achieves virtual memory addressing with little TLB reload activity while
requiring only a small amount of chip area.
Increased performance is achieved by the inclusion of separate 4K-byte instruction
and 2K-byte data caches. As with all 29K instruction caches, address tags are
based on virtual addresses when address translation is turned on. The first processor
in the 29K Family to have a conventional instruction cache was the Am29030. The
Am29240 cache is similar in operation to the Am29030’s cache. However, the
Am29240 processor has four valid bits per cache entry (four instructions) in place of
the previous one bit. This offers a performance advantage as cache blocks need only
be partially filled and need not be fetched according to block boundaries (more on
this in section 5.13.5).
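
As a rough illustration of the per-instruction valid bits, the C sketch below models a
single cache entry holding four instructions, each with its own valid bit, and fills the
entry from the requested word onward. The structure layout, field names, and fill routine
are illustrative assumptions, not the actual Am29240 tag format.

#include <stdint.h>
#include <stdbool.h>

#define WORDS_PER_ENTRY 4           /* one entry holds four 32-bit instructions */

/* Illustrative model of one instruction-cache entry (not the real tag layout). */
struct icache_entry {
    uint32_t vtag;                   /* tag taken from the virtual address      */
    bool     valid[WORDS_PER_ENTRY]; /* Am29240: one valid bit per instruction  */
    uint32_t insn[WORDS_PER_ENTRY];
};

/*
 * On a miss, fetch only from the requested word to the end of the entry.
 * Because each word has its own valid bit, the earlier words can be left
 * invalid; an Am29030-style single valid bit per block would force the
 * whole block to be filled from its boundary.
 */
static void partial_fill(struct icache_entry *e, uint32_t vaddr,
                         uint32_t (*fetch)(uint32_t))
{
    unsigned first = (vaddr >> 2) & (WORDS_PER_ENTRY - 1);

    e->vtag = vaddr >> 4;            /* 16-byte entries: drop the low 4 bits */
    for (unsigned w = 0; w < WORDS_PER_ENTRY; w++)
        e->valid[w] = false;
    for (unsigned w = first; w < WORDS_PER_ENTRY; w++) {
        e->insn[w]  = fetch((e->vtag << 4) | (w << 2));
        e->valid[w] = true;
    }
}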
The data cache always operates with physical addresses. The block size is 16
bytes and there is one valid bit per block. This means that complete data blocks must
be fetched when data cache reload occurs. A "write-through" policy is supported by
the cache, which ensures that external memory is always consistent with cache con-
tents. Cache blocks are only allocated for data loaded from DRAM or ROM address
regions. Access to other address regions is not cached. A two-word write-through
buffer is used to assist with writes to memory. It enables multiple store instructions to
be in execution without the processor pipeline stalling. Data accesses which hit in
the cache require 1-cycle access times. The data cache operation is explained in de-
tail in section 5.14.
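
The behavior described above can be pictured with the following C sketch of a
write-through store passing through a two-word buffer. Only the buffer depth and the
write-through policy are taken from the text; the helper routines and data layout are
hypothetical stand-ins, not the Am29240 implementation.

#include <stdint.h>
#include <stdbool.h>

#define WBUF_DEPTH 2                         /* two-word write-through buffer */

struct wbuf_slot { uint32_t paddr; uint32_t data; };

static struct wbuf_slot wbuf[WBUF_DEPTH];
static int wbuf_count;

/* Hypothetical stand-ins for the cache lookup and the external memory bus. */
static bool dcache_update_if_hit(uint32_t paddr, uint32_t data)
{
    (void)paddr; (void)data;                 /* details omitted in this sketch */
    return false;
}

static void memory_write_word(uint32_t paddr, uint32_t data)
{
    (void)paddr; (void)data;
}

/*
 * Write-through store: the cache (looked up by physical address) is updated
 * only on a hit, and the word is always queued for external memory, so memory
 * never disagrees with the cache.  The store stalls the pipeline only when
 * both buffer slots are already occupied.
 */
bool store_word(uint32_t paddr, uint32_t data)
{
    dcache_update_if_hit(paddr, data);       /* no block allocated on a write  */

    if (wbuf_count == WBUF_DEPTH)
        return false;                        /* buffer full: pipeline stalls   */

    wbuf[wbuf_count].paddr = paddr;
    wbuf[wbuf_count].data  = data;
    wbuf_count++;
    return true;                             /* store retires without stalling */
}

/* Drained by the bus interface whenever external memory is free. */
void wbuf_drain_one(void)
{
    if (wbuf_count > 0) {
        memory_write_word(wbuf[0].paddr, wbuf[0].data);
        wbuf[0] = wbuf[1];
        wbuf_count--;
    }
}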
Scalable bus clocking is supported, enabling the processor to run at twice the
speed of the off-chip memory system. Scalable Clocking was first introduced with
the Am29030 processors, and is described in the previous section covering the
Am29030. If cache hit rates are sufficiently high, Scalable Clocking enables high
performance systems to be built around relatively slow memory systems. It also of-
fers an excellent upgrade path when additional performance is required in the future.
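
To see why cache hit rates matter under Scalable Clocking, the short calculation below
estimates the average cost of an access in processor cycles when the core runs at twice
the bus clock. The 95 percent hit rate and 3-cycle external access are made-up example
figures, not measured Am29240 values.

#include <stdio.h>

/*
 * With Scalable Clocking the processor runs at 2x the memory clock, so every
 * external-memory cycle costs two processor cycles.  A cache hit still costs
 * a single processor cycle.
 */
static double avg_access_cycles(double hit_rate, int miss_mem_cycles)
{
    const int clock_ratio = 2;                    /* core clock / bus clock */
    double miss_cpu_cycles = (double)miss_mem_cycles * clock_ratio;
    return hit_rate * 1.0 + (1.0 - hit_rate) * miss_cpu_cycles;
}

int main(void)
{
    /* Example: 95% hit rate, 3-cycle external access on a miss -> 1.25 cycles. */
    printf("avg = %.2f processor cycles\n", avg_access_cycles(0.95, 3));
    return 0;
}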
Initially the ROM memory region is assumed to have four-cycle access times
(three wait states) and no burst mode, the same as the Am29200. The four banks within
the region can be programmed for zero wait-state reads and one wait-state writes, or
another combination suitable for slower memory devices.
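
The relation between programmed wait states and access time is simple arithmetic: an
access takes one cycle plus the number of wait states. The illustrative helper below
contrasts the reset default of three wait states with a bank reprogrammed for zero
wait-state reads; it is a sketch, not a description of the bank control registers.

/* Cycles for a simple (non-burst) ROM access: one cycle plus the wait states. */
static inline int rom_access_cycles(int wait_states)
{
    return 1 + wait_states;
}

void rom_timing_example(void)
{
    int reset_default = rom_access_cycles(3);  /* 4 cycles, as assumed after reset */
    int zero_ws_read  = rom_access_cycles(0);  /* 1-cycle read after reprogramming */
    (void)reset_default;
    (void)zero_ws_read;
}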
DRAM, unlike ROM, is always assumed to have 3-cycle access times. However,
if page-mode DRAM is used it is possible to achieve 1-cycle burst-mode accesses.
Burst mode is used when consecutive memory addresses are being accessed,
such as during instruction fetching. The Am29200 microcontroller supports 4-cycle
DRAM access with 2-cycle burst. The faster DRAM interface of the Am29240
should result in a substantial performance gain. Additionally, the 3-cycle initial
DRAM access can be reduced to 2-cycle if the required 1-cycle precharge can be
hidden. This is explained in section 1.14.1 under the Am29200 and Am29205 sub-
heading. Consequently, the Am29240 DRAM access is often referred to as 2/1 rather than
3/1.
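
Counting the cycles needed to fill one four-word (16-byte) cache block makes the
difference concrete. The 4/2, 3/1, and 2/1 timings are those quoted above; the helper
function is simply illustrative arithmetic.

#include <stdio.h>

/* Cycles to fetch a block of consecutive words from page-mode DRAM:
 * one initial access followed by burst-mode accesses for the remaining words. */
static int block_fill_cycles(int initial, int burst, int words)
{
    return initial + burst * (words - 1);
}

int main(void)
{
    /* Four-word (16-byte) block, as used by the caches described above. */
    printf("Am29200, 4/2 DRAM: %d cycles\n", block_fill_cycles(4, 2, 4)); /* 10 */
    printf("Am29240, 3/1 DRAM: %d cycles\n", block_fill_cycles(3, 1, 4)); /*  6 */
    printf("Am29240, 2/1 (precharge hidden): %d cycles\n",
           block_fill_cycles(2, 1, 4));                                   /*  5 */
    return 0;
}

At 10 cycles versus 6, or 5 with the precharge hidden, the Am29240 interface fills a
block in roughly half the time taken by the Am29200.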