
200
Evaluating and Programming the 29K RISC Family
changed, but fifteen temporary global registers and three static global registers must
be allocated. Note, with the Am29050 processor, only the integer divide instructions
are not directly supported by the processor hardware and require trapware support.
This requires six temporary global registers and no static global registers.
If all of the local registers are given over to User-mode code, then interrupt
and trap handlers must also assume that the local registers are in use. They must not
be overwritten unless the values they contain are saved on entry and restored prior
to exit. If a cache window size (rfb–rab) smaller than the physical register
file size is used, then a number of non-static temporary local registers can be made
available for handler use.
Fortunately, most interrupt handlers can operate very efficiently using only a
few temporary registers. It is recommended that global registers gr64–gr67 (it0–it3)
be allocated for this purpose. Additional temporary registers, kt0–kt3, may also
be used by interrupt handlers if these registers are not used by the operating system.
4.3.2 Interrupt Latency
Determining the number of cycles required to reach the first instruction of an
interrupt or trap handler is a little complicated. First, consider the non-vector-fetch
(table of handlers) method.
An external interrupt line may have to be held active for one cycle before the
processor internally recognizes it. Once the interrupt is recognized, one cycle is
required to internally synchronize the processor. Next, any in-progress load or store
must be completed (Dc cycles, where 0 ≤ Dc ≤ Dw; note that Dw is the number of cycles
required to complete a data memory write and is often greater than Dr, the number of
cycles required to complete a data memory read). One cycle is then required to
calculate the vector. The first instruction can then be fetched (Ir cycles) and
presented to the instruction fetch unit. One cycle is required by the fetch unit and a
further cycle by the decode unit before the instruction reaches execute. If the first
instruction is found in the Branch Target Cache, the cache forwards the instruction
directly to the decode unit, avoiding the Ir-cycle memory fetch. The total latency (a
minimum of five cycles in the hit case) is given by the equations below.
delay(miss) = 1 + 1 + Dc + 1 + Ir + 1 + 1
delay(hit) = 1 + 1 + Dc + 1 + 1 + 1
Now consider the case of a table of vectors, that is, when the VF bit in the CFG
register is set (always the case for 2-bus processors and microcontrollers). The
vector must still be calculated, and any in-progress load or store completed, before
the vector can be fetched from data memory. Additionally, if the processor has a data
cache, the cache state is synchronized after any current data access is completed.