Evaluating and Programming the 29K RISC Family
Some applications may be more, or less, affected by changes in the window size;
it very much depends on the procedure’s register requirements and on the level of
procedure nesting. To study these effects further, the Am29040 based system was
used to run the Stanford benchmark. Separate results for six routines taken from
the integer part of the Stanford code are shown in Figure 8-12. The
routines were chosen because of their diversity in function and similarity in
execution times. This similarity made for clearer scaling and hence easier
comparison of the results.
Routines which have a small register requirement are unaffected by the
reduction in window size. The Towers and Queens tests rely on recursive
procedure calls. Consequently, these routines show a marked loss of performance
when operating with reduced window sizes. Even so, the reduction in performance
is less than one might expect. Applications which have a small dynamic
register-stack requirement experience moderate loss of performance when
operating with a reduced window size. However, future 29K processors which use
3x or 4x scalable clocking technology and superscalar execution are likely to
show a relatively greater loss of performance with reduced window sizes. At
higher execution speeds, the cost of going off-chip is increased.
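The effect on recursive routines can be illustrated with a small model of the
register cache. The sketch below is a simplification, not the actual 29K spill
and fill handlers: it assumes every activation record has the same size, counts
only the words moved to and from memory, and ignores handler overhead. The names
RegCache, enter, leave, towers and cache_traffic are hypothetical; rsp, rab and
rfb are loosely modeled on the register stack pointer and the allocate and free
bounds.

```c
#include <assert.h>

/* Simplified register-cache model (an illustration only, not 29K handler code).
 * Positions are word offsets; the register stack grows toward lower offsets. */
typedef struct {
    long rsp;      /* top of the current activation record          */
    long rab;      /* lowest offset that may be held in the cache   */
    long rfb;      /* one past the highest offset held in the cache */
    long spilled;  /* words written out to memory so far            */
    long filled;   /* words read back from memory so far            */
} RegCache;

/* Procedure entry: allocate `frame` words, spilling if the window overflows. */
static void enter(RegCache *c, long frame)
{
    c->rsp -= frame;
    if (c->rsp < c->rab) {
        long n = c->rab - c->rsp;   /* words that no longer fit */
        c->rab -= n;
        c->rfb -= n;
        c->spilled += n;
    }
}

/* Procedure exit: free the record, then fill if the caller's record
 * (assumed to be `caller_frame` words) is not fully in the cache. */
static void leave(RegCache *c, long frame, long caller_frame)
{
    long caller_top;

    c->rsp += frame;
    caller_top = c->rsp + caller_frame;
    if (caller_top > c->rfb) {
        long n = caller_top - c->rfb;   /* words missing from the cache */
        c->rfb += n;
        c->rab += n;
        c->filled += n;
    }
}

/* Call pattern of a Towers-like routine: two recursive calls per level. */
static void towers(RegCache *c, int discs, long frame)
{
    enter(c, frame);
    if (discs > 1) {
        towers(c, discs - 1, frame);
        towers(c, discs - 1, frame);
    }
    leave(c, frame, frame);
}

/* Total words spilled plus filled for a given window size (in words). */
long cache_traffic(long window, int discs, long frame)
{
    RegCache c;

    assert(window >= frame && frame > 0 && discs >= 1);
    c.rfb = frame;            /* the caller of towers() owns one record */
    c.rab = c.rfb - window;
    c.rsp = 0;
    c.spilled = 0;
    c.filled = 0;
    towers(&c, discs, frame);
    return c.spilled + c.filled;
}
```

With illustrative 12-word records, cache_traffic(128, 8, 12) reports no traffic
because nine records fit in a 128-word window, whereas a 64-word window starts
spilling at the fifth level of recursion. A routine with shallow nesting never
reaches the window limit in this model and is unaffected.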
We have looked at the loss of performance associated with reduced window
sizes, but what, if any, are the benefits? It was already stated that task context
switching can be improved. This is true, but needs further explanation. Most
operating system manufacturers provide basic context switch times for benchmarks
run on their product. These benchmarks typically indicate a raw context switch time
of 10 to 20 microseconds. Longer or shorter times are possible depending on the
implementation and the speed of the system memory. Benchmark programs usually
measure synchronous context switch times; these are shorter than asynchronous
switch times. When a synchronously saved context is switched in, only the current
activation record need be restored in the register cache (typically 12 registers). With
an asynchronously saved context, the register cache must be restored to the position
at which the context was saved (several activation records). Hence, asynchronous
switches take longer than synchronous switches. How much longer depends on a
number of factors.
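The difference between the two cases can be expressed as a count of registers
to restore. The following is a rough sketch, assuming the cached activation
records are described by an array of sizes with the current record first. The
function name, the array layout and the record sizes in the usage note are
hypothetical, and the cap assumes the register cache never holds more than the
window size.

```c
#include <stddef.h>

/* Registers that must be restored when a saved context is switched in.
 * record_sizes[0] is the current activation record; the remaining entries
 * are the older records that were in the cache when the context was saved.
 * Illustrative model only, not operating-system code. */
long registers_to_restore(const long *record_sizes, size_t count,
                          int synchronous, long window)
{
    long total = 0;
    size_t i;

    if (count == 0)
        return 0;
    if (synchronous)
        return record_sizes[0];       /* only the current record */

    for (i = 0; i < count; i++)       /* every record cached at save time */
        total += record_sizes[i];
    return total > window ? window : total;   /* cache holds at most `window` */
}
```

For example, with four cached records of 12, 10, 14 and 12 registers, a
synchronous switch restores 12 registers while an asynchronous switch restores
48, or 32 if the window is limited to 32 registers.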
An operating system may be written in C, and the context switch code may
occur at a depth of several levels of procedure nesting. As well as these operating
system related activation records, the register cache will contain activation records
relating to the application task. It is not possible to state, in general, just how much of
the register cache is in use at the point the context switch occurs; but certainly the
worst-case condition is known. The maximum number of local registers which
would require saving or restoring is limited to the window size. Consequently,
reducing the window size reduces the worst-case context switch time. It will have