Chapter 5 Operating System Issues
most recent) value. A solution to this problem is for each processor to access the
shared memory pages through virtual addresses and mark those pages as non-cacheable.
However, unless all memory is marked non-cacheable, the plan would require that
software arrange for data intended for shared memory to appear in a range of
contiguous non-cached memory. There would need to be an agreement with the
operating system that the selected address range was not to be cached. Such a
mechanism would be undesirable, inflexible, and difficult to retrofit to existing
software.
With systems incorporating multiple Am29040 processors, each processor may
cache the same memory location. This is desirable, as access to the cache is much
faster than off-chip access. The processor supports three interface signal pins that
facilitate “bus watching” for data reads with cache block granularity. The technique
requires little software support, and existing programs can benefit without any
modifications. The on-chip protocol supporting the interface signals ensures that
each memory access is consistent.
When a load is performed, all processors watching the bus determine if they
have a currently cached copy of the requested data. If they do, they assert the HIT
signal pin. The protocol will enable one cache to identify itself as the owner of the
data. This cache will assert both the HIT and the DI (“data intervention”) signals. The
processor requesting the load is satisfied by the intervening cache. The load will
cause a block to be allocated with the S bit set in the tag. This indicates the data is
shared. The processor can continue to access the data from the cache. Additionally,
all processors asserting the HIT signal will realize that another processor is sharing
the data and will set the S bit in their cached copy. If any processor modifies a block
tagged with the same address, that processor will perform a “write broadcast” as a
result of the S bit being set. This does not cause the system memory to be updated, but
enables the snooping processors to update their cached copies. A processor asserts
the WBC signal pin during the write broadcast and becomes the owner of the shared
block. The processor will remain the owner of the block until another processor gains
ownership by performing a write broadcast itself. When a processor performs a write
broadcast, it checks whether another processor is asserting the HIT signal. If not,
the processor realizes it is now the only processor caching the data and therefore
clears the S bit.
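
The sequence above can be followed in a small software model. The sketch below is not Am29040 code and does not describe the hardware implementation; it is a minimal C simulation, under stated assumptions, of one memory word cached by three processors. The names Block, load, store, drop, and memory_word are hypothetical. The model assumes that a cache loading a block with no HIT response is treated as its initial owner, that a writer already holds the block, and it leaves out the M bit and copy-back to memory.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 3

/* Hypothetical model: each processor caches at most one block,
   and all blocks refer to the same memory address. */
typedef struct {
    bool valid;   /* block is present in this cache            */
    bool shared;  /* S bit from the block tag                  */
    bool owner;   /* this cache asserts DI on snooped reloads  */
    int  data;    /* cached value                              */
} Block;

static Block cache[NCPU];      /* zero-initialized: all blocks invalid */
static int   memory_word = 7;  /* system memory copy of the word       */

/* Load by processor cpu. Other caches watch the bus: every cache holding
   the block asserts HIT and sets its S bit; the owner also asserts DI and
   supplies the data in place of memory. */
int load(int cpu)
{
    if (cache[cpu].valid)
        return cache[cpu].data;          /* cache hit: no bus access */

    bool hit = false;
    int  di  = -1;                       /* index of intervening cache */
    for (int i = 0; i < NCPU; i++) {
        if (i == cpu || !cache[i].valid)
            continue;
        hit = true;                      /* snooper asserts HIT */
        cache[i].shared = true;          /* and marks its copy shared */
        if (cache[i].owner)
            di = i;                      /* owner asserts DI as well */
    }

    int value = (di >= 0) ? cache[di].data : memory_word;

    /* Assumption of this model: with no HIT, the loader is the sole
       holder and is treated as the initial owner. */
    cache[cpu] = (Block){ .valid = true, .shared = hit,
                          .owner = !hit, .data = value };
    return value;
}

/* Store by processor cpu. Returns true if a write broadcast was issued.
   Precondition (model assumption): the writer already caches the block. */
bool store(int cpu, int value)
{
    assert(cache[cpu].valid);
    cache[cpu].data = value;
    if (!cache[cpu].shared)
        return false;                    /* private block: no broadcast */

    bool hit = false;                    /* writer asserts WBC */
    for (int i = 0; i < NCPU; i++) {
        if (i == cpu || !cache[i].valid)
            continue;
        hit = true;                      /* snooper asserts HIT */
        cache[i].data  = value;          /* snooper updates its copy */
        cache[i].owner = false;          /* ownership moves to the writer */
    }
    cache[cpu].owner = true;
    if (!hit)
        cache[cpu].shared = false;       /* nobody else caches it: clear S */
    return true;                         /* memory_word is left unchanged */
}

/* Hypothetical helper: discard a non-owner copy, standing in for a
   block replacement. */
void drop(int cpu)
{
    assert(!cache[cpu].owner);
    cache[cpu].valid = false;
}
```

In this model, load(0) followed by load(1) leaves both copies with the S bit set. A subsequent store(1, 42) updates processor 0's copy, makes processor 1 the owner, and leaves memory_word at 7. A later load(2) is then satisfied by processor 1 rather than by memory.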
To summarize, bus watching of reloads is used to detect sharing of data. When
data is shared, all caches set the S bit in the cached block. The processor which
satisfied the block reload (in place of the memory) is the owner of the block and has
the S and M (modified) bits set in the block tag. Writes to shared data create write
broadcasts on the bus to inform other caches of the change of value. Ownership of a
block is transferred to the processor performing the write broadcast. Cache-to-cache
communication via write broadcasts is much faster than accessing system memory.