
time stalled waiting for instructions. To maximize the performance of conditional branch instructions, the
Version 4 IFP implements a sophisticated two-level acceleration mechanism.
The first level is an 8-entry, direct-mapped branch cache with a 2-bit prediction state (strongly or weakly, taken or not-taken) for each entry. The branch cache implements instruction folding techniques that allow conditional branch instructions that are correctly predicted as taken to execute in zero cycles.
For conditional branches with no information in the branch cache, a second-level, direct-mapped prediction table containing 128 entries is accessed. Each entry uses the same 2-bit prediction state definition as the branch cache; this state is then used to predict the direction of prefetched conditional branch instructions.
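To make the 2-bit prediction state concrete, the C sketch below models a direct-mapped table of two-bit saturating counters. The table size mirrors the 128-entry second-level table, but the indexing, state encoding, and function names (pred_index, predict_taken, update_prediction) are illustrative assumptions, not a description of the MCF5407 hardware.

#include <stdbool.h>
#include <stdint.h>

/* Illustrative 2-bit prediction states: 0 = strongly not-taken,
 * 1 = weakly not-taken, 2 = weakly taken, 3 = strongly taken. */
#define PRED_ENTRIES 128u   /* mirrors the 128-entry second-level table */

static uint8_t pred_table[PRED_ENTRIES];   /* each entry holds a 2-bit counter */

/* Direct-mapped lookup: index with low-order instruction address bits.
 * The exact indexing used by the hardware is not described here. */
static unsigned pred_index(uint32_t pc)
{
    return (pc >> 1) & (PRED_ENTRIES - 1u);
}

/* Predict taken when the entry is in a taken state (weakly or strongly). */
bool predict_taken(uint32_t pc)
{
    return pred_table[pred_index(pc)] >= 2u;
}

/* Update the saturating counter with the resolved branch outcome. */
void update_prediction(uint32_t pc, bool taken)
{
    uint8_t *state = &pred_table[pred_index(pc)];
    if (taken && *state < 3u)
        (*state)++;
    else if (!taken && *state > 0u)
        (*state)--;
}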
Other change-of-flow instructions, including unconditional branches, jumps, and subroutine calls, use a
similar mechanism where the IFP calculates the target address. The performance of subroutine return
instructions is improved through the use of a four-entry, LIFO return stack.
In all cases, these mechanisms allow the IFP to redirect the fetch stream down the path predicted to be taken
well in advance of the actual instruction execution. The net effect is significantly improved performance.
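As a rough software analogue of the return stack, here is a minimal C sketch of a four-entry LIFO return-address stack; the overflow and underflow handling and the names (ras_push, ras_pop) are assumptions for illustration, not the hardware behavior.

#include <stdint.h>

#define RAS_DEPTH 4u   /* four-entry LIFO, as in the Version 4 IFP */

static uint32_t ras[RAS_DEPTH];   /* illustrative return-address stack */
static unsigned ras_count;        /* number of valid entries, 0..RAS_DEPTH */

/* On a subroutine call, push the return address (the instruction following
 * the call). When the stack is full, the oldest entry is discarded. */
void ras_push(uint32_t return_address)
{
    if (ras_count == RAS_DEPTH) {
        for (unsigned i = 1; i < RAS_DEPTH; i++)   /* drop the oldest entry */
            ras[i - 1u] = ras[i];
        ras_count--;
    }
    ras[ras_count++] = return_address;
}

/* On a subroutine return, pop the predicted return address so the fetch
 * stream can be redirected before the actual target is known.
 * Returns 0 when the stack is empty (no prediction available). */
uint32_t ras_pop(void)
{
    return (ras_count > 0u) ? ras[--ras_count] : 0u;
}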
1.3.1.2 Operand Execution Pipeline (OEP)
The prefetched instruction stream is gated from the FIFO buffer into the five-stage OEP. The OEP consists of two traditional two-stage RISC compute engines, each comprising a register file access feeding an arithmetic/logic unit (ALU). The compute engine at the top of the OEP is typically used for operand memory address calculations (the address ALU), while the compute engine at the bottom of the pipeline is used for instruction execution (the execution ALU). The resulting structure provides 3.9 Gbytes/s of data operand bandwidth at 162 MHz to the two compute engines and supports single-cycle execution speeds for most instructions, including all load and store operations and most embedded-load operations. In response to users and developers, the V4 design supports execution of the ColdFire Revision B instruction set, which adds a small number of new instructions to improve performance and code density.
The OEP also implements two advanced performance features. It dynamically determines the appropriate
location of instruction execution (either in the address ALU or the execution ALU) based on the pipeline
state. The address compute engine, in conjunction with register renaming resources, can be used to execute a number of heavily used opcodes and forward the results to subsequent instructions without any pipeline
stalls. Additionally, the OEP implements instruction folding techniques involving MOVE instructions so
that two instructions can be issued in a single machine cycle. The resulting microarchitecture approaches
the performance of a full superscalar implementation, but at a much lower silicon cost.
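As a conceptual illustration of MOVE folding, the toy C model below issues a qualifying MOVE together with the instruction that follows it and counts the machine cycles a straight-line sequence would take. The actual pairing rules of the V4 OEP are more restrictive than this single flag suggests; the model is a sketch of the idea only.

#include <stdbool.h>
#include <stddef.h>

/* Toy model of MOVE instruction folding: a decoded instruction is either
 * a MOVE or some other opcode. This sketch only captures the idea that a
 * qualifying MOVE can issue together with the instruction that follows it. */
typedef struct {
    bool is_move;   /* true if the opcode is a MOVE variant */
} decoded_insn;

/* Return how many instructions issue this cycle: two when the leading
 * instruction is a MOVE that can be folded with its successor, else one. */
static size_t issue_width(const decoded_insn *stream, size_t remaining)
{
    if (remaining >= 2 && stream[0].is_move)
        return 2;   /* folded pair: MOVE plus the following instruction */
    return 1;
}

/* Count the machine cycles a straight-line sequence would take under this
 * simplified single-cycle, fold-when-possible issue model. */
size_t cycles_for(const decoded_insn *stream, size_t n)
{
    size_t cycles = 0;
    for (size_t i = 0; i < n; i += issue_width(stream + i, n - i))
        cycles++;
    return cycles;
}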
1.3.1.3 MAC Module
The MAC unit provides signal processing capabilities for the MCF5407 in a variety of applications, including digital audio and servo control. Integrated as an execution unit in the processor's OEP, the MAC unit implements a three-stage arithmetic pipeline optimized for 16 x 16 multiplies. The design supports both 16- and 32-bit input operands, with a full set of extensions for signed and unsigned integers plus signed, fixed-point fractional input operands.
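As a software analogue of the signed fractional (fixed-point) mode, the C sketch below performs a Q15 multiply-accumulate into a 32-bit accumulator and uses it in a small dot-product kernel of the kind found in digital audio filters. The data types, scaling, and names are illustrative; this is not the MCF5407 MAC programming model, and saturation handling is omitted.

#include <stdint.h>

/* Illustrative signed Q15 multiply-accumulate: multiply two 16-bit
 * fractional operands in [-1, 1), rescale the Q30 product back to Q15,
 * and add it to a 32-bit accumulator. Assumes arithmetic right shift. */
static int32_t mac_q15(int32_t acc, int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;   /* 16 x 16 -> 32-bit product */
    return acc + (product >> 15);                /* accumulate in Q15 */
}

/* Example: a dot-product kernel such as an FIR filter tap loop. */
int32_t dot_q15(const int16_t *x, const int16_t *h, unsigned n)
{
    int32_t acc = 0;
    for (unsigned i = 0; i < n; i++)
        acc = mac_q15(acc, x[i], h[i]);
    return acc;
}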
1.3.1.4 Integer Divide Module
Some embedded applications can benefit greatly from the integer divide unit. Integrated as another engine in the processor’s OEP, the divide module performs a variety of operations using signed and unsigned integers. The module supports word and longword divides, producing quotients and/or remainders.