
Creating Performance with Stamina for Wireless Communications White Paper, Rev. 2
Freescale Semiconductor
operation. The i.MX cycles are engineered to clock on and off at the individual instruction level, not
only in the ARM core but in the peripherals as well.
Partitioning: Many tasks in an embedded system can be implemented either in dedicated hardware or in
software on a programmable core. Generally, software is more flexible and cheaper, but hardware is faster
and consumes less current. The challenge is to partition tasks between hardware and software so that each
is used where its attributes matter most, yielding the fastest yet most efficient solution. Speed and
efficiency improve when cycle-intensive tasks are committed to hardware accelerators rather than to
software. However, this must be done in a manner that preserves the flexibility of software where that
flexibility is most important.
For instance, stable, compute-intensive functions that are required in key applications can be built in
hardware to get the performance edge when flexibility is not the major issue. For functions that are less
well defined and can change frequently (such as Digital Rights Management, where there is no universal
standard), the value of software flexibility may outweigh the benefits of dedicated hardware. Intelligent
hardware/software partitioning decisions of this sort go a long way toward high-performance,
low-power results.
It’s Not About the Megahertz
High-megahertz processors on a very wide, very fast bus can afford to run flat-out, pushing data
unencumbered down the pipe. However, in portable wireless systems, buses tend to be much slower and
narrower and can only carry so much data at a time. When you increase the CPU clock, you need to feed
it with data and instructions at a matching rate. If that traffic hits a bottleneck on the bus, for example
when reading and writing external main memory, an over-clocked processor simply burns excess energy on
the extra megahertz while it waits, and drains the battery. This can be somewhat mitigated by employing L2
(Level 2) cache in the system, which stores the most critical and frequently used information in SRAM very
close to the processor. Future i.MX offerings are expected to employ L2 cache, which reduces the number
of times the CPU has to hit the external memory bottleneck, but that alone doesn’t address the limitations
of the entire system bus.
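The argument above can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from any i.MX data sheet: the function names, the workload figures, and the power coefficient are all hypothetical, and it assumes that core power scales linearly with clock frequency (no voltage scaling), that the core keeps drawing power while stalled on memory, and that L2 hits cost nothing.

```python
def run_time_s(cpu_mhz, compute_cycles, mem_accesses, mem_latency_ns, l2_hit_rate=0.0):
    """Estimate run time for a workload (hypothetical model).

    Compute time shrinks as the clock rises; external-memory stall time does not.
    l2_hit_rate is the assumed fraction of accesses served by an L2 cache,
    treated here as free for simplicity.
    """
    compute_s = compute_cycles / (cpu_mhz * 1e6)
    misses = mem_accesses * (1.0 - l2_hit_rate)
    stall_s = misses * mem_latency_ns * 1e-9
    return compute_s + stall_s


def energy_mj(cpu_mhz, time_s, mw_per_mhz):
    """Energy in millijoules, assuming power is proportional to clock
    and is drawn for the whole run, including memory stalls."""
    return cpu_mhz * mw_per_mhz * time_s  # mW * s = mJ


if __name__ == "__main__":
    # Made-up workload: 100 M compute cycles, 2 M external accesses at 100 ns each.
    work = dict(compute_cycles=100e6, mem_accesses=2e6, mem_latency_ns=100)
    for mhz, hit in [(100, 0.0), (200, 0.0), (200, 0.8)]:
        t = run_time_s(mhz, l2_hit_rate=hit, **work)
        e = energy_mj(mhz, t, mw_per_mhz=0.5)
        print(f"{mhz} MHz, L2 hit rate {hit:.0%}: {t:.2f} s, {e:.0f} mJ")
```

With these made-up numbers, doubling the clock from 100 MHz to 200 MHz shortens the run from 1.2 s to 0.7 s (about 1.7 times faster, not 2 times) while energy rises from 60 mJ to 70 mJ, because the memory stall time is unchanged. Adding an assumed 80% L2 hit rate at 200 MHz brings the run to 0.54 s and 54 mJ, which matches the qualitative point that a cache helps but does not remove the bus limitation.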
We have significant experience managing system buses to get the most out of our processors and to strike
the optimum balance of performance and power consumption. Our CPU Platform based on the ARM926T
core (used in our second generation i.MX solutions) is designed with a multi-master crossbar switch that
allows multiple masters (including the CPU) to talk with multiple slaves (peripherals)
simultaneously. This type of switch design has been used for some time in large high-speed data networks,
but now we have successfully transferred this technology to the portable wireless environment. The
crossbar switch allows us to maximize system-level performance by keeping all the engines fed with
data and instructions so there is little “dead time” and less “waiting” for another data transfer to complete.
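To make the benefit of a crossbar concrete, here is a toy cycle-count comparison against a single shared bus. It is only a sketch under simplifying assumptions, not a model of the actual i.MX switch: every transfer takes one cycle, each master issues at most one transfer per cycle, each slave serves at most one master per cycle, and arbitration is a simple rotating priority. The function names and the slave names (`sdram`, `lcdc`, `usb`) are hypothetical.

```python
from collections import deque


def cycles_shared_bus(queues):
    """Shared bus: only one transfer can happen per cycle, whoever owns the bus."""
    return sum(len(q) for q in queues)


def cycles_crossbar(queues):
    """Crossbar: masters whose next transfers target different slaves proceed in parallel."""
    pending = [deque(q) for q in queues]
    n = len(pending)
    cycles = 0
    start = 0  # index of the master with highest priority this cycle
    while any(pending):
        busy = set()  # slaves already claimed in this cycle
        for i in range(n):
            q = pending[(start + i) % n]
            if q and q[0] not in busy:
                busy.add(q.popleft())  # transfer granted
        start = (start + 1) % n  # rotate priority for fairness
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Three masters, each streaming four transfers to a different slave.
    disjoint = [["sdram"] * 4, ["lcdc"] * 4, ["usb"] * 4]
    # Worst case: every master wants the same slave.
    contended = [["sdram"] * 4] * 3
    for name, traffic in [("disjoint", disjoint), ("contended", contended)]:
        print(name, cycles_shared_bus(traffic), cycles_crossbar(traffic))
```

In this toy model the disjoint traffic pattern needs 12 cycles on the shared bus but only 4 through the crossbar, while the fully contended pattern needs 12 cycles either way. The gain therefore depends on how well traffic is spread across slaves, which is why the switch pays off when several engines are each working with their own peripheral or memory.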
Improved performance is not just about cranking up the clock and getting more megahertz through the
CPU; it’s about using the megahertz you have to get more work from the entire system without wasting
any time or energy in the process. This will become more critical as we move from GSM (Global System
for Mobile communications) to far more sophisticated and feature-rich technologies, such as General
Packet Radio Service (GPRS), Enhanced Data rates for GSM Evolution (EDGE), and Universal Mobile
Telecommunications System (UMTS). Smartphones and PDAs are expected to employ combinations of