PRELIMINARY SPECIFICATION
Custom Operations for Multimedia
Chapter 4
by Gert Slavenburg, Pieter v.d. Meulen, Yong Cho, Sang-Ju Park
4.1 CUSTOM OPERATIONS OVERVIEW
In this document, the generic PNX1300 name refers
to the PNX1300 Series, or the PNX1300/01/02/11
products.
Custom operations in the PNX1300 DSPCPU architecture
are specialized, high-function operations designed to
dramatically improve performance in important multimedia
applications. When properly incorporated into application
source code, custom operations enable an application to
take advantage of the highly parallel PNX1300
microprocessor implementation. Achieving a similar
performance increase through other means (e.g., executing
a higher number of traditional microprocessor instructions
per cycle) would be prohibitively expensive for PNX1300's
low-cost target applications.
Custom operations are simple to understand and consistent
in their definition, but their unusual functions make it
difficult for automatic code generation algorithms to use
them effectively. Consequently, custom operations are
inserted into source code by the programmer. To make
this process as painless as possible, custom operation
syntax is consistent with the C programming language,
and, just as with all other operations generated by the
compiler, the scheduler takes care of register allocation,
operation packing, and flow analysis.
4.1.1 Custom Operation Motivation
For both general-purpose and embedded
microprocessor-based applications, programming in a
high-level language is desirable. To effectively support
optimizing compilers and a simple programming model,
certain microprocessor architecture features are needed,
such as a large, linear address space, general-purpose
registers, and register-to-register operations that directly
support the manipulation of linear address pointers. A
common choice in microprocessor architectures is 32-bit
linear addresses, 32-bit registers, and 32-bit integer
operations. PNX1300 is such a microprocessor architecture.
For the data manipulation in many algorithms, however,
32-bit data and operations are wasteful of expensive
silicon resources. Important multimedia applications, such
as the decompression of MPEG video streams, spend
significant amounts of execution time dealing with
eight-bit data items. Using 32-bit operations to manipulate
small data items makes inefficient use of 32-bit execution
hardware in the implementation. If these 32-bit resources
could be used instead to operate on four eight-bit data
items simultaneously, performance would be improved
by a significant factor with only a tiny increase in
implementation cost.
Getting the highest execution rate from standard
microprocessor resources is one of the motivations behind
custom operations in PNX1300. A range of custom
operations is provided, each of which simultaneously
processes four 8-bit or two 16-bit data items. There is
little cost difference between a standard 32-bit ALU and
one that can process either one pair of 32-bit operands or
four pairs of eight-bit operands, but there is a big
performance difference for PNX1300's target applications.
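To make the "four eight-bit items in one 32-bit word" idea concrete, the following is a minimal sketch in portable C. It is not a PNX1300 custom operation: packed_add8 and scalar_add8 are hypothetical helper names introduced here for illustration, and the wrap-around (modulo 256) behavior of each byte lane is an assumption of the sketch. It shows only that a single 32-bit datapath can produce four independent byte results in one pass, where a scalar loop needs four.

```c
#include <stdint.h>

/* Hypothetical helper, NOT a PNX1300 operation: adds the four 8-bit
 * lanes packed in two 32-bit words. Each lane wraps modulo 256 and
 * no carry leaks into the neighboring lane. */
static uint32_t packed_add8(uint32_t a, uint32_t b)
{
    /* Add the low 7 bits of every lane; the cleared top bit of each
     * lane absorbs the carry, so lanes cannot disturb each other. */
    uint32_t low = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);

    /* Fold the top bit of each lane back in with a carry-less add. */
    return low ^ ((a ^ b) & 0x80808080u);
}

/* Reference version: one byte lane per loop iteration. */
static uint32_t scalar_add8(uint32_t a, uint32_t b)
{
    uint32_t r = 0;
    for (int k = 0; k < 4; k++) {
        uint32_t s = ((a >> (8 * k)) + (b >> (8 * k))) & 0xFFu;
        r |= s << (8 * k);
    }
    return r;
}
```

Both functions return the same word for any inputs. The packed version uses a handful of 32-bit operations regardless of lane count, which is the effect the custom operations obtain directly in hardware.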
PNX1300's custom operations go beyond simply making
the best use of standard resources. Some custom
operations combine several simple operations. These
combinations are tailored specifically to the needs of
important multimedia applications. Some high-function
custom operations eliminate conditional branches, which
helps the scheduler make effective use of all five operation
slots in each PNX1300 instruction. Filling up all five slots
is especially important in the inner loops of
computationally intensive multimedia applications.
In short, custom operations help PNX1300 reach its
goals of extremely high multimedia performance at the
lowest possible cost.
4.1.2 Introduction to Custom Operations
Table 4-1 and Table 4-2 contain two listings of the
custom operations available in the PNX1300 architecture.
Table 4-1 groups the custom operations by type of
function, while Table 4-2 lists the operations by operand
size. For more detailed information about the custom
operations, see Appendix A, “PNX1300/01/02/11 DSPCPU
Operations.”
Some operations exist in several versions that differ in
the treatment of their operands and results, and the
mnemonics for these versions make it easy to select the
appropriate operation. For example, the sum-of-products
operations all have “fir” in their mnemonics; the prefix
and suffix of the mnemonic express the treatment of the
operands and result. The ifir8ii operation treats both of
its operands as signed (the “ii” suffix) and produces a
signed result (the “i” prefix). The ifir8iu operation treats
its first operand as signed (the first suffix letter, “i”), the
second as unsigned (the second suffix letter, “u”), and
produces a signed result (the “i” prefix). The ume8ii
operation implements an eight-bit motion estimation; it
treats both operands as signed but produces an unsigned
result.
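The naming scheme above can be sketched as a small reference model in portable C. This is an illustration under stated assumptions, not the authoritative definition: the model_ function names are hypothetical, the lane numbering (lane 0 is the least significant byte) is an assumption, and modeling motion estimation as a sum of absolute byte differences is an assumption as well. Appendix A remains the reference for exact behavior.

```c
#include <stdint.h>

/* Byte lane k of a 32-bit word, read as unsigned (0..255).
 * Lane 0 = least significant byte (an assumption of this sketch). */
static uint32_t lane_u(uint32_t w, int k)
{
    return (w >> (8 * k)) & 0xFFu;
}

/* Byte lane k of a 32-bit word, read as signed (-128..127). */
static int32_t lane_s(uint32_t w, int k)
{
    int32_t v = (int32_t)lane_u(w, k);
    return v >= 128 ? v - 256 : v;
}

/* Model of ifir8ii: signed x signed bytes, signed sum of products. */
static int32_t model_ifir8ii(uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (int k = 0; k < 4; k++)
        sum += lane_s(a, k) * lane_s(b, k);
    return sum;
}

/* Model of ifir8iu: first operand signed, second unsigned,
 * signed sum of products. */
static int32_t model_ifir8iu(uint32_t a, uint32_t b)
{
    int32_t sum = 0;
    for (int k = 0; k < 4; k++)
        sum += lane_s(a, k) * (int32_t)lane_u(b, k);
    return sum;
}

/* Model of ume8ii: both operands signed, unsigned result, taken
 * here as the sum of absolute differences (assumed). */
static uint32_t model_ume8ii(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (int k = 0; k < 4; k++) {
        int32_t d = lane_s(a, k) - lane_s(b, k);
        sum += (uint32_t)(d < 0 ? -d : d);
    }
    return sum;
}
```

The only difference between the two "fir" models is how the second operand's bytes are read. With a top byte of 0xFF in both operands, the signed-by-signed version contributes (-1) x (-1) = 1 for that lane, while the signed-by-unsigned version contributes (-1) x 255 = -255.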
The operations beginning with “dsp”
implement a clip-
ping (sometimes called saturating) function before stor-