PNX1300/01/02/11 Data Book
Philips Semiconductors
4-6
PRELIMINARY SPECIFICATION
uclipi (m, n)
{
if (m < 0) return 0;
else if (m > n) return n;
else return m;
}
To make is easier to see how these operations can sub-
sume all the code in
Figure 4-5
,
Figure 4-6
shows the
same code rearranged to group the related functions.
Now it should be clear that the quadavg operation can re-
place the first four lines of the loop assuming that we can
get the individual 8-bit elements of the back[] and for-
ward[] arrays positioned correctly into the bytes of a 32-
bit word. That, of course, is easy: simply align the byte ar-
rays on word boundaries and access them with word (in-
teger) pointers.
Similarly, it should now be clear that the dspuquadaddui
operation can replace the remaining code (except, of
course, for storing the result into the destination[] array)
assuming, as above, that the 8-bit elements are aligned
and packed into 32-bit words.
Figure 4-7
shows the new code. The arrays are now ac-
cessed in 32-bit (int-sized) chunks, the loop iteration con-
trol has been modified to reflect the
‘
four-at-a-time
’
oper-
ations, and the quadavg and dspuquadaddui operations
have replaced the bulk of the loop code. Finally,
Figure 4-8
shows a more compact expression of the loop
code, eliminating the temporary variable. Note that
PNX1300 C compiler does the optimization by itself.
Again, note that the code in
Figure 4-7
and
Figure 4-8
assumes that the character arrays are 32-bit word
aligned and padded if necessary to fill an integral number
of 32-bit words.
The original code required three additions, one shift, two
tests, three loads, and one store per pixel. The new code
using custom operations requires only two custom oper-
ations, three loads, and one store for
four
pixels, which is
more than a factor of six improvement. The actual perfor-
mance improvement can be even greater depending on
how well the compiler is able to deal with the branches in
the original version of the code, which depends in part on
the surrounding code. Reducing the number of branches
almost always improves the chances of realizing maxi-
mum performance on the PNX1300 CPU.
The code in
Figure 4-8
illustrates several aspects of us-
ing custom operations in C-language source code. First,
the custom operations require no special declarations or
syntax; they appear to be simple function calls. Second,
there is no need to explicitly specify register assignments
for sources, destinations, and intermediate results; the
compiler and scheduler assign registers for custom oper-
ations just as they would for built-in language operations
such as integer addition. Third, the scheduler packs cus-
tom operations into PNX1300 VLIW instructions as effec-
tively as it packs operations generated by the compiler
for native language constructs.
Thus, although the burden of making effective use of
custom operations falls on the programmer, that burden
consists only of discovering the opportunities for exploit-
ing the operations and then coding them using standard
C-language notation. The compiler and scheduler take
care of the rest.
void reconstruct (unsigned char *back,
unsigned char *forward,
char *idct,
unsigned char *destination)
{
int i, temp;
for (i = 0; i < 64; i += 4)
{
temp = ((back[i+0] + forward[i+0] + 1) >> 1) + idct[i+0];
if (temp > 255) temp = 255;
else if (temp < 0) temp = 0;
destination[i+0] = temp;
temp = ((back[i+1] + forward[i+1] + 1) >> 1) + idct[i+1];
if (temp > 255) temp = 255;
else if (temp < 0) temp = 0;
destination[i+1] = temp;
temp = ((back[i+2] + forward[i+2] + 1) >> 1) + idct[i+2];
if (temp > 255) temp = 255;
else if (temp < 0) temp = 0;
destination[i+2] = temp;
temp = ((back[i+3] + forward[i+3] + 1) >> 1) + idct[i+3];
if (temp > 255) temp = 255;
else if (temp < 0) temp = 0;
destination[i+3] = temp;
}
}
Figure 4-5. MPEG frame reconstruction code using PNX1300 custom operations; compare with
Figure 4-4
.