
TM1300 Data Book
Philips Semiconductors
PRODUCT SPECIFICATION
Figure 4-9 shows the original source code for the
match-cost loop. Unlike the previous example, the code is
not a self-contained function. Somewhere early in the
code, the arrays A[][] and B[][] are declared; somewhere
between those declarations and the loop of interest, the
arrays are filled with data.
4.4.1 A Simple Transformation
First, we will look at the simplest way to use a TM1300
custom operation.
We start by noticing that the computation in the loop of
Figure 4-9 involves the absolute value of the difference
of two unsigned characters (bytes). By now, we are
familiar with the fact that TM1300 includes a number of
operations that process all four bytes in a 32-bit word
simultaneously. Since the match-cost calculation is
fundamental to the MPEG algorithm, it is not surprising
to find a custom operation, ume8uu, that implements
this operation exactly.
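To make the behavior concrete, the following is a minimal portable sketch of what such an operation computes, assuming ume8uu sums the absolute differences of the four unsigned byte pairs packed in its two 32-bit operands. The function name ume8uu_emul is hypothetical and is not part of any TM1300 header; it is a host-side stand-in for checking results, not the custom operation itself.

```c
#include <stdint.h>

/* Hypothetical host-side emulation (assumption: the operation returns the
   sum of |a_k - b_k| over the four unsigned bytes of each 32-bit word). */
uint32_t ume8uu_emul(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    int lane;

    for (lane = 0; lane < 4; lane += 1) {
        /* Extract the same byte position from each operand. */
        int x = (int)((a >> (8 * lane)) & 0xFFu);
        int y = (int)((b >> (8 * lane)) & 0xFFu);

        /* Add the absolute difference of the two byte values. */
        sum += (uint32_t)(x > y ? x - y : y - x);
    }
    return sum;
}
```

On the TM1300 the four-iteration loop above would correspond to a single operation, which is why creating four parallel pixel computations is the goal of the transformations that follow.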
To understand how ume8uu can be used in this case, we
need to transform the code as in the previous example.
Though the steps are presented here in detail, a
programmer with even a little experience can often
perform these transformations by visual inspection.
To use a custom operation that processes 4 pixel values
simultaneously, we first need to create 4 parallel pixel
computations. Figure 4-10 shows the loop of Figure 4-9
unrolled by a factor of 4. Unfortunately, the code in the
unrolled loop is not parallel because each line depends
on the one above it: every statement updates the same
cost variable. Figure 4-11 shows a more parallel
version of the code from Figure 4-10. Giving each
computation its own cost variable and then summing the
costs all at once makes each cost computation
completely independent.
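The independence argument can be checked on a host machine with a small sketch like the one below. It is not the data book's Figure 4-11: the function names, the per-iteration variables cost0 through cost3, and the test pattern are all assumptions introduced for illustration. It compares the straightforward loop with a version that gives each of the four computations its own cost variable.

```c
#include <stdlib.h>

/* Reference: the match-cost loop wrapped in a function (hypothetical name). */
int match_cost_reference(unsigned char A[16][16], unsigned char B[16][16])
{
    int row, col, cost = 0;

    for (row = 0; row < 16; row += 1)
        for (col = 0; col < 16; col += 1)
            cost += abs(A[row][col] - B[row][col]);
    return cost;
}

/* Unrolled by 4, with one cost variable per computation (hypothetical name).
   The four assignments do not depend on each other; only the final sum does. */
int match_cost_independent(unsigned char A[16][16], unsigned char B[16][16])
{
    int row, col, cost = 0;

    for (row = 0; row < 16; row += 1) {
        for (col = 0; col < 16; col += 4) {
            int cost0 = abs(A[row][col+0] - B[row][col+0]);
            int cost1 = abs(A[row][col+1] - B[row][col+1]);
            int cost2 = abs(A[row][col+2] - B[row][col+2]);
            int cost3 = abs(A[row][col+3] - B[row][col+3]);

            cost += cost0 + cost1 + cost2 + cost3;
        }
    }
    return cost;
}

/* Fills two blocks with an arbitrary test pattern and returns the common
   cost, or -1 if the two versions disagree. */
int check_match_costs(void)
{
    unsigned char A[16][16], B[16][16];
    int row, col, ref, par;

    for (row = 0; row < 16; row += 1) {
        for (col = 0; col < 16; col += 1) {
            int v = row * 16 + col;              /* 0..255, each value once */
            A[row][col] = (unsigned char)v;
            B[row][col] = (unsigned char)(255 - v);
        }
    }
    ref = match_cost_reference(A, B);
    par = match_cost_independent(A, B);
    return (ref == par) ? ref : -1;
}
```

Because integer addition is associative, regrouping the sum this way does not change the result; it only removes the chain of dependences on a single cost variable.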
void reconstruct (unsigned char *back,
                  unsigned char *forward,
                  char *idct,
                  unsigned char *destination)
{
    int i;
    int *i_back    = (int *) back;
    int *i_forward = (int *) forward;
    int *i_idct    = (int *) idct;
    int *i_dest    = (int *) destination;

    for (i = 0; i < 16; i += 1)
        i_dest[i] = DSPUQUADADDUI(QUADAVG(i_back[i], i_forward[i]), i_idct[i]);
}
Figure 4-8. Final version of the frame-reconstruction code.
unsigned char A[16][16];
unsigned char B[16][16];
.
for (row = 0; row < 16; row += 1)
{
    for (col = 0; col < 16; col += 1)
        cost += abs(A[row][col] - B[row][col]);
}
Figure 4-9. Match-cost loop for MPEG motion estimation.
unsigned char A[16][16];
unsigned char B[16][16];
.
for (row = 0; row < 16; row += 1)
{
    for (col = 0; col < 16; col += 4)
    {
        cost += abs(A[row][col+0] - B[row][col+0]);
        cost += abs(A[row][col+1] - B[row][col+1]);
        cost += abs(A[row][col+2] - B[row][col+2]);
        cost += abs(A[row][col+3] - B[row][col+3]);
    }
}
Figure 4-10. Unrolled, but not parallel, version of the loop from Figure 4-9.