This is a final report for an older project done for a
Berkeley class, so it is a rather low level description.
This project is not currently active, but I might continue it
You may want to skip to the result
section to see if this is of any interest to you.
High Quality Chroma Key
The main goal of this project is to write a high quality chroma key function
based on processing of foreground and/or background frames before combining
them into an output image. Such preprocessing is needed to fight some common
chroma key problems: hard edges around foreground objects and poor treatment
of colors close to the key color. Additionally, a high quality chroma key
algorithm is intended to serve as an example of a fairly complex real-world
algorithm which would be interesting to implement using Intel MMX or other
multimedia enhanced architecture. Taking into account large amount of effort
required for assembly language implementation of any algorithm and little
time available for the project, every effort was made to come up with a
most simple algorithm which would still produce good results.
A chroma key function is implemented based on a modified version of the
algorithm described in  . The algorithm works in (Y, Cb, Cr) color space
which is MPEG's native color space. In this way, incorporation of the chroma
key function into an existing system will not require extra color space
conversion. Schematic description of the algorithm is shown below. The
fundamental problem for a chroma key function is, first, to decide which
pixels in the foreground image belong to the blue backing and which - to
needed foreground objects and, second, how to identify and process boundary
pixels. Image value at such pixels is due to contributions from both an
object and the blue backing: pixel_value = k*Object_color + (1-k)*Blue_backing_color.
One would like to solve this equation for two unknowns, k and Object_color
having only one independent measurement, pixel_value. So, as we can see,
this problem is fundamentally underdefined.
If we come up with some reasonable estimate for the contribution of
the blue backing to the pixel value, the next step would be to subtract
this contribution from the pixel value and replace it with a scaled version
of the background image. In other words, the overall operation performed
by chrome key is CK_result = (fg_pixel_value - blue_backing_contribution)
+ Kbg*bg_pixel_value (eq 1). The problem now is to provide a concrete algorithm
for determining Kbg and blue_backing_contribution from the image data.
One such an algorithm is presented below.
First, we rotate (Cb, Cr) coordinate system by an angle defined by the
key color to obtain (X,Z) coordinate system. Second, we use a parameter
alfa (30 to 60 degrees were used for different images) to divide the color
space into two regions, one where the processing will be applied and the
one where foreground will not be changed (where Kbg = 0 and blue_backing_contrubution
= 0 in eq.1 above). Then we "suppress" pixel value (if it is inside the
working region) by applying the transformation Z' = Z X' = Z/tg(alfa).
This corresponds to projecting the pixel value vector along X axis onto
the line dividing the two regions. In this way, we subtract more from a
pixel if it is further away from the dividing line. All pixels on the X
axis are projected into zero chromaticity point. Amount of this suppression,
Kfg = X - abs(Z)/tg(alfa) is used to suppress the third channel, Y' = Y
- y*Kfg, where y is such that Y' = 0 for the key color. Next, we calculate
Kbg by interpolating between Kbg = 1 on the border of the area shown black
on the picture ("beyond" the key color Kbg is also set to 1) and Kbg =
0 on the line separating the working region from the rest of the color
space. Lines shown on the picture are lines of constant value of Kbg. Up
to this point, all transformations applied to the image are continuous,
so the result of applying eq.1 with blue_backing_contribution and Kbg calculated
in the described way is guaranteed not to have any hard edged not present
in the initial image. Unfortunately, real images contain noise which becomes
very noticeable after chroma key processing due to the fact that noise
in calculated Kgb gets multiplied by the pixel value in the background
image. Probably the simplest way to treat the noise is used here - a circle
of a certain radius around the key color is treated as if it is the key
color: Kbg = 1 within this circle. This can introduce discontinuities on
the circle boundary.
Although multiple enhancements to the algorithm described here are possible
(more sophisticated noise treatment is an obvious example), simplicity
of the algorithm is of great value both by itself and with regard to MMX
implementation which has to be done in assembly language. The algorithm
makes use of only two parameters (other that the key color) which need
to be adjusted for visual quality of the result - noise level and processing
angle and their meaning (what will happen if one increases or decreases
parameter value), is clear.
From the implementation point of view, the chroma keyer has the following
Results and discussion.
Typical raw foreground image to be used in chroma key is a set of objects/people
shot against very smooth blue backing. Quality of the image of this backing,
as it will be clear from the examples presented below, is crucial for the
quality of the overall process (as it is for any other chroma key implementation).
Unfortunately, it's hard to find "real" images (or frame sequences) intended
to be used for chroma keying, so the testing has been done on images which
only very remotely can be treated as a good material. Nevertheless, the
results are quite encouraging.
In examples below, please click on the image to see a large picture.
Example 1. Noise and color nearness. Foreground is an
example of colors in wanted objects being close to the key color and some
substantial noise in the "backing".
Original background and foreground images, respectively:
Result of applying chroma key with some default parameter settings (assuming
no noise). Results of noise in the foreground are apparent:
This image, in fact, represents the worst case - large area of constant
color in the background on which even small noise is very noticeable. The
same combined scene after adjusting the noise parameter looks nicer but,
as noted before, some hard edge appears around the barb wire:
Example 2. Fine detail treatment and intermediate results. This
is an example of a transparent object in the foreground scene (a cloud)
which would be lost with any standard chroma key. Common processing would
also have trouble with grass/sky and tree/sky border lines due to some
very fine details present there.
Original background and foreground images, respectively:
Output of foreground suppresser and backgroung*Kbg, respectively:
Combined image (sum of the two above):
The image in the last example was processed with noise level set to zero
which allowed the cloud above the tree to stay visible. Compare this image
with the output of a chroma key function which performs hard switch between
foreground and background:
As a second part of the project, it was planed to re-implement the chroma
key algorithm on MMX architecture as an example of a fairly complex real-world
application. One problem with this is that the algorithm used is essentially
a floating point one. It was not immediately clear to me that its integer
re-implementation will be able to provide the same final quality. The algorithm
was re-implemented using a combination of regular integer and fixed point
arithmetic. Images produced with integer version of the chroma key were
visually identical to those created by a floating point one. The price
was to reduce the working angle parameter range so that its tangent fit
into reasonable fixed precision representation - this disallowed small
and large angles and although did not present a problem for the test images,
can be a potential problem. Second problem is that the original version
uses division and there is no pdiv operation in the MMX instruction set.
This was possible to overcome by sacrificing some flexibility of the algorithm
- for example, at this stage noise circle was chosen instead of more involved
treatment which also was considered. Performance of the two versions (integer
and floating point) were compared by measuring time required to perform
chroma key on several images on a 180 MHz Pentium processor and taking
an average of the results. Performance measurements is a field of its own
and I'm far from claiming that absolute numbers of these very rough measurements
have any far reaching meaning, but they do show that integer implementation
is about three times faster than the original floating point one. This
is not unexpected taking into account larger latency of floating point
instructions compared with integer ones and apparent inability of gcc compiler
(optimization flag O2) to achieve perfect instruction scheduling. Size
of the assembly code is also smaller for integer version - 346 vs. 396
instructions inside the main loop.
Another implementation is currently being written using MMX emulating
functions provided by Intel with "The complete guide to MMX technology"
book  and software MMX registers. Currently, foreground suppresser and
key generator (see the chroma keyer structure diagram above) are written
and tested. This implementation should be 1-to-1 convertible into "real"
MMX by anyone who would like to do it - simply replace a function call
by a corresponding assembly language instruction. Most annoying problem
I encounter so far while implementing the algorithm in MMX is the different
representation for Cb, Cr and Y components (signed vs. unsigned) which
has to be explicitly addressed. There are certain things which MMX architecture
would have benefited from if they were present. First, a parallel division
(16-bit number divided by 8-bit number) would extent applicability of MMX.
In the current study, it was possible to avoid division by changing the
algorithm but this might be not an option for other cases. Second, images
have a tendency to be clustered, which means that chroma key processing
should not be applied to large areas. With current architecture, we essentially
apply the processing to all points in the image and then mask out the part
which should not have been processed. Adding an instruction which does
a conditional branch if the content of a 64-bit MMX register is zero would
avoid applying a several hundred instruction worth of processing to a piece
of data in none of the eight (or four) pixels needs it. Even if a penalty
for such instruction is several cycles per point, it can easily save half
of the total processing time for typical images.
Conclusion and future work.
A relatively simple high quality chroma key algorithm was developed based
on  . It has been implemented using both integer and floating point
arithmetic and performance results of the two were roughly compared. Because
people expressed enough interest to it during the poster session, I intend
to finish the implementation with simulation of MMX functions by Intel's
regular C routines and software MMX registers. I'd also like to do
"real" chroma-keying (i.e. on video sequences) - video might behave
differently from a single image but I suspect that at worst it will be
just like a hard - switching chroma key algorithm.
I would like to thank Professor
Rowe for suggesting this project to me and for helpful discussion during
the midway project progress presentation.
. Keith Jack, "Video Demystified", Independent Pub Group (Computer),
. Intel Corporation, "The Complete Guide to MMX Technology", McGraw
Hill Text, 1997.
Promised MMX-ready version of the chroma keyer is now available here
along with mdefs.h
file. This is a MMX re-implementation of integer
only chroma key. Please read the comments before using either of these.