Temporal Rate Conversion
Marsh Microsoft Technical Evangelist, TV and Video
Updated: December 4, 2001
What are our temporal options?
Option 1: Operate display at input video rate
60Hz is OK for CRT displays in 10-foot viewing distance applications
It is acceptable to use 60Hz in cases where the display will be viewed from 10 feet away, for example, in an American family room PCTV application, but 60Hz is no good for desktop PC use because the flicker is intolerable when you sit close.
Unfortunately, 50Hz is just too slow to be tolerable for graphics, so for CRT displays in Europe, even for 10-foot viewing distances, it's a case of waiting for a proper motion vector-steered temporal rate conversion solution.
On desktop PCs: Change to 60Hz when you care about video quality
On desktop PCs it is unacceptable to run a CRT monitor at 60Hz for word processing and similar tasks, because the white background would flicker unbearably. But, as we have seen, if you display 60Hz source video at the desktop monitor rate of 75Hz, it will judder badly. One possible compromise is to change the refresh rate to 60Hz when you care about the video quality, and simply to tolerate the judder when you don't.
Video analysis software can potentially make the best compromises automatically. It can determine whether the proposed temporal rate conversion will suffer from judder. For example, if it is film-originated material, then it will not judder. If it is a 50Hz source, and the display is set to 85Hz, then this will also be OK, because the difference frequency is high enough. It can also be set to only worry about removing the judder if the video window is larger than a particular size on the screen. By combining all this information with user preferences (all of which sensibly default if the user does not want to adjust them), it can decide whether to automatically change the display rate to 60Hz or leave it at the faster desktop PC rate of say 75Hz.
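As a rough illustration of the kind of decision such software might make, here is a minimal Python sketch. The names (VideoState, should_switch_to_source_rate) and the window-size threshold are hypothetical; the 35Hz and 60Hz figures are taken from elsewhere in this paper, and a real implementation would expose all of them as user preferences.

```python
from dataclasses import dataclass

# Illustrative thresholds (assumptions, not values from any real product).
MIN_SAFE_DIFFERENCE_HZ = 35.0    # difference frequency at which judder becomes acceptable
MIN_TOLERABLE_REFRESH_HZ = 60.0  # slowest CRT refresh rate worth switching to
MIN_WINDOW_FRACTION = 0.25       # hypothetical preference: ignore small video windows


@dataclass
class VideoState:
    source_hz: float        # temporal rate of the incoming video
    display_hz: float       # current desktop refresh rate
    film_originated: bool   # True if the material came from film
    window_fraction: float  # share of the screen covered by the video window (0..1)


def should_switch_to_source_rate(state: VideoState) -> bool:
    """Return True if the display should be re-timed to the video source rate."""
    if state.film_originated:
        return False  # film-originated material does not judder
    if state.source_hz < MIN_TOLERABLE_REFRESH_HZ:
        return False  # e.g. 50Hz would flicker too much on a CRT
    difference_hz = abs(state.display_hz - state.source_hz)
    if difference_hz == 0 or difference_hz >= MIN_SAFE_DIFFERENCE_HZ:
        return False  # already matched, or difference frequency is high enough
    if state.window_fraction < MIN_WINDOW_FRACTION:
        return False  # video window too small for the user to care
    return True
```

With these assumed defaults, full-screen 60Hz video on a 75Hz desktop triggers a switch, while a 50Hz source on an 85Hz display is left alone.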
Slow "judder" is seen as "jumps"
When setting the display rate to be the same as the video source, you need to do it quite accurately. If the difference frequency is very small, then the judder is perceived as a periodic jump, caused by a frame being dropped or repeated. You see this on things like the NBC stock ticker. It is not acceptable to set the graphics card to 60Hz if the source material is actually 59.94Hz. This is a difference frequency of 0.06Hz and will cause a jump every 16.7 seconds.
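The arithmetic behind that figure is simply the reciprocal of the difference frequency. A small Python sketch (the function name is illustrative):

```python
def jump_interval_seconds(display_hz: float, source_hz: float) -> float:
    """Seconds between dropped or repeated frames for two free-running rates."""
    difference_hz = abs(display_hz - source_hz)
    if difference_hz == 0:
        return float("inf")  # perfectly matched rates never slip a frame
    return 1.0 / difference_hz


print(round(jump_interval_seconds(60.0, 59.94), 1))  # prints 16.7
```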
There are two basic strategies for stopping the "jump" from being a problem. The first option is to use a Phase Locked Loop (PLL) to genlock the oscillator on the PC's graphics card to the incoming signal, so that it is exactly synchronized with the oscillator in the TV station. This technique can run into problems with noisy TV reception and when channel changing. You don't want the entire timing of the PC monitor that is displaying both graphics and video to collapse just because the TV signal was temporarily interrupted by a bus driving past.
The second option is to get the clock frequency on the graphics card to accurately operate at the specified source video frequency (such as 59.94Hz). It won't however be synchronized with the TV station's clock, so they will drift relative to each other, leading to an occasional dropped or repeated frame. The idea is that these jumps will occur so infrequently that they will not be annoying. Don't forget that you will only see the jump if it occurs in the middle of your eye tracking an object that is moving.
Probably the best compromise is to use a method somewhere in between the two extremes. An example of this is a "Software PLL" that monitors the error between the two clocks and fractionally adjusts (tweaks or pulls) the clock on the graphics card when the error becomes too large.
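The idea can be sketched as a simple simulation. This is only an illustration under stated assumptions: the threshold, the gain, and the one-second update interval are made-up values, and a real driver would measure the error against actual field timing rather than compute it.

```python
def simulate_clock_drift(source_hz, nominal_display_hz, seconds, use_pll,
                         threshold_frames=0.05, gain_hz_per_frame=0.5):
    """Return the worst accumulated timing error, in frames, over the run.

    An error of one whole frame corresponds to a dropped or repeated frame.
    """
    error_frames = 0.0
    worst_error = 0.0
    for _ in range(seconds):  # one update per simulated second
        display_hz = nominal_display_hz
        if use_pll and abs(error_frames) > threshold_frames:
            # Pull the graphics clock slightly, in the direction that
            # reduces the accumulated error.
            display_hz -= gain_hz_per_frame * error_frames
        error_frames += display_hz - source_hz  # drift added in one second
        worst_error = max(worst_error, abs(error_frames))
    return worst_error


free_running = simulate_clock_drift(59.94, 59.95, 600, use_pll=False)
pulled = simulate_clock_drift(59.94, 59.95, 600, use_pll=True)
```

In this sketch a graphics clock that is 0.01Hz fast drifts by about six frames in ten minutes when left alone, whereas the pulled clock keeps the error to a small fraction of a frame, so no frame needs to be dropped or repeated.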
Option 2: Use flat-panel displays instead of CRTs
Flat-panel displays avoid the temporal rate conversion problem because they can be run at the video source frequency without incurring flicker
Flat-panel displays don't suffer from flicker because of the sample-and-hold characteristics of the pixels. This means that they can be driven at the same temporal rate as the video source, such as 60Hz, thus avoiding the need to do temporal rate conversion.
LCD desktop monitor
If the judder problem only occurs on CRTs then get rid of CRTs
As we have seen, the problem of judder is confined to CRT displays. Looking at it another way, the CRT is the only display type that suffers from judder, because it is the only display type that is capable of good motion portrayal. It's a matter of personal opinion. Personally, I would prefer to watch a flat-panel display with smearing, rather than have to watch a CRT with flicker or a CRT with judder. My preferred choice, though, would be a CRT display that did not have flicker or judder because it produces the best image quality.
Given that the worst case is a CRT with judder, and if you believe that motion vector-steered temporal rate conversion is too hard to be implemented at consumer price points, then replacing CRTs with LCD and plasma displays is not a bad strategy.
The biggest consumer barrier to replacing CRTs with flat panels is cost
Although video image quality cannot be as good on flat-panel displays, they do have other features that make them desirable to consumers. Benefits include:
They take up less desk space
You can hang them on the wall
Plasma displays can have larger screen sizes than direct-view CRT displays
They don't suffer from convergence and focusing problems
You can carry them without breaking your back
They look cool and impress your friends
If you assume these benefits make up for the loss in picture quality, then it all comes down to price in the shops. Flat-panel displays currently are 2 or 3 times the cost of a CRT display that provides the same functionality, but they are coming down in price.
42" plasma display
When will flat-panel displays penetrate the desktop PC market?
The good news is that over-capacity is driving LCD suppliers into the desktop market and driving down prices. Prices are falling a lot faster than was projected last year. The devaluation of Asian currencies is also helping in the short term, but may cause problems later if investment in new plants is cut.
The sweet spot for LCD desktop PC monitors is likely to be 15" with 1024x768 resolution and a 4:3 aspect ratio. For plasma displays, the sweet spot is likely to be 42" with 854x480 resolution and a 16:9 aspect ratio.
Initiatives based around DFP (Digital Flat Panel) interface are also helping establish flat panels in the desktop market. There are still some standards battles to fight before a DFP interface (or one of its competitors) becomes the norm on desktop PCs. Clearly what needs to happen is that a single standard needs to emerge in order to give consumers confidence that their new flat-panel displays will not quickly become obsolete.
The big question is whether LCDs will take over the desktop PC market. If so, will they be able to do it more quickly than the PC industry can come up with a motion vector-steered temporal rate conversion solution? If they take over the market only about a year later than the 2-year timeframe for developing a motion vector solution, then we can probably just put up with the judder for that extra year.
Obviously, there is no absolute answer to the question. It is unlikely that LCDs will be the majority display for low-cost PCs, even in the 5- to 10-year timeframe. But perhaps video quality is not that important to the low-cost segment. Also it will be a while before motion vector-steered solutions can meet the required price points for this segment.
In the top-end desktop PC market, it is possible that flat-panel displays will make significant inroads, and this could mean a lesser requirement for motion vector-steered solutions. Time will tell. The purpose of this paper is to present facts rather than advocating which horse to back.
LCD desktop monitor
Option 3: Operate CRT display at a very fast refresh rate
Avoid judder by having a high difference frequency
This strategy for avoiding judder relies on the fact that if the difference frequency between the video input rate and the display rate is large enough, then you will not see judder. It is necessary to have a difference frequency of at least about 35Hz for the judder to be reduced to an acceptable level. You can use a linear temporal rate converter to process the video source of say 60Hz into a faster rate such as 95 or 100Hz. If the video source is only 50Hz, then you only need to run the display at 85Hz.
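The rule of thumb above is easy to express in code. A minimal sketch, treating the 35Hz figure as an approximate threshold rather than an exact one:

```python
MIN_DIFFERENCE_HZ = 35.0  # approximate threshold quoted in the text


def min_display_rate_hz(source_hz: float) -> float:
    """Lowest display rate that keeps the difference frequency high enough."""
    return source_hz + MIN_DIFFERENCE_HZ


def judder_acceptable(source_hz: float, display_hz: float) -> bool:
    """True if the display runs far enough above the source for judder to be tolerable."""
    return display_hz - source_hz >= MIN_DIFFERENCE_HZ


print(min_display_rate_hz(60.0))  # prints 95.0
print(min_display_rate_hz(50.0))  # prints 85.0
```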
Today's PC monitors can only scan at around 75Hz at the resolutions users want to run them at
The maximum scan rate of a typical PC monitor is a limiting factor. The important spec figure is the horizontal scan rate (stated in KHz). This is the rate at which the electron beam (dot) can be moved over the phosphor screen. Given a particular scan rate, you can use it to make a high-resolution picture that has a slow refresh rate, a low-resolution picture that has a fast refresh rate, or anything in between.
After reviewing the specs for typical CRT monitors currently on sale, the following are my conclusions as to the maximum capabilities of each CRT monitor category:
14" 38KHz--800x600 progressive 72Hz
15" 64KHz--1024x768 progressive 80Hz
17" 69KHz--1280x1024 progressive 75Hz
21" 94KHz--1600x1200 progressive 75Hz
Given the limited scan rate of PC monitors, different users want to make different trade-offs between resolution and refresh rate, so standardizing on a particular refresh rate is not possible
Some PC users are very annoyed by flicker and so are likely to go for a higher refresh rate such as 85Hz, even though this means that there will not be as much scan rate left to produce a higher resolution. For other users, getting the maximum possible resolution is the most important thing, even if they have to put up with the slight flicker that you get when using, say, 72Hz.
Because it is not possible to standardize on a particular PC scan rate, broadcasting a higher temporal rate does not help
When the new TV standard was being developed, about 1996, one proposal was to broadcast, for example, a 72Hz TV signal. This can actually be done quite efficiently with the MPEG compression standard since it just involves adding more B frames, which don't need much transmitted data. Various proposals were considered, some involving temporal layering.
The problem is, however, that for some applications, 72Hz is too slow and for some applications, it is too fast. There is no one rate that fits all applications, so the conclusion was to stick with the existing 60Hz rate and convert it to higher rates when necessary. This is a sensible conclusion, but does rely on a blind hope that temporal rate conversion can be done at reasonable quality at consumer price points.
CRT scan rates are not as high as we would like
The horizontal scan rate specifies how fast the CRT is able to move the electron spot across the phosphor screen. It does not make much difference to the CRT whether the scanning of the beam is used to make more fields or frames or lines per field or frame. Given the finite amount of scan rate available, if you increase the temporal rate, then you need to decrease the number of lines.
For a top of the range 27"-36" direct view CRT-based living-room TV at the end of 1999, it is reasonable to expect a maximum scan rate of about three times NTSC line rate, or about 47KHz. Even this fairly modest scan rate is not with us in 1998 as a consumer product. A scan rate of 47KHz corresponds to 1024 x 768 progressive at 60Hz. This operating point is about as far as you can go with a consumer CRT design without incurring excessive costs. Up to this scan rate is obtainable with a not-insignificant cost increase over current 15.7KHz designs, but at least it can still be done at consumer price points.
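The trade-off between lines and refresh rate can be checked with a little arithmetic. The sketch below is an upper bound only: it assumes an NTSC line rate of roughly 15.734KHz, and by default it ignores vertical blanking, which in practice costs a few percent of the available lines. The blanking_fraction parameter is a hypothetical knob for experimenting with that overhead.

```python
NTSC_LINE_RATE_HZ = 15_734.0  # approximate NTSC horizontal line rate


def max_refresh_hz(h_scan_hz: float, active_lines: int,
                   blanking_fraction: float = 0.0) -> float:
    """Upper bound on progressive refresh rate for a given horizontal scan rate."""
    total_lines = active_lines * (1.0 + blanking_fraction)
    return h_scan_hz / total_lines


tv_scan_hz = 3 * NTSC_LINE_RATE_HZ            # about 47KHz
bound_768 = max_refresh_hz(tv_scan_hz, 768)   # roughly 61Hz before blanking
bound_384 = max_refresh_hz(tv_scan_hz, 384)   # half the lines, twice the rate
```

The result for 768 lines comes out just above 60Hz, which is consistent with treating 1024 x 768 progressive at 60Hz as the practical limit for a 47KHz design.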
This maximum CRT operating point is likely to be with us for a long time and probably for the rest of the CRT's natural life in consumer TVs. As well as cost, one of the big reasons why consumer CRTs can't scan fast is the large angle of deflection, which is needed to keep the depth of TVs to a minimum in the family living room. They also use high beam currents to provide high (excessive) light output, so the beam is harder to deflect.
PC monitors use a much smaller angle of deflection and have not got such a severe cost constraint. They are also not required to give out such a large light output, as they are intended for close up viewing. A typical 17" monitor is able to scan at around 70KHz. This allows 1280x1024 progressive at 75Hz or 1024x768 progressive at 85Hz.
Running CRTs at a fast scan rate in order to avoid judder is only viable on very top end PCs
As stated earlier, it is necessary to have a difference frequency between the native video rate and the display rate of at least 35Hz, before the judder is reduced to acceptable levels. Even in this situation, all that is really happening is that blur and smear are being substituted for the judder. This is an acceptable approach, as blur and smear are subjectively much less annoying than judder. The real problem with this option is that there is not enough available scan rate in most CRT displays.
Because 60Hz video requires a display refresh rate of at least 95Hz, this is really not an option for the 60Hz market, except perhaps in the top 1% of desktop PCs. For the 50Hz market, it is a much more viable technique, since you only need to operate the display at 85Hz to achieve the necessary difference frequency. Also, in the 50Hz market you are limited in your choice of other techniques, because running the display at 50Hz is not viable due to the excessive flicker.
Option 4: Use Motion Vector Steered temporal rate conversion
The only solution for video-orientated material on 75Hz CRT displays is motion vector-steered temporal rate conversion
A motion vector-steered system basically interpolates along the movement axis the same as the eye does. The problem is that it is difficult and potentially expensive to use motion vector standards conversion in domestic TVs and PCs.
If you cut corners you will see artifacts because, unlike with the motion vectors used in MPEG data compression, it is an open loop process. Any errors in the motion vectors show up as artifacts on the screen.
Temporal rate conversion is necessary in the TV broadcast world for the standards conversion of 50Hz PAL to 60Hz NTSC (so that European shows can be seen on American TV) and vice-versa. In recent years this has been perfected by products such as the Snell and Wilcox Alchemist PhC that use motion vector steering to avoid the judder that is produced if linear conversion techniques are used. Products such as the Alchemist PhC standards converter are very high quality and also extremely expensive (in the $100K to $200K range) and are intended for use in professional broadcast TV studios. Considerable challenges are faced in getting the technology down to consumer receiver price points. Despite this, much can be learned from these professional studio products.
TV standards of the world. NTSC is dark blue, PAL is yellow, SECAM is red.
What follows is a description of the task that a motion vector steered "standards converter" performs. There are various aspects associated with performing PAL to NTSC conversion, but by far the biggest challenge is converting 50Hz to 60Hz (or vice-versa). The temporal conversion task that this studio equipment performs is the same as the process needed to convert 60Hz video material to 75Hz for use on a CRT PC monitor. Because of this, it is useful to examine how a motion vector-steered standards converter works. The Snell and Wilcox Alchemist PhC standards converter is used as an example, because it represents the current state of the art.
Motion Vector-Steered Standards Conversion
Interpolation in standards conversion
Essentially what happens in "standards conversion" is that a nice smooth curve joins all the samples given by the input video signal. This is the same process that a filtered digital-to-analog conversion would have done. Once you have a continuous curve, samples can be taken at any place as required by the output video standard you want to produce. The curve-fitting process is actually done by a digital low-pass filter. The impulse response of an ideal low-pass filter with a cut-off at half the sample frequency is a sinx/x curve that passes through zero at the site of all other samples except the center one. You can therefore achieve the curve fitting by replacing each input sample by a sinx/x function. The process is carried out for the picture lines (vertical spatial samples), the horizontal pixels (horizontal spatial samples), and the fields (temporal samples).
Smooth curve is fitted to the input samples; output samples are then tapped off.
The sinx/x function actually has a response that spreads out to infinity in both directions. In practice, it is sufficient to take only about two reference samples on either side of the output sample you want to obtain, as after that the values of the function have diminished to small values. This means that for vertical spatial interpolation you need to take a minimum of four video lines and for temporal interpolation you need to take four fields. A converter that does this is called a 16-point converter (4 x 4). In digital filter terminology, the number of input samples that contribute to each output pixel in each dimension is called the number of taps. The converter referred to above has four vertical taps and four temporal taps. If you don't use enough taps, you produce a filter that lets through lots of ripples in the stop band, which can be seen as beating in the picture.
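To make the four-tap idea concrete, here is a minimal one-dimensional Python sketch. It is not how any particular converter is implemented: the truncated sinx/x weights are renormalized so that flat areas stay flat, and edge samples are simply repeated, both of which are assumptions made for this illustration. Production filters would normally use carefully designed, windowed coefficients.

```python
import math


def sinc(x: float) -> float:
    """Normalized sin(x)/x kernel: 1 at x = 0, zero at every other integer."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def interpolate_4tap(samples, position: float) -> float:
    """Estimate the signal at a fractional position from four neighbouring samples."""
    base = math.floor(position)
    weighted_sum = 0.0
    weight_total = 0.0
    # Two reference samples on either side of the requested position.
    for index in range(base - 1, base + 3):
        clamped = min(max(index, 0), len(samples) - 1)  # repeat edge samples
        weight = sinc(position - index)
        weighted_sum += samples[clamped] * weight
        weight_total += weight
    return weighted_sum / weight_total  # renormalize the truncated kernel
```

When the requested position coincides with an input sample, the other three taps fall on zeros of the kernel and the input value is returned essentially unchanged.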
Problem with linear standards converters
The term "linear" is used to refer to a system that does not use motion vectors. It relies solely on standard digital filters. Without motion vector steering, with a conventional "linear" standards converter, you get judder and blur whenever things move fast. It's worth just recapping why this judder occurs.
If an object is moving, it will be in a different place on each successive field. Interpolating (averaging) between four fields gives four images of the object on the output field. The position of the dominant image will not move smoothly, so it will be seen to judder.
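A toy one-dimensional example shows the effect. The field contents and the tap weights below are invented purely for illustration; a single bright pixel stands in for the moving object.

```python
def make_field(width: int, object_position: int):
    """A one-line 'field' containing a single bright pixel."""
    return [1.0 if x == object_position else 0.0 for x in range(width)]


def linear_temporal_interpolate(fields, weights):
    """Weighted average of co-sited pixels, with no motion compensation."""
    width = len(fields[0])
    return [sum(w * f[x] for w, f in zip(weights, fields)) for x in range(width)]


# The object moves two pixels per field.
fields = [make_field(12, position) for position in (2, 4, 6, 8)]
weights = (0.1, 0.4, 0.4, 0.1)  # assumed four-tap temporal weights
output = linear_temporal_interpolate(fields, weights)

ghost_positions = [x for x, value in enumerate(output) if value != 0.0]
print(ghost_positions)  # prints [2, 4, 6, 8]
```

The output field contains four faint copies of the object and nothing at position 5, which is where the object actually was at the midpoint between the second and third fields.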
The judder can also be explained in terms of sampling theory. If the field rate is 50Hz, then by the sampling theorem, the maximum movement frequency allowable in the signal being sampled is 25Hz. Unfortunately, objects move a lot faster than this, so temporal aliasing nearly always occurs. This is not too much of a problem when a human views the material on a TV using the native video standard, because of the eye's ability to track moving objects. When the eye tracks the motion of the object of interest, the moving object is stationary relative to the eye's retina (it's as if it were not moving). This means that the temporal aliases are not seen. Unfortunately, when the video signal passes through a conventional linear standards converter, the aliasing causes interpolation theory to break down. The converter cannot tell the aliasing from genuine signals and resamples both to produce the output fields. These multiple alias images are the cause of the perceived judder.
A linear standards converter is faced with a dilemma of whether to keep the annoying judder or to apply considerable low pass filtering to change the object into a low resolution blur as it moves.
Despite the judder problems, many linear standards converters are still in use with English TV shows shown on American TV. Next time you are watching such a show, look at the background as the camera pans right or left, and you will see the judder.
The standards conversion solution: motion vector steering
Motion vector steering is a way of modifying the action of a standards converter so that it follows moving objects to eliminate judder in the same way that the eye does. As there is no judder, you don't need the low-pass filtering, so moving objects don't suffer the loss of resolution, and the picture stays looking clean and sharp as it moves.
A motion vector-steered standards converter analyzes the stream of input fields and identifies each object in the scene and figures out how it is moving. From that, it is able to work out where all the objects will be at the time that it wants to generate each output field.
Each region with the same movement is allocated a vector.
In practice, what happens is that the objects are shifted in the input fields on either side of the required output field and then the interpolating temporal digital filter method is used to find the exact pixel intensity values.
The signals that describe the motion of each of the objects within a scene are called motion vectors. It is the job of the motion estimation system to find these motion vectors.
Interpolation axis not parallel to time axis for moving objects.
Techniques for finding motion vectors
Standards conversion is a more demanding application than data compression. Failing to find a motion vector in data compression just means that the difference data increases (the amount of data goes up a bit), but in a standards converter, it leads to a noticeable artifact. It is no good to just take a technique designed for data compression and expect it to work well for standards conversion.
Block matching methods
This involves the searching for, and matching of, luminance levels of rectangular blocks of pixels. The method has the following problems:
Because the block-matching method uses luminance values, it is often confused by camera flash lights, fades, and objects moving into shade.
As it is just matching pixels, it is only accurate to the nearest pixel.
If the block size used is small, then you get lots of false matching; if it is big, then you miss small objects.
Because of the impractical amount of processing power that would be required, the method is limited to a small movement search range. This means the movement speed it can accommodate is low.
Straight block matching
A very large number of comparisons must be performed, and the number increases in an N to the power 4 law as you increase the range of motion that can be handled. Given that there is only a small amount of time available for processing (the time between temporal samples, which is under 17ms), it is only possible to use a small window (a small movement range).
Variable resolution block matching
This is an attempt to reduce the number of comparisons needed. Initial passes use a very low-pass-filtered image to locate coarse picture movements and then home in on the areas in which large movements have occurred. By doing this, there is a significant chance that small moving objects will be completely missed. For example, in sports action, the ball would be lost.
Hierarchical spatial correlation (also block matching)
Start using a large block of pixels for comparison (which avoids false matching) and then divide down when you know basically where things have moved to. This suffers from many of the same block matching problems as the other methods. It can only cope with scenes with a small number of moving objects.
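For reference, straight block matching can be sketched in a few lines of Python. This is a generic full search using the sum of absolute differences as the matching measure, which is an assumption on my part (the text does not say which measure is used). The helper names and the tiny test frames are hypothetical.

```python
def block_difference(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute luminance differences between two square blocks."""
    return sum(abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
               for j in range(size) for i in range(size))


def full_search(frame_a, frame_b, x, y, size, search_range):
    """Find where the block at (x, y) in frame_a has moved to in frame_b."""
    height, width = len(frame_b), len(frame_b[0])
    best = None
    comparisons = 0
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            bx, by = x + dx, y + dy
            if bx < 0 or by < 0 or bx + size > width or by + size > height:
                continue  # candidate block falls outside the picture
            cost = block_difference(frame_a, frame_b, x, y, bx, by, size)
            comparisons += size * size  # one pixel comparison per block pixel
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), comparisons


def frame_with_object(x, y, width=8, height=8):
    """Black frame with a small textured 2x2 object at (x, y)."""
    frame = [[0] * width for _ in range(height)]
    frame[y][x], frame[y][x + 1] = 9, 7
    frame[y + 1][x], frame[y + 1][x + 1] = 5, 3
    return frame


vector, comparisons = full_search(frame_with_object(2, 2), frame_with_object(4, 3),
                                  x=2, y=2, size=2, search_range=2)
print(vector, comparisons)  # prints (2, 1) 100
```

Even this tiny search costs 100 pixel comparisons for one 2x2 block, since every one of the 25 candidate offsets must be tested. Note also that the answer is only ever a whole number of pixels.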
Phase Correlation uses the fact that [displacement (from one temporal sample to another)] is proportional to [phase difference (from one temporal sample to another)] divided by [frequency (of the component being looked at)].
An FFT (Fast Fourier Transform) is performed on each field, mapping it into a two-dimensional frequency domain represented as amplitude and phase. The amplitudes are normalized to avoid any reliance on lighting levels, and then for each frequency component the phase values are subtracted to find the phase differences. Once the inverse FFT is performed, this gives a correlation surface in which the peaks represent movement.
What the peaks mean
If there is no movement then you just get a single large peak in the center of the correlation surface. If there is movement, then you get peaks displaced from the center point. The distance from the center is the distance moved between fields. The direction of the peak from the center is the angle of movement. The size of the peak is proportional to the number of pixels that have that movement. These are the candidate motion vectors together with a value that determines their significance.
The normalization, or boosting up of the frequency components is vital to the process. It makes the peaks into sharp peaks rather than just gentle mounds. With a sharp peak, it is easier to find the exact coordinates of the summit. With sharp peaks, there is less danger that the foot-hills from two adjacent peaks will add up to produce a false peak.
Phase correlation gets most of its information from edges in the picture since these generate a large number of frequencies.
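The following Python sketch shows the principle on a single line of pixels. It is a deliberate simplification: it works in one dimension rather than two, it uses a slow direct Fourier transform in place of an FFT so that it needs no libraries, and it assumes the whole line moves by a whole number of pixels with wrap-around at the edges. The function names are illustrative.

```python
import cmath


def dft(values, inverse=False):
    """Direct discrete Fourier transform (slow stand-in for an FFT)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result


def phase_correlate(line_a, line_b):
    """Return (displacement, correlation surface) between two lines of pixels."""
    normalized = []
    for a, b in zip(dft(line_a), dft(line_b)):
        cross = a.conjugate() * b  # its phase is the phase difference
        magnitude = abs(cross)
        # Normalize the amplitude away so only phase information remains.
        normalized.append(cross / magnitude if magnitude > 1e-9 else 0)
    surface = [value.real for value in dft(normalized, inverse=True)]
    peak = max(range(len(surface)), key=surface.__getitem__)
    n = len(surface)
    displacement = peak if peak <= n // 2 else peak - n  # wrap to signed shift
    return displacement, surface


line = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
moved = line[-3:] + line[:-3]             # content moved 3 pixels to the right
faded = [0.5 * value for value in moved]  # same movement during a fade

print(phase_correlate(line, moved)[0])  # prints 3
print(phase_correlate(line, faded)[0])  # prints 3
```

Because the amplitudes are normalized before the inverse transform, halving the brightness of the second line does not change the measured displacement.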
Features of Phase Correlation
Phase Correlation actually measures the movement between fields rather than trying to infer it from luminance matches.
It is a fundamental strength of Phase Correlation that it actually measures the direction and speed of moving objects rather than trying to estimate, extrapolate, or search for them.
Because it does not use luminance amplitude information, it is not confused by fades, objects moving into shade, or flash guns.
It is not fooled by a noisy signal.
It will not miss small moving objects since they generate readily detectable high-frequency peaks in the correlation surface.
Because it is a purely mathematical process, it is accurate to subpixel resolution.
As well as producing the motion vectors, it is also able to assign a confidence value to each vector it produces. This is very useful when using the vectors to build real pictures.
Given that systems need to be able to cope with a large movement range, Phase Correlation actually requires less computation than block matching, so larger windows can be processed, giving a larger movement range. Larger windows also mean that there is less chance of false matches occurring, since lower frequency components can be compared.
Phase Correlation is better able to cope with periodic structures such as grills and fences. Because of the bigger windows, there is much more chance that a subtle feature of the grill can be used to identify one bar of the grill from another. Also, of course, in a block matching method, these subtle features would all just get lost in the noise anyway.
The development of this method owes a lot to research done at the BBC Research Labs. Their conclusion was that Phase Correlation was the only way of achieving the required accuracy.
Picture building using motion vectors
It takes more than a system that can accurately produce motion vectors to produce a good standards converter. The other parts have to be good too, since much of the magic is in the image building.
An image can be regarded as regions of pixels that are referred to as objects. The first task is to figure out the directions and distances that each of those objects will have moved at the intermediate time that you want to extract the output field. Once you have shifted the objects by the appropriate amount, you can then do the standards conversion interpolation process to the same accuracy as if the picture was not moving.
The first step in standards conversion is pre-processing. The incoming picture is divided into overlapping windows to make the processing more manageable and to allow parallel processing. Each window is however large enough to cope with a large movement range and the full judder visibility range of the human eye.
Finding the motion
Phase Correlation, as described above, is used to accurately determine the direction and distance of the movements of even very small fast moving objects. Although this process accurately finds the motion vectors, it does not tell us which pixels go with which motion vector.
Assigning the motion to pixel areas
The Phase Correlation process produces the correlation surface with peaks representing movement, but it is the job of the Image Correlator to figure out which pixels each movement belongs to. Candidate motion vectors are taken one at a time, starting with the one that corresponded to the largest number of pixel movements (the largest peak in the correlation surface). The entire input field is shifted by an amount specified by the candidate vector and then this is compared with the next input field in the sequence. Any pixels that are found to now be in the same place in each of the two input fields therefore must have the motion described by that vector. After the appropriate discarding of spurious pixels, the pixels with the same movement are grouped together as an object and assigned the motion vector. The picture shifting is actually done by, first, address shifting and then by using an interpolator to get the sub-pixel accuracy.
This process is not trying to look for motion, since this is accurately known from the Phase Correlator, but instead is looking for the outline of objects that have that known motion. The process is repeated for each of the candidate vectors. Some of the candidate vectors will be spurious ones and will not produce any pixel matches, so will be discarded.
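A one-dimensional Python sketch of this assignment step is shown below. It is a simplification under stated assumptions: shifts are whole pixels only, the candidate list is assumed to be already ordered by peak size, a pixel keeps the first vector that matches it, and the discarding of spurious pixels is left out. The names and pixel values are hypothetical.

```python
def shift_field(field, vector):
    """Shift a line of pixels by a whole number of pixels; None marks unknown pixels."""
    width = len(field)
    return [field[x - vector] if 0 <= x - vector < width else None
            for x in range(width)]


def assign_vectors(current_field, next_field, candidates, tolerance=0):
    """Label each pixel of next_field with the candidate vector that explains it."""
    assignment = [None] * len(next_field)
    for vector in candidates:  # largest correlation peak first
        shifted = shift_field(current_field, vector)
        for x, value in enumerate(next_field):
            if assignment[x] is not None or shifted[x] is None:
                continue
            if abs(shifted[x] - value) <= tolerance:
                assignment[x] = vector  # pixel lines up, so it has this motion
    return assignment


background = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
current_field = background[:]
current_field[2:4] = [200, 210]  # object at pixels 2-3
next_field = background[:]
next_field[5:7] = [200, 210]     # same object, moved 3 pixels right

print(assign_vectors(current_field, next_field, candidates=[0, 3]))
# prints [0, 0, None, None, 0, 3, 3, 0, 0, 0, 0, 0]
```

The two pixels left unassigned are background that was hidden behind the object in the current field, so neither candidate vector can explain them.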
Candidate vectors determined by the position of the peaks relative to the center.
Other vector-assignment techniques used by the image correlator
The method described up until now has been reasonably well defined. We now get to the black art stuff. There is actually considerably more related to the vector-assignment process than has just been outlined. The ability to eliminate spurious vectors, and establish confidence levels for the others, is essential to the ability to achieve artifact free conversion.
To help in the process of assigning motion vectors and to increase the confidence levels of the assignments, it is also necessary to take a step back to form a top-level view of what is going on in the scene. For example, it is necessary to establish, by looking at the candidate motion vectors from all the processing windows, to what extent the camera is panning or zooming or both. This can be done by combining all the vectors from all the windows into a histogram and looking for peaks. Having this top level information is useful when trying to decide between spurious and valid peaks in the correlation surface.
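The histogram step for detecting a pan might look something like the following Python sketch. The function name and the 50 percent threshold are assumptions for illustration, and the sketch only covers panning; a zoom shows up as vectors that vary systematically across the picture, which a single histogram peak does not capture.

```python
from collections import Counter


def dominant_vector(window_vectors, min_share=0.5):
    """Return the most common candidate vector if it dominates the picture.

    window_vectors holds one list of (dx, dy) candidates per processing window.
    Returns None when no single vector reaches min_share of all candidates.
    """
    counts = Counter(vector for vectors in window_vectors for vector in vectors)
    total = sum(counts.values())
    if total == 0:
        return None
    vector, votes = counts.most_common(1)[0]
    return vector if votes / total >= min_share else None


panning_scene = [[(4, 0), (0, 0)], [(4, 0)], [(4, 0), (-2, 1)], [(4, 0)]]
print(dominant_vector(panning_scene))  # prints (4, 0)
```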
Where windows are adjacent and overlap, a window needs to add the candidate vectors from the neighboring windows to its list, so they too can be checked for pixel correlation. Around the outside of the picture various boundary effects can potentially confuse things.
Obscured and revealed backgrounds need special attention, and this is done by doing the shifting and the pixel correlations in both the forward and reverse directions along the candidate vector path. In the case of camera panning, the processes are identical, but if objects are moving, the forward correlation process is the only way of finding areas revealed, and backward correlation is the only way of finding the areas obscured. Both forward and backward motion vectors are fed to the motion vector-steered converter, and the appropriate one is used when building the objects in the required shifted input fields.
Motion vector steering
By simple geometry, the motion vectors are split into two. The first part says how to get from the current input field to the required output field point in time, and the second says how to get from the output field to the next input field. By doing offset writes to the input field RAMs, the objects in the current input field are shifted by the first part of the vector set, and those in the next input field are shifted by the second part of the vector set. The picture shifting is done by first address shifting and then by using an interpolator to get the sub-pixel accuracy. Because of lateral movement, it is also a requirement to use an interpolator in the horizontal dimension.
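The geometry can be illustrated with a one-dimensional Python sketch. It makes several simplifying assumptions: a single vector applies to the whole line, shifts are rounded to whole pixels (so the sub-pixel interpolator is omitted), and the two aligned fields are simply cross-faded. Here alpha is a hypothetical name for the position of the output field in time, from 0 at the current input field to 1 at the next.

```python
def split_vector(vector: int, alpha: float):
    """Split a motion vector at the output field's point in time."""
    forward = round(vector * alpha)  # current input field -> output field
    backward = vector - forward      # output field -> next input field
    return forward, backward


def shift(field, offset, fill=0.0):
    """Move the contents of a line of pixels by a whole number of pixels."""
    width = len(field)
    return [field[x - offset] if 0 <= x - offset < width else fill
            for x in range(width)]


def motion_compensated_field(current_field, next_field, vector, alpha):
    """Build the output field after lining both input fields up in time."""
    forward, backward = split_vector(vector, alpha)
    aligned_current = shift(current_field, forward)  # move objects forward
    aligned_next = shift(next_field, -backward)      # move objects back
    return [(1 - alpha) * a + alpha * b
            for a, b in zip(aligned_current, aligned_next)]


current_field = [1.0 if x == 2 else 0.0 for x in range(12)]  # object at pixel 2
next_field = [1.0 if x == 7 else 0.0 for x in range(12)]     # object at pixel 7
output = motion_compensated_field(current_field, next_field, vector=5, alpha=0.4)

print([x for x, value in enumerate(output) if value > 1e-9])  # prints [4]
```

The object appears once, at pixel 4, which is 40 percent of the way along its path, rather than as separate images at pixels 2 and 7.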
The actual writing to the source RAMs is controlled by yet more intelligence in the system, which can adjust what gets written depending on vector confidence values. In areas where motion compensation has produced low confidence factors, it mixes in some of the original video to fill in the gaps. The picture building is now complete, and the result is that each object is perfectly lined up in the current input field, the required output field, and the next input field. Another way of looking at this is that everything is now parallel to the time axis.
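The confidence-controlled mixing can be pictured as a crossfade between the motion-compensated pixel and the original pixel. The text does not say what mixing law is used, so the linear blend below is an assumption, and `mix_by_confidence` is a hypothetical name.

```python
def mix_by_confidence(compensated, original, confidence):
    """Blend a motion-compensated pixel with the original pixel.

    confidence is clamped to the range 0.0 to 1.0. A value of 1.0 keeps
    the compensated pixel and 0.0 falls back to the original video.
    The linear law is an assumption made for illustration.
    """
    c = min(max(confidence, 0.0), 1.0)
    return c * compensated + (1.0 - c) * original

# A pixel whose vector is only half trusted lands midway between the two.
blended = mix_by_confidence(200, 100, 0.5)
```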
As far as the interpolating standards conversion process is concerned, it's as if the objects had not moved, and so all of the judder and loss of resolution problems go away. An interpolating digital filter is used to find the exact pixel intensity values. In practice, even the interpolation stage itself makes a final adjustment based on the relative confidence factors of each of the four fields.
Note that the required interpolation axis is not parallel to the time axis.
We have seen that for close-up viewing of a CRT display, to avoid flicker it is necessary to have a refresh rate of something like 75Hz. For distance viewing, the minimum rate is 60Hz. This means that for desktop applications and in European living-room applications, we need to be able to increase the temporal rate of the incoming video if we are going to continue to use CRT displays.
CRTs have some negative and some positive features. On the negative side, they suffer from flicker, have judder if linear temporal rate conversion is used, and are bulky. On the positive side, their impulse characteristic gives them superior motion portrayal compared with the blur-and-smear you get from flat-panel displays.
The problems with temporal rate conversion all come down to the fact that the temporal sampling rate used for video is not fast enough to fully portray the motion in the scene, so the result is temporal aliasing. You cannot just use linear interpolation techniques, because a linear system has no way to distinguish the valid motion frequencies from the frequency components due to aliasing. The only way to do it properly is to do what the eye does--track the motion of the objects in the scene.
Various options are open to us depending on the application. For American living-room applications, it is acceptable to just stick with 60Hz and therefore avoid the problems of temporal rate conversion. Using 60Hz is not acceptable for desktop CRT monitors, although you can switch to 60Hz when you want to push your chair back and view full-screen video. Even when operating at 60Hz, you need to be careful to accurately set the display rate to be the same as the source, or the result will be periodic jumps.
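The periodic jumps follow from simple arithmetic: when the two rates differ slightly, the display drifts by one whole frame every 1/|difference| seconds and has to repeat or drop a frame. The sketch below assumes, as an example not taken from the text, an NTSC source at 60000/1001 Hz (about 59.94Hz) shown on a display running at exactly 60Hz. The function name is hypothetical.

```python
from fractions import Fraction

def slip_interval_seconds(source_hz, display_hz):
    """Time between repeated or dropped frames when rates do not match.

    Returns None when the rates are identical and no slip occurs.
    """
    difference = abs(Fraction(display_hz) - Fraction(source_hz))
    if difference == 0:
        return None
    return 1 / difference

# Assumed example: about 59.94Hz source on an exact 60Hz display.
interval = slip_interval_seconds(Fraction(60000, 1001), 60)
seconds = float(interval)  # roughly 16.7 seconds between jumps
```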
The second option is to stop using CRTs in desktop applications and in European living rooms. Flat-panel displays are becoming more common and affordable. They are not capable of achieving the dynamic resolution and picture quality that properly driven CRTs can achieve, but they do have other advantages. In addition to impressing your friends, they don't suffer from flicker, so they can be driven at the native video source rate, which avoids the temporal rate conversion judder problem. When the cost becomes more reasonable, they are likely to provide a good solution.
The amount of judder you get when doing linear temporal rate conversion is inversely proportional to the difference frequency between the source video and the display rate. This means that 24Hz film-originated material can be linearly converted to, say, a display rate of 75Hz without judder. A difference frequency of something like 35Hz is required before the judder is reduced to an acceptable level. If the video is 60Hz, then the display rate would need to be about 95Hz, which is not very practical given the limited scan rate available in PC monitors. If the video is 50Hz, then the display rate only needs to be 85Hz, which is a bit more reasonable. The use of a high refresh rate to solve the judder problem is definitely an option for top-end desktop PCs in Europe.
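The figures above can be checked with a few lines of arithmetic. The 35Hz threshold is the approximate value quoted in the text. The function names are hypothetical, and the sketch assumes the display rate is higher than the source rate.

```python
ACCEPTABLE_DIFFERENCE_HZ = 35  # approximate threshold quoted in the text

def difference_frequency(source_hz, display_hz):
    """Difference frequency between the source and display rates."""
    return abs(display_hz - source_hz)

def min_display_rate(source_hz, threshold_hz=ACCEPTABLE_DIFFERENCE_HZ):
    """Lowest display rate above the source that meets the threshold."""
    return source_hz + threshold_hz

def judder_acceptable(source_hz, display_hz,
                      threshold_hz=ACCEPTABLE_DIFFERENCE_HZ):
    """True when the difference frequency reaches the threshold."""
    return difference_frequency(source_hz, display_hz) >= threshold_hz

film_ok = judder_acceptable(24, 75)   # difference of 51Hz
ntsc_ok = judder_acceptable(60, 75)   # difference of only 15Hz
rate_for_60 = min_display_rate(60)    # 95Hz
rate_for_50 = min_display_rate(50)    # 85Hz
```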
The "Holy Grail" solution to the judder problem is to do what the eye does--track the moving objects in the scene. This technique is referred to as motion vector-steered temporal rate conversion and is employed in top-end NTSC/PAL standards converters used in the professional TV broadcast studio world.
To do motion vector steering, it is necessary to assign accurate motion vectors to every moving object in the scene. It is then necessary to shift these objects to where they should be at the point in time for which you want to create the new frame. This process must also take account of the fact that objects change shape with time and move in front of and behind one another. The process is extremely complex, but it does work, as proved by products such as the Snell and Wilcox Alchemist PhC standards converter described in this paper. To do judder-free 60Hz-to-75Hz temporal rate conversion for PC CRT monitors, we would need to do the same process, but at a consumer price point.
The challenge is to see whether someone can design a motion vector-steered standards converter on a single chip. "Consumer pricing" would be around $10 for a chip that can perform this function. The closest example available today at consumer price levels is the Philips Melzonic field doubler designed for European 100Hz TVs. Reasonably priced consumer TVs using similar motion vector steering techniques have been available in the shops in Europe since 1995. Philips is now working to apply this technology to the problem of converting 60Hz material to 75Hz display rates. At the WinHEC 98 conference, they demonstrated a system doing judder-free 60Hz-to-75Hz conversion. It was not perfected and it produced various artifacts, but the work is very encouraging.
The big question is whether to put the effort into solving the flicker and judder problems associated with CRT displays or whether to just replace them as soon as possible with flat-panel displays.