Understanding Requirements for High-Quality 3D Video: A Test in Stereo Perception

By Dmitriy Vatolin, YUVsoft

Nowadays, 3D display devices such as stereo 3D projectors for cinema, stereo TV and autostereoscopic glasses-free displays have become widespread. There is little doubt that the future of digital video is 3D, but how do different people perceive stereo, what is important for a comfortable 3D viewing experience, and how do variations in perception affect the requirements for high-quality 3D displays and stereo video processing algorithms?

Although the first stereoscopic videos appeared in the first half of the 20th century, the particulars of human stereo perception and the question of how to measure stereo quality are still understudied. Despite extensive research, especially in recent years, no commonly-accepted approaches to subjective and objective measurement of stereo quality have emerged. Moreover, a lack of data and insufficient understanding of how the human visual system works with stereo images have made developing practical guidelines for digital stereo difficult. These areas require further study to refine the principles of stereo vision and to develop standards for appealing stereo video that is easy on the viewer’s eyes.

Initial questions

My company’s main interests are stereo correction, stereo conversion to multiview and 2D-to-stereo 3D conversion using depth maps. From a practical perspective, the following initial questions arise when developing algorithms and software for 3D video conversion and quality improvement:

• How significant is the difference in stereo perception among different people? What does the stereo perception ‘distribution’ function look like?
• Which characteristics of a stereo image are important for subjective perceived quality?
• In which cases are stereo artefacts (due to imperfect 2D-to-3D conversion, stereo mismatch or a distorted depth or disparity map) noticeable, and when are they not so noticeable? When are perfectly-detailed depth maps important, and when are they superfluous?

Figure 1 - Example of a 2D image and its synthesised depth map. Brighter areas correspond to closer objects and darker areas to more-distant objects.

A number of tests were conducted to find partial answers to these questions, and the work is still ongoing. The following issues were studied first:

• Stereo sensitivity, ability to perceive 3D.
• Subjective stereo acuteness.

Setup

For these tests, computer-generated stereo images (stereograms) were used. Each image was acquired by taking a uniform 2D texture image and synthesising a stereomate through 2D image displacement according to a given depth map. Smaller shifts correspond to smaller depth values, larger shifts to bigger values, and no shift to a 0 depth value (black in the depth map). Each depth map contained an easily-recognisable pattern that usually took the form of Arabic numerals. Because of texture uniformity, the encoded pattern could be quickly discerned only in 3D.

Figure 2 - Test image generation.
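
To make the generation step concrete, here is a minimal Python sketch of synthesising a stereomate by shifting texture pixels according to a depth map. It is an illustration only, not the generator used in the tests: the function names, the rightward shift direction, the 0-255 depth range and the way holes are filled are all assumptions.

```python
import random

def make_texture(width, height, seed=0):
    """Random grey-level texture (a stand-in for the article's texture samples)."""
    rng = random.Random(seed)
    return [[rng.randrange(256) for _ in range(width)] for _ in range(height)]

def synthesise_stereomate(texture, depth_map, max_shift_px):
    """Shift each pixel horizontally in proportion to its depth value.

    depth_map values are assumed to lie in [0, 255]; 0 (black) means no shift.
    Holes left behind by displaced pixels keep the original texture, which is
    one simple choice among several (an assumption, not the article's method).
    """
    height, width = len(texture), len(texture[0])
    mate = [row[:] for row in texture]  # start from a copy so holes stay filled
    for y in range(height):
        # Visit pixels from far to near so closer pixels overwrite farther ones.
        order = sorted(range(width), key=lambda x: depth_map[y][x])
        for x in order:
            shift = round(max_shift_px * depth_map[y][x] / 255)
            target = x + shift
            if 0 <= target < width:
                mate[y][target] = texture[y][x]
    return mate

# Hypothetical usage: a bright rectangle stands in for the numeral pattern.
texture = make_texture(64, 16, seed=1)
depth = [[255 if 20 <= x < 40 and 4 <= y < 12 else 0 for x in range(64)]
         for y in range(16)]
stereomate = synthesise_stereomate(texture, depth, max_shift_px=4)
```

Because the texture is random, the rectangle cannot be picked out in either view on its own; it is encoded only in the horizontal offset between the two.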

A sequence of stereo images was presented to an individual, each image within several seconds of the previous one, and the individual then wrote a free-response statement of what he or she saw in each image. Forty respondents, randomly selected from students and post-graduate students, were tested. Some of the respondents had very little experience of digital stereo video, and no stereographers were included in the sample. The participants were asked about their visual acuity, and the collected information was used in the analysis to account for this factor, but it was not measured exactly. The data was accumulated and studied according to various criteria and classification factors.

Different textures were chosen and classified qualitatively according to recognition simplicity, colour, detail level and dispersion. The resulting set contained more than 20 samples.

Figure 3 - Texture samples.

Equipment Specifications

We used Digital Projection Titan 1080p-700 projectors, a 9 m wide screen, and circular polarised glasses. The distance of the viewer from the screen was approximately 10 m, and the maximum disparity between left and right view images was approximately ±0.2% of the screen width, corresponding to about ±2 cm on the screen. Images were generated and shown in full 1080p resolution.
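
The figures above can be cross-checked with a little arithmetic. The Python sketch below converts the quoted disparity fraction into on-screen distance and image pixels. It also derives an approximate visual angle from the stated viewing distance; that angle is not quoted in the article and assumes a viewer seated head-on. The 1920-pixel frame width follows from 1080p, and all function names are illustrative.

```python
import math

SCREEN_WIDTH_M = 9.0            # from the article
VIEWING_DISTANCE_M = 10.0       # from the article (approximate)
IMAGE_WIDTH_PX = 1920           # width of a 1080p frame
MAX_DISPARITY_FRACTION = 0.002  # 0.2% of the screen width

def disparity_on_screen_m(fraction, screen_width_m=SCREEN_WIDTH_M):
    """On-screen offset in metres for a disparity given as a width fraction."""
    return fraction * screen_width_m

def disparity_in_pixels(fraction, image_width_px=IMAGE_WIDTH_PX):
    """The same disparity expressed in image pixels."""
    return fraction * image_width_px

def visual_angle_arcmin(offset_m, distance_m=VIEWING_DISTANCE_M):
    """Angle subtended by an on-screen offset, viewed head-on (assumption)."""
    return math.degrees(2 * math.atan(offset_m / (2 * distance_m))) * 60

offset = disparity_on_screen_m(MAX_DISPARITY_FRACTION)
print(f"{offset * 100:.1f} cm on screen")
print(f"{disparity_in_pixels(MAX_DISPARITY_FRACTION):.2f} px")
print(f"{visual_angle_arcmin(offset):.1f} arcmin at the viewer")
```

The on-screen value comes out at 1.8 cm, consistent with the 'about 2 cm' quoted above.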

Stereo sensitivity

For the stereo sensitivity test, different numbers were used, such as the number 38 shown in Figure 2. The numbers had random positions and varying sizes. Depending on the pattern size, the stereo sensitivity experiment was divided into a ‘normal’ test (large-sized numbers) and an ‘advanced’ test (small-sized numbers).

The results can be analysed both in terms of respondents and in terms of textures. Figure 4 shows the distribution of correct responses for all textures in the advanced test. Around 30% of tested individuals provided correct responses in more than 90% of cases, and another 30% failed to provide correct responses in more than 90% of cases. The latter were classified as practically 3D-blind according to this test.

Figure 4 - Correct response distribution for the stereo sensitivity advanced test.

The stereo sensitivity distribution in this case looks similar to the normal distribution, but a lack of sufficient data prevents any definite conclusions at this point – it is possible that ‘cut-off’ points and gaps exist, as shown below. Also, exact information on 2D visual acuity is needed to accurately differentiate results and account for outliers.

To clarify the conclusion about ‘3D blind’ and ‘3D keen’ groups, the histogram can be inverted by changing axes and aggregating data for the percentage of correct responses, as shown below.

Figure 5 - Number of respondents per number of correct responses for the stereo sensitivity advanced test.
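
The grouping behind these two histograms can be sketched in a few lines of Python. The 90% and 10% cut-offs mirror the thresholds quoted above; the 'medium' label, the function names and the sample data are placeholders for illustration, not the study's data or analysis code.

```python
from collections import Counter

def fraction_correct(responses):
    """responses: list of booleans, one per test image shown to a respondent."""
    return sum(responses) / len(responses)

def classify(score, low=0.10, high=0.90):
    """Label a respondent by his or her share of correct answers."""
    if score > high:
        return "3D keen"
    if score < low:
        return "3D blind"
    return "medium"

def group_sizes(all_responses):
    """Count respondents in each sensitivity group."""
    return Counter(classify(fraction_correct(r)) for r in all_responses)

# Placeholder data: four made-up respondents, twenty images each.
sample = [
    [True] * 20,
    [True] * 19 + [False],
    [True] * 10 + [False] * 10,
    [False] * 20,
]
print(group_sizes(sample))
```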

Caveats

An important caveat must be made at this point: the term ‘3D blind’ should not be interpreted as the absence of or great weakness in depth perception as such. It simply means that the viewer has weak stereoscopic vision in the case of digital stereo. The brain receives many visual depth cues, including perspective, familiar objects and their relative sizes, shades, edge sharpness and so on, and it also estimates depth from motion when the head and body are moving and from occlusions of moving objects. So, a person usually has very good depth perception, even with only 2D vision.

The gaps present in the ‘medium’ sensitivity group could have been caused either by insufficient data or by existing phenomena, resulting in a more complex distribution than current data would indicate.

Results for the advanced test could be significantly biased because of low acuity of 2D vision in some respondents. Results for the normal test proved to be quite similar, however: the data exhibits the same large groups for low and high sensitivity to stereo, although the ‘3D keen’ group is larger in the normal case. Declared visual acuity data confirms that the ‘3D blind’ group includes more than just short-sighted people. So, although low visual acuity (especially in the case of varying acuity between the left and right eyes) greatly decreases 3D sensitivity, other factors also contribute. Some participants had normal or close-to-normal 2D vision but were found to be nearly blind in the 3D sense.

Figure 6 - Distribution of correct responses for the stereo sensitivity normal test.

Figure 7 - Number of respondents as a function of correct response percentage for the stereo sensitivity normal test.

The analysis by textures confirms the reasonable hypothesis that texture type greatly influences pattern visibility in stereo.

Figure 8 - Distribution of correct responses by texture for the stereo sensitivity normal test.
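
A per-texture breakdown of this kind amounts to grouping responses by texture rather than by respondent. The Python sketch below shows one way to do it, assuming each response is stored as a (texture, correct) pair; that record layout, the texture labels and the function names are hypothetical.

```python
from collections import defaultdict

def accuracy_by_texture(records):
    """records: iterable of (texture_id, is_correct) pairs."""
    totals = defaultdict(lambda: [0, 0])  # texture_id -> [correct, shown]
    for texture_id, is_correct in records:
        totals[texture_id][0] += int(is_correct)
        totals[texture_id][1] += 1
    return {t: correct / shown for t, (correct, shown) in totals.items()}

def rank_textures(records):
    """Texture ids ordered from hardest (lowest accuracy) to easiest."""
    accuracy = accuracy_by_texture(records)
    return sorted(accuracy, key=accuracy.get)

# Placeholder records with made-up texture labels.
records = [("smooth", False), ("smooth", False), ("smooth", True),
           ("detailed", True), ("detailed", True), ("detailed", False)]
print(rank_textures(records))
```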

The effects of textures

Clearly some hard textures complicate 3D vision, obfuscating disparity perception and interpretation. Not surprisingly, more-detailed textures with greater dispersion allow viewers to more easily notice 3D signs, and smoother, less detailed textures with fewer sharp edges tend to hide 3D signs. Hard (difficult) textures are more sparse – they contain fewer easily-recognisable feature points to match left- and right-view images and thus highlight disparity, making 3D signs less visible. No exceptions were found in this test.

Figure 9 - 'Easy' texture sample.

Figure 10 - 'Hard' texture sample.

Changing responses over time

When analysing how the percentage of correct responses changed over time, clear signs of adaptation became apparent: on average, responses for the first test images were more often incorrect than they were for the final ones. Moreover, after training during the test, participants sometimes were able to see signs in the advanced test that they failed to discover in the normal test for the same textures.
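
One simple way to look for such adaptation is to compare accuracy on images shown early with accuracy on images shown late. The Python sketch below does this with a first-half versus second-half split. The split point, the function names and the sample data are assumptions for illustration, and the comparison ignores any difference in texture difficulty between the two halves.

```python
def accuracy_by_position(all_responses):
    """Share of correct answers at each position in the presentation order.

    all_responses: one list of booleans per respondent, all the same length
    and all in the order in which the images were shown.
    """
    n_respondents = len(all_responses)
    n_images = len(all_responses[0])
    return [sum(r[i] for r in all_responses) / n_respondents
            for i in range(n_images)]

def early_late_gap(all_responses):
    """Mean accuracy on the second half minus mean accuracy on the first half."""
    accuracy = accuracy_by_position(all_responses)
    half = len(accuracy) // 2
    early = sum(accuracy[:half]) / half
    late = sum(accuracy[half:]) / (len(accuracy) - half)
    return late - early

# Placeholder data: a positive gap would be consistent with adaptation.
sample = [[False, False, True, True],
          [False, True, True, True]]
print(early_late_gap(sample))
```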

Stereo acuteness

For the stereo acuteness test, or disparity acuteness measurement, subjective susceptibility to small depth changes was tested using the pattern shown in Figure 11 below.

Figure 11 - Depth map pattern for the stereo acuteness test.

The first column corresponds to the maximum difference in stereo disparity between the texture background and the collar areas: taking both negative and positive displacements into account, this is about 0.2 + 0.2 = 0.4% of the screen width, or 8 pixels. The eighth column corresponds to about 0.05% of the screen width. Participants were asked to write down the configuration of collars for each column in the pattern. As in the stereo sensitivity test, different textures were displayed.
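
The disparity steps across the pattern can be tabulated with a short Python sketch. It assumes disparity falls by one pixel per column, from 8 pixels in the first column to 1 pixel in the eighth, on a 1920-pixel-wide image projected onto the 9 m screen. The function name and the table layout are illustrative.

```python
IMAGE_WIDTH_PX = 1920    # width of a 1080p frame
SCREEN_WIDTH_CM = 900.0  # 9 m screen, from the equipment description

def column_disparities(n_columns=8, max_px=8, step_px=1):
    """Return (column, pixels, percent of width, cm on screen) per column."""
    rows = []
    for column in range(1, n_columns + 1):
        px = max_px - (column - 1) * step_px
        percent = 100.0 * px / IMAGE_WIDTH_PX
        cm = SCREEN_WIDTH_CM * px / IMAGE_WIDTH_PX
        rows.append((column, px, percent, cm))
    return rows

for column, px, percent, cm in column_disparities():
    print(f"column {column}: {px} px, {percent:.3f}% of width, {cm:.2f} cm")
```

Under these assumptions the first column works out at roughly 0.42% of the width and the eighth at roughly 0.05% (just under 0.5 cm), in line with the rounded figures quoted in the article.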

Because of the smaller relative size of the collars, they were less visible than digits in the stereo sensitivity test, but people with good 3D vision were still able to discern them easily. For hard textures, more than 80% of participants failed to correctly identify the configurations in almost every column. In most cases, respondents only correctly identified the configurations of one or two columns.

The resulting overall data reveals broadly similar, slowly declining susceptibility across the depth grades, with a steep cut-off point at the eighth column, as shown in Figure 12.

Figure 12 - Correct responses by pattern column.

The irregular shape of the histogram may be caused by random deviation owing to the rather small sample, imprecise texture uniformity and finite accuracy of the stereo generator used in the experiment, as the difference in disparity between adjacent columns corresponds to one pixel.

The difference in the disparity of the texture background and collar areas for the eighth column was about 0.5cm on the screen and one pixel in terms of image resolution. A careful study is needed to determine the true causes of the cut-off point in the data for this disparity. In addition to the above-mentioned consideration, limitations of the projecting hardware may also have affected the results.

Conclusion

On the basis of these stereo sensitivity and acuteness tests, several conclusions can be drawn and common-knowledge opinions confirmed:

• Variance in subjective stereo perception is very large. Up to 30% of test participants are barely susceptible to stereo for, apparently, various reasons – even in the case of close-to-normal 2D vision. Weak 2D vision is not the only cause of bad stereo vision, although it does have an influence. Thus, the question arises, to what extent is it conditioned by the particulars of artificial digital stereo, and to what degree is it due to the individual properties of eyes and the brain’s sensory system?

The second question relates to whether there is a strong relationship between stereo sensitivity and comfort when viewing the same stereo video. If such a relationship exists, how should 3D video be prepared for people with different stereo perception characteristics?

• Subjective stereo perception is adaptive. This conclusion does not relate only to the latency of proper eye convergence; some brain learning is involved. After training, people notice more 3D details under the same conditions. This result is related to the assertion that drastic depth changes over time should be avoided in stereo video, as they ‘defocus’ stereo vision.

• For depth map construction and stereo generation, an important conclusion is that roughness and deviations in depth that are irrelevant to the underlying 2D image are noticeable as unpleasant artefacts only in highly-detailed areas with sharp edges. So, the masking effect of rough surfaces in 2D images, when artefacts in detailed areas are often imperceptible, works in the opposite way with stereo. Likely, very irregular textures are still less revealing because the brain must match numerous random-looking features that are hard to discern.

• 2D-to-3D conversion and stereo correction artefacts in flat uniform areas are invisible to nearly all viewers; only the borders of such areas should be accurately processed.

If you have a 3D display or a pair of red/cyan glasses (red for left eye, cyan for right) you can check your stereo vision using the methods described in this article by watching the following video:

Dmitriy Vatolin is the CEO of YUVsoft, an R&D company offering professional software for 2D-to-stereo 3D semiautomatic conversion and stereo processing. YUVsoft would like to thank the Video Group of the Graphics & Media Lab at Moscow State University’s Faculty of Computational Mathematics and Cybernetics for conducting the tests described in this article.

www.yuvsoft.com
