Lossy Compressed Image Formats Study

Mozilla Corporation, October 2013

Introduction

This study compares the compression performance of four different image formats: JPEG, JPEG XR, WebP, and HEVC-MSP. The latter three formats were chosen because they are frequently discussed as possible JPEG successors.

This study is intended to address compression performance only. Other technical, legal, and market factors that might be considered when evaluating codecs are outside its scope.

Quality Comparison Algorithms

We chose to test with four algorithms:

Y-SSIM
- Structural Similarity algorithm [4] applied to luma channel only.
RGB-SSIM
- Average of Structural Similarity algorithm [4] applied to R, G, and B channels.
IW-SSIM
- Information Content Weighted Structural Similarity algorithm [5] applied to luma channel only.
PSNR-HVS-M
- Peak Signal to Noise Ratio taking into account Contrast Sensitivity Function (CSF) and between-coefficient contrast masking of DCT basis functions [6].

All of these algorithms compare two images and return a number indicating the degree to which the second image is similar to the first. In all cases, no matter what the scale, higher numbers indicate a higher degree of similarity.
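As a rough illustration of how an SSIM-style score behaves, and of how RGB-SSIM averages per-channel results, here is a minimal sketch. It is not the implementation used in this study: the published SSIM algorithm [4] averages scores over local windows, whereas this simplification treats each whole channel as a single window. The function names `global_ssim` and `rgb_ssim` are hypothetical.

```python
from statistics import fmean


def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM over two equal-length sequences of samples.

    Simplification (assumption): the whole channel is one window, whereas
    the published algorithm averages SSIM over many local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("channels must be non-empty and the same length")
    # Stabilizing constants from the SSIM paper: K1 = 0.01, K2 = 0.03.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean((a - mu_x) ** 2 for a in x)
    var_y = fmean((b - mu_y) ** 2 for b in y)
    cov = fmean((a - mu_x) * (b - mu_y) for a, b in zip(x, y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def rgb_ssim(ref_rgb, test_rgb):
    """Average the per-channel scores; each argument is an (R, G, B) tuple of channels."""
    return fmean(global_ssim(r, t) for r, t in zip(ref_rgb, test_rgb))
```

Identical inputs score 1.0 (up to floating-point rounding), and any distortion lowers the score, which matches the "higher is more similar" convention described above.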

It's unclear which algorithm is best in terms of human visual perception, so we tested with four of the most respected algorithms.

Image Sets

Lenna: Widely used Lenna image, a 512x512 PNG.
Kodak: 24 PNG images from the Kodak Lossless True Color Image Suite.
Tecnick: 100 images from Tecnick's public test images. Images used are the original size RGB color images.

We had planned to include a set of very large images (~20 megapixels each), but some encoders had issues with them. Because this prevented a full comparison, that image set was cut from this study. Encoder developers were notified of any problems found.

Methodology

All evaluation results should be easily reproducible using publicly available tools.

MATLAB is the only non-free (as in beer) software used. We recommend tweaking the test harness to use GNU Octave if you would like to test without access to MATLAB.

The following software is used to generate results for this study:

Testing scripts written specifically for this study were used to obtain all results. The scripts are available on GitHub.
Encoding and decoding for JPEG, JPEG XR, and WebP is done via custom encoder and decoder wrappers, largely in order to input and output Y'CbCr 4:2:0 consistently. C source code for these wrappers is available in the same GitHub repository as the test scripts. Such a wrapper is not necessary for HEVC-MSP. After JPEG encoding, a program called jpgcrush is used to further optimize the JPEG because the JPEG encoder does not implement this lossless optimization.
identify is used to extract width and height information from images, and convert is used to convert between PNG and PPM formats. Both tools are part of the ImageMagick tools. This study uses version 6.8.6-6.
libjpeg is used to encode and decode JPEG images. This study uses version 9.
jxrlib is used to encode and decode JPEG XR images. This study uses version 1.1.
libwebp is used to encode and decode WebP images. This study uses version 0.3.1.
TAppEncoderStatic is used to encode HEVC-MSP bitstreams, and TAppDecoderStatic is used to decode HEVC-MSP bitstreams. Both tools are part of the jctvc-hm software package. This study uses r3531 of the SVN-based source code. See testing code to learn about invocation.
ssim (Structural Similarity) is used to calculate an RGB-SSIM value between two PNG files. We use a C++ implementation. We average the R, G, and B results.
IW-SSIM is measured using MATLAB scripts provided by Zhou Wang of the University of Waterloo. Some function implementations are provided by matlabPyrTools. These are run using MATLAB.
Y-SSIM is measured using a MATLAB script (ssim.m) provided by Zhou Wang of the University of Waterloo. It is run using MATLAB.
dump_psnrhvs is a C++ program, part of daala's tools, used to calculate a PSNR-HVS-M value between two Y'CbCr files. Daala git revision 4997e529e81bd3f50051fb84ff2aa6759b24caa6 was used.
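For readers who want to reproduce the ImageMagick steps in the list above, the following sketch shows one way to call identify and convert from a script. It is not taken from the study's test harness; the helper names are hypothetical, and it assumes ImageMagick 6.x command names are on the PATH (ImageMagick 7 installs typically expose these through the `magick` command instead).

```python
import subprocess


def identify_cmd(path):
    """Argument list asking identify to print '<width> <height>'."""
    return ["identify", "-format", "%w %h", path]


def convert_cmd(src, dst):
    """Argument list for a format conversion; convert infers formats from extensions."""
    return ["convert", src, dst]


def image_size(path):
    """Run identify and parse its output into (width, height)."""
    result = subprocess.run(
        identify_cmd(path), check=True, capture_output=True, text=True
    )
    width, height = result.stdout.split()
    return int(width), int(height)


def png_to_ppm(png_path, ppm_path):
    """Run convert to turn a PNG into a PPM file."""
    subprocess.run(convert_cmd(png_path, ppm_path), check=True)
```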

PNG test images are converted to CCIR 601 full-range Y'CbCr 4:2:0, which is then fed directly into the encoders. To convert back to PNG for quality scoring (e.g. SSIM), we decode to Y'CbCr 4:2:0 and then encode that to PNG. Doing this consistently lets us avoid the pre- and post-processing done by production encoders and decoders, so that we test the encoding algorithms themselves as closely as possible. Direct encoding and decoding for JPEG, JPEG XR, and WebP is done via custom encoder and decoder programs (source code on GitHub with the testing scripts) which call directly into the encoding and decoding APIs. The HEVC-MSP encoder and decoder accept and output Y'CbCr 4:2:0 directly, so no custom program is necessary.
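The conversion described above can be sketched as follows. The coefficients are the standard full-range CCIR 601 (JFIF-style) ones; the 2x2 box average used for chroma subsampling is an assumption made for illustration, and the study's own wrappers may use a different filter. The function names are hypothetical.

```python
def clamp8(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, int(round(value))))


def rgb_to_ycbcr_full(r, g, b):
    """Full-range CCIR 601 conversion of one 8-bit R'G'B' pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def to_ycbcr420(pixels):
    """Convert rows of (R, G, B) tuples to Y, Cb, Cr planes in 4:2:0 layout.

    Assumes even width and height. Chroma is averaged over each 2x2 block,
    which is an illustrative choice rather than the study's exact filter.
    """
    height, width = len(pixels), len(pixels[0])
    ycc = [[rgb_to_ycbcr_full(*p) for p in row] for row in pixels]
    y_plane = [[sample[0] for sample in row] for row in ycc]

    def subsample(index):
        return [
            [
                clamp8(
                    sum(ycc[j + dj][i + di][index] for dj in (0, 1) for di in (0, 1)) / 4
                )
                for i in range(0, width, 2)
            ]
            for j in range(0, height, 2)
        ]

    return y_plane, subsample(1), subsample(2)
```

In 4:2:0 the luma plane keeps full resolution while each chroma plane has half the width and half the height, so a 2x2 input yields a single Cb sample and a single Cr sample.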

HEVC-MSP files are penalized 80 bytes per image file because HEVC-MSP is just a bitstream with no container. This penalty approximates the size of container data.

The algorithm used is as follows, where F is the format being evaluated and Q is a JPEG quality level:

Compress source PNG image to JPEG at quality Q, with Y'CbCr 4:2:0 as the intermediate.
Convert JPEG back to PNG using Y'CbCr 4:2:0 as the intermediate.
Record quality score between the source PNG and the PNG produced from the JPEG, as well as the JPEG's file size.
Perform binary search of the target format's quality range, with interpolation, to find the file size of the image in format F that matches the JPEG quality score. The same process used for JPEG is used to compress and compare images in the target format F.
Calculate the file size ratio for format F to JPEG.

For image sets including multiple images, the result is the arithmetic mean over the images in the set.
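The search and ratio steps above can be sketched as follows. This is one plausible reading of "binary search with interpolation", not the study's actual script: `measure` is a hypothetical callback that encodes an image in format F at an integer quality setting and returns a (quality score, file size in bytes) pair, and the sketch assumes the score rises with the quality setting. Averaging per-image ratios in `mean_ratio` is likewise an assumption about how the mean is taken.

```python
def size_at_score(measure, target_score, q_min, q_max):
    """Estimate the file size at which a format reaches target_score.

    measure(q) -> (score, size_bytes) is a hypothetical callback; the score
    is assumed to increase with q.
    """
    lo, hi = q_min, q_max
    lo_score, lo_size = measure(lo)
    hi_score, hi_size = measure(hi)
    # If the target lies outside the reachable range, return the nearest end.
    if target_score <= lo_score:
        return float(lo_size)
    if target_score >= hi_score:
        return float(hi_size)
    # Invariant: lo_score < target_score <= hi_score.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        mid_score, mid_size = measure(mid)
        if mid_score < target_score:
            lo, lo_score, lo_size = mid, mid_score, mid_size
        else:
            hi, hi_score, hi_size = mid, mid_score, mid_size
    # Linear interpolation between the two bracketing quality settings.
    t = (target_score - lo_score) / (hi_score - lo_score)
    return lo_size + t * (hi_size - lo_size)


def size_ratio(format_size, jpeg_size):
    """File size ratio of format F to JPEG; values below 1.0 favor F."""
    return format_size / jpeg_size


def mean_ratio(ratios):
    """Arithmetic mean of per-image ratios for a multi-image set."""
    return sum(ratios) / len(ratios)
```

Interpolating between the two adjacent quality settings matters because encoders expose only discrete quality levels, so an exact match to the JPEG score rarely exists.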

Note: Video formats such as VP8 and HEVC typically use 'studio swing' Y'CbCr with a restricted range (16 to 235 for 8-bit luma, 16 to 240 for chroma) instead of the full range of 0 to 255. When working with RGB data, the scaling for studio swing is accomplished as part of the colorspace conversion process. Some image formats derived from video formats, such as WebP, inherit video's conventional range in their common RGB conversions. In our study we adopted a methodology which uses identical colorspace conversion for all formats because the objective metrics were developed against greyscale images. While these metrics correlate well with perception in their intended applications, they are known to exaggerate the perceptual impact of small brightness or contrast shifts that can be caused by differences in colorspace conversion. As a result, the study does not consider the effect of colorspace or range differences that would typically be found in production; manual visual spot checking did not suggest the conversion had a large effect on perceptual quality.
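To make the range difference concrete, the sketch below maps 8-bit full-range Y'CbCr samples to studio swing using the conventional 8-bit limits (luma 16 to 235, chroma 16 to 240). It only illustrates the scaling that this study deliberately avoided by using one conversion for all formats; the function name is hypothetical and the rounding choice is an assumption.

```python
def full_to_studio(y, cb, cr):
    """Map one 8-bit full-range Y'CbCr sample to studio swing.

    Luma is compressed from 0..255 into 16..235; chroma is scaled around
    its midpoint of 128 into 16..240.
    """
    y_studio = 16 + round(y * 219 / 255)
    cb_studio = 128 + round((cb - 128) * 224 / 255)
    cr_studio = 128 + round((cr - 128) * 224 / 255)
    return y_studio, cb_studio, cr_studio
```

Because studio swing uses fewer code values per channel, a round trip through it can introduce small brightness and contrast shifts of the kind the note says the metrics tend to exaggerate.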

Results (Raw Data)

The following Excel (.xlsx) files contain the full results for this study. These can be opened with MS Excel or LibreOffice 4.1.x.

Change over JPEG Quality Range at Equivalent Y-SSIM

The goal for this section is to visualize file size ratios, where JPEG is always 1.0, over a range of JPEG qualities. File sizes are recorded at equivalent Y-SSIM values. In each graph, the Y axis represents file size ratios. The X axis represents a range of JPEG quality values. There is one graph for each image set.

Graph 1: Lenna.png, Y-SSIM quality metric, lower is better

Graph 2: Average for Kodak image set, Y-SSIM quality metric, lower is better

Graph 3: Average for Tecnick image set, Y-SSIM quality metric, lower is better

Change over JPEG Quality Range at Equivalent RGB-SSIM

The goal for this section is to visualize file size ratios, where JPEG is always 1.0, over a range of JPEG qualities. File sizes are recorded at equivalent RGB-SSIM values. In each graph, the Y axis represents file size ratios. The X axis represents a range of JPEG quality values. There is one graph for each image set.

Graph 1: Lenna.png, RGB-SSIM quality metric, lower is better

Graph 2: Average for Kodak image set, RGB-SSIM quality metric, lower is better

Graph 3: Average for Tecnick image set, RGB-SSIM quality metric, lower is better

Change over JPEG Quality Range at Equivalent IW-SSIM

The goal for this section is to visualize file size ratios, where JPEG is always 1.0, over a range of JPEG qualities. File sizes are recorded at equivalent IW-SSIM values. In each graph, the Y axis represents file size ratios. The X axis represents a range of JPEG quality values. There is one graph for each image set.

Graph 1: Lenna.png, IW-SSIM quality metric, lower is better

Graph 2: Average for Kodak image set, IW-SSIM quality metric, lower is better

Graph 3: Average for Tecnick image set, IW-SSIM quality metric, lower is better

Change over JPEG Quality Range at Equivalent PSNR-HVS-M

The goal for this section is to visualize file size ratios, where JPEG is always 1.0, over a range of JPEG qualities. File sizes are recorded at equivalent PSNR-HVS-M values. In each graph, the Y axis represents file size ratios. The X axis represents a range of JPEG quality values. There is one graph for each image set.

Graph 1: Lenna.png, PSNR-HVS-M quality metric, lower is better

Graph 2: Average for Kodak image set, PSNR-HVS-M quality metric, lower is better

Graph 3: Average for Tecnick image set, PSNR-HVS-M quality metric, lower is better

Bibliography and Relevant Reading

[1] WebP Compression Study, Draft 0.1. May 18, 2011. Google.
[2] HD View: JPEG XR updates. May 30, 2013. Matt Uyttendaele (Microsoft).
[3] Structural similarity. Wikipedia.
[4] The SSIM Index for Image Quality Assessment.
- Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.
[5] IW-SSIM: Information Content Weighted Structural Similarity Index for Image Quality Assessment.
- Zhou Wang and Qiang Li, "Information Content Weighting for Perceptual Image Quality Assessment," IEEE Transactions on Image Processing, vol. 20, no. 5, pp. 1185-1198, May 2011.
[6] Nikolay Ponomarenko homepage - PSNR-HVS-M download page.
- Nikolay Ponomarenko, Flavia Silvestri, Karen Egiazarian, Marco Carli, Jaakko Astola and Vladimir Lukin, "On between-coefficient contrast masking of DCT basis functions," CD-ROM Proceedings of the Third International Workshop on Video Processing and Quality Metrics for Consumer Electronics (VPQM-07), Scottsdale, Arizona, USA, 25-26 January 2007, 4 p.

Contributors

Mozilla Corporation

Josh Aas (Primary)
Gregory Maxwell
Jeff Muizelaar
Tim Terriberry