INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION

ISO/IEC JTC 1/SC 29/WG 11

CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC 1/SC 29/WG 11 N7465

Poznań, PL – July 2005

Source:

Leonardo Chiariglione

Title:

Description Lossless coding of oversampled audio

Status:

Approved

 

1         Introduction

With the advent of high-capacity storage media in the early nineties, interest in high-resolution audio for home delivery has increased significantly. This trend has been recognized by the music industry and has led to the conception of Super Audio CD (SA-CD), an audio delivery format that combines the desired ultra-high audio quality with the ability to reproduce both stereo and multi-channel audio recordings. A one-bit digital storage format has been found to comply with the most demanding consumer requirements with respect to audio quality. The one-bit coding format is referred to as “DSD” (Direct Stream Digital). Although SA-CD employs a sampling rate of 64 times 44.1 kS/s (“64 Fs”), roughly 2.8 MS/s, DSD also allows higher oversampling rates of 128 and 256 times Fs, which are mainly used for archiving purposes.

2         Motivation

The high data rates of the DSD format clearly call for a lossless compression technique that makes more efficient use of storage space. This has resulted in a lossless coding technique for one-bit oversampled data, named “DST” (Direct Stream Transfer) [1,2].

3         Overview of technology

Within the lossless encoder and decoder, one can distinguish three stages: framing, prediction, and entropy coding. The framing process divides the original one-bit audio stream, consisting of samples b ∈ {0, 1}, into frames of 37,632 bits, corresponding to 1/75 of a second at a sampling rate of 2.8 MS/s. The framing provides easy “random” access to the audio data during playback. Prediction filtering is the first necessary step in the process of (audio) data compression.
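The frame length quoted above follows directly from the 64 Fs sampling rate and the 1/75 s frame duration. The short Python sketch below reproduces that arithmetic and splits a bit sequence into frames. The names `FRAME_BITS` and `split_into_frames` are illustrative only, and keeping a shorter final frame is an assumption of this sketch, not something the text specifies.

```python
# Illustrative sketch only: names and the handling of a short last frame
# are assumptions, not part of the DST description.
FS = 44_100              # base sampling rate in samples per second
OVERSAMPLING = 64        # "64 Fs", as used on SA-CD
FRAMES_PER_SECOND = 75   # one frame covers 1/75 of a second

# 64 * 44,100 = 2,822,400 bits per second; divided by 75 frames per second.
FRAME_BITS = OVERSAMPLING * FS // FRAMES_PER_SECOND


def split_into_frames(bits, frame_bits=FRAME_BITS):
    """Split a sequence of one-bit samples into consecutive frames.

    The last frame is simply shorter if the input length is not a
    multiple of frame_bits (an assumption made for this sketch).
    """
    return [bits[i:i + frame_bits] for i in range(0, len(bits), frame_bits)]
```

Because every frame starts at a known multiple of the frame length, a player can begin decoding at any frame boundary, which is what makes the “random” access mentioned above straightforward.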

Figure 1 - Schematic overview of the encoder and decoder.

The prediction filtering step, shown in more detail in Figure 1a, attempts to remove redundancy from the audio bit stream b by creating a new bit stream e that is not redundant. Together with the prediction filter coefficients h, the error stream e carries the same information as b. The prediction signal z is multi-bit; the prediction bits q are derived from the multi-bit values z by simple truncation, indicated by the block labeled Q(z).

As the decoder diagram (Figure 1b) shows, the computationally demanding design of the prediction filter is required only in the encoder. The player only has to perform the much less demanding decoding process, in which the most expensive operation is the FIR filtering. Because the filter operates on a one-bit signal, its implementation is straightforward. To enable complete reconstruction of the original bit stream at the decoder, the prediction filter coefficients and the error bits have to be transferred for each frame. The decoder calculates the original bit stream from the error bits and the predictions.

When proper prediction filters are used, the signal e consists of more zeroes than ones, which makes a compression gain possible. Arithmetic coding can be used successfully when accurate information on the probabilities of the symbols “0” and “1” is available. By also conveying this probability information in the bitstream, arithmetic coding is able to approach the upper limit of the achievable compression. In a full encoder, every channel has its own source model (consisting of the prediction filter and probability table), whereas only a single arithmetic encoder is used. To exploit the correlation between channels, however, it is also possible to let channels share prediction filters and/or probability tables.
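The relation between b, z, q, and e can be illustrated with a minimal Python sketch. Several details here are assumptions that the text does not state: bits are mapped to -1/+1 before filtering, Q(z) is taken to be the sign of z, the error bit is formed as e = b XOR q, and the filter history starts at all zeroes. The function names are hypothetical, and the design of the coefficients is not shown.

```python
# Minimal sketch under stated assumptions; not the normative DST algorithm.

def predict_bit(history, coeffs):
    """Return the one-bit prediction q from past bits (most recent first)."""
    # Assumed convention: map bits {0, 1} to {-1, +1} before FIR filtering.
    z = sum(h * (2 * s - 1) for h, s in zip(coeffs, history))  # multi-bit z
    return 1 if z >= 0 else 0  # assumed Q(z): keep only the sign of z


def encode(bits, coeffs):
    """Turn the original bits b into error bits e = b XOR q."""
    history = [0] * len(coeffs)  # assumed initial filter state
    errors = []
    for b in bits:
        q = predict_bit(history, coeffs)
        errors.append(b ^ q)           # e is 0 whenever the prediction is right
        history = [b] + history[:-1]   # shift the true bit into the history
    return errors


def decode(errors, coeffs):
    """Rebuild the original bits from error bits and the same predictions."""
    history = [0] * len(coeffs)
    bits = []
    for e in errors:
        q = predict_bit(history, coeffs)
        b = e ^ q                      # undo the XOR applied by the encoder
        bits.append(b)
        history = [b] + history[:-1]
    return bits
```

For example, with the single coefficient `[-1]` the predictor always guesses the opposite of the previous bit, so the alternating input `[1, 0] * 8` yields an all-zero error stream, and `decode` recovers the input exactly. The decoder needs only the FIR filtering and one XOR per bit, which matches the observation that the expensive filter design stays in the encoder.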
The lossless compression performance is demonstrated with wide-band 256 Fs recordings and with 128 Fs and 64 Fs down-converted versions of these, which illustrates the scalability of the algorithm. As illustrated in Figure 2, the compression ratio η typically amounts to 2.7-2.8 for a sampling rate of 64 times Fs.

Figure 2 - Compression ratio for 1000 frames of a recording (classical music, 6 channels).
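As a rough illustration of why an error stream with more zeroes than ones can be compressed, the Python sketch below computes the binary entropy of such a stream and the compression ratio an ideal entropy coder could approach. It assumes a single, fixed probability of a “1” for the whole stream and ignores the cost of transmitting filter coefficients and probability tables, so it is a simplification and does not reproduce the measured ratios reported here. The function names are hypothetical.

```python
import math


def binary_entropy(p_one):
    """Entropy in bits per symbol of a source emitting 1 with probability p_one."""
    if p_one in (0.0, 1.0):
        return 0.0  # a fully predictable stream carries no information
    p_zero = 1.0 - p_one
    return -(p_one * math.log2(p_one) + p_zero * math.log2(p_zero))


def ideal_compression_ratio(error_bits):
    """Input bits divided by the entropy-bound output bits (memoryless model)."""
    if not error_bits:
        raise ValueError("error_bits must not be empty")
    p_one = sum(error_bits) / len(error_bits)
    bits_per_symbol = binary_entropy(p_one)
    # An all-zero (or all-one) stream has zero entropy under this model.
    return math.inf if bits_per_symbol == 0.0 else 1.0 / bits_per_symbol
```

Under this simplified model, an error stream in which one bit in ten is a “1” needs about 0.47 bits per symbol, whereas a stream with equally many zeroes and ones cannot be compressed at all.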

[1]           Derk Reefman and Erwin Janssen, “One-bit audio: an overview”, JAES, vol. 52, no. 2, February 2004.

[2]           Erwin Janssen, Eric Knapen, Derk Reefman and Fons Bruekers, “Lossless compression of one-bit audio”, ICASSP 2004.

4         Target applications

Archiving and storage of 1-bit oversampled audio signals and SA-CD.