The filter bank used in MPEG Layer-3 is a hybrid filter bank that consists of a polyphase filter bank and a Modified
Discrete Cosine Transform (MDCT). This hybrid form was chosen for reasons of compatibility with its predecessors, Layer-1 and Layer-2.
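To illustrate the MDCT stage alone, here is a minimal pure-Python sketch of a direct (O(N²)) MDCT/IMDCT pair with a sine window and 50% overlap-add. It assumes 18 coefficients per block (36 windowed samples), which matches the long-block MDCT applied to each polyphase subband output in Layer-3. The polyphase stage, block switching and alias reduction are omitted. All function names are hypothetical, and the 2/N inverse scaling is simply the one that makes this particular window and transform pair reconstruct its input.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]**2 + w[n + length//2]**2 == 1."""
    return [math.sin(math.pi / length * (n + 0.5)) for n in range(length)]

def _basis(n, k, half):
    # MDCT kernel cos(pi/N * (n + 1/2 + N/2) * (k + 1/2)) with N = half
    return math.cos(math.pi / half * (n + 0.5 + half / 2.0) * (k + 0.5))

def mdct(block):
    """2N (windowed) samples -> N coefficients, direct O(N^2) evaluation."""
    half = len(block) // 2
    return [sum(block[n] * _basis(n, k, half) for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs):
    """N coefficients -> 2N time-aliased samples, scaled by 2/N."""
    half = len(coeffs)
    return [2.0 / half * sum(coeffs[k] * _basis(n, k, half) for k in range(half))
            for n in range(2 * half)]

def analysis_synthesis(signal, half=18):
    """Window, MDCT, IMDCT, window again, then overlap-add with hop `half`."""
    window = sine_window(2 * half)
    # zero padding so every input sample is covered by two overlapping blocks
    padded = [0.0] * half + list(signal) + [0.0] * (2 * half)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * half + 1, half):
        block = [padded[start + i] * window[i] for i in range(2 * half)]
        for i, y in enumerate(imdct(mdct(block))):
            out[start + i] += y * window[i]
    return out[half:half + len(signal)]

if __name__ == "__main__":
    signal = [math.sin(0.3 * n) for n in range(72)]
    rec = analysis_synthesis(signal)
    print(max(abs(a - b) for a, b in zip(signal, rec)))  # tiny, floating-point rounding only
```

Because the sine window satisfies the Princen-Bradley condition, the time-domain aliasing of neighboring blocks cancels in the overlap-add, so the sketch returns the input up to rounding error.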
The perceptual model largely determines the quality of a given encoder implementation. It either uses a separate filter
bank or shares the main filter bank for calculating the energy values needed for the masking calculations. The output
of the perceptual model consists of values for the masking threshold, or the allowed noise, for each coder partition.
If the quantization noise can be kept below the masking threshold, the coded signal should be perceptually indistinguishable
from the original signal.
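A common way to express this comparison is the noise-to-mask ratio (NMR) of each partition, in dB. The sketch below assumes the perceptual model supplies one allowed-noise energy per partition and that the quantization noise energies have already been measured; the function names are hypothetical.

```python
import math

def noise_to_mask_db(noise_energy, allowed_noise, floor=1e-20):
    """NMR per partition in dB; values <= 0 dB mean the noise stays below the threshold."""
    # `floor` avoids log10(0) for partitions with no measurable noise
    return [10.0 * math.log10(max(n, floor) / a)
            for n, a in zip(noise_energy, allowed_noise)]

def is_transparent(noise_energy, allowed_noise):
    """True when every partition's noise is at or below its allowed noise."""
    return all(nmr <= 0.0 for nmr in noise_to_mask_db(noise_energy, allowed_noise))

if __name__ == "__main__":
    print(noise_to_mask_db([1.0, 20.0], [10.0, 10.0]))  # second partition exceeds its threshold
```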
Joint stereo coding takes advantage of the fact that both channels of a stereo channel pair contain largely the same information.
These stereophonic irrelevancies and redundancies are exploited to reduce the total bitrate. Joint stereo is used in cases
where only low bitrates are available but stereo signals are desired.
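One joint stereo technique available in Layer-3 is mid/side (M/S) stereo, which codes the normalized sum and difference of the two channels instead of left and right. The minimal sketch below (hypothetical function names, plain Python lists) shows why this helps when the channels are similar: the side signal carries very little energy and therefore needs few bits. Intensity stereo, the other Layer-3 joint stereo tool, is not shown.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def ms_encode(left, right):
    """Map L/R to mid (normalized sum) and side (normalized difference)."""
    mid = [(l + r) * SQRT_HALF for l, r in zip(left, right)]
    side = [(l - r) * SQRT_HALF for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Invert ms_encode exactly, up to floating-point rounding."""
    left = [(m + s) * SQRT_HALF for m, s in zip(mid, side)]
    right = [(m - s) * SQRT_HALF for m, s in zip(mid, side)]
    return left, right

def energy(x):
    return sum(v * v for v in x)

if __name__ == "__main__":
    left = [1.0, 2.0, 3.0]
    right = [1.1, 1.9, 3.05]
    mid, side = ms_encode(left, right)
    print(energy(mid), energy(side))  # nearly all energy ends up in the mid channel
```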
Quantization and Coding
A system of two nested iteration loops is the common solution for quantization and coding in a Layer-3 encoder.
Quantization is done via a power-law quantizer. In this way, larger values are automatically coded with less accuracy
and some noise shaping is already built into the quantization process.
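A minimal sketch of a power-law quantizer using the 3/4 exponent of Layer-3 (and the matching 4/3 exponent for reconstruction). It assumes a step size of 2^(global_gain/4) and plain round-to-nearest; the standard's exact gain offset, rounding bias and scalefactor terms are left out, and the function names are hypothetical.

```python
import math

def quantize(xr, global_gain):
    """Power-law (3/4) quantization of spectral values."""
    step = 2.0 ** (global_gain / 4.0)  # assumed step-size mapping
    out = []
    for x in xr:
        q = math.floor((abs(x) / step) ** 0.75 + 0.5)  # plain round-to-nearest
        out.append(q if x >= 0 else -q)
    return out

def dequantize(ix, global_gain):
    """Inverse mapping: |q|**(4/3) times the step size, sign restored."""
    step = 2.0 ** (global_gain / 4.0)
    return [math.copysign(abs(q) ** (4.0 / 3.0) * step, q) for q in ix]

if __name__ == "__main__":
    levels = dequantize([1, 2, 3, 4, 5], 0)
    gaps = [b - a for a, b in zip(levels, levels[1:])]
    print(gaps)  # spacing between reconstruction levels grows with amplitude
```

The growing gaps between reconstruction levels are the built-in noise shaping: large values are represented more coarsely than small ones.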
The quantized values are coded by Huffman coding. Because Huffman coding is a lossless entropy coding method, this step is
also called noiseless coding: it adds no noise to the audio signal.
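The following sketch builds a Huffman code from symbol counts with Python's `heapq`, to show how frequent small values end up with short code words and how decoding recovers the input exactly. Note that a Layer-3 encoder does not build codes on the fly: it selects among fixed Huffman tables defined in the standard. The symbol counts and all function names here are illustrative assumptions.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_code(values):
    """Build a prefix-free code: frequent symbols get shorter code words."""
    freq = Counter(values)
    tiebreak = count()  # keeps heap comparisons away from the dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: a single distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        n1, _, low = heapq.heappop(heap)   # two least frequent subtrees
        n2, _, high = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in low.items()}
        merged.update({sym: "1" + code for sym, code in high.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

def encode(values, code):
    return "".join(code[v] for v in values)

def decode(bits, code):
    """Greedy decoding works because no code word is a prefix of another."""
    inverse = {word: sym for sym, word in code.items()}
    out, word = [], ""
    for bit in bits:
        word += bit
        if word in inverse:
            out.append(inverse[word])
            word = ""
    return out

if __name__ == "__main__":
    values = [0] * 50 + [1] * 20 + [2] * 8 + [3] * 2
    code = huffman_code(values)
    print(code)
```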
The process to find the optimum gain and scalefactors for a given block, bit-rate and output from the perceptual model
is usually done by two nested iteration loops in an analysis-by-synthesis way:
- Inner iteration loop (rate loop)
The Huffman code tables assign shorter code words to (more frequent) smaller quantized values. If the number of bits
resulting from the coding operation exceeds the number of bits available to code a given block of data, this can be
corrected by adjusting the global gain to result in a larger quantization step size, leading to smaller quantized values.
This operation is repeated with different quantization step sizes until the resulting bit demand for Huffman coding is
small enough. It is called the rate loop because it modifies the overall coder rate until the rate is low enough.
- Outer iteration loop (noise control/distortion loop)
To shape the quantization noise according to the masking threshold, scalefactors are applied to each scalefactor band.
The system starts with a default factor of 1.0 for each band. If the quantization noise in a given band is found to
exceed the masking threshold (allowed noise) as supplied by the perceptual model, the scalefactor for this band is
adjusted to reduce the quantization noise. Since achieving a smaller quantization noise requires a larger number of
quantization steps and thus a higher bitrate, the rate adjustment loop has to be repeated every time new scalefactors
are used. In other words, the rate loop is nested within the noise control loop. The outer (noise control) loop is
executed until the actual noise (computed from the difference between the original and the quantized
spectral values) is below the masking threshold for every scalefactor band (which roughly corresponds to a critical band).
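The two nested loops can be sketched as a toy model under several assumptions: `count_bits` is a hypothetical stand-in for looking up code lengths in the Huffman tables, each scalefactor step is assumed to amplify its band by √2 (so a scalefactor of 0 corresponds to the default factor of 1.0), the global gain starts at 0 with an assumed step size of 2^(gain/4), and both loops have iteration caps so the sketch always terminates. Real encoders use further stopping rules; all names here are hypothetical.

```python
import math

def quantize(x, step):
    q = math.floor((abs(x) / step) ** 0.75 + 0.5)  # 3/4 power law, round to nearest
    return q if x >= 0 else -q

def dequantize(q, step):
    return math.copysign(abs(q) ** (4.0 / 3.0) * step, q)

def count_bits(values):
    # Hypothetical stand-in for a Huffman table lookup: small values are cheap.
    return sum(1 if v == 0 else 2 * abs(v).bit_length() + 1 for v in values)

def amplify(spectrum, bands, scalefactors):
    """Apply the (assumed) sqrt(2)-per-step scalefactor gain to each band."""
    out = list(spectrum)
    for (lo, hi), sf in zip(bands, scalefactors):
        for i in range(lo, hi):
            out[i] *= 2.0 ** (0.5 * sf)
    return out

def rate_loop(scaled, bit_budget, max_gain=200):
    """Inner loop: enlarge the step size until the bit demand fits."""
    gain = 0
    while True:
        step = 2.0 ** (gain / 4.0)
        q = [quantize(x, step) for x in scaled]
        bits = count_bits(q)
        if bits <= bit_budget or gain >= max_gain:
            return gain, q, bits
        gain += 1

def band_noise(spectrum, bands, scalefactors, q, gain):
    """Noise energy per band: original minus reconstructed spectral values."""
    step = 2.0 ** (gain / 4.0)
    noise = []
    for (lo, hi), sf in zip(bands, scalefactors):
        amp = 2.0 ** (0.5 * sf)
        noise.append(sum((spectrum[i] - dequantize(q[i], step) / amp) ** 2
                         for i in range(lo, hi)))
    return noise

def outer_loop(spectrum, bands, allowed_noise, bit_budget, max_iter=20):
    """Outer loop: amplify bands whose noise exceeds the allowed noise."""
    scalefactors = [0] * len(bands)
    for _ in range(max_iter):
        gain, q, bits = rate_loop(amplify(spectrum, bands, scalefactors), bit_budget)
        noise = band_noise(spectrum, bands, scalefactors, q, gain)
        too_noisy = [b for b, (n, a) in enumerate(zip(noise, allowed_noise)) if n > a]
        if not too_noisy:
            break
        for b in too_noisy:  # new scalefactors force the rate loop to run again
            scalefactors[b] += 1
    return scalefactors, gain, q, bits, noise

if __name__ == "__main__":
    spectrum = [100.0 * math.sin(0.7 * i) for i in range(24)]
    bands = [(0, 8), (8, 16), (16, 24)]
    print(outer_loop(spectrum, bands, [50.0, 50.0, 50.0], bit_budget=400))
```

Note how the rate loop sits inside the noise control loop: every scalefactor change alters the bit demand, so the global gain has to be searched again.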
The backward compatible surround extension for MP3, introduced in 2004, is based on the binaural cue coding approach developed in cooperation with Agere Systems.
Several input audio channels are combined into a stereo output signal by a downmix process. In parallel, the most salient
inter-channel cues describing the multi-channel sound image are extracted from the input channels and coded compactly as
surround side information. The sum signal is stereo MP3 encoded and transmitted together with the surround information to the
receiver. There, after decoding the MP3 data, the surround decoder generates a multi-channel output signal from the sum signal
and the spatial cue information by re-synthesizing channel output signals which carry the relevant inter-channel cues,
such as inter-channel time difference, inter-channel level difference and inter-channel coherence.
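To make these cues concrete, the sketch below estimates an inter-channel level difference (in dB), an inter-channel time difference (the lag of the cross-correlation peak, in samples) and an inter-channel coherence (the normalized cross-correlation at that lag) for one pair of channels. These are common textbook definitions assumed for illustration; the actual cue extraction works per frequency band and time segment and is not specified here. Function names are hypothetical.

```python
import math
import random

def energy(x):
    return sum(v * v for v in x)

def ild_db(a, b, eps=1e-12):
    """Level difference of channel a relative to channel b, in dB."""
    return 10.0 * math.log10((energy(a) + eps) / (energy(b) + eps))

def coherence_and_lag(a, b, max_lag):
    """Peak of the normalized cross-correlation sum(a[n] * b[n + lag]).

    A positive lag means b is delayed relative to a.
    """
    norm = math.sqrt(energy(a) * energy(b)) or 1.0
    best_c, best_lag = 0.0, 0
    for lag in range(-max_lag, max_lag + 1):
        c = sum(a[n] * b[n + lag] for n in range(len(a))
                if 0 <= n + lag < len(b)) / norm
        if abs(c) > abs(best_c):
            best_c, best_lag = c, lag
    return best_c, best_lag

if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.uniform(-1.0, 1.0) for _ in range(256)]
    b = [0.0] * 3 + [0.5 * v for v in a[:-3]]  # b: a delayed by 3 samples, 6 dB quieter
    print(ild_db(a, b), coherence_and_lag(a, b, max_lag=8))
```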