Mp3 Limitations

The Mp3 format has some limitations that restrict its coding efficiency compared to other formats. Some of them can be considered design issues; others are simply tools or features that are not available in Mp3 but are available in other coding schemes.

Mp3 limitations

Joint stereo limitations:

Mp3 cannot switch the joint stereo mode for specific scalefactor bands. If joint stereo is used, it has to be used for all the bands. This is rather suboptimal and limits the use of joint stereo. As an example, imagine the following situation:
The lower frequencies feature an instrument playing on the far left, while the frequencies around 1500Hz feature a singer in the middle of the stage.
In such a situation, joint stereo cannot be used with Mp3, because the lower-frequency part differs too much between the two channels. A further bitrate reduction could have been achieved if joint stereo could be toggled on a scalefactor band basis (in this case, regular stereo would have been used for the lower frequencies, and Middle/Side stereo for the remaining part of the frequency spectrum).
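To make the per-band versus per-frame difference concrete, here is a small Python sketch. It assumes Middle/Side is computed as M = (L+R)/√2 and S = (L-R)/√2, and it uses a hypothetical energy heuristic (`band_prefers_ms`, with an arbitrary `threshold`) to decide whether a band benefits from M/S; real encoders use their own criteria. The `mp3_frame_decision` policy, which enables M/S only if every band benefits, is likewise an illustrative assumption.

```python
import math

def ms_transform(left, right):
    """Convert L/R values to Middle/Side using 1/sqrt(2) scaling."""
    s = 1 / math.sqrt(2)
    mid = [(l + r) * s for l, r in zip(left, right)]
    side = [(l - r) * s for l, r in zip(left, right)]
    return mid, side

def energy(values):
    return sum(v * v for v in values)

def band_prefers_ms(left, right, threshold=0.1):
    """Hypothetical heuristic: use M/S when the side channel carries
    less than `threshold` of the band's total energy."""
    mid, side = ms_transform(left, right)
    total = energy(mid) + energy(side)
    return total > 0 and energy(side) < threshold * total

def per_band_decisions(bands):
    """bands: list of (left, right) coefficient lists, one per scalefactor band."""
    return [band_prefers_ms(l, r) for l, r in bands]

def mp3_frame_decision(bands):
    """Mp3 has one M/S switch for all bands; this (hypothetical) cautious
    policy only enables it when every band benefits."""
    return all(per_band_decisions(bands))

# The scenario from the text: an instrument hard left in the low band,
# a centred singer (identical in both channels) in a higher band.
low_band = ([0.9, -0.7, 0.5, 0.3], [0.0, 0.0, 0.0, 0.0])
high_band = ([0.2, 0.4, -0.3, 0.1], [0.2, 0.4, -0.3, 0.1])
bands = [low_band, high_band]

print(per_band_decisions(bands))  # [False, True]
print(mp3_frame_decision(bands))  # False
```

With a per-band switch, the second band could use M/S while the first stays in regular stereo; with a single frame-wide switch, this cautious policy falls back to regular stereo for both.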

Too small maximum frame size:

Even though a buffer is available (the bit reservoir), the total size of the information belonging to a frame (data inside the frame plus data taken from the bit reservoir) is limited. The ISO standard defines this maximum as the buffer size of a 320kbps frame. Unfortunately, in a few cases this limit seems to be too low, leading to unavoidable degradation of the sound quality.
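The arithmetic behind frame sizes can be sketched as follows, assuming MPEG-1 Layer III (1152 samples per frame). Note that the frame length includes the header and side information, not just audio data. `fits_under_cap` is a hypothetical helper illustrating the limit described above, not part of any real encoder's API.

```python
def layer3_frame_bytes(bitrate_bps, sample_rate_hz, padding=0):
    """Length in bytes of one MPEG-1 Layer III frame (1152 samples).

    1152 samples / 8 bits per byte = 144, hence the constant.
    """
    return 144 * bitrate_bps // sample_rate_hz + padding

# Typical frame sizes at 44.1 kHz.
print(layer3_frame_bytes(128_000, 44_100))  # 417 (418 with padding)
print(layer3_frame_bytes(320_000, 44_100))  # 1044

# The ceiling described above is the size of a 320 kbps frame.
# At 48 kHz this works out to 960 bytes, i.e. 7680 bits.
cap_bits = layer3_frame_bytes(320_000, 48_000) * 8
print(cap_bits)  # 7680

def fits_under_cap(requested_bits, limit_bits=cap_bits):
    """Hypothetical check: a frame whose data (own bits plus bits taken
    from the reservoir) would exceed the cap must be quantized more coarsely."""
    return requested_bits <= limit_bits
```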

Suboptimal window sizes:

The time/frequency resolution of Mp3 is suboptimal. A block is either 576 samples long (long block) or 192 samples long (short block).
On long blocks, the number of samples (being too low) limits the frequency resolution, and therefore the coding efficiency.
On short blocks, the number of samples (being too high) limits the time resolution. 192 samples translate into a time resolution of 4.3ms at a sampling frequency of 44.1kHz. This is too coarse for some percussive sounds, and can lead to a lack of sharpness or to pre-echo.
This was corrected by the ISO committee in the design of AAC, which uses window sizes of 1024 samples (for long blocks) or 128 samples (for short blocks).
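The numbers above can be checked with a quick calculation. This sketch treats the block length as the hop between successive blocks and ignores the 50% overlap of the actual MDCT windows; the spectral line spacing assumes the block's lines evenly cover 0 to fs/2.

```python
def time_span_ms(block_samples, sample_rate_hz):
    """Duration covered by one block of new samples."""
    return 1000.0 * block_samples / sample_rate_hz

def line_spacing_hz(block_samples, sample_rate_hz):
    """Spacing of spectral lines when `block_samples` lines cover 0..fs/2."""
    return sample_rate_hz / (2.0 * block_samples)

FS = 44_100
for name, n in [("Mp3 long", 576), ("Mp3 short", 192),
                ("AAC long", 1024), ("AAC short", 128)]:
    print(f"{name:9s} {n:5d} samples: {time_span_ms(n, FS):6.2f} ms, "
          f"{line_spacing_hz(n, FS):7.2f} Hz per line")
```

At 44.1kHz this gives about 4.35ms for an Mp3 short block versus 2.90ms for an AAC short block, and about 38.3Hz line spacing for an Mp3 long block versus 21.5Hz for an AAC long block.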

Scalefactor band 21 problem:

The last scalefactor band (sfb21 for long blocks, or sfb12 for short blocks) has no scalefactor of its own. When using a 44.1 or 48kHz sampling frequency, this band covers the range from about 16kHz up to the upper frequency limit.
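The 16kHz figure follows from the long-block scalefactor band tables. The boundaries below are transcribed from the MPEG-1 Layer III tables for 44.1 and 48kHz (worth double-checking against the standard before reuse); each band b covers spectral lines bounds[b] up to bounds[b+1], and the 576 lines of a granule span 0 to fs/2.

```python
# Long-block scalefactor band boundaries (spectral line indices, 576 lines
# per granule) for MPEG-1 Layer III at 44.1 kHz and 48 kHz.
SFB_LONG = {
    44_100: [0, 4, 8, 12, 16, 20, 24, 30, 36, 44, 52, 62, 74, 90, 110,
             134, 162, 196, 238, 288, 342, 418, 576],
    48_000: [0, 4, 8, 12, 16, 20, 24, 30, 36, 42, 50, 60, 72, 88, 106,
             128, 156, 190, 230, 276, 330, 384, 576],
}

def line_to_hz(line, sample_rate_hz):
    """Lower edge frequency of a spectral line: 576 lines span 0..fs/2."""
    return line * sample_rate_hz / (2 * 576)

def band_range_hz(sfb, sample_rate_hz):
    """(low, high) frequency edges of long-block scalefactor band `sfb`."""
    bounds = SFB_LONG[sample_rate_hz]
    return (line_to_hz(bounds[sfb], sample_rate_hz),
            line_to_hz(bounds[sfb + 1], sample_rate_hz))

for fs in SFB_LONG:
    lo, hi = band_range_hz(21, fs)
    print(f"{fs} Hz: sfb21 spans {lo:.0f} Hz to {hi:.0f} Hz")
# 44100 Hz: sfb21 spans 16002 Hz to 22050 Hz
# 48000 Hz: sfb21 spans 16000 Hz to 24000 Hz
```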
If the resolution of this part of the spectrum must be increased (as determined by the psychoacoustic model), the local scalefactor, which is missing, cannot be used to adjust it. In this case, the only solution is to adjust the global gain value, but the global gain affects every scalefactor band.
To increase sfb21 resolution, the global gain value has to be reduced. To compensate, the scalefactors of the other scalefactor bands can be reduced. But once they reach 0, they cannot be reduced any further, meaning that a higher resolution than needed will be used locally in those bands, inflating the bitrate. When encoding sfb21 content, it is common to encounter scalefactor bands that are encoded with an unnecessarily high resolution just to accommodate the coding needs of sfb21.
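The mechanism can be illustrated with a deliberately simplified model. The sketch assumes scalefac_scale = 0, no pre-emphasis, and no upper limit on scalefactors; `choose_gains` and its "needed step" inputs are hypothetical stand-ins for what a psychoacoustic model and rate loop would produce.

```python
import math

def choose_gains(needed_steps, sfb21_step):
    """Hypothetical sketch of the sfb21 problem.

    Simplified model (assumption): the quantizer step of band b is
    2 ** (q_b / 4) with q_b = (global_gain - 210) - 2 * sf_b, so a larger
    q_b means a coarser step. needed_steps[b] is the coarsest q_b allowed
    for band b; sfb21_step is the same for sfb21, whose sf is always 0.
    """
    global_gain = sfb21_step + 210   # only global_gain can reach sfb21
    base = global_gain - 210
    scalefactors, actual_steps = [], []
    for need in needed_steps:
        # Smallest scalefactor giving a step at least as fine as needed...
        sf = math.ceil((base - need) / 2)
        # ...but scalefactors cannot go below 0 (upper limit ignored here).
        sf = max(sf, 0)
        scalefactors.append(sf)
        actual_steps.append(base - 2 * sf)
    return global_gain, scalefactors, actual_steps

needed = [12, 10, 6, 2]  # hypothetical allowed steps for four bands
gg, sfs, actual = choose_gains(needed, sfb21_step=4)
print(gg, sfs, actual)   # 214 [0, 0, 0, 1] [4, 4, 4, 2]
```

Bands 0 to 2 would tolerate a coarser step than sfb21, but since their scalefactors are already 0 they inherit sfb21's finer step, which costs extra bits; band 3, which needs a finer step than sfb21, is handled by its own scalefactor.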

Hybrid transform scheme:

Layer III uses MDCT transforms, but in order to maintain backward compatibility with Layer II, it applies the MDCT on top of the 32 subbands produced by the PQMF filter bank of Layer II.
While the MDCT stage itself is lossless, this is not the case for the PQMF filter bank. This first stage of the transform introduces some noise that cannot be totally removed. Using a plain MDCT from the beginning would produce a better result (but would lose compatibility with Layer II).
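The claim that the MDCT stage is lossless refers to time-domain aliasing cancellation: with a suitable window, overlapping blocks cancel each other's aliasing on overlap-add. The pure-Python sketch below demonstrates this with a sine window and M = 18 coefficients per block (the size Layer III uses per subband for long blocks). It does not model the PQMF stage, and its 2/M inverse scaling is one convention among several.

```python
import math
import random

def sine_window(n2):
    """Sine window of length 2M; it satisfies the Princen-Bradley condition."""
    return [math.sin(math.pi * (n + 0.5) / n2) for n in range(n2)]

def mdct(block, m):
    """Direct O(M^2) MDCT: 2M windowed samples -> M coefficients."""
    return [sum(block[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]

def imdct(coeffs, m):
    """Inverse MDCT with 2/M scaling (matches the sine window)."""
    return [2.0 / m * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                          for k in range(m))
            for n in range(2 * m)]

def analyze_synthesize(signal, m):
    """Window, MDCT, IMDCT, window again, then overlap-add with hop M."""
    w = sine_window(2 * m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        block = [w[n] * signal[start + n] for n in range(2 * m)]
        rec = imdct(mdct(block, m), m)
        for n in range(2 * m):
            out[start + n] += w[n] * rec[n]
    return out

M = 18  # per-subband long-block MDCT size in Layer III (36-sample window)
random.seed(0)
x = [random.uniform(-1, 1) for _ in range(10 * M)]
y = analyze_synthesize(x, M)
# Samples covered by two overlapping blocks are reconstructed exactly
# (up to floating-point rounding); the first and last M samples are not.
err = max(abs(a - b) for a, b in zip(x[M:-M], y[M:-M]))
print(err < 1e-9)  # True
```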

Mixed blocks limitation:

The Mp3 standard allows mixed blocks, but only in a limited way.
Mixed blocks are blocks where the two lowest subbands use the long block structure, while the upper subbands use the short block structure. This is useful to reduce pre-echo on transients while keeping a good frequency resolution in the lower part of the spectrum. Unfortunately, as defined by the ISO standard, a mixed block can neither follow nor be followed by a short block. This severely restricts when mixed blocks can be used, and imposes additional complexity on the encoder in order to use them.
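An encoder's block-type decision therefore needs an extra check. The sketch below encodes only the rule stated above; the string labels and the function name are hypothetical, and the usual start/stop window-transition rules are deliberately not checked.

```python
# Hypothetical granule labels: "long", "start", "short", "mixed", "stop".
# Only the rule from the text is checked: a mixed block may not be
# directly preceded or followed by a short block.
def mixed_rule_violations(sequence):
    """Return the indices of mixed blocks that touch a short block."""
    bad = []
    for i, kind in enumerate(sequence):
        if kind != "mixed":
            continue
        prev_kind = sequence[i - 1] if i > 0 else None
        next_kind = sequence[i + 1] if i + 1 < len(sequence) else None
        if "short" in (prev_kind, next_kind):
            bad.append(i)
    return bad

print(mixed_rule_violations(["long", "start", "mixed", "stop", "long"]))   # []
print(mixed_rule_violations(["long", "start", "short", "mixed", "stop"]))  # [3]
```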

Missing features

Another point is that some newer encoding schemes feature additional coding tools: