How It Works

So how does it all work? The key to ATRAC's success, and that of its successors, is that it is lossy. Based on the principles of psychoacoustic theory, lossy encoding throws away information that's deemed 'inaudible' or imperceptible to human beings, thus reducing the amount of information that needs to be stored without losing too much in terms of quality.

Several techniques are used to decide what needs to be thrown away, or more aggressively compressed. Below, I've outlined the fundamental methods used by ATRAC, MP3 and other compression techniques.

The most important technique to understand about lossy audio encoding is masking. This relies on the phenomenon whereby quiet sounds are drowned out by louder ones. Lossy encoders take advantage of this: they deem those quieter sounds inaudible and either throw them away completely or simply dedicate less space in the final file to them. Imagine having a conversation next to a runway at Heathrow. Odds are that, when a 737 fires up its engines, you'll suddenly not be able to hear the person next to you. This is what the masking technique relies on, although the effect in music is somewhat more subtle.
----
Figure: A simplified representation of the masking effect at work. Sounds falling under the dotted line cannot be heard and can, therefore, be aggressively compressed or simply thrown away.

----
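To make the masking idea a little more concrete, here's a rough sketch in Python. It is a toy model and not how ATRAC or MP3 actually calculate masking: the function names, the 10 dB offset and the 15 dB-per-octave slope are all assumptions I've picked for illustration. It treats the loudest sound as the only masker and flags anything that falls under a simple triangular threshold around it.

```python
import math

def masking_threshold_db(freq_hz, masker_freq_hz, masker_level_db,
                         offset_db=10.0, slope_db_per_octave=15.0):
    """Toy masking threshold: a triangle on a log-frequency axis,
    centred on the masker. offset_db and slope_db_per_octave are
    illustrative assumptions, not values from any real codec."""
    octaves_away = abs(math.log2(freq_hz / masker_freq_hz))
    return masker_level_db - offset_db - slope_db_per_octave * octaves_away

def split_audible(components):
    """Split (freq_hz, level_db) pairs into audible and masked lists,
    treating the single loudest component as the only masker."""
    masker = max(components, key=lambda c: c[1])
    audible, masked = [], []
    for freq_hz, level_db in components:
        if (freq_hz, level_db) == masker:
            audible.append((freq_hz, level_db))  # the masker itself is always heard
            continue
        threshold = masking_threshold_db(freq_hz, masker[0], masker[1])
        if level_db > threshold:
            audible.append((freq_hz, level_db))
        else:
            masked.append((freq_hz, level_db))
    return audible, masked

# A loud 1 kHz tone, a quiet neighbour, a loud neighbour and a quiet distant tone.
components = [(1000, 80), (1100, 40), (1200, 75), (8000, 30)]
audible, masked = split_audible(components)
print("audible:", audible)
print("masked: ", masked)
```

In this made-up example only the quiet 1.1 kHz tone sitting right next to the loud 1 kHz tone ends up masked. The equally quiet tone at 8 kHz is far enough away in frequency to stay audible, which is why an encoder can't simply discard everything below a fixed volume.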
Another psychoacoustic effect that music compression relies on is the ear's varying sensitivity to different parts of the audio spectrum. As well as not being able to hear above roughly 20 kHz or below 20 Hz, the human ear is far from equally sensitive within that range. It's much more sensitive to sounds at, for example, 4 kHz than it is at, say, 100 Hz.
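For a feel of how uneven that sensitivity is, the sketch below uses a widely quoted approximation of the threshold of hearing in quiet, usually attributed to Terhardt. It isn't taken from ATRAC itself, and the function name is my own; treat it as a rough illustration of the curve's shape rather than a precise measurement of anyone's ears.

```python
import math

def hearing_threshold_db_spl(freq_hz):
    """Approximate threshold of hearing in quiet, in dB SPL,
    using the commonly cited Terhardt-style formula.
    Lower values mean the ear is more sensitive at that frequency."""
    khz = freq_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)

for freq_hz in (100, 1000, 4000, 16000):
    print(f"{freq_hz:>6} Hz: {hearing_threshold_db_spl(freq_hz):6.1f} dB SPL")
```

Under this approximation a 100 Hz tone has to be more than 20 dB louder than a 4 kHz tone before it can be heard at all, which is the kind of gap an encoder can exploit.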

Audio compression algorithms take this into account when coding and compressing music by splitting the audio signal into different frequency bands. The original ATRAC system, for example, split the signal into three bands (0 to 5.5 kHz, 5.5 to 11 kHz, and 11 to 22 kHz), allocating more bits to the bands covering the more critical parts of the audio spectrum than to those at its extremes.
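Here's a minimal sketch of that band-splitting idea, again in Python. The band edges are the rounded figures quoted above; everything else, including the function names and the 4:2:1 weighting, is a hypothetical choice of mine. Real encoders decide bit allocation dynamically from the music itself, so this only shows the general principle of giving some bands a bigger share of a fixed budget.

```python
# Band edges in Hz, using the rounded figures quoted in the article.
ATRAC_BANDS_HZ = [(0, 5500), (5500, 11000), (11000, 22000)]

# Hypothetical importance weights per band (an assumption, not ATRAC's values).
BAND_WEIGHTS = [4, 2, 1]

def band_for(freq_hz):
    """Return the index of the band that contains freq_hz."""
    for index, (low, high) in enumerate(ATRAC_BANDS_HZ):
        if low <= freq_hz < high:
            return index
    raise ValueError(f"{freq_hz} Hz is outside the coded range")

def allocate_bits(total_bits, weights=BAND_WEIGHTS):
    """Share a bit budget between bands in proportion to their weights.
    Any bits left over from rounding down go to the first band."""
    total_weight = sum(weights)
    shares = [total_bits * w // total_weight for w in weights]
    shares[0] += total_bits - sum(shares)
    return shares

print(band_for(440))        # a 440 Hz tone falls in the lowest band
print(allocate_bits(700))   # split a 700-bit budget across the three bands
```

With these invented weights, a 700-bit budget is shared out as 400, 200 and 100 bits, so the band given the highest weight gets four times the space of the band given the lowest.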
