AUDIO FILE FORMAT
- From Wikipedia
An audio file format is a container format for storing audio data on a computer system.
The general approach to storing digital audio is to sample the audio voltage of each channel (which, on playback, corresponds to a certain position of the speaker membrane) with a certain resolution (the number of bits per sample) at regular intervals (the sample rate). This data can then be stored uncompressed, or compressed to reduce the file size.
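Since uncompressed storage cost depends only on the sample rate, bit depth, channel count and duration, it can be computed directly. A minimal sketch (the CD-quality figures are standard; the function name is illustrative):

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed PCM audio: rate x sample width x channels x time."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds

# CD-quality stereo: 44,100 samples/s, 16 bits per sample, 2 channels.
print(pcm_bytes(44_100, 16, 2, 60))  # 10584000 bytes, roughly 10 MB per minute
```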
Types of formats
It is important to distinguish between a file format and a codec. A codec performs the encoding and decoding of the raw audio data while the data itself is stored in a file with a specific audio file format. Though most audio file formats support only one audio codec, a file format may support multiple codecs, as AVI does.
There are three major groups of audio file formats:
- Uncompressed audio formats, such as WAV, AIFF and AU;
- formats with lossless compression, such as FLAC, Monkey's Audio (filename extension APE), WavPack (filename extension WV), Shorten, Tom's lossless Audio Kompressor (TAK), TTA, Apple Lossless and lossless Windows Media Audio (WMA);
- formats with lossy compression, such as MP3, Vorbis, Musepack, lossy Windows Media Audio (WMA) and AAC.
Uncompressed audio format
There is one major uncompressed audio format, PCM, which is usually stored as a .wav file on Windows or as an .aiff file on Mac OS.
WAV is a flexible file format designed to store more or less any combination of sampling rates and bit depths. This makes it an adequate file format for storing and archiving an original recording. A lossless compressed format would require more processing for the same recorded time, but would be more efficient in terms of space used.
WAV, like any other uncompressed format, encodes all sounds, whether they are complex sounds or absolute silence, with the same number of bits per unit of time.
As an example, a file containing a minute of playing by a symphonic orchestra would be the same size as a minute of absolute silence if both were stored in WAV. If the files were encoded in a lossless compressed audio format, the first file would be marginally smaller, and the second would take up almost no space at all. However, encoding the files to a lossless format would take significantly more time than encoding them to WAV.
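The claim about silence can be checked with a generic lossless compressor; zlib here stands in for an audio-specific lossless codec (an illustrative sketch, not an actual audio format):

```python
import os
import zlib

rate = 8_000
silence = bytes(rate * 2)      # one second of 16-bit mono silence: all zeros
noise = os.urandom(rate * 2)   # stand-in for a dense, complex signal

packed_silence = len(zlib.compress(silence, 9))
packed_noise = len(zlib.compress(noise, 9))
print(packed_silence)  # a few dozen bytes: silence costs almost nothing
print(packed_noise)    # close to the original 16,000 bytes: chaotic data barely shrinks
```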
Recently, new lossless formats such as TAK have been developed, whose aim is to achieve very fast coding with a good compression ratio.
The WAV format is based on the RIFF file format, which is similar to the IFF format.
BWF (Broadcast Wave Format) is a standard audio format created by the European Broadcasting Union as a successor to WAV.
BWF allows metadata to be stored in the file.
See European Broadcasting Union: Specification of the Broadcast Wave Format — A format for audio data files in broadcasting. EBU Technical document 3285, July 1997.
This format is the primary recording format used in many professional audio workstations in the television and film industries. Stand-alone, file-based, multi-track recorders from Sound Devices[1], Zaxcom[2], HHB USA[3], Fostex, and Aaton[4] all use BWF as their preferred file format for recording multi-track audio with SMPTE timecode reference. This standardized time stamp in the Broadcast Wave file allows easy synchronization with a separate picture element.
Lossless audio formats
Lossless audio formats (such as FLAC, which is the most widespread,[5] as well as WavPack and Monkey's Audio) provide a compression ratio of about 2:1.
Free and open file formats
- wav – standard audio file container format used mainly in Windows PCs. Commonly used for storing uncompressed (PCM), CD-quality sound files, which means that they can be large in size — around 10 MB per minute. Wave files can also contain data encoded with a variety of codecs to reduce the file size (for example the GSM or mp3 codecs). Wav files use a RIFF structure.
- ogg – a free, open source container format supporting a variety of codecs, the most popular of which is the audio codec Vorbis. Vorbis offers better compression than MP3 but is less popular.
- mpc – Musepack or MPC (formerly known as MPEGplus, MPEG+ or MP+) is an open-source lossy audio codec, specifically optimized for transparent compression of stereo audio at bitrates of 160–180 kbit/s. Musepack and Ogg Vorbis are rated as the two best available codecs for high-quality lossy audio compression in many double-blind listening tests. Nevertheless, Musepack is even less popular than Ogg Vorbis and is nowadays used mainly by audiophiles.
- flac – a lossless compression codec: like ZIP, but for audio. If you compress a PCM file to flac and then restore it, the result is a perfect copy of the original. (All the other codecs discussed here are lossy, which means a small part of the quality is lost.) The cost of this losslessness is that the compression ratio is modest. Flac is recommended for archiving PCM files where quality is important (e.g. broadcast or music use).
- aiff – the standard audio file format used by Apple. It is like a wav file for the Mac.
- raw – a raw file can contain audio in any codec but is usually used with PCM audio data. It is rarely used except for technical tests.
- au – the standard audio file format used by Sun, Unix and Java. The audio in au files can be PCM or compressed with the μ-law, A-law or G.729 codecs.
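Several of the formats above are built on the RIFF structure mentioned for wav. The fixed 44-byte header of a canonical PCM WAV file can be built by hand with a few struct fields (a sketch; real files may carry extra chunks such as BWF metadata):

```python
import struct

def wav_header(n_samples, rate=8_000, channels=1, bits=16):
    """Pack the canonical 44-byte RIFF/WAVE header for PCM data (format tag 1)."""
    data_size = n_samples * channels * bits // 8
    block_align = channels * bits // 8
    byte_rate = rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",   # outer RIFF chunk wraps everything
        b"fmt ", 16, 1, channels, rate,     # "fmt " chunk carries PCM parameters
        byte_rate, block_align, bits,
        b"data", data_size,                 # "data" chunk precedes the samples
    )

hdr = wav_header(8_000)  # header for one second of 8 kHz mono audio
print(len(hdr), hdr[:4], hdr[8:12])  # 44 b'RIFF' b'WAVE'
```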
Open file formats
- gsm – designed for telephony use in Europe, gsm is a very practical format for telephone quality voice. It makes a good compromise between file size and quality. Note that wav files can also be encoded with the gsm codec.
- dct – A variable codec format designed for dictation. It has dictation header information and can be encrypted (often required by medical confidentiality laws).
- vox – the vox format most commonly uses the Dialogic ADPCM (Adaptive Differential Pulse Code Modulation) codec. Similar to other ADPCM formats, it compresses to 4-bits. Vox format files are similar to wave files except that the vox files contain no information about the file itself so the codec sample rate and number of channels must first be specified in order to play a vox file.
- aac – the Advanced Audio Coding format is based on the MPEG2 and MPEG4 standards. aac files are usually ADTS or ADIF containers.
- mp4/m4a – MPEG-4 audio, most often AAC but sometimes MP2/MP3.
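The vox and raw entries above illustrate why headerless audio is awkward: nothing in the file records the rate, width or channel count. Wrapping a raw PCM payload in a WAV container with Python's standard wave module makes those assumptions explicit (the parameter values are illustrative; a real vox file would additionally need ADPCM decoding first):

```python
import io
import wave

raw = bytes(16_000)  # a headerless raw-style PCM payload

# The bytes alone are ambiguous; the caller must assert the missing parameters.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # assumed: mono
    w.setsampwidth(2)      # assumed: 16-bit linear PCM
    w.setframerate(8_000)  # assumed: 8 kHz telephone rate
    w.writeframes(raw)

print(buf.getvalue()[:4])  # b'RIFF' - the result is now self-describing
```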
Proprietary formats
- mp3 – the MPEG Layer-3 format is the most popular format for downloading and storing music. By eliminating portions of the audio file that are essentially inaudible, mp3 files are compressed to roughly one-tenth the size of an equivalent PCM file while maintaining good audio quality.
- wma – the popular Windows Media Audio format owned by Microsoft. Designed with Digital Rights Management (DRM) abilities for copy protection.
- atrac (.wav) – the older style Sony ATRAC format. It always has a .wav file extension. To open these files simply install the ATRAC3 drivers.
- ra – a Real Audio format designed for streaming audio over the Internet. The .ra format allows files to be stored in a self-contained fashion on a computer, with all of the audio data contained inside the file itself.
- ram – a text file that contains a link to the Internet address where the Real Audio file is stored. The .ram file contains no audio data itself.
- dss – Digital Speech Standard files are an Olympus proprietary format. It is a fairly old and poor codec. Prefer gsm or mp3 where the recorder allows. It allows additional data to be held in the file header.
- msv – a Sony proprietary format for Memory Stick compressed voice files.
- dvf – a Sony proprietary format for compressed voice files; commonly used by Sony dictation recorders.
- mp4 – A proprietary version of AAC in MP4 with Digital Rights Management developed by Apple for use in music downloaded from their iTunes Music Store.
- iKlax – An iKlax Media proprietary format, iKlax is a multi-track digital audio format allowing various operations on musical data, for instance mixing and volume arrangement.
External links
- WikiRecording's Guide to Audio File Formats for Recording
- WikiRecording's Sound Designer II (SDII) File Format Article
Ext. | Description |
---|---|
669 | Composer 669 module |
669 | UNIS Composer module |
AIFC | Compressed Audio Interchange Format File |
AIFF | Audio Interchange Format File |
AIS | Velvet Studio Instrument |
| Akai sampler disk and file formats |
AKP | Akai S5000/S6000 Program File |
ALAW | Raw A-law data |
AMS | Extreme Tracker Module |
AMS | Velvet Studio Module |
APEX | AVM Sample Studio bank |
ASE | Velvet Studio Sample |
ASF | Microsoft Advanced Streaming Format |
ASX | Microsoft Advanced Streaming Format Metafile |
AU | Sun/NeXT Audio File (linear, μ-law or A-law) |
AVI | Microsoft Audio Video Interleave File |
AVR | Audio Visual Research sound file |
C01 | Typhoon wave file |
CDA | CD Audio Track |
CDR | Raw Audio-CD data |
CMF | Creative Labs Music File |
DCM | DCM Module |
DEWF | Macintosh SoundCap/SoundEdit recorder instrument |
DF2 | Defractor 2 Extended Instrument |
DFC | Defractor Instrument |
DIG | Digilink format |
DIG | Sound Designer I audio |
DLS | Downloadable Sounds |
DMF | Delusion Digital Music File |
DSF | Delusion Digital Sound File |
DSM | Digital Sound module |
DSP | Dynamic Studio Professional module |
DTM | DigiTrekker module |
DWD | DiamondWare Digitized audio |
EDA | Ensoniq ASR disk image |
EDE | Ensoniq EPS disk image |
EDK | Ensoniq KT disk image |
EDQ | Ensoniq SQ1/SQ2/KS32 disk image |
EDS | Ensoniq SQ80 disk image |
EDV | Ensoniq VFX-SD disk image |
EFA | Ensoniq ASR file |
EFE | Ensoniq EPS family instrument |
EFK | Ensoniq KT file |
EFQ | Ensoniq SQ1/SQ2/KS32 file |
EFS | Ensoniq SQ80 file |
EFV | Ensoniq VFX-SD file |
EMB | Everest embedded bank file |
EMD | ABT Extended module |
ESPS | ESPS audio file |
EUI | Ensoniq EPS family compacted disk image |
F32 | Raw 32-bit IEEE floating point waveform values |
F64 | Raw 64-bit IEEE floating point waveform values |
F2R | Farandoyle Linear module |
F3R | Farandoyle Blocked Linear module |
FAR | Farandoyle Composer module |
FFF | Gravis UltraSound PnP bank |
FSM | Farandoyle Composer WaveSample |
FZB | Casio FZ-1 Bank dump |
FZF | Casio FZ-1 Full dump |
FZV | Casio FZ-1 Voice dump |
G721 | Raw CCITT G.721 4bit ADPCM format data |
G723 | Raw CCITT G.723 or 5bit ADPCM format data |
G726 | Raw CCITT G.726 2, 3, 4 or 5bit ADPCM format data |
GIG | GigaSampler file |
GKH | Ensoniq EPS (VFX, SD, EPS, ASR, TS) family disk image |
GSM | Raw GSM 6.10 audio stream or raw 'byte aligned' GSM 6.10 audio stream |
GSM | US Robotics voice modems GSM QuickLink/VoiceGuide/RapidComm |
IFF | Interchange Format File |
INI | Gravis UltraSound bank setup extract plus patch files |
INS | Ensoniq instrument |
INS | Sample Cell/II instrument |
IT | Impulse Tracker module |
ITI | Impulse Tracker instrument |
ITS | Impulse Tracker sample |
K25 | Kurzweil K2500 (identical to KRZ) |
K26 | Kurzweil K2600 (identical to KRZ) |
KMP | Korg Trinity KeyMap |
KRZ | Kurzweil K2000 |
KSC | Korg Trinity Script |
KSF | Korg Trinity Sample File |
MAT | Matlab variables binary |
MED | MED/OctaMED module |
MID | Standard MIDI song/track information |
MOD | Amiga SoundTracker / Protracker / NoiseTracker / Fastracker / Startrekker / TakeTracker module |
MPEG | MPEG-1 (Moving Picture Experts Group) Audio Layer I, II and III compressed audio |
MP2 | |
MP3 | |
MT2 | MadTracker 2 module |
MTE | MadTracker 2 Envelope |
MTI | MadTracker 2 instrument |
MTM | MultiTracker module |
MTP | MadTracker 2 Pattern |
MTS | MadTracker 2 Sample |
MTX | MadTracker 2 Extension |
MWS | MWave DSP synth's instrument extract |
NST | NoiseTracker Module |
OKT | Oktalizer module |
PAC | SBStudio II Package or Song |
PAT | Advanced Gravis Ultrasound / Forte tech .patch |
PBF | Turtle Beach Pinnacle Bank File |
PRG | Akai MPC2000 Program File, WAVmaker program |
PHY | PhyMod Physical Modeling data |
PSM | Protracker Studio module |
PTM | PolyTracker module |
RA | RealNetworks RealAudio compressed streaming data |
RAM | RealNetworks RealAudio Metafile |
RAW | PCM signed raw audio |
RBS | Propellerhead's Rebirth Song File |
RMF | Beatnik's multimedia Rich Music Format |
ROL | Adlib Synthesized Instrument Music file |
RTI | RealTracker instrument |
RTM | RealTracker module |
RTS | RealTracker sample |
S3I | Scream Tracker v3 instrument |
S3M | Scream Tracker v3 module |
SAM | MODEDIT sample file |
SB | Raw signed PCM 8bit data |
SBK | Emu Systems SoundFont Bank patch collection |
SBI | SoundBlaster Instrument |
SD | Sound Designer I audio |
SD2 | Sound Designer II flattened audio or data fork |
SDK | Roland S-550/S-50/W-30 Disk Image |
SDS | MIDI Sample Dump Standard |
SDX | Sample Dump Exchange |
SF | IRCAM Sound File |
SF2 | Emu Systems SoundFont v2.0 patch collection |
SMP | SampleVision audio, AdLib Gold Sample |
SND | Akai MPC-series sample, PCM unsigned raw audio, NeXT Sound, Macintosh Sound Resource |
SOU | SBStudio II audio |
SPPACK | SPPack sound sample |
STM | Scream Tracker Module 1 & 2 |
STX | Scream Tracker Module |
SW | Raw signed PCM 16bit data |
SYX | Raw MIDI System Exclusive message(s) |
SYH | Synchomatic Instrument |
SYW | Yamaha SY-85/SY-99 Wave audio |
TD0 | Akai Teledisk Sound Library |
TXT | ASCII text parameter description |
TXT | ASCII text formatted audio data |
TXW | Yamaha TX-16W Wave audio |
UB | Raw unsigned PCM 8bit data |
ULAW | Raw μ-law (CCITT G.711) data |
ULT | UltraTracker module |
UNI | UNIMOD module |
UW | Raw unsigned PCM 16bit data |
UWF | UltraTracker WaveSample |
VOC | Creative Labs audio |
VMD | Convox Raw sample |
VMF | Convox SpeechThing / Voice Master sample |
VOX | Dialogic ADPCM audio |
W01 | Yamaha TX16W or SY-series wave |
WAV | Microsoft Windows RIFF WAVE |
WFB | Turtle Beach WaveFront bank |
WFD | Turtle Beach WaveFront drumkit |
WFP | Turtle Beach WaveFront program |
WOW | Grave Composer module |
XI | Fastracker 2.0 instrument |
XM | Fastracker 2.0 module |
XP | Fastracker 2.0 pattern |
XT | Fastracker 2.0 track |
An audio format is a medium for storing sound and music. The term is applied to both the physical recording media and the recording formats of the audio content – in computer science it is often limited to the audio file format, but its wider use usually refers to the physical method used to store the data.
Music is recorded and distributed using a variety of audio formats, some of which store additional information.
Timeline of audio format developments
Year | Media formats | Recording formats |
---|---|---|
1877 | Phonograph cylinder | Mechanical analog; "hill-and-dale" grooves, vertical stylus motion |
1883 | Music roll | Mechanical digital (automated musical instruments) |
1895 | Gramophone record | Mechanical analog; lateral grooves, horizontal stylus motion |
1898 | Wire recording | Analog; magnetization; no "bias" |
1925 | Electrical cut record | Mechanical analog; electrically cut from amplified microphone signal, lateral grooves, horizontal stylus motion, discs at 7", 10", 12", most at 78 rpm |
1930s | Reel-to-Reel, Magnetic Tape | Analog; magnetization; "bias" dramatically increases linearity/fidelity, tape speed at 30 ips, later 15 ips with NAB equalization; refined speeds: 7 1/2 ips, 3 3/4 ips, 1 7/8 ips |
1930s | Electrical transcriptions | Mechanical analog; electrically cut from amplified microphone signal, high fidelity sound, lateral or vertical grooves, horizontal or vertical stylus motion, most discs 16" at 33 1/3 rpm |
1948 (Commercial release) | Vinyl Record | Analog, with preemphasis and other equalization techniques (LP, RIAA); lateral grooves, horizontal stylus motion; discs at 7" (most 45 rpm), 10" and 12" (most 33 1/3 rpm) |
1957 | Stereophonic Vinyl Record | Analog, with preemphasis and other equalization techniques. Combination lateral/vertical stylus motion with each channel encoded 45 degrees to the vertical. |
1962 | 4-Track (Stereo-Pak) | Analog, 1/4 inch wide tape, 3 3/4 inches/sec, endless loop cartridge. |
1963 | Compact Cassette | Analog, with bias, preemphasis, 0.15 inch wide tape, 1 7/8 inches/sec. 1970: introduced Dolby noise reduction. |
1965 | 8-Track (Stereo-8) | Analog, 1/4 inch wide tape, 3 3/4 inches/sec, endless loop cartridge. |
1969 | Microcassette | Analog, 1/8 inch wide tape, used generally for notetaking, mostly mono, some stereo. 2.4 cm/s or 1.2 cm/s. |
1969 | Minicassette | Analog, 1/8 inch wide tape, used generally for notetaking, 1.2 cm/s |
1970 | Quadraphonic 8-Track (Quad-8) (Q8) | Analog, 1/4 inch wide tape, 3 3/4 inches/sec, 4 Channel Stereo, endless loop cartridge. |
1971 | Quadraphonic Vinyl Record (CD-4) (SQ Matrix) | |
1975 | Betamax Digital Audio | 'Dolby Stereo' cinema surround sound |
1976 | Elcaset | |
1978 | Laserdisc | |
1982 | Compact Disc (CD-DA) | PCM |
1985 | Audio Interchange File Format (AIFF) | |
1985 | Sound Designer (by Digidesign) | |
1987 | Digital Audio Tape (DAT) | |
1991 | MiniDisc (MD) | ATRAC |
1992 | Digital Compact Cassette (DCC) | |
1992 | WAVEform (WAV) | |
1992 | Dolby Digital surround cinema sound | |
1993 | Digital Theatre System (DTS) | |
1993 | Sony Dynamic Digital Sound (SDDS) | |
1995 | MP3 | |
1997 | DVD | Dolby Digital |
1997 | DTS-CD | DTS Audio |
1999 | DVD-Audio | |
1999 | Super Audio CD (SACD) | |
1999 | Windows Media Audio (WMA) | |
1999 | The True Audio Lossless Codec (TTA) | |
2000 | Free Lossless Audio Codec (FLAC) | |
2001 | Advanced audio coding (AAC) | |
2002 | Ogg Vorbis | |
2003 | DualDisc | |
2004 | Apple Lossless (ALE or ALAC) | |
2005 | HD DVD | |
2005 | OggPCM | |
2006 | Blu-ray | |
Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are implemented in computer software as audio codecs. Generic data compression algorithms perform poorly with audio data, seldom reducing file sizes much below 87% of the original, and are not designed for use in real time. Consequently, specific audio "lossless" and "lossy" algorithms have been created. Lossy algorithms provide far greater compression ratios and are used in mainstream consumer audio devices.
As with image compression, both lossy and lossless compression algorithms are used in audio compression, lossy being the most common for everyday use. In both lossy and lossless compression, information redundancy is reduced, using methods such as coding, pattern recognition and linear prediction to reduce the amount of information used to describe the data.
For most practical audio applications, the trade-off of slightly reduced audio quality is clearly worthwhile: users cannot perceive any difference and space requirements are substantially reduced. For example, on one CD, one can fit an hour of high fidelity music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in MP3 format at medium bit rates.
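The CD figures can be sanity-checked with back-of-the-envelope arithmetic (assuming a nominal 700 MB disc used as a data carrier and a 128 kbit/s "medium" MP3 bit rate):

```python
CD_BYTES = 700 * 1024 * 1024  # nominal data capacity of a 700 MB disc
PCM_RATE = 44_100 * 2 * 2     # CD audio: 176,400 bytes per second

pcm_minutes = CD_BYTES / PCM_RATE / 60
mp3_minutes = CD_BYTES * 8 / 128_000 / 60  # assumed 128 kbit/s MP3

print(round(pcm_minutes))  # about 69 minutes uncompressed
print(round(mp3_minutes))  # well over 7 hours of MP3 audio
```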
Lossless audio compression
Lossless audio compression allows one to preserve an exact copy of one's audio files, in contrast to the irreversible changes from lossy compression techniques such as Vorbis and MP3. Compression ratios are similar to those for generic lossless data compression (around 50–60% of original size), and substantially less than for lossy compression (which typically yield 5–20% of original size).
Use
The primary uses of lossless encoding are:
- Archives
- For archival purposes, one naturally wishes to maximize quality.
- Editing
- Editing lossily compressed data leads to digital generation loss, since the decoding and re-encoding introduce artifacts at each generation. Thus audio engineers use lossless compression.
- Audio quality
- Being lossless, these formats completely avoid compression artifacts. Audiophiles thus favor lossless compression.
A specific application is to store lossless copies of audio, and then produce lossily compressed versions for a digital audio player. As formats and encoders improve, one can produce updated lossily compressed files from the lossless master.
As file storage and communications bandwidth have become less expensive and more available, lossless audio compression has become more popular.
Formats
Shorten was an early lossless format; newer ones include Free Lossless Audio Codec (FLAC), Apple's Apple Lossless, MPEG-4 ALS, Monkey's Audio, and TTA.
Some audio formats feature a combination of a lossy format and a lossless correction; this allows stripping the correction to easily obtain a lossy file. Such formats include MPEG-4 SLS (Scalable to Lossless), WavPack, and OptimFROG DualStream.
Some formats are associated with a technology, such as:
- Direct Stream Transfer, used in Super Audio CD
- Meridian Lossless Packing, used in DVD-Audio, Dolby TrueHD, Blu-ray and HD DVD
Difficulties in lossless compression of audio data
It is difficult to maintain all the data in an audio stream and achieve substantial compression. First, the vast majority of sound recordings are highly complex, recorded from the real world. Because one of the key methods of compression is to find patterns and repetition, chaotic data such as audio does not compress well. Similarly, photographs compress less efficiently with lossless methods than simpler computer-generated images do. Interestingly, even computer-generated sounds can contain very complicated waveforms that present a challenge to many compression algorithms. This is due to the nature of audio waveforms, which are generally difficult to simplify without a (necessarily lossy) conversion to frequency information, as performed by the human ear.
The second reason is that values of audio samples change very quickly, so generic data compression algorithms don't work well for audio, and strings of consecutive bytes don't generally appear very often. However, convolution with the filter [-1 1] (that is, taking the first difference) tends to slightly whiten (decorrelate, make flat) the spectrum, thereby allowing traditional lossless compression at the encoder to do its job; integration at the decoder restores the original signal. Codecs such as FLAC, Shorten and TTA use linear prediction to estimate the spectrum of the signal. At the encoder, the estimator's inverse is used to whiten the signal by removing spectral peaks while the estimator is used to reconstruct the original signal at the decoder.
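The first-difference trick is easy to demonstrate: for a smooth signal, the residual after convolving with [-1 1] is much smaller than the signal itself, and summing the differences at the decoder restores it exactly (a self-contained sketch):

```python
import math

# A smooth, correlated signal: a 440 Hz tone sampled at 44.1 kHz.
signal = [round(1000 * math.sin(2 * math.pi * 440 * n / 44_100))
          for n in range(1_000)]

# Encoder: convolve with [-1, 1], i.e. take the first difference.
diff = [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]
print(max(map(abs, signal)))    # ~1000: raw samples need ~11 bits each
print(max(map(abs, diff[1:])))  # far smaller: residuals need fewer bits

# Decoder: integrate (running sum) to restore the signal exactly.
restored, acc = [], 0
for d in diff:
    acc += d
    restored.append(acc)
print(restored == signal)       # True - the scheme is lossless
```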
Evaluation criteria
Lossless audio codecs have no quality issues, so their usability can be estimated by:
- Speed of compression and decompression
- Degree of compression
- Software and hardware support
- Robustness and error correction
Lossy audio compression
Lossy audio compression is used in an extremely wide range of applications. In addition to the direct applications (mp3 players or computers), digitally compressed audio streams are used in most video DVDs; digital television; streaming media on the internet; satellite and cable radio; and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (data of 5 percent to 20 percent of the original stream, rather than 50 percent to 60 percent), by discarding less-critical data.
The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are coded with decreased accuracy or not coded at all.
While removing or reducing these 'unhearable' sounds may account for a small percentage of bits saved in lossy compression, the real savings come from a complementary phenomenon: noise shaping. Reducing the number of bits used to code a signal increases the amount of noise in that signal. In psychoacoustics-based lossy compression, the real key is to 'hide' the noise generated by the bit savings in areas of the audio stream that cannot be perceived. This is done, for instance, by using very small numbers of bits to code the high frequencies of most signals - not because the signal has little high-frequency information (though this is often true as well), but because the human ear can only perceive very loud signals in this region, so that softer sounds 'hidden' there simply aren't heard.
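The bits-versus-noise relationship can be measured directly: each bit removed from the quantizer costs roughly 6 dB of signal-to-noise ratio, which is exactly the budget a psychoacoustic model spends where the ear will not notice (an illustrative sketch on a full-scale sine):

```python
import math

def quantize(x, bits):
    """Round a sample in [-1, 1] to a grid of 2**bits levels."""
    step = 2.0 / (1 << bits)
    return step * round(x / step)

samples = [math.sin(2 * math.pi * n / 100) for n in range(1_000)]

def snr_db(bits):
    noise = sum((s - quantize(s, bits)) ** 2 for s in samples)
    power = sum(s * s for s in samples)
    return 10 * math.log10(power / noise)

print(round(snr_db(8)))   # coarse coding: an audible noise floor
print(round(snr_db(16)))  # each extra bit buys roughly 6 dB of SNR
```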
If reducing perceptual redundancy does not achieve sufficient compression for a particular application, it may require further lossy compression. Depending on the audio source, this still may not produce perceptible differences. Speech for example can be compressed far more than music. Most lossy compression schemes allow compression parameters to be adjusted to achieve a target rate of data, usually expressed as a bit rate. Again, the data reduction will be guided by some model of how important the sound is as perceived by the human ear, with the goal of efficiency and optimized quality for the target data rate. (There are many different models used for this perceptual analysis, some better suited to different types of audio than others.) Hence, depending on the bandwidth and storage requirements, the use of lossy compression may result in a perceived reduction of the audio quality that ranges from none to severe, but generally an obviously audible reduction in quality is unacceptable to listeners.
Because data is removed during lossy compression and cannot be recovered by decompression, lossy compression is often avoided for archival storage. Hence, as noted, even those who use lossy compression (for portable audio applications, for example) may wish to keep a losslessly compressed archive for other applications. In addition, compression technology continues to advance, and achieving state-of-the-art lossy compression would require one to begin again with the lossless, original audio data and compress with the new lossy codec. The nature of lossy compression (for both audio and images) results in increasing degradation of quality if data are decompressed, then recompressed using lossy compression.
History
A large variety of real, working audio coding systems were published in a collection in the IEEE Journal on Selected Areas in Communications (JSAC), February 1988. While there were some papers from before that time, this compendium documented a wide variety of finished, working audio coders, nearly all of them using perceptual (i.e. masking) techniques and some kind of frequency analysis and back-end noiseless coding.[1] Several of these papers remarked on the difficulty of obtaining good, clean digital audio for research purposes. Most, if not all, of the authors in the JSAC edition were also active in the MPEG-1 Audio committee.
The world's first commercial broadcast automation audio compression system was developed by Oscar Bonello, an engineering professor at the University of Buenos Aires.[2] In 1983, using the psychoacoustic principle of the masking of critical bands first published in 1967,[3] he started developing a practical application based on the recently developed IBM PC, and the broadcast automation system was launched in 1987 under the name Audicom. Twenty years later, almost all the radio stations in the world were using similar technology, manufactured by a number of companies.
Coding methods
Transform domain methods
To determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time-domain sampled waveforms into a transform domain. Once transformed, typically into the frequency domain, component frequencies can be allocated bits according to how audible they are. The audibility of spectral components is determined by first calculating a masking threshold, below which it is estimated that sounds will be beyond the limits of human perception.
The masking threshold is calculated using the absolute threshold of hearing and the principles of simultaneous masking - the phenomenon wherein a signal is masked by another signal separated by frequency - and, in some cases, temporal masking - where a signal is masked by another signal separated by time. Equal-loudness contours may also be used to weight the perceptual importance of different components. Models of the human ear-brain combination incorporating such effects are often called psychoacoustic models.
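The absolute threshold of hearing itself has a well-known closed-form approximation due to Terhardt; evaluating it shows the ear's sensitivity dip in the few-kHz region that coders exploit (a sketch; the constants are from the published approximation):

```python
import math

def ath_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    k = f_hz / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# Scan the audible band for the frequency where the threshold is lowest.
quietest = min(range(20, 20_000, 10), key=ath_db)
print(quietest)  # the ear is most sensitive in the 3-4 kHz region

# A coder can discard any spectral component whose level falls below
# ath_db(f), before even considering masking by louder neighbours.
```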
Time domain methods
Other types of lossy compressors, such as the linear predictive coding (LPC) used with speech, are source-based coders. These coders use a model of the sound's generator (such as the human vocal tract with LPC) to whiten the audio signal (i.e., flatten its spectrum) prior to quantization. LPC may also be thought of as a basic perceptual coding technique; reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise into the spectrum of the target signal, partially masking it.
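For a signal with a strongly peaked spectrum, the whitening effect is dramatic: a pure tone satisfies an exact order-2 recurrence, so a two-tap linear predictor leaves an almost-zero residual to quantize (a sketch; real LPC fits the coefficients to the signal rather than deriving them analytically):

```python
import math

rate, f = 8_000, 200.0
w = 2 * math.pi * f / rate
s = [math.sin(w * n) for n in range(500)]

# A pure tone obeys s[n] = 2*cos(w)*s[n-1] - s[n-2] exactly, so an order-2
# linear predictor drives the residual (the "whitened" signal) to nearly
# zero; only that residual needs to be quantized and coded.
a1, a2 = 2 * math.cos(w), -1.0
residual = [s[n] - (a1 * s[n - 1] + a2 * s[n - 2]) for n in range(2, len(s))]

print(max(map(abs, s)))         # the signal swings close to 1.0
print(max(map(abs, residual)))  # the residual is vanishingly small
```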
Applications
Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and recompressed (digital generation loss). This makes lossy compression unsuitable for storing intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats are very popular with end users (particularly MP3), as a megabyte can store about a minute's worth of music at adequate quality.
Usability
Usability of lossy audio codecs is determined by:
- Perceived audio quality
- Compression factor
- Speed of compression and decompression
- Inherent latency of algorithm (critical for real-time streaming applications; see below)
- Software and hardware support
Lossy formats are often used for the distribution of streaming audio, or interactive applications (such as the coding of speech for digital transmission in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications, and for such applications a codec designed to stream data effectively will usually be chosen.
Latency results from the methods used to encode and decode the data. Some codecs will analyze a longer segment of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time in order to decode. (Often codecs create segments called a "frame" to create discrete data segments for encoding and decoding.) The inherent latency of the coding algorithm can be critical; for example, when there is two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.
In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, latency here refers to the number of samples which must be analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed in order to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way communication).
Speech encoding
Speech encoding is an important category of audio data compression. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using relatively low bit rates.
This is accomplished, in general, by some combination of two approaches:
- Only encoding sounds that could be made by a single human voice.
- Throwing away more of the data in the signal, keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.
Perhaps the earliest algorithms used in speech encoding (and audio data compression in general) were the A-law algorithm and the µ-law algorithm.
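The idea behind both is logarithmic companding: spend the 8 available bits more densely near zero, where most speech energy lives. A sketch of the continuous µ-law curve (the G.711 standard uses a piecewise-linear approximation of this formula; the uniform quantizer is for comparison):

```python
import math

MU = 255.0

def mu_compress(x):
    """Map a sample in [-1, 1] through the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quant8(v):
    """Uniform 8-bit midtread quantizer on [-1, 1]."""
    return round(v * 127) / 127

x = 0.01  # a quiet speech sample
linear_err = abs(x - quant8(x))
mulaw_err = abs(x - mu_expand(quant8(mu_compress(x))))
print(mulaw_err < linear_err)  # True: companding protects quiet sounds
```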
Glossary
- ABR
- Average bitrate
- CBR
- Constant bitrate
- VBR
- Variable bitrate
References
- ^ Journal on Selected Areas in Communications, February 1988
- ^ Solidyne... 40 years of innovation
- ^ The Ear as a Communication Receiver. English translation of Das Ohr als Nachrichtenempfänger by Eberhard Zwicker and Richard Feldtkeller. Translated from German by Hannes Müsch, Søren Buus, and Mary Florentine. Originally published in 1967; Translation published in 1999