AUDIO FILE FORMAT
- From Wikipedia
An audio file format is a container format for storing audio data on a computer system.
The general approach to storing digital audio is to sample the audio voltage of each channel (which, on playback, corresponds to a certain position of the speaker membrane) with a certain resolution (the number of bits per sample) at regular intervals (the sample rate). This data can then be stored uncompressed, or compressed to reduce the file size.
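Since uncompressed storage cost depends only on the sample rate, bit depth, channel count and duration, it can be computed directly. A minimal sketch (the CD-quality figures are standard; the function name is illustrative):

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Size of uncompressed PCM audio: rate x sample width x channels x time."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * seconds

# CD-quality stereo: 44,100 samples/s, 16 bits per sample, 2 channels.
print(pcm_bytes(44_100, 16, 2, 60))  # 10584000 bytes, roughly 10 MB per minute
```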
Types of formats
It is important to distinguish between a file format and a codec. A codec performs the encoding and decoding of the raw audio data while the data itself is stored in a file with a specific audio file format. Though most audio file formats support only one audio codec, a file format may support multiple codecs, as AVI does.
There are three major groups of audio file formats:
- Uncompressed audio formats, such as WAV, AIFF and AU;
- formats with lossless compression, such as FLAC, Monkey's Audio (filename extension APE), WavPack (filename extension WV), Shorten, Tom's lossless Audio Kompressor (TAK), TTA, Apple Lossless and lossless Windows Media Audio (WMA);
- formats with lossy compression, such as MP3, Vorbis, Musepack, lossy Windows Media Audio (WMA) and AAC.
Uncompressed audio format
There is one major uncompressed audio format, PCM, which is usually stored as a .wav file on Windows or as an .aiff file on Mac OS.
WAV is a flexible file format designed to store more or less any combination of sampling rates and bit depths. This makes it an adequate file format for storing and archiving an original recording. A lossless compressed format would require more processing for the same recorded time, but would be more efficient in terms of space used.
WAV, like any other uncompressed format, encodes all sounds, whether they are complex sounds or absolute silence, with the same number of bits per unit of time.
As an example, a file containing a minute of playing by a symphonic orchestra would be the same size as a minute of absolute silence if both were stored in WAV. If the files were encoded in a lossless compressed audio format, the first file would be marginally smaller, and the second would take up almost no space at all. However, encoding the files to a lossless format would take significantly more time than encoding them to WAV.
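The claim about silence can be checked with a generic lossless compressor; zlib here stands in for an audio-specific lossless codec (an illustrative sketch, not an actual audio format):

```python
import os
import zlib

rate = 8_000
silence = bytes(rate * 2)      # one second of 16-bit mono silence: all zeros
noise = os.urandom(rate * 2)   # stand-in for a dense, complex signal

packed_silence = len(zlib.compress(silence, 9))
packed_noise = len(zlib.compress(noise, 9))
print(packed_silence)  # a few dozen bytes: silence costs almost nothing
print(packed_noise)    # close to the original 16,000 bytes: chaotic data barely shrinks
```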
Recently, new lossless formats such as TAK have been developed, whose aim is to achieve very fast coding with a good compression ratio.
The WAV format is based on the RIFF file format, which is similar to the IFF format.
BWF (Broadcast Wave Format) is a standard audio format created by the European Broadcasting Union as a successor to WAV.
BWF allows metadata to be stored in the file.
See European Broadcasting Union: Specification of the Broadcast Wave Format — A format for audio data files in broadcasting. EBU Technical document 3285, July 1997.
This format is the primary recording format used in many professional audio workstations in the television and film industries. Stand-alone, file-based, multi-track recorders from Sound Devices[1], Zaxcom[2], HHB USA[3], Fostex, and Aaton[4] all use BWF as their preferred file format for recording multi-track audio with SMPTE timecode reference. This standardized time stamp in the Broadcast Wave file allows easy synchronization with a separate picture element.
Lossless audio formats
Lossless audio formats (such as FLAC, which is the most widespread,[5] as well as WavPack and Monkey's Audio) provide a compression ratio of about 2:1.
Free and open file formats
- wav – standard audio file container format used mainly in Windows PCs. Commonly used for storing uncompressed (PCM), CD-quality sound files, which means that they can be large in size — around 10 MB per minute. Wave files can also contain data encoded with a variety of codecs to reduce the file size (for example the GSM or mp3 codecs). Wav files use a RIFF structure.
- ogg – a free, open source container format supporting a variety of codecs, the most popular of which is the audio codec Vorbis. Vorbis offers better compression than MP3 but is less popular.
- mpc – Musepack or MPC (formerly known as MPEGplus, MPEG+ or MP+) is an open-source lossy audio codec, specifically optimized for transparent compression of stereo audio at bitrates of 160–180 kbit/s. Musepack and Ogg Vorbis are rated as the two best available codecs for high-quality lossy audio compression in many double-blind listening tests. Nevertheless, Musepack is even less popular than Ogg Vorbis and is nowadays used mainly by audiophiles.
- flac – a lossless compression codec: like ZIP, but for audio. If you compress a PCM file to flac and then restore it, the result is a perfect copy of the original. (All the other codecs discussed here are lossy, which means a small part of the quality is lost.) The cost of this losslessness is that the compression ratio is modest. Flac is recommended for archiving PCM files where quality is important (e.g. broadcast or music use).
- aiff – the standard audio file format used by Apple. It is like a wav file for the Mac.
- raw – a raw file can contain audio in any codec but is usually used with PCM audio data. It is rarely used except for technical tests.
- au – the standard audio file format used by Sun, Unix and Java. The audio in au files can be PCM or compressed with the μ-law, A-law or G.729 codecs.
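Several of the formats above are built on the RIFF structure mentioned for wav. The fixed 44-byte header of a canonical PCM WAV file can be built by hand with a few struct fields (a sketch; real files may carry extra chunks such as BWF metadata):

```python
import struct

def wav_header(n_samples, rate=8_000, channels=1, bits=16):
    """Pack the canonical 44-byte RIFF/WAVE header for PCM data (format tag 1)."""
    data_size = n_samples * channels * bits // 8
    block_align = channels * bits // 8
    byte_rate = rate * block_align
    return struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + data_size, b"WAVE",   # outer RIFF chunk wraps everything
        b"fmt ", 16, 1, channels, rate,     # "fmt " chunk carries PCM parameters
        byte_rate, block_align, bits,
        b"data", data_size,                 # "data" chunk precedes the samples
    )

hdr = wav_header(8_000)  # header for one second of 8 kHz mono audio
print(len(hdr), hdr[:4], hdr[8:12])  # 44 b'RIFF' b'WAVE'
```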
Open file formats
- gsm – designed for telephony use in Europe, gsm is a very practical format for telephone quality voice. It makes a good compromise between file size and quality. Note that wav files can also be encoded with the gsm codec.
- dct – A variable codec format designed for dictation. It has dictation header information and can be encrypted (often required by medical confidentiality laws).
- vox – the vox format most commonly uses the Dialogic ADPCM (Adaptive Differential Pulse Code Modulation) codec. Similar to other ADPCM formats, it compresses to 4-bits. Vox format files are similar to wave files except that the vox files contain no information about the file itself so the codec sample rate and number of channels must first be specified in order to play a vox file.
- aac – the Advanced Audio Coding format is based on the MPEG2 and MPEG4 standards. aac files are usually ADTS or ADIF containers.
- mp4/m4a – MPEG-4 audio, most often AAC but sometimes MP2/MP3.
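The vox and raw entries above illustrate why headerless audio is awkward: nothing in the file records the rate, width or channel count. Wrapping a raw PCM payload in a WAV container with Python's standard wave module makes those assumptions explicit (the parameter values are illustrative; a real vox file would additionally need ADPCM decoding first):

```python
import io
import wave

raw = bytes(16_000)  # a headerless raw-style PCM payload

# The bytes alone are ambiguous; the caller must assert the missing parameters.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # assumed: mono
    w.setsampwidth(2)      # assumed: 16-bit linear PCM
    w.setframerate(8_000)  # assumed: 8 kHz telephone rate
    w.writeframes(raw)

print(buf.getvalue()[:4])  # b'RIFF' - the result is now self-describing
```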
Proprietary formats
- mp3 – the MPEG Layer-3 format is the most popular format for downloading and storing music. By eliminating portions of the audio file that are essentially inaudible, mp3 files are compressed to roughly one-tenth the size of an equivalent PCM file while maintaining good audio quality.
- wma – the popular Windows Media Audio format owned by Microsoft. Designed with Digital Rights Management (DRM) abilities for copy protection.
- atrac (.wav) – the older style Sony ATRAC format. It always has a .wav file extension. To open these files simply install the ATRAC3 drivers.
- ra – a Real Audio format designed for streaming audio over the Internet. The .ra format allows files to be stored in a self-contained fashion on a computer, with all of the audio data contained inside the file itself.
- ram – a text file that contains a link to the Internet address where the Real Audio file is stored. The .ram file contains no audio data itself.
- dss – Digital Speech Standard files are an Olympus proprietary format. It is a fairly old and poor codec. Prefer gsm or mp3 where the recorder allows. It allows additional data to be held in the file header.
- msv – a Sony proprietary format for Memory Stick compressed voice files.
- dvf – a Sony proprietary format for compressed voice files; commonly used by Sony dictation recorders.
- mp4 – A proprietary version of AAC in MP4 with Digital Rights Management developed by Apple for use in music downloaded from their iTunes Music Store.
- iKlax – An iKlax Media proprietary format, iKlax is a multi-track digital audio format allowing various operations on musical data, for instance mixing and volume arrangement.
External links
- WikiRecording's Guide to Audio File Formats for Recording
- WikiRecording's Sound Designer II (SDII) File Format Article
Ext. | Description |
---|---|
669 | Composer 669 module |
669 | UNIS Composer module |
AIFC | Compressed Audio Interchange Format File |
AIFF | Audio Interchange Format File |
AIS | Velvet Studio Instrument |
| Akai sampler disk and file formats |
AKP | Akai S5000/S6000 Program File |
ALAW | Raw A-law data |
AMS | Extreme Tracker Module |
AMS | Velvet Studio Module |
APEX | AVM Sample Studio bank |
ASE | Velvet Studio Sample |
ASF | Microsoft Advanced Streaming Format |
ASX | Microsoft Advanced Streaming Format Metafile |
AU | Sun/NeXT Audio File (linear, μ-law or A-law) |
AVI | Microsoft Audio Video Interleave File |
AVR | Audio Visual Research sound file |
C01 | Typhoon wave file |
CDA | CD Audio Track |
CDR | Raw Audio-CD data |
CMF | Creative Labs Music File |
DCM | DCM Module |
DEWF | Macintosh SoundCap/SoundEdit recorder instrument |
DF2 | Defractor 2 Extended Instrument |
DFC | Defractor Instrument |
DIG | Digilink format |
DIG | Sound Designer I audio |
DLS | Downloadable Sounds |
DMF | Delusion Digital Music File |
DSF | Delusion Digital Sound File |
DSM | Digital Sound module |
DSP | Dynamic Studio Professional module |
DTM | DigiTrekker module |
DWD | DiamondWare Digitized audio |
EDA | Ensoniq ASR disk image |
EDE | Ensoniq EPS disk image |
EDK | Ensoniq KT disk image |
EDQ | Ensoniq SQ1/SQ2/KS32 disk image |
EDS | Ensoniq SQ80 disk image |
EDV | Ensoniq VFX-SD disk image |
EFA | Ensoniq ASR file |
EFE | Ensoniq EPS family instrument |
EFK | Ensoniq KT file |
EFQ | Ensoniq SQ1/SQ2/KS32 file |
EFS | Ensoniq SQ80 file |
EFV | Ensoniq VFX-SD file |
EMB | Everest embedded bank file |
EMD | ABT Extended module |
ESPS | ESPS audio file |
EUI | Ensoniq EPS family compacted disk image |
F32 | Raw 32-bit IEEE floating point waveform values |
F64 | Raw 64-bit IEEE floating point waveform values |
F2R | Farandoyle Linear module |
F3R | Farandoyle Blocked Linear module |
FAR | Farandoyle Composer module |
FFF | Gravis UltraSound PnP bank |
FSM | Farandoyle Composer WaveSample |
FZB | Casio FZ-1 Bank dump |
FZF | Casio FZ-1 Full dump |
FZV | Casio FZ-1 Voice dump |
G721 | Raw CCITT G.721 4bit ADPCM format data |
G723 | Raw CCITT G.723 or 5bit ADPCM format data |
G726 | Raw CCITT G.726 2, 3, 4 or 5bit ADPCM format data |
GIG | GigaSampler file |
GKH | Ensoniq EPS (VFX, SD, EPS, ASR, TS) family disk image |
GSM | Raw GSM 6.10 audio stream or raw 'byte aligned' GSM 6.10 audio stream |
GSM | US Robotics voice modems GSM QuickLink/VoiceGuide/RapidComm |
IFF | Interchange Format File |
INI | Gravis UltraSound bank setup extract plus patch files |
INS | Ensoniq instrument |
INS | Sample Cell/II instrument |
IT | Impulse Tracker module |
ITI | Impulse Tracker instrument |
ITS | Impulse Tracker sample |
K25 | Kurzweil K2500 (identical to KRZ) |
K26 | Kurzweil K2600 (identical to KRZ) |
KMP | Korg Trinity KeyMap |
KRZ | Kurzweil K2000 |
KSC | Korg Trinity Script |
KSF | Korg Trinity Sample File |
MAT | Matlab variables binary |
MED | MED/OctaMED module |
MID | Standard MIDI song/track information |
MOD | Amiga SoundTracker / Protracker / NoiseTracker / Fastracker / Startrekker / TakeTracker module |
MPEG | MPEG-1 (Moving Picture Experts Group) Audio Layer I, II and III compressed audio |
MP2 | |
MP3 | |
MT2 | MadTracker 2 module |
MTE | MadTracker 2 Envelope |
MTI | MadTracker 2 instrument |
MTM | MultiTracker module |
MTP | MadTracker 2 Pattern |
MTS | MadTracker 2 Sample |
MTX | MadTracker 2 Extension |
MWS | MWave DSP synth's instrument extract |
NST | NoiseTracker Module |
OKT | Oktalizer module |
PAC | SBStudio II Package or Song |
PAT | Advanced Gravis Ultrasound / Forte tech .patch |
PBF | Turtle Beach Pinnacle Bank File |
PRG | Akai MPC2000 Program File, WAVmaker program |
PHY | PhyMod Physical Modeling data |
PSM | Protracker Studio module |
PTM | PolyTracker module |
RA | RealNetworks RealAudio compressed streaming data |
RAM | RealNetworks RealAudio Metafile |
RAW | PCM signed raw audio |
RBS | Propellerhead's Rebirth Song File |
RMF | Beatnik's multimedia Rich Music Format |
ROL | Adlib Synthesized Instrument Music file |
RTI | RealTracker instrument |
RTM | RealTracker module |
RTS | RealTracker sample |
S3I | Scream Tracker v3 instrument |
S3M | Scream Tracker v3 module |
SAM | MODEDIT sample file |
SB | Raw signed PCM 8bit data |
SBK | Emu Systems SoundFont Bank patch collection |
SBI | SoundBlaster Instrument |
SD | Sound Designer I audio |
SD2 | Sound Designer II flattened audio or data fork |
SDK | Roland S-550/S-50/W-30 Disk Image |
SDS | MIDI Sample Dump Standard |
SDX | Sample Dump Exchange |
SF | IRCAM Sound File |
SF2 | Emu Systems SoundFont v2.0 patch collection |
SMP | SampleVision audio, AdLib Gold Sample |
SND | Akai MPC-series sample, PCM unsigned raw audio, NeXT Sound, Macintosh Sound Resource |
SOU | SBStudio II audio |
SPPACK | SPPack sound sample |
STM | Scream Tracker Module 1 & 2 |
STX | Scream Tracker Module |
SW | Raw signed PCM 16bit data |
SYX | Raw MIDI System Exclusive message(s) |
SYH | Synchomatic Instrument |
SYW | Yamaha SY-85/SY-99 Wave audio |
TD0 | Akai Teledisk Sound Library |
TXT | ASCII text parameter description |
TXT | ASCII text formatted audio data |
TXW | Yamaha TX-16W Wave audio |
UB | Raw unsigned PCM 8bit data |
ULAW | Raw μ-law (CCITT G.711) data |
ULT | UltraTracker module |
UNI | UNIMOD module |
UW | Raw unsigned PCM 16bit data |
UWF | UltraTracker WaveSample |
VOC | Creative Labs audio |
VMD | Convox Raw sample |
VMF | Convox SpeechThing / Voice Master sample |
VOX | Dialogic ADPCM audio |
W01 | Yamaha TX16W or SY-series wave |
WAV | Microsoft Windows RIFF WAVE |
WFB | Turtle Beach WaveFront bank |
WFD | Turtle Beach WaveFront drumkit |
WFP | Turtle Beach WaveFront program |
WOW | Grave Composer module |
XI | Fastracker 2.0 instrument |
XM | Fastracker 2.0 module |
XP | Fastracker 2.0 pattern |
XT | Fastracker 2.0 track |
An audio format is a medium for storing sound and music. The term is applied to both the physical recording media and the recording formats of the audio content – in computer science it is often limited to the audio file format, but its wider use usually refers to the physical method used to store the data.
Music is recorded and distributed using a variety of audio formats, some of which store additional information.
Timeline of audio format developments
Year | Media formats | Recording formats |
---|---|---|
1877 | Phonograph cylinder | Mechanical analog; "hill-and-dale" grooves, vertical stylus motion |
1883 | Music roll | Mechanical digital (automated musical instruments) |
1895 | Gramophone record | Mechanical analog; lateral grooves, horizontal stylus motion |
1898 | Wire recording | Analog; magnetization; no "bias" |
1925 | Electrical cut record | Mechanical analog; electrically cut from amplified microphone signal, lateral grooves, horizontal stylus motion, discs at 7", 10", 12", most at 78 rpm |
1930s | Reel-to-Reel, Magnetic Tape | Analog; magnetization; "bias" dramatically increases linearity/fidelity, tape speed at 30 ips, later 15 ips with NAB equalization; refined speeds: 7 1/2 ips, 3 3/4 ips, 1 7/8 ips |
1930s | Electrical transcriptions | Mechanical analog; electrically cut from amplified microphone signal, high fidelity sound, lateral or vertical grooves, horizontal or vertical stylus motion, most discs 16" at 33 1/3 rpm |
1948 (Commercial release) | Vinyl Record | Analog, with preemphasis and other equalization techniques (LP, RIAA); lateral grooves, horizontal stylus motion; discs at 7" (most 45 rpm), 10" and 12" (most 33 1/3 rpm) |
1957 | Stereophonic Vinyl Record | Analog, with preemphasis and other equalization techniques. Combination lateral/vertical stylus motion with each channel encoded 45 degrees to the vertical. |
1962 | 4-Track (Stereo-Pak) | Analog, 1/4 inch wide tape, 3 3/4 inches/sec, endless loop cartridge. |
1963 | Compact Cassette | Analog, with bias, preemphasis, 0.15 inch wide tape, 1 7/8 inches/sec. 1970: introduced Dolby noise reduction. |
1965 | 8-Track (Stereo-8) | Analog, 1/4 inch wide tape, 3 3/4 inches/sec, endless loop cartridge. |
1969 | Microcassette | Analog, 1/8 inch wide tape, used generally for notetaking, mostly mono, some stereo. 2.4 cm/s or 1.2 cm/s. |
1969 | Minicassette | Analog, 1/8 inch wide tape, used generally for notetaking, 1.2 cm/s |
1970 | Quadraphonic 8-Track (Quad-8) (Q8) | Analog, 1/4 inch wide tape, 3 3/4 inches/sec, 4 Channel Stereo, endless loop cartridge. |
1971 | Quadraphonic Vinyl Record (CD-4) (SQ Matrix) | |
1975 | Betamax Digital Audio | 'Dolby Stereo' cinema surround sound |
1976 | Elcaset | |
1978 | Laserdisc | |
1982 | Compact Disc (CD-DA) | PCM |
1985 | Audio Interchange File Format (AIFF) | |
1985 | Sound Designer (by Digidesign) | |
1987 | Digital Audio Tape (DAT) | |
1991 | MiniDisc (MD) | ATRAC |
1992 | Digital Compact Cassette (DCC) | |
1992 | WAVEform (WAV) | |
1992 | Dolby Digital surround cinema sound | |
1993 | Digital Theatre System (DTS) | |
1993 | Sony Dynamic Digital Sound (SDDS) | |
1995 | MP3 | |
1997 | DVD | Dolby Digital |
1997 | DTS-CD | DTS Audio |
1999 | DVD-Audio | |
1999 | Super Audio CD (SACD) | |
1999 | Windows Media Audio (WMA) | |
1999 | The True Audio Lossless Codec (TTA) | |
2000 | Free Lossless Audio Codec (FLAC) | |
2001 | Advanced audio coding (AAC) | |
2002 | Ogg Vorbis | |
2003 | DualDisc | |
2004 | Apple Lossless (ALE or ALAC) | |
2005 | HD DVD | |
2005 | OggPCM | |
2006 | Blu-ray | |
Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are implemented in computer software as audio codecs. Generic data compression algorithms perform poorly with audio data, seldom reducing file sizes much below 87% of the original, and are not designed for use in real time. Consequently, specific audio "lossless" and "lossy" algorithms have been created. Lossy algorithms provide far greater compression ratios and are used in mainstream consumer audio devices.
As with image compression, both lossy and lossless compression algorithms are used in audio compression, lossy being the most common for everyday use. In both lossy and lossless compression, information redundancy is reduced, using methods such as coding, pattern recognition and linear prediction to reduce the amount of information used to describe the data.
For most practical audio applications, the trade-off of slightly reduced audio quality is clearly worthwhile: users cannot perceive any difference and space requirements are substantially reduced. For example, on one CD, one can fit an hour of high fidelity music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in MP3 format at medium bit rates.
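The CD figures can be sanity-checked with back-of-the-envelope arithmetic (assuming a nominal 700 MB disc used as a data carrier and a 128 kbit/s "medium" MP3 bit rate):

```python
CD_BYTES = 700 * 1024 * 1024  # nominal data capacity of a 700 MB disc
PCM_RATE = 44_100 * 2 * 2     # CD audio: 176,400 bytes per second

pcm_minutes = CD_BYTES / PCM_RATE / 60
mp3_minutes = CD_BYTES * 8 / 128_000 / 60  # assumed 128 kbit/s MP3

print(round(pcm_minutes))  # about 69 minutes uncompressed
print(round(mp3_minutes))  # well over 7 hours of MP3 audio
```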
Lossless audio compression
Lossless audio compression allows one to preserve an exact copy of one's audio files, in contrast to the irreversible changes from lossy compression techniques such as Vorbis and MP3. Compression ratios are similar to those for generic lossless data compression (around 50–60% of original size), and substantially less than for lossy compression (which typically yield 5–20% of original size).
Use
The primary uses of lossless encoding are:
- Archives
- For archival purposes, one naturally wishes to maximize quality.
- Editing
- Editing lossily compressed data leads to digital generation loss, since the decoding and re-encoding introduce artifacts at each generation. Thus audio engineers use lossless compression.
- Audio quality
- Being lossless, these formats completely avoid compression artifacts. Audiophiles thus favor lossless compression.
A specific application is to store lossless copies of audio, and then produce lossily compressed versions for a digital audio player. As formats and encoders improve, one can produce updated lossily compressed files from the lossless master.
As file storage and communications bandwidth have become less expensive and more available, lossless audio compression has become more popular.
Formats
Shorten was an early lossless format; newer ones include Free Lossless Audio Codec (FLAC), Apple's Apple Lossless, MPEG-4 ALS, Monkey's Audio, and TTA.
Some audio formats feature a combination of a lossy format and a lossless correction; this allows stripping the correction to easily obtain a lossy file. Such formats include MPEG-4 SLS (Scalable to Lossless), WavPack, and OptimFROG DualStream.
Some formats are associated with a technology, such as:
- Direct Stream Transfer, used in Super Audio CD
- Meridian Lossless Packing, used in DVD-Audio, Dolby TrueHD, Blu-ray and HD DVD
Difficulties in lossless compression of audio data
It is difficult to maintain all the data in an audio stream and achieve substantial compression. First, the vast majority of sound recordings are highly complex, recorded from the real world. Because one of the key methods of compression is to find patterns and repetition, chaotic data such as audio does not compress well. Similarly, photographs compress less efficiently with lossless methods than simpler computer-generated images do. Interestingly, even computer-generated sounds can contain very complicated waveforms that present a challenge to many compression algorithms. This is due to the nature of audio waveforms, which are generally difficult to simplify without a (necessarily lossy) conversion to frequency information, as performed by the human ear.
The second reason is that values of audio samples change very quickly, so generic data compression algorithms don't work well for audio, and strings of consecutive bytes don't generally appear very often. However, convolution with the filter [-1 1] (that is, taking the first difference) tends to slightly whiten (decorrelate, make flat) the spectrum, thereby allowing traditional lossless compression at the encoder to do its job; integration at the decoder restores the original signal. Codecs such as FLAC, Shorten and TTA use linear prediction to estimate the spectrum of the signal. At the encoder, the estimator's inverse is used to whiten the signal by removing spectral peaks while the estimator is used to reconstruct the original signal at the decoder.
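The first-difference trick is easy to demonstrate: for a smooth signal, the residual after convolving with [-1 1] is much smaller than the signal itself, and summing the differences at the decoder restores it exactly (a self-contained sketch):

```python
import math

# A smooth, correlated signal: a 440 Hz tone sampled at 44.1 kHz.
signal = [round(1000 * math.sin(2 * math.pi * 440 * n / 44_100))
          for n in range(1_000)]

# Encoder: convolve with [-1, 1], i.e. take the first difference.
diff = [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]
print(max(map(abs, signal)))    # ~1000: raw samples need ~11 bits each
print(max(map(abs, diff[1:])))  # far smaller: residuals need fewer bits

# Decoder: integrate (running sum) to restore the signal exactly.
restored, acc = [], 0
for d in diff:
    acc += d
    restored.append(acc)
print(restored == signal)       # True - the scheme is lossless
```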
Evaluation criteria
Lossless audio codecs have no quality issues, so their usability can be estimated by:
- Speed of compression and decompression
- Degree of compression
- Software and hardware support
- Robustness and error correction
Lossy audio compression
Lossy audio compression is used in an extremely wide range of applications. In addition to the direct applications (mp3 players or computers), digitally compressed audio streams are used in most video DVDs; digital television; streaming media on the internet; satellite and cable radio; and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (data of 5 percent to 20 percent of the original stream, rather than 50 percent to 60 percent), by discarding less-critical data.
The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are coded with decreased accuracy or not coded at all.
While removing or reducing these 'unhearable' sounds may account for a small percentage of bits saved in lossy compression, the real savings come from a complementary phenomenon: noise shaping. Reducing the number of bits used to code a signal increases the amount of noise in that signal. In psychoacoustics-based lossy compression, the real key is to 'hide' the noise generated by the bit savings in areas of the audio stream that cannot be perceived. This is done, for instance, by using very small numbers of bits to code the high frequencies of most signals - not because the signal has little high-frequency information (though this is often true as well), but because the human ear can only perceive very loud signals in this region, so that softer sounds 'hidden' there simply aren't heard.
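The bits-versus-noise relationship can be measured directly: each bit removed from the quantizer costs roughly 6 dB of signal-to-noise ratio, which is exactly the budget a psychoacoustic model spends where the ear will not notice (an illustrative sketch on a full-scale sine):

```python
import math

def quantize(x, bits):
    """Round a sample in [-1, 1] to a grid of 2**bits levels."""
    step = 2.0 / (1 << bits)
    return step * round(x / step)

samples = [math.sin(2 * math.pi * n / 100) for n in range(1_000)]

def snr_db(bits):
    noise = sum((s - quantize(s, bits)) ** 2 for s in samples)
    power = sum(s * s for s in samples)
    return 10 * math.log10(power / noise)

print(round(snr_db(8)))   # coarse coding: an audible noise floor
print(round(snr_db(16)))  # each extra bit buys roughly 6 dB of SNR
```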
If reducing perceptual redundancy does not achieve sufficient compression for a particular application, it may require further lossy compression. Depending on the audio source, this still may not produce perceptible differences. Speech for example can be compressed far more than music. Most lossy compression schemes allow compression parameters to be adjusted to achieve a target rate of data, usually expressed as a bit rate. Again, the data reduction will be guided by some model of how important the sound is as perceived by the human ear, with the goal of efficiency and optimized quality for the target data rate. (There are many different models used for this perceptual analysis, some better suited to different types of audio than others.) Hence, depending on the bandwidth and storage requirements, the use of lossy compression may result in a perceived reduction of the audio quality that ranges from none to severe, but generally an obviously audible reduction in quality is unacceptable to listeners.
Because data is removed during lossy compression and cannot be recovered by decompression, lossy compression is often avoided for archival storage. Hence, as noted, even those who use lossy compression (for portable audio applications, for example) may wish to keep a losslessly compressed archive for other applications. In addition, compression technology continues to advance, and achieving state-of-the-art lossy compression would require one to begin again with the lossless, original audio data and compress with the new lossy codec. The nature of lossy compression (for both audio and images) results in increasing degradation of quality if data are decompressed, then recompressed using lossy compression.
History
A large variety of real, working audio coding systems were published in a collection in the IEEE Journal on Selected Areas in Communications (JSAC), February 1988. While there were some papers from before that time, this compendium documented a wide variety of finished, working audio coders, nearly all of them using perceptual (i.e. masking) techniques and some kind of frequency analysis and back-end noiseless coding.[1] Several of these papers remarked on the difficulty of obtaining good, clean digital audio for research purposes. Most, if not all, of the authors in the JSAC edition were also active in the MPEG-1 Audio committee.
The world's first commercial broadcast automation audio compression system was developed by Oscar Bonello, an engineering professor at the University of Buenos Aires.[2] In 1983, using the psychoacoustic principle of the masking of critical bands first published in 1967,[3] he started developing a practical application based on the recently developed IBM PC, and the broadcast automation system was launched in 1987 under the name Audicom. Twenty years later, almost all the radio stations in the world were using similar technology, manufactured by a number of companies.
Coding methods
Transform domain methods
To determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time-domain sampled waveforms into a transform domain. Once transformed, typically into the frequency domain, component frequencies can be allocated bits according to how audible they are. The audibility of spectral components is determined by first calculating a masking threshold, below which it is estimated that sounds will be beyond the limits of human perception.
The masking threshold is calculated using the absolute threshold of hearing and the principles of simultaneous masking - the phenomenon wherein a signal is masked by another signal separated by frequency - and, in some cases, temporal masking - where a signal is masked by another signal separated by time. Equal-loudness contours may also be used to weight the perceptual importance of different components. Models of the human ear-brain combination incorporating such effects are often called psychoacoustic models.
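The absolute threshold of hearing itself has a well-known closed-form approximation due to Terhardt; evaluating it shows the ear's sensitivity dip in the few-kHz region that coders exploit (a sketch; the constants are from the published approximation):

```python
import math

def ath_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    k = f_hz / 1000.0
    return (3.64 * k ** -0.8
            - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2)
            + 1e-3 * k ** 4)

# Scan the audible band for the frequency where the threshold is lowest.
quietest = min(range(20, 20_000, 10), key=ath_db)
print(quietest)  # the ear is most sensitive in the 3-4 kHz region

# A coder can discard any spectral component whose level falls below
# ath_db(f), before even considering masking by louder neighbours.
```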
Time domain methods
Other types of lossy compressors, such as the linear predictive coding (LPC) used with speech, are source-based coders. These coders use a model of the sound's generator (such as the human vocal tract with LPC) to whiten the audio signal (i.e., flatten its spectrum) prior to quantization. LPC may also be thought of as a basic perceptual coding technique; reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise into the spectrum of the target signal, partially masking it.
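For a signal with a strongly peaked spectrum, the whitening effect is dramatic: a pure tone satisfies an exact order-2 recurrence, so a two-tap linear predictor leaves an almost-zero residual to quantize (a sketch; real LPC fits the coefficients to the signal rather than deriving them analytically):

```python
import math

rate, f = 8_000, 200.0
w = 2 * math.pi * f / rate
s = [math.sin(w * n) for n in range(500)]

# A pure tone obeys s[n] = 2*cos(w)*s[n-1] - s[n-2] exactly, so an order-2
# linear predictor drives the residual (the "whitened" signal) to nearly
# zero; only that residual needs to be quantized and coded.
a1, a2 = 2 * math.cos(w), -1.0
residual = [s[n] - (a1 * s[n - 1] + a2 * s[n - 2]) for n in range(2, len(s))]

print(max(map(abs, s)))         # the signal swings close to 1.0
print(max(map(abs, residual)))  # the residual is vanishingly small
```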
Applications
Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and recompressed (digital generation loss). This makes lossy compression unsuitable for storing intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats are very popular with end users (particularly MP3), as a megabyte can store about a minute's worth of music at adequate quality.
Usability
Usability of lossy audio codecs is determined by:
- Perceived audio quality
- Compression factor
- Speed of compression and decompression
- Inherent latency of algorithm (critical for real-time streaming applications; see below)
- Software and hardware support
Lossy formats are often used for the distribution of streaming audio, or interactive applications (such as the coding of speech for digital transmission in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications, and for such applications a codec designed to stream data effectively will usually be chosen.
Latency results from the methods used to encode and decode the data. Some codecs will analyze a longer segment of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time in order to decode. (Often codecs create segments called a "frame" to create discrete data segments for encoding and decoding.) The inherent latency of the coding algorithm can be critical; for example, when there is two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.
In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, latency here refers to the number of samples which must be analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed in order to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way communication).
Speech encoding
Speech encoding is an important category of audio data compression. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using relatively low bit rates.
This is accomplished, in general, by some combination of two approaches:
- Only encoding sounds that could be made by a single human voice.
- Throwing away more of the data in the signal, keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.
Perhaps the earliest algorithms used in speech encoding (and audio data compression in general) were the A-law algorithm and the µ-law algorithm.
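The idea behind both is logarithmic companding: spend the 8 available bits more densely near zero, where most speech energy lives. A sketch of the continuous µ-law curve (the G.711 standard uses a piecewise-linear approximation of this formula; the uniform quantizer is for comparison):

```python
import math

MU = 255.0

def mu_compress(x):
    """Map a sample in [-1, 1] through the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def quant8(v):
    """Uniform 8-bit midtread quantizer on [-1, 1]."""
    return round(v * 127) / 127

x = 0.01  # a quiet speech sample
linear_err = abs(x - quant8(x))
mulaw_err = abs(x - mu_expand(quant8(mu_compress(x))))
print(mulaw_err < linear_err)  # True: companding protects quiet sounds
```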
Glossary
- ABR
- Average bitrate
- CBR
- Constant bitrate
- VBR
- Variable bitrate
References
- ^ Journal on Selected Areas in Communications, February 1988
- ^ Solidyne... 40 years of innovation
- ^ The Ear as a Communication Receiver. English translation of Das Ohr als Nachrichtenempfänger by Eberhard Zwicker and Richard Feldtkeller. Translated from German by Hannes Müsch, Søren Buus, and Mary Florentine. Originally published in 1967; Translation published in 1999