
Audio file format

hanngill 2008. 6. 7. 20:31

http://blog.daum.net/hanngil

 

AUDIO FILE FORMAT

  - From Wikipedia

 

An audio file format is a container format for storing audio data on a computer system.

The general approach to storing digital audio is to sample, at regular intervals (the sample rate), the audio voltage that on playback would drive the speaker membrane of each channel to a corresponding position, with a certain resolution (the number of bits per sample). This data can then be stored uncompressed, or compressed to reduce the file size.
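The arithmetic behind these parameters is easy to check. A minimal sketch (the CD-quality figures below are an illustrative assumption, not values stated above):

```python
def pcm_size_bytes(seconds, sample_rate, bits_per_sample, channels):
    """Uncompressed PCM size: one sample per channel at every tick of the sample rate."""
    return seconds * sample_rate * channels * bits_per_sample // 8

# CD-quality audio: 44,100 samples/s, 16 bits per sample, stereo
one_minute = pcm_size_bytes(60, 44_100, 16, 2)
print(one_minute)  # 10584000 bytes, i.e. roughly 10 MB per minute
```

This is why uncompressed formats are described below as taking around 10 MB per minute.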

 

 Types of formats

It is important to distinguish between a file format and a codec. A codec performs the encoding and decoding of the raw audio data while the data itself is stored in a file with a specific audio file format. Though most audio file formats support only one audio codec, a file format may support multiple codecs, as AVI does.

There are three major groups of audio file formats:

 Uncompressed audio format

There is one major uncompressed audio format, PCM, which is usually stored as a .wav file on Windows or an .aiff file on Mac OS.

WAV is a flexible file format designed to store more or less any combination of sampling rates or bitrates. This makes it an adequate file format for storing and archiving an original recording. A lossless compressed format would require more processing for the same time recorded, but would be more efficient in terms of space used.

WAV, like any other uncompressed format, encodes all sounds, whether they are complex sounds or absolute silence, with the same number of bits per unit of time.

As an example, a file containing a minute of playing by a symphonic orchestra would be the same size as a file containing a minute of absolute silence if both were stored as WAV. If the files were encoded with a lossless compressed audio format, the first file would be only marginally smaller, while the second would take up almost no space at all. However, encoding the files to a lossless format takes significantly more time than encoding them to WAV.

Recently some new lossless formats have been developed (for example TAK) whose aim is to achieve very fast coding with a good compression ratio.

The WAV format is based on the RIFF file format, which is similar to the IFF format.

BWF (Broadcast Wave Format) is a standard audio format created by the European Broadcasting Union as a successor to WAV.

 BWF allows metadata to be stored in the file.

See European Broadcasting Union: Specification of the Broadcast Wave Format — A format for audio data files in broadcasting. EBU Technical document 3285, July 1997.

This format is the primary recording format used in many professional audio workstations in the television and film industry. Stand-alone, file-based, multi-track recorders from Sound Devices[1], Zaxcom[2], HHB USA[3], Fostex, and Aaton[4] all use BWF as their preferred file format for recording multi-track audio files with SMPTE timecode reference. This standardized time stamp in the Broadcast Wave File allows for easy synchronization with a separate picture element.

 Lossless audio formats

Lossless audio formats (such as FLAC, the most widespread,[5] WavPack and Monkey's Audio) provide a compression ratio of about 2:1.

 Free and open file formats

  • wav – standard audio file container format used mainly in Windows PCs. Commonly used for storing uncompressed (PCM), CD-quality sound files, which means that they can be large in size — around 10 MB per minute. Wave files can also contain data encoded with a variety of codecs to reduce the file size (for example the GSM or mp3 codecs). Wav files use a RIFF structure.
  • ogg – a free, open source container format supporting a variety of codecs, the most popular of which is the audio codec Vorbis. Vorbis offers better compression than MP3 but is less popular.
  • mpc - Musepack or MPC (formerly known as MPEGplus, MPEG+ or MP+) is an open source lossy audio codec, specifically optimized for transparent compression of stereo audio at bitrates of 160–180 kbit/s. Musepack and Ogg Vorbis are rated as the two best available codecs for high-quality lossy audio compression in many double-blind listening tests. Nevertheless, Musepack is even less popular than Ogg Vorbis and is nowadays used mainly by audiophiles.
  • flac – a lossless compression codec. The format compresses losslessly, like zip, but is designed for audio. If you compress a PCM file to flac and then restore it, the result is a perfect copy of the original. (All the other codecs discussed here are lossy, which means a small part of the quality is lost.) The cost of this losslessness is a lower compression ratio. Flac is recommended for archiving PCM files where quality is important (e.g. broadcast or music use).
  • aiff – the standard audio file format used by Apple. It is like a wav file for the Mac.
  • raw – a raw file can contain audio in any codec but is usually used with PCM audio data. It is rarely used except for technical tests.
  • au – the standard audio file format used by Sun, Unix and Java. The audio in au files can be PCM or compressed with the μ-law, A-law or G.729 codecs.

 Open file formats

  • gsm – designed for telephony use in Europe, gsm is a very practical format for telephone quality voice. It makes a good compromise between file size and quality. Note that wav files can also be encoded with the gsm codec.
  • dct – A variable codec format designed for dictation. It has dictation header information and can be encrypted (often required by medical confidentiality laws).
  • vox – the vox format most commonly uses the Dialogic ADPCM (Adaptive Differential Pulse Code Modulation) codec. Similar to other ADPCM formats, it compresses to 4-bits. Vox format files are similar to wave files except that the vox files contain no information about the file itself so the codec sample rate and number of channels must first be specified in order to play a vox file.
  • aac – the Advanced Audio Coding format is based on the MPEG2 and MPEG4 standards. aac files are usually ADTS or ADIF containers.
  • mp4/m4a – MPEG-4 audio, most often AAC, but sometimes MP2/MP3.

 Proprietary formats

  • mp3 – the MPEG Layer-3 format is the most popular format for downloading and storing music. By eliminating portions of the audio file that are essentially inaudible, mp3 files are compressed to roughly one-tenth the size of an equivalent PCM file while maintaining good audio quality.
  • wma – the popular Windows Media Audio format owned by Microsoft. Designed with Digital Rights Management (DRM) abilities for copy protection.
  • atrac (.wav) – the older style Sony ATRAC format. It always has a .wav file extension. To open these files simply install the ATRAC3 drivers.
  • ra – a Real Audio format designed for streaming audio over the Internet. The .ra format allows files to be stored in a self-contained fashion on a computer, with all of the audio data contained inside the file itself.
  • ram – a text file that contains a link to the Internet address where the Real Audio file is stored. The .ram file contains no audio data itself.
  • dss – Digital Speech Standard files are an Olympus proprietary format. It is a fairly old and poor codec. Prefer gsm or mp3 where the recorder allows. It allows additional data to be held in the file header.
  • msv – a Sony proprietary format for Memory Stick compressed voice files.
  • dvf – a Sony proprietary format for compressed voice files; commonly used by Sony dictation recorders.
  • mp4 – A proprietary version of AAC in MP4 with Digital Rights Management developed by Apple for use in music downloaded from their iTunes Music Store.
  • iKlax – An iKlax Media proprietary format, the iKlax format is a multi-track digital audio format allowing various actions on musical data, for instance on mixing and volumes arrangements.


 
 
The list of audio file formats
 
Ext. Description
669 Composer 669 module
669 UNIS Composer module
AIFC Compressed Audio Interchange Format File
AIFF Audio Interchange Format File
AIS Velvet Studio Instrument
Akai sampler disk and file formats
AKP Akai S5000/S6000 Program File
ALAW Raw A-law data
AMS Extreme Tracker Module
AMS Velvet Studio Module
APEX AVM Sample Studio bank
ASE Velvet Studio Sample
ASF Microsoft Advanced Streaming Format
ASX Microsoft Advanced Streaming Format Metafile
AU Sun/NeXT Audio File (linear, μ-law or A-law)
AVI Microsoft Audio Video Interleave File
AVR Audio Visual Research sound file
C01 Typhoon wave file
CDA CD Audio Track
CDR Raw Audio-CD data
CMF Creative Labs Music File
DCM DCM Module
DEWF Macintosh SoundCap/SoundEdit recorder instrument
DF2 Defractor 2 Extended Instrument
DFC Defractor Instrument
DIG Digilink format
DIG Sound Designer I audio
DLS Downloadable Sounds
DMF Delusion Digital Music File
DSF Delusion Digital Sound File
DSM Digital Sound module
DSP Dynamic Studio Professional module
DTM DigiTrekker module
DWD DiamondWare Digitized audio
EDA Ensoniq ASR disk image
EDE Ensoniq EPS disk image
EDK Ensoniq KT disk image
EDQ Ensoniq SQ1/SQ2/KS32 disk image
EDS Ensoniq SQ80 disk image
EDV Ensoniq VFX-SD disk image
EFA Ensoniq ASR file
EFE Ensoniq EPS family instrument
EFK Ensoniq KT file
EFQ Ensoniq SQ1/SQ2/KS32 file
EFS Ensoniq SQ80 file
EFV Ensoniq VFX-SD file
EMB Everest embedded bank file
EMD ABT Extended module
ESPS ESPS audio file
EUI Ensoniq EPS family compacted disk image
F32 Raw 32-bit IEEE floating point waveform values
F64 Raw 64-bit IEEE floating point waveform values
F2R Farandoyle Linear module
F3R Farandoyle Blocked Linear module
FAR Farandoyle Composer module
FFF Gravis UltraSound PnP bank
FSM Farandoyle Composer WaveSample
FZB Casio FZ-1 Bank dump
FZF Casio FZ-1 Full dump
FZV Casio FZ-1 Voice dump
G721 Raw CCITT G.721 4bit ADPCM format data
G723 Raw CCITT G.723 or 5bit ADPCM format data
G726 Raw CCITT G.726 2, 3, 4 or 5bit ADPCM format data
GIG GigaSampler file
GKH Ensoniq EPS (VFX, SD, EPS, ASR, TS) family disk image
GSM Raw GSM 6.10 audio stream or raw 'byte aligned' GSM 6.10 audio stream
GSM US Robotics voice modems GSM QuickLink/VoiceGuide/RapidComm
IFF Interchange Format File
INI Gravis UltraSound bank setup extract plus patch files
INS Ensoniq instrument
INS Sample Cell/II instrument
IT Impulse Tracker module
ITI Impulse Tracker instrument
ITS Impulse Tracker sample
K25 Kurzweil K2500 (identical to KRZ)
K26 Kurzweil K2600 (identical to KRZ)
KMP Korg Trinity KeyMap
KRZ Kurzweil K2000
KSC Korg Trinity Script
KSF Korg Trinity Sample File
MAT Matlab variables binary
MED MED/OctaMED module
MID Standard MIDI song/track information
MOD Amiga SoundTracker / Protracker / NoiseTracker / Fastracker / Startrekker / TakeTracker module
MPEG MPEG-1 (Moving Picture Experts Group) Audio Layer I, II and III compressed audio
MP2 MPEG-1 Audio Layer II compressed audio
MP3 MPEG-1 Audio Layer III compressed audio
MT2 MadTracker 2 module
MTE MadTracker 2 Envelope
MTI MadTracker 2 instrument
MTM MultiTracker module
MTP MadTracker 2 Pattern
MTS MadTracker 2 Sample
MTX MadTracker 2 Extension
MWS MWave DSP synth's instrument extract
NST NoiseTracker Module
OKT Oktalizer module
PAC SBStudio II Package or Song
PAT Advanced Gravis UltraSound / Forte Tech patch
PBF Turtle Beach Pinnacle Bank File
PRG Akai MPC2000 Program File, WAVmaker program
PHY PhyMod Physical Modeling data
PSM Protracker Studio module
PTM PolyTracker module
RA RealNetworks RealAudio compressed streaming data
RAM RealNetworks RealAudio Metafile
RAW PCM signed raw audio
RBS Propellerhead's Rebirth Song File
RMF Beatnik's multimedia Rich Music Format
ROL Adlib Synthesized Instrument Music file
RTI RealTracker instrument
RTM RealTracker module
RTS RealTracker sample
S3I Scream Tracker v3 instrument
S3M Scream Tracker v3 module
SAM MODEDIT sample file
SB Raw signed PCM 8bit data
SBK Emu Systems SoundFont Bank patch collection
SBI SoundBlaster Instrument
SD Sound Designer I audio
SD2 Sound Designer II flattened audio or data fork
SDK Roland S-550/S-50/W-30 Disk Image
SDS MIDI Sample Dump Standard
SDX Sample Dump Exchange
SF IRCAM Sound File
SF2 Emu Systems SoundFont v2.0 patch collection
SMP SampleVision audio, AdLib Gold Sample
SND Akai MPC-series sample, PCM unsigned raw audio, NeXT Sound, Macintosh Sound Resource
SOU SBStudio II audio
SPPACK SPPack sound sample
STM Scream Tracker Module 1 & 2
STX Scream Tracker Module
SW Raw signed PCM 16bit data
SYX Raw MIDI System Exclusive message(s)
SYH Synchomatic Instrument
SYW Yamaha SY-85/SY-99 Wave audio
TD0 Akai Teledisk Sound Library
TXT ASCII text parameter description
TXT ASCII text formatted audio data
TXW Yamaha TX-16W Wave audio
UB Raw unsigned PCM 8bit data
ULAW Raw μ-law (CCITT G.711) data
ULT UltraTracker module
UNI UNIMOD module
UW Raw unsigned PCM 16bit data
UWF UltraTracker WaveSample
VOC Creative Labs audio
VMD Convox Raw sample
VMF Convox SpeechThing / Voice Master sample
VOX Dialogic ADPCM audio
W01 Yamaha TX16W or SY-series wave
WAV Microsoft Windows RIFF WAVE
WFB Turtle Beach WaveFront bank
WFD Turtle Beach WaveFront drumkit
WFP Turtle Beach WaveFront program
WOW Grave Composer module
XI Fastracker 2.0 instrument
XM Fastracker 2.0 module
XP Fastracker 2.0 pattern
XT Fastracker 2.0 track

An audio format is a medium for storing sound and music. The term is applied to both the physical recording media and the recording formats of the audio content – in computer science it is often limited to the audio file format, but its wider use usually refers to the physical method used to store the data.

Music is recorded and distributed using a variety of audio formats, some of which store additional information.

Timeline of audio format developments

Year Media formats Recording formats
1877 Phonograph cylinder Mechanical analog; "hill-and-dale" grooves, vertical stylus motion
1883 Music roll Mechanical digital (automated musical instruments)
1895 Gramophone record Mechanical analog; lateral grooves, horizontal stylus motion
1898 Wire recording Analog; magnetization; no "bias"
1925 Electrical cut record Mechanical analog; electrically cut from amplified microphone signal, lateral grooves, horizontal stylus motion, discs at 7", 10", 12", most at 78 rpm
1930s Reel-to-Reel, Magnetic Tape Analog; magnetization; "bias" dramatically increases linearity/fidelity, tape speed at 30 ips, later 15 ips with NAB equalization; refined speeds: 7 1/2 ips, 3 3/4 ips, 1 7/8 ips
1930s Electrical transcriptions Mechanical analog; electrically cut from amplified microphone signal, high fidelity sound, lateral or vertical grooves, horizontal or vertical stylus motion, most discs 16" at 33 1/3 rpm
1948 (Commercial release) Vinyl Record Analog, with preemphasis and other equalization techniques (LP, RIAA); lateral grooves, horizontal stylus motion; discs at 7" (most 45 rpm), 10" and 12" (most 33 1/3 rpm)
1957 Stereophonic Vinyl Record Analog, with preemphasis and other equalization techniques. Combination lateral/vertical stylus motion with each channel encoded 45 degrees to the vertical.
1962 4-Track (Stereo-Pak) Analog, 1/4 inch wide tape, 3 3/4 inches/sec, endless loop cartridge.
1963 Compact Cassette Analog, with bias, preemphasis, 0.15 inch wide tape, 1 7/8 inches/sec. Dolby noise reduction introduced in 1970.
1965 8-Track (Stereo-8) Analog, 1/4 inch wide tape, 3 3/4 inches/sec, endless loop cartridge.
1969 Microcassette Analog, 1/8 inch wide tape, used generally for notetaking, mostly mono, some stereo. 2.4 cm/s or 1.2 cm/s.
1969 Minicassette Analog, 1/8 inch wide tape, used generally for notetaking, 1.2 cm/s
1970 Quadraphonic 8-Track (Quad-8) (Q8) Analog, 1/4 inch wide tape, 3 3/4 inches/sec, 4 Channel Stereo, endless loop cartridge.
1971 Quadraphonic Vinyl Record (CD-4) (SQ Matrix)
1975 Betamax Digital Audio 'Dolby Stereo' cinema surround sound
1976 Elcaset
1978 Laserdisc
1982 Compact Disc (CD-DA) PCM
1985 Audio Interchange File Format (AIFF)
1985 Sound Designer (by Digidesign)
1987 Digital Audio Tape (DAT)
1991 MiniDisc (MD) ATRAC
1992 Digital Compact Cassette (DCC)
1992 WAVEform (WAV)
1992 Dolby Digital surround cinema sound
1993 Digital Theatre System (DTS)
1993 Sony Dynamic Digital Sound (SDDS)

1995 MP3
1997 DVD Dolby Digital
1997 DTS-CD DTS Audio
1999 DVD-Audio
1999 Super Audio CD (SACD)
1999 Windows Media Audio (WMA)
1999 The True Audio Lossless Codec (TTA)
2000 Free Lossless Audio Codec (FLAC)
2001 Advanced Audio Coding (AAC)
2002 Ogg Vorbis
2003 DualDisc
2004 Apple Lossless (ALE or ALAC)
2005 HD DVD
2005 OggPCM
2006 Blu-Ray

 

 

 

Audio compression is a form of data compression designed to reduce the size of audio files. Audio compression algorithms are implemented in computer software as audio codecs. Generic data compression algorithms perform poorly with audio data, seldom reducing file sizes much below 87% of the original, and are not designed for use in real time. Consequently, specific audio "lossless" and "lossy" algorithms have been created. Lossy algorithms provide far greater compression ratios and are used in mainstream consumer audio devices.

As with image compression, both lossy and lossless compression algorithms are used in audio compression, lossy being the most common for everyday use. In both lossy and lossless compression, information redundancy is reduced, using methods such as coding, pattern recognition and linear prediction to reduce the amount of information used to describe the data.

For most practical audio applications, the trade-off of slightly reduced audio quality is clearly outweighed by the benefits: users cannot perceive any difference and space requirements are substantially reduced. For example, on one CD, one can fit an hour of high fidelity music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in MP3 format at medium bit rates.
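These capacity figures come from simple bit-rate arithmetic. A sketch (the 700 MB disc size and the 192 kbit/s MP3 rate below are illustrative assumptions, not figures from the text):

```python
def hours_on_disc(capacity_bytes, bit_rate_bps):
    """How many hours of audio at a given bit rate fit on a given capacity."""
    return capacity_bytes * 8 / bit_rate_bps / 3600

cd = 700 * 10**6            # a 700 MB data CD (illustrative)
pcm_rate = 44_100 * 16 * 2  # CD-quality PCM: 1,411,200 bit/s

print(round(hours_on_disc(cd, pcm_rate), 1))      # 1.1 hours uncompressed
print(round(hours_on_disc(cd, pcm_rate / 2), 1))  # 2.2 hours at a 2:1 lossless ratio
print(round(hours_on_disc(cd, 192_000), 1))       # 8.1 hours of 192 kbit/s MP3
```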

 

Lossless audio compression

Lossless audio compression allows one to preserve an exact copy of one's audio files, in contrast to the irreversible changes from lossy compression techniques such as Vorbis and MP3. Compression ratios are similar to those for generic lossless data compression (around 50–60% of original size), and substantially less than for lossy compression (which typically yield 5–20% of original size).

Use

The primary uses of lossless encoding are:

Archives
For archival purposes, one naturally wishes to maximize quality.
Editing
Editing lossily compressed data leads to digital generation loss, since the decoding and re-encoding introduce artifacts at each generation. Thus audio engineers use lossless compression.
Audio quality
Being lossless, these formats completely avoid compression artifacts. Audiophiles thus favor lossless compression.

A specific application is to store lossless copies of audio, and then produce lossily compressed versions for a digital audio player. As formats and encoders improve, one can produce updated lossily compressed files from the lossless master.

As file storage and communications bandwidth have become less expensive and more available, lossless audio compression has become more popular.

Formats

Shorten was an early lossless format; newer ones include Free Lossless Audio Codec (FLAC), Apple's Apple Lossless, MPEG-4 ALS, Monkey's Audio, and TTA.

Some audio formats feature a combination of a lossy format and a lossless correction; this allows stripping the correction to easily obtain a lossy file. Such formats include MPEG-4 SLS (Scalable to Lossless), WavPack, and OptimFROG DualStream.

Some formats are associated with a particular delivery technology, such as Direct Stream Transfer, used in Super Audio CD, and Meridian Lossless Packing, used in DVD-Audio.

Difficulties in lossless compression of audio data

It is difficult to maintain all the data in an audio stream and achieve substantial compression. First, the vast majority of sound recordings are highly complex, recorded from the real world. As one of the key methods of compression is to find patterns and repetition, more chaotic data such as audio doesn't compress well. In a similar manner, photographs compress less efficiently with lossless methods than simpler computer-generated images do. Interestingly, even computer-generated sounds can contain very complicated waveforms that present a challenge to many compression algorithms. This is due to the nature of audio waveforms, which are generally difficult to simplify without a (necessarily lossy) conversion to frequency information, as performed by the human ear.

The second reason is that values of audio samples change very quickly, so generic data compression algorithms don't work well for audio, and repeated strings of consecutive bytes rarely appear. However, convolution with the filter [-1 1] (that is, taking the first difference) tends to slightly whiten (decorrelate, make flat) the spectrum, thereby allowing traditional lossless compression at the encoder to do its job; integration at the decoder restores the original signal. Codecs such as FLAC, Shorten and TTA use linear prediction to estimate the spectrum of the signal. At the encoder, the estimator's inverse is used to whiten the signal by removing spectral peaks, while the estimator is used to reconstruct the original signal at the decoder.
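The effect of the [-1 1] filter can be demonstrated with a generic compressor. A sketch, using zlib as a stand-in for the entropy-coding back end (the 31 Hz tone is an arbitrary choice of smooth test signal):

```python
import math
import struct
import zlib
from itertools import accumulate

rate = 8_000
# a smooth, tonal waveform: consecutive samples are close in value
samples = [int(10_000 * math.sin(2 * math.pi * 31 * t / rate)) for t in range(rate)]
# convolve with [-1 1]: keep the first sample, then store first differences
diff = [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

# integration at the decoder restores the original signal exactly
assert list(accumulate(diff)) == samples

raw = struct.pack(f"<{rate}h", *samples)
res = struct.pack(f"<{rate}h", *diff)
print(len(zlib.compress(res)) < len(zlib.compress(raw)))  # the whitened residual packs tighter
```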

Evaluation criteria

Lossless audio codecs have no quality issues, so their usability can be estimated by:

  • Speed of compression and decompression
  • Degree of compression
  • Software and hardware support
  • Robustness and error correction

Lossy audio compression

Lossy audio compression is used in an extremely wide range of applications. In addition to the direct applications (mp3 players or computers), digitally compressed audio streams are used in most video DVDs; digital television; streaming media on the internet; satellite and cable radio; and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (data of 5 percent to 20 percent of the original stream, rather than 50 percent to 60 percent), by discarding less-critical data.

The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are coded with decreased accuracy or not coded at all.

While removing or reducing these 'unhearable' sounds may account for a small percentage of the bits saved in lossy compression, the real savings come from a complementary phenomenon: noise shaping. Reducing the number of bits used to code a signal increases the amount of noise in that signal. In psychoacoustics-based lossy compression, the real key is to 'hide' the noise generated by the bit savings in areas of the audio stream that cannot be perceived. This is done, for instance, by using very few bits to code the high frequencies of most signals, not because the signal has little high-frequency information (though this is often true as well), but because the human ear can only perceive very loud signals in this region, so softer sounds 'hidden' there simply aren't heard.

If reducing perceptual redundancy does not achieve sufficient compression for a particular application, further lossy compression may be required. Depending on the audio source, this still may not produce perceptible differences; speech, for example, can be compressed far more than music. Most lossy compression schemes allow compression parameters to be adjusted to achieve a target rate of data, usually expressed as a bit rate. Again, the data reduction is guided by some model of how important the sound is as perceived by the human ear, with the goal of efficiency and optimized quality for the target data rate. (There are many different models used for this perceptual analysis, some better suited to different types of audio than others.) Hence, depending on the bandwidth and storage requirements, the use of lossy compression may result in a perceived reduction of the audio quality that ranges from none to severe, but generally an obviously audible reduction in quality is unacceptable to listeners.

Because data is removed during lossy compression and cannot be recovered by decompression, lossy compression is a poor fit for archival storage. Hence, as noted, even those who use lossy compression (for portable audio applications, for example) may wish to keep a losslessly compressed archive for other applications. In addition, compression technology continues to advance, and achieving state-of-the-art lossy compression would require one to begin again with the lossless, original audio data and compress with the new lossy codec. The nature of lossy compression (for both audio and images) results in increasing degradation of quality if data are decompressed, then recompressed using lossy compression.

History

A large variety of real, working audio coding systems were published in a collection in the IEEE Journal on Selected Areas in Communications (JSAC), February 1988. While there were some papers from before that time, this compendium of papers documented an entire variety of finished, working audio coders, nearly all of them using perceptual (i.e. masking) techniques and some kind of frequency analysis and back-end noiseless coding.[1] Several of these papers remarked on the difficulty of obtaining good, clean digital audio for research purposes. Most, if not all, of the authors in the JSAC edition were also active in the MPEG-1 Audio committee.

Solidyne 922: The world's first commercial audio bit compression card for PC, 1990

The world's first commercial broadcast automation audio compression system was developed by Oscar Bonello, an engineering professor at the University of Buenos Aires.[2] In 1983, using the psychoacoustic principle of the masking of critical bands first published in 1967,[3] he started developing a practical application based on the recently developed IBM PC, and the broadcast automation system was launched in 1987 under the name Audicom. Twenty years later, almost all the radio stations in the world were using similar technology, manufactured by a number of companies.

Coding methods

Transform domain methods

In order to determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time-domain sampled waveforms into a transform domain. Once transformed, typically into the frequency domain, component frequencies can be allocated bits according to how audible they are. Audibility of spectral components is determined by first calculating a masking threshold, below which it is estimated that sounds will be beyond the limits of human perception.
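A toy version of transform-domain bit allocation can be sketched as follows: take a DCT of one frame and keep only the components above a threshold. The fixed 5% threshold is a crude stand-in for a real masking threshold, and the test tones are aligned to DCT bins purely so the result is easy to read:

```python
import math

def dct(frame):
    """Naive DCT-II: express a time-domain frame as cosine components."""
    N = len(frame)
    return [sum(x * math.cos(math.pi * k * (n + 0.5) / N)
                for n, x in enumerate(frame)) for k in range(N)]

N = 64
# one strong component (bin 8) plus a much weaker one (bin 20)
frame = [math.cos(math.pi * 8 * (n + 0.5) / N)
         + 0.01 * math.cos(math.pi * 20 * (n + 0.5) / N)
         for n in range(N)]

coeffs = dct(frame)
threshold = 0.05 * max(abs(c) for c in coeffs)  # crude stand-in for a masking threshold
kept = [k for k, c in enumerate(coeffs) if abs(c) > threshold]
print(kept)  # [8]: only the dominant bin gets bits; the weak tone falls below the threshold
```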

The masking threshold is calculated using the absolute threshold of hearing and the principles of simultaneous masking - the phenomenon wherein a signal is masked by another signal separated by frequency - and, in some cases, temporal masking - where a signal is masked by another signal separated by time. Equal-loudness contours may also be used to weight the perceptual importance of different components. Models of the human ear-brain combination incorporating such effects are often called psychoacoustic models.

Time domain methods

Other types of lossy compressors, such as the linear predictive coding (LPC) used with speech, are source-based coders. These coders use a model of the sound's generator (such as the human vocal tract with LPC) to whiten the audio signal (i.e., flatten its spectrum) prior to quantization. LPC may also be thought of as a basic perceptual coding technique; reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise into the spectrum of the target signal, partially masking it.
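A minimal illustration of prediction-based whitening, using a first-order predictor fitted by least squares (real LPC speech coders use order 10 or more and update coefficients per frame; this is only a sketch on an arbitrary tonal test signal):

```python
import math

def lpc1_residual(x):
    """Fit the optimal first-order predictor x_hat[n] = a*x[n-1]; return the residual."""
    a = sum(p * q for p, q in zip(x, x[1:])) / sum(p * p for p in x[:-1])
    return [x[0]] + [q - a * p for p, q in zip(x, x[1:])]

x = [math.sin(2 * math.pi * 31 * t / 8000) for t in range(8000)]  # strongly tonal signal
e = lpc1_residual(x)

energy = lambda s: sum(v * v for v in s)
print(energy(e) / energy(x) < 0.01)  # True: the residual carries far less energy
```

With the signal flattened this way, the quantization noise introduced afterwards follows the spectral shape of the original signal on reconstruction, which is the masking effect the paragraph describes.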

Applications

Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and recompressed (digital generation loss). This makes lossy compression unsuitable for storing intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats are very popular with end users (particularly MP3), as a megabyte can store about a minute's worth of music at adequate quality.

Usability

Usability of lossy audio codecs is determined by:

  • Perceived audio quality
  • Compression factor
  • Speed of compression and decompression
  • Inherent latency of algorithm (critical for real-time streaming applications; see below)
  • Software and hardware support

Lossy formats are often used for the distribution of streaming audio, or interactive applications (such as the coding of speech for digital transmission in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications, and for such applications a codec designed to stream data effectively will usually be chosen.

Latency results from the methods used to encode and decode the data. Some codecs will analyze a longer segment of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time in order to decode. (Often codecs create segments called a "frame" to create discrete data segments for encoding and decoding.) The inherent latency of the coding algorithm can be critical; for example, when there is two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.

In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, here latency refers to the number of samples which must be analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed in order to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way communication).
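The frame-size arithmetic behind such latency figures can be sketched as follows. MP3 codes 1,152 samples per frame; total codec delay is larger in practice because of filter-bank overlap, which is why published figures vary:

```python
def frame_latency_ms(frame_samples, sample_rate):
    """Time that must be buffered before one coded frame can be produced."""
    return 1000 * frame_samples / sample_rate

print(round(frame_latency_ms(1152, 48_000), 1))   # 24.0 ms per frame at 48 kHz
print(round(frame_latency_ms(1152, 44_100), 1))   # 26.1 ms per frame at 44.1 kHz
```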

Speech encoding

Speech encoding is an important category of audio data compression. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using relatively low bit rates.

This is accomplished, in general, by some combination of two approaches:

  • Only encoding sounds that could be made by a single human voice.
  • Throwing away more of the data in the signal, keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.

Perhaps the earliest algorithms used in speech encoding (and audio data compression in general) were the A-law algorithm and the µ-law algorithm.
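The μ-law idea can be sketched with its continuous companding curve (the G.711 standard additionally quantizes the result to 8-bit codewords, which this sketch omits):

```python
import math

MU = 255  # parameter of North American telephone mu-law

def mu_law_encode(x):
    """Map a sample in [-1, 1] through the logarithmic mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_decode(y):
    """Inverse of the curve above."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

quiet = 0.01
y = mu_law_encode(quiet)
print(round(y, 3))                 # 0.228: quiet sounds get a large share of the code range
print(round(mu_law_decode(y), 3))  # 0.01 recovered
```

The curve expands small amplitudes before quantization, so quantization noise is smaller for quiet sounds, exactly where the ear would notice it most.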

Glossary

ABR
Average bitrate
CBR
Constant bitrate
VBR
Variable bitrate

