A vocoder or voder (a portmanteau of voice and encoder) is a speech analyzer and synthesizer. It was originally developed as a speech coder for telecommunications applications in the 1930s, the idea being to code speech for transmission. Its primary use in this fashion is for secure radio communication, where voice has to be digitized, encrypted and then transmitted on a narrow, voice-bandwidth channel. The vocoder has also been used extensively as an electronic musical instrument.

The vocoder is related to, but essentially different from, the computer algorithm known as the "phase vocoder".

How a vocoder works

Vocoder theory

The human voice consists of sounds generated by the opening and closing of the glottis by the vocal cords, which produces a periodic waveform with many harmonics. This basic sound is then filtered by the nose and throat (a complicated resonant piping system) to produce differences in harmonic content (formants) in a controlled way, creating the wide variety of sounds used in speech. There is another set of sounds, known as the unvoiced and plosive sounds, which are not modified by the mouth in the same fashion.

The vocoder analyzes speech by finding this basic carrier wave, which is at the fundamental frequency, and measuring how its spectral characteristics change over time as a person speaks. The result is a series of numbers describing the modified frequency content at each moment. In doing so, the vocoder dramatically reduces the amount of information needed to store speech, from a complete recording to a series of numbers. To recreate speech, the vocoder reverses the process: an oscillator generates the fundamental frequency, which is then passed through a stage that filters its frequency content according to the originally recorded series of numbers.
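As a rough illustration of the first analysis step, the sketch below estimates the fundamental frequency of one short frame by looking for the strongest autocorrelation peak. This is one common pitch-detection approach, not necessarily what any particular vocoder uses. The function name `estimate_f0` and the 60 to 400 Hz search range are assumptions made for this example.

```python
import math


def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the fundamental frequency (Hz) of one frame.

    Searches for the lag with the largest autocorrelation inside an
    assumed pitch range. Returns 0.0 when no positive peak is found,
    which a vocoder could treat as "unvoiced".
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Correlate the frame with a copy of itself delayed by `lag` samples.
        score = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


# Usage: a 100 ms frame of a 100 Hz tone with five harmonics, sampled at 8 kHz.
sample_rate = 8000
frame = [sum(math.sin(2 * math.pi * k * 100 * n / sample_rate) / k
             for k in range(1, 6))
         for n in range(800)]
f0 = estimate_f0(frame, sample_rate)
```

A real analyzer would repeat this for successive frames, giving one pitch value per frame alongside the spectral measurements.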

Early vocoders

Most analog vocoder systems use a number of frequency channels, each tuned to a different frequency band using band-pass filters. The value measured in each channel is stored not as a raw number, which would depend on the original fundamental frequency, but as the modification that must be applied to that fundamental to produce the signal seen at the output of that filter. During playback these settings are fed back into the filters and the filter outputs are added together, with the knowledge that speech typically varies between these frequencies in a fairly linear way. The result is recognizable speech, although somewhat "mechanical" sounding. Vocoders also often include a second system for generating unvoiced sounds, using a noise generator instead of the fundamental frequency.
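The analysis half of such a filter bank can be sketched digitally. The example below is a minimal illustration, not a model of any historical circuit: each channel is a second-order band-pass filter followed by a rectifier and a one-pole smoothing filter, giving one slowly varying level per band. The function names, the Q of 4, the 50 Hz smoothing cutoff and the four band centers are all assumptions chosen for the example. The band-pass coefficients follow the widely used "Audio EQ Cookbook" biquad form.

```python
import math


def bandpass(signal, center_hz, sample_rate, q=4.0):
    """Second-order band-pass filter with unity gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this filter
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(signal, sample_rate, cutoff_hz=50.0):
    """Rectify the signal and smooth it with a one-pole low-pass filter."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level, out = 0.0, []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out


def analyze(signal, sample_rate, centers):
    """Return one envelope (list of levels) per band center."""
    return [envelope(bandpass(signal, fc, sample_rate), sample_rate)
            for fc in centers]


# Usage: a 1 kHz test tone should register mainly in the 1 kHz channel.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(2000)]
centers = [250.0, 500.0, 1000.0, 2000.0]
levels = [env[-1] for env in analyze(tone, sample_rate, centers)]
```

On playback, the same band-pass filters would be applied to an oscillator or noise source, with each band's output scaled by the stored level before the bands are summed.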

The first experiments with a vocoder were conducted in 1928 by Bell Labs engineer Homer Dudley, who eventually patented it in 1935. Dudley's vocoder was used in the SIGSALY system, which was built by Bell Labs engineers (Alan Turing was briefly involved) in 1943. The SIGSALY system was used for encrypted high-level communications during World War II. Later work in this field has been conducted by James Flanagan.

Linear prediction-based vocoders

Since the late 1970s, most non-musical vocoders have been implemented using linear prediction, whereby the target signal's spectral envelope (formant) is estimated by an all-pole IIR filter. In linear prediction coding, the all-pole filter replaces the bandpass filter bank of its predecessor and is used at the encoder to whiten the signal (i.e., flatten the spectrum) and again at the decoder to re-apply the spectral shape of the target speech signal. In contrast with vocoders realized using bandpass filter banks, the location of the linear predictor's spectral peaks is entirely determined by the target signal and need not be harmonic, i.e., a whole-number multiple of the basic frequency.
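A minimal sketch of the idea, assuming the autocorrelation method with the Levinson-Durbin recursion (a standard way to fit an all-pole model, though the text does not say which method a given coder uses). The function names are hypothetical. `whiten` plays the encoder role of flattening the spectrum, and `synthesize` plays the decoder role of re-applying the spectral shape.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of one frame for lags 0..max_lag."""
    return [sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for the predictor from autocorrelation values r[0..order].

    Returns (a, err), where a[0] == 1 and the prediction-error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    Assumes r[0] > 0, i.e. the frame is not silent.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a, err


def whiten(signal, a):
    """Encoder side: apply A(z) to obtain the flattened residual."""
    return [sum(a[j] * signal[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(signal))]


def synthesize(excitation, a):
    """Decoder side: apply the all-pole filter 1/A(z) to the excitation."""
    out = []
    for n, x in enumerate(excitation):
        out.append(x - sum(a[j] * out[n - j]
                           for j in range(1, len(a)) if n - j >= 0))
    return out


# Usage: recover a known second-order all-pole filter from its impulse response.
true_a = [1.0, -1.2, 0.8]                   # illustrative, stable coefficients
h = synthesize([1.0] + [0.0] * 399, true_a)
a, err = levinson_durbin(autocorrelation(h, 2), 2)
residual = whiten(h, a)
```

In a speech coder the order would typically be higher (LPC-10, as its name suggests, uses ten coefficients), and the residual would be replaced by a compact excitation description such as a pitch value or a noise flag.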

Modern vocoder implementations

Even with the need to record several frequencies, and the additional unvoiced sounds, the compression of the vocoder system is impressive. Standard telephone-quality speech recording captures a band from about 300 Hz to 3400 Hz, where most of the frequencies used in speech lie, and typically requires 64 kbit/s (8000 samples per second, just above the Nyquist rate for that band, at 8 bits per sample). However, a vocoder can provide a reasonably good simulation with as little as 2400 bit/s of data rate, roughly a 27× improvement.
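The arithmetic behind this comparison can be checked in a few lines. The sketch assumes the 64 kbit/s figure comes from conventional telephone PCM, 8000 samples per second at 8 bits per sample. Those two parameters are an assumption of this example rather than something stated above. The helper names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed single-channel PCM, in bit/s."""
    return sample_rate_hz * bits_per_sample


def compression_ratio(reference_bit_rate, coded_bit_rate):
    """How many times smaller the coded stream is than the reference."""
    return reference_bit_rate / coded_bit_rate


pcm = pcm_bit_rate(8000, 8)            # assumed telephone PCM parameters
ratio = compression_ratio(pcm, 2400)   # 2400 bit/s vocoder rate from the text
print(f"{pcm} bit/s PCM vs 2400 bit/s vocoder: about {ratio:.1f}x")
```

Under these assumptions the ratio comes out at about 26.7.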

Several vocoder systems are used in NSA encryption systems:
  • LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding
  • Code Excited Linear Prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016, used in STU-III
  • Continuously Variable Slope Delta-modulation (CVSD), 16 kbit/s, used in wide band encryptors such as the KY-57
  • Mixed Excitation Linear Prediction (MELP), MIL-STD-3005, 2400 bit/s, used in the Future Narrowband Digital Terminal (FNBDT), NSA's 21st century secure telephone
  • Adaptive Differential Pulse Code Modulation (ADPCM), former ITU-T G.721, 32 kbit/s, used in the STE secure telephone
(ADPCM is not a proper vocoder but rather a waveform codec. The ITU has gathered G.721 along with some other ADPCM codecs into G.726.)

Musical applications

For musical applications, a source of musical sounds is used as the carrier, instead of extracting the fundamental frequency. For instance, one could use the sound of a guitar as the input to the filter bank, a technique that became popular in the 1970s.

Musical history

In 1970, electronic music pioneers Wendy Carlos and Robert Moog developed one of the first truly musical vocoders. A 10-band device inspired by the vocoder designs of Homer Dudley, it was originally called a spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier signal came from a Moog modular synthesizer, and the modulator from a microphone input. The output of the 10-band vocoder was fairly intelligible, but relied on specially articulated speech. Later improved vocoders use a high-pass filter to let some sibilance through from the microphone; this ruins the device for its original speech-coding application, but it makes the "talking synthesizer" effect much more intelligible.

Carlos' and Moog's vocoder was featured in several recordings, including the soundtrack to Stanley Kubrick's A Clockwork Orange, in which the vocoder sang the vocal part of Beethoven's Ninth Symphony. Also featured in the soundtrack was a piece called "Timesteps," which featured the vocoder in two sections. Originally, "Timesteps" was intended as merely an introduction to vocoders for the "timid listener", but Kubrick chose to include the piece on the soundtrack, much to the surprise of Wendy Carlos.

In the late 1970s, vocoders began to appear in pop music, for example on disco recordings. A typical example is Giorgio Moroder's 1977 album From Here to Eternity. Pink Floyd made extensive use of the vocoder on the album Animals. The machine is featured prominently on the Alan Parsons Project album Tales of Mystery and Imagination and later on the I Robot album. Vocoders are often used to create the sound of a robot talking, as in the Styx song "Mr. Roboto". A vocoder was also used for the introduction to the Main Street Electrical Parade at Disneyland.

Vocoders have appeared on pop recordings from time to time ever since, most often simply as a special effect rather than a featured aspect of the work. However, many experimental electronic artists of the New Age music genre use the vocoder more comprehensively in specific works, such as Jean Michel Jarre (on Zoolook, 1984) and Mike Oldfield (on Five Miles Out, 1982). There are also some artists who have made vocoders an essential part of their music, overall or during an extended phase. Examples include the German synthpop group Kraftwerk, jazz/fusion keyboardist Herbie Hancock during his late 1970s disco period, Patrick Cowley's later recordings and, more recently, avant-garde pop group Trans Am.

The song "O Superman" by avant-garde musician Laurie Anderson is a popular recording released in 1981 that incorporates the device. Neil Young made extensive use of vocoders on his 1982 electro-pop album Trans. The KLF used vocoder-distorted voices in their 1991 "Stadium House" mix Last Train to Trancentral (Live from the Lost Continent). British rock band Queen used a vocoder for the hit "Radio Ga Ga" in 1983. In 1998, Marilyn Manson used the technology extensively on their glam- and 70s-influenced album Mechanical Animals, for example in "User Friendly" and "Posthuman". Since 1998, Manson has favored the vocoder in live concerts, most notably on "Antichrist Superstar". The bands Mogwai, The Faint, Air, Ween, and Death from Above 1979 have all made extensive use of the vocoder. Daron Malakian, guitarist of System of a Down, has used one in the songs "Sugar", "War?" and "Old School Hollywood". Muse also used a vocoder on their 2006 album Black Holes and Revelations, particularly in the recorded-live version of "Supermassive Black Hole". French house music duo Daft Punk are also very well known for their use of vocoders (on tracks that contain lyrics). During his live performances, singer-songwriter Martin Sexton is well known for singing into a vocoder to simulate lead guitar while he simultaneously plays rhythm guitar.

Music icon Paul McCartney used the vocoder on his 1982 hit album Tug of War.

Legendary funk/pop artist Prince recorded the vocals to his 2006 song "Incense and Candles" using a vocoder. This song can be heard on the album 3121.

Sam La More and GT's new wave / electro supergroup Tonite Only recorded their hits "Danger (The Bomb)" and "Where The Party's At" using a Clavia Nord Modular vocoder.

Nodisco was the first band to record full vocal lines in Italian with a vocoder, in several songs from the 2004 album Pensiero Attivo. Eurodance/techno band Eiffel 65 uses a vocoder and/or Auto-Tune in almost every song on their first two albums, Europop and Contact!, but less so on their self-titled album, Eiffel 65.

Imogen Heap uses only a vocoder and her voice for the song "Hide and Seek" on the album Speak for Yourself. She controls the vocoder from a MIDI keyboard to create the harmonies she wants her modified voice to produce.

T-Pain's signature vocals are frequently mistaken for a vocoder, but are in fact the work of Auto-Tune software.[1]

Radiohead used the vocoder heavily on their critically acclaimed album Kid A as well as many of their other albums.

Other voice effects

"Robot voices" became a recurring element in popular music during the late twentieth century. Several methods of producing variations on this effect have arisen, of which the vocoder is only one. It is still the best known and most widely used, though the following other pieces of music technology are often confused with the vocoder:

The Talk box (Sonovox), Autotuner, Linear predictive coding, Ring modulator, Speech synthesis, and Comb filter.

The sub-page Robotic voice effects includes more detailed comparisons.

Television, film and game applications

Vocoders have also been used in television, film and games, usually for robots or talking computers.

[Audio sample: demonstration of the "robotic voice" effect found in film and television.]

[Audio sample: classic example of a singing vocoded voice.]

Cited references

1. ^ Kevin Clark, T-Pain: Musical Bartender, HipHopDX.com, 2007-04-30.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.