14. Vocoder, Auto-tune & Talkbox

Hi, I am woochia, and this is sound design theory.
Today I would like to talk about 3 effects that are designed to be used on voices and that are easily mistaken for one another. Talking about them will be a good opportunity to talk a bit about formants, as they all utilise formants to alter the sound.
The 3 effects we'll tackle today are the vocoder, the auto-tune and the talkbox effects.

FORMANT:

So we talked about formants in the episode about filters, but really, what are they?

When a sound is played in a cavity (that can be a room, that can be a box, or whatever), the object containing the sound starts to resonate. And this will make some harmonics in the sound louder, due to the frequencies of resonance of the object it is played in.
So this will create peaks in the audio spectrum and it is those peaks that we call formants.

And when we are talking, it's exactly what we use to articulate vowels.
When we talk, our vocal cords vibrate, and then we use the position of our mouth and our tongue to produce different formants. That's what makes an E, an A and an O sound different.

They can all have the same fundamental note, but the balance between the harmonics is different. And if the harmonics are balanced differently, the sound is different.

You can actually note the frequency of these formants for each vowel, and then use filters tuned to these frequencies to make vowels out of any sound. That's exactly what we did in the follow-up video of the episode about filters.

(Usually there are at least three peaks, three formants to consider to recreate a vowel accurately.)
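To make this concrete, here is a minimal Python sketch of the idea that two vowels can share the same fundamental and differ only in which harmonics get boosted. The formant frequencies are rough approximations, and the bell-shaped resonance, its width and all the function names are assumptions made for illustration, not measurements of a real voice.

```python
import math

# Approximate first three formant frequencies in Hz (illustrative values only;
# real voices vary a lot from one speaker to another).
FORMANTS = {
    "a": (800.0, 1200.0, 2500.0),
    "i": (300.0, 2300.0, 3000.0),
}

def resonance_gain(freq, formants, width=120.0):
    """Gain of a hypothetical cavity: a small floor plus one bell-shaped peak per formant."""
    return 0.05 + sum(math.exp(-((freq - f) / width) ** 2) for f in formants)

def vowel_harmonics(fundamental, vowel, count=30):
    """Return (frequency, level) pairs for a sawtooth-like source shaped by the vowel."""
    result = []
    for n in range(1, count + 1):
        freq = n * fundamental
        source_level = 1.0 / n  # a sawtooth's harmonics fall off as 1/n
        result.append((freq, source_level * resonance_gain(freq, FORMANTS[vowel])))
    return result

def loudest(harmonics):
    """Frequency of the loudest harmonic."""
    return max(harmonics, key=lambda pair: pair[1])[0]

a = vowel_harmonics(110.0, "a")
i = vowel_harmonics(110.0, "i")
print("fundamental of both vowels:", a[0][0], "Hz")
print("loudest harmonic of 'a':", loudest(a), "Hz")
print("loudest harmonic of 'i':", loudest(i), "Hz")
```

Both vowels contain exactly the same harmonic frequencies; only their levels change, which is the whole point of a formant.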

Now that we know what formants are, we can see more easily what the vocoder and auto-tune effects are - which are often mistaken for one another - and then we'll see the talkbox effect.

VOCODER:

The vocoder, as an instrument, looks like a synth with a microphone, but it's now also available as VSTs and other software. For this video, we'll take Ableton's stock vocoder effect as an example.

The purpose of a vocoder effect is to mix the sound of a voice with the sound of an instrument, often a synth. So we would still hear that instrument, but it would articulate words just like the voice.

So to do that we need 2 signals: a carrier and a modulator.

One will give the base sound we're going to hear, so that's the carrier, here it will be a synth.
And the second signal will change properties of the first one, that will be the modulator, here it will be a voice.

So in Ableton you would put the vocoder effect on the track of the modulator (so that would be the track with the voice), and in the carrier section of the effect, you can go get the track with the synth.

The way it works is that the vocoder splits both signals into several frequency bands. Then it analyses the level of each band of the modulator and applies those same levels to the matching bands of the carrier.

You can see this as an EQ applied to the carrier, so that its harmonic content follows the harmonic content of the modulator.

And that means a couple of things:
-This will work better if the sound of the carrier is rich in harmonics, as we're going to carve frequencies out of it. It can be a synth with a sawtooth waveform, a sound with heavy distortion on it, that kind of thing.
-This also means that the carrier will inherit the rhythmic qualities of the modulator as well. If the modulator is not playing, no frequency band will be open, so we won't hear the carrier either. So you could kind of beatbox in the mic to chop up the synth for example.
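The band-splitting process described above can be sketched in a few lines of Python. This is only a toy channel vocoder under stated assumptions: the band-pass filter uses the commonly published "cookbook" biquad formulas, and the band centres, Q, smoothing value and function names are all picked for illustration. It is not how Ableton's vocoder is actually implemented.

```python
import math

def bandpass(signal, center_hz, sample_rate, q=4.0):
    """Band-pass biquad (cookbook form), applied sample by sample."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def envelope(signal, smoothing=0.01):
    """Level follower: rectify, then smooth with a one-pole low-pass."""
    level, out = 0.0, []
    for x in signal:
        level += smoothing * (abs(x) - level)
        out.append(level)
    return out

def vocode(modulator, carrier, sample_rate, centers):
    """Apply the modulator's per-band levels to the same bands of the carrier."""
    out = [0.0] * len(carrier)
    for center in centers:
        levels = envelope(bandpass(modulator, center, sample_rate))
        band = bandpass(carrier, center, sample_rate)
        for n, (sample, level) in enumerate(zip(band, levels)):
            out[n] += sample * level
    return out

def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

SAMPLE_RATE = 8000
N = 4000  # half a second of audio
# Carrier: a 100 Hz sawtooth, rich in harmonics.
carrier = [2.0 * ((100.0 * n / SAMPLE_RATE) % 1.0) - 1.0 for n in range(N)]
# Modulator: a 1 kHz tone for the first half, then silence.
modulator = [math.sin(2 * math.pi * 1000.0 * n / SAMPLE_RATE) if n < N // 2 else 0.0
             for n in range(N)]
# Eight band centres spaced evenly on a logarithmic scale from 150 Hz to 3 kHz.
centers = [150.0 * (3000.0 / 150.0) ** (k / 7) for k in range(8)]

output = vocode(modulator, carrier, SAMPLE_RATE, centers)
print("level while the modulator plays:", rms(output[: N // 2]))
print("level after it stops:           ", rms(output[3 * N // 4:]))
```

The second printed level is far lower than the first: once the modulator goes silent, every band closes and the carrier disappears with it, which is the rhythmic gating described in the second point.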

So, as we talked about formants, if the voice says "a", the vocoder will pick up the formants of that "a" and apply them to the carrier, so the instrument will say "a" as well.

Then some vocoders will have a formant knob that is pretty cool. It will shift the formants up and down, so the tone of the voice will sound higher or lower, but the pitch will actually stay the same. It will sound more like some kind of filter. And on a voice, that's how you can change the perceived gender.
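One way to picture a formant knob in a vocoder is that the levels measured on the voice are applied to carrier bands whose centre frequencies have been moved up or down, while the note played by the carrier is left alone. The sketch below assumes the shift is expressed in semitones; the function name and the example levels are hypothetical.

```python
def shifted_centers(analysis_centers, semitones):
    """Centre frequencies of the carrier bands after a formant shift."""
    ratio = 2.0 ** (semitones / 12.0)
    return [center * ratio for center in analysis_centers]

analysis = [250.0, 500.0, 1000.0, 2000.0]  # bands measured on the voice
levels = [0.2, 1.0, 0.6, 0.1]              # hypothetical levels for one vowel
carrier_pitch = 110.0                      # the note played on the synth

up = shifted_centers(analysis, 12)         # formants moved one octave up
for center, level in zip(up, levels):
    print(f"carrier band at {center:.0f} Hz gets level {level}")
print("carrier pitch is still", carrier_pitch, "Hz")
```

The peaks of the spectrum move, but the harmonics underneath them stay where the carrier put them, which is why the pitch does not change.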

Now you can emphasize the effect of that formant knob by playing with the settings of those frequency bands.
See here, you can set the range of frequencies that will be scanned by the vocoder. Right now it goes from 80Hz to 10kHz, but if we make that range narrower, we should hear the effect of the formant knob better.

You can also choose the number of bands you want to use, which can drastically change the sound. And you can even set the level of each band if you want to use that as an EQ. And to clean up the signal further you also have a gate here, so the sound is processed only when it is louder than a certain level.

On this vocoder you also have a setting for the width of each band, which is pretty interesting. At 100%, each band starts where the previous one ends. But if you make that narrower, you can hear how the sound gets more "whistly". As there will be fewer frequencies in each band, it will sound closer to a sine wave.
And you can also make this bandwidth higher than 100%. In this case, each band will overlap with the next one.
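The range, band count and bandwidth settings can be pictured with a small calculation. This sketch assumes the bands are spaced evenly on a logarithmic frequency scale, which is a common choice but not something confirmed for this particular vocoder; the function name is made up for the example.

```python
def band_layout(low_hz, high_hz, bands, width_percent=100.0):
    """Return (lower, upper) edges per band, spaced evenly on a log scale.

    At 100% each band ends where the next begins; below 100% gaps appear
    between bands; above 100% neighbouring bands overlap.
    """
    step = (high_hz / low_hz) ** (1.0 / bands)  # frequency ratio covered by one band
    half = step ** (width_percent / 200.0)      # half of the band's width, as a ratio
    layout = []
    for k in range(bands):
        center = low_hz * step ** (k + 0.5)
        layout.append((center / half, center * half))
    return layout

for width in (50.0, 100.0, 150.0):
    first, second = band_layout(80.0, 10000.0, 4, width)[:2]
    print(f"{width:.0f}%: band 1 ends at {first[1]:.0f} Hz, band 2 starts at {second[0]:.0f} Hz")
```

At 50% the first band ends well before the second one starts, at 100% the two edges meet, and at 150% the first band reaches into the second.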

So this is mostly how the vowels will be processed, because this will pick up mostly formants, and formants are mostly vowels. So how do we get the rest of the signal? How do we get the consonants?

Well for that, you should have a section called unvoiced, with a knob that will allow you to bring all the consonants back up.

It's the part of the signal that is pitchless, like the sibilants, that will be applied to a layer of white noise. Here the big knob is simply the level of that layer, and the sensitivity is a bit like the depth knob: it's the strength of the filter applied from the modulator to the carrier.
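How a vocoder decides that a piece of signal is pitchless is not documented here, but one classic heuristic is to count zero crossings: a hiss changes sign far more often than a vowel. The sketch below uses that heuristic purely as an assumption, with a hypothetical threshold and made-up function names.

```python
import math
import random

def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)

def is_unvoiced(frame, threshold=0.25):
    """Hypothetical rule: many zero crossings suggest a pitchless, noisy sound."""
    return zero_crossing_rate(frame) > threshold

SAMPLE_RATE = 8000
rng = random.Random(0)
vowel_like = [math.sin(2 * math.pi * 150.0 * n / SAMPLE_RATE) for n in range(400)]
hiss_like = [rng.uniform(-1.0, 1.0) for _ in range(400)]

for name, frame in (("vowel-like", vowel_like), ("hiss-like", hiss_like)):
    target = "noise layer" if is_unvoiced(frame) else "main carrier"
    print(f"{name}: zero-crossing rate {zero_crossing_rate(frame):.3f}, goes to the {target}")
```

A frame flagged as unvoiced would be shaped onto the noise layer instead of the synth, which is what keeps the "s" and "t" sounds audible.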
This is of course good to make a vocoder voice more intelligible, but it's also fun to use on more rhythmic elements like beatboxing for example.

Now if you want to experiment more with that kind of sound, you can also select noise as the carrier.

In this case you won't need a separate track, as noise will be used instead of a synth, and this noise is generated directly by the effect.

You can then play with this XY pad to change the character of the noise. On the X axis it's some downsampling (I explained what that is in the last video, about bitcrushers), and on the Y axis you can change the density of the noise.
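As a rough picture of what those two axes could be doing, here is a small noise generator with a "hold" setting (a crude form of downsampling) and a "density" setting. Both parameter names and the way they are implemented are assumptions for illustration; the effect's real noise source may work differently.

```python
import random

def shaped_noise(length, hold=1, density=1.0, seed=0):
    """Noise where each random value is held for `hold` samples (a crude
    downsampling) and is replaced by silence with probability 1 - density."""
    rng = random.Random(seed)
    out = []
    while len(out) < length:
        value = rng.uniform(-1.0, 1.0) if rng.random() < density else 0.0
        out.extend([value] * hold)
    return out[:length]

plain = shaped_noise(8)
stepped = shaped_noise(8, hold=4)
sparse = shaped_noise(8, density=0.3)
print("plain:  ", [round(v, 2) for v in plain])
print("stepped:", [round(v, 2) for v in stepped])
print("sparse: ", [round(v, 2) for v in sparse])
```

Holding values makes the noise sound grittier and lower in resolution, while lowering the density turns a steady hiss into scattered crackles.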

You can try this mode on anything, but I find it particularly cool on drum sounds. Especially if you play with the envelope. And if you keep only the high end with the band EQ, you can also use it to enhance the transients of the drums.

In the carrier modes you also have two other options here.

You can use the modulator as its own carrier. So that will make a resynthesised version of the same sound. That's mostly if you only want to play with all those settings without actually blending the modulator with another sound, which can also be great for sound design.

And you have a pitch tracker mode, which will use an internal oscillator that will attempt to follow the pitch of the modulator. You can use the high and low parameters to tell it where to search for the pitch to track.
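To see why a pitch tracker wants a high and a low limit, here is a minimal tracker based on autocorrelation, one common way to estimate pitch. Whether this effect uses autocorrelation is an assumption; the function name, window size and test tone are chosen for the example.

```python
import math

def track_pitch(samples, sample_rate, low_hz, high_hz, window=400):
    """Estimate pitch by autocorrelation, searching only between low_hz and high_hz."""
    min_lag = int(sample_rate / high_hz)  # shortest period we accept, in samples
    max_lag = int(sample_rate / low_hz)   # longest period we accept, in samples
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # How similar is the signal to a copy of itself delayed by `lag` samples?
        score = sum(samples[n] * samples[n + lag] for n in range(window))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

SAMPLE_RATE = 8000
voice = [math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(1000)]
print(track_pitch(voice, SAMPLE_RATE, low_hz=150.0, high_hz=500.0), "Hz")
```

A periodic signal also matches itself after two, three or four periods, so if the search range is too wide the tracker can lock onto an octave below the real note. Narrowing the range around the expected pitch avoids that.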

Then, instead of changing the carrier, you can also change the modulator; it doesn't need to be a voice. So if you have a synth that has movement in its pitch, you can use it as the modulator to apply this movement to another synth, or to a noise.

Or you could use a very low sine wave as the modulator to mix it with the texture of a rich carrier, to get a gritty bass.

Because remember, the vocoder works with bands. So if the sinewave triggers one band, it's the whole band that will be activated on the carrier. And then you can turn down the depth knob so an even larger band of the carrier can be heard.

These can be some nice effects to try.

AUTO TUNE:

Now the auto-tune is often mistaken for a vocoder, as it can also produce kind of a robotic voice. But the way these effects work is totally different.

The auto-tune effect, unlike the vocoder, doesn't use carrier and modulator signals; it uses only the signal of the voice it is affecting.
Also, even if it can affect formants, it doesn't use formants in its core processing.

So how does it work?

The auto-tune effect is a pitch correction tool. You run a voice through it, it will analyse the frequencies that compose it, and then give you options to correct the pitch of the voice so it matches the notes of a scale you want to use.
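The "match the notes of a scale" step can be illustrated with a little arithmetic. This is only a sketch of the idea of snapping a detected frequency to the nearest allowed note, assuming equal temperament with A4 = 440 Hz and MIDI-style note numbers; it says nothing about how Auto-Tune itself is implemented, and the function name is made up.

```python
import math

C_MAJOR = (0, 2, 4, 5, 7, 9, 11)  # pitch classes, 0 = C

def snap_to_scale(freq_hz, scale=C_MAJOR):
    """Return the frequency of the scale note nearest to freq_hz (A4 = 440 Hz)."""
    midi = 69.0 + 12.0 * math.log2(freq_hz / 440.0)  # fractional note number
    nearby = range(int(midi) - 6, int(midi) + 8)
    allowed = [note for note in nearby if note % 12 in scale]
    target = min(allowed, key=lambda note: abs(note - midi))
    return 440.0 * 2.0 ** ((target - 69) / 12.0)

for sung in (450.0, 470.0):
    print(f"sung {sung} Hz, corrected to {snap_to_scale(sung):.2f} Hz")
```

A slightly sharp A is pulled back down to A, while a note that lands closest to A sharp, which is not in C major, is pushed to the nearest note that is in the scale.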

Auto-tune is actually the name of a particular effect from the company Antares Audio Technologies.

(Both Antares and auto-tune were created by Andy Hildebrand, who was originally an engineer specialised in seismology and worked for oil companies. He developed a method based on auto-correlation that made it possible to send acoustic waves into the ground to see whether the soil could be exploited for oil.
Long story short, he retired. And then one of his wife's friends asked him if he could create a program to make her sing in tune.
It was supposed to be a joke, but then he realised he could take the same algorithms he used in seismology and apply them to music. So he created Antares Audio Technologies, and by the end of 1996, he had created auto-tune. And 25 years later, this thing is all over the place. It's used on every major hit song in more or less noticeable ways.)

And this technology is now available in competitors' software like Melodyne, VariAudio, Waves Tune, iZotope Nectar's pitch correction, V-Vocal, VoiceWorks, etc.

All of this to say that I researched how exactly it processes the sound under the hood, but it is still very opaque.
So let's see how to use it instead.

Basically you put auto tune on a track.
You select the key of the song you're working on, so auto tune can correct the pitch of the voice to that scale.

Then you want to select what type of instrument it is processing, so you can get a better result.
There's instrument and bass instrument, so you can use it on things other than a voice.

And finally you have the speed at which auto-tune will correct the pitch of each note. You can make it sound pretty natural if you keep the correction rather slow, but you can also make it sound synthetic if you set it to 0 ms. At zero milliseconds, each note snaps instantly to its corrected position, so you lose the natural transition between notes. That's often what is identified as the "auto-tune effect".
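The effect of that speed setting can be pictured as a correction that glides towards its target instead of jumping. The sketch below is a simplified model under stated assumptions: pitches are MIDI-style semitone numbers, analysis happens every 10 ms, and the correction follows a simple exponential glide. The function name and the example melody are hypothetical.

```python
import math

def retune(detected, targets, speed_ms, frame_ms=10.0):
    """Glide the pitch correction towards each target at a given retune speed.

    Returns the output pitch (in semitones) for each analysis frame.
    """
    step = 1.0 if speed_ms <= 0 else 1.0 - math.exp(-frame_ms / speed_ms)
    correction, out = 0.0, []
    for pitch, target in zip(detected, targets):
        correction += step * ((target - pitch) - correction)
        out.append(pitch + correction)
    return out

# A slightly sharp singer sliding from C (60) up to E (64).
detected = [60.3, 60.3, 61.2, 62.8, 64.4, 64.4]
targets = [60.0, 60.0, 62.0, 62.0, 64.0, 64.0]  # nearest notes of C major

print("0 ms:  ", [round(p, 2) for p in retune(detected, targets, 0.0)])
print("100 ms:", [round(p, 2) for p in retune(detected, targets, 100.0)])
```

At 0 ms the slide turns into a staircase of exact notes, which is the robotic sound. At 100 ms the output stays close to what was sung and only drifts gently towards the scale.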

If you're looking for a more transparent use of the auto-tune, there are other parameters that are here to help:

- The humanise function will adjust the speed of the effect depending on the speed of the notes, so faster notes should be adjusted faster and vice versa.

- With the flex-tune parameter, notes will be corrected only when they are close enough to the target note. This is to keep the expressiveness of the singer.

- The natural vibrato parameter narrows the range of the vibratos the more you turn it up. That is independent of the pitch correction.
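The flex-tune idea from the list above, correcting only when the singer is already close to the note, can be written as a one-line rule. This is a hypothetical simplification with a made-up window size, not the plugin's actual behaviour.

```python
def flex_correct(pitch, target, window=0.3):
    """Hypothetical flex-style rule: correct only when the sung pitch is
    already within `window` semitones of the target note."""
    return target if abs(pitch - target) <= window else pitch

print(flex_correct(60.2, 60.0))  # close to the note: pulled onto it
print(flex_correct(60.8, 60.0))  # a deliberate bend: left alone
```

Anything outside the window, like a slide or a bend, passes through untouched, which is how the expressive parts of a performance survive.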

That's a lot of options, all there to manipulate pitch.
But see here at the top, you have 2 modes for all of this: classic and formant.

And with formant, you have a throat parameter that is similar to the formant knob we saw in the vocoder.

TALKBOX:

This one will be a lot quicker to explain.
The talkbox is an effect that can't really be replicated with a VST. I mean there are some but... Meh.

So basically, with a talkbox, the sound is sent through a plastic tube.
And this tube goes into your mouth. So the sound will be played in your mouth.

And from there you can articulate words to shape that sound with your mouth.
It's a bit like when you play music on your phone and place it close to your mouth to make some wah wah sounds.

Here it's directly the mouth that will shape formants organically, to physically shape the sound.
It's a very fun effect to use, but I can't really demonstrate it as I don't own one myself.

You can hear a talkbox in songs that I'm not going to play, for copyright reasons.
But you can hear it on a guitar solo in the middle of the song Jambi by Tool at 4:10, which is one of the best songs in the world.
Or in the intro of Bon Jovi's "Livin' on a Prayer".