Briegh's Voice Mastering Guide
Skywind Voice Mastering
By Nathan Dickson (Briegh)
nathandickson.com
(Disclaimer: This guide was written in 2020 and so some of the workflow described may not match the current Skywind workflow)
Step 1 — Bring all files into your DAW
Most likely any DAW will work—they are all fairly capable these days—but I use the latest version of Pro Tools. My approach is to create a different Pro Tools session file and folder for each artist on the theory that all audio files from the same artist will likely need similar if not the same processing, even if done a few months apart. With a new artist, I duplicate an existing session file, give it a new name matching the artist and probably the character they are voicing, then import their audio files into that session. My session files all have one mono track, and one stereo master track. The mono track is panned hard right so that audio only comes out my right speaker. Into that mono track, I drop their files and Pro Tools auto-magically sequences them for me in alphabetical file name order.
Note: Do not strip any content from the beginning of any of the files. Even if there is a minute of silence at the beginning of one of the files, that silence is needed because the sections of audio to be kept have already been time indexed to somewhere within that file and removing any portion of audio beforehand will mess that up.
If you are using headphones to master, I will assume that you are aware of the limitations and will handle things accordingly, perhaps by using my existing files, or even voice audio from Skyrim or Oblivion, as a reference in an attempt to match the vibe.
Step 2 — Listen and Analyze
Voice work is being recorded on a volunteer basis, typically in non-studio environments, so there is no built-in standard for signal levels, room treatment, microphone type, microphone placement, recording equipment, EQ, and so forth. Thus, it’s incumbent on the mastering engineer to listen through the audio files and identify issues such as low signal level, unacceptable distortion (which basically means any), recording environments that might be too noisy to repair, whether or not the artist is prone to “eat the microphone” so to speak, whether or not the artist correctly used a pop filter to eliminate or reduce plosives, and so on. As always, experience and practice help here, but let your ear guide you and be as discerning as possible.
Note that if the audio files are from an artist for whom I have already processed some of their work, I listen to see if they sound the same or if something has changed in their setup.
One thing I look for is a visual indication that one or more of the files is different from the rest, such as a very quiet file with a barely discernible waveform drawn in its rectangle, or one that is very loud compared to the others. Using a level-assessment plug-in, I try to find the most representative audio segment and bring all the others to within a few decibels of its signal level. Ideally, they would all be in the neighborhood of -20 dBFS RMS before any signal processing via plug-ins. Variations of a few decibels are tolerable. Pro Tools gives me the ability to raise or lower the level of individual audio segments, and I do that to try to get some consistency throughout the entire chain of segments. I only do this if there is some wide variation in the segments.
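If you would rather sanity-check levels with a script than with a metering plug-in, here is a minimal Python sketch of the idea. It assumes samples are floats between -1.0 and 1.0 and uses the plain RMS convention in which a full-scale sine reads about -3 dBFS RMS (some meters add 3 dB). The function names and the test tone are made up for illustration.

```python
import math

def rms_dbfs(samples):
    """RMS level in dBFS for float samples in the range -1.0..1.0."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence has no finite level
    return 10 * math.log10(mean_square)

def gain_to_target(samples, target_dbfs=-20.0):
    """Gain in dB that would bring this segment to the target RMS level."""
    return target_dbfs - rms_dbfs(samples)

# Hypothetical quiet segment: one second of a 220 Hz sine peaking at 0.02
quiet = [0.02 * math.sin(2 * math.pi * 220 * n / 44100) for n in range(44100)]
print(round(rms_dbfs(quiet), 1))        # about -37.0
print(round(gain_to_target(quiet), 1))  # about 17.0
```

A segment like this one would need roughly 17 dB of clip gain to sit near the other segments, which is exactly the kind of outlier worth fixing before the plug-in chain.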
Step 3 — Apply signal processing
I run the audio through a chain of plug-ins to try to reduce or eliminate noise, remove rumble and plosives, reduce midrange honk, and tame variations in signal level without harming the audio quality.
1. Noise Reduction
The first plug-in I sometimes use is Z-Noise from Waves Audio. I typically only resort to this if there is persistent hiss or other ambient noise, like an HVAC system, lying under the vocal. Z-Noise allows me to fingerprint a section of the recording where there is no voice using its Learn feature, so I hunt down the best one- or two-second sample of that noise and let the plug-in identify it for removal. I never use the Adaptive mode, but I do have it on Smooth instead of Normal. It’s been so long that I don’t remember why, but it probably gave me better results at the time. Your mileage may vary if you use Z-Noise.
I set the plug-in to its highest quality and complexity, and use the sliders to eliminate usually between 6 and 15 dB of noise with a sensitivity of between 6 and 24 dB. There are no hard and fast rules except this: do not harm the desired signal. Z-Noise has a Difference feature (bottom right button) where I can hear what is actually being removed or subtracted from the audio and I never go far enough to where I can hear more than the barest hint of voice. I sometimes go that far to make sure I have enough noise, but then I back it off to hear only the hiss or hum or whatever it is that I don’t want poking forward when the voice is silent. It’s okay if there is still a slight bit of noise in the recording.
2. EQ
Pro Tools comes with a nice-sounding and flexible EQ plug-in with many fun controls and modes of operation. I wouldn’t say it’s the best EQ one could pick for all uses, but it’s more than fine for voice work. Whichever EQ you use, you are going to want to focus on a few things:
Remove low frequencies. In the HPF section of EQ III, I am high-passing audio with a slope of 12 dB per octave and a cutoff frequency of about 120 Hz. I will move that up or down depending upon the voice: lower if it’s a gravelly male Orc, higher if it’s a dainty princess elf. The idea is to keep most of the voice, but drop any low frequency rumble.
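To get a feel for what a 12 dB per octave high-pass at 120 Hz does, here is a small Python sketch. It assumes an ideal second-order Butterworth response, which is a textbook filter with that slope; I am not claiming this is the exact curve EQ III uses.

```python
import math

def butterworth_hpf_db(freq_hz, cutoff_hz=120.0):
    """Magnitude in dB of an ideal 2nd-order Butterworth high-pass filter."""
    x = (freq_hz / cutoff_hz) ** 2
    return 20 * math.log10(x / math.sqrt(1 + x * x))

# Attenuation at a few frequencies around the 120 Hz cutoff
for f in (30, 60, 120, 240, 1000):
    print(f, "Hz:", round(butterworth_hpf_db(f), 1), "dB")
```

The level is down about 3 dB at the cutoff itself and falls by roughly another 12 dB for every octave below it, while the body of the voice above a few hundred hertz is left essentially alone.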
I use a gentle low shelf in the LF band, set to a Q of 0.5, to gradually roll off the low end and reduce microphone proximity effect. You can see here I have the gain around -7 dB. This is all to ear and can vary depending upon the voice. Likewise, I choose the frequency based upon the voice. The goal here is to make the voice sound like a person four to six feet away from you. Close your eyes and try to picture a person sitting that far away. If there is too much proximity effect, then either lower the gain or raise the shelf frequency, or both. You want it to sound like someone having a pleasant conversation from a normal distance, not like someone eating your ear.
Most people’s voices have a bit of midrange honk to them and microphones pick that up. I tend to dip the voice somewhere between 350 Hz and 600 Hz, depending upon where their honk happens to be. The dip might be anywhere from a few decibels to several. You can see here that I lowered the honk around 580 Hz by 3.4 dB with a Q of 1.86, which is roughly three-quarters of an octave wide. This is all done by ear, not by numbers, so it helps to close your eyes and just listen a lot, then make adjustments, then do that again.
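If you want to translate between Q and bandwidth in octaves, here is a short Python sketch using the common textbook conversion for a peaking filter. EQ designs do not all define Q the same way, so treat the numbers as an estimate rather than what any particular plug-in does. The function names are mine.

```python
import math

def q_to_octaves(q):
    """Approximate bandwidth in octaves for a given Q (textbook formula)."""
    return (2 / math.log(2)) * math.asinh(1 / (2 * q))

def octaves_to_q(octaves):
    """Approximate Q for a given bandwidth in octaves (inverse of the above)."""
    return math.sqrt(2 ** octaves) / (2 ** octaves - 1)

print(round(q_to_octaves(1.86), 2))  # bandwidth for the Q used on this voice
print(round(octaves_to_q(0.5), 2))   # Q needed for a half-octave dip
print(round(octaves_to_q(1.0), 2))   # Q needed for a one-octave dip
```

By this conversion a Q of 1.86 comes out a little under 0.8 octave, and a higher Q means a narrower dip.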
3. Downward Expansion
Pro Tools also has a fairly useful expander plug-in called Dyn3 Expander/Gate. The purpose of using this plug-in is to remove more noise, but only when there is no voice present. I am using the Look Ahead feature, which helps the plug-in anticipate signal changes, with a conservative Range of -12 dB (meaning that when there is no voice, the noise is turned down by as much as 12 dB), a very fast attack of 10 microseconds, an expansion ratio of 2 to 1, a hold of 50 milliseconds, a release of 50 milliseconds, and an activation threshold of -34 dB. I usually have to adjust the threshold depending upon the source material. The goal is to have the expander open up when the voice is there, but shut back down when it is absent. The reason I use an expander instead of a gate is that I want the transition from vocal to not-vocal and back to be short and smooth, not immediate and abrupt.
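Here is how threshold, ratio, and range interact, as a minimal Python sketch of a textbook downward expander's static gain curve. It ignores attack, hold, release, and look-ahead, and it is only a generic model, not the internals of Dyn3.

```python
def expander_gain_db(level_db, threshold_db=-34.0, ratio=2.0, range_db=-12.0):
    """Static gain in dB applied by a textbook downward expander."""
    if level_db >= threshold_db:
        return 0.0  # voice present: pass the signal untouched
    # Below threshold, every 1 dB of drop in becomes `ratio` dB of drop out
    reduction = (level_db - threshold_db) * (ratio - 1.0)
    # Range caps how far the signal is ever pushed down
    return max(reduction, range_db)

for level in (-30, -40, -50, -70):
    print(level, "dB in ->", expander_gain_db(level), "dB of gain")
```

Signal at -40 dB is only pulled down 6 dB, while anything far below the threshold bottoms out at the 12 dB range. That gradual slope is what keeps the transition smoother than a hard gate.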
4. Taming Plosives
So far, there have been very few files without some type of plosive, however subtle. The trick to reducing or even eliminating them is to use a multi-band compressor like Waves C4, but only engage the low frequency trigger and only affect the lower frequencies. You can see here I have Bypass enabled for all but the first compressor on the left. This ensures most of the signal will pass through unmodified. We are only interested in the frequencies below 180 Hz when they suddenly and erroneously become too loud, thus the first Crossover is set there. You can move that to some other frequency based on the voice. I have the range set to remove as much as 16 dB of signal with a fairly fast attack and release: 5 milliseconds and 15 milliseconds. That too is up to adjustment for the voice. I move the threshold slider up or down depending upon how prominent the plosives are. Sometimes I remove none. On this one, I used -32.5 dB. Moving the slider down makes the trigger more sensitive and the compressor squashes more signal.
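To show what attack and release times mean in practice, here is a Python sketch of a simple one-pole envelope follower, one common way a compressor's detector is modeled. This is an assumption for illustration only; I do not know how the C4 implements its detector, and the 44,100 Hz sample rate and the test burst are made up.

```python
import math

def smoothing_coeff(time_ms, sample_rate=44100):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_ms / 1000.0 * sample_rate))

def follow_envelope(samples, attack_ms=5.0, release_ms=15.0, sample_rate=44100):
    """Track signal level, rising at the attack rate and falling at the release rate."""
    attack = smoothing_coeff(attack_ms, sample_rate)
    release = smoothing_coeff(release_ms, sample_rate)
    env, out = 0.0, []
    for s in samples:
        level = abs(s)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

# Hypothetical plosive: 10 ms of full-scale thump followed by 30 ms of silence
burst = [1.0] * 441 + [0.0] * 1323
env = follow_envelope(burst)
print(round(env[220], 2), round(env[440], 2), round(env[-1], 2))
```

With a 5 millisecond attack the detector reaches most of the plosive's level within the first few milliseconds, and with a 15 millisecond release it lets go quickly enough that the following vowel is not ducked.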
5. First Pass General Compression
You don’t want to crunch the audio in one go as that can harm the quality of the signal. The solution is to run the signal through multiple compression stages using different compressors.
My first compression stage is a Waves CLA-76 compressor, modeled on Universal Audio’s classic 1176 studio workhorse. It is set to a 4 to 1 ratio, with a gentle attack of about 4 and a fast release. I don’t want to squash things too much, so compression is slow-ish and never more than 4 or 5 decibels of reduction on the meter. It’s okay if the signal is still too dynamic coming out because we have two more passes to go. With the CLA-76 you compress more or less by turning the input control. Your compressor might vary.
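As a rough guide to what 4 or 5 decibels of reduction means at a 4 to 1 ratio, here is a Python sketch of a textbook hard-knee compressor's static gain reduction. The -24 dB threshold is a made-up number for the example; an 1176-style unit has no threshold knob, so driving the input harder plays the role of pushing the signal further over the threshold.

```python
def compressor_gain_reduction_db(level_db, threshold_db, ratio=4.0):
    """Gain reduction in dB for a textbook hard-knee compressor."""
    over = level_db - threshold_db
    if over <= 0:
        return 0.0  # below threshold: no compression
    # At ratio:1, only 1/ratio of the overshoot survives
    return over * (1.0 - 1.0 / ratio)

# Hypothetical threshold of -24 dB: peaks 6 dB over it lose 4.5 dB
print(compressor_gain_reduction_db(-18.0, threshold_db=-24.0))
print(compressor_gain_reduction_db(-30.0, threshold_db=-24.0))
```

So at 4 to 1, the loudest peaks only need to be about 6 dB over the threshold to show the 4 to 5 dB on the meter that I aim for in this first pass.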
6. DeEssing
Before we compress the signal again, we need to tame the high end or it will unfairly bias the next compressor in the chain. To do that, I use the Waves Renaissance DeEsser, which I find gives stellar results. My crossover frequency is around 3400 Hz in Split mode, meaning that anything lower than that is left mostly untouched, sort of like with the C4 compressor. It may seem like I am being aggressive with a 23 decibel compression range, but I set the threshold so that most sibilance is reduced by between 8 and 16 decibels depending upon how harsh it sounds, with 23 acting as a sort of “never more than this” cap. As the instructions to the plug-in state: “It’s very easy to over DeEss. If your narrator sounds like someone took out his front teeth, you are probably overdoing it. If you hear the DeEssing effect, that too is usually a sign of over DeEssing. If the audio passage sounds rather natural and free from annoying Sizzle and distorted Esses, then you got it right.”
7. Second Pass of General Compression
I then pass the signal through a Waves CLA-3A, which is modeled on the classic Universal Audio Teletronix LA-3A leveling amplifier. I have mine in Limiter mode because I am trying to get slightly more aggressive with the signal after DeEssing.
NOTE: Almost never use analog simulation in your digital audio plug-ins. All it does is add artificial hiss and hiss is usually the enemy unless you *want* hiss for something. We do not want it here, so Analog is off.
Using the meter in gain reduction mode, we are shooting for between 3 and 6 decibels of signal dynamics reduction. So if it bounces around much higher or much lower than that, adjust the Peak Reduction dial, then make up the difference with the Gain knob.
8. Third and Last Pass of General Compression
Next, the signal goes through Waves’ Vocal Rider to smooth out any remaining dynamics beyond a handful of decibels. I have it set to a fairly narrow range on the left because I want the dynamics to have only a small window from loudest to quietest. Remember that this voice must be heard above in-game music and often sound effects. Whispers need to be much louder and screams need to be far less in your face. It needs to sound like the heavily-processed voices from Oblivion or Skyrim.
I am using the Slow setting, with Music sensitivity dialed down and Vocal sensitivity set in the middle. The target is -20 dBFS. When it runs, the center Rider control will auto-magically go up and down to compensate for loud or quiet voice parts, trying to pull them back within a narrow range, but only within the tolerances I have set up on the left, which is no more than 5 dB in either direction. I find this works for pretty much everything in Skywind.
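The riding behavior can be pictured with a tiny Python sketch: the fader tries to move each phrase to the target, but never by more than the allowed range. This is only my simplified picture of the idea, not how Vocal Rider actually detects or smooths levels.

```python
def rider_gain_db(level_db, target_db=-20.0, max_ride_db=5.0):
    """Fader move in dB toward the target, limited to +/- max_ride_db."""
    wanted = target_db - level_db
    return max(-max_ride_db, min(max_ride_db, wanted))

# A whisper, a normal line, and a shout (hypothetical phrase levels)
for level in (-28.0, -21.0, -14.0):
    print(level, "->", level + rider_gain_db(level))
```

A whisper at -28 dB only comes up to -23 dB and a shout at -14 dB only comes down to -19 dB, which is why the earlier compression stages need to do most of the heavy lifting first.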
9. Safety Brickwall Limiting and RMS Lock-In
The last signal manipulation is done using Waves L2 Ultramaximizer. I am using this plug-in for four things:
I set the Out Ceiling to -1 dB to eliminate most or all wave clipping on output.
I set the Threshold so that the effective RMS of the voice is generally around -20 dBFS RMS. I monitor that in a follow-up plug-in.
I Quantize to 16-bits with Type 1 Dither and Normal Noise Shaping. We need nothing fancy here.
I use the built-in ARC for release time, because it works just fine.
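For the curious, here is a Python sketch of what a -1 dB ceiling and 16-bit quantization with dither amount to numerically. It uses plain TPDF (triangular) dither with no noise shaping, which is a standard textbook technique that I am swapping in for illustration. The Waves Type 1 dither and its noise shaping are proprietary and are not reproduced here.

```python
import random

def db_to_linear(db):
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20.0)

def quantize_16bit_tpdf(samples, seed=0):
    """Quantize float samples (-1.0..1.0) to 16-bit integers with TPDF dither."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    out = []
    for s in samples:
        dither = rng.random() - rng.random()  # triangular noise, +/- 1 LSB peak
        value = round(s * 32767 + dither)
        out.append(max(-32768, min(32767, value)))  # stay inside 16-bit range
    return out

ceiling = db_to_linear(-1.0)
print(round(ceiling, 4))  # 0.8913, the peak amplitude allowed by a -1 dB ceiling
print(quantize_16bit_tpdf([0.0, ceiling, -ceiling]))
```

The ceiling leaves about 11 percent of headroom below full scale, and the dither adds a tiny amount of noise so that quiet tails fade out smoothly instead of turning gritty at 16 bits.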
10. Output Monitoring
I am using the iZotope Insight plug-in to examine the resulting audio’s signal levels. I want to see it bouncing close to -20 dBFS RMS in the bottom graph. You can see here that I am at -19.5 in the Integrated (LUFS) section, which is fine. You can also see that I have squashed the dynamics down to a narrow 4.6 decibels of loudness range. This would be terrible for a heartfelt musical performance, but it is perfect for a game voice.
11. Print to Disk
Since everything is panned to the right speaker, I export (print to disk) the right channel only, one audio segment at a time, as a mono WAV file (16-bit, 44,100 Hz). I keep the same name, but add something to the end to signify that this is a mastered version. I also tack on a version number just in case I have to make multiple passes due to feedback from later in the work process.
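If you want to double-check your exports with a script, here is a small Python sketch using only the standard library. The "_mastered" suffix, the two-digit version format, and the file name line_001.wav are made up for the example; use whatever naming the project asks for.

```python
import os
import tempfile
import wave

def mastered_name(path, version=1, suffix="_mastered"):
    """Build an output name: original name plus a suffix and version number."""
    stem, ext = os.path.splitext(path)
    return f"{stem}{suffix}_v{version:02d}{ext}"

def is_delivery_format(path):
    """True if the WAV file is mono, 16-bit, 44,100 Hz."""
    with wave.open(path, "rb") as wav:
        return (wav.getnchannels() == 1
                and wav.getsampwidth() == 2      # 2 bytes per sample = 16-bit
                and wav.getframerate() == 44100)

# Write a tiny silent test file so the check has something to read
demo_path = os.path.join(tempfile.mkdtemp(), "line_001.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(b"\x00\x00" * 441)  # 10 ms of silence

print(is_delivery_format(demo_path))
print(os.path.basename(mastered_name(demo_path, version=2)))
```

Running a check like this over the export folder is a quick way to catch a stray stereo or 24-bit file before handing things off.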
Conclusion
I hope you found some of this helpful for your voice mastering in Skywind. If you have some great ideas of your own, definitely use them. The goal is the final product, not necessarily how we get there. If you have any questions, please ask in Discord. I can’t promise to get back within a few hours, but I will look in there occasionally to see if anything is pending.