Ever since we’ve been able to push sample rates higher than 44.1 kHz, one question keeps coming up: what is the best sample rate for audio, and can you actually hear the difference between 48 kHz and 96 kHz (or higher)?
Before we get into this, note that I am not an audio engineer or a scientist. I am a software developer who is often too curious for his own good, which results in odd new projects like StreamFX. So take this with a grain of salt, and if you know better, feel free to contact me!
This test is based on my own frequency visualizer tool, available here. All tests cover the human audible range of 20 Hz to 20 kHz (for a healthy young adult). They do not take into account inaccuracies caused by the physical properties of D/A and A/D converters, speakers, or microphones.
Update: Information on supersampling D/A and A/D converters has been added. See the conclusion for details.
When does 48kHz run into problems?
The most common case is digital-to-analog conversion; without it we would not be able to hear any audio at all. Let’s look at a few power-of-two frequencies: 32 Hz, 64 Hz, 128 Hz, 256 Hz, 512 Hz, 1024 Hz, 2048 Hz, 4096 Hz, 8192 Hz, and finally 16384 Hz.
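The graphs come from the visualizer, but the arithmetic behind them is easy to check. At 48 kHz, a sine at frequency f gets 48000 / f sample points per period. The sketch below is plain Python and not the visualizer's code. It prints that count for each of the frequencies above. The fewer points per cycle, the less a plot that joins the samples with straight lines resembles a smooth sine.

```python
SAMPLE_RATE_HZ = 48_000

def samples_per_cycle(freq_hz: float, sample_rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Number of sample points that fall on one period of a sine at freq_hz."""
    return sample_rate_hz / freq_hz

for f in [32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384]:
    print(f"{f:>6} Hz: {samples_per_cycle(f):8.2f} samples per cycle")
```

At 2048 Hz there are still about 23 points per cycle. At 8192 Hz there are fewer than 6, and at 16384 Hz fewer than 3.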
Looking at the generated graphs, we can immediately tell that anything at or below 2048 Hz looks perfectly fine at 48 kHz. Somewhere between 2048 and 4096 Hz, slight artifacts start to appear, and they grow stronger as the frequency rises further. At 8192 Hz the artifacts are already strong.
At 16384 Hz we might as well throw in the towel, as there is basically no way to recreate the original wave with current hardware. Even the most accurate DAC will struggle, constantly overshooting and undershooting and corrupting the wave past recovery. It is possible to work around the issue, but not to eliminate it.
So for the majority of audible frequencies, 48 kHz is safe. But so far we’ve only looked at 2^n frequencies. What about other frequencies that end up in bad shape at this sample rate?
The Broken Frequencies at 48 kHz
Since we can safely say that any frequency up to 4096 Hz works “fine”, let’s look at the frequencies above it. Specifically, these are integer and half-integer divisions of 48 kHz: 19.2 kHz (48/2.5), 16 kHz (48/3), 12 kHz (48/4), 9.6 kHz (48/5), 8 kHz (48/6), 6 kHz (48/8) and 4.8 kHz (48/10).
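Each of these frequencies divides 48 kHz by a small number. As a result, the sample points can only land on a handful of positions along the wave, and the same pattern repeats over and over. The short sketch below computes how many samples pass before the pattern repeats exactly. The helper `repeat_period` is a hypothetical name used only for illustration.

```python
from math import gcd

SAMPLE_RATE_HZ = 48_000

def repeat_period(freq_hz: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> tuple[int, int]:
    """Return (samples, cycles) after which the sampled sine pattern repeats exactly.

    Sample n lands at phase 2*pi*freq_hz*n/sample_rate_hz, so the pattern repeats
    as soon as freq_hz*n/sample_rate_hz is a whole number of cycles.
    """
    g = gcd(freq_hz, sample_rate_hz)
    return sample_rate_hz // g, freq_hz // g

for f in [19_200, 16_000, 12_000, 9_600, 8_000, 6_000, 4_800]:
    samples, cycles = repeat_period(f)
    print(f"{f:>6} Hz: repeats every {samples} samples ({cycles} cycle(s))")
```

The results are:

- 19.2 kHz repeats every 5 samples, spread over 2 cycles.
- 16 kHz repeats every 3 samples.
- 4.8 kHz repeats every 10 samples.

So the higher the frequency, the fewer distinct points describe its shape.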
At 19.2 kHz and 16 kHz, we have by far the worst artifacts. They barely look like waves anymore: not much of the original shape is left, although we can still guess that it used to be a wave of some kind. In the second sample, which is slightly offset in time, both frequencies look even worse.
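Between the two samples, only the position of the sample points relative to the wave changes. The sketch below shows this effect at 16 kHz. It assumes a quarter-period time offset, which is an arbitrary choice for illustration, since the offset used in the graphs isn't stated.

```python
import math

SAMPLE_RATE_HZ = 48_000

def sample_sine(freq_hz: float, count: int, offset_s: float = 0.0,
                sample_rate_hz: float = SAMPLE_RATE_HZ) -> list[float]:
    """Sample sin(2*pi*f*t) at t = offset_s + n / sample_rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * (offset_s + n / sample_rate_hz))
            for n in range(count)]

def peak(samples: list[float]) -> float:
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)

freq = 16_000
aligned = sample_sine(freq, 30)                           # first sample at phase 0
shifted = sample_sine(freq, 30, offset_s=1 / (4 * freq))  # shifted by a quarter period
print(f"peak without offset: {peak(aligned):.3f}")
print(f"peak with offset:    {peak(shifted):.3f}")
```

Without the offset, the samples hit phases of 0°, 120° and 240°. The largest captured value is therefore about 0.866 (√3/2) instead of 1.0. Shifting by a quarter period lets a sample land exactly on the peak. The tone is identical in both cases; only the sampled values differ.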
12 kHz and 9.6 kHz show similar results, depending on how the time offset is adjusted. However, good filtering algorithms might still be able to tell that these used to be waves. The rate of change of the waveform could be used to recreate a proper wave at the frequency we are trying to reproduce.
At 8 kHz and below, the artifacts become small enough to filter out with minimal loss. So, given good filtering, all lower frequencies should perform fine.
So the question then is, what sample rate is enough to fix the majority of artifacts?
Which sample rate avoids the artifacts?
Ideally, we would accurately reproduce every frequency between 20 Hz and 20 kHz, but that is not feasible with current technology at a reasonable price. So we have to make do with what we already have: 96 kHz and 192 kHz. Let’s look at both of them.
The 96 kHz graphs show a clear improvement over 48 kHz, with almost all artifacts eliminated for these frequencies. At 96 kHz we are safe for human speech and most instruments. Some artifacts remain, but for the most common use cases 96 kHz is enough.
At 192 kHz the remaining artifacts effectively disappear. Even 19.2 kHz looks like a proper wave and likely won’t need any complex filtering to be detected correctly. This would be the ideal sample rate for instruments such as cymbals and bells, but would be massive overkill for vocals.
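Part of this improvement follows from simple arithmetic: each doubling of the sample rate doubles the number of sample points per cycle. The sketch below tabulates that count for the tones discussed above at all three rates. It is plain Python and independent of the visualizer.

```python
RATES_HZ = [48_000, 96_000, 192_000]
TONES_HZ = [19_200, 16_000, 12_000, 9_600, 8_000]

def samples_per_cycle(freq_hz: float, sample_rate_hz: float) -> float:
    """Sample points per period of a sine at freq_hz."""
    return sample_rate_hz / freq_hz

# Header and rows use 12-character columns so the numbers line up.
print("tone (Hz) " + "".join(f"{fs // 1000:>8} kHz" for fs in RATES_HZ))
for f in TONES_HZ:
    print(f"{f:>9} " + "".join(f"{samples_per_cycle(f, fs):12.2f}" for fs in RATES_HZ))
```

For example, 19.2 kHz goes from 2.5 points per cycle at 48 kHz to 5 at 96 kHz and 10 at 192 kHz.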
Solving the Question(s)
Is 48 kHz enough?
It depends, but the short answer is no. Making 48 kHz sound like what you would achieve with 96 kHz or higher requires significant audio processing. If you can confidently say that everything in your audio production pipeline does the necessary processing for 48 kHz playback, then you can set your playback sample rate to 48 kHz.
Will switching to 96 kHz (or higher) fix the problems?
Yes. The artifacts won’t be gone completely, but they will be reduced to the point that they no longer matter. This is especially important when recording real-world instruments and vocals. A studio performance captured at 192 kHz will sound very different from one captured at 48 kHz.
What sample rate should I pick?
This depends on what you actually want to do:
- If you only intend to capture game audio, 48 kHz is perfectly fine, as most games mix their audio at 48 kHz or even 44.1 kHz.
- For human voices, such as commentary and singing, you will want to switch to 96 kHz. This covers the majority of frequencies that humans can produce, and many instruments as well.
- Lastly, some instruments, such as cymbals and bells, don’t sound good at 96 kHz and require 192 kHz.
However, there is a catch. If your pipeline contains a naive downsampler, you gain none of the benefits of the higher sample rate. Such downsamplers are common in popular media production software such as streaming apps. In the worst case, a naive downsampler can even introduce new artifacts.
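Here, a “naive” downsampler means one that simply throws samples away without filtering first. The sketch below halves a 96 kHz signal that way, using hypothetical helpers that are not code from any particular streaming app. The input is a 30 kHz tone. A 96 kHz recording can hold it, but it lies above half of 48 kHz and above the 20 kHz audible range. Instead of disappearing, the tone folds down to an audible 18 kHz (48 - 30 = 18 kHz).

```python
import math

def sine(freq_hz: float, sample_rate_hz: float, count: int) -> list[float]:
    """count samples of sin(2*pi*f*t) taken at sample_rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]

def naive_downsample(samples: list[float], factor: int) -> list[float]:
    """Keep every factor-th sample and discard the rest, with no low-pass filter."""
    return samples[::factor]

# 30 kHz tone sampled at 96 kHz (inaudible, but present in the recording).
high = sine(30_000, 96_000, 96)
halved = naive_downsample(high, 2)  # 48 samples, now interpreted at 48 kHz

# At 48 kHz those samples match an 18 kHz tone (inverted in sign) sample for sample.
alias = [-s for s in sine(18_000, 48_000, 48)]
worst = max(abs(a - b) for a, b in zip(halved, alias))
print(f"largest difference from an 18 kHz tone: {worst:.1e}")
```

After the naive step, the inaudible 30 kHz content has become an 18 kHz tone that was never in the audible part of the original. This is one way such a downsampler can introduce new artifacts.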
What is the correct way to downsample?
This is the hard part, and I have no real answer for it. A reduced sample rate simply cannot cover all the frequencies that higher sample rates can. Even the best downsampling and filtering can only do so much, and they will struggle with certain frequencies where artifacts are simply unavoidable.
Most frequencies above 9.6 kHz are problematic at 48 kHz and simply can’t be represented correctly. For example, 19.2 kHz is nearly impossible to represent accurately at 48 kHz, but is fine at 96 kHz.
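There is no definitive answer here. For comparison, though, the standard textbook improvement over a naive downsampler is to low-pass filter the signal below half of the target rate before discarding samples. Below is a minimal sketch of that idea for a 2:1 reduction (96 kHz to 48 kHz). It uses a windowed-sinc FIR filter with a Hamming window. The helper names are hypothetical. The 63 taps and the cutoff of 0.23 of the input rate (about 22 kHz) are arbitrary illustration values. Real resamplers use longer, more carefully designed filters.

```python
import math

def lowpass_taps(num_taps: int, cutoff: float) -> list[float]:
    """Windowed-sinc low-pass FIR taps.

    cutoff is in cycles per input sample (0 < cutoff < 0.5); num_taps should be odd.
    """
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        # Ideal low-pass impulse response: sin(2*pi*fc*x) / (pi*x), equal to 2*fc at x = 0.
        ideal = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        # Hamming window reduces the ripple caused by truncating the ideal response.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC

def filtered_downsample_by_2(samples: list[float], taps: list[float]) -> list[float]:
    """Apply a causal FIR low-pass, computing only every second output sample."""
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k >= 0:
                acc += h * samples[i - k]
        out.append(acc)
    return out

FS_IN = 96_000
TAPS = lowpass_taps(63, 0.23)  # cutoff around 22 kHz at a 96 kHz input rate

def steady_peak(freq_hz: float) -> float:
    """Peak output level for a unit sine, ignoring the filter's start-up transient."""
    tone = [math.sin(2 * math.pi * freq_hz * n / FS_IN) for n in range(4_800)]
    out = filtered_downsample_by_2(tone, TAPS)
    return max(abs(v) for v in out[len(TAPS):])

print(f"1 kHz:  {steady_peak(1_000):.3f}")
print(f"30 kHz: {steady_peak(30_000):.4f}")
```

With these values, a 30 kHz tone is attenuated to a small fraction of its level instead of folding down to 18 kHz. A 1 kHz tone passes through at close to full level. This only deals with content above the new limit. It does not change how 48 kHz represents the frequencies below it.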
What about supersampling D/A and A/D converters?
Higher-priced audio devices have started using supersampling D/A and A/D converters. These usually accept data at 48, 96 or 192 kHz but run internally at rates in the MHz range. Since this is usually not listed on the spec sheet, it is impossible to tell whether you have one without an oscilloscope.
Their quality depends on their resampling algorithm. A high-quality one can make 48 kHz sound nearly indistinguishable from 96 kHz, at least for the majority of frequencies. If you can confidently say that you have one of these, then you will be “fine” at a 48 kHz sample rate: most audio frequencies will be reproduced with only minor artifacts.
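The sketch below is a very rough illustration of “resampling to a higher internal rate”. It raises the rate by an integer factor using linear interpolation between neighboring samples. This is an assumption for illustration only. Real converters typically use far more elaborate interpolation filters than straight lines, and `upsample_linear` is a hypothetical helper. For scale, a factor of 64 applied to 48 kHz gives 3.072 MHz.

```python
def upsample_linear(samples: list[float], factor: int) -> list[float]:
    """Raise the sample rate by an integer factor with linear interpolation.

    Between each pair of input samples, factor - 1 evenly spaced points are inserted.
    Output length is (len(samples) - 1) * factor + 1.
    """
    if factor < 1 or not samples:
        raise ValueError("need a positive factor and at least one sample")
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])
    return out

print(upsample_linear([0.0, 1.0, 0.0], 4))
print(f"internal rate at 64x: {48_000 * 64 / 1e6} MHz")
```

A converter’s quality hinges on how well it fills in those extra points, which is why the resampling algorithm matters so much.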
So there you have it: the answer to the age-old question “Is 48 kHz enough?” is “No”. The minimum necessary to accurately reproduce most real-world audio is 96 kHz, and some things even need 192 kHz or higher.
Thanks to technological advances, 96 kHz may become the new “X is enough” in the future. Chips have become smaller and more efficient, audio capture and playback devices have improved, and even our mobile phones are starting to adopt higher sample rates.
That’s all there is to it. If you think I made a mistake, or just know better, feel free to contact me.