Ever since publishing the guide on how to achieve the best possible NVIDIA NVENC quality with FFmpeg 4.3.x and below, people repeatedly ask me what the best possible recording settings are. So today, as a Christmas present, let me answer this question to the best of my knowledge and help all of you achieve a quality you’ve never seen before.
The Basics: Format, Space, Range and Error
As usual in life, nothing is allowed to be simple. And video just so happens to be one of the more complex topics to exist, as it merges many fields of science into one. But that doesn’t mean that you won’t be able to understand the basics of it:
The color format describes how color information is stored, and you might have already heard about NV12, I420, I422, and I444. The four I listed are color formats that store information in different ways, some interleaved, some subsampled. “What’s subsampling?” you ask, and that’s really simple to answer: Human perception is imperfect, and we are far more likely to notice brightness differences than color differences – so storing brightness and color at different sizes is an efficient compression method.
This is what the 444, 422, and 420 part describes, though it’s usually written as X:Y:Z. The X component is the width of the reference block of luma (brightness) samples, which is almost always 4. The Y component is the number of chroma (color) samples stored for the first row of that block, and the Z component is the number of additional chroma samples stored for the second row. In practice this tells you how much smaller the chroma planes are than the luma plane. This means that for 4:2:0 at 1920×1080 we have a luma plane of 1920×1080 (the 4: part), and chroma planes of 960×540 (the 2:0 part). With 4:2:2 we would get chroma planes of 960×1080 instead. NV12 is just a different way to store 4:2:0.
Note that stronger subsampling (going from 4:4:4 to 4:2:2 to 4:2:0) increases the accumulating error, which is something we want to avoid at all costs.
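To make the plane sizes above a bit more tangible, here is a minimal Python sketch that computes them. The table and the function name `plane_sizes` are my own for illustration and not part of OBS Studio or FFmpeg, and it only covers the three common schemes mentioned here.

```python
# Size divisors (horizontal, vertical) of the chroma planes relative to luma.
# Only the three common schemes from the text are covered.
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),  # chroma stored at full resolution
    "4:2:2": (2, 1),  # chroma halved horizontally
    "4:2:0": (2, 2),  # chroma halved horizontally and vertically
}


def plane_sizes(width, height, subsampling):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one frame."""
    div_x, div_y = CHROMA_DIVISORS[subsampling]
    # Round up so odd dimensions still get a chroma sample for the last
    # column or row.
    chroma_w = (width + div_x - 1) // div_x
    chroma_h = (height + div_y - 1) // div_y
    return (width, height), (chroma_w, chroma_h)


if __name__ == "__main__":
    for scheme in CHROMA_DIVISORS:
        luma, chroma = plane_sizes(1920, 1080, scheme)
        print(f"{scheme}: luma {luma}, chroma {chroma}")
```

For 1920×1080 this gives chroma planes of 960×540 for 4:2:0 and 960×1080 for 4:2:2, matching the numbers above.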
Since you now know about color formats, it’s time to learn about color spaces. Color spaces describe how the stored color information can be converted to something that makes sense to a computer. You’ve probably heard of the major three currently in use: Bt.601, Bt.709, and sRGB.
Most content available in the modern day is either Bt.709 or sRGB, while HDR content is Rec.2020 or ACEScc. Since OBS Studio has yet to support HDR, and HDR still hasn’t achieved widespread adoption (due to various reasons), we’ll focus on the SDR ones: Bt.709 and sRGB. “Why not Bt.601?” Because Bt.709 replaced Bt.601 a long time ago, and Bt.601 has only been kept around for compatibility.
Which Color Space you use depends on the content you intend to capture. Bt.709 is used by Movies, Webcams, and a lot of other things, while sRGB is being used by almost every SDR game you can think of. If you want to record PC games or software with the least amount of accumulating error, set it to sRGB. Otherwise set it to Bt.709.
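If you are wondering why picking the wrong color space matters, the following Python sketch may help. It uses the luma weights that the Bt.601 and Bt.709 standards define and shows what happens when a color is stored with one and read back with the other. The function names are hypothetical, the math works on normalized values without any rounding, and it deliberately ignores transfer curves and whatever OBS Studio does internally, so treat it as an illustration only.

```python
# Luma weights (Kr, Kb) from the two standards; Kg is 1 - Kr - Kb.
LUMA_WEIGHTS = {
    "Bt.601": (0.299, 0.114),
    "Bt.709": (0.2126, 0.0722),
}


def rgb_to_ycbcr(r, g, b, space):
    """Convert normalized RGB (0.0 to 1.0) to Y (0..1) and Cb/Cr (-0.5..0.5)."""
    kr, kb = LUMA_WEIGHTS[space]
    kg = 1.0 - kr - kb
    y = kr * r + kg * g + kb * b
    cb = (b - y) / (2.0 * (1.0 - kb))
    cr = (r - y) / (2.0 * (1.0 - kr))
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr, space):
    """Inverse of rgb_to_ycbcr for the same color space."""
    kr, kb = LUMA_WEIGHTS[space]
    kg = 1.0 - kr - kb
    r = y + 2.0 * (1.0 - kr) * cr
    b = y + 2.0 * (1.0 - kb) * cb
    g = (y - kr * r - kb * b) / kg
    return r, g, b


if __name__ == "__main__":
    pure_red = (1.0, 0.0, 0.0)
    stored = rgb_to_ycbcr(*pure_red, "Bt.709")
    # Reading back with the matching space recovers the original color.
    print("matched:   ", ycbcr_to_rgb(*stored, "Bt.709"))
    # Reading back with the wrong space shifts the color.
    print("mismatched:", ycbcr_to_rgb(*stored, "Bt.601"))
```

The mismatched result is no longer pure red, which is the kind of color shift you get when the recording and the player disagree about the color space.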
Color range specifies how much “range” the luma and chroma planes have, and should have no effect when the Color Format is RGB. The two settings you will most likely see are “Partial” (TV, Legal, MPEG) and “Full” (PC, Extended, JPEG). When you use the “Partial” setting, luma has a range of 16 to 235, while chroma has a range of 16 to 240. If you use the “Full” setting, both luma and chroma have a range of 0 to 255.
In the modern day there should be no reason to use Partial anymore, as it was primarily used as footroom and headroom during analog and early digital TV. Unfortunately, once again, we have some software and hardware not implementing the standard fully, so “Partial” is often recommended due to that. However, for the best recording quality and lowest accumulating error, you should set it to “Full”.
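As a rough illustration of what the two ranges mean for 8-bit samples, here is a small Python sketch that converts between them. The function names are made up for this example, and the chroma scaling around 128 is a simplifying assumption on my part rather than a copy of any particular encoder.

```python
def full_to_partial(value, is_chroma=False):
    """Squeeze an 8-bit full-range sample (0..255) into partial range."""
    if is_chroma:
        # Chroma is centered on 128 and spans 16..240 in partial range.
        return round(128 + (value - 128) * 224 / 255)
    # Luma spans 16..235 in partial range.
    return round(16 + value * 219 / 255)


def partial_to_full(value, is_chroma=False):
    """Stretch an 8-bit partial-range sample back to full range (0..255)."""
    if is_chroma:
        full = 128 + (value - 128) * 255 / 224
    else:
        full = (value - 16) * 255 / 219
    # Clamp, since values outside the legal partial range can overshoot.
    return min(255, max(0, round(full)))


if __name__ == "__main__":
    for sample in (0, 128, 255):
        print("luma  ", sample, "->", full_to_partial(sample))
        print("chroma", sample, "->", full_to_partial(sample, is_chroma=True))
```

Black and white in full range (0 and 255) land on 16 and 235 for luma, which is exactly the footroom and headroom described above.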
You might have noticed that I kept mentioning accumulating error through the description, and that’s for a good reason. Accumulating error is something that happens when you repeatedly convert, compress, decompress, resize, resample or do other mathematical things with information in a limited amount of space. Accumulating error is everyone’s enemy, especially when trying to be correct with colors. Chances are that you’ve already encountered accumulating error in the wild as color banding from post processing in video games.
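Here is a tiny Python sketch of how accumulating error turns into banding. It darkens a full 8-bit gradient and brightens it again, once with rounding to 8 bits after every step and once with rounding only at the very end. The gain values and function names are arbitrary choices for the demonstration.

```python
def clamp_8bit(value):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return min(255, max(0, round(value)))


def process_low_precision(values, gains):
    """Apply each gain in turn, rounding to 8 bits after every step."""
    out = list(values)
    for gain in gains:
        out = [clamp_8bit(v * gain) for v in out]
    return out


def process_high_precision(values, gains):
    """Apply all gains in floating point and round only once at the end."""
    out = [float(v) for v in values]
    for gain in gains:
        out = [v * gain for v in out]
    return [clamp_8bit(v) for v in out]


if __name__ == "__main__":
    gradient = list(range(256))  # every possible 8-bit brightness level
    gains = [0.5, 2.0]           # darken, then brighten back
    low = process_low_precision(gradient, gains)
    high = process_high_precision(gradient, gains)
    print("distinct levels, rounding every step:", len(set(low)))
    print("distinct levels, rounding once:      ", len(set(high)))
```

The version that rounds after every step ends up with roughly half of the original brightness levels, and those missing levels are what you see as bands in a smooth gradient.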
Scaling is something that should be avoided, as it removes any possibility of lossless recording and immediately drops you to high quality recording only. If you are forced to use scaling, use Bicubic instead of Lanczos. Lanczos adds artifacts due to its excessive sharpening, and Linear is too smooth for real usage.
Choosing your Recording Quality
Before we actually step into the settings themselves, you first need to choose what quality you want to aim for, and what quality you can actually achieve with your hardware. There are a few quality “levels” that you can aim for, with each one having different requirements, use cases and even effects on video editing software. In the order of highest quality to lowest:
Lossless (True Lossless)
This is the highest possible quality that you can achieve, and is usually described as raw output. Due to the insanely high requirements on hardware and software, it is very rarely supported for video, and almost only supported for single photos or short photo series. Barely any PC will be able to handle true lossless playback, much less recording, at modern framerates and resolutions. Bitrates will likely exceed 5 gigabits per second, so recording it requires a super fast SSD on PCI-E.
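To get a feeling for where numbers like that come from, here is a quick back-of-the-envelope calculation in Python. It assumes uncompressed 8-bit video and ignores container overhead and audio; the function name and the resolutions are just examples I picked.

```python
# Average number of samples stored per pixel for each subsampling scheme:
# one luma sample plus two chroma samples at reduced resolution.
SAMPLES_PER_PIXEL = {
    "4:4:4": 3.0,
    "4:2:2": 2.0,
    "4:2:0": 1.5,
}


def raw_bitrate_bps(width, height, fps, subsampling="4:4:4", bits_per_sample=8):
    """Return the bitrate of uncompressed video in bits per second."""
    samples = width * height * SAMPLES_PER_PIXEL[subsampling]
    return samples * bits_per_sample * fps


if __name__ == "__main__":
    for name, (w, h) in {
        "1080p60": (1920, 1080),
        "1440p60": (2560, 1440),
        "2160p60": (3840, 2160),
    }.items():
        gbps = raw_bitrate_bps(w, h, 60) / 1e9
        print(f"{name} 4:4:4 8-bit: {gbps:.2f} Gbit/s")
```

Under these assumptions 1080p60 comes out at roughly 3 gigabits per second, and anything from 1440p60 upwards goes past 5.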
Effective Lossless (aka Near Lossless)
Effective lossless is a step down from True Lossless, and achieves a similar goal to it: Instead of keeping things perfect, a very tiny margin of error is acceptable, low enough that it will not cause any error in the final output. You might already be familiar with some of the codecs that can achieve this, such as Apple ProRes 4:4:4:4 HQ, H.264 and H.265. It’s very likely that you have already used this before and realized that your hardware might be completely outclassed by the processing requirements of it. The file sizes however are still large, with bitrates often exceeding 1.5 gigabits per second, and still require a fast SSD.
Visually Lossless (aka Indistinguishable)
A third variant of the lossless kind is visually lossless, which has a higher error threshold than effective lossless. Instead of aiming to be as close as possible to the original footage, we just try to be visually indistinguishable from the original for most algorithms and human perception (and technology based on human perception). This allows the bitrate to shrink to just 300 megabits per second or less, which is already possible to handle with a normal SSD. You should start here if you have a Turing or Ampere NVIDIA GPU paired with a mid- to high-end Intel or AMD CPU.
In case any of the above aren’t to your liking or impossible, the next step down approaches the area where errors start to be visible to the trained eye, and definitely noticeable by algorithms of any kind. Bitrates here shrink to 150 megabits per second, while still delivering reasonable overall quality in the footage. In the event that you have an AMD GPU or want to use the integrated GPU in Intel or AMD APUs, this is most likely your best starting point.
As a last ditch effort when everything else fails, this is already in the area where errors are visible. The quality from this setting will be low enough to have significant impacts on video editing tools, and makes your editing life a living nightmare. You can forget about getting any quality out of this; the best you can hope for is footage that doesn’t get taken over by artifacts. You have been warned, and it’s your choice if you want to use this.
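To put the bitrates quoted for the quality levels above into perspective, this short Python sketch converts them into the sustained write speed your drive needs and the space one hour of footage takes. The labels and helper names are my own, and the numbers are simply the approximate bitrates mentioned above, not measurements.

```python
def megabytes_per_second(megabits_per_second):
    """Convert a bitrate in Mbit/s to a write speed in MB/s."""
    return megabits_per_second / 8


def gigabytes_per_hour(megabits_per_second):
    """Convert a bitrate in Mbit/s to storage used per hour in GB."""
    return megabytes_per_second(megabits_per_second) * 3600 / 1000


# Approximate bitrates quoted in the text, in megabits per second.
QUOTED_BITRATES = [
    ("True Lossless", 5000),
    ("Effective Lossless", 1500),
    ("Visually Lossless", 300),
    ("Next step down", 150),
]

if __name__ == "__main__":
    for label, mbps in QUOTED_BITRATES:
        print(
            f"{label}: {megabytes_per_second(mbps):.1f} MB/s, "
            f"{gigabytes_per_hour(mbps):.1f} GB per hour"
        )
```

At 300 megabits per second that works out to 37.5 MB/s and 135 GB per hour, so even the more modest levels fill a drive quickly.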
Setting up the Encoders
Done picking what quality level you want? Then let’s move on to encoders! (Split into separate parts due to WordPress editor problems.)