Friday 30 August 2019

The Sharpening Curse

I should start this off by saying that there are times when sharpening filters are absolutely standard. Playing with local contrast using an unsharp mask or clarity tool is a stock part of most digital photo development (barring skin, where the clarity tool is used in the opposite direction to reduce contrast and provide wrinkle suppression), and something like Adobe Lightroom even applies an automatic (mild) sharpen on export for printing (in the default configuration).
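
For anyone who hasn't poked at the mechanics, an unsharp mask is really just local contrast manipulation: blur the image, take the difference from the original, and add a scaled copy of that difference back in (a negative amount gives you the softening direction used for skin). A minimal sketch of the idea, assuming a single-channel float image and nothing about how any particular tool implements it:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def unsharp_mask(image, radius=2.0, amount=0.5):
        # Classic unsharp mask on a single-channel float image in [0, 1]:
        # add back a scaled copy of (original - blurred).
        # amount > 0 boosts local contrast (sharpens); amount < 0 reduces
        # it - the "negative clarity" direction used for skin smoothing.
        blurred = gaussian_filter(image, sigma=radius)
        detail = image - blurred          # the high-frequency component
        return np.clip(image + amount * detail, 0.0, 1.0)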

That said, I invite anyone to look at freeze frames from any 4K film print and tell me what you see. Watch it in motion and pay attention to any sub-pixel scale elements as they move through the scene. Watch it on a neutrally (professionally) configured screen that's accurately presenting the source input, not a TV that's doing its own mess of sharpening because it's configured for a showroom with everything dialled up to 11. Even when a film has been aggressively sharpened (and most are not), there is a lack of aliasing, thanks in part to the ubiquitous use of an optical low-pass filter in front of the camera sensor during light capture and because an optical sensor captures a temporal and spatial integral (light hitting anywhere on the 2D area of each photosite, at any time while the shutter is open, contributes to the recorded pixel value). Cinematic (offline) rendering simulates these characteristics, even when not aiming for a photo-realistic or mixed (CG with live action) final scene.
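
To make that integration point concrete, here's a toy sketch (scene_radiance is a stand-in for whatever light actually reaches the sensor, not a real API): the recorded value for each photosite is effectively an average over its 2D footprint and over the whole exposure, so detail smaller than a pixel or faster than the shutter gets averaged away rather than aliasing.

    import random

    def photosite_value(scene_radiance, px, py, shutter_open, shutter_close, samples=256):
        # Approximate the spatial + temporal integral a sensor performs:
        # average the incoming signal over the photosite's footprint and
        # over the whole time the shutter is open.
        total = 0.0
        for _ in range(samples):
            x = px + random.random()                         # anywhere on the footprint
            y = py + random.random()
            t = random.uniform(shutter_open, shutter_close)  # any time while open
            total += scene_radiance(x, y, t)
        return total / samples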

When we move to real-time rendering, we're still not that far away from the early rasterisers - constructing a scene where the final result effectively takes a single sample at the centre of each pixel, at a fixed point in time, and calculates the colour value from that. We're missing a low-pass filter (aka a blur or soften filter) and the anti-aliasing effect of temporal and spatial averaging (even when we employ limited tricks to try and simulate them extremely cheaply).
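
The equivalent sketch for the basic rasteriser behaviour is almost insultingly short (again, shade is a stand-in for whatever the renderer evaluates):

    def rasterised_pixel(shade, px, py, frame_time):
        # One sample, at the pixel centre, at a single instant - no
        # averaging over area or time, so sub-pixel detail and fast
        # motion have nothing to stop them aliasing.
        return shade(px + 0.5, py + 0.5, frame_time)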

Assassin's Creed III using early TXAA
Assassin's Creed IV with TXAA

Even when using the current temporal solutions to average out and remove some aliasing (and more expensive techniques like MSAA for added spatial samples, which don't work well with deferred rendering and so have fallen out of fashion), the end result is still a scene with far fewer samples into the underlying ground truth (or the output you would expect if filming an actual scene with a real camera) than we would like, and a tendency for aliasing to occur. When TXAA (an early nVidia temporal solution) was introduced, it sparked a mild backlash from some who wanted a sharper final result, but mainly because those viewers are so used to the over-sharp mess that is the traditional output of real-time rendering. The result is that various engines using temporal solutions now also offer a sharpening filter as a post-process, and AMD (& nVidia) are starting to advertise driver-level sharpening filters (as an enhancement to be applied to games for "greater fidelity").
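
For context on what those temporal solutions are actually doing, the heart of a temporal AA/accumulation pass boils down to blending a little of the new (jittered, aliased) frame into a history buffer that has been reprojected along motion vectors - effectively spreading the cost of supersampling across time. A heavily reduced sketch, assuming single-channel float buffers and glossing over the reprojection step itself:

    import numpy as np
    from scipy.ndimage import minimum_filter, maximum_filter

    def taa_resolve(current, history, alpha=0.1):
        # Clamp the reprojected history to the min/max of the current
        # frame's 3x3 neighbourhood - the usual cheap trick for rejecting
        # stale history and limiting ghosting.
        lo = minimum_filter(current, size=3)
        hi = maximum_filter(current, size=3)
        history = np.clip(history, lo, hi)
        # Exponential accumulation: each frame contributes a small
        # fraction, so the jittered samples average out over time.
        return alpha * current + (1.0 - alpha) * history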

While AMD are pitching FidelityFX as an answer to nVidia's DLSS AI upscaling (using those Tensor Cores to upscale and smooth based on training against 64xSSAA "ground truth" images for each game - an effect I sometimes like more in theory than I love the final result), DLSS actually removes high frequency aliasing rather than adding additional local contrast (it is primarily applying anti-aliasing to a low resolution aliased frame while also picking up some additional details that the AI infers from the training set). Technically AMD's FidelityFX contains two differently branded techniques, one for Upscaling and another for Sharpening, but these two tasks pull in opposite directions (so combining them is something to be attempted with extreme care, and possibly not without something as complex as AI training to guide it) and the marketing seems to treat them under a single umbrella.

Shader upscaling can certainly be better than just the cheapest resize filter you care to run but really, in the current era, I think temporal reconstruction is showing itself to be the MVP now that issues of ghosting and other incorrect contributions are basically fixed (outside of points of very high motion, where we are very forgiving of slight issues - just pause on a static screenshot in the middle of what motion blur effects look like in ~2014 games; because we only ever see it as a fleeting streak, we don't notice how bad it can be). Unless DLSS steps up (while AMD and Intel also start shipping GPUs with dedicated hardware acceleration for this type of computation), I think we should expect advancing temporal solutions to offer the ideal mix of performance and fidelity.

Edit: As I was writing this, nVidia Research posted this discussion of DLSS research, including: "One of the core challenges of super resolution is preserving details in the image while also maintaining temporal stability from frame to frame. The sharper an image, the more likely you’ll see noise, shimmering, or temporal artifacts in motion." - that's a good statement of intent (and hopefully Intel plan to launch their discrete GPUs with "AI" acceleration - something even a modern phone SoC dedicates more PR (and silicon area?) to than current AMD or Intel efforts).

So far we are seeing a lot of optional sharpening effects (optional on PC - I think stuff like The Division actually retained the user-selectable sharpening strength on consoles, but not every console release includes complexity beyond a single "brightness" slider), but I'm worried about the day you load up a game and start seeing sharpening halos (oh no, not the halos!) and notice additional aliasing that cannot be removed.

A very mild level of sharpening absolutely can have a place (doing so via variable strength that adapts to the scene? ideal!) and is probably integrated into several game post-processing kernels we don't even notice, but a sharpening arms race seems like the opposite of what real-time rendering needs. We are still producing final frames that contain too much aliasing and should continue to err on the side of generating a softer final image when weighing detail vs aliasing.
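
If I had to guess at what "adapts to the scene" could look like, it's something in the spirit of the sketch below: measure local contrast and back the sharpening strength off where the image is already contrasty, so existing edges don't get pushed into halos. This is purely my illustration, not how any shipping filter (AMD's CAS included) actually works:

    import numpy as np
    from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

    def adaptive_sharpen(image, radius=1.5, max_amount=0.4):
        # Single-channel float image in [0, 1]. Strength scales down as
        # local contrast (the 3x3 min/max range) goes up, so flat areas
        # get a little extra bite while strong edges are left alone.
        local_range = maximum_filter(image, size=3) - minimum_filter(image, size=3)
        amount = max_amount * (1.0 - np.clip(local_range * 4.0, 0.0, 1.0))
        blurred = gaussian_filter(image, sigma=radius)
        return np.clip(image + amount * (image - blurred), 0.0, 1.0)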