Friday 31 December 2021

Games of the Year 2021

With my GPU going well past its fifth birthday (currently sitting next to a CPU that's a bit newer than that but still no longer exactly new), this was a great year for really focusing on older games and those 100 hour RPGs that seem to build up in the backlog. Because PC games will always look better next year, including the ones released years ago (thanks to nice 4K screens and the endless work of tweakers and modders to push what older engines can do). But, despite putting a few games down in anticipation of when the GPU market finally returns to normal prices and stock availability in the future, I did manage to complete some new games too.

    Beavers of the Year:

Timberborn

This little city builder game about beavers building up a post-human society while preparing for seasonal droughts has been eating up my time this year. In Early Access, plenty of things are going to change before it is completed but there's already something here worth diving into, starting from the question "what if beaver Banished?" The answer is both significantly more adorable and possibly going to end up being mechanically more interesting than that 2014 indie hit. You know the drill: place blueprints, assign jobs, and make sure the needs of your critters are satisfied so that you can continue on into the future. Do it right and you can grow a small community into a thriving town. Mess around and, like Cities Skylines, find out just how water simulation can ruin your day.


This game had public demos early in the year then went into full Early Access a few months ago, with the first major update to the stable branch hitting a few weeks ago and a public experimental branch for all owners if you can't wait for those changes to percolate down. The development process currently needs to figure out a few teething issues with later colony management (how do I efficiently get resources moved around storage zones and prevent all these beavers from doing lots of inefficient work rather than smartly managing jobs so haulers wait to move logs from the active chopping areas to where they are needed) but you'll take a dozen or more hours to get to the point where this is holding you back. Before that you already have two family types with interesting tech trees to work down, and various scenario locations to slowly master by building plenty of dams and vertically arranging your cities.

    Civ of the Year:

Humankind

I say I love Civilization (specifically that sub-genre of historical non-space 4X game) because I've played hundreds (thousands?) of hours of those games and loved my time with them. Civ 2 is very hard to go back to and the undifferentiated civs somewhat flatten it (just like going back to an RTS when everything was mirror matches rather than imbuing factions with mechanical character). I find Civ 4 is starting to get hard to go back to because so much has moved to hexes now and the death stacks plus AI spiral of hatred make the late game less than amazing. Civ 5 was my first "wait, do I like the last one more" but I'd say that, despite the happiness mechanic brutally punishing expansion, it's probably the game I'd play the most today. Civ 6 feels like it tried to fix 5 and failed, even if merging units to create semi-stacks kinda works and the other changes help move it away from constantly butting against constraints (I have not played the expansions - I bounced off it that hard; especially when they sold a New Frontier Pass or Anthology Edition because they wanted even more paid DLC after the two full expansions shipped).

But before Civ 6 was introducing districts to expand your city on the actual map (despite still only allowing you to grow into the classic 3 tile range of your centre and not allowing more than one of any district type - making them little more than added costs before you could build up the specialist building stack in any city), a little game called Endless Legend was already doing districts. It was a fantasy 4X game which is definitely not a Civ-like: important hero units, customising equipment like a space 4X (MOO style), and turning quests into far more of an RPG layer to name but a few features. That team is now back with their own actual Civ-like and I found it excellent.


Humankind in some respects resembles Civ 6, although this is certainly not a clone (especially since, as noted above, some of those Civ 6 similarities come from a previous game from this team doing their own spin on districts etc before Civ did). Another way to consider it is that this is a look at what an alternative evolution of Civ 4 might look like today. The path not taken by Firaxis. Unit stacks exist but are limited and work very differently as combat resolves in a mini turn-based battle on the hexes around where combat started (including options for reinforcements if any other stacks are close). Districts grow a city (bounded by population and local happiness) but are neither unique nor a storage block for other buildings. Those cities are placed on pre-defined geographic areas which they control and automatically work by proximity, not via a population placement mechanic. Each area can be joined to a city to form larger blocks or put under provisional control via a cheap outpost, where provisional control does not enforce hard borders so skirmishes happen far more often. The list goes on and on but every inch of this game feels Civ-like (as each actual Civ sequel may change any given mechanic but still retains the feel) - just not a Civ you've ever played before.

What feels so incredibly fresh about Humankind is the way you can actually have a bit of friction with other Civs without it becoming a diplomatic nightmare or some endless war that builds and explodes in the later game. Endless Legend modelled this as earlier encounters being flagged as in the age of skirmishes - when border conflicts were normal but not formally war between nations - before you developed formal relations technologies. The use of outposts to cheaply control land, with only what armies you can muster to enforce a demand for a border there, does much the same here. The casus belli mechanics here also feel a lot more like a Paradox title (in fact, I'd love an expansion that moved even further into Paradox war mechanics with preparing frontlines and supply lines when formal war does break out) than how it worked in Civ 6. You also have a pre-city era where you scout the map and hunt wild animals for resources before you establish yourself, so everyone starts out with far more local awareness than a Civ race to found.

But what I've also done here is buried the lede: each era (which you race towards via getting stars in various categories rather than just racing on scientific research) you get to pick a new civ which combines an eternal perk that lasts into the future with era-specific perks that make it a lot like a modern Civ game but you're not trapped chasing the win condition based on the perks you picked before the game begins. If you need to get a boost to science or your economy or are about to start a large war then you can tailor your perks to that, assuming your pick isn't already taken by someone else getting to the era first and taking them. There are also some catch-up mechanics to give slower players a bit of assistance despite having their choices limited, although I hear this works less well in an online game against other actual players (I primarily play my Civ games solitaire). This is a feature I will find it hard to go back to playing without in other games. But with Humankind and Old World showing there is more to Civ-likes than just Firaxis games, this sub-genre feels healthier than ever before.


    Time Loop of the Year:

The Forgotten City

A proven idea prototyped out as a finished game mod (here for Skyrim) then supported via various loans and grants for the arts so it can become a full commercial release (here moving to Unreal Engine)? This is a great example of where things are working the way we say they should from when it used to be a bit more common to pay attention to mods (and the free SDKs that enabled them were more prominent in AAA gaming). It may have taken five years to make but this small team have crafted something that feels both entirely itself and also the sort of thing you'd make if you prototyped everything inside Skyrim, right down to how the conversation camera zooms in to show who you're talking to (without worrying about the budget for a cinematic conversation camera system).


You are a modern day traveller who has awoken on the banks of a river as a mysterious figure asks you to explore the surroundings and find a missing stranger who just recently walked past. In very short order, you are flung back to a tiny Roman civilization trapped in an underground city and cursed by the Gods to ask what is the nature of sin, for breaking the rules will unleash an armageddon destroying everyone there. But the current leader has done a side deal with one of the deities and when that final day comes, they can rewind time back to the previous morning. This is where you come in: you must talk to everyone here, find out who is going to break the rules, and stop them. If you fail, run for the portal that brought you in and the day will restart.

What follows is an engaging adventure game that's almost entirely about talking to characters, working out what's actually going on, and making sure everyone does what you want them to on this loop or making sure next loop you have in your pocket the thing you're currently missing (theft is definitely a sin here, but what's one more iteration of the loop if you really need to pickpocket that trinket and know you can make it back to the portal easily from here). Where does the writing land? If this had come out when I was an undergrad, it would probably have topped my GotY list. There is also plenty of smart game design around making sure you rarely have to do things multiple times, because someone will give you a hand if you tell them what you did last time so they can effectively advance the loop how you need it without repetition. There is also about an hour of combat through what I found a very interesting side story, which is entirely optional and warns you that it can get a bit scary and involves lots of aiming (presumably finding that some people found the mod was otherwise fun without this chunk that mixes up the gameplay significantly). Everything here feels polished by having tried it all before during the mod prototype.

    Halo of the Year:

Outer Wilds: Echoes of the Eye

My game of the year 2019 finally got some DLC that expands that solar system with another celestial body while explaining another facet of the history of that mysterious universe? Obviously, sign me up for another trip to Timber Hearth.


What you get, after discovering how something you could never see before has been hiding in plain sight, is a miniature halo world which, like the rest of the game, runs on its own clockwork timer as you count down to the supernova. If you've played the main game then you know roughly what to expect with lots of discovery and only the computer in your ship tracking what you've found between loops. What you don't expect, unless you've seen the trailer for this DLC, is the significantly more immediate horror elements (rather than the pure existential dread of spelunking between the ruins of a long-dead race in the original) that creep into this world and then creep up on you as you explore into the second new area. I will keep it vague to avoid spoiling any of the major reveals because this game and DLC is all about discovery but I will say that the remixed and expanded mechanics on offer here keep things fresh and I couldn't wait to find out where the story of this particular halo was going. My only complaint is that this DLC integrates fully into the existing game without giving a real narrative close to things with a bespoke ending (how the story concludes didn't have the finality I wanted and the small addition to the main ending doesn't get me to where I wanted to get). Given the main game provides several early endings that provided alternative firm closes to various avenues, I was surprised to not see one more ending available here (unless I just missed the signs and couldn't work out how to unlock it).

    Survivor of the Year:

Subnautica: Below Zero

I came to this series a bit late (given it had an extensive Early Access period 2014-2018) and thoroughly enjoyed my time descending into the depths with Subnautica. Combining a single designed underwater world (no procgen or other randomisation more typical of this crafting survival sub-genre) with light narrative hooks (hints of Alien, but not aping that aesthetic at all) and a good mechanical progression through the various survival and habitat building paths, you get just enough discovery and crafted events in the original to feel a bit special. Explore the world, work out what's going on, then work out how you're going to solve it. Build then upgrade the submersible to enable you to travel deeper, which will unlock various parts of the story and access to resources you need to build the next tier of upgrades or a new thing. By the time they stopped patching it, everything was a bit slick and the modding community had even given you a few more options (from new crafting options to an entire map system that filled in the areas you'd travelled to rather than relying on just the navigational beacons and compass).


Below Zero is the standalone expansion that takes place on an entirely new area with significantly more above-water area and a lot more story. There are actual characters you will encounter and talk with along with a lot more story to discover about the planet the previous character was stranded on and what happened both before anyone arrived and after they left. A lot of the tech tree is back and lightly tweaked with new recipes to match the different resources available in this area. The visuals push things up to show off more land, demonstrate the colder climate and more varied weather, and make sure the different biomes always look their best. While billed as not a full sequel, I took about 20 hours to play through Below Zero (about the same as the original game) and cannot imagine playing an eventual sequel without knowing the story of this (which seems far more like the setup for that sequel, especially given what this one says about what happened after the original game). I will say that I did not hugely care for the new vehicles introduced in this but then my old Prawn was something I'd mastered in the original and took me through this handily. There is just enough of an edge of the horror part of survival horror (which the crafting survival sub-genre is linked to) in this series to keep you on your toes, although it is definitely not a combat game and most of it can be played at quite a leisurely pace, full of vistas that generate awe with only brief interludes of panicked fleeing. Putting this game right next to a section on Outer Wilds definitely gets my synapses firing, even if I'd not say they exactly shared a genre (but the Immersive Sim DNA is evident in both).

    Sequel of the Year:

Psychonauts 2

In the years since Psychonauts released in 2005, the abandoned result of a Microsoft publishing deal gone south, lots of people have discovered and loved it. I'm about 60% with them. The other 40% is that this was never a game that was good to play, even before we'd totally standardised joypad action-platforming controls, and something about the actual technical art is just supremely disappointing. You can kinda see what they wanted to do but couldn't, be that from budgetary restraints, trying to get a project to the finish line after being dropped by the publisher, or just tech limitations with the studio dev pipeline at the time.


In a supremely good turn of fortune, the sequel that was originally crowdfunded but later finished off with the backing of now-studio-owner Microsoft (funny how the circle turns) looks exactly how you'd want it to in 2021. What's more, it generally plays well too. I've never been a huge fan of the 3D platformer and related genres but sometimes the quality of the writing and interesting varied gameplay and visuals will keep me going and this is the perfect example of having more than enough ideas to sustain the duration of the game then also sticking the landing with actually implementing those ideas into something that plays well. No 60% agree on this one - it's a 100% banger!

We are getting to the place where "of course it looks like an animated movie" is starting to seep into our expectations but that shouldn't take away from how well Psychonauts 2 manages to capture some of the style behind the first game while actually making it look really good. Unreal Engine is once again doing a very solid job rendering everything in pop-out vibrant colours that the artists aimed for while also giving a cohesive set of effects to ground the various elements no matter how fantastical their inspiration. Even looking at it from a purely technical perspective, it's doing most everything right. Wrap it up in the narrative chops the studio is well known for (reaching into the minds of the various characters presented), shake with a few catchy musical numbers, and serve.

    RTS of the Year:

Age of Empires IV

Ensemble Studios made Age of Empires II in 1999. Over 20 years later, made by Relic Entertainment, this is effectively Age of Empires II-2. Everything you remember (possibly refreshed by the recent remaster of that earlier game) is basically here with a few new spins and a lot of new flourishes around the edge. The several faction campaigns (which I hope will be expanded over time with campaigns for the remaining factions and even some new factions added to the game) draw you through some lavishly produced documentary videos explaining the history of the time and then dump you into a short scenario map that allows you to have a bit of fun on something that approximates what the documentary was talking about, including additional VO narrating the events as they occur in the scenario itself. It's a good binding layer that ties the RTS together while also offering little bites of historical facts about the actual events from primarily Northern Eurasian history.


We do not get many classic RTSs and this is as classic as you can possibly get. It feels like how you remember those old games looking and playing, while actually bringing them into the modern era (unfortunately this means the actual game looks a lot more like a nicer version of a 20 year old classic rather than the initial menu and interstitial loading screens, which have an amazing gold-lined style that could look really good if implemented into the game itself). You also get a bit of variation with the faction designs moving forward and offering something slightly different when it comes to things like a nomadic faction who can pack up every building and who are meant to slowly deplete rock outcrops rather than rapidly mining them with many workers. It never feels like you'd be better off just going back to the AoE2 remake from a few years ago.

AoE4 also ensures that new players who are brought into this series via the ease of GamePass will not feel completely lost. I've been obsessively playing RTSs since Dune 2 in 1992 or maybe even Mega-Lo-Mania in 1991 (depending where you call the origin of the current RTS design) so some of the tutorials are not really something I can fully judge. But giving them a quick look over, and how the first couple of faction campaigns operate as elaborate tutorials for most of the core mechanics shared by all factions, it all seems like the sort of onboarding that will ensure someone isn't lost. If anyone wants to take it further, the game quickly highlights online multiplayer and practicing for that via AI skirmishes. But even if you just go through the campaigns, this isn't light on content.


    Rest of the Year:
Forza Horizon 5 - The sequel to Forza Horizon 4 is exactly what you expect the sequel to that game to be. Some of the visuals are definitely a step up thanks to being a cross-gen title for a new console generation but the underlying engine is still having increasing issues keeping foliage shimmer and other sources of aliasing under control when it only has MSAA to work with (and forcing transparency MSAA in the drivers on PC is way too expensive to be a realistic option on my current system). Imagine what this would look like with a DLSS/XeSS patch on PC to clean everything up while also reducing the internal resolution so that it can run at even higher framerates consistently. They also clearly tweaked AI difficulty somewhat, especially in dirt events, and it seems even more "win by a mile or never even have a chance because you start at the back of the pack" than before (all assists off, top difficulty) so maybe that could do with a few more iterations (before the expiring car licenses permanently delist this and any accumulated content updates or DLC in a few years).

Outriders - Some have called this a B game or ripped out of the PS360 generation but with a large online endgame, modern look, and responsive gameplay, this felt completely current to me. The disconnect may simply be that the narrative doesn't aim for the Sony house style or where the Call of Duty crowd has ended up. But B game is certainly not an accurate assessment of the assets on show, which are right up there with other AAA releases from big publishers (everything about this is a step up from when this team made a Gears of War title). From a technical perspective, I do not understand calling this "budget" while remaining absolutely silent on how Metroid Dread is priced as a AAA game but competes almost entirely with $20 indie titles (which look no more constrained by budget). Outriders is a good loot shooter with plenty of optional missions to give you reasons to return to the quite linear path through their interesting world. My main gripe is the tone never quite settles down and this comes to a crescendo near the end when they choose to place some combat encounters in an abandoned concentration camp (which would have been a lot more effective as a hauntingly silent walk).

The Riftbreaker - Part base builder, part top-down action shooter. This is one of those little hits that bubbles out of nowhere and possibly will not be remembered by that many in a few years but was fun while it lasted. The use of persistent bases that you move between, with different environmental hazards in each region and a slowly explored tech tree, creates a good campaign flow that feels unlike a traditional RTS but also not just a tower defence level-based game. Layer the (chatty - with generally enjoyable VO) mech suit you pilot on top as a super-unit in a game otherwise devoid of controllable units while infested by a lot of hostile critters and waves of attacks - it's both quite frantic and something where you feel you can usually come back as long as you put your attention in the right place. We haven't had a lot of RTS games, classic or slightly weird variants, for a while so this was something to savour.

Sable - This really wowed quite a few people but I have to say it didn't hit me nearly as hard. The visuals reflect their influence well but something about the aliased edges really rubbed me the wrong way as a method of digitising the original art style. To the point I injected FXAA and fixed it for myself (something the sparse graphics options do not offer). It's very much an open world game that's more about the act of traversal (both climbing and by customisable ship) than many of the actual fetch quests you get during your travels. What the story said about finding your place in the world: that bit was a big miss for me.

Exo One - This is just an incredibly visually lush experience. You control a probe, built to specifications beamed into the solar system, and have the power to increase your local gravity tenfold or reshape into a disc that glides on the breeze. Travel through several planetoids as you try to make sense of what happened after taking control of the probe but really this is a game about vibes and enjoying the act of traversal.

Next Space Rebels - I didn't see this bothering too many people's lists but I did want to just make a note of it because Kerbal Space Program but with a full narrative wrapper (around YouTube toy rocket stars and dark shadowy internet land-grabs, all done with FMV) isn't something you see every month. The 2D rocket designer never quite matched the precision of KSP and neither did the actual flight controls but at least people are trying to make their own spin on the formula. More of this sort of thing.

Myst (2021) - Name another game that has, for a single game world - so direct remasters/rebuilds only not stealth sequels or other offshoots - been rendered both by offline render (1993) + real-time (realMyst onward) during different iterations and has used their own internal 3D engine (realMyst using Plasma), Unity (realMyst: Masterpiece Edition), and Unreal Engine (2021). This 2021 rebuild of the classic 1993 game (based on the work done last year for a VR port) completely remakes everything once more and clearly eclipses the original offline renders in every single way. Is it the best adventure game for modern tastes? Not really but if you half-remember most of the puzzles and haven't played one of these 3D remakes in almost twenty years then it's quite fun to go back.

Twelve Minutes - I didn't hate this nearly as much as the eventual critical consensus but I also went in after the discourse had said it doesn't stick the landing so buyer (and pre-release hype believer) beware. I quite liked the VO performances, felt the eventual plot twist was gratuitous but no worse than what many reach for looking for shock value, and enjoyed working out the path through the loop - even if I possibly didn't find them all. (Why was Dafoe playing two different roles in an identical voice? Was that ever explained?)

Unpacking - This hit a lot of people's lists but was a bit too slight for me to rank it in my top games. A short sweet tale of environmental storytelling you can finish off in a single sitting.


    Waiting for a PS5 or New GPU:
Everything new in VR - The Valve Index deserves it; Scarlet Nexus - Something about how the PC port runs isn't quite right but hopefully a patch, mods, or brute force GPU power can fix it; The Medium - Beautiful survival horror slash adventure game? Can't wait; Marvel's Guardians of the Galaxy - I didn't mind the story part of "the bad" Eidos Marvel game last year so looking forward to a universally liked one; Ratchet & Clank: Rift Apart - PS5 exclusive is PS5 exclusive; Returnal - Run-based action shooter that dials the particles up to 11? Ok; Halo Infinite - A year of post-release patches and a new GPU should make this sing (please add DLSS/XeSS because the current TAA upscale is… definitely something); Deathloop - DXR visuals are worth waiting for, along with the online being reinvigorated by what I expect will be a big Xbox and GamePass release in a year; Resident Evil Village - RE7 was very good in VR and on PC but given the lack of the former here, I'm sat waiting for them to fully fix that completely broken PC port; Kena: Bridge of Spirits - Lovely animation and a consistent art design (worth playing looking its best); The Ascent - Top-down ARPG fun but a bit too heavy for my current GPU.

    To Play Next Year:
Even in another quiet year, some titles I just didn't find time to play. Luckily they will still be there next year, along with everything else in my backlog. The Artful Escape, Shin Megami Tensei V, Far Cry 6, Tales of Arise, Inscryption, Life Is Strange: True Colors, Dark Pictures Anthology: House of Ashes, Hitman 3, Lost Judgment, NEO: The World Ends with You, The Gunk, Oddworld: Soulstorm, Biomutant, Recompile.

Monday 30 August 2021

Intel XeSS: Joining nVidia in Tensor-Accelerated TAAU

Back in May I wrote about the evolution of per-pixel rendering costs, expecting the imminent announcement of AMD's next generation temporal upscaling technique, offering a competitor to DLSS 2.x that would run on hardware from multiple vendors and even provide a fully open source option to inspect or even improve upon (if it wasn't a perfect match to one vendor's underlying hardware) the offered technology. That ended up not happening and FidelityFX Super Resolution, while an interesting alternative to more basic spatial upsampling, didn't quite match my hopes for some real competition to nVidia's RTX-only DLSS.

I had started a draft post on implementing FidelityFX Super Resolution into your own engine but really, I'm not sure how much it adds. If you want to run a more expensive upscale to retain far more sharpness than bilinear (so not something where you're going to be also doing a blur afterwards) or you're already doing an expensive sharpening pass like FidelityFX CAS after an (optional) upscale pass then you absolutely should drop in FidelityFX Super Resolution any place where you'd otherwise be thinking about the value of Lanczos (because that's roughly what it is). As others have noted by now, this is already a choice players make because modern GPUs (when not doing upscale on the output monitor) implement this when setting the internal resolution lower than the output/native resolution of your system - I've often been quite happy running AAA titles at 1800p for a 4K screen (as long as the anti-aliasing was good) and FSR is an enhancement on that path (increasing quality with the option to composite the UI and any pixel-scale noise, like a film grain effect, at native res after the upscale).
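For anyone who hasn't looked at Lanczos for a while, here is a minimal 1D sketch of that style of resampling. This is plain Lanczos-2 with clamped edges, not the edge-adaptive shader FSR actually ships, and the function names (`lanczos`, `resample_1d`) are just ones I've made up for illustration.

```python
import math


def lanczos(x, a=2):
    """Lanczos window of radius `a` evaluated at distance x (in input samples)."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resample_1d(samples, out_len, a=2):
    """Resample a 1D signal to out_len samples using a Lanczos-a filter."""
    in_len = len(samples)
    scale = in_len / out_len
    out = []
    for i in range(out_len):
        # Centre of output sample i expressed in input sample coordinates.
        centre = (i + 0.5) * scale - 0.5
        first = math.floor(centre) - a + 1
        total = 0.0
        weight_sum = 0.0
        for j in range(first, first + 2 * a):
            w = lanczos(centre - j, a)
            k = min(max(j, 0), in_len - 1)  # clamp reads at the edges
            total += w * samples[k]
            weight_sum += w
        # Normalise so a flat input stays flat.
        out.append(total / weight_sum)
    return out


if __name__ == "__main__":
    low_res = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
    print(resample_1d(low_res, 12))
```

The negative lobes of the kernel are what keep edges sharper than bilinear, at the cost of a little ringing either side of a hard step.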

Intel XeSS

What has recently re-energised my interest in upscaling techniques is the Intel Architecture Day announcement of XeSS. A next generation temporal upscaling technique, offering a competitor to DLSS 2.x that will run on hardware from multiple vendors and even provide a fully open source option (at some as yet unknown future date). So I had vaguely the right timescale for an announcement but had bet on the wrong non-nVidia GPU company making it.

Slides: XeSS outlined; DLSS 2.x outlined

We do not have full access to XeSS so for now we only have a rough roadmap of releases starting with the initial SDK for use with their Arc series of GPUs (hardware that will not become available until early 2022). The design of the new Arc (Xe-HPG) series goes hard on matrix (Tensor) accelerators and so it is a natural fit to offer something broadly comparable to DLSS, which is accelerated by these AI/Tensor cores. Intel is actually investing even more of their GPU into matrix acceleration than nVidia, so expect a major push to ensure software supports XeSS rather than leaving that silicon idle when running the latest AAA releases.

From the outline we have been provided by Intel, it is easy to see that beyond the similar hardware being tapped to run deep learning algorithms, the inputs are also very similar to nVidia's DLSS 2.x. We have a jittered low resolution input frame along with motion vectors noting the velocity of each pixel and a history buffer of previous frames from which to extract information (which, even when showing a totally static scene, provides additional information thanks to the moving jitter pattern). The only additional information nVidia are explicit about collecting with their API is an exposure value (although the current SDK, 2.2.1, has added an auto-exposure function since these nVidia slides were published) and the depth buffer (which Intel may implicitly include as part of the complete input frame).
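To make the jitter point concrete, here is a small sketch of generating per-frame sub-pixel offsets. I'm assuming a Halton (2,3) sequence, which is a common choice for temporal anti-aliasing jitter; neither set of slides above tells us what pattern XeSS expects, and `halton` and `jitter_offsets` are my own illustrative names.

```python
def halton(index, base):
    """Return element `index` (1-based) of the Halton sequence for `base`."""
    result = 0.0
    fraction = 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def jitter_offsets(count):
    """Sub-pixel (x, y) offsets in [-0.5, 0.5) for `count` consecutive frames."""
    return [(halton(i, 2) - 0.5, halton(i, 3) - 0.5) for i in range(1, count + 1)]


if __name__ == "__main__":
    # Each frame samples the scene at a different point inside the pixel, so a
    # static scene still yields new information as the history accumulates.
    for frame, (dx, dy) in enumerate(jitter_offsets(8)):
        print(f"frame {frame}: offset x={dx:+.3f} y={dy:+.3f}")
```

The offsets would be applied to the projection matrix each frame and passed to the upscaler alongside the motion vectors, so it knows where inside the pixel each low resolution sample was taken.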

Intel in comments to the press have discussed the possibility of the industry converging to a common standard for DL upscaling APIs, allowing almost drop-in dll swaps to make it trivial to support various alternatives. The way this is talked about as a future development means it is unlikely that the initial release of XeSS will be a drop-in dll replacement for DLSS 2.x (using identically named functions/entry-points and settings ranges). Although it remains to be seen how difficult it may be for ingenious hackers to work out how to bridge the differences and allow current DLSS titles to run a bootleg XeSS mode under the hood in the future (of course, not condoned by Intel itself).

DLSS time savings
XeSS time savings
DLSS savings scaling

This brings us to a major point of differentiation (vs nVidia) and something very exciting to various users stuck with our current supply-constrained GPU market (which will not improve sufficiently to allow everyone to upgrade to an RTX card even by late next year): XeSS will provide a fallback mode that runs (albeit somewhat slower) on GPUs without hardware (XMX) matrix acceleration. There are AI acceleration instructions for Int8 operations (DP4a), added by nVidia for Pascal (Series 10), AMD for Vega, and Intel for Xe-LP on Tiger/Rocket Lake (11th Gen Core processors), that can provide quadruple the throughput for dot products on packed Int8 values in comparison to 32-bit operations - this is effectively a mid-ground between trying to run AI workloads as generic shaders and getting the full acceleration of dedicated Tensor units.
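For anyone who hasn't met DP4a before, this is roughly what a single instruction computes: a dot product of four packed signed Int8 lanes, added to a 32-bit accumulator. The sketch below is my own Python emulation for illustration only (the helper names are made up and it does not model 32-bit overflow wrapping); it is not code from any vendor SDK.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one 32-bit word (little-endian)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def dp4a(a_packed: int, b_packed: int, accumulator: int) -> int:
    """Emulate signed DP4a: accumulator + dot product of the four Int8 lanes."""
    a = struct.unpack("<4b", struct.pack("<I", a_packed))
    b = struct.unpack("<4b", struct.pack("<I", b_packed))
    # Hardware does these four multiplies and the adds in one instruction.
    return accumulator + sum(x * y for x, y in zip(a, b))


a = pack_int8x4([1, -2, 3, -4])
b = pack_int8x4([5, 6, -7, 8])
result = dp4a(a, b, 10)
print(result)  # 10 + (5 - 12 - 21 - 32) = -50
```

Four multiply-accumulates per 32-bit register is where the "quadruple the throughput" figure comes from, assuming the network has been quantised to Int8 in the first place.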

With Intel so invested in matrix acceleration, it becomes more evident that AMD are being left behind - even mobile chips ship with limited amounts of this form of hardware acceleration (as I noted in 2019) - so this fallback is providing a vital half-step (which should more than pay for itself with the reduction in rendering cost of a lower resolution input image with no need for antialiasing). This also applies to the current consoles, which notably didn't get left behind on ray tracing acceleration but are starting to look down a long generational window without hardware matrix acceleration. The Xbox Series of consoles offers something equivalent to DP4a via DirectML (and Microsoft have said they are working on their own DL upscaling technique for use on those consoles in the future) but we don't yet know if Sony have an answer for the PS5.

In interviews it sounds like Intel are, at least initially, reserving the XMX path for their own Arc GPUs (despite nVidia RTX cards having equivalent matrix acceleration) so it will be a case of DLSS only on RTX going up against XeSS XMX (fast) only on Arc and XeSS DP4a (slower) everywhere else. You could read the answers as being open to others coming in and dropping in their own engine (say an nVidia Tensor engine rather than being forced down the fallback codepath on DP4a), though maybe not before Intel releases the full source code (for which a timeframe is not provided). In that DF interview there is also the suggestion of potential future developments where laptops do the main rendering on a dGPU then hand it off to the iGPU, where it has Intel matrix accelerators to run the final stages (XeSS upsample, composite UI etc). Given that current laptops with a discrete GPU already pass the completed 3D render to the iGPU to output via direct connection to the screen, this would only be an incremental step forward (rather than completely reinventing the path a frame takes today).

One can even imagine, looking at the announced AVX-VNNI instructions for consumer CPUs and AMX instructions for server CPUs, a future where those people working on interesting software renderers could stay entirely on the CPU while taking advantage of DL upscaling, assuming there was enough throughput that was power efficient enough to provide a worthwhile wow factor. Real-time software renderers are not competitive with modern GPU-accelerated renderers (an embarrassingly parallel problem on hardware designed around accelerating just that) but they are still an interesting hobby niche that may enjoy playing with this new area of technology.

Non-DL-based clamping limitations
DL-based denoising limitations

Going back to a more broad discussion, the reason for this excitement around DL upscaling (as I hopefully outlined in my previous post) is that it avoids the poor TAA performance of rejecting or clamping values from the history buffer, which has evident detail loss or failure states around higher frequency information (as nVidia have made clear in their talks on this topic). When the buffer can be fully utilised, a well managed jittered history can reconstruct a lot of detail for any element that has already been onscreen for a couple of frames (with anything that hasn't been onscreen liable to be masked behind a motion blur) despite using an internal resolution significantly below native output. Direct competition between two different implementations should provide even more impetus for advancement in this area. We are only scratching the surface of what deep learning algorithms can do to enhance our current rendering techniques.
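The clamping failure is easier to see in a toy example. This is a minimal single-channel sketch of the classic neighbourhood clamp used by many non-DL TAA resolves: the 3x3 window, the 10% blend weight, and the function name are my assumptions rather than any particular engine's implementation.

```python
def resolve_pixel(current, history, x, y, blend=0.1):
    """Clamp the history sample to the 3x3 range of the current frame, then blend."""
    height, width = len(current), len(current[0])
    neighbours = [
        current[ny][nx]
        for ny in range(max(0, y - 1), min(height, y + 2))
        for nx in range(max(0, x - 1), min(width, x + 2))
    ]
    lo, hi = min(neighbours), max(neighbours)
    clamped = min(max(history[y][x], lo), hi)
    return blend * current[y][x] + (1.0 - blend) * clamped


# A thin bright detail (0.9) survives only in the history; this frame's jittered
# samples missed it, so the whole 3x3 neighbourhood reads 0.2.
current = [[0.2, 0.2, 0.2] for _ in range(3)]
history = [[0.2, 0.2, 0.2], [0.2, 0.9, 0.2], [0.2, 0.2, 0.2]]
lost = resolve_pixel(current, history, 1, 1)

# When the history value sits inside the neighbourhood range it is kept.
busy = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
kept = resolve_pixel(busy, [[0.5] * 3 for _ in range(3)], 1, 1)

print(lost, kept)
```

The clamp cannot tell a stale ghost from genuine sub-pixel detail that this frame's jitter happened to miss, so the first case throws away exactly the high frequency information the history was accumulated to recover.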

Of course, there are some problems that nVidia have considered potentially intractable, such as the many types of noise that their DLSS 2.x approach cannot deal with (as it cannot provide a generalised solution that accounts for all noise types) and so, if it cannot be avoided, must be denoised before DLSS is applied. This is something that can force a traditional TAA stage (at a non trivial rendering and memory cost) back into engines that would otherwise be able to drop it entirely; the ultimate goal being only relying on the antialiasing of DLSS to provide exceptional final results. Intel offers a second set of engineers looking at such problems who may have fresh insights into what is possible. Microsoft are working on their own Xbox DL upscaling. There are signs Sony are up to something too. While AMD did not announce their plans in this area with the recent announcement of FSR, I am still convinced that the future of AMD GPUs will involve Tensor units and that they will justify that use of transistors with a DLSS-a-like - but we will maybe be waiting for RDNA3 in late 2022 before we get that piece of the puzzle. For now, Intel are in the spotlight and anyone with a vaguely recent GPU (even the most recent iGPUs) is being invited to come along.

Wednesday 30 June 2021

An Initial Inspection of FidelityFX Super Resolution

As I noted in an addendum to last month's post, I really expected AMD to announce that their new upscaling technology (which supplements FidelityFX Contrast Adaptive Sharpening + Upscale) would use temporal accumulation to compete with upcoming technologies like Unreal Engine 5's Temporal Super Resolution. It seemed like the obvious pivot after a couple of years of offering CAS, with their previous tech advertised as "designed to help increase the quality of existing Temporal Anti-Aliasing (TAA) solutions". AMD already have a branded option for tweaking and upscaling already-anti-aliased image buffers so to respond to nVidia's DLSS (offering close to or even beyond anti-aliased native res rendering quality at lower GPU loads due to upscaling significantly lower res aliased internal frames) the natural step would be integrating anti-aliasing, upscaling, and sharpening - something likely best achieved using a temporal buffer, to go significantly beyond the limits of previous spatial-only techniques.

Last month I linked to a few examples of where enthusiastic sharpening can have a quite poor effect on image quality (from effectively wiping out anti-aliasing to classic halo artefacts that any digital photographer well knows from trying to recover additional detail with a careful manual tweaking of Lightroom settings). This has generally limited my desire for CAS in any game where it has been offered (or turning on nVidia Image Sharpening) - when the effect strength is configurable then I'll generally apply it so lightly as to not be worth any performance cost; when I'm not able to tweak strength then it usually seems too much and I've seen some issues during combined upscaling (which do not seem inherent to the tech but an implementation failure that still managed to ship, although I did say at the time "the tech should be rebranded if fixed to work well in the future"). What we have from the new FidelityFX Super Resolution is something that could be considered CAS-Plus - it's the latest version of CAS (with what seems like a less aggressive default strength, still configurable either by the developer or passed on to a user option) along with a more involved integrated upscaler than the old implementation, one that promises to enable much higher upscaling factors without major quality loss.


Although FSR is not yet fully 1.0 and public, what we have already received is, like CAS, purely an upscaling and sharpening solution (with instructions that make it sound like this will not change) so it expects the game to have already applied anti-aliasing. We will be able to poke it in more detail soon ("The source code for FidelityFX Super Resolution 1.0 will be coming to GPUOpen in mid July") but with some games shipping implementations last week, we can give the output a first examination using our version 1.0 eyeballs. My expectations were tempered from not being blown away by CAS before and wondering how the spatial-only upscaling would deal with any aliasing, but it's pretty clear that AMD would not open-source a simple rebranding exercise so this was going to be at least a completely new generation of the ideas originally proposed via CAS and so worthy of examining on their merits rather than previous experiences.

I am actually ideally situated to take advantage of FSR, being one of the many many people (according to May's Steam survey) who has not made the jump from a GTX card to an RTX upgrade or AMD alternative (even if DLSS was offered for any of the titles currently shipping with FSR support). With shortages leading to terrible availability and ridiculous prices when there is any stock, many of us would likely have upgraded by now (this GTX 1070 shipping note is over five years old) and just need a bit more longevity to wait out supply catching up with demand. Unlike most of the other people on a Series 10 GPU, I am trying to drive a (desk-mounted, not living room) 49" 4K panel which benefits from both quality anti-aliasing and as many pixels as possible.

This blog has always been written with an intended audience of indie teams and enthusiastic amateurs with an interest in rendering; me and a few thousand visitors. Unfortunately the commentary around FSR's launch has seemed a bit toxic and divisive (especially questioning some press analysis). While occasionally forthright, I hope readers understand the aim here is to evaluate, give context with how things fit into the wider rendering landscape, and to make an occasional light-hearted jab at shipping flaws from the perspective of people who have & will continue to see that stuff in our own work because rendering is difficult (big publisher funded or not) with some hard choices being mutually exclusive.

The questions about FSR can broadly be split into two: how does this new generation of sharpening with an integrated upscaler compare in performance cost & quality to the basic fallback upscaler in the games that integrate it; and how does the combination of existing anti-aliasing solutions with FSR applied broadly hold up when other games are shipping with temporal anti-aliasing upscaling solutions either integrated into various game engines or via AI acceleration from nVidia (previously discussed last month)? But ultimately it can all somewhat collapse down to: how can developers offer the best subjective quality (be that headroom to guarantee perfect frame pacing, less flickering aliasing, or just a more pleasing or detailed final scene) on every hardware platform?

Dota 2, FSR 50%
The Riftbreaker, FSR Bal (59%)
The Riftbreaker, CAS 75%

Example Implementations

Everyone appears to have used Godfall as their primary example due to a recent marketing push combined with that being a relatively "next gen" game using some of the latest ray tracing effects available under UE4 - it's well-covered by a wealth of existing analysis (inner surfaces, sharply textured and somewhat noisy in the native presentation, get progressively blurry while edge detail can hold up but sometimes makes the underlying lower resolution apparent via stair-step artefacts; clearly beats basic upscaling at like for like framerates). I'm going to poke at two free titles (F2P or in open beta) both using slightly more bespoke rendering pipelines. Dota 2 currently uses the Source 2 engine but I'm not sure if the MLAA it uses has been much updated for years & years while The Riftbreaker uses a custom engine that just moved to a TAA solution they liked so much they completely removed the previous MLAA-optional "raw" rendering choice but, just like the stock configuration of Godfall, this does not offer an integrated upscaler with that TAA - when you use the basic upscaler it does not use the additional information from a jittered sample location in the frame history buffer to more precisely reconstruct the final high resolution image, rather it does a TAA resolve to whatever internal res you specify then upscales that as a spatial-only step likely using a cheap bilinear resample. Both games have internal framerate overlays (baking the numbers into screenshots) and offer a common "camera in the sky" not-truly-isometric perspective while using very different AA techniques as a point of contrast.

I have uploaded all the png files (to a service that may use compressed jpeg previews for the web viewer but allows you to easily download the genuine bit-identical files), including every 4K capture used for crops. These act as visual aids to the wider points I noted while the games were in motion and I recommend anyone wanting more than this summary, throw up a Dota 2 replay or check out the Prologue for The Riftbreaker to see it running on your own hardware. Accept no (highly compressed video) substitute; everyone ranks fine visual details in subtly different ways.

100% top, FSR 50% bottom
from TL: 70-80-90%; FSR70-80%, 100%
100% top, FSR 50% bottom

Dota 2 offers a simple toggle between FSR and a basic upscaler when the internal rendering resolution is scaled (40-99%) down from (100%) native. There is no option to tweak the sharpness applied and what becomes immediately apparent (centre image above) is that the sharpness Valve has chosen is significantly stronger than other implementations (where FSR is noted as softening flat textured surfaces compared to 100% resolution). Here, the large flat ground of the Dota map leaps off the screen, with 70% (image top right) and 80% scale FSR (centre right) offering almost equal perceived texture detail due to an aggressive sharpen that makes much of the very low contrast textures pop more than their native resolution presentation. The basic upscaler (image left) shows how linearly interpolating between the fewer samples into the underlying texture due to the lower internal resolution applies a blur that smears what soft detail there is available at 100% so that even 90% scale (image bottom left) is washed out. Moving to the leftmost image just above, even scaling FSR down to 50% (that is only using a 1080p internal resolution and no temporal reconstruction of any sort in this FXAA title) we see an impressive retention of perceived texture detail; even zoomed up to 200% (quad pixels to retain original sharpness - this is the only image used that is not at original output pixel-scale) it only just makes clear the sharpening artefacts and some lack of genuine detail from the 100% resolution original that rendered four times as many pixels. The grass texture detail and the dappling on the path in the top render are now more clearly absent in the bottom render and objects like the yellow flowers gain telltale dark halos while the transparent texturing of the tree leaves is clearly losing its clean edge.

I applied some generic (non-AMD branded) image sharpening to some of the unsharpened sub-native resolution captures and a lot of this texture detail can absolutely be recovered by any basic competent algorithm so I would avoid calling the CAS a secret sauce but it is at least doing the job required of it (working against the softening of using a lower internal resolution) well enough without a major performance cost. I also pushed the mip bias values way out and took a few screenshots of that, which captures how FSR compares to native resolution on edge detail retention when all the inner texture detail is blurred away with much smaller mipmaps. Some of the fine edge detail is starting to visibly break down at FSR 75% but lots of the wider edges are being extremely well retained, if rather darkened like a pencil was sketching over the edges, as long as the AA pass caught them. The strong sharpening is starting to grasp for detail not there, so causing mild posterisation in spots. The increased shadow/AO evident may be a side effect of the internal resolution being lowered (or could be an interaction with the mip bias tweaking).
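For reference, the sort of "basic competent algorithm" I mean is nothing more exotic than an unsharp mask. The 1D sketch below is a generic unsharp mask and explicitly not AMD's CAS (which adapts its strength to local contrast precisely to rein in this behaviour); the 3-tap box blur and the strength of 1.0 are arbitrary choices for illustration.

```python
def unsharp_1d(signal, amount=1.0):
    """Sharpen by adding back the difference from a 3-tap box blur (edges replicated)."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    blurred = [
        (padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(signal))
    ]
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]


# A soft step from dark (0.2) to bright (0.8), like the rim of a flower on grass.
edge = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]
sharpened = unsharp_1d(edge)
print([round(v, 3) for v in sharpened])  # [0.2, 0.2, 0.0, 1.0, 0.8, 0.8]
```

The dark side of the edge undershoots to 0.0 and the bright side overshoots to 1.0: that pair is the halo. Flat regions are untouched, which is why low contrast texture "pops" while hard edges pick up the pencil-sketch outline.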

When we move to a closer camera in the rightmost image above and more 3D elements that require anti-aliasing, we continue to see this clear softening on edges and evidence of the enlarging and softening of spots where the FXAA has not sufficiently cleaned up an edge in the internal resolution render. In static screenshots, I find the soft edges with sharpened interior detail to often work in favour of this technique, even if it can verge towards a dithered posterisation at points (even with textures left as intended). In motion, it inherits the issues with any MLAA technique in that elements that are unable to be anti-aliased sufficiently flicker enough to draw attention and the soft upscale here ends up drawing added attention to them not entirely unlike a more basic blur applied over the top of aliased edges (in fact, some of these captures catch artefacts very similar to the ones I noted when discussing that original release of No Man's Sky). Dota 2 will never be at the top of my list of rendering greats, and FSR can only do so much with what it is given (as we know it is not designed in any way to provide anti-aliasing itself), but I was pleasantly surprised with how, looking at paused game replays, FSR significantly increased the framerate with only a mild increase in edge shimmer (when in motion) and virtually no softening of inner detail.

Unfortunately, I then looked at the framerate counter as I unpaused from taking screenshots of a frozen moment in time. My initial impression had been that FSR turned my modest GPU (by 2021 standards) into something capable of making a new generation of 4K144 gaming screens sing with this classic title, pushing the final step up from the ~100fps with max settings it was previously limited to (in all three of the 100% captures I cropped and discussed above). FSR 50% was able to hit ~165fps with 70% FSR giving about a 30% boost and 80% FSR a 15% boost with that exceptional image quality. But once my Ryzen 2700X has to process the extra load of running replays, which is more typical of actual gameplay, the GPU utilisation dropped. Not for running 100% scale, which sticks exactly where it was before, but even basic upscaler 80% drops from 150fps to 140fps and, more significantly, 50% FSR loses that 165fps for figures between 120-140fps. Higher internal resolution FSR squeezed in below and so was barely paying for the overhead of the FSR pass over native res. As it affects the basic upscale too, this is clearly something common to not having enough GPU load at lower res or some single-threaded weakness of the older Ryzen CPUs with Dota's workload. It's not a dealbreaker but it's why I haven't embossed the paused-time framerates onto all of these clipped shots (they are all printed onto the original so they're not hidden) to show how much framerate improves as image quality changes. Simply put, in actual motion the gains are not nearly as great as the first impression from static scenes. I hope Valve continue to tweak this implementation (as an e-sport, I'm sure their engine is constantly being tweaked to ensure it can hit those highest refresh rates on select machines) so it can saturate the GPU in motion.

My ideal implementation would allow the user to dial in a desired framerate, with Dota 2 dynamically changing the FSR factor to maintain a constant performance (as many console dynamic resolution implementations do, usually backed by a temporal component). The way FSR is implemented here, with a static percentage chosen and framerates changing based on how much is going on onscreen, seems like it would play best on a VRR/G-Sync display. Unfortunately, as you change the setting in real-time in the menus, the edge shimmer can be seen to "bubble" as the percentage scale changes. Although you can only see around the edge of the settings menu into the game itself, that was enough to make me think that the crawling edges of a dynamic FSR in Dota 2 would not be a good experience, at least unless some temporal solution was used to control the edges reshaping as internal resolutions moved around.
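A minimal sketch of the kind of controller I have in mind follows. It is not how Dota 2 or any console title does it: the function name, the gain, and above all the assumption that GPU frame time scales with pixel count (the square of the per-axis scale) are mine, and that last assumption breaks as soon as you become CPU bound (as happened above). The 40-100% limits are Dota 2's slider range.

```python
def next_scale(scale, frame_ms, target_ms, lo=0.4, hi=1.0, gain=0.5):
    """Nudge the per-axis render scale toward the target frame time."""
    # Assumption: GPU cost is roughly proportional to pixel count, i.e. scale squared.
    ideal = scale * (target_ms / frame_ms) ** 0.5
    # Only move part of the way each frame to avoid oscillating.
    stepped = scale + gain * (ideal - scale)
    return min(hi, max(lo, stepped))


# Toy simulation: 10 ms per frame at native (about 100fps), aiming for 144fps.
native_ms, target_ms = 10.0, 1000.0 / 144.0
scale = 1.0
for _ in range(20):
    frame_ms = native_ms * scale * scale  # hypothetical cost model, not measured
    scale = next_scale(scale, frame_ms, target_ms)
print(f"settled at {scale:.1%} internal resolution")
```

In this idealised model the scale settles at about 83% per axis. A real implementation would also want hysteresis so that the scale is not constantly changing, given how visible the edge "bubbling" is whenever the percentage moves.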

from TL: B-Q-UQ-100%CAS; P-75%-75%CAS-100%
from L: Bal, 75%CAS, Ultra-Qual, 100%

The Riftbreaker uses four named FSR levels AMD have suggested but also offers a basic upscaler you can use in 25% increments that allows for CAS to be enabled - this appears to be visually quite similar to enabling FSR, presumably as the game implements the very latest revision of CAS that is based on the same sharpening pass as FSR uses. Those named levels are: Ultra-Quality (77%), Quality (67%), Balanced (59%), and Performance (50%). I would prefer more granular control (or even fixing a desired framerate and a dynamic internal resolution managed by the engine) but this gives us a few fixed points to focus on and compare to the fallback basic upscaler and even using that upscaler but applying CAS. As mentioned earlier, The Riftbreaker uses TAA but does not use TAAU so using a basic upscaler from 50% will not be able to recover all of the texel information via a jitter (looking back four frames to each pixel in the 4K output from four 1080p internal renders), unlike more advanced temporal solutions. (Four frames at 60fps is a remarkably short span of time so even if you think that motion vectors would need to be very good to recover the sub-pixel jitter texel reading, there are likely to be quite a lot of places where TAAU is basically sampling the same spot so doesn't even need great motion vectors.)
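To put pixel counts on those named levels for a 4K output, here is a small worked calculation. It assumes the per-axis scale factors of 1.3x, 1.5x, 1.7x and 2.0x that AMD publish for FSR 1.0 (which round to the percentages quoted above); the helper name is just for illustration and the game may round its internal buffers slightly differently.

```python
# Assumed per-axis upscale factors for the FSR 1.0 quality modes.
FSR_SCALE_FACTORS = {
    "Ultra-Quality": 1.3,
    "Quality": 1.5,
    "Balanced": 1.7,
    "Performance": 2.0,
}


def internal_resolution(out_w, out_h, factor):
    """Internal render size for a given output size and per-axis upscale factor."""
    return round(out_w / factor), round(out_h / factor)


for mode, factor in FSR_SCALE_FACTORS.items():
    w, h = internal_resolution(3840, 2160, factor)
    per_axis = 100.0 / factor
    pixels = 100.0 * (w * h) / (3840 * 2160)
    print(f"{mode:13s} {w}x{h}  {per_axis:.0f}% per axis, {pixels:.0f}% of the pixels")
```

The per-axis percentage flatters the saving: Performance is 50% per axis but only a quarter of the shaded pixels, and even Ultra-Quality is shading under 60% of what native 4K does.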

This lack of TAAU's recovery of static texture information is quickly apparent when comparing (left image above) the detailed ground texture as the game starts (as our mech basks in the scenery while given orders). The 100% render (bottom right) shows excellent fine grass texturing and the geometry edge detail indicates this TAA errs on the side of sharp with slight aliasing from bright glints unable to be completely cleaned up. This comes at the cost of only just beating the screen refresh, hitting 64fps in this least demanding scene (with the ray tracing effects switched off on this old GTX card). Applying CAS to this 100% native render (bottom left) does make everything pop that tiny bit extra but the overhead drops us roughly 8% to 59fps.

Working up the left side of the image we have quite a different choice made (again, not user configurable) on the strength of the FSR sharpening (and how high contrast the texture work started out) with FSR Ultra-Quality (that's 77% scale) losing quite a lot of that sharply-authored ground detail (while Dota 2 at similar internal resolutions was competitive with native). There could also be a difference in AA solutions at play as Dota 2 just gives FSR the lower res but otherwise barely touched texture detail while TAA could be softening everything before FSR gets involved. The edge detail (eg mech & crystals) gives hints at the lower internal resolution where the TAA couldn't quite suppress artefacts even at native resolution, but is otherwise clean (compare the sword between all clipped captures). It looks good in motion and boosts us to 75fps. Above that FSR Quality (67%) shows incremental softening and texture detail loss but in motion (now 85fps) much of this is less apparent than the direct comparison. At the very top left, Balanced (59%) is where the fine line detail is starting to break into visible stair-stepping in the screenshot and flickering in motion. 93fps also shows it's a point of slightly diminishing returns (although still far from CPU bottlenecked in this engine, which doesn't let you take screenshots of the game when paused so I avoided making a similar discovery to the one in Dota 2). Finally for FSR, at top right is Performance (50%) which is doing well given that it's actually only dealing with a 1080p internal resolution but I'm not sure I'd play a game for extended periods of time looking like this: I'd rather scale back effects to avoid the shimmer that appears in motion and the lack of texture detail (wasting the pixel count of the screen) than chase that 105fps.

Moving down the right side of that image, we have the basic upscaler and 75% internal res upper right. I would say this broadly shares elements of FSR Balanced and Quality - both of which are using significantly fewer internal pixels to reach their final output. Everything seems a bit softer than it should be when surrounded with all these sharpened and native resolution alternatives and the only real positive point is the 88fps, which puts it somewhere between Balanced and Quality - perceptual quality lining up quite well with rendering cost rather than raw internal resolution. Finally the lower right clip is from CAS applied to the 75% basic upscaled option and here we are given an interesting comparison point - this is effectively almost identical to Ultra-Quality in internal resolution and enjoying a sharpening pass, the only difference is the FSR upscaling (assuming CAS does genuinely use a different code path and so still uses the basic upscaler). I would suggest opening the full sized captures and flipping between them if you really want to assess the differences and why this is running at 80fps when UQ sat at a flat 75fps (with only a tiny increase in pixel count). To my eye, CAS on top of this 75% internal res basic upscale is visibly (if subtly) worse at dealing with edge detail. It's also slightly behind on bringing out that ground texture. Much better than the 75% without CAS, but also losing 10% performance to pay for the sharpening pass. The palm tree fringes, the detail both internal to surfaces and at their edge: I think UQ at 75fps is showing that FSR is more than just the latest generation of CAS (CAS-Plus) and worth paying for on top of the existing CAS performance cost. It's not competing with native res but then that's sitting at 64fps (and when things get more taxing, it takes a big hit).
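Framerates hide the cost of a fixed pass, so it is worth converting the figures above into frame times. This is just arithmetic on my own measurements from that opening scene (one GPU, one scene, so treat the results as indicative only); the dictionary keys are labels I made up for this sketch.

```python
# Measured framerates from the opening scene of The Riftbreaker (GTX 1070, 4K output).
MEASURED_FPS = {
    "100% native": 64,
    "100% native + CAS": 59,
    "FSR Ultra-Quality (77%)": 75,
    "FSR Quality (67%)": 85,
    "FSR Balanced (59%)": 93,
    "FSR Performance (50%)": 105,
    "75% basic upscale": 88,
    "75% basic upscale + CAS": 80,
}


def frame_ms(fps):
    """Convert a framerate into milliseconds per frame."""
    return 1000.0 / fps


cas_cost_native = frame_ms(59) - frame_ms(64)   # CAS on top of the native render
cas_cost_75 = frame_ms(80) - frame_ms(88)       # CAS on top of the 75% basic upscale
uq_over_cas_75 = frame_ms(75) - frame_ms(80)    # Ultra-Quality vs 75% + CAS

for name, fps in MEASURED_FPS.items():
    print(f"{name:26s} {fps:3d}fps  {frame_ms(fps):5.2f} ms")
print(f"CAS at native: +{cas_cost_native:.2f} ms, CAS at 75%: +{cas_cost_75:.2f} ms")
print(f"Ultra-Quality over 75% + CAS: +{uq_over_cas_75:.2f} ms")
```

The sharpening pass costs a bit over 1 ms in either case, and stepping from 75% + CAS to Ultra-Quality adds under 1 ms more, although some of that will be the slightly higher 77% internal resolution rather than the FSR upscale itself.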

The image above on the right compares four versions of the main base, from leftmost: Balanced, 75% basic upscale with CAS, Ultra-Quality, and 100% native (no CAS). The thin geometric detail quickly makes plain the difference in underlying internal resolution and is why I like the idea of a next generation temporal solution that could, at least when the scene isn't too busily moving, have a good chance of recovering all this detail at a much lower per frame rendering cost. There's nothing "wrong" with the middle two results (again, I think that you can make out the difference in FSR vs just CAS in how those thin edges are preserved) but they are clearly on a progression towards the leftmost option, which is starting to show breakup of fine detail into aliased blobs and mild posterisation of the texture detail.

75% CAS traditional shadows
Perf (50%) RT shadows Medium

Another way of looking at FSR is that it unlocks new quality settings at the same output resolution and framerate. Above I managed to get RT shadows (at the lowest quality) enabled via the Performance profile and have compared it directly to the more primitive traditional shadows offered (RT does also use more dynamic lights, but these seem to mainly have an added cost when in the scene rather than at daytime with a single dominant lightsource) while using CAS to tweak a 75% internal resolution. Both scenes have more aliasing than I'd ideally like but the RT shadows rendered at 1080p and not the more detailed quality setting combined with the loss of texture detail makes the scene look significantly worse to my subjective evaluation. It is nice to be able to drop all the way down to 50% internal resolution (where a basic upscale would be significantly worse) but the trade-offs are not where I would go to try and unlock new effects, some of which need at least a bit more resolution than is being fed to them by picking low settings at low internal resolutions. Sometimes the best answer is new hardware after five years using something as your daily workhorse. And I'm left with an open question of if that aliasing and softness could both be sorted out (and even unlock lower internal resolutions, without leaning on FSR) if an integrated jittering TAA with Upscaler was offered - especially in scenes like the one above that contain a lot of stationary or slowly moving elements.

As I played through this beta of The Riftbreaker using a range of settings (and experiencing the quite different performance of different sections), I definitely appreciated being able to claw back performance with better image quality than the basic upscaler could provide on top of the mainly-clean TAA presentation. Right now, it offers the ability to at least look at the new ray tracing options at interactive framerates or to get much the same feeling via UQ as a native render even if it doesn't quite look the same under detailed inspection. In motion the blurring of marching ants wasn't ideal but it also softens the intensity of what would otherwise have already been a visible TAA failure. The sharpening here seems quite subtle and rarely adds extra artefacts worth noting. In fact, the main issue with dropping down the quality scale into the lower resolutions is my personal preference against the visual result of the FSR pass having to reconstruct a lot of data and producing slightly weird smoothing - fine in motion but something I'd like a VRS-like or temporal solution to be able to spend extra rendering budget on, to avoid starving for crunchy detail when it might otherwise be available.

Dota 2, 100%
The Riftbreaker, 100%

In Conclusion

I have had some concerns over FidelityFX Super Resolution, including holding somewhat of an unflattering mirror up to these two implementations we've explored today, but my summation is actually quite positive. As I've mentioned before, I've seen more than a couple shipping sharpening and upscaling solutions that seem to actively work against the underlying renderer's quality. FSR here has performed admirably on two similar canvases (top down terrains filled with creeps) which use completely different engines (with different feature levels) and totally different anti-aliasing solutions. As internal resolution dropped, both showed increased shimmer but it seemed to be driven by underlying aliasing issues not lack of temporal stability of the spatial-only FSR technique - my leading concern going into this. Beyond a certain point the internal resolution simply doesn't have enough information to avoid some slight weirdness (often mild posterisation) in how it recovers detail without using additional samples (like a history buffer) and I've seen plenty of worse examples than anything I've seen so far with FSR - DLSS 1.0 certainly had more than a bit of weirdness to it.

It seems from my inspection that this is a good future for evolving FidelityFX Contrast Adaptive Sharpening + Upscale and that, especially if more developers provide the power for end users to tweak their own preference for sharpening strength within the bounds the developers consider reasonable, this offers performance without major sacrifices for image quality (until dropping far from the "Quality"-named end of the scale). And, as you can tweak which internal resolution FSR operates at, users can make very informed decisions about which subjective quality they are more interested in boosting. When GPU bottlenecked, the performance cost of FSR is more than reasonable, only slightly increasing the price of the latest CAS pass, and handily goes beyond the blurred result of offering a basic upscale (when comparing at the same output resolution and framerate - ie the lower internal resolution to pay for the FSR pass more than pays for itself vs simply using the cheapest upscale option). The sharpening is mainly adding local contrast where it improves detail while only mildly increasing the visibility of aliasing issues, which are actually just as much of an issue for the upscaling part of the process - often stretching them over more final pixels with somewhat of a blur and not able to reconstruct fine lines the internal resolution couldn't capture properly.

Should you integrate this into your hobby engine? We may have to wait on the source code release to see exactly how easy it is to integrate (I would guess: very easy) but if you've not currently got a good upscaling option and you're not looking at this to replace adding a good anti-aliasing solution (because it is not that) then FSR will definitely be easier than hooking up a complete TAAU solution (or DLSS 2) and tweaking the temporal jitteriness that they all seem to have early on. We will have to see how the next generation of TAAU and DLSS (or competing AI-enhanced anti-aliasing, upscaling, and sharpening algorithms) progress. In the long term, I think we will all join that future. Maybe by version 2.0 of FSR, there will be an optional temporal component that evolves what is possible if you can feed it a history buffer.

Sunday 30 May 2021

Fewer Samples per Pixel per Frame

My VR roundup turned into a bit of an impromptu comparison between various anti-aliasing techniques inside one of the most challenging environments we currently have. VR restricts acceptable (input to photons) latency, so can limit pipeline/work buffer design; uses a relatively extreme field of view (close inspection of pixel-scale details) combined with ever-increasing raw pixel counts of screens; and demands more than 60 fps with good frame pacing. Add in lens distortion and an emergency temporal reprojection stage (to avoid dropped frames) and, even without TAA, you’ve got distortion and potentially an extra reprojection stage exaggerating artefacts in the frames you do render.

I think we’re at another rather interesting point for anti-aliasing techniques, as demands for offline-render quality real-time graphics at high resolutions with fewer compromises (like screen-space effect artefacts), enabled via ray tracing acceleration, become mainstream. Per-pixel shader calculation costs are going to jump just as we saw during the adoption of HDR/physically-based materials and expensive screen-space approximations like real-time SSAO. Samples per pixel per frame may not be forced to drop as quickly as when consoles jumped from targeting 1080p to targeting 4K but we are going to need some new magic to ensure a lack of very uncinematic aliasing and luckily it looks like we’re getting there.

Sampling History

It is 1994 and I’m playing Doom on my PC. The CRT is capable of displaying VGA’s 640x480 but due to colour palette limitations most DOS games run 320x200 and Doom’s 3D area is widescreen aspect due to the status bar taking up the bottom area. To make matters worse, those of us without the processor required to software render 35 frames per second (Doom’s cap, half refresh for a VGA CRT’s 70Hz) would often shrink the 3D window to improve framerates. All of this is very common for earlier 3D games (I remember playing Quake 1 two years later similarly), which often had difficulties consistently staying in the “interactive framerate” category. For most it was a dream to output near the maximum displayable image while calculating an individual output value for every pixel of every scan-out and that limitation was not primarily due to early framebuffer limitations.

It is 2004 and I’m playing Half-Life 2. Rapid advancement then convergence under a couple of API families for hardware acceleration has meant most of the last decade provided amazing 3D games that grew with hardware capabilities (even if many earlier examples contain somewhat arbitrary resolution limitations). Even 1998’s Half-Life 1 has quickly jumped past low resolution 3D consoles like the PS2. Super-sampling (SSAA) where every final pixel was internally rendered several times then blended (used extensively for offline rendering) was usually too expensive, especially as screen resolutions continued to increase (initially for 4:3 CRT then LCDs moving to 16:9). But by this point, it was standard to use MSAA to blend samples from different polygons that partially covered a single pixel (the saving being that if multiple coverage points were covered by the same triangle, the shader for the final value was only run once, unlike SSAA). Two years later, nVidia would introduce CSAA to allow more coverage sample points than cached values, making it even cheaper to provide very accurate blending between polygon edges. It was even possible to mix in SSAA for transparent textures, where the edge of the triangle is not where the aliasing happens. Note how those 2006 benchmarks are already showing PC games running at the equivalent of 1080p120 with limited MSAA or 60 fps with many many samples per pixel.
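
To put rough numbers on that MSAA saving, here is a minimal Python sketch. The function name and the idea of describing a pixel as a list of triangle ids (one per coverage sample point) are my own illustrative assumptions, not how any GPU or API exposes this, and real hardware has extra rules this ignores.

```python
def shader_invocations(coverage, mode):
    """Count pixel shader runs for one pixel.

    coverage: which triangle id each sample point inside the pixel hit
              (a hypothetical representation, purely for illustration).
    mode: "ssaa" shades every sample point; "msaa" shades once per
          distinct triangle and reuses that value for its sample points.
    """
    if mode == "ssaa":
        return len(coverage)
    if mode == "msaa":
        return len(set(coverage))
    raise ValueError(f"unknown mode: {mode}")


interior_pixel = ["A", "A", "A", "A"]  # 4 sample points, all inside one triangle
edge_pixel = ["A", "A", "B", "B"]      # an edge between two triangles

for name, pixel in (("interior", interior_pixel), ("edge", edge_pixel)):
    print(name, shader_invocations(pixel, "ssaa"), shader_invocations(pixel, "msaa"))
```

With four sample points SSAA always pays for four shader runs, while under this simplified model MSAA pays for one on interior pixels and only pays more along polygon edges, which is where the blending is actually wanted.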

It is 2014 and I’m playing the recent reboot of Tomb Raider. MSAA continued to get faster and better in the intervening decade but unfortunately the move to deferred rendering made it extremely difficult to implement efficiently in newer engines (it is not possible in Tomb Raider, although some deferred renderers did get hacked by nVidia drivers that injected MSAA at an acceptable performance cost). The answer to major aliasing, which had been developed during the Xbox 360 generation of consoles, was to run a post-processing pass (MLAA) that looks for high contrast shapes typical of aliased lines and then employs a blur to ease the sudden gradient. This technique requires very clear telltale aliased line segments so smaller detail like foliage systems becomes a huge issue, which really stands out in the sequel, Rise of the Tomb Raider. It also completely fails if you apply the pass after doing some other image manipulation that distorts the telltale shapes or edge gradients.

In this 2014 era, the use of HDR intermediate values later tonemapped down to the output range, which was just emerging after HL2, also means internal calculations can output a much wider range of values. With only one sample per triangle per pixel, a new sort of temporal aliasing becomes dominant as the sampled locations move enough for slightly different angles to be calculated grazing incredibly bright light sources in sequential frames. Surfaces sparkle and flicker in regular patterns that become at least as distracting in motion as classic polygon edge aliasing, as I mention in my Dragon Age retrospective. A combination of the two aliasing types is easily recognisable where an angle creates a strong lighting highlight along the silhouette of a surface that may be less than a pixel wide, creating light ants crawling along those polygon edges which are too thin for MLAA to catch. A better solution was required. (And you may note the journey isn’t over as I just linked that to a trailer for a 2021 game with an engine that already uses...)

Temporal Accumulation

The problem is clear. By 2014 we are generally using one (complex) sample per pixel per frame and, due to fine geometric detail (which older games lacked) plus an extreme range of possible lighting values (not to mention potential ordering issues in how various stages of calculating light and darkness components are blended), this is creating pixel-scale aliased elements that are also often not temporally stable. The screenshots look relatively good but in motion anyone with flicker-sensitivity is immediately distracted by aliasing. By this time the shaders have also become complex enough that various motion vectors (showing how far the object under each pixel has moved since the previous frame) are starting to be calculated to enable somewhat accurate motion blur to be added (very important on consoles targeting 30 fps, where this provides extra temporal information missing when not using higher framerate output - it’s also “more cinematic” because most people are used to 24 fps movies shot with a 180 degree shutter, which accumulates all light that hits the lens for 1/48th of a second before closing the shutter for another 1/48th of a second).

Those motion vectors, if they are sufficiently accurate, can point to the pixel location of the object in the previous frame. So expensive effects like real-time ambient occlusion estimation (checking the local depth buffer around a pixel to see how occluded the point is by other geometry that would limit how much bounce lighting it would likely receive) become an area of experimentation for temporal accumulation buffers. Sample less in each frame, create a noisy estimation of the ground truth, and filter for stability while reprojecting each frame along the motion vectors. Here’s a good walkthrough blog from this time period. Subsequent refinements have worked to deal with edge cases like an incremental buffer not handling geometry arriving from off-screen (causing some early examples to obviously slowly darken geometry as it appeared along the edge of the screen).
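
As a rough illustration of that accumulate-and-reproject loop, here is a minimal Python sketch working on a one-dimensional row of pixels. Everything in it is an assumption for illustration: the `accumulate` name, whole-pixel motion offsets, and the blend weight `alpha` are mine, and a real implementation would filter sub-pixel reprojection and use smarter history rejection than a bounds check.

```python
import random


def accumulate(history, current, motion, alpha=0.1):
    """Blend a noisy new frame into a reprojected history buffer (1D row).

    history, current: lists of floats, one value per pixel.
    motion: per-pixel integer offsets; pixel i was at i - motion[i] last frame.
    alpha: weight given to the new sample (an assumed value for this sketch).
    """
    out = []
    for i, sample in enumerate(current):
        prev = i - motion[i]
        if 0 <= prev < len(history):
            # Valid history: exponential moving average along the motion vector.
            out.append((1.0 - alpha) * history[prev] + alpha * sample)
        else:
            # No history (e.g. geometry arriving from off-screen):
            # restart from the new sample instead of blending with nothing.
            out.append(sample)
    return out


# Static scene: the per-frame noise averages out over many frames.
random.seed(1)
truth = 0.5
history = [truth + random.uniform(-0.4, 0.4) for _ in range(8)]
for _ in range(60):
    noisy = [truth + random.uniform(-0.4, 0.4) for _ in range(8)]
    history = accumulate(history, noisy, motion=[0] * 8)
print(max(abs(value - truth) for value in history))
```

The `else` branch is the interesting design choice: blending fresh geometry against an empty or stale history is one way you might end up with the screen-edge darkening mentioned above, so this sketch simply trusts the new sample there.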

As seen shipping in 2011's Crysis 2, temporal accumulation for reducing aliasing not only presents the answer to MLAA’s limitations but also can operate after a cheap MLAA pass to rapidly reduce all aliasing. If you consider a slightly jittered pixel centre location (a common enhancement) then a static scene under TAA effectively generates SSAA-quality images, only spreading the samples per pixel out over time. It was popularised further by nVidia with their branding of the process as TXAA, shipping in games in 2012. Some early implementations had major ghosting issues from motion vector precision and understanding when to reject a previous frame’s data as not contributing to this new location. The actual complexity of this problem becomes apparent when you consider how objects in a scene may have changing visibility (especially during motion and animation) or output values (consider a flickering light and the subsequent illumination between frames). Progress has not always been uniform and a couple of times I've stumbled upon an anti-aliasing fail state that's hard to even explain (Dishonored 2 doesn't have very satisfying TAA due to ghosting thin elements and I don't know what the MLAA is doing here to achieve what's visible in this capture). It is a process under constant refinement but in today’s best temporal accumulation implementations it is often relatively rare to see obvious issues. As mentioned, it also errs on the side of a softer final frame so can be combined with a sharpening filter. Unfortunately this can be handled poorly, effectively paying the computational cost of TAA while then also reintroducing exactly the obvious aliasing that it was meant to remove. It also doesn’t help if your TAA implementation is broken on a platform.
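
The claim that jittered TAA on a static scene approaches SSAA can be checked with a toy example. This is a minimal sketch under my own assumptions: a single pixel spanning [0, 1) on each axis, a hypothetical `shade` function with a hard edge covering 30% of the pixel, a Halton sequence for the jitter (a common choice, though not one the games above are confirmed to use), and a plain running average in place of the exponential blend real TAA tends to use.

```python
def halton(index, base):
    """Return element `index` (1-based) of the Halton sequence for `base`."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result


def shade(x, y):
    """Hypothetical scene: a hard vertical edge, white left of x = 0.3."""
    return 1.0 if x < 0.3 else 0.0


def accumulate_static(frames):
    """One jittered sample per frame, averaged over `frames` frames."""
    total = 0.0
    for i in range(1, frames + 1):
        total += shade(halton(i, 2), halton(i, 3))
    return total / frames


print("centre sample only:", shade(0.5, 0.5))
print("16 jittered frames:", accumulate_static(16))
```

A single centre sample sees none of the edge, while the jittered average lands close to the 30% coverage that supersampling the pixel would report. The hard part in a real game, as described above, is knowing when the scene is not static and the history has to be rejected.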

Ray Tracing with DLSS and The Future

In the last couple of years, the new hotness that really explodes the computational costs of working out a stable final value of each pixel in a frame of a modern game is real-time ray tracing. Thanks to nVidia looking to brand the future, they have shipped all RTX GPUs with dedicated silicon to accelerate BVH intersection tests and machine learning tensor operations (big matrix multiplies, often with sparse data) and at least the former part of that is now also available on current AMD GPUs and consoles plus upcoming Intel discrete GPUs. If you thought the aliasing issues from rasterisation going to physically-based materials and HDR were a concern, welcome to a problem so far beyond that that if you look at the underlying data from a single frame using around one sample per pixel, it looks more like white noise than a coherent scene - accumulation with temporally reliable motion vectors is a must and site of ongoing research. The addition of Tensor cores to RTX GPUs was initially proposed as the place to run AI denoising on that ray tracing output, although most games today still denoise in the general purpose shaders. Luckily, another branch of research was to use those Tensor units to AI-accelerate all anti-aliasing and it has been wildly successful with many reviewers now noting that DLSS 2 outperforms native resolution TAA.

DLSS 1 was a bit of a mixed bag as the AI had to be trained on each game and took an aliased lower resolution image from the game then applied the classic AI Super Resolution techniques to “dream” or “hallucinate” the missing details and softened edges. However, DLSS 2 changed the inputs (this presentation originally convinced me AMD would add AI cores to RDNA2) and so required a buffer of previous low resolution input frames (including depth buffers and motion vectors) while removing the previous individual training requirement, effectively giving the AI the power of temporal accumulation information to generate the final output. So each new frame generated by the game can be run at a much lower resolution than the output, reducing the samples per output pixel, and yet will retain the look of a cleanly anti-aliased native resolution render. We are back to 1994 but rather than peering into a small box, the games look almost as good as offline rendering and output fullscreen. Even when not trained to give the exact same result as native processing, the AI seems to be quite stable and creates pleasing results in motion. It’s a game changer when targeting new screens that can accept 4K frames at or above 120Hz.

But nVidia do not have a monopoly on upscaling while anti-aliasing and more significant upscaling without compromises will be the new normal if my reading of the tea leaves (on samples per pixel per frame) is correct. Reusing information from previous frames is clearly a smart efficiency saving as long as we can reliably determine what information is useful and what isn’t (avoiding failures that create significant artefacts which are as distracting as the aliasing we’re trying to move beyond or the framerate drops we’re trying to avoid). The target of 4K on the PS4Pro forced engines to pivot to smart upscaling strategies such as the use of checkerboarding and a rotated tangram resolve in Horizon: Zero Dawn, reducing GPU costs of each new frame by alternating which pixels in a checkerboard were rendered (then blending on the diagonals for that frame while adding in contributions from the previous frame). Recent years have seen an excellent execution of targeting the fixed scan-out time of non-VRR displays by managing the rendering load around modifying the internal render resolution then upscaling for the final presentation (usually with native UI compositing over the top for maximum text clarity). Even when dynamic resolution scaling is not available on PC, it has forced renderers to provide visually pleasing upscaling that gracefully handles even fine texture transparency and pixel-wide polygon details.
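
For the dynamic resolution side of this, here is a minimal Python sketch of a controller that picks the next frame's internal resolution from the last frame's GPU time. The function name, the clamp range, the `headroom` safety margin, and the assumption that GPU time scales with pixel count are all mine; shipping engines use more careful estimation and smoothing.

```python
import math


def next_resolution_scale(scale, gpu_ms, budget_ms, lo=0.5, hi=1.0, headroom=0.95):
    """Pick the per-axis resolution scale for the next frame.

    scale: per-axis scale used for the frame just measured (1.0 = native).
    gpu_ms: GPU time that frame took; budget_ms: time available before scan-out.
    Assumes GPU time is proportional to pixel count, i.e. to scale squared,
    so the per-axis correction is the square root of the time ratio.
    """
    target = scale * math.sqrt(headroom * budget_ms / gpu_ms)
    return max(lo, min(hi, target))


# A 60 Hz display leaves roughly 16.7 ms per frame.
print(next_resolution_scale(1.0, gpu_ms=20.0, budget_ms=16.7))  # over budget: drop
print(next_resolution_scale(0.8, gpu_ms=12.0, budget_ms=16.7))  # spare time: raise
```

The `headroom` factor is where the skill mentioned above comes in: too conservative and the GPU sits idle before scan-out, too aggressive and frames get dropped.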

The Medium, TAA 50%
The Medium, TAA 75%
The Medium, TAA 100%

The last few years of Unreal Engine 4 have had quite a clean TAA with integrated upscaler (sometimes called TAAU) for dynamic internal resolution (it tracks the sub-pixel jitter so the samples can be correctly distributed even when changing the ratio of internal res to output res; primarily used on consoles, where the APIs for precise frame time calculation and estimation have existed for longer and the fixed platform makes it easier to define an ideal internal resolution window for reliable results that still come close to maximising GPU throughput - the skill is not underutilising the GPU by being too conservative and so being ready for scan-out milliseconds before needed). In the best cases, I am completely happy to run UE4 around 80% resolution (just under 1800p) and let the TAA upscaler reconstruct a soft and clean final image on my 4K PC big screen (getting close to home cinema levels of consuming my vision so making aliasing issues more apparent than for someone looking at a distant TV or small monitor). It doesn’t compete with DLSS (in Performance mode that is a 50% resolution, so 1080p internal renders when the output is 4K): head to heads show DLSS 2 reaching close to image quality parity with UE4 TAA running at 100% internal resolution on PC. Dropping down to 1800p is clearly under 70% of the actual sample count (the previous percentages measure edge length while sample count scales with area), and ensuring a relatively aliasing-free result without AI will err on the side of softer than DLSS Perf. The above captures from The Medium show a clear quality loss at 50% while the differences at 75% are more subtle compared to native internal resolution. The captures from Man of Medan below are where I think TAA with some upscaling is showing quality levels that you would not even imagine possible in the MLAA era (especially noting these captures have significantly fewer samples per pixel per frame than those games from a decade ago).
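
Since the edge-versus-area distinction is easy to lose track of, here is a small Python sketch of the arithmetic for a 3840x2160 output. The helper name is my own; the scales are the ones quoted in this post.

```python
def internal_resolution(output_w, output_h, axis_scale):
    """Return (width, height, fraction of output samples) for a per-axis scale."""
    width = round(output_w * axis_scale)
    height = round(output_h * axis_scale)
    return width, height, (width * height) / (output_w * output_h)


for label, scale in (
    ("DLSS Performance, 50%", 0.50),
    ("The Medium, 75%", 0.75),
    ("UE4 at 80%", 0.80),
    ("1800p", 1800 / 2160),
    ("Man of Medan, 85%", 0.85),
):
    width, height, fraction = internal_resolution(3840, 2160, scale)
    print(f"{label}: {width}x{height}, {fraction:.0%} of the 4K sample count")
```

So an 80% setting renders 3072x1728, which is 64% of the native samples, 1800p is a little over 69%, and the 50% setting behind DLSS Performance is only a quarter of the samples per frame.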

Man of Medan, TAA 85%
Man of Medan, TAA 85%
Man of Medan, TAA 85%

With the public release of Unreal Engine 5’s beta shipping with default-enabled Temporal Super Resolution, we are looking at the beginning of non-AI (or at least not running on Tensor cores) TAA plus upscaling that aims to hit the same milestones as DLSS when it comes to low internal resolution. The PR for the UE5 release announces 1080p internal render resolution, aiming to hit the quality bar of 4K native. That is an ambitious target and running the editor (which also uses UE5 TSR by default) there is a lot to appreciate about this beta’s visual quality, well beyond the 50% screenshot above from UE4’s technique (and that was already significantly above some previous branded sharpen plus upscale techniques as implemented in shipping games). We are approaching a point where continued refinement of this path of research will be able to pick away at the final issues and retain detail without turning the results into a mess of sharpening halos or lingering aliasing. From there we have a far more interesting future in which some games will be able to explore the artistic choice to reject such smoothing, rather than fall into that look via broken PC releases, or even take the performance wins of significant upscaling while tweaking output to retain more of the underlying grainy component of ray tracing or other contributions (while adding noise to areas where it does not naturally occur and so approach something close to movie film grain that actually looks good but reduces render cost rather than increasing it slightly).

Edit (June 2021): This was written on the assumption that the imminent reveal of AMD's FidelityFX Super Resolution would confirm a very similar technique to UE5's Temporal Super Resolution, directly chasing after DLSS's impressive results at similarly low internal rendering resolutions (using fewer samples than checkerboarding and far fewer than where other TAA upscaling, such as in UE4, shines). It has since been announced that AMD are zagging where others have zigged and will not be using a temporal solution. Worryingly this has come with rather weak results on the one pre-release promotional image used to sell the technology. As I mentioned above, DLSS 1 did not come out of the gates a winner so AMD have plenty of time to iterate or to provide an open equivalent that replicates what Epic are doing with UE5.