Monday, 30 August 2021

Intel XeSS: Joining nVidia in Tensor-Accelerated TAAU

Back in May I wrote about the evolution of per-pixel rendering costs, expecting the imminent announcement of AMD's next generation temporal upscaling technique: a competitor to DLSS 2.x that would run on hardware from multiple vendors and even offer a fully open source release, letting anyone inspect the technology or improve upon it (if it wasn't a perfect match for one vendor's underlying hardware). That ended up not happening, and FidelityFX Super Resolution, while an interesting alternative to more basic spatial upsampling, didn't quite match my hopes for some real competition to nVidia's RTX-only DLSS.

I had started a draft post on implementing FidelityFX Super Resolution in your own engine but, really, I'm not sure how much it adds. If you want a more expensive upscale that retains far more sharpness than bilinear (so not a pass you're going to follow with a blur), or you're already doing an expensive sharpening pass like FidelityFX CAS after an (optional) upscale, then you should absolutely drop in FidelityFX Super Resolution anywhere you'd otherwise be weighing up Lanczos (because that's roughly what it is). As others have noted by now, players already make this choice: modern GPUs (when not upscaling on the output monitor) perform such a filter when the internal resolution is set lower than the output/native resolution of your system. I've often been quite happy running AAA titles at 1800p on a 4K screen (as long as the anti-aliasing was good) and FSR is an enhancement on that path, increasing quality with the option to composite the UI and any pixel-scale noise (like a film grain effect) at native res after the upscale.
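To make that Lanczos comparison concrete, here is a minimal 1-D Lanczos-2 resampler sketch (illustrative only: FSR's actual upscale pass is a more involved edge-adaptive shader, and the function names here are my own):

```python
import math

def lanczos2(x):
    """Lanczos-2 kernel: a windowed sinc, zero outside |x| >= 2."""
    if x == 0.0:
        return 1.0
    if abs(x) >= 2.0:
        return 0.0
    px = math.pi * x
    return 2.0 * math.sin(px) * math.sin(px / 2.0) / (px * px)

def resample_1d(samples, out_len):
    """Upscale a 1-D signal with a 4-tap Lanczos-2 filter (edge-clamped)."""
    scale = len(samples) / out_len
    out = []
    for i in range(out_len):
        # Centre of this output pixel mapped into input space.
        src = (i + 0.5) * scale - 0.5
        base = math.floor(src)
        acc, weight = 0.0, 0.0
        for tap in range(base - 1, base + 3):  # 4-tap window around src
            w = lanczos2(src - tap)
            clamped = min(max(tap, 0), len(samples) - 1)
            acc += w * samples[clamped]
            weight += w
        out.append(acc / weight)  # normalise partial windows at the edges
    return out
```

The negative lobes of the kernel are what preserve edge contrast compared to bilinear, and are also why ringing can appear around hard edges.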

Intel XeSS

What has recently re-energised my interest in upscaling techniques is the Intel Architecture Day announcement of XeSS: a next generation temporal upscaling technique, a competitor to DLSS 2.x that will run on hardware from multiple vendors and will even see a fully open source release (at some as yet unknown future date). So I had vaguely the right timescale for an announcement but had bet on the wrong non-nVidia GPU company making it.

XeSS outlined
DLSS 2.x outlined

We do not have full access to XeSS so for now we only have a rough roadmap of releases starting with the initial SDK for use with their Arc series of GPUs (hardware that will not become available until early 2022). The design of the new Arc (Xe-HPG) series goes hard on matrix (Tensor) accelerators and so it is a natural fit to offer something broadly comparable to DLSS, which is accelerated by these AI/Tensor cores. Intel is actually investing even more of their GPU into matrix acceleration than nVidia, so expect a major push to ensure software supports XeSS rather than leaving that silicon idle when running the latest AAA releases.

From the outline we have been provided by Intel, it is easy to see that beyond the similar hardware being tapped to run deep learning algorithms, the inputs are also very similar to nVidia's DLSS 2.x. We have a jittered low resolution input frame along with motion vectors noting the velocity of each pixel and a history buffer of previous frames from which to extract information (which, even when showing a totally static scene, provides additional information thanks to the moving jitter pattern). The only additional information nVidia are explicit about collecting with their API is an exposure value (although the current SDK, 2.2.1, has added an auto-exposure function since these nVidia slides were published) and the depth buffer (which Intel may implicitly include as part of the complete input frame).
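The value of that jittered history is easy to sketch with a toy model: point-sampling a static scene at one pixel, with a different sub-pixel jitter each frame, and accumulating the results recovers coverage information that no single frame contains. Everything below is an illustrative simplification, not the DLSS/XeSS algorithm:

```python
import random

def scene(x):
    """Hypothetical continuous 'scene': a sharp step edge at x = 0.5."""
    return 1.0 if x >= 0.5 else 0.0

def jittered_history(frames, seed=0):
    """Accumulate jittered point samples of a static scene for one pixel.

    Each frame samples the scene at the pixel centre (0.5) plus a sub-pixel
    jitter; a running average recovers the pixel's true edge coverage
    (~0.5 here), which any single point sample would miss entirely.
    """
    rng = random.Random(seed)
    history = 0.0
    for n in range(1, frames + 1):
        jitter = rng.uniform(-0.5, 0.5)   # per-frame sub-pixel offset
        sample = scene(0.5 + jitter)
        history += (sample - history) / n  # running mean of frames so far
    return history
```

Any one frame returns exactly 0.0 or 1.0; only the jittered accumulation converges on the anti-aliased value, which is why a well-managed history reconstructs detail even from a static camera.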

In comments to the press, Intel have discussed the possibility of the industry converging on a common standard for DL upscaling APIs, allowing almost drop-in dll swaps that would make it trivial to support various alternatives. The way this is talked about as a future development means it is unlikely that the initial release of XeSS will be a drop-in dll replacement for DLSS 2.x (using identically named functions/entry-points and settings ranges). It remains to be seen, though, how difficult it may be for ingenious hackers to bridge the differences and allow current DLSS titles to run a bootleg XeSS mode under the hood in the future (not condoned by Intel itself, of course).

DLSS time savings
XeSS time savings
DLSS savings scaling

This brings us to a major point of differentiation (vs nVidia) and something very exciting for the many users stuck in our current supply-constrained GPU market (which will not improve sufficiently to allow everyone to upgrade to an RTX card even by late next year): XeSS will provide a fallback mode that runs (somewhat slower) on GPUs without hardware (XMX) matrix acceleration. Starting with nVidia's Pascal (Series 10), AMD's Vega, and Intel's Xe-LP in Tiger/Rocket Lake (11th Gen Core processors), there are AI acceleration instructions for Int8 operations (DP4a) that provide quadruple the throughput for dot products on packed Int8 values compared to 32-bit operations - effectively a mid-ground between running AI workloads as generic shaders and the full acceleration of dedicated Tensor units.
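A rough software model of what DP4a computes (the real instruction does this in a single operation on packed registers; this Python sketch just shows the semantics):

```python
def dp4a(a, b, acc):
    """Software model of DP4a: the dot product of four signed 8-bit values
    packed into each 32-bit word, added to a 32-bit accumulator."""
    def bytes_s8(word):
        out = []
        for i in range(4):
            byte = (word >> (8 * i)) & 0xFF
            out.append(byte - 256 if byte >= 128 else byte)  # sign-extend
        return out
    return acc + sum(x * y for x, y in zip(bytes_s8(a), bytes_s8(b)))
```

Because four multiplies and an accumulate happen per 32-bit lane, inference kernels built from Int8 dot products see roughly 4x the throughput of FP32 math - the mid-ground XeSS's fallback path exploits.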

With Intel so invested in matrix acceleration, it becomes more evident that AMD are being left behind - even mobile chips ship with limited amounts of this form of hardware acceleration (as I noted in 2019) - so this fallback is providing a vital half-step (which should more than pay for itself with the reduction in rendering cost of a lower resolution input image with no need for antialiasing). This also applies to the current consoles, which notably didn't get left behind on ray tracing acceleration but are starting to look down a long generational window without hardware matrix acceleration. The Xbox Series of consoles offers something equivalent to DP4a via DirectML (and Microsoft have said they are working on their own DL upscaling technique for use on those consoles in the future) but we don't yet know if Sony have an answer for the PS5.

In interviews it sounds like Intel are, at least initially, reserving the XMX path for their own Arc GPUs (despite nVidia RTX cards having equivalent matrix acceleration), so it will be a case of DLSS only on RTX going up against XeSS XMX (fast) only on Arc and XeSS DP4a (slower) everywhere else. But you could read the answers as being open to others coming in and dropping in their own backend (say, an nVidia Tensor path rather than being forced down the DP4a fallback codepath), though maybe not before Intel releases the full source code (for which no timeframe is provided). In that DF interview there is also the suggestion of potential future developments where laptops do the main rendering on a dGPU then hand it off to the iGPU, which has Intel matrix accelerators to run the final stages (XeSS upsample, composite UI etc). Given that current laptops with a discrete GPU already pass the completed 3D render to the iGPU to output via direct connection to the screen, this would only be an incremental step forward (rather than completely reinventing the path a frame takes today).

One can even imagine, looking at the announced AVX-VNNI instructions for consumer CPUs and AMX instructions for server CPUs, a future where those people working on interesting software renderers could stay entirely on the CPU while taking advantage of DL upscaling, assuming there was enough throughput that was power efficient enough to provide a worthwhile wow factor. Real-time software renderers are not competitive with modern GPU-accelerated renderers (an embarrassingly parallel problem on hardware designed around accelerating just that) but they are still an interesting hobby niche that may enjoy playing with this new area of technology.

Non-DL-based clamping limitations
DL-based denoising limitations

Going back to a more broad discussion, the reason for this excitement around DL upscaling (as I hopefully outlined in my previous post) is that it avoids the poor TAA performance of rejecting or clamping values from the history buffer, which has evident detail loss or failure states around higher frequency information (as nVidia have made clear in their talks on this topic). When the buffer can be fully utilised, a well managed jittered history can reconstruct a lot of detail for any element that has already been onscreen for a couple of frames (with anything that hasn't been onscreen liable to be masked behind a motion blur) despite using an internal resolution significantly below native output. Direct competition between two different implementations should provide even more impetus for advancement in this area. We are only scratching the surface of what deep learning algorithms can do to enhance our current rendering techniques.
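The clamping failure mode being avoided is easy to sketch: classic TAA rectifies the reprojected history against the current frame's local neighbourhood, throwing away any accumulated detail outside that range. A simplified illustration, not any shipping implementation:

```python
def clamp_history(history, neighbourhood):
    """Classic TAA history rectification: clamp the reprojected history
    sample to the min/max of the current frame's local (e.g. 3x3)
    neighbourhood. Legitimate sub-pixel detail outside that range is
    discarded - the detail loss DL approaches aim to learn their way
    around rather than clamp away."""
    lo, hi = min(neighbourhood), max(neighbourhood)
    return min(max(history, lo), hi)
```

High-frequency detail (a thin bright wire, a specular glint) accumulated over several jittered frames can sit well outside the current frame's aliased neighbourhood, so the clamp erases exactly the information the history was supposed to preserve.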

Of course, there are some problems that nVidia have considered potentially intractable, such as the many types of noise that their DLSS 2.x approach cannot deal with (as it cannot provide a generalised solution that accounts for all noise types) and which, if they cannot be avoided, must be denoised before DLSS is applied. This can force a traditional TAA stage (at a non-trivial rendering and memory cost) back into engines that would otherwise be able to drop it entirely, the ultimate goal being to rely only on the antialiasing of DLSS to provide exceptional final results. Intel offers a second set of engineers looking at such problems who may have fresh insights into what is possible. Microsoft are working on their own Xbox DL upscaling. There are signs Sony are up to something too. While AMD did not announce their plans in this area with the recent announcement of FSR, I am still convinced that the future of AMD GPUs will involve Tensor units and that they will justify that use of transistors with a DLSS-a-like - but we may be waiting for RDNA3 in late 2022 before we get that piece of the puzzle. For now, Intel are in the spotlight and anyone with a vaguely recent GPU (even the most recent iGPUs) is being invited to come along.

Wednesday, 30 June 2021

An Initial Inspection of FidelityFX Super Resolution

As I noted in an addendum to last month's post, I really expected AMD to announce that their new upscaling technology (which supplements FidelityFX Contrast Adaptive Sharpening + Upscale) would use temporal accumulation to compete with upcoming technologies like Unreal Engine 5's Temporal Super Resolution. It seemed like the obvious pivot after a couple of years of offering CAS, previously advertised as "designed to help increase the quality of existing Temporal Anti-Aliasing (TAA) solutions". AMD already have a branded option for tweaking and upscaling already-anti-aliased image buffers, so to respond to nVidia's DLSS (which offers close to, or even beyond, anti-aliased native res rendering quality at lower GPU loads by upscaling significantly lower res aliased internal frames) the natural step would be integrating anti-aliasing, upscaling, and sharpening - something likely best achieved using a temporal buffer, to go significantly beyond the limits of previous spatial-only techniques.

Last month I linked to a few examples of where enthusiastic sharpening can have a quite poor effect on image quality (from effectively wiping out anti-aliasing to the classic halo artefacts any digital photographer knows well from trying to recover additional detail with careful manual tweaking of Lightroom settings). This has generally limited my desire for CAS in any game where it has been offered (or for turning on nVidia Image Sharpening): when the effect strength is configurable, I'll generally apply it so lightly as to not be worth any performance cost; when I can't tweak the strength, it usually seems too strong, and I've seen some issues during combined upscaling (which do not seem inherent to the tech but an implementation failure that still managed to ship, although I did say at the time "the tech should be rebranded if fixed to work well in the future"). What we have in the new FidelityFX Super Resolution is something that could be considered CAS-Plus: the latest version of CAS (with what seems like a less aggressive default strength, still configurable by the developer or passed on as a user option) along with a more involved integrated upscaler than the old implementation, one that promises much higher upscaling factors without major quality loss.
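Those halo artefacts are easy to reproduce with a generic unsharp mask (deliberately not CAS, whose contrast-adaptive weighting exists to limit exactly this overshoot): sharpening a step edge pushes values past the original range on both sides.

```python
def unsharp_1d(signal, amount=1.0):
    """Generic 1-D unsharp mask: add back the difference between the
    signal and a small box blur. Strong amounts overshoot at edges,
    producing the classic dark/bright halos."""
    out = []
    for i in range(len(signal)):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, len(signal) - 1)]
        blurred = (left + signal[i] + right) / 3.0
        out.append(signal[i] + amount * (signal[i] - blurred))
    return out
```

On a clean 0-to-1 edge, the dark side undershoots below 0 and the bright side overshoots above 1, while flat regions pass through untouched; displayed, that is the halo.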


Although FSR is not yet fully 1.0 and public, what we have already received is, like CAS, purely an upscaling and sharpening solution (with instructions that make it sound like this will not change), so it expects the game to have already applied anti-aliasing. We will be able to poke at it in more detail soon ("The source code for FidelityFX Super Resolution 1.0 will be coming to GPUOpen in mid July") but with some games shipping implementations last week, we can give the output a first examination using our version 1.0 eyeballs. My expectations were tempered by not being blown away by CAS before, and by wondering how spatial-only upscaling would deal with any aliasing, but it's pretty clear that AMD would not open-source a simple rebranding exercise, so this was always going to be at least a completely new generation of the ideas originally proposed via CAS, worth examining on its merits rather than on previous experiences.

I am actually ideally situated to take advantage of FSR, being one of the many many people (according to May's Steam survey) who has not made the jump from a GTX card to an RTX upgrade or AMD alternative (even if DLSS was offered for any of the titles currently shipping with FSR support). With shortages leading to terrible availability and ridiculous prices when there is any stock, many of us would likely have upgraded by now (this GTX 1070 shipping note is over five years old) and just need a bit more longevity to wait out supply catching up with demand. Unlike most of the other people on a Series 10 GPU, I am trying to drive a (desk-mounted, not living room) 49" 4K panel which benefits from both quality anti-aliasing and as many pixels as possible.

This blog has always been written with an intended audience of indie teams and enthusiastic amateurs with an interest in rendering; me and a few thousand visitors. Unfortunately the commentary around FSR's launch has seemed a bit toxic and divisive (especially the questioning of some press analysis). While occasionally forthright, I hope readers understand the aim here is to evaluate, to give context for how things fit into the wider rendering landscape, and to make the occasional light-hearted jab at shipping flaws from the perspective of people who have seen, and will continue to see, that stuff in our own work, because rendering is difficult (big publisher funding or not) and some hard choices are mutually exclusive.

The questions about FSR can broadly be split into two: how does this new generation of sharpening with an integrated upscaler compare in performance cost & quality to the basic fallback upscaler in the games that integrate it; and how does the combination of existing anti-aliasing solutions with FSR applied broadly hold up when other games are shipping with temporal anti-aliasing upscaling solutions either integrated into various game engines or via AI acceleration from nVidia (previously discussed last month)? But ultimately it can all somewhat collapse down to: how can developers offer the best subjective quality (be that headroom to guarantee perfect frame pacing, less flickering aliasing, or just a more pleasing or detailed final scene) on every hardware platform?

Dota 2, FSR 50%
The Riftbreaker, FSR Bal (59%)
The Riftbreaker, CAS 75%

Example Implementations

Everyone appears to have used Godfall as their primary example, due to a recent marketing push combined with it being a relatively "next gen" game using some of the latest ray tracing effects available under UE4 - it's well-covered by a wealth of existing analysis (inner surfaces, sharply textured and somewhat noisy in the native presentation, get progressively blurry while edge detail can hold up but sometimes makes the underlying lower resolution apparent via stair-step artefacts; it clearly beats basic upscaling at like-for-like framerates). I'm going to poke at two free titles (F2P or in open beta), both using slightly more bespoke rendering pipelines. Dota 2 currently uses the Source 2 engine, though I'm not sure the MLAA it uses has been much updated in years & years. The Riftbreaker uses a custom engine that just moved to a TAA solution the developers liked so much they completely removed the previous MLAA-optional "raw" rendering choice but, just like the stock configuration of Godfall, it does not offer an integrated upscaler with that TAA: the basic upscaler does not use the additional information from a jittered sample location in the frame history buffer to more precisely reconstruct the final high resolution image; rather it does a TAA resolve at whatever internal res you specify then upscales that as a spatial-only step, likely using a cheap bilinear resample. Both games have internal framerate overlays (baking the numbers into screenshots) and offer a common "camera in the sky" not-truly-isometric perspective while using very different AA techniques as a point of contrast.

I have uploaded all the png files (to a service that may use compressed jpeg previews for the web viewer but allows you to easily download the genuine bit-identical files), including every 4K capture used for crops. These act as visual aids to the wider points I noted while the games were in motion, and I recommend anyone wanting more than this summary throw up a Dota 2 replay or check out the Prologue for The Riftbreaker to see it running on their own hardware. Accept no (highly compressed video) substitute; everyone ranks fine visual details in subtly different ways.

100% top, FSR 50% bottom
from TL: 70-80-90%; FSR70-80%, 100%
100% top, FSR 50% bottom

Dota 2 offers a simple toggle between FSR and a basic upscaler when the internal rendering resolution is scaled (40-99%) down from (100%) native. There is no option to tweak the sharpness applied, and what becomes immediately apparent (centre image above) is that the sharpness Valve has chosen is significantly stronger than in other implementations (where FSR is noted for softening flat textured surfaces compared to 100% resolution). Here, the large flat ground of the Dota map leaps off the screen, with 70% (image top right) and 80% scale FSR (centre right) offering almost equal perceived texture detail due to an aggressive sharpen that makes much of the very low contrast texturing pop more than its native resolution presentation. The basic upscaler (image left) shows how linearly interpolating between the fewer samples taken of the underlying texture at the lower internal resolution applies a blur that smears what soft detail is available at 100%, so that even 90% scale (image bottom left) is washed out. Moving to the leftmost image just above, even scaling FSR down to 50% (that is, only using a 1080p internal resolution, with no temporal reconstruction of any sort in this FXAA title) we see an impressive retention of perceived texture detail; even zoomed to 200% (quad pixels to retain original sharpness - this is the only image used that is not at original output pixel-scale) it only just makes clear the sharpening artefacts and some lack of genuine detail versus the 100% resolution original that rendered four times as many pixels. The grass texture detail and the dappling on the path in the top render are now more clearly absent in the bottom render, objects like the yellow flowers gain telltale dark halos, and the transparent texturing of the tree leaves is clearly losing its clean edge.

I applied some generic (non-AMD branded) image sharpening to some of the unsharpened sub-native resolution captures and a lot of this texture detail can absolutely be recovered by any competent basic algorithm, so I would avoid calling CAS a secret sauce, but it is at least doing the job required of it (working against the softening of a lower internal resolution) well enough without a major performance cost. I also pushed the mip bias values way out and took a few screenshots, which capture how FSR compares to native resolution on edge detail retention when all the inner texture detail is blurred away by much smaller mipmaps. Some of the fine edge detail is starting to visibly break down at FSR 75%, but lots of the wider edges are being extremely well retained - if rather darkened, like a pencil sketching over the edges - as long as the AA pass caught them. The strong sharpening is starting to grasp for detail that isn't there, causing mild posterisation in spots. The increased shadow/AO evident may be a side effect of the lowered internal resolution (or could be an interaction with the mip bias tweaking).

When we move to a closer camera in the rightmost image above, with more 3D elements requiring anti-aliasing, we continue to see this clear softening on edges, plus the enlarging and softening of spots where the FXAA has not sufficiently cleaned up an edge in the internal resolution render. In static screenshots, I find the soft edges with sharpened interior detail often work in favour of this technique, even if it can verge towards a dithered posterisation at points (even with textures left as intended). In motion, it inherits the issues of any MLAA technique: elements that cannot be sufficiently anti-aliased flicker enough to draw attention, and the soft upscale here ends up drawing added attention to them, not entirely unlike a more basic blur applied over the top of aliased edges (in fact, some of these captures catch artefacts very similar to the ones I noted when discussing the original release of No Man's Sky). Dota 2 will never be at the top of my list of rendering greats, and FSR can only do so much with what it is given (as we know, it is not designed to provide any anti-aliasing itself), but I was pleasantly surprised with how, looking at paused game replays, FSR significantly increased the framerate with only a mild increase in edge shimmer (when in motion) and virtually no softening of inner detail.

Unfortunately, I then looked at the framerate counter as I unpaused from taking screenshots of a frozen moment in time. My initial impression had been that FSR turned my modest GPU (by 2021 standards) into something capable of making a new generation of 4K144 gaming screens sing with this classic title, pushing up from the ~100fps at max settings it was previously limited to (in all three of the 100% captures I cropped and discussed above). FSR 50% was able to hit ~165fps, with 70% FSR giving about a 30% boost and 80% FSR a 15% boost at that exceptional image quality. But once my Ryzen 2700X had to process the extra load of running replays, which is more typical of actual gameplay, GPU utilisation dropped. Not at 100% scale, which sticks exactly where it was before, but even the basic upscaler at 80% drops from 150fps to 140fps and, more significantly, 50% FSR loses that 165fps for figures between 120-140fps. Higher internal resolution FSR squeezed in below that, so was barely paying for the overhead of the FSR pass over native res. As it affects the basic upscale too, this is clearly something common to not having enough GPU load at lower res, or some single-threaded weakness of the older Ryzen CPUs with Dota's workload. It's not a dealbreaker, but it's why I haven't embossed the paused-time framerates onto all of these clipped shots (they are all printed onto the originals so they're not hidden) to show how much framerate improves as image quality changes. Simply put, in actual motion the gains are not nearly as great as the first impression from static scenes suggested. I hope Valve continue to tweak this implementation (as an e-sport, I'm sure the engine is constantly being tuned to ensure it can hit the highest refresh rates on select machines) so it can saturate the GPU in motion.

My ideal implementation would allow the user to dial in a desired framerate, with Dota 2 dynamically changing the FSR factor to maintain a constant performance (as many console dynamic resolution implementations do, usually backed by a temporal component). The way FSR is implemented here, with a static percentage chosen and framerates changing based on how much is going on onscreen, seems like it would play best on a VRR/G-Sync display. Unfortunately, as you change the setting in real-time in the menus, the edge shimmer can be seen to "bubble" as the percentage scale changes. Although you can only see around the edge of the settings menu into the game itself, that was enough to make me think that the crawling edges of a dynamic FSR in Dota 2 would not be a good experience, at least unless some temporal solution was used to control the edges reshaping as internal resolutions moved around.
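A dial-in-a-framerate mode could be as simple as a proportional controller nudging the scale factor toward a frame-time target. A sketch with illustrative names, bounds, and gain (real dynamic resolution systems add smoothing and hysteresis to avoid exactly the "bubbling" described above):

```python
def next_scale(scale, frame_ms, target_ms,
               min_scale=0.5, max_scale=1.0, gain=0.05):
    """Proportional controller for a dynamic FSR/internal-resolution scale.

    Nudges the scale toward the frame-time target: over budget shrinks
    the internal resolution, headroom grows it, clamped to sane bounds.
    All names and the gain value are illustrative, not from any shipping
    implementation.
    """
    error = (target_ms - frame_ms) / target_ms  # +ve = headroom to spare
    scale += gain * error * scale               # proportional adjustment
    return min(max(scale, min_scale), max_scale)
```

Called once per frame with the measured frame time, this slowly trades internal resolution for a stable output rate; the trade-off noted above is that the moving scale makes edge aliasing crawl unless a temporal pass stabilises it.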

from TL: B-Q-UQ-100%CAS; P-75%-75%CAS-100%
from L: Bal, 75%CAS, Ultra-Qual, 100%

The Riftbreaker uses the four named FSR levels AMD have suggested but also offers a basic upscaler, adjustable in 25% increments, that allows CAS to be enabled - this appears visually quite similar to enabling FSR, presumably as the game implements the very latest revision of CAS, which is based on the same sharpening pass FSR uses. Those named levels are: Ultra-Quality (77%), Quality (67%), Balanced (59%), and Performance (50%). I would prefer more granular control (or even fixing a desired framerate with a dynamic internal resolution managed by the engine) but this gives us a few fixed points to focus on and compare to the fallback basic upscaler, and even to that upscaler with CAS applied. As mentioned earlier, The Riftbreaker uses TAA but not TAAU, so a basic upscale from 50% cannot recover all of the texel information via jitter (looking back four frames to map each pixel in the 4K output to four 1080p internal renders), unlike more advanced temporal solutions. (Four frames at 60fps is a remarkably short span of time, so even if you think motion vectors would need to be very good to recover the sub-pixel jittered texel reading, there are likely quite a lot of places where TAAU is basically sampling the same spot and doesn't even need great motion vectors.)
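For reference, the internal render sizes those named levels imply at 4K output can be computed directly (the scale applies per axis, so Performance at 50% renders a quarter of the pixels; exact rounding in a shipping implementation may differ from this sketch):

```python
# Per-axis scale factors for the four named FSR quality levels.
FSR_SCALES = {
    "Ultra-Quality": 0.77,
    "Quality": 0.67,
    "Balanced": 0.59,
    "Performance": 0.50,
}

def internal_resolution(out_w, out_h, mode):
    """Internal render target size for an FSR quality mode."""
    s = FSR_SCALES[mode]
    return round(out_w * s), round(out_h * s)
```

So at 3840x2160 output, Performance renders a 1920x1080 internal frame - which is why the per-frame pixel cost drops so much faster than the named percentages suggest.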

This lack of TAAU's recovery of static texture information is quickly apparent when comparing (left image above) the detailed ground texture as the game starts (as our mech basks in the scenery while given orders). The 100% render (bottom right) shows excellent fine grass texturing and the geometry edge detail indicates this TAA errs on the side of sharp with slight aliasing from bright glints unable to be completely cleaned up. This comes at the cost of only just beating the screen refresh, hitting 64fps in this least demanding scene (with the ray tracing effects switched off on this old GTX card). Applying CAS to this 100% native render (bottom left) does make everything pop that tiny bit extra but the overhead drops us 10% to 59fps.

Working up the left side of the image, we see quite a different choice (again, not user configurable) for the strength of the FSR sharpening (and how high contrast the texture work started out): FSR Ultra-Quality (77% scale) loses quite a lot of that sharply-authored ground detail (while Dota 2 at similar internal resolutions was competitive with native). There could also be a difference in AA solutions at play, as Dota 2 just gives FSR the lower res but otherwise barely touched texture detail, while TAA could be softening everything before FSR gets involved. The edge detail (eg mech & crystals) hints at the lower internal resolution where the TAA couldn't quite suppress artefacts even at native resolution, but is otherwise clean (compare the sword between all clipped captures). It looks good in motion and boosts us to 75fps. Above that, FSR Quality (67%) shows incremental softening and texture detail loss, but in motion (now 85fps) much of this is less apparent than in the direct comparison. At the very top left, Balanced (59%) is where the fine line detail starts to break into visible stair-stepping in screenshots and flickering in motion. 93fps also shows it's a point of slightly diminishing returns (although still far from CPU bottlenecked in this engine, which doesn't let you take screenshots when paused, so I avoided making a similar discovery to the one in Dota 2). Finally for FSR, at top right is Performance (50%), which is doing well given that it's only dealing with a 1080p internal resolution, but I'm not sure I'd play a game for extended periods looking like this: I'd rather scale back effects to avoid the shimmer that appears in motion and the lack of texture detail (wasting the pixel count of the screen) than chase that 105fps.

Moving down the right side of that image, we have the basic upscaler at 75% internal res, upper right. I would say this broadly shares elements of FSR Balanced and Quality - both of which use significantly fewer internal pixels to reach their final output. Everything seems a bit softer than it should be when surrounded by all these sharpened and native resolution alternatives, and the only real positive is the 88fps, which puts it somewhere between Balanced and Quality - perceptual quality lining up quite well with rendering cost rather than raw internal resolution. Finally, the lower right clip is CAS applied to the 75% basic upscaled option, and here we have an interesting comparison point: this is almost identical to Ultra-Quality in internal resolution, also enjoying a sharpening pass, and the only difference is the FSR upscaling (assuming the CAS option genuinely takes a different code path and so still uses the basic upscaler). I would suggest opening the full sized captures and flipping between them if you really want to assess the differences and why this runs at 80fps when UQ sat at a flat 75fps (with only a tiny increase in pixel count). To my eye, CAS on top of this 75% internal res basic upscale is visibly (if subtly) worse at dealing with edge detail. It's also slightly behind on bringing out that ground texture. Much better than the 75% without CAS, but also losing 10% performance to pay for the sharpening pass. The palm tree fringes, the detail both internal to surfaces and at their edges: I think UQ at 75fps shows that FSR is more than just the latest generation of CAS (CAS-Plus) and worth paying for on top of the existing CAS performance cost. It's not competing with native res, but then that's sitting at 64fps (and takes a big hit when things get more taxing).

The image above on the right compares four versions of the main base, from leftmost: Balanced, 75% basic upscale with CAS, Ultra-Quality, and 100% native (no CAS). The thin geometric detail quickly makes plain the difference in underlying internal resolution, and is why I like the idea of a next generation temporal solution that could, at least when the scene isn't moving too busily, have a good chance of recovering all this detail at a much lower per-frame rendering cost. There's nothing "wrong" with the middle two results (again, I think you can make out the difference between FSR and just CAS in how those thin edges are preserved) but they are clearly on a progression towards the leftmost option, which is starting to show breakup of fine detail into aliased blobs and mild posterisation of the texture detail.

75% CAS traditional shadows
Perf (50%) RT shadows Medium

Another way of looking at FSR is that it unlocks new quality settings at the same output resolution and framerate. Above, I managed to get RT shadows (at the lowest quality) enabled via the Performance profile and have compared it directly to the more primitive traditional shadows (RT does also use more dynamic lights, but these seem to mainly add cost when in the scene rather than in daytime with a single dominant lightsource) while using CAS to tweak a 75% internal resolution. Both scenes have more aliasing than I'd ideally like but the RT shadows rendered at 1080p (and not the more detailed quality setting), combined with the loss of texture detail, make the scene look significantly worse to my subjective evaluation. It is nice to be able to drop all the way down to 50% internal resolution (where a basic upscale would be significantly worse) but the trade-offs are not where I would go to try and unlock new effects, some of which need at least a bit more resolution than they are fed when picking low settings at low internal resolutions. Sometimes the best answer is new hardware after five years using something as your daily workhorse. And I'm left with an open question of whether that aliasing and softness could both be sorted out (and even unlock lower internal resolutions, without leaning on FSR) if an integrated jittering TAA with upscaler was offered - especially in scenes like the one above that contain a lot of stationary or slowly moving elements.

As I played through this beta of The Riftbreaker using a range of settings (and experiencing the quite different performance of different sections), I definitely appreciated being able to claw back performance with better image quality than the basic upscaler could provide on top of the mainly-clean TAA presentation. Right now, it offers the ability to at least look at the new ray tracing options at interactive framerates or to get much the same feel from UQ as from a native render, even if it doesn't quite look the same under detailed inspection. In motion, the blurring of marching ants wasn't ideal but it also softens the intensity of what would otherwise already have been a visible TAA failure. The sharpening here seems quite subtle and rarely adds extra artefacts worth noting. In fact, the main issue with dropping down the quality scale into the lower resolutions is my personal preference against the visual result of the FSR pass having to reconstruct a lot of data and producing slightly weird smoothing - fine in motion but something I'd like a VRS-like or temporal solution to spend extra rendering budget on, avoiding starving the image of crunchy detail when it might otherwise be available.

Dota 2, 100%
The Riftbreaker, 100%

In Conclusion

I have had some concerns over FidelityFX Super Resolution, including it holding somewhat of an unflattering mirror up to the two implementations we've explored today, but my summation is actually quite positive. As I've mentioned before, I've seen more than a couple of shipping sharpening and upscaling solutions that seem to actively work against the underlying renderer's quality. FSR here has performed admirably on two similar canvases (top down terrains filled with creeps) which use completely different engines (with different feature levels) and totally different anti-aliasing solutions. As internal resolution dropped, both showed increased shimmer but it seemed to be driven by underlying aliasing issues rather than a lack of temporal stability in the spatial-only FSR technique - my leading concern going into this. Beyond a certain point the internal resolution simply doesn't have enough information to avoid some slight weirdness (often mild posterisation) in how it recovers detail without using additional samples (like a history buffer), and I've seen plenty of worse examples than anything from FSR so far - DLSS 1.0 certainly had more than a bit of weirdness to it.

From my inspection, this is a good evolution of FidelityFX Contrast Adaptive Sharpening + Upscale and, especially if more developers give end users the power to tweak their own preference for sharpening strength within the bounds the developers consider reasonable, it offers performance without major sacrifices to image quality (until dropping far from the "Quality"-named end of the scale). And, as you can tweak which internal resolution FSR operates at, users can make very informed decisions about which subjective quality they are most interested in boosting. When GPU bottlenecked, the performance cost of FSR is more than reasonable, only slightly increasing the price of the latest CAS pass, and handily beats the blurred result of a basic upscale (when comparing at the same output resolution and framerate - i.e. the lower internal resolution needed to pay for the FSR pass more than pays for itself vs simply using the cheapest upscale option). The sharpening mainly adds local contrast where it improves detail while only mildly increasing the visibility of aliasing issues, which are actually just as much of an issue for the upscaling part of the process - often stretching them over more final pixels with somewhat of a blur and unable to reconstruct fine lines the internal resolution couldn't capture properly.

Should you integrate this into your hobby engine? We may have to wait on the source code release to see exactly how easy it is to integrate (I would guess: very easy) but if you've not currently got a good upscaling option and you're not looking at this to replace adding a good anti-aliasing solution (because it is not that) then FSR will definitely be easier than hooking up a complete TAAU solution (or DLSS 2) and tweaking the temporal jitteriness that they all seem to have early on. We will have to see how the next generation of TAAU and DLSS (or competing AI-enhanced anti-aliasing, upscaling, and sharpening algorithms) progress. In the long term, I think we will all join that future. Maybe by version 2.0 of FSR, there will be an optional temporal component that evolves what is possible if you can feed it a history buffer.

Sunday, 30 May 2021

Fewer Samples per Pixel per Frame

My VR roundup turned into a bit of an impromptu comparison between various anti-aliasing techniques inside one of the most challenging environments we currently have. VR restricts acceptable input-to-photons latency, so can limit pipeline/work buffer design; uses a relatively extreme field of view (inviting close inspection of pixel-scale details) combined with the ever-increasing raw pixel counts of screens; and demands more than 60 fps with good frame pacing. Add in lens distortion and a temporal reprojection emergency stage (to avoid dropped frames) and it means even without TAA, you’ve got distortion and potentially an extra reprojection stage exaggerating artefacts in the frames you do render.

I think we’re at another rather interesting point for anti-aliasing techniques, as demands for offline-render quality real-time graphics at high resolutions with fewer compromises (like screen-space effect artefacts), enabled via ray tracing acceleration, become mainstream. Per-pixel shader calculation costs are going to jump just as we saw during the adoption of HDR/physically-based materials and expensive screen-space approximations like real-time SSAO. Samples per pixel per frame may not be forced to drop as quickly as consoles jumping from targeting 1080p to targeting 4K but we are going to need some new magic to ensure a lack of very uncinematic aliasing and luckily it looks like we’re getting there.

Sampling History

It is 1994 and I’m playing Doom on my PC. The CRT is capable of displaying VGA’s 640x480 but due to colour palette limitations most DOS games ran 320x200 and Doom’s 3D area is widescreen aspect due to the status bar taking up the bottom area. To make matters worse, those of us without the processor required to software render 35 frames per second (Doom’s cap, half refresh for a VGA CRT’s 70Hz) would often shrink the 3D window to improve framerates. All of this was very common for earlier 3D games (I remember playing Quake 1 similarly two years later), which often had difficulties consistently staying in the “interactive framerate” category. For most it was a dream to output near the maximum displayable image while calculating an individual output value for every pixel of every scan-out, and that limitation was not primarily due to early framebuffer limitations.

It is 2004 and I’m playing Half-Life 2. Rapid advancement then convergence under a couple of API families for hardware acceleration meant most of the previous decade provided amazing 3D games that grew with hardware capabilities (even if many earlier examples contain somewhat arbitrary resolution limitations). Even 1998’s Half-Life 1 had quickly jumped past low resolution 3D consoles like the PS2. Super-sampling (SSAA), where every final pixel was internally rendered several times then blended (used extensively for offline rendering), was usually too expensive, especially as screen resolutions continued to increase (initially for 4:3 CRTs, then LCDs moving to 16:9). But by this point, it was standard to use MSAA to blend samples from different polygons that partially covered a single pixel (the saving being that if multiple coverage points were covered by the same triangle, the shader for the final value was only run once, unlike SSAA). Two years later, nVidia would introduce CSAA to allow more coverage sample points than cached values, making it even cheaper to provide very accurate blending between polygon edges. It was even possible to mix in SSAA for transparent textures, where the edge of the triangle is not where the aliasing happens. Note how those 2006 benchmarks are already showing PC games running at the equivalent of 1080p120 with limited MSAA or 60 fps with many, many samples per pixel.

It is 2014 and I’m playing the recent reboot of Tomb Raider. MSAA continued to get faster and better in the intervening decade but unfortunately the move to deferred rendering made it extremely difficult to implement efficiently in newer engines (it is not possible in Tomb Raider, although some deferred renderers did get hacked by nVidia drivers that injected MSAA at an acceptable performance cost). The answer to major aliasing, which had been developed during the Xbox 360 generation of consoles, was to run a morphological (MLAA) post-processing pass that looks for the high contrast shapes typical of aliased lines and then employs a blur to ease the sudden gradient. This technique requires very clear telltale aliased line segments, so smaller detail like foliage systems becomes a huge issue, which really stands out in the sequel, Rise of the Tomb Raider. It also completely fails if you apply the pass after doing some other image manipulation that distorts the telltale shapes or edge gradients.
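The morphological idea is simple enough to sketch. Below is a toy, hedged illustration in Python (`mlaa_like_pass` and the 0.5 threshold are my inventions, not any shipping implementation): find strong luminance discontinuities between neighbouring pixels and blend across them. Real MLAA goes much further, classifying edge shapes (L/Z/U patterns) to compute per-pixel coverage-based blend weights.

```python
THRESHOLD = 0.5  # contrast needed to count as an aliased edge (arbitrary choice)

def mlaa_like_pass(img):
    """Soften hard horizontal edges in a 2D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h - 1):
        for x in range(w):
            # strong contrast between row y and y+1 suggests an aliased edge
            if abs(img[y][x] - img[y + 1][x]) > THRESHOLD:
                blended = (img[y][x] + img[y + 1][x]) / 2
                out[y][x] = (img[y][x] + blended) / 2
                out[y + 1][x] = (img[y + 1][x] + blended) / 2
    return out

# A hard black/white boundary is eased into a gradient:
img = [[0.0, 0.0, 0.0],
       [1.0, 1.0, 1.0]]
print(mlaa_like_pass(img))  # rows move to 0.25 and 0.75
```

This also makes the failure mode in the text concrete: if some other image manipulation runs first and distorts the clean contrast step, nothing is detected and nothing gets smoothed.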

In this 2014 era, the use of HDR intermediate values later tonemapped down to the output range (which was just emerging after HL2) also means internal calculations can output a much wider range of values. With only one sample per triangle per pixel, a new sort of temporal aliasing becomes dominant as the sampled locations move enough for slightly different angles to graze incredibly bright light sources in sequential frames. Surfaces sparkle and flicker in regular patterns that become at least as distracting in motion as classic polygon edge aliasing, as I mention in my Dragon Age retrospective. A combination of the two aliasing types is easily recognisable where an angle creates a strong lighting highlight along the silhouette of a surface that may be less than a pixel wide, creating light ants crawling along those polygon edges which are too thin for MLAA to catch. A better solution was required. (And you may note the journey isn’t over as I just linked that to a trailer for a 2021 game with an engine that already uses...)

Temporal Accumulation

The problem is clear. By 2014 we are generally using one (complex) sample per pixel per frame and due to fine geometric detail (which older games lacked) plus an extreme range of possible lighting values (not to mention potential ordering issues in how various stages of calculating light and darkness components are blended) this is creating pixel-scale aliased elements that are also often not temporally stable. The screenshots look relatively good but in motion anyone with flicker-sensitivity is immediately distracted by aliasing. By this time the shaders have also become complex enough that various motion vectors (showing how far the object under each pixel has moved since the previous frame) are starting to be calculated to enable somewhat accurate motion blur to be added (very important on consoles targeting 30 fps, where this provides extra temporal information missing when not using higher framerate output - it’s also “more cinematic” because most people are used to 24 fps movies with a 180 degree shutter, accumulating all light that hits the lens for 1/48th of a second before closing the shutter for another 1/48th of a second).

Those motion vectors, if they are sufficiently accurate, can point to the pixel location of the object in the previous frame. So expensive effects like real-time ambient occlusion estimation (checking the local depth buffer around a pixel to see how occluded the point is by other geometry that would limit how much bounce lighting it would likely receive) become an area of experimentation for temporal accumulation buffers. Sample less in each frame, create a noisy estimation of the ground truth, and filter for stability while reprojecting each frame along the motion vectors. Here’s a good walkthrough blog from this time period and subsequent refinements have worked to deal with edge cases like an incremental buffer not handling geometry arriving from off-screen (causing some early examples to visibly, slowly darken geometry as it appeared along the edge of the screen).
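A minimal sketch of that accumulate-and-reproject loop, with hypothetical names (`accumulate`, `ALPHA`) standing in for what a real engine would do per pixel on the GPU:

```python
# Sketch of temporal accumulation for a noisy per-pixel effect (e.g. SSAO),
# assuming motion vectors give each pixel's previous-frame location.
# A small exponential blend keeps the result stable while new samples trickle in.
ALPHA = 0.1  # weight of the new noisy sample (assumed value; engines tune this)

def accumulate(history, noisy_frame, motion_vectors):
    h, w = len(noisy_frame), len(noisy_frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = motion_vectors[y][x]      # where this pixel was last frame
            px, py = x - dx, y - dy
            if 0 <= px < w and 0 <= py < h:    # history is valid: blend
                out[y][x] = (1 - ALPHA) * history[py][px] + ALPHA * noisy_frame[y][x]
            else:                              # arrived from off-screen: no history
                out[y][x] = noisy_frame[y][x]
    return out
```

The off-screen branch is exactly the edge case mentioned above: with no valid history, the new sample is used as-is rather than blending against garbage (or slowly darkening incoming geometry).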

As seen shipping in 2013’s Crysis 3, temporal accumulation for reducing aliasing not only answers MLAA’s limitations but can also operate after a cheap MLAA pass to rapidly reduce all aliasing. If you consider a slightly jittered pixel centre location (a common enhancement) then a static scene under TAA effectively generates SSAA-quality images, only spreading the samples per pixel out over time. It was popularised further by nVidia with their branding of the process as TXAA, shipping in games in 2014. Some early implementations had major ghosting issues from motion vector precision and from deciding when to reject a previous frame’s data as not contributing to the new location. The actual complexity of this problem becomes apparent when you consider how objects in a scene may have changing visibility (especially during motion and animation) or output values (consider a flickering light and the subsequent illumination between frames). Progress has not always been uniform and a couple of times I've stumbled upon an anti-aliasing fail state that's hard to even explain (Dishonored 2 doesn't have very satisfying TAA due to ghosting thin elements and I don't know what the MLAA is doing here to achieve what's visible in this capture). It is a process under constant refinement but in today’s best temporal accumulation implementations it is relatively rare to see obvious issues. As mentioned, it also errs on the side of a softer final frame so can be combined with a sharpening filter. Unfortunately this can be handled poorly, effectively paying the computational cost of TAA while also reintroducing exactly the obvious aliasing it was meant to remove. It also doesn’t help if your TAA implementation is broken on a platform.
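The jittered pixel centre mentioned above is commonly driven by a low-discrepancy sequence; a Halton (2, 3) sequence is a popular choice (this is a general sketch of the idea, not what Crysis 3 or TXAA specifically shipped):

```python
# Halton sequence: a well-spread, deterministic sequence in [0, 1) per base.
def halton(index, base):
    f, result = 1.0, 0.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

# Sub-pixel offsets in [-0.5, 0.5), applied to the projection matrix each frame:
jitter = [(halton(i, 2) - 0.5, halton(i, 3) - 0.5) for i in range(1, 9)]
print(jitter[0])  # first frame's (x, y) offset: roughly (0.0, -0.167)
```

In a static scene, cycling through these offsets while accumulating means each pixel eventually gets sampled at many well-spread sub-pixel locations, which is why the result converges towards SSAA quality over time.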

Ray Tracing with DLSS and The Future

In the last couple of years, the new hotness that really explodes the computational costs of working out a stable final value of each pixel in a frame of a modern game is real-time ray tracing. Thanks to nVidia looking to brand the future, they have shipped all RTX GPUs with dedicated silicon to accelerate BVH intersection tests and machine learning tensor operations (big matrix multiplies, often with sparse data) and at least the former part of that is now also available on current AMD GPUs and consoles plus upcoming Intel discrete GPUs. If you thought the aliasing issues from rasterisation going to physically-based materials and HDR were a concern, welcome to a problem so far beyond that that if you look at the underlying data from a single frame using around one sample per pixel, it looks more like white noise than a coherent scene - accumulation with temporally reliable motion vectors is a must and an area of ongoing research. The addition of Tensor cores to RTX GPUs was initially proposed as the place to run AI denoising on that ray tracing output, although most games today still denoise in the general purpose shaders. Luckily, another branch of research was to use those Tensor units to AI-accelerate all anti-aliasing and it has been wildly successful with many reviewers now noting that DLSS 2 outperforms native resolution TAA.

DLSS 1 was a bit of a mixed bag as the AI had to be trained on each game and took an aliased lower resolution image from the game then applied the classic AI Super Resolution techniques to “dream” or “hallucinate” the missing details and softened edges. However, DLSS 2 changed the inputs (this presentation originally convinced me AMD would add AI cores to RDNA2) and so required a buffer of previous low resolution input frames (including depth buffers and motion vectors) while removing the previous individual training requirement, effectively giving the AI the power of temporal accumulation information to generate the final output. So each new frame generated by the game can be run at a much lower resolution than the output, reducing the samples per output pixel, and yet will retain the look of a cleanly anti-aliased native resolution render. We are back to 1994 but rather than peering into a small box, the games look almost as good as offline rendering and output fullscreen. Even when not trained to give the exact same result as native processing, the AI seems to be quite stable and creates pleasing results in motion. It’s a game changer when targeting new screens that can accept 4K frames at or above 120Hz.

But nVidia do not have a monopoly on upscaling while anti-aliasing, and more significant upscaling without compromises will be the new normal if my reading of the tea leaves (on samples per pixel per frame) is correct. Reusing information from previous frames is clearly a smart efficiency saving as long as we can reliably determine what information is useful and what isn’t (avoiding failures that create significant artefacts which are as distracting as the aliasing we’re trying to move beyond or the framerate drops we’re trying to avoid). The target of 4K on the PS4 Pro forced engines to pivot to smart upscaling strategies such as the use of checkerboarding and a rotated tangram resolve in Horizon: Zero Dawn, reducing the GPU cost of each new frame by alternating which pixels in a checkerboard were rendered (then blending on the diagonals for that frame while adding in contributions from the previous frame). Recent years have seen excellent execution in targeting the fixed scan-out time of non-VRR displays by managing the rendering load through modifying the internal render resolution then upscaling for the final presentation (usually with native UI compositing over the top for maximum text clarity). Even when dynamic resolution scaling is not available on PC, it has forced renderers to provide visually pleasing upscaling that gracefully handles even fine texture transparency and pixel-wide polygon details.
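As a rough sketch of the checkerboard idea (deliberately simplified, with invented names; the real Horizon: Zero Dawn resolve is far more sophisticated, using rotated patterns and ID buffers for history rejection): render only half the pixels each frame, then fill the rest from this frame's rendered neighbours blended with the previous frame's value at that location.

```python
def resolve(rendered, previous, phase):
    """Fill the un-rendered half of a checkerboard frame.

    rendered: this frame's buffer (only pixels with (x + y) % 2 == phase are valid)
    previous: last frame's fully resolved buffer
    """
    h, w = len(rendered), len(rendered[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if (x + y) % 2 == phase:  # rendered this frame: keep as-is
                out[y][x] = rendered[y][x]
            else:  # blend adjacent rendered pixels with last frame's value here
                neigh = [rendered[ny][nx]
                         for ny, nx in ((y, x - 1), (y, x + 1), (y - 1, x), (y + 1, x))
                         if 0 <= ny < h and 0 <= nx < w]
                spatial = sum(neigh) / len(neigh)
                out[y][x] = (spatial + previous[y][x]) / 2
    return out
```

Alternating `phase` each frame means every pixel is freshly rendered every other frame, which is where the roughly-half GPU cost per frame comes from.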

The Medium, TAA 50%
The Medium, TAA 75%
The Medium, TAA 100%

The last few years of Unreal Engine 4 have had quite a clean TAA with integrated upscaler (sometimes called TAAU) for dynamic internal resolution (it tracks the sub-pixel jitter so the samples can be correctly distributed even when changing the ratio of internal res to output res; primarily used on consoles, where the APIs for precise frame time calculation and estimation have existed for longer and the fixed platform makes it easier to define an ideal internal resolution window for reliable results that still come close to maximising GPU throughput - the skill is not underutilising the GPU by being too conservative and so being ready for scan-out milliseconds before needed). In the best cases, I am completely happy to run UE4 around 80% resolution (just under 1800p) and let the TAA upscaler reconstruct a soft and clean final image on my 4K PC big screen (getting close to home cinema levels of consuming my vision, so making aliasing issues more apparent than for someone looking at a distant TV or small monitor). It doesn’t compete with DLSS (in Performance mode that is 50% resolution, so 1080p internal renders when the output is 4K) but then head to heads show DLSS 2 reaching close to image quality parity with UE4 TAA running at 100% internal resolution on PC, so clearly dropping down to 1800p still uses under 70% of the native sample count (the earlier percentages are per-axis, while sample count scales with area) and ensuring a relatively aliasing free result without AI will err on the side of softer than DLSS Perf. The above captures from The Medium show a clear quality loss at 50% while the differences at 75% are more subtle compared to native internal resolution. The captures from Man of Medan below are where I think TAA with some upscaling is showing quality levels that you would not even imagine possible in the MLAA era (especially noting these captures have significantly fewer samples per pixel per frame than those games from a decade ago).
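To make the per-axis vs per-area point concrete (resolution "percentages" are quoted per axis, while rendering cost tracks sample count, i.e. area):

```python
# A per-axis resolution scale squares when converted to a fraction of samples.
def sample_fraction(axis_scale):
    return axis_scale ** 2

print(round(sample_fraction(1800 / 2160), 3))  # 1800p on a 4K output: 0.694
print(sample_fraction(0.5))                    # DLSS Performance at 4K: 0.25
```

So an innocuous-sounding drop to 1800p already cuts the shaded samples to just under 70% of native, while DLSS Performance mode is working from only a quarter of the native sample count.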

Man of Medan, TAA 85%
Man of Medan, TAA 85%
Man of Medan, TAA 85%

With the public release of Unreal Engine 5’s beta shipping with default-enabled Temporal Super Resolution, we are looking at the beginning of non-AI (or at least not running on Tensor cores) TAA plus upscaling that aims to hit the same milestones as DLSS when it comes to low internal resolution. The PR for the UE5 release announces 1080p internal render resolution aiming to hit the quality bar of 4K native. That is an ambitious target and, running the editor (which also uses UE5 TSR by default), there is a lot to appreciate about this beta’s visual quality, well beyond the 50% screenshot above from UE4’s technique (and that was already significantly above some previous branded sharpen plus upscale techniques as implemented in shipping games). We are approaching a point where continued refinement of this path of research will be able to pick away at the final issues and retain detail without turning the results into a mess of sharpening halos or lingering aliasing. From there we have a far more interesting future in which some games will be able to explore the artistic choice to reject such smoothing, rather than falling into it via broken PC releases, or even take the performance wins of significant upscaling while tweaking the output to retain more of the underlying grainy component of ray tracing or other contributions (while adding noise to areas where it does not naturally occur), approaching something close to movie film grain that actually looks good and reduces render cost rather than slightly increasing it.

Edit (June 2021): This was written on the assumption that the imminent reveal of AMD's FidelityFX Super Resolution would confirm a very similar technique to UE5's Temporal Super Resolution, directly chasing after DLSS's impressive results at similarly low internal rendering resolutions (using fewer samples than checkerboarding and far fewer than where other TAA upscaling, such as in UE4, shines). It has since been announced that AMD are zagging where others have zigged and will not be using a temporal solution. Worryingly, this has come with rather weak results in the one pre-release promotional image used to sell the technology. As I mentioned above, DLSS 1 did not come out of the gates a winner so AMD have plenty of time to iterate or to provide an open equivalent that replicates what Epic are doing with UE5.

Friday, 30 April 2021

VR Review Roundup 1

Since last month, I've had some time enjoying my new PC VR setup [I also switched domain hosts so if anyone has had any problems with this site or any of my other subdomains, let me know]. I can definitely feel the growing room that exists for really pushing the fidelity of this VR display (and with displays only getting higher resolution from here, that will continue), so all of this is currently given with the caveat that we are fast-approaching the fifth birthday of my GPU - at some point I'm going to be able to get more games looking nicer or spend far less time viewing reprojected alternating frames, maybe even at the highest 144Hz refresh rate this headset can do. One of the difficulties I have in VR is being as analytical as I'd like while wrapped inside the virtual space, and the default tools for capturing moments are not raw grabs (into the actual deformed view fed to the headset) while adding actual DirectX frame capture into the rendering chain might mess with latencies etc (I've yet to look into it). Let's run down some notable things I've played recently and my perceptions of the rendering going on:

No Man's Sky

I just couldn't get this working correctly. Not sure if I'm still finding my "PC expert" legs on how to set things up correctly for VR but the flying-through-space loading screen (along with very unstable motion-to-photon delay) was enough to make me feel slightly unwell and the framerate once I'd landed on a planet simply wasn't where it needed to be (even after tweaking the Index scaling option well below the automatic value). Maybe I needed to poke more at the in-game settings or wipe my previous config file (from before VR was patched into the game) because aiming for 2D 4K60 and aiming for VR numbers are not remotely similar optimisation processes; perhaps the game isn't reading the SteamVR requested resolution correctly. It's important to not take this initial post as my final decree on modern PC VR (from the perspective of someone who previously has mainly been configuring console VR experiences) - it is still early days and I'm still finding my legs for tweaking VR games.


Star Wars: Squadrons

The scale is wrong. Digital Foundry noted something similar during their stream of Doom VR for PS VR last month and it's immediately very noticeable as soon as you get into this game. When sat down, the floor in the game is about at the level of your actual floor but everything is scaled as if you were standing up. If you do stand up and reset the position (your screen initially going to black as soon as you move out of the sweet spot it expects you to be in, avoiding letting you walk and clip through too much geometry) then the virtual floor is clearly at the wrong depth, as you might expect from a game designed only for seated play.

Getting into the game design choices, the cockpits may be accurate to the fictional universe but the often limited front-only (or slit window) view into space removes the field of view advantage of VR while the instruments feel insufficient to give a good sense of where things are around you (again, this may be true to the source material but I'd much rather they offer "upgraded" in-world interfaces rather than lean on an optional floating HUD). There was a later mission around setting off floating reactor cores as large ships passed them (while also skirmishing with fighters) and I realised I basically didn't have a good idea of the 3D space while playing for the majority of that mission. That seems like a failure to really utilise what VR can do. It didn't help to find plenty of threads of others swearing at that mission design, despite it being something that theoretically should be cool if it were easy to judge 3D relationships - ultimately I restarted the mission rather than keep banging my head against the third checkpoint, just so I could swap to a ship with a somewhat better canopy, and by then I had almost learned it by rote (fly here, shoot this, then fly there, shoot that at this timing, etc) so if that had involved a more dynamic setting then I might well have just given up.

The Frostbite temporal anti-aliasing is surprisingly good (considering FoV and pixel density requirements) here. Zero ghosting issues, even with the added difficulty of regular reprojected frames because I couldn't get a high res VR output at a stable frame time budget close to what you'd want, even with lots of settings dropped and including the new [lighting: Low] forward renderer mode that was patched in precisely to try and offer higher framerates for VR. The way fine detail starts to flicker out of reality at certain distances can become visible (even in only one eye at a time, for a real headache) but is generally very rare and it's a lot better than constant "army of ants" edge aliasing (especially how that interacts with the inverse lens distortion to become even more distracting than in 2D). As we get higher and higher res panels in VR, we will need to find a better solution than brute force (very high res super-sampled internal rendering) for cleaning object edges and DLSS or TAA (without introducing significant latency) seems like the future (not just for 2D). I was also recently playing Battlefield V (with settings trying to hit a stable 4K60) and the TAA there caused significantly more issues with thin objects fading out of existence, so something in the TAA used here (with a decent 'TAA sharpness' slider that doesn't default to way too much sharpening for anyone to want) felt like the best of what EA are doing.

I was definitely far more aware of polish issues than any aliasing flicker or eye discrepancy. Plenty of walk animations seemed to have not been sorted to actually plant feet on the ground so ended up with very obvious skating feet. By no means is this just a tick-box "IK enabled" fix but it's a lot closer to a solved problem, and slightly weird to see not working correctly, than it was a decade plus ago (when I was doing some light animation work for video games). Reflections on the black gloss floor of Imperial bases constantly showed cubemaps that were not well aligned to the static positions from which the player would be observing them (it seems like they could have generated enough static cubemaps, as you teleport between very few view locations with no free movement or room-scale VR due to the screen fading to black if you moved around).

SuperHot VR

Now here is where I wish I knew how to really prod a game (maybe one of the Unity tweaker/console tools could provide some aid). This style would be perfect for proper MSAA rather than mild super-sampling (which wastes shader perf on repeatedly sampling inside basically flat-shaded triangles while undersampling at the edges, where the sharp contrast demands the most samples). The game as shipped doesn't even seem to offer the post-AA (FXAA) that the non-VR games from this team have integrated. And you can't inject FXAA at the driver layer because the inverse warp for the lenses will remove the clean aliased lines that the morphological pass is looking for (assuming the nVidia driver doesn't detect VR titles and disable such tweaking entirely). While moving the Index scaling option clearly affected framerates, the aliasing never cleaned up significantly.

The game itself is still just as fun as it was when first released but, even with tweakable internal res on PC, it's still a long way short of where I would hope it could get to visually (and will likely not be getting any more updates that could add better anti-aliasing as the team are almost finished with the third game in the series, which is not in VR, and then probably moving on to new things). Even a lot of brute-force super-sampling will possibly only go so far towards fixing those incredibly sharp aliased edges accentuated by the game's style - something where you wonder if something in the pipeline explodes if you push beyond 8K rendering, so it'll never be viable even with GPUs several generations out. You can definitely get immersed in the experience and have it bother you slightly less over time (especially if you push up the refresh rate so you're getting more temporal data rather than letting aliased frames linger, something faster GPUs certainly help with) but quite a few games I've sampled seem to have decided that AA, even a cheap post-AA pass before distortion, isn't in their performance budget and I really think it's not paying off vs targeting a lower internal res with an AA method enabled. Of course, on PS VR you often had the combination of a low internal res and no AA so at least on PC things are always less bad.

Tetris Effect

All the games I'm talking about this month provide a contrast of different techniques and rendering challenges. I talked about this one on PS VR several years ago. It was one of the best games of 2018 and the multiplayer mode is a nice addition in 2021 (but not really why I come to Lumines-style games) so it's still great today. The fidelity here is clearly better than on PS VR, although I found that super-sampling can drag the framerate down (even with only a 90Hz target rather than pushing towards 144Hz) without ever making it feel like every sharp edge is anti-aliased (even in combination with the FXAA the game uses). Much of the amazing particle work doesn't need AA (despite the High setting defaulting to 150% super-sampling of the entire scene) and those semi-transparent particles probably cause major issues with trying to enable a turn-key AA solution, so it's a shame someone hasn't built a more bespoke solution that merges the different techniques each element of the scene needs while maximising performance (to hit the high framerates and native resolution this generation of VR headset demands).

As with Rez (which I have not yet tried on PC, waiting for a sale to buy a second copy for a second system), there is something I find deeply pleasing about the audio-visual combination here and the soundtrack brings out the in-built speakers on the Index when cranked all the way up. There is certainly not the same deep bass you'd get from a subwoofer (it would be interesting, if not ideal for those living in apartment complexes, to be able to feed the LFE channel to a separate audio device in a game that bothers to enable a second rumble device for additional haptic feedback) but it's not bad. This isn't tinny (which is always the fear when doing something like off-ear small speakers) and is at least as good as a quality set of in-ear canalphones, but with the potential here for better positional audio because it doesn't ever feel like the audio is originating from inside your head.

I couldn't get the Index controls working exactly how I wanted (anything linked to the right analogue stick is locked out, despite the VR mode not using the right stick for anything, so I couldn't rebind it; for some reason the individual buttons & pressable surfaces on the Index controllers did not all turn up in the menus, which seemed designed assuming Vive or Oculus layouts and even recommended plugging in an Xbox controller instead) and this is a bit of a recurring theme in games I have poked at. The Index controllers are a variation on the Vive designs, which changes the angle the game treats as "forward" but also shuffles the inputs around, so you're sometimes wondering exactly what the game is expecting when a Vive icon pops up. That likely won't get patched back into older VR games; hopefully Valve will provide free engineering time to help VR developers integrate Index prompts and defaults into their current or upcoming releases. This game is totally fine with a very old 360 controller (as long as you map things off the d-pad, because you can't drop and move Tetris pieces with a d-pad that poor at reading precise inputs) but I'd really like it to be pick-up-and-play with the Index controllers.

Half-Life: Alyx

And here we reach the culmination of a lot of VR work. Valve created an updated version of their engine and built the next entry in the Half-Life series around the development of a new headset and controller update from their earlier cooperation on the Vive ecosystem. That is the Index headset and controllers I'm currently using. This is exactly the game you expect from a developer with seemingly infinite money and time (but far fewer developers than studios that scaled out when AAA asset creation demanded it) to iterate on their previous design ethos: constant innovation during play. When I discussed Killzone: Shadow Fall in 2014, Half-Life 2 was the obvious title to compare it to when talking about combining narrative progression with first-person gameplay variety. And that's exactly what we get here in VR: a slow introduction of new tools and ways of interacting with the world that also gradually eases between several genres, from action to horror. Very early on, when you're introduced to the power of 'gravity gloves' to point at an object and pull it towards you, it becomes obvious: surely everyone should be doing this! That's the Valve magic: making something that feels like the only answer, something everyone else must adopt because it so cleanly solves a problem (you don't want to slowly walk over to physically pick up every little thing while trying to keep the action pace up as you move through a gaming environment).

On the technical side, this engine is doing exactly what all the early best-practice notes (which came from engineers pushing VR, like the team at Valve) said you should. Get back to forward rendering (use forward+ or similar clustered options if you want the many real-time light sources your deferred renderer was enabling at high framerates), go back to classic MSAA, and try to get a lot of pixels rendered while maintaining modern geometry and texture detail. Step back to less dynamic lighting if you have to - something HL2 was already excellent at mixing to hide just how much wasn't part of some unified real-time lighting solution. The end result is a very sharp image and something I fully expect to really sing on future hardware (both higher-res headsets than the Index and the future GPUs that can drive them at high resolution while hitting 144Hz native). The only thing that feels extremely outdated is the level loads between sections, something a level streaming solution could surely have completely alleviated.

As to how it looks on my older GPU driving the Index? At points it's a touch too sharp for me. The textures can crawl and alias a bit in spots and the edge anti-aliasing is good but not perfect. I'd prefer a softer output that manages to deal with shader aliasing, even if it might have more issues around transparencies and thin edges (here using super-sampling on texture transparency, the old classic that we don't see so much of in 2021 but really made the chainlink fences pop in 2005 in games like Half-Life 2). But beyond some mild criticisms, it holds together really well. That's why I think it'll work very well in the future (selling an entirely new generation of headsets on PC and presumably even console). Unlike some of the other games, I think you could pump up the internal res and maybe integrate some VRS or even DLSS to boost output resolution without linearly increasing GPU load (spending your fidelity more smartly with VRS Tier 2 or simply letting AI magic clean aliasing defects while chasing a fixed frame time with DLSS 2.1) and so remove those small criticisms without demanding a radically more powerful GPU.

Thursday, 4 March 2021

My Present/Presence in Virtual Reality

I did not get to experience the CRT-based early consumer VR hype, because that stuff basically failed to make it to market in any real sense and was at trade shows before my time covering the industry (let alone working somewhere that got sent dev kits). But I did enjoy the early success of consumer stereoscopic 3D gaming. I jumped into both the 1999 (Elsa Revelator) and 2008 (3D Vision) nVidia stereo ecosystems (first on a CRT and then on a high refresh rate LCD) and while the second push also came with some 3D movies (as cinema chains tried to find a new reason for people to spend too much going to the movies), the main draw for me was always interactive 3D experiences. As long as you kept your head still & tweaked the 3D settings, you could get an impressively convincing window into a miniature 3D world. Things just feel different when you can use your vergence to focus on different elements in the scene (because near & distant objects cannot both be in focus at the same time, you have to pick which appears double - although this tech does not simulate the soft focus that reality applies to out-of-focus depths) and have successfully trained yourself to suppress your accommodation-convergence reflex (current VR has the same limitation, but we're on the edge of consumer eye-tracking that could let renderers apply depth-of-field based on gaze).

When consumer VR was back in the "maybe this could work at consumer prices" stage of Kickstarting in 2013, I started paying attention. Take a high-resolution consumer phone screen, add some lenses, read from the okay-quality gyroscope/accelerometer package that phones now include, and you're just some better motion tracking away from a real VR setup. Without that tracking, there isn't quite enough precision for rotation and you've got a major issue with drift (which you can see on your phone, if you've ever tried to do something fancy with that sensor package), plus the accelerometers fall far short of what you need for sub-mm position tracking (as you'd expect from having to do a double integration from acceleration to velocity to position with no external validation).
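To make that concrete, here's a back-of-the-envelope sketch (my own illustrative numbers, not taken from any particular IMU datasheet) of why double integration is so unforgiving: a constant accelerometer bias b integrates twice into a position error of 0.5·b·t², growing quadratically with time.

```python
# Sketch: why accelerometer-only position tracking drifts.
# A constant sensor bias b, double-integrated (acceleration -> velocity
# -> position), produces a position error of 0.5 * b * t^2.
# The bias value below is illustrative, not from a real datasheet.

def position_error(bias_mps2: float, t_seconds: float) -> float:
    """Position error in metres from double-integrating a constant bias."""
    return 0.5 * bias_mps2 * t_seconds ** 2

# Even a tiny 0.01 m/s^2 bias blows past sub-mm tracking within a second:
err_1s = position_error(0.01, 1.0)    # 0.005 m = 5 mm after one second
err_10s = position_error(0.01, 10.0)  # 0.5 m after ten seconds
print(f"{err_1s * 1000:.1f} mm after 1 s, {err_10s:.2f} m after 10 s")
```

Without an external reference (a camera, or the lighthouse lasers discussed below) there's nothing to pull that error back towards zero, which is why every shipping headset fuses the IMU with some outside-the-headset signal.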

In 2014 I got the Oculus Rift DK2 to try out a few projects for myself. It fuses the sensor package readings with an external IR camera looking for IR LEDs on the headset. The low-persistence displays (you can't leave the image up until the next frame because the headset will be in motion and the smear can cause huge issues - I believe the good series of blogs on this by Michael Abrash all got purged from the Valve servers at some point last year but Archive.org remembers) offer up to 960x1080 PenTile per eye (half a 1080p screen, assuming the lenses go right up to overlapping views with your eye position in the headset) at a maximum of 75Hz. It's dev hardware, but it was only $350 and kinda works. The real issue for me was the PenTile pixel layout, which was a major thing for OLED phone panels at the time and meant each input pixel only got two colour elements rather than the three of RGB. To me, while we're often talking about the bandwidth limits of higher refresh rates and 4K displays or the GPU load of calculating each sub-pixel's value, effectively throwing away a full third of the information when it hits the display (because the red and blue channels are half resolution on the actual screen) seems like a waste. It also means the number of individual dots of light in the headset you're looking at is only twice the pixel count (skewing comparisons with RGB-layout panels in other devices). Some early consumer games played on the DK2, although I seriously doubt everything released in the last couple of years would work (even if you can accept the quality limitation) as I don't think the current SDK still supports the very early dev hardware.
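A quick bit of arithmetic (my own sketch, using the per-eye resolution quoted above) makes the PenTile bookkeeping concrete: two sub-pixels per logical pixel instead of RGB's three means a third of the rendered colour data never reaches a physical display element.

```python
# Sketch: sub-pixel arithmetic for the DK2's PenTile panel (illustrative).
# PenTile (RG/BG) gives each logical pixel 2 sub-pixels; a full RGB
# stripe would give 3, so the deficit is exactly one third.

width, height = 960, 1080            # DK2 per-eye resolution
pixels = width * height

pentile_subpixels = pixels * 2       # green full-res, red/blue half-res
rgb_equivalent = pixels * 3          # dots a full RGB stripe would have

fraction_dropped = 1 - pentile_subpixels / rgb_equivalent
print(f"{pentile_subpixels:,} dots per eye; "
      f"{fraction_dropped:.0%} of rendered channel data discarded")
```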

In 2016, I got a real consumer VR headset with PlayStation VR. $300 got you Sony's spin on their existing line of personal 3D viewers (which I'd always seen advertised as a way of watching a movie in a virtual cinema on a plane) and the big upgrade from my DK2 was an RGB OLED layout at the same resolution (so 50% more individual points of light from the sub-pixel count increase) and up to 120Hz. The camera used visible rather than IR light to track things and reused the PS3 motion controllers if you wanted to play something not designed to work with the motion-enabled DualShock 4 (default PS4) controller. The big setback: it was released around the time of the PS4 to PS4 Pro transition and most software was built assuming the rather paltry GPU inside the 2013 PS4 (which, even at release, was not a particularly high-end customised AMD part). An external box added support for virtual 3D audio and 2D pass-through to a TV (some games even made social experiences where the players on TV saw something completely different to the person in PSVR). A lot of games seemed to rely heavily on reprojection to double the effective framerate, and the tracking was not great - especially for the controllers, which were either actual PS3 motion controllers repurposed & never intended for exact tracking, or a standard controller likewise not originally designed for sub-mm tracking because it was just bringing forward the legacy support from the Sixaxis "we were fighting a haptics patent so couldn't include rumble in the PS3 controller so I guess have some motion sensors" controller.

In the before-pandemic times, I also had access to (but never had at home) the first commercial revisions of the Rift and HTC Vive. Both 2016 headsets, both 1080x1200-per-eye PenTile OLEDs (two actual panels, not one screen with lenses aiming to almost overlap) at 90Hz. At two and a half million sub-pixel elements per eye that's actually a lower dot density than the PSVR (about three million) but the advantage is that everything expects a higher resolution, and modern PCs can really drive those rendered pixel counts up (even with anti-aliasing) as the GeForce 10 Series was out by 2016. The Vive is interesting because it doesn't use a traditional camera for drift correction or sensor fusion; rotating lasers in base stations provide a moving slice of light for objects to orient & position themselves within (synchronising with a wide IR pulse to know when in the rotation the laser hit them). For the last year of lockdown, I've had no access to this kit and I'd not really used it for a year before that. So I've basically been PSVR-only, and while the exclusives have been good, stuff like Resident Evil 7 sure does seem like it'd be better if it wasn't tied to that console GPU. Both PC headsets have since been superseded by higher-spec updates but I've not seen anything of them up to this point.
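Putting that dot-count comparison into numbers (a back-of-the-envelope sketch based on the per-eye resolutions and sub-pixel layouts quoted in these posts):

```python
# Sketch: per-eye light-emitting dot counts for the headsets compared
# above (PenTile = 2 sub-pixels per pixel, RGB stripe = 3).

def subpixels_per_eye(width: int, height: int, per_pixel: int) -> int:
    return width * height * per_pixel

psvr = subpixels_per_eye(960, 1080, 3)        # RGB OLED
vive_rift = subpixels_per_eye(1080, 1200, 2)  # PenTile OLED

# Despite the higher logical pixel count, the PenTile panels emit fewer dots:
print(f"PSVR: {psvr:,} dots per eye vs Rift/Vive: {vive_rift:,}")
```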

That is, until a week ago. Thanks to the incredible generosity of someone reaching out and offering to ship me their Valve Index VR kit, I now have a modern PC VR headset at home. The Vive was co-developed by Valve, so they decided to take the lead in 2019 and release their own branded kit: the same base station tracking tech, here paired with a headset that offers 1440x1600 per eye RGB from 80Hz to 144Hz (and a somewhat higher field of view than any of the other kit I've used). The audio uses "ultra near-field" off-ear speakers, which sound surprisingly good (considering I normally use in-ear or closed headphones, which provide good isolation) and don't block out sound from the outside world (otherwise it can feel like you're extremely vulnerable when immersed in the presence of VR). I'm glad I can stand in a quiet room and get all the benefit of off-ear sound (you don't need to simulate the distortion of your ear shape because that process still happens naturally - preventing the sound from feeling like it comes "from inside your head") and it continues to be immersive.

The other huge update is the controllers. My limited time with the Vive was spent using their motion controllers (a lot of my time with the Rift & PSVR was on traditional wireless gamepads) and the Index controllers are certainly a refinement of that basic idea, but rather than holding onto two sub-mm tracked devices, these strap to your palms so you can entirely let go. The importance of precise tracking can be seen in how you put on the VR kit: with PSVR you need to know where the controller is before you put on the headset; with an Index the controllers need to be switched on, but once you put on the headset you can easily walk over to the controllers and put them on using their 3D-rendered virtual versions. I'm almost ready for the future where we go into VR by putting on gloves. Yes, you'll never beat the haptics of a real button press or trigger pull but, for a lot of VR experiences, having some virtual hands is all you need. This has opened my eyes to how VR gaming isn't just traditional gaming with fully-immersive environments and extra input from head tracking. With the next generation of devices, gaze tracking should provide even more efficient rendering (only render the highest resolution where you're looking) but also entirely new interfaces controlled with a look and a hand gesture.

Up next (after maybe a couple more weeks of dipping into all the PC VR experiences I've been missing out on): what are my actual impressions of playing various things?