Tuesday 24 March 2020

Next Gen: Feeding the Beast

No one attended GDC this year due to the global pandemic (I considered pausing this blog for a few months during the crisis but decided that having more things to read during world events wasn't a bad thing right now), but we still got the technical deep dives on the architecture of the new 4K consoles from both Microsoft and Sony. (I will try to avoid simply repeating what both architects said.)

I will not, sight unseen, be telling you which numbers are slightly bigger and so which console is definitely going to dominate the next five to seven years (plenty of people with much larger readerships seem to have covered that already). I've not been read in on any next-gen hardware, so nothing I'm about to say hints at information not already made public. But the way these specs are being sold is already somewhat interesting, especially as we think about the divergence between a totally open PC platform (assuming you've got the billions to buy past the patent lockouts on various standards) and the closed box of a console, after a generation where the custom silicon inside each high-end console was ultimately less distinctive than in previous generations (even the one where MS asked for a Pentium III on an nForce chipset with a GeForce GPU).

Getting from There to Here

Moving from AMD's tablet CPU cores to a modern Zen2 desktop architecture and a GPU upgrade to stay contemporary with current PC GPUs targeting 4K screens was widely expected and confirmed early on during this generation transition. But you have to feed the beast and for games that often means juggling a huge array of large assets with only a semi-predictable pattern of demand (based on player action). The dual issues are having the time to load the data you need to render in the near future and the processing power to run the logic that decides which assets to prioritise & actually push the commands that move data around (the latter becomes far more pressing as bandwidth increases).
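To make that juggling concrete, here's a toy sketch (in Python, with entirely made-up asset names and positions) of the kind of prioritisation logic that decides what to stream next - real engines use far richer heuristics than raw distance to the player, but the shape of the problem is the same:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class StreamRequest:
    priority: float                       # lower = more urgent
    asset_id: str = field(compare=False)  # excluded from ordering

def build_queue(assets, player_pos):
    """Order outstanding loads by a simple urgency heuristic:
    distance from the player, so nearby assets stream first."""
    heap = []
    for asset_id, pos in assets.items():
        dist = sum((a - b) ** 2 for a, b in zip(pos, player_pos)) ** 0.5
        heapq.heappush(heap, StreamRequest(dist, asset_id))
    return heap

# Hypothetical world: three assets at different distances
queue = build_queue(
    {"rock": (1.0, 0.0), "castle": (40.0, 3.0), "npc": (2.5, 1.0)},
    player_pos=(0.0, 0.0),
)
order = [heapq.heappop(queue).asset_id for _ in range(len(queue))]
print(order)  # nearest first: ['rock', 'npc', 'castle']
```

The interesting engineering is in the priority function (predicted camera movement, visibility, mip level already resident), not the queue itself.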

A major theme from both console architects is load times and the potential for radically faster storage now that NVMe SSDs are the expectation (and luckily PCIe is just ticking over to doubled bandwidth, which removes that bottleneck from controllers with a wide pool of flash chips to parallelise over). MS are branding their upgrades the Xbox Velocity Architecture with a new DirectX API (which will also be available on PC) called DirectStorage. Sony have gone into detail about their custom silicon and decisions to enable SSD throughput beyond even the most premium PC tech.

It's not just about having a fast connection; you need to be able to drive data to where it needs to go and keep everything catalogued. Sony in particular are focusing on taking the fastest PCIe 4.0 drive they can and pairing it with a silicon implementation of RAD Game Tools' current compression (use less drive space, and get even more data into RAM for the SSD bandwidth by expanding it at the other end of the connection) and classic DMA silicon (don't waste CPU cycles getting your main processor to direct data around; use dedicated silicon to do it - a trick as old as time) with a few tweaks to the formula (I'm looking forward to hearing more about the GPU cache scrubbers in practical talks once games get released).
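As a back-of-envelope illustration of why decompression silicon matters: the SSD delivers compressed bytes, so effective bandwidth into RAM is the raw link speed multiplied by the compression ratio. Both figures below are assumptions for illustration, not official specs:

```python
# Hypothetical numbers: effective read bandwidth when a hardware
# decompressor sits between the SSD and RAM.
raw_bandwidth_gbs = 5.5   # assumed raw read speed of a PCIe 4.0 x4 drive
compression_ratio = 1.6   # assumed typical ratio for mixed game assets

effective_gbs = raw_bandwidth_gbs * compression_ratio
print(f"{effective_gbs:.1f} GB/s delivered to RAM")  # 8.8 GB/s
```

The same multiplier also shrinks the install footprint, which is why spending die area on a better codec pays off twice.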

PC game designers are going to have to get very clever to match the game experience of these new consoles without access to all these tricks - expect significant increases in RAM requirements for some AAA ports, which will have to use system RAM as a large cache for data that consoles stream into their (more limited) unified memory as needed. Will we see high settings with 4K textures on PC ask for at least 32GB of RAM and even 16GB of VRAM? I wouldn't say it's not going to happen within two years. Those 100-200GB installs are not going away (although Sony did make the salient point that SSDs remove the need to duplicate data to reduce seek times - we can claw some of that excess back and redirect it into even more detailed textures).
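A sketch of what that PC-side caching might look like - a minimal LRU over system RAM (hypothetical sizes and asset names; real engines track residency at mip/tile granularity rather than whole assets):

```python
from collections import OrderedDict

class SystemRamCache:
    """Minimal LRU sketch: keep recently streamed assets in system RAM
    so the slower SSD/HDD isn't hit again before upload to VRAM."""
    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.entries = OrderedDict()   # asset_id -> size_mb

    def fetch(self, asset_id, size_mb):
        if asset_id in self.entries:               # cache hit
            self.entries.move_to_end(asset_id)     # mark most recent
            return "hit"
        # Evict least-recently-used entries until the new asset fits
        while self.used_mb + size_mb > self.capacity_mb and self.entries:
            _, evicted_size = self.entries.popitem(last=False)
            self.used_mb -= evicted_size
        self.entries[asset_id] = size_mb
        self.used_mb += size_mb
        return "miss"

cache = SystemRamCache(capacity_mb=100)
print(cache.fetch("tex_a", 60))  # miss
print(cache.fetch("tex_b", 60))  # miss (evicts tex_a)
print(cache.fetch("tex_a", 60))  # miss again: tex_a was evicted
```

The thrashing in that last line is exactly why a bigger cache (more system RAM) is the blunt instrument PC ports reach for.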

What new game experiences? Elevators and slow-opening doors everywhere are not an unconstrained design choice; the length of the walk through the between-realms dimension used for fast travel in the recent God of War wasn't required by the no-cuts design decision. It's all about being unable to load enough data for clean, fast transitions, and even with the much lower quality (smaller) textures of the PS360 generation, you saw a lot of the most detailed textures available slowly fading in (initially associated with the Unreal engine but an issue for many engines that wanted to push RAM limits or just spend less time sitting on a load screen). The plan going forward is to load extremely detailed assets just-in-time via massive sustained read bandwidth from the SSD, so players never see anything less than the most detailed option close enough to the camera to differentiate. Mesh/Primitive Shaders are also going to help here, making the triangles that make up a scene more dynamic in games that step away from the old static pipeline (think the advertising blurb for Tessellation, only this time it really works, and for far more scenarios).
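The just-in-time idea largely boils down to picking a detail level from camera distance and streaming it before the player gets close enough to tell. A toy version (thresholds and mip counts invented for illustration):

```python
import math

def select_mip(distance, base_distance=10.0, mip_count=5):
    """Pick a texture mip level from camera distance: full detail
    (mip 0) inside base_distance, dropping one mip (half resolution)
    each time the distance doubles. A toy stand-in for real heuristics."""
    if distance <= base_distance:
        return 0
    mip = int(math.log2(distance / base_distance)) + 1
    return min(mip, mip_count - 1)   # clamp to coarsest available mip

print(select_mip(5.0))    # 0: close, full detail
print(select_mip(25.0))   # 2: between 2x and 4x base distance
print(select_mip(500.0))  # 4: clamped to coarsest mip
```

The generational change isn't this function - it's that the SSD can now service the mip-0 request fast enough that the player never catches the placeholder.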

This also opens up new possibilities beyond just avoiding load screens, or avoiding design choices aimed at invisibly masking loads in more open level designs (rather than very constrained levels or low/repeating texture scenarios). Things can move faster without causing the dreaded texture quality drop/pop-in. That's not just about allowing high speed open world driving games to up the detail of each street or letting Spider-Man swing faster. Each camera cut is also a very rapid movement to a new position.

You'll be familiar with pre-rendered video that uses the in-engine assets being inserted into otherwise real-time rendered game cut-scenes. It's often very obvious when the PC technology moves forward but the assets are still from an old console release (a GTX 760 rendering next to video captured on a 360). There are many reasons for it, like being able to do some offline post-processing (less relevant now real-time shaders are so powerful) or showing scenes or animations that you don't need to ship as assets (or hand-tweaked animations that don't fit into the shipped animation rig). A big reason: jump cuts to a different area, i.e. places for which you don't have assets already loaded into memory unless you've got lots of extra RAM sitting idle, are a right pain - once you start looking for it, it's clear how often directors avoid such cuts, especially several in a row. Well, now you don't have to worry about that issue, so expect real-time cut-scenes to become a lot more dynamic in cutting between different locations (or different areas within a large scene), in a way far more similar to other media.

Going back to more technical details: to put the four-lane PCIe 4.0 SSD connection in context, that's half the massive bandwidth of the sixteen-lane link most PC GPUs have used for the last decade, down which textures stream to local VRAM - and many, many times faster than the fastest rotating platter HDDs. For Sony to go beyond that via cutting-edge compression (spending silicon on something better than the compression currently typical for GPUs, whether texture compression or delta compression) is very exciting, and MS are no slouch; they're just not betting everything on it needing to be as bleeding edge.
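The arithmetic behind that comparison, using the published per-lane PCIe rates (the HDD figure is a generous assumption for a fast drive):

```python
# Approximate per-lane throughput after 128b/130b encoding overhead (GB/s)
PCIE3_LANE = 0.985   # 8 GT/s per lane (PCIe 3.0)
PCIE4_LANE = 1.969   # 16 GT/s per lane (PCIe 4.0)

ssd_link = PCIE4_LANE * 4      # NVMe SSD: four PCIe 4.0 lanes
gpu_link = PCIE3_LANE * 16     # typical PC GPU: sixteen PCIe 3.0 lanes
hdd_gbs = 0.25                 # assumed sequential read of a fast HDD

print(f"SSD link:   {ssd_link:.1f} GB/s")         # ~7.9
print(f"GPU link:   {gpu_link:.1f} GB/s")         # ~15.8
print(f"SSD vs GPU: {ssd_link / gpu_link:.2f}x")  # ~0.50, i.e. half
print(f"SSD vs HDD: ~{ssd_link / hdd_gbs:.0f}x faster")
```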

Doing Something New when You're Here

So we have 16GB of unified fast RAM (with a bit of differentiation from MS, who pair a very fast 10GB block with a slower 6GB block mainly eaten by the OS, while Sony have made all of the RAM mid-fast). We're feeding it via extremely nice storage and custom silicon that avoids spending all our CPU cycles on memory transfers or decompression algorithms. What about the actual rendering features?

We know a bit more about the MS side for the GPU, because they also did a point update to DirectX 12 and unified the console and PC APIs for the XSX. The new DX12_2 (Ultimate) updates the feature level (the things a GPU has to support) to basically be "almost all the shiny things nVidia have been shouting about since 2018's Turing RTX cards". This is actually a real trick for nVidia, who get to claim leadership of the GPU space despite not winning either design for the new 4K consoles (MS and Sony both seemed happy with AMD, plus they really wanted to stay on x86 while updating the CPU cores - something nVidia took piles of money from Intel to agree they cannot offer to customers wanting a custom SoC design).

AMD are making their custom RDNA2-derived GPU for MS to basically boost their feature set up close to where Turing has been for a while. Mesh Shaders (not even called Primitive Shaders - the ill-fated Vega tech AMD wanted to use to fix the old shader pipeline structure) as mentioned above; ray tracing via hardware BVH traversal acceleration (Turing's RT Cores); variable rate shading (shading at rates other than the native pixel count in parts of a scene); and sampler feedback (clever tools for making sure you only keep the texture data in RAM that the scene actually needs and no more) - it's all here, and what we know of Sony's custom RDNA2 GPU is very similar. Sony call theirs Primitive Shaders, but who knows: AMD are clearly building something compatible with the common Mesh Shaders plan for both MS and their future PC cards, and I don't see why Sony wouldn't have signed up for that, considering the first shot at AMD's own Primitive Shaders never got enabled. Ultimately the two are branding quite similar ideas, but I think AMD are sanding away any differences to make it more like Turing rather than nVidia having to pivot at all.
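Variable rate shading is the easiest of those to sketch: run the expensive shader once per tile instead of once per pixel and broadcast the result. A toy model (real hardware supports per-draw and per-tile rate maps, not just a uniform coarsest 2x2 rate as here):

```python
def shade(x, y):
    """Stand-in pixel shader: each call represents one invocation."""
    return (x * 31 + y * 17) % 256

def shade_vrs_2x2(width, height):
    """Coarse VRS: shade once per 2x2 tile and broadcast the colour
    to all four pixels, quartering the shading work."""
    image = [[0] * width for _ in range(height)]
    invocations = 0
    for ty in range(0, height, 2):
        for tx in range(0, width, 2):
            colour = shade(tx, ty)
            invocations += 1
            for y in range(ty, min(ty + 2, height)):
                for x in range(tx, min(tx + 2, width)):
                    image[y][x] = colour
    return image, invocations

_, calls = shade_vrs_2x2(8, 8)
print(calls)  # 16 shader invocations instead of 64
```

The win is spending that saved shading time where the eye will notice - rate up the centre of the screen, rate down motion-blurred or peripheral regions.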

Something I want to focus on here is what neither architect has said is being matched vs Turing and what isn't in the DX12_2 feature level: Tensor cores. The (usually low precision) AI inference maths that enables a lot of interesting ideas like computational photography (Google Night Sight), nVidia DLSS (AI upscaling), and much more (often in an offline context); especially relevant for games now we're talking ray tracing: high quality denoising. MS gave the PR talk about their XSX having 97 TOPS of DirectML Int4 performance, but if you look into the rapid packed maths blurb then that's almost certainly just saying their normal shader cores can do eight Int4 ops as a SIMD-y alternative to a single FP32 op (the same 32 bits of data). This explains why that figure is so much lower than the RTX GPUs, which max out at 455 TOPS (using dedicated silicon).
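You can sanity-check that reading of the figure yourself - take the publicly quoted FP32 throughput and multiply by eight:

```python
# If each FP32 lane can instead execute eight packed Int4 ops per
# clock, peak Int4 TOPS is simply 8x the FP32 TFLOPS figure.
fp32_tflops = 12.15   # quoted XSX peak: 52 CUs * 64 lanes * 2 (FMA) * 1.825 GHz
int4_tops = fp32_tflops * 8

print(f"{int4_tops:.0f} TOPS")  # ~97, matching MS's DirectML figure
```

That the numbers line up so neatly is the tell: it's the same shader ALUs counted at a narrower precision, not extra dedicated silicon.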

Sony have also not talked about any AI cores, and it sounds a bit like their extra SIMD/vector units for virtual 3D audio (they noted their old Cell SPEs were basically ideal for the complex audio processing needed for things like transfer function work) are going to be how they offer non-shader-core acceleration of other computational demands. I spent a decent amount of last year working on a project around virtual 3D audio. I'll say this much: what I got working before the company folded made me a believer (using nice stereo headphones), and I used to be adamant that you needed all 5 speakers, going way back to the nForce/original Xbox days (when real-time Dolby encoding finally made it easy to pipe 5.1 audio out of games with one TOSlink wire per device, and real-time spatial game audio was getting ok). I'm very glad Sony are leaning heavily on this and also that their foray into VR isn't being completely forgotten. Ray tracing for sound propagation through a level plus good virtual 3D with hundreds of localised spatial sources will make for something unlike what we're used to hearing. When this tech works, it really works (and Sony aren't hiding that it doesn't yet always work, so we've got to figure it out).
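For a flavour of what virtual 3D audio builds from, here's a crude sketch of the two cheapest binaural cues - interaural time and level differences. This is emphatically not a real HRTF (those are measured per-ear filter banks), and the head radius is a textbook-average assumption:

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, air at room temperature
HEAD_RADIUS = 0.0875     # m, assumed average head radius

def binaural_cues(azimuth_rad):
    """Interaural time difference (Woodworth-style approximation)
    and a toy level difference for a source at the given azimuth.
    Real HRTF processing replaces this with per-ear convolution."""
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (
        azimuth_rad + math.sin(azimuth_rad))
    # Toy ILD: far ear attenuates as the source pans to one side
    far_ear_gain = 0.5 * (1.0 + math.cos(azimuth_rad))
    return itd, far_ear_gain

itd, gain = binaural_cues(math.pi / 2)   # source directly to one side
print(f"ITD {itd * 1e6:.0f} us, far-ear gain {gain:.2f}")
```

The ITD for a hard-panned source comes out in the 600-700 microsecond range, which matches what you'd expect for human heads - and doing this, plus frequency-dependent filtering, per sample for hundreds of sources is exactly the embarrassingly SIMD-friendly workload those vector units are for.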

I'm eager to get more details on these systems (how much are they holding back for later reveals?) and then to see what people start to do with this power. I wish we'd heard about tensor cores, because I think there's something rather interesting about the potential there that we'll possibly not see if only nVidia push them (and they may slowly cut down the die area spent on a feature Intel and AMD don't match). Just because neither console has announced it doesn't mean AMD aren't going to offer it on their (non-custom) RDNA2 PC cards, but inclusion in a console would definitely push adoption and experimentation.

These new systems seem like very smart steps forward, mainly matching new feature for new feature (with a slight difference in focus on just how fast each feature runs) while not being "a PC but a fixed platform". Interesting times for cross-platform games and for working out how to do things right without underutilising silicon when it is available.