
bcatwilly

Member
Oct 27, 2017
2,483
The custom hardware in the Xbox Series X to support artificial intelligence (AI) through machine learning hasn't gotten enough attention in my opinion, particularly because Sony hasn't yet cited anything similar regarding AI support in its hardware. We can all agree that more pixels don't change how games play or are designed, but this type of technology can really impact various aspects of visual quality, as well as things like smarter NPCs. One real-world example of integration into the rendering pipeline would be a smart upscale like the Forza Horizon 3 super resolution demo at GDC last year: fewer pixels have to be rendered by the GPU, which means more of the 12 TF in the Series X can be spent on graphical effects or ray tracing, for example.

Here is the technical glossary reference for their Series X hardware support of the DirectML (Direct Machine Learning) API - https://news.xbox.com/en-us/2020/03/16/xbox-series-x-glossary/
DirectML
– Xbox Series X supports Machine Learning for games with DirectML, a component of DirectX. DirectML leverages unprecedented hardware performance in a console, benefiting from over 24 TFLOPS of 16-bit float performance and over 97 TOPS (trillion operations per second) of 4-bit integer performance on Xbox Series X. Machine Learning can improve a wide range of areas, such as making NPCs much smarter, providing vastly more lifelike animation, and greatly improving visual quality.

The really exciting information comes from the Eurogamer full Xbox Series X specs reveal (https://www.eurogamer.net/articles/digitalfoundry-2020-inside-xbox-series-x-full-specs), where the Microsoft engineers talked further about the custom hardware support they developed with AMD for machine learning applications. The GDC talk referenced later mentions how the Nvidia Turing RTX cards have tensor cores that provide FP16 support to improve machine learning performance; Microsoft went further than that in their custom hardware.
Machine learning is a feature we've discussed in the past, most notably with Nvidia's Turing architecture and the firm's DLSS AI upscaling. The RDNA 2 architecture used in Series X does not have tensor core equivalents, but Microsoft and AMD have come up with a novel, efficient solution based on the standard shader cores. With over 12 teraflops of FP32 compute, RDNA 2 also allows for double that with FP16 (yes, rapid-packed math is back). However, machine learning workloads often use much lower precision than that, so the RDNA 2 shaders were adapted still further.

"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

This is a Microsoft talk at GDC last year first introducing DirectML, which will now be supported directly by the hardware in the Series X. The entire talk is interesting if you are into API details, but the Forza Horizon 3 super resolution demo showing an upscale from 540p to 1080p at the 23:44 mark is particularly interesting, and the Unity demo at 37:26 shows how they used machine learning to drive animation in their demo based on the physics at play.
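And just to show why that 540p-to-1080p demo matters for the GPU budget, here's a trivial pixel-count sketch (it ignores the cost of running the network itself, so treat it as an upper bound):

```python
# Rough saving from rendering at 540p and ML-upscaling to 1080p.
# Ignores the cost of the upscaling network itself, so this is an upper
# bound on how much GPU time gets freed up for effects or ray tracing.

native_1080p = 1920 * 1080   # pixels shaded at native resolution
rendered_540p = 960 * 540    # pixels actually shaded before the upscale

print(f"1080p pixels:  {native_1080p:,}")                               # 2,073,600
print(f"540p pixels:   {rendered_540p:,}")                              # 518,400
print(f"Shading work:  {rendered_540p / native_1080p:.0%} of native")   # 25%
```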

 

mugurumakensei

Elizabeth, I’m coming to join you!
Member
Oct 25, 2017
11,330
I wonder if Microsoft will use this to create a solution similar to DLSS with low overhead.
 

Talus

Banned
Dec 9, 2017
1,386
It's such cool technology. Glad to see MS has some hardware capability in the Series X.
 

DukeBlueBall

Banned
Oct 27, 2017
9,059
Seattle, WA
If you recall what Phil said last year about Azure subsidizing XSX silicon:

P3 said:
The thing that's interesting for us as we roll forward, is we're actually designing our next-gen silicon in such a way that it works great for playing games in the cloud, and also works very well for machine learning and other non-entertainment workloads. As a company like Microsoft, we can dual-purpose the silicon that we're putting in.

We have a consumer use for that silicon, and we have enterprise use for those blades as well. It all in our space around driving down the cost to serve. Your cost to serve is made up by two things, how much was the hardware, and how much time does that hardware monetize.

So if we can monetize that hardware over more cycles in the 24 hours through game streaming and other things that need CPU and GPU in the cloud, we will drive down the cost to serve in our services. So the design as we move forward is done hand-in-hand with the Azure silicon team, and I think that creates a real competitive advantage.

Essentially, the gist is that the BOM going into Xbox consoles, at least on the high-end side, will be a lot higher as they'll be used all over MS for high-performance applications.
 

space_nut

Member
Oct 28, 2017
3,306
NJ
We are going to see some impressive rendering achievements on the XSX, especially if devs can use AI resolution boosting and put the spare power toward cranking up the graphics even more.
 

tapedeck

Member
Oct 28, 2017
7,985
How much are we really expecting FP16 to be utilized next gen? It's my understanding (correct me if I'm wrong) that the PS4 used it in a fairly limited capacity? Regardless, it's still great that the option is there.
 

ppn7

Member
May 4, 2019
740
I wonder if there will be some games compatible with both DirectML and DLSS, so we can see the difference between them.
 

Zaraki

Member
Mar 20, 2020
26
PS5 also has ML capabilities, as stated in a Wired article from last year.

It remains to be seen to what level they have customised the CUs.

MS as a whole has done a lot of research into ML, and it should provide exciting returns for the XSX and the XSS.

Unlike Nvidia, AMD has its ray tracing cores, as well as what it uses for ML, within its CUs. Would this impact the overall use of ML during gameplay?
 

Mecha Meister

Next-Gen Guru
Member
Oct 25, 2017
2,805
United Kingdom
Fascinating stuff! I happened to have missed some of these DirectML talks and hadn't seen its applications in things like upscaling. I wonder what techniques could be deployed by AMD in the PC space, as well as by Sony and Microsoft in the console space, in comparison to NVIDIA's DLSS?

The Turing GPUs have an impressive amount of computational power due to their Tensor and RT cores. I wonder how RDNA 2 will cope with ray tracing in comparison to Turing and the rumoured upcoming Ampere GPUs? It will be interesting to see what is achievable with these developments!
 

Michilin

Member
Nov 14, 2017
1,372
DukeBlueBall said:
Essentially, the gist is that the BOM going into Xbox consoles, at least on the high-end side, will be a lot higher as they'll be used all over MS for high-performance applications.
Man, if XSX somehow ends up the same price or cheaper than PS5
 

Trup1aya

Literally a train safety expert
Member
Oct 25, 2017
21,395
bcatwilly said:
The custom hardware in the Xbox Series X to support artificial intelligence (AI) through machine learning hasn't gotten enough attention in my opinion, particularly because Sony hasn't yet cited anything similar regarding AI support in its hardware...



I'm gonna have to bookmark this because real time ML is one concept I can't quite wrap my head around.
 

EvilBoris

Prophet of Truth - HDTVtest
Verified
Oct 29, 2017
16,686
In terms of what it can do for visual quality, is it similar to DLSS?

The possibilities are almost endless, but a couple of examples:

Render a game with low-quality assets and have an ML algorithm imagine the rest.

Do very basic, cheap ray tracing and then have AI fill in the gaps.
 

solis74

Member
Jun 11, 2018
43,056
bcatwilly said:
The custom hardware in the Xbox Series X to support artificial intelligence (AI) through machine learning hasn't gotten enough attention in my opinion, particularly because Sony hasn't yet cited anything similar regarding AI support in its hardware...



Such cool tech.
 

Scently

Member
Oct 27, 2017
1,464
How much are we really expecting FP16 to be utilized next gen? It's my understanding (correct me if I'm wrong) that the PS4 used it in a fairly limited capacity? Regardless, it's still great that the option is there.
PS4 Pro and X1X support FP16, but only the Pro supports Rapid Packed Math (RPM), which lets you execute two FP16 instructions/operations in the time it would take to execute a single FP32 instruction. FP32 is known as single precision, which incidentally is how most graphics are computed. But not everything in graphics requires that level of precision, so if you can lower the precision you can run two instructions/operations in the time it takes to run one single-precision instruction. So when you hear that the XSX is 12.1 TF, it means it can perform 12.1 trillion single-precision floating point operations per second (TFLOPS).
FP16 is half precision, so by the same logic the XSX can do ~24 TFLOPS of half-precision (FP16) work. Apparently they also added hardware support for INT8 and INT4, so the XSX can run ~49 TOPS of INT8 and ~97 TOPS of INT4. These allow them to do ML and other sorts of inferencing a lot faster while not having tensor cores.
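If anyone wants to see what "dropping the weights to 8-bit integers" actually looks like, here's a tiny illustrative sketch (toy data only, nothing to do with any real network):

```python
import numpy as np

# Toy illustration of quantizing FP32 weights to INT8 with one scale factor,
# then comparing a dot product in full precision vs. the INT8 approximation.
# Random data only; not any real network or shipped technique.

rng = np.random.default_rng(0)
weights_fp32 = rng.normal(0.0, 0.1, size=1024).astype(np.float32)
activations = rng.normal(0.0, 1.0, size=1024).astype(np.float32)

# Symmetric quantization: map the largest-magnitude weight to +/-127.
scale = float(np.abs(weights_fp32).max()) / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# Per-weight rounding error is bounded by half the quantization step.
max_err = float(np.abs(weights_fp32 - weights_int8.astype(np.float32) * scale).max())

full = float(weights_fp32 @ activations)
approx = float((weights_int8.astype(np.float32) * scale) @ activations)

print(f"quantization step:    {scale:.6f}")
print(f"max per-weight error: {max_err:.6f}  (<= step / 2)")
print(f"FP32 dot product:     {full:.4f}")
print(f"INT8 dot product:     {approx:.4f}")
```

The weights lose only a tiny amount of precision, which is the basic reason inference can run at INT8 (and sometimes INT4) even though training generally sticks to FP16/FP32.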
 

dgrdsv

Member
Oct 25, 2017
11,887
Unlike Nvidia, AMD has its ray tracing cores, as well as what it uses for ML, within its CUs. Would this impact the overall use of ML during gameplay?
1. RT cores and tensor arrays are inside multiprocessors ("CUs") in Turing.
2. "BVH traversal acceleration" units (some people have issues with calling them AMD's RT cores for whatever reasons) are dedicated h/w.
3. ML "stuff" is handled on general SIMDs in RDNA2, and this means that a) using it will consume shading performance and b) its performance is several times lower than that of Turing's tensor cores.
 
OP
bcatwilly

Member
Oct 27, 2017
2,483
1. RT cores and tensor arrays are inside multiprocessors ("CUs") in Turing.
2. "BVH traversal acceleration" units (some people have issues with calling them AMD's RT cores for whatever reasons) are dedicated h/w.
3. ML "stuff" is handled on general SIMDs in RDNA2, and this means that a) using it will consume shading performance and b) its performance is several times lower than that of Turing's tensor cores.

Yes, I think Microsoft didn't want to dedicate cost to separate hardware cores such as Turing's tensor cores. But they really have a big advantage here in having 52 CUs in general, and the specialized support described for 8-bit and 4-bit integers for machine learning is no doubt based on their research and their plans for how they can leverage this in the Series X for games.
 

tapedeck

Member
Oct 28, 2017
7,985
PS4 Pro and X1X support FP16, but only the Pro supports Rapid Packed Math (RPM), which lets you execute two FP16 instructions/operations in the time it would take to execute a single FP32 instruction. FP32 is known as single precision, which incidentally is how most graphics are computed. But not everything in graphics requires that level of precision, so if you can lower the precision you can run two instructions/operations in the time it takes to run one single-precision instruction. So when you hear that the XSX is 12.1 TF, it means it can perform 12.1 trillion single-precision floating point operations per second (TFLOPS).
FP16 is half precision, so by the same logic the XSX can do ~24 TFLOPS of half-precision (FP16) work. Apparently they also added hardware support for INT8 and INT4, so the XSX can run ~49 TOPS of INT8 and ~97 TOPS of INT4. These allow them to do ML and other sorts of inferencing a lot faster while not having tensor cores.
Neat.

It'll be very interesting to see how this is used by both systems (assuming PS5 has the functionality as well).
 

Zaraki

Member
Mar 20, 2020
26
1. RT cores and tensor arrays are inside multiprocessors ("CUs") in Turing.
2. "BVH traversal acceleration" units (some people have issues with calling them AMD's RT cores for whatever reasons) are dedicated h/w.
3. ML "stuff" is handled on general SIMDs in RDNA2, and this means that a) using it will consume shading performance and b) its performance is several times lower than that of Turing's tensor cores.

Thank you! I didn't fully understand it to be like that before. So it would be wise not to expect performance near what Nvidia currently delivers.

I watched the DF video on Control DLSS (see 00:25); that's why I thought the tensor cores and the RT cores were separate on Turing.

Control DLSS Analysis: How Nvidia's AI Scaling Tech Has Improved & Evolved! (www.youtube.com)
 

Timlot

Banned
Nov 27, 2019
359
I remember when Mark Cerny was telling DF that FP16 actually made the PS4 Pro 8.4TF, and the threads about FP16 being the secret sauce. My guess is machine learning is a different use case for these calculations. Can't wait to see what MS does with it.
 

pswii60

Member
Oct 27, 2017
26,681
The Milky Way
2. ML "stuff" is handled on general SIMDs in RDNA2 and this means that a) using it will consume shading performance and b) it's performance is several times lower than that of Turing's tensor cores.
This, which is why it's somewhat misleading to say "DirectML leverages 24 teraflops of floating point" etc. Sure, it could do, but that would leave 0 teraflops for anything else! Same goes for XSX and PS5. Nvidia's solution with tensor cores is leagues ahead, albeit likely much more expensive which is why we're not seeing it here.
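To put a (made-up) number on that trade-off: say an upscaling network costs 100 billion INT8 ops per frame. A rough sketch of what that eats out of a 60 fps frame, assuming you could even hit the peak rate:

```python
# The shader ALUs are one shared pool: any time spent on ML inference is time
# not spent shading. The network cost below is hypothetical, and peak
# throughput is never reached in practice, so this is only a rough floor.

int8_tops = 49e12              # Series X peak INT8 throughput, ops per second
frame_budget_ms = 1000 / 60    # ~16.7 ms per frame at 60 fps

network_ops_per_frame = 100e9  # hypothetical upscaling network cost

inference_ms = network_ops_per_frame / int8_tops * 1000
print(f"Inference time:          {inference_ms:.2f} ms")                  # ~2.04 ms
print(f"Share of a 60 fps frame: {inference_ms / frame_budget_ms:.0%}")   # ~12%
# On a GPU with separate tensor cores that time can largely overlap with
# shading; here it comes straight out of the same shader budget.
```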
 

gofreak

Member
Oct 26, 2017
7,737
Is INT8/INT4 useful for things like DLSS, or more for inferencing/prediction? I'm not au fait, but at first blush it seems like that level of precision might not be very well suited to image super resolution?

Speaking of rapid packed math, though, what is the story with that in standard RDNA? Did it carry forward? I'm not sure they talked about it so much, but not sure if that's because of a change there, or because they just wanted to renew focus on game performance in their marketing materials.
 

Scently

Member
Oct 27, 2017
1,464
I remember when Mark Cerny was telling DF that FP16 actually made the PS4 Pro 8.4TF, and the threads about FP16 being the secret sauce. My guess is machine learning is a different use case for these calculations. Can't wait to see what MS does with it.
I think it wasn't utilized much on the PS4 Pro because it was the only console that supported RPM. Add to that the fact that it's only suitable for certain things in graphics rendering. With both next-gen systems supporting RPM, it will be used a lot more for the things in graphics rendering that are suited to it.
 

cyrribrae

Chicken Chaser
Member
Jan 21, 2019
12,723
I understood next to none of this. But 50 non-self-aware and floppy physics corgis running around a track is enough to sell me on any new technology.
 

Corralx

Member
Aug 23, 2018
1,176
London, UK
I think it wasn't utilized much on the PS4 Pro because it was the only console that supported RPM. Add to that the fact that it's only suitable for certain things in graphics rendering. With both next-gen systems supporting RPM, it will be used a lot more for the things in graphics rendering that are suited to it.

I don't see why the XONEX not supporting RPM would have been a blocker for FP16 adoption. Even without RPM you still get benefits from using FP16 on GCN cards, and you don't have to do anything to get the benefit from RPM, it's completely transparent.
FP16 has been adopted where it makes sense and such a low precision was not causing issues/artifacts, which is less than a lot of people were led to believe by the "magic sauce" claims.
 

dgrdsv

Member
Oct 25, 2017
11,887
So it would be wise not to expect performance near what Nvidia currently delivers.
One thing to keep in mind here is that with both RT and ML, performance is completely, 100% dependent on what the devs are actually running on the h/w.
So while "porting" DLSS to XSX would likely lead to subpar performance, it is quite possible that another, less "AI'ish" approach would work here, with similar results.
Control's original DLSS implementation doesn't even use tensor cores, for example. Who's to say that something like that can't be run at FP16 precision on RDNA2?
This whole area is terra incognita right now; there will be a lot of research soon into how exactly RDNA2's ML capabilities can be used in gaming graphics. And it's very possible that we may get some unexpectedly good results eventually. Case in point: DLSS 2.0.

Is INT8/INT4 useful for things like DLSS, or more for inferencing/prediction?
I feel like anything below FP16 won't be enough for graphical usage. But who knows?
 
Oct 25, 2017
4,427
Silicon Valley
Machine learning and other AI-related things are definitely going to be huge next gen. Excited to see how it can be applied to streaming situations.

PS5 will likely use AI in conjunction with gaze-tracking for its next-gen VR, outside of things akin to DLSS and RT improvements.

So, this is the machine that will become self-aware?
Next-gen the console wars truly begin... and we're the enemy!

 

Maple

Member
Oct 27, 2017
11,742
This is the kind of next-gen stuff that gets me excited. Watching Google's DeepMind literally teach itself StarCraft, and then fine-tune its technique against the best human players in the world and win, is just incredible.

Imagine if ML was applied to AI in FPS and RPGs in a similar fashion. Enemies would teach themselves how to counteract your behaviors and attacks - they could end up doing things that would even be unpredictable and surprising to the developers themselves.
 

ILikeFeet

DF Deet Master
Banned
Oct 25, 2017
61,987
Imagine if ML was applied to AI in FPS and RPGs in a similar fashion. Enemies would teach themselves how to counteract your behaviors and attacks - they could end up doing things that would even be unpredictable and surprising to the developers themselves.
that's probably the last thing they want. I've seen enough Two Minute Papers videos to know that machine-learned AI will exploit the rules like a motherfucker
 

BradGrenz

Banned
Oct 27, 2017
1,507
Most ML applications for games don't work at runtime. DLSS uses machine learning to train an algorithm on a supercomputer that is later run on shader cores or tensor cores, depending. Stuff like AI upscaled asset mods on PC are all precomputed, too. If anything, the importance of ML/AI hardware enhancements for actual games this gen is probably overstated.
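If it helps, here's a minimal sketch of that offline-vs-runtime split. A tiny 3-tap filter fitted by least squares stands in for a real network; all the data and numbers are made up for illustration:

```python
import numpy as np

# Toy illustration of the split described above: the expensive "learning"
# happens offline, and only a cheap fixed-weight forward pass runs per frame.

rng = np.random.default_rng(1)

# --- Offline phase (the "supercomputer" part): fit weights on sample data.
clean = rng.normal(size=1000)
blurred = np.convolve(clean, [0.25, 0.5, 0.25], mode="same")   # degraded input
columns = np.stack([np.roll(blurred, s) for s in (-1, 0, 1)], axis=1)
weights, *_ = np.linalg.lstsq(columns, clean, rcond=None)      # learned once, shipped with the game

# --- Runtime phase (per frame): just apply the fixed weights.
def restore(frame):
    cols = np.stack([np.roll(frame, s) for s in (-1, 0, 1)], axis=1)
    return cols @ weights                                       # 3 multiply-adds per sample

print("learned weights:", np.round(weights, 3))
print("restored sample:", np.round(restore(blurred)[:5], 3))
```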
 

Lukas Taves

Banned
Oct 28, 2017
5,713
Brazil
Most ML applications for games don't work at runtime. DLSS uses machine learning to train an algorithm on a supercomputer that is later run on shader cores or tensor cores, depending. Stuff like AI upscaled asset mods on PC are all precomputed, too. If anything, the importance of ML/AI hardware enhancements for actual games this gen is probably overstated.
The training phase indeed doesn't run in real time, but there's no need for it to.

For real-time uses we are already seeing quite a few cases. Real-time ray tracing is practically only possible because of machine learning denoising. Machine-learning-based upscaling techniques are improving a lot in image quality, and MS has shown a very cool demo of a machine-learning-based HDR implementation that DF said was on par with a game actually supporting it.

Having all those cases before the gen has even started is quite awesome, really. Imagine how this will go in the next 5 years, especially with both consoles having hardware acceleration for it.
 

dgrdsv

Member
Oct 25, 2017
11,887
Most ML applications for games don't work at runtime. DLSS uses machine learning to train an algorithm on a supercomputer that is later run on shader cores or tensor cores, depending.
Inferencing in FP16 (which is likely what DLSS is using) isn't that much different from learning, with the latter just being orders of magnitude slower since it's working with a huge dataset.

Real-time ray tracing is practically only possible because of machine learning denoising.
No implementation on the market currently uses ML denoising for RT; denoising is usually handled by TAA extensions instead. RT doesn't need ML, although it can likely be augmented with it in many ways.
 

Filipus

Prophet of Regret
Avenger
Dec 7, 2017
5,135
DLSS is really interesting and a great solution for those without the hardware. I would love to see something similar come to consoles. Let's see if something happens this gen and they show us some games.
 

KKRT

Member
Oct 27, 2017
1,544
Yes, I think Microsoft didn't want to dedicate cost to separate hardware cores such as Turing's tensor cores. But they really have a big advantage here in having 52 CUs in general, and the specialized support described for 8-bit and 4-bit integers for machine learning is no doubt based on their research and their plans for how they can leverage this in the Series X for games.
You know that Tensor cores also have support for INT8 and INT4, right?
NVIDIA Turing Architecture In-Depth | NVIDIA Technical Blog (devblogs.nvidia.com)