
bcatwilly

Member
Oct 27, 2017
2,483
The custom hardware in the Xbox Series X to support artificial intelligence (AI) through machine learning hasn't gotten enough attention in my opinion, particularly because Sony hasn't yet cited anything similar regarding AI support in its hardware. We can all agree that more pixels don't change how games play or are designed, but this type of technology can really impact various aspects of visual quality, as well as things like smarter NPCs. One real-world example of integration into the rendering pipeline would be a smart upscale like the Forza Horizon 3 super resolution demo at GDC last year: fewer pixels have to be rendered by the GPU, which means more of the 12 TF in the Series X can be spent on graphical effects or ray tracing, for example.

Here is the technical glossary reference for their Series X hardware support of the DirectML (Direct Machine Learning) API - https://news.xbox.com/en-us/2020/03/16/xbox-series-x-glossary/
DirectML
– Xbox Series X supports Machine Learning for games with DirectML, a component of DirectX. DirectML leverages unprecedented hardware performance in a console, benefiting from over 24 TFLOPS of 16-bit float performance and over 97 TOPS (trillion operations per second) of 4-bit integer performance on Xbox Series X. Machine Learning can improve a wide range of areas, such as making NPCs much smarter, providing vastly more lifelike animation, and greatly improving visual quality.

The really exciting information comes from the Eurogamer full Xbox Series X specs reveal (https://www.eurogamer.net/articles/digitalfoundry-2020-inside-xbox-series-x-full-specs), where the Microsoft engineers talked further about the custom hardware support they developed with AMD for machine learning applications. The GDC talk referenced later mentions how the Nvidia Turing RTX cards have tensor cores that provide FP16 support to improve machine learning performance; Microsoft went further than that in their custom hardware.
Machine learning is a feature we've discussed in the past, most notably with Nvidia's Turing architecture and the firm's DLSS AI upscaling. The RDNA 2 architecture used in Series X does not have tensor core equivalents, but Microsoft and AMD have come up with a novel, efficient solution based on the standard shader cores. With over 12 teraflops of FP32 compute, RDNA 2 also allows for double that with FP16 (yes, rapid-packed math is back). However, machine learning workloads often use much lower precision than that, so the RDNA 2 shaders were adapted still further.

"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

This is a Microsoft talk at GDC last year first introducing DirectML, which will now be supported directly by the hardware in the Series X. The entire talk is interesting if you are into API details, but the Forza Horizon 3 super resolution demo showing an upscale from 540p to 1080p at the 23:44 mark is particularly interesting, and the Unity demo at 37:26 shows how they used machine learning to drive animation in their demo based on the physics at play.
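And just to show why that 540p-to-1080p demo matters for the GPU budget, here's a trivial pixel-count sketch (it ignores the cost of running the network itself, so treat it as an upper bound):

```python
# Rough saving from rendering at 540p and ML-upscaling to 1080p.
# Ignores the cost of the upscaling network itself, so this is an upper
# bound on how much GPU time gets freed up for effects or ray tracing.

native_1080p = 1920 * 1080   # pixels shaded at native resolution
rendered_540p = 960 * 540    # pixels actually shaded before the upscale

print(f"1080p pixels:  {native_1080p:,}")                               # 2,073,600
print(f"540p pixels:   {rendered_540p:,}")                              # 518,400
print(f"Shading work:  {rendered_540p / native_1080p:.0%} of native")   # 25%
```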

 

mugurumakensei

Elizabeth, I’m coming to join you!
Member
Oct 25, 2017
11,330
I wonder if Microsoft will use this to create a solution similar to DLSS with low overhead.
 

Talus

Banned
Dec 9, 2017
1,386
It's such cool technology. Glad to see MS has some hardware capability in the Series X.
 

DukeBlueBall

Banned
Oct 27, 2017
9,059
Seattle, WA
If you recall what Phil said last year about Azure subsidizing XSX silicon:

P3 said:
The thing that's interesting for us as we roll forward, is we're actually designing our next-gen silicon in such a way that it works great for playing games in the cloud, and also works very well for machine learning and other non-entertainment workloads. As a company like Microsoft, we can dual-purpose the silicon that we're putting in.

We have a consumer use for that silicon, and we have enterprise use for those blades as well. It all in our space around driving down the cost to serve. Your cost to serve is made up by two things, how much was the hardware, and how much time does that hardware monetize.

So if we can monetize that hardware over more cycles in the 24 hours through game streaming and other things that need CPU and GPU in the cloud, we will drive down the cost to serve in our services. So the design as we move forward is done hand-in-hand with the Azure silicon team, and I think that creates a real competitive advantage.

Essentially, the gist is that the BOM going into Xbox consoles, at least on the high-end side, will be a lot higher as they'll be used all over MS for high-performance applications.
 

space_nut

Member
Oct 28, 2017
3,306
NJ
We are going to see some impressive rendering achievements on the XSX, especially if devs can use AI resolution boosting and put the spare power toward cranking up the graphics even more.
 

tapedeck

Member
Oct 28, 2017
7,985
How much are we really expecting FP16 to be utilized next gen? It's my understanding (correct me if I'm wrong) that the PS4 used it in a fairly limited capacity? Regardless, it's still great that the option is there.
 

ppn7

Member
May 4, 2019
740
I wonder if there will be some games compatible with both DirectML and DLSS, so we can see the difference between them.
 

Zaraki

Member
Mar 20, 2020
26
PS5 also has ML capabilities, as stated in a Wired article from last year.

It remains to be seen to what level they have customised the CUs.

MS as a whole has done a lot of research into ML, and it should provide exciting returns for the XSX and the XSS.

Unlike Nvidia, AMD has its ray tracing cores, as well as what it uses for ML, within its CUs. Would this impact the overall use of ML during gameplay?
 

Mecha Meister

Next-Gen Guru
Member
Oct 25, 2017
2,805
United Kingdom
Fascinating stuff! I happened to have missed some of these DirectML talks and hadn't seen its applications in things like upscaling. I wonder what techniques could be deployed by AMD in the PC space, as well as by Sony and Microsoft in the console space, in comparison to NVIDIA's DLSS?

The Turing GPUs have an impressive amount of computational power due to their Tensor and RT cores. I wonder how RDNA 2 will cope with ray tracing in comparison to Turing and the rumoured upcoming Ampere GPUs? It will be interesting to see what is achievable with these developments!
 

Michilin

Member
Nov 14, 2017
1,372
DukeBlueBall said:
Essentially, the gist is that the BOM going into Xbox consoles, at least on the high-end side, will be a lot higher as they'll be used all over MS for high-performance applications.
Man, if XSX somehow ends up the same price or cheaper than PS5
 

Trup1aya

Literally a train safety expert
Member
Oct 25, 2017
21,395
bcatwilly said:
The custom hardware in the Xbox Series X to support artificial intelligence (AI) through machine learning hasn't gotten enough attention in my opinion, particularly because Sony hasn't yet cited anything similar regarding AI support in its hardware...



I'm gonna have to bookmark this because real time ML is one concept I can't quite wrap my head around.
 

EvilBoris

Prophet of Truth - HDTVtest
Verified
Oct 29, 2017
16,686
In terms of what it can do for visual quality, is it similar to DLSS?

The possibilities are almost endless, but a couple of examples:

Render a game with low-quality assets and have an ML algorithm imagine the rest.

Do very basic, cheap ray tracing and then have AI fill in the gaps.
 

solis74

Member
Jun 11, 2018
43,056
bcatwilly said:
The custom hardware in the Xbox Series X to support artificial intelligence (AI) through machine learning hasn't gotten enough attention in my opinion, particularly because Sony hasn't yet cited anything similar regarding AI support in its hardware...



Such cool tech.
 

Scently

Member
Oct 27, 2017
1,464
How much are we really expecting FP16 to be utilized next gen? It's my understanding (correct me if I'm wrong) that the PS4 used it in a fairly limited capacity? Regardless, it's still great that the option is there.
PS4 Pro and X1X support FP16, but only the Pro supports Rapid Packed Math (RPM), which lets you execute two FP16 instructions/operations in the time it would take to execute a single FP32 instruction. FP32 is known as single precision, which incidentally is how most graphics are computed. But not everything in graphics requires that level of precision, so if you can lower the precision you can run two instructions/operations in the time it takes to run one single-precision instruction. So when you hear that the XSX is 12.1 TF, it means it can perform 12.1 trillion single-precision floating point operations per second (TFLOPS).
FP16 is half precision, so by the same logic the XSX can do ~24 TFLOPS of half-precision (FP16) work. Apparently they also added hardware support for INT8 and INT4, so the XSX can run ~49 TOPS of INT8 and ~97 TOPS of INT4. These allow them to do ML and other sorts of inferencing a lot faster while not having tensor cores.
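If anyone wants to see what "dropping the weights to 8-bit integers" actually looks like, here's a tiny illustrative sketch (toy data only, nothing to do with any real network):

```python
import numpy as np

# Toy illustration of quantizing FP32 weights to INT8 with one scale factor,
# then comparing a dot product in full precision vs. the INT8 approximation.
# Random data only; not any real network or shipped technique.

rng = np.random.default_rng(0)
weights_fp32 = rng.normal(0.0, 0.1, size=1024).astype(np.float32)
activations = rng.normal(0.0, 1.0, size=1024).astype(np.float32)

# Symmetric quantization: map the largest-magnitude weight to +/-127.
scale = float(np.abs(weights_fp32).max()) / 127.0
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# Per-weight rounding error is bounded by half the quantization step.
max_err = float(np.abs(weights_fp32 - weights_int8.astype(np.float32) * scale).max())

full = float(weights_fp32 @ activations)
approx = float((weights_int8.astype(np.float32) * scale) @ activations)

print(f"quantization step:    {scale:.6f}")
print(f"max per-weight error: {max_err:.6f}  (<= step / 2)")
print(f"FP32 dot product:     {full:.4f}")
print(f"INT8 dot product:     {approx:.4f}")
```

The weights lose only a tiny amount of precision, which is the basic reason inference can run at INT8 (and sometimes INT4) even though training generally sticks to FP16/FP32.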
 

dgrdsv

Member
Oct 25, 2017
11,887
Unlike Nvidia, AMD has its ray tracing cores, as well as what it uses for ML, within its CUs. Would this impact the overall use of ML during gameplay?
1. RT cores and tensor arrays are inside multiprocessors ("CUs") in Turing.
2. "BVH traversal acceleration" units (some people have issues with calling them AMD's RT cores for whatever reasons) are dedicated h/w.
3. ML "stuff" is handled on general SIMDs in RDNA2, and this means that a) using it will consume shading performance and b) its performance is several times lower than that of Turing's tensor cores.
 
OP
bcatwilly

Member
Oct 27, 2017
2,483
1. RT cores and tensor arrays are inside multiprocessors ("CUs") in Turing.
2. "BVH traversal acceleration" units (some people have issues with calling them AMD's RT cores for whatever reasons) are dedicated h/w.
3. ML "stuff" is handled on general SIMDs in RDNA2, and this means that a) using it will consume shading performance and b) its performance is several times lower than that of Turing's tensor cores.

Yes, I think Microsoft didn't want to dedicate cost to separate hardware cores such as Turing's tensor cores. But they really have a big advantage here in having 52 CUs in general, and the specialized support described for 8-bit and 4-bit integers for machine learning is no doubt based on their research and their plans for how they can leverage this in the Series X for games.
 

tapedeck

Member
Oct 28, 2017
7,985
PS4 Pro and X1X support FP16, but only the Pro supports Rapid Packed Math (RPM), which lets you execute two FP16 instructions/operations in the time it would take to execute a single FP32 instruction. FP32 is known as single precision, which incidentally is how most graphics are computed. But not everything in graphics requires that level of precision, so if you can lower the precision you can run two instructions/operations in the time it takes to run one single-precision instruction. So when you hear that the XSX is 12.1 TF, it means it can perform 12.1 trillion single-precision floating point operations per second (TFLOPS).
FP16 is half precision, so by the same logic the XSX can do ~24 TFLOPS of half-precision (FP16) work. Apparently they also added hardware support for INT8 and INT4, so the XSX can run ~49 TOPS of INT8 and ~97 TOPS of INT4. These allow them to do ML and other sorts of inferencing a lot faster while not having tensor cores.
Neat.

It'll be very interesting to see how this is used by both systems (assuming PS5 has the functionality as well).
 

Zaraki

Member
Mar 20, 2020
26
1. RT cores and tensor arrays are inside multiprocessors ("CUs") in Turing.
2. "BVH traversal acceleration" units (some people have issues with calling them AMD's RT cores for whatever reasons) are dedicated h/w.
3. ML "stuff" is handled on general SIMDs in RDNA2, and this means that a) using it will consume shading performance and b) its performance is several times lower than that of Turing's tensor cores.

Thank you! I didn't fully understand it to be like that before. So it would be wise not to expect performance near what Nvidia currently delivers.

I watched the DF video on Control DLSS (see 00:25); that's why I thought the tensor cores and the RT cores were separate on Turing.

Control DLSS Analysis: How Nvidia's AI Scaling Tech Has Improved & Evolved! (www.youtube.com)
 

Timlot

Banned
Nov 27, 2019
359
I remember when Mark Cerny was telling DF that FP16 actually made the PS4 Pro 8.4TF, and the threads about FP16 being the secret sauce. My guess is machine learning is a different use case for these calculations. Can't wait to see what MS does with it.
 

pswii60

Member
Oct 27, 2017
26,681
The Milky Way
2. ML "stuff" is handled on general SIMDs in RDNA2 and this means that a) using it will consume shading performance and b) it's performance is several times lower than that of Turing's tensor cores.
This, which is why it's somewhat misleading to say "DirectML leverages 24 teraflops of floating point" etc. Sure, it could do, but that would leave 0 teraflops for anything else! Same goes for XSX and PS5. Nvidia's solution with tensor cores is leagues ahead, albeit likely much more expensive which is why we're not seeing it here.
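To put a (made-up) number on that trade-off: say an upscaling network costs 100 billion INT8 ops per frame. A rough sketch of what that eats out of a 60 fps frame, assuming you could even hit the peak rate:

```python
# The shader ALUs are one shared pool: any time spent on ML inference is time
# not spent shading. The network cost below is hypothetical, and peak
# throughput is never reached in practice, so this is only a rough floor.

int8_tops = 49e12              # Series X peak INT8 throughput, ops per second
frame_budget_ms = 1000 / 60    # ~16.7 ms per frame at 60 fps

network_ops_per_frame = 100e9  # hypothetical upscaling network cost

inference_ms = network_ops_per_frame / int8_tops * 1000
print(f"Inference time:          {inference_ms:.2f} ms")                  # ~2.04 ms
print(f"Share of a 60 fps frame: {inference_ms / frame_budget_ms:.0%}")   # ~12%
# On a GPU with separate tensor cores that time can largely overlap with
# shading; here it comes straight out of the same shader budget.
```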
 

gofreak

Member
Oct 26, 2017
7,737
Is INT8/INT4 useful for things like DLSS, or more for inferencing/prediction? I'm not au fait, but at first blush it seems like that level of precision might not be very well suited to image super resolution?

Speaking of rapid packed math, though, what is the story with that in standard RDNA? Did it carry forward? I'm not sure they talked about it so much, but not sure if that's because of a change there, or because they just wanted to renew focus on game performance in their marketing materials.
 

Scently

Member
Oct 27, 2017
1,464
I remember when Mark Cerny was telling DF that FP16 actually made the PS4 Pro 8.4TF, and the threads about FP16 being the secret sauce. My guess is machine learning is a different use case for these calculations. Can't wait to see what MS does with it.
I think it wasn't utilized much on the PS4 Pro because it was the only console that supported RPM. Add to that the fact that it's only suitable for certain things in graphics rendering. With both next-gen systems supporting RPM, it will be used a lot more for the things in graphics rendering that are suited to it.
 

cyrribrae

Chicken Chaser
Member
Jan 21, 2019
12,723
I understood next to none of this. But 50 non-self-aware and floppy physics corgis running around a track is enough to sell me on any new technology.
 

Corralx

Member
Aug 23, 2018
1,176
London, UK
I think it wasn't utilized much on the PS4 Pro because it was the only console that supported RPM. Add to that the fact that it's only suitable for certain things in graphics rendering. With both next-gen systems supporting RPM, it will be used a lot more for the things in graphics rendering that are suited to it.

I don't see why the XONEX not supporting RPM would have been a blocker for FP16 adoption. Even without RPM you still get benefits from using FP16 on GCN cards, and you don't have to do anything to get the benefit from RPM, it's completely transparent.
FP16 has been adopted where it makes sense and such a low precision was not causing issues/artifacts, which is less than a lot of people were led to believe by the "magic sauce" claims.
 

dgrdsv

Member
Oct 25, 2017
11,887
So it would be wise not to expect performance near what Nvidia currently delivers.
One thing to keep in mind here is that with both RT and ML, performance is completely, 100% dependent on what the devs are actually running on the h/w.
So while "porting" DLSS to XSX would likely lead to subpar performance, it is quite possible that another, less "AI'ish" approach would work here, with similar results.
Control's original DLSS implementation doesn't even use tensor cores, for example. Who's to say that something like that can't be run at FP16 precision on RDNA2?
This whole area is terra incognita right now; there will be a lot of research soon into how exactly RDNA2's ML capabilities can be used in gaming graphics. And it's very possible that we may get some unexpectedly good results eventually. Case in point: DLSS 2.0.

Is INT8/INT4 useful for things like DLSS, or more for inferencing/prediction?
I feel like anything below FP16 won't be enough for graphical usage. But who knows?
 
Oct 25, 2017
4,427
Silicon Valley
Machine learning and other AI-related things are definitely going to be huge next gen. Excited to see how it can be applied to streaming situations.

PS5 will likely use AI in conjunction with gaze-tracking for its next-gen VR, outside of things akin to DLSS and RT improvements.

So, this is the machine that will become self-aware?
Next-gen the console wars truly begin... and we're the enemy!

 

Maple

Member
Oct 27, 2017
11,742
This is the kind of next-gen stuff that gets me excited. Watching Google's DeepMind literally teach itself StarCraft, and then fine-tune its technique against the best human players in the world and win, is just incredible.

Imagine if ML was applied to AI in FPS and RPGs in a similar fashion. Enemies would teach themselves how to counteract your behaviors and attacks - they could end up doing things that would even be unpredictable and surprising to the developers themselves.
 

ILikeFeet

DF Deet Master
Banned
Oct 25, 2017
61,987
Imagine if ML was applied to AI in FPS and RPGs in a similar fashion. Enemies would teach themselves how to counteract your behaviors and attacks - they could end up doing things that would even be unpredictable and surprising to the developers themselves.
that's probably the last thing they want. I've seen enough Two Minute Papers videos to know that machine-learned AI will exploit the rules like a motherfucker
 

BradGrenz

Banned
Oct 27, 2017
1,507
Most ML applications for games don't work at runtime. DLSS uses machine learning to train an algorithm on a supercomputer that is later run on shader cores or tensor cores, depending. Stuff like AI upscaled asset mods on PC are all precomputed, too. If anything, the importance of ML/AI hardware enhancements for actual games this gen is probably overstated.
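If it helps, here's a minimal sketch of that offline-vs-runtime split. A tiny 3-tap filter fitted by least squares stands in for a real network; all the data and numbers are made up for illustration:

```python
import numpy as np

# Toy illustration of the split described above: the expensive "learning"
# happens offline, and only a cheap fixed-weight forward pass runs per frame.

rng = np.random.default_rng(1)

# --- Offline phase (the "supercomputer" part): fit weights on sample data.
clean = rng.normal(size=1000)
blurred = np.convolve(clean, [0.25, 0.5, 0.25], mode="same")   # degraded input
columns = np.stack([np.roll(blurred, s) for s in (-1, 0, 1)], axis=1)
weights, *_ = np.linalg.lstsq(columns, clean, rcond=None)      # learned once, shipped with the game

# --- Runtime phase (per frame): just apply the fixed weights.
def restore(frame):
    cols = np.stack([np.roll(frame, s) for s in (-1, 0, 1)], axis=1)
    return cols @ weights                                       # 3 multiply-adds per sample

print("learned weights:", np.round(weights, 3))
print("restored sample:", np.round(restore(blurred)[:5], 3))
```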
 

Lukas Taves

Banned
Oct 28, 2017
5,713
Brazil
Most ML applications for games don't work at runtime. DLSS uses machine learning to train an algorithm on a supercomputer that is later run on shader cores or tensor cores, depending. Stuff like AI upscaled asset mods on PC are all precomputed, too. If anything, the importance of ML/AI hardware enhancements for actual games this gen is probably overstated.
The training phase indeed doesn't run in real time, but there's no need for it to.

For real-time uses we are already seeing quite a few cases. Real-time ray tracing is practically only possible because of machine learning denoising. Machine-learning-based upscaling techniques are improving a lot in image quality, and MS has shown a very cool demo of a machine-learning-based HDR implementation that DF said was on par with a game actually supporting it.

Having all those cases before the gen has even started is quite awesome, really. Imagine how this will go in the next 5 years, especially with both consoles having hardware acceleration for it.
 

dgrdsv

Member
Oct 25, 2017
11,887
Most ML applications for games don't work at runtime. DLSS uses machine learning to train an algorithm on a supercomputer that is later run on shader cores or tensor cores, depending.
Inferencing in FP16 (which is likely what DLSS is using) isn't that much different from learning, with the latter just being orders of magnitude slower since it's working with a huge dataset.

Real-time ray tracing is practically only possible because of machine learning denoising.
No implementation on the market currently uses ML denoising for RT; denoising is usually handled by TAA extensions instead. RT doesn't need ML, although it can likely be augmented with it in many ways.
 

Filipus

Prophet of Regret
Avenger
Dec 7, 2017
5,135
DLSS is really interesting and a great solution for those without the hardware. I would love to see something similar come to consoles. Let's see if something happens this gen and they show us some games.
 

KKRT

Member
Oct 27, 2017
1,544
Yes, I think Microsoft didn't want to dedicate cost to separate hardware cores such as Turing's tensor cores. But they really have a big advantage here in having 52 CUs in general, and the specialized support described for 8-bit and 4-bit integers for machine learning is no doubt based on their research and their plans for how they can leverage this in the Series X for games.
You know that Tensor cores also have support for INT8 and INT4, right?
NVIDIA Turing Architecture In-Depth | NVIDIA Technical Blog (devblogs.nvidia.com)