
Deleted member 10675

User requested account closure
Banned
Oct 27, 2017
990
Madrid
Mesh Shading Pipeline

A new, two-stage pipeline alternative supplements the classic attribute fetch, vertex, tessellation, geometry shader pipeline. This new pipeline consists of a task shader and a mesh shader:

  • Task shader: a programmable unit that operates in workgroups and allows each to emit (or not) mesh shader workgroups

  • Mesh shader: a programmable unit that operates in workgroups and allows each to generate primitives

The mesh shader stage produces triangles for the rasterizer using the above-mentioned cooperative thread model internally. The task shader operates similarly to the hull shader stage of tessellation, in that it is able to dynamically generate work. However, like the mesh shader, it also uses a cooperative thread model, and its input and output are user defined instead of having to take a patch as input and tessellation decisions as output.

The interfacing with the pixel/fragment shader is unaffected. The traditional pipeline is still available and can provide very good results depending on the use-case. Figure 4 highlights the differences in the pipeline styles.

[Figure 4 (meshlets_pipeline.png): the traditional pipeline compared with the task/mesh shading pipeline]


The new mesh shader pipeline provides a number of benefits for developers:

  • Higher scalability through shader units by reducing fixed-function impact in primitive processing. Because modern GPUs are used as general-purpose processors, a greater variety of applications benefits from adding more cores and from improving shaders' generic memory and arithmetic performance.

  • Bandwidth reduction, as de-duplication of vertices (vertex re-use) can be done upfront and reused over many frames. The current API model means the index buffers have to be scanned by the hardware every time. Larger meshlets mean higher vertex re-use, also lowering bandwidth requirements. Furthermore, developers can come up with their own compression or procedural generation schemes.
    The optional expansion/filtering via task shaders allows fetching of unneeded data to be skipped entirely.

  • Flexibility in defining the mesh topology and creating graphics work. The previous tessellation shaders were limited to fixed tessellation patterns, while geometry shaders suffered from an inefficient, threading-unfriendly programming model that generated triangle strips per thread.

Mesh shading follows the programming model of compute shaders, giving developers the freedom to use threads for different purposes and share data among them. When rasterization is disabled, the two stages can also be used to do generic compute work with one level of expansion.
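To make the meshlet idea concrete: below is a rough, CPU-side sketch of how an engine might pre-build meshlets from an ordinary index buffer, doing the vertex de-duplication upfront as described above. The 64-vertex / 126-primitive limits are the ones suggested in NVIDIA's blog post; the Meshlet struct and buildMeshlets function are purely illustrative, not part of any API.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Meshlet {
    std::vector<uint32_t> uniqueVertices;  // indices into the shared vertex buffer
    std::vector<uint8_t>  localIndices;    // 3 per triangle, referencing uniqueVertices
};

std::vector<Meshlet> buildMeshlets(const std::vector<uint32_t>& indexBuffer,
                                   std::size_t maxVerts = 64, std::size_t maxTris = 126) {
    std::vector<Meshlet> meshlets;
    Meshlet current;
    std::unordered_map<uint32_t, uint8_t> remap;  // global vertex index -> local slot

    for (std::size_t i = 0; i + 2 < indexBuffer.size(); i += 3) {
        // Count how many vertices this triangle would add to the current meshlet.
        std::size_t newVerts = 0;
        for (int k = 0; k < 3; ++k)
            if (remap.count(indexBuffer[i + k]) == 0) ++newVerts;

        // Start a new meshlet if the vertex or triangle limit would be exceeded.
        if (current.uniqueVertices.size() + newVerts > maxVerts ||
            current.localIndices.size() / 3 + 1 > maxTris) {
            meshlets.push_back(std::move(current));
            current = Meshlet{};
            remap.clear();
        }

        for (int k = 0; k < 3; ++k) {
            uint32_t v = indexBuffer[i + k];
            auto it = remap.find(v);
            if (it == remap.end()) {
                it = remap.emplace(v, static_cast<uint8_t>(current.uniqueVertices.size())).first;
                current.uniqueVertices.push_back(v);  // de-duplication happens here, once, offline
            }
            current.localIndices.push_back(it->second);
        }
    }
    if (!current.localIndices.empty()) meshlets.push_back(std::move(current));
    return meshlets;
}
```

The point of the structure is exactly the bandwidth argument above: the small local indices and the de-duplicated vertex list are built once and reused every frame, instead of the hardware re-scanning a full index buffer per draw.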

More info:

https://devblogs.nvidia.com/introduction-turing-mesh-shaders/

https://www.nvidia.com/content/dam/...

Pretty interesting; the first big change in the rendering pipeline since unified shaders.
 

gofreak

Member
Oct 26, 2017
7,735
This is what AMD was attempting with primitive shaders.

Yeah. It seemed to get shuffled off due to API or other problems in Vega. Hopefully they get that fixed for Navi onwards.

I'll need to read more about it. At a glance some of the use cases sound a lot like what people already do with compute shaders to get around CPU draw call performance problems.
 

Durante

Dark Souls Man
Member
Oct 24, 2017
5,074
Thanks for the heads-up on that devblog article!

I spotted this in the whitepaper and was wondering how it is actually supposed to be accessed -- my speculation was a Vulkan extension and it seems that's true. (And an unofficial NVAPI "extension" to DX12 -- stupid inextensible APIs :P)
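For the curious, the Vulkan route looks roughly like this. Only the extension-specific bits are shown; the surrounding device/pipeline setup is assumed to exist, and the helper function names here are made up. The extension name, feature struct and draw call are the ones exposed by VK_NV_mesh_shader.

```cpp
#include <vulkan/vulkan.h>

// Device extensions we will request; VK_NV_MESH_SHADER_EXTENSION_NAME expands to "VK_NV_mesh_shader".
static const char* const kDeviceExtensions[] = {
    VK_KHR_SWAPCHAIN_EXTENSION_NAME,
    VK_NV_MESH_SHADER_EXTENSION_NAME,
};

// At device creation: request the extension and chain in its feature struct.
void enableMeshShading(VkDeviceCreateInfo* createInfo,
                       VkPhysicalDeviceMeshShaderFeaturesNV* meshFeatures) {
    meshFeatures->sType      = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_MESH_SHADER_FEATURES_NV;
    meshFeatures->pNext      = nullptr;
    meshFeatures->taskShader = VK_TRUE;
    meshFeatures->meshShader = VK_TRUE;

    createInfo->enabledExtensionCount   = 2;
    createInfo->ppEnabledExtensionNames = kDeviceExtensions;
    createInfo->pNext                   = meshFeatures;
}

// At record time: a task/mesh pipeline is bound like any graphics pipeline, but drawn
// by launching task-shader workgroups instead of feeding vertices. In real code the
// extension entry point would be fetched via vkGetDeviceProcAddr.
void recordMeshDraw(VkCommandBuffer cmd, VkPipeline meshPipeline, uint32_t taskCount) {
    vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, meshPipeline);
    vkCmdDrawMeshTasksNV(cmd, taskCount, /*firstTask=*/0);
}
```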
 

Echo

Banned
Oct 29, 2017
6,482
Mt. Whatever
Is this something games will have to be programmed around, or are the new GPUs smart enough to automate this?

Can we realistically expect developers to use this when next-gen consoles are confirmed to be AMD hardware again?
 

anexanhume

Member
Oct 25, 2017
12,913
Maryland
Is this something games will have to be programmed around, or are the new GPUs smart enough to automate this?

Can we realistically expect developers to use this when next-gen consoles are confirmed to be AMD hardware again?
Yes, we can expect them to use it, as AMD has their own version of this. It's a necessary paradigm for the industry.

As Durante points out, devs simply need to address the appropriate API. The summary also calls it out as optional.
 

Dark1x

Digital Foundry
Verified
Oct 26, 2017
3,530
Very interesting stuff. Turing is certainly proving to be a forward looking design which may disappoint some PC gamers but I'm feeling like this is a GeForce 3 style launch in terms of what it means.
 

Veggen

Member
Oct 25, 2017
1,246
Very interesting stuff. Turing is certainly proving to be a forward looking design which may disappoint some PC gamers but I'm feeling like this is a GeForce 3 style launch in terms of what it means.
Do you have to go back all the way to GF3 for a similar change in pipelines?
 

Durante

Dark Souls Man
Member
Oct 24, 2017
5,074
Having now read the blog post, it seems what this ultimately does is codify the "bleeding edge" OGL paradigm of generating your mesh data using a compute program and then multidrawing it, but with more flexibility and better performance (e.g. because it can remain on-chip/cached throughout). "Just give me an index and a data structure pointer" is actually in a way a more natural paradigm in a modern renderer than the whole input binding thing that traditional shaders still use.
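Rough shape of that "bleeding edge" GL pattern, for reference: a compute pass writes the draw commands into a GPU buffer, a barrier makes them visible, and one multi-draw-indirect call submits everything. Program and buffer handles (and the shaders themselves) are assumed to be created elsewhere.

```cpp
#include <GL/glew.h>

// cullProgram: a compute shader that writes DrawElementsIndirectCommand records into indirectBuffer.
void gpuDrivenDraw(GLuint cullProgram, GLuint indirectBuffer,
                   GLuint workGroups, GLsizei maxDraws) {
    // 1) Compute pass builds/culls the draw command list entirely on the GPU.
    glUseProgram(cullProgram);
    glDispatchCompute(workGroups, 1, 1);

    // 2) Make the command writes visible to the indirect-draw stage.
    glMemoryBarrier(GL_COMMAND_BARRIER_BIT | GL_SHADER_STORAGE_BARRIER_BIT);

    // 3) A single CPU-side call issues every surviving draw; the geometry and the
    //    draw list never round-trip through the CPU.
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                /*indirect offset*/ nullptr, maxDraws, /*stride*/ 0);
}
```

Mesh/task shaders codify this same idea but keep the expansion on-chip instead of bouncing draw commands through buffer memory.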

As Durante points out, devs simply need to address the appropriate API. The summary also calls it out as optional.
I'd remove the "simply" there ;)

It's pretty amazing in a way that NV do this now at the same time as introducing RTX.
Taken together it can be argued that this is the largest change -- in a single generational step -- in how GPUs can be programmed for 3D graphics since the introduction of programmable shaders.
 

anexanhume

Member
Oct 25, 2017
12,913
Maryland
Having now read the blog post, it seems what this ultimately does is codify the "bleeding edge" OGL paradigm of generating your mesh data using a compute program and then multidrawing it, but with more flexibility and better performance (e.g. because it can remain on-chip/cached throughout). "Just give me an index and a data structure pointer" is actually in a way a more natural paradigm in a modern renderer than the whole input binding thing that traditional shaders still use.

I'd remove the "simply" there ;)

It's pretty amazing in a way that NV do this now at the same time as introducing RTX.
Taken together it can be argued that this is the largest change -- in a single generational step -- in how GPUs can be programmed for 3D graphics since the introduction of programmable shaders.
Agree. It's a shame this took a backseat to the RT cores, which NV is unwilling to share technical details on.
 

Nostremitus

Member
Nov 15, 2017
7,777
Alabama
The "20xx isn't next gen" lot are going to be quiet now, aren't they...
I'm not one of those guys, but I wouldn't say that Nvidia catching up to something AMD was already doing is the thing that makes it next gen, unless you think AMD was a gen ahead of Nvidia with Vega...

If anything, I wonder if Vega performance will increase when games are developed with this sort of pipeline in mind.
 

mugurumakensei

Elizabeth, I’m coming to join you!
Member
Oct 25, 2017
11,320
I'm not one of those guys, but I wouldn't say that Nvidia catching up to something AMD was already doing is the thing that makes it next gen, unless you think AMD was a gen ahead of Nvidia with Vega...

Uh, actually, it's what AMD was attempting with Vega. NVidia actually succeeded with the 2080. AMD probably will with Navi since it had extra time in the oven.
 

Nostremitus

Member
Nov 15, 2017
7,777
Alabama
Uh, actually, it's what AMD was attempting with Vega. NVidia actually succeeded with the 2080. AMD probably will with Navi since it had extra time in the oven.
You need the game engines to shift the way they work with you when you change the way your pipeline works, right?
Otherwise, isn't the card essentially brute-forcing the game engines not designed to use the new pipeline?
 
Nov 5, 2017
240
I'm not one of those guys, but I wouldn't say that Nvidia catching up to something AMD was already doing is the thing that makes it next gen, unless you think AMD was a gen ahead of Nvidia with Vega...

If anything, I wonder if Vega performance will increase when games are developed with this sort of pipeline in mind.
Yeah, because that thing works so well on Vega...
 

Nostremitus

Member
Nov 15, 2017
7,777
Alabama
Yeah, because that thing works so well on Vega...
It only works if APIs and engines adjust to utilize it, right? Otherwise wouldn't you have increased frame times while the engine is waiting for what it expects to receive from the old-style pipeline? With Nvidia adopting the pipeline changes couldn't that actually help Vega?

I'm not saying it will, but asking if it's possible.

I'm assuming you chose to ignore the next three posts in the thread because you wanted to be reactionary instead of having a conversation, though.
 

Locuza

Member
Mar 6, 2018
380
It only works if APIs and engines adjust to utilize it, right? Otherwise wouldn't you have increased frame times while the engine is waiting for what it expects to receive from the old-style pipeline? With Nvidia adopting the pipeline changes couldn't that actually help Vega?

I'm not saying it will, but asking if it's possible.

I'm assuming you chose to ignore the next three posts in the thread because you wanted to be reactionary instead of having a conversation, though.
No, because the new pipeline needs explicit developer support and is currently only available through Nvidia-specific extensions.
Vega doesn't offer API extensions for developers, so even if their solutions were very similar, developers can't exploit surface/primitive shaders on Vega.
And using the extensions from Nvidia will give Vega nothing to work with.

In general, the lack of information limits the ability to form a decent comparison and understanding.
Comparing high-level diagrams is not the best way of judging the new approaches and how similar they might be.
And contradicting statements don't make it easier.

On one hand there was the statement from Raja Koduri and Rys Sommefeldt that the stuff would work automatically, and on the other the statement that NGG and Primitive Shaders would need explicit API support.
Now we know that NGG is dead for Vega:
[attached image]


Without underlying driver support it won't work either way.

For the long term perspective the best way would be for Vulkan and DX12 to have an API update which lays down a common standard for every hardware vendor.
And probably some time will pass until this happens, and I wouldn't hold my breath for Vega being able to support it or for AMD developing support for it.
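In practice the "explicit developer support" part boils down to something like the probe-and-fallback below (a sketch only, with the rest of the renderer assumed): the app asks for the NV extension and keeps the classic vertex pipeline for everything else, which is why the extension gives non-NV hardware nothing.

```cpp
#include <vulkan/vulkan.h>
#include <cstring>
#include <vector>

// Returns true if the physical device exposes VK_NV_mesh_shader; callers pick the
// mesh-shading code path only in that case and otherwise keep the classic VS/GS path.
bool deviceSupportsMeshShaders(VkPhysicalDevice gpu) {
    uint32_t count = 0;
    vkEnumerateDeviceExtensionProperties(gpu, nullptr, &count, nullptr);
    std::vector<VkExtensionProperties> props(count);
    vkEnumerateDeviceExtensionProperties(gpu, nullptr, &count, props.data());

    for (const VkExtensionProperties& p : props)
        if (std::strcmp(p.extensionName, VK_NV_MESH_SHADER_EXTENSION_NAME) == 0)
            return true;  // currently only reported by Nvidia drivers
    return false;         // e.g. Vega: no extension exposed, so the new pipeline is unreachable
}
```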
 

Darktalon

Member
Oct 27, 2017
3,265
Kansas
These types of rendering paradigm shifts take time to reap the benefits, Nvidia has planted the seeds, now we need developers to harvest.
 

Nostremitus

Member
Nov 15, 2017
7,777
Alabama
No, because the new pipeline needs explicit developer support and is currently only available through Nvidia-specific extensions.
Vega doesn't offer API extensions for developers, so even if their solutions were very similar, developers can't exploit surface/primitive shaders on Vega.
And using the extensions from Nvidia will give Vega nothing to work with.

In general, the lack of information limits the ability to form a decent comparison and understanding.
Comparing high-level diagrams is not the best way of judging the new approaches and how similar they might be.
And contradicting statements don't make it easier.

On one hand there was the statement from Raja Koduri and Rys Sommefeldt that the stuff would work automatically, and on the other the statement that NGG and Primitive Shaders would need explicit API support.
Now we know that NGG is dead for Vega:
[attached image]


Without underlying driver support it won't work either way.

For the long term perspective the best way would be for Vulkan and DX12 to have an API update which lays down a common standard for every hardware vendor.
And probably some time will pass until this happens, and I wouldn't hold my breath for Vega being able to support it or for AMD developing support for it.
Thanks. I thought that a Vulkan API standard was on the way that would allow the new pipeline to work with Vega.

I wonder if this was Raja's baby that lost support once he left...
 

Locuza

Member
Mar 6, 2018
380
I remember that when AMD first revealed the Vega hardware features, the word was that Primitive Shaders would need explicit API support, but that AMD thought APIs were going in this direction.
AMD and Nvidia have many members in the Khronos Group and they work with Microsoft, so it's very likely that behind the curtain there is discussion about a better standard for geometry processing, but currently there is nothing official stating that.

As to whether that was Raja's baby: no, at least I don't believe that, or wouldn't describe it like that.
Raja was the manager of the whole RTG; I don't know how much he worked on it himself beyond overseeing project milestones and giving advice/direction.
There is a patent he worked on with others (also Mike Mantor) on a virtual register file design.

If you look up a primitive shader patent from AMD, you will quite often find the name Michael Mantor, who has worked at AMD for decades:
https://patentimages.storage.googleapis.com/16/34/77/be4393dc5704c5/US20180082399A1.pdf

Inventors:
Todd Martin, Mangesh P. Nijasure, Randy W. Ramsey, Michael Mantor, Laurent Lefebvre

We can only speculate what went wrong for Vega in that regard and why AMD scrapped the support for it.
But they sure as hell didn't scrap the marketing claims of over 2x the geometry throughput for it, even to this day.
 

datschge

Member
Oct 25, 2017
623
I remember that when AMD first revealed the Vega hardware features, the word was that Primitive Shaders would need explicit API support, but that AMD thought APIs were going in this direction.
AMD and Nvidia have many members in the Khronos Group and they work with Microsoft, so it's very likely that behind the curtain there is discussion about a better standard for geometry processing, but currently there is nothing official stating that.

As to whether that was Raja's baby: no, at least I don't believe that, or wouldn't describe it like that.
Raja was the manager of the whole RTG; I don't know how much he worked on it himself beyond overseeing project milestones and giving advice/direction.
There is a patent he worked on with others (also Mike Mantor) on a virtual register file design.

If you look up a primitive shader patent from AMD, you will quite often find the name Michael Mantor, who has worked at AMD for decades:
https://patentimages.storage.googleapis.com/16/34/77/be4393dc5704c5/US20180082399A1.pdf



We can only speculate what went wrong for Vega in that regard and why AMD scrapped the support for it.
But they sure as hell didn't scrap the marketing claims of over 2x the geometry throughput for it, even to this day.
The oddest part about this whole development is that none of it was supported through Mantle either. One would think that AMD, planning to introduce a paradigm shift in the next GPU gen, would also have started some preliminary support in their own API while they were at it, something that might then have leaked into DX12 and Vulkan. Instead AMD managed to outplay itself, pushing optimizations for their older GPUs into standards that are now, once again, not sufficient to support their latest GPUs, and falling back into the whole chicken-and-egg issue that forward-looking ATi/AMD hardware traditionally suffers from. The efforts clearly were not coordinated.
 

neoak

Member
Oct 25, 2017
15,260
Thanks. I thought that a Vulkan API standard was on the way that would allow the new pipeline to work with Vega.

I wonder if this was Raja's baby that lost support once he left...
ATI/AMD have a history of putting hardware in shipping silicon that's turned off. One example is a back bus in a model I can't remember, which was eventually improved and used for the Radeon SSG.

Intel has done this too. They shipped CPUs with Hyperthreading disabled before officially introducing it.

This is done in order to beta test new features on shipping silicon, as well as performance. You need engineering BIOS for the tests, and these usually are in a high security room with restrictions meant to ensure the stuff doesn't work outside of that room.

Nvidia doesn't do this often IIRC. They invest heavily in simulation to get everything right "on paper" before sending it to the factory. It does backfire sometimes, like GF100 and its abysmal yields.
 

Locuza

Member
Mar 6, 2018
380
The oddest part about this whole development is that none of it was supported through Mantle either. One would think that AMD, planning to introduce a paradigm shift in the next GPU gen, would also have started some preliminary support in their own API while they were at it, something that might then have leaked into DX12 and Vulkan. Instead AMD managed to outplay itself, pushing optimizations for their older GPUs into standards that are now, once again, not sufficient to support their latest GPUs, and falling back into the whole chicken-and-egg issue that forward-looking ATi/AMD hardware traditionally suffers from. The efforts clearly were not coordinated.
I don't think that's odd at all; Mantle has been practically dead for a long time and is irrelevant for game developers.
The only reasonable target for adoption was, from the beginning, DX12/Vulkan extensions, but AMD didn't provide anything.
And you can't synchronize your newest tech features in parallel with Microsoft and the Khronos Group, where multiple stakeholders are sitting, expect everyone to agree, and then have a suitable core API standard ready when you launch your product.
That happens quite rarely.

For Nvidia the situation is no different: apart from the DXR standard in DX12, the rest is not in the core API of DX12/Vulkan and must be provided through extensions, with additional effort and cooperation required for inclusion.
Nvidia has more resources here, but their success with the current status will also be limited.
 

datschge

Member
Oct 25, 2017
623
I don't think that's odd at all; Mantle has been practically dead for a long time and is irrelevant for game developers.
The only reasonable target for adoption was, from the beginning, DX12/Vulkan extensions, but AMD didn't provide anything.
And you can't synchronize your newest tech features in parallel with Microsoft and the Khronos Group, where multiple stakeholders are sitting, expect everyone to agree, and then have a suitable core API standard ready when you launch your product.
That happens quite rarely.

For Nvidia the situation is no different: apart from the DXR standard in DX12, the rest is not in the core API of DX12/Vulkan and must be provided through extensions, with additional effort and cooperation required for inclusion.
Nvidia has more resources here, but their success with the current status will also be limited.
You are talking about now, I am talking about the past. I'm fully aware that Mantle is irrelevant now, as AMD succeeded in getting its tech into accepted standards as part of DX12 and Vulkan. My point is that the work on Vega happened not yesterday but was a long time coming back when Mantle was still developed and both DX12 and Vulkan didn't exist yet. Yet AMD did nothing to avoid Vega hardware becoming yet another clean break from their concurrent software API efforts. Nvidia handles these kinds of paradigm shifts better, not only in PR and market domination but also by ensuring that at least their proprietary APIs are pretty much always up to date with their publicized hardware capabilities. AMD traditionally keeps tripping themselves up right there.
 

dgrdsv

Member
Oct 25, 2017
11,850
Uh, actually, it's what AMD was attempting with Vega. NVidia actually succeeded with the 2080. AMD probably will with Navi since it had extra time in the oven.
Turing mesh shaders are considerably different from what was promised with primitive shaders for Vega. For one, it's an optional pipeline which still uses lots of FF h/w - and it's actually pointed out several times by NV that some things in it are still better left to FF. AMD's primitive shaders were more about fixing their geometry processing weakness through early culling of unnecessary triangles; NV's mesh shaders are more about removing the old pipeline bottlenecks (GS mostly) while giving developers more control over execution. From what NV has said on this topic, it doesn't look like they expect mesh shaders to completely substitute the old VS/GS pipeline any time soon.
 

brainchild

Independent Developer
Verified
Nov 25, 2017
9,478
I haven't had the time to go over the whitepapers yet, but I think this effectively ends the debate about whether or not Turing is just an incremental generational improvement.

Once I have time, I'll probably make a summary of all notable changes mentioned in the papers.
 

trugc

Member
Oct 28, 2017
138
These types of rendering paradigm shifts take time to reap the benefits, Nvidia has planted the seeds, now we need developers to harvest.
Actually, this feature has been asked for by some developers for a while. The current implementation of cluster/triangle-level culling needed for a GPU-driven pipeline is pretty clumsy and requires platform-specific hacks. Adding a compute-style stage before the vertex stage makes it much easier to implement, and we might see developers quickly port their existing code to the new API.
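For anyone wondering what that cluster culling amounts to, here is a small CPU reference of the per-cluster visibility test; today it runs in a compute pass (or on the CPU), and with mesh shading it can move into the task stage so rejected clusters never launch mesh work. The bounding-sphere/frustum layout is made up for illustration.

```cpp
#include <array>
#include <cstdint>

struct Sphere  { float x, y, z, radius; };
struct Plane   { float nx, ny, nz, d; };   // nx*x + ny*y + nz*z + d >= 0 means "inside"
using Frustum = std::array<Plane, 6>;

// Conservative sphere-vs-frustum test for one cluster (meshlet).
bool clusterVisible(const Sphere& bounds, const Frustum& frustum) {
    for (const Plane& p : frustum) {
        float dist = p.nx * bounds.x + p.ny * bounds.y + p.nz * bounds.z + p.d;
        if (dist < -bounds.radius)
            return false;                  // fully outside this plane: cull the cluster
    }
    return true;                           // conservatively visible
}

// In a task shader, each thread would run this test for one meshlet and the
// workgroup would only emit mesh-shader work for the survivors.
uint32_t countVisibleClusters(const Sphere* clusterBounds, uint32_t clusterCount,
                              const Frustum& frustum) {
    uint32_t visible = 0;
    for (uint32_t i = 0; i < clusterCount; ++i)
        if (clusterVisible(clusterBounds[i], frustum)) ++visible;
    return visible;
}
```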
 

brainchild

Independent Developer
Verified
Nov 25, 2017
9,478
After looking over the whitepaper, I think there should be more emphasis on the significance of the task shader when discussing the pipeline changes with Turing.

Essentially, with task shaders, you completely eliminate the need to do individual draw calls for every object in a scene, and can avoid allocating memory for vertex data that you don't need; you simply categorize objects in a list which the task shader then generates work for. These are significant efficiency improvements to the pipeline, even before we get to the mesh shader, as the task shader can prevent the mesh shader from doing more work than necessary.
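A rough sketch of that point, under the same VK_NV_mesh_shader assumptions used earlier in the thread: the per-object draw loop collapses into one task dispatch whose workgroups walk the object list on the GPU. The helper names and the objects-per-workgroup split are hypothetical.

```cpp
#include <vulkan/vulkan.h>

// Classic path: the CPU issues one draw per object.
void drawObjectsClassic(VkCommandBuffer cmd, const uint32_t* indexCounts, uint32_t objectCount) {
    for (uint32_t i = 0; i < objectCount; ++i)
        vkCmdDrawIndexed(cmd, indexCounts[i], 1, 0, 0, 0);
}

// Mesh-shading path: one call. The object/meshlet descriptors live in a storage
// buffer that the task shader walks, culling and expanding work on the GPU.
void drawObjectsMeshShading(VkCommandBuffer cmd, uint32_t objectCount, uint32_t objectsPerTaskGroup) {
    uint32_t taskCount = (objectCount + objectsPerTaskGroup - 1) / objectsPerTaskGroup;
    vkCmdDrawMeshTasksNV(cmd, taskCount, /*firstTask=*/0);
}
```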

Also, the whitepaper was pretty insightful in general. I will definitely be making a thread breaking down the important bits so that it's a bit more digestible to laymen, as the changes with this architecture are so significant that more people should be able to appreciate them.
 

Locuza

Member
Mar 6, 2018
380
My point is that the work on Vega happened not yesterday but was a long time coming back when Mantle was still developed and both DX12 and Vulkan didn't exist yet. Yet AMD did nothing to avoid Vega hardware becoming yet another clean break from their concurrent software API efforts.
How could they have avoided that?
Vega came much later than Mantle and Mantle had no future as a 3D API for game development.
It was obviously meant to push the industry in the direction AMD wanted to.

CUDA, in contrast, is a compute API from Nvidia used for HPC, data centers and so on.
It became a standard in the early days when there was no compelling alternative, and Nvidia invested a lot of resources into it.
In the professional field, proprietary solutions aren't such a huge deal.

Mantle couldn't afford the same luxury.

Turing mesh shaders are considerably different from what was promised with primitive shaders for Vega. For one, it's an optional pipeline which still uses lots of FF h/w - and it's actually pointed out several times by NV that some things in it are still better left to FF. AMD's primitive shaders were more about fixing their geometry processing weakness through early culling of unnecessary triangles; NV's mesh shaders are more about removing the old pipeline bottlenecks (GS mostly) while giving developers more control over execution. From what NV has said on this topic, it doesn't look like they expect mesh shaders to completely substitute the old VS/GS pipeline any time soon.
The Primitive Assembler is a fixed-function unit which does the load balancing and culling at AMD; the crossbar between the Shader Engines is very complex interconnection-wise, and the load balancing is sub-optimal because the results are stored in dedicated parameter caches, so if the buffer space on one engine runs out it stalls all the others.
A Primitive Shader would shift that workload to the Compute Units, which would do the vertex fetching and culling calculations but also the screen-space division and the feeding of the rasterizer.
AMD would store the data in the local data share and use the interconnect of the Compute Units for data communication.
AMD also combined some work stages: the Surface Shader would replace the Vertex and Hull shader stages, and the Primitive Shader the Geometry and Domain shader stages.

On a high level, Nvidia's concept is similar in several places.
Nvidia does the culling on the compute units; it doesn't happen on fixed-function hardware anymore, like the Primitive Assembler on AMD or what Nvidia calls VPCs (Viewport Culling).
Vertex fetching also wouldn't be done by fixed-function hardware anymore.
They also combine the Vertex and Hull shader stage into what they call a Task Shader and the Geometry and Domain Shader stage into a Mesh Shader stage.

Nvidia can now operate on more primitives than vertices, and the thread mapping seems to be flexible and parallel. I'm not sure if AMD's Surface/Primitive Shaders would also offer that kind of execution model and the further capabilities laid out by Nvidia.
 
Mar 17, 2018
2,927
All of this looks amazing, and I think the next gen is going to benefit heavily from all of this especially VR. I am now thinking about just getting a 2070 when they come out for the extra power I need in a few games.
 

brainchild

Independent Developer
Verified
Nov 25, 2017
9,478
They also combine the Vertex and Hull shader stage into what they call a Task Shader and the Geometry and Domain Shader stage into a Mesh Shader stage.

Just so that people aren't confused, the new shaders aren't just literal combinations of the older shaders; they also have more flexibility, like user-defined inputs.

We shouldn't be too reductive about their functions, otherwise it makes it more difficult for people to understand what makes them new.
 

datschge

Member
Oct 25, 2017
623
How could they have avoided that?
Vega came much later than Mantle and Mantle had no future as a 3D API for game development.
It was obviously meant to push the industry in the direction AMD wanted to.
They could have avoided that by creating a forward-looking API covering the forward-looking hardware development that was already in, if not past, the planning stage at that point. Development on Mantle was suspended in March 2015; specs for the first announced Vega product (Radeon Instinct MI25) were revealed less than two years later, in December 2016.

After looking over the whitepaper, I think there should be more emphasis on the significance of the task shader when discussing the pipeline changes with Turing.

Essentially, with task shaders, you completely eliminate the need to do individual draw calls for every object in a scene, and can avoid allocating memory for vertex data that you don't need; you simply categorize objects in a list which the task shader then generates work for. These are significant efficiency improvements to the pipeline, even before we get to the mesh shader, as the task shader can prevent the mesh shader from doing more work than necessary.

Also, the whitepaper was pretty insightful in general. I will definitely be making a thread breaking down the important bits so that it's a bit more digestible to laymen, as the changes with this architecture are so significant that more people should be able to appreciate them.
Looking forward to reading your thread. Any thoughts on how (if?) this could be integrated in existing API standards for the widest possible coverage?
 

brainchild

Independent Developer
Verified
Nov 25, 2017
9,478
Any thoughts on how (if?) this could be integrated in existing API standards for the widest possible coverage?

To be perfectly honest, I'm not sure that achieving ubiquity with existing API standards should be the goal right now. Devs first need to understand how best to take advantage of the paradigm shift in micro-architecture/pipeline, and decide afterward what makes the most sense. It wouldn't do much good to roll out API support for a pipeline that can't fully reap the benefits of said pipeline, imo.
 

dgrdsv

Member
Oct 25, 2017
11,850
Any thoughts on how (if?) this could be integrated in existing API standards for the widest possible coverage?
Should be easy to do in Vulkan. With DX though MS will have to make a new SM - 7.0?

The biggest issue here will likely be in finding the common ground between NV, AMD and Intel before adding this into a common API.
 

Laiza

Member
Oct 25, 2017
2,170
Wow, this is actually pretty huge. I'd love to see a list of games supporting this, the same way we have a list of games supporting DLSS. It'd also be pretty instructive to have a game, or at least a tech demo, that supports both the new and old pipelines so we can get some real performance numbers.

I can see why it's not being advertised as much, though - it's definitely more technical in nature than "machine learning dramatically improves upsampling" or "real, ray-traced reflections at 30+ FPS". Looking forward to that thread, brainchild!
 
Mar 17, 2018
2,927
It definitely is a great thing, but like other people I think these things will be more ingrained into the dev culture on the next round of cards.