I just skimmed the last 40 pages (again); it's really not easy to keep up. ^^
I do wonder, though, whether AMD will scale up to three Shader Arrays or even more per SE.
IIRC I didn't mention it here, but it was news to me that GCN1 already used two Shader Arrays per Shader Engine.
Tahiti, for example, looks like this:
Every GCN1 chip except Oland and Hainan (low-end chips) used two Shader Arrays per Shader Engine.
After that, AMD used just one SA per SE on every GPU.
Navi is back to two SAs, but of course on Navi they scale together with more logic (for example, each SA has its own graphics L1 cache).
The question is, what does AMD count as a Shader Engine on Navi?
If, for example, the Shader Processor Input (SPI) is per Shader Engine and feeds the Shader Arrays, as before, then you might also not want to go beyond two Shader Arrays per SE on Navi.
Probably for API reasons.
Until DirectML arrived recently, there was no common API or software abstraction for using Tensor Cores for anything.
You had to use either CUDA or some proprietary extensions.
Single-threaded (ST) performance would have been better, but total multi-threaded performance would have been roughly the same.
Steamroller was a bit larger: 29.47 mm² per module (one module = 2 threads) vs. 26.2 mm² for Jaguar (4 cores = 4 threads).
AMD could have halved the L2$ per module (as they did for Excavator) and ended up with roughly 24 mm², but that's extra work, and Steamroller was already late, at least on desktop, only arriving in 2014.
But the real issue would probably have been power consumption, with Steamroller having to run at 3.2 GHz for similar performance.
And to add to that, Sony/MS could have clocked Jaguar at 2.2 or even 2.4 GHz (AMD did so for embedded products), but for perf/watt reasons they didn't and invested the power budget in the GPU instead.
Even the PS4 Pro and Xbox One X only clock Jaguar at 2.1 and 2.3 GHz, respectively.
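A quick back-of-the-envelope sketch in Python, putting the area figures quoted above side by side per hardware thread. The 24 mm² half-L2$ value is only the rough estimate from this post, and the names used here are made up for illustration:

```python
# Die-area figures quoted in the post (mm²). The half-L2$ value is the post's
# rough estimate for a hypothetical Steamroller module, not a measurement.
STEAMROLLER_MODULE = (29.47, 2)   # (area, hardware threads) per module
STEAMROLLER_HALF_L2 = (24.0, 2)   # hypothetical module with half the L2$
JAGUAR_CLUSTER = (26.2, 4)        # 4 cores = 4 threads


def area_per_thread(block: tuple[float, int]) -> float:
    """Die area per hardware thread in mm²."""
    area_mm2, threads = block
    return area_mm2 / threads


def relative_size(a: tuple[float, int], b: tuple[float, int]) -> float:
    """How much larger block a is than block b, as a fraction (0.12 = 12 %)."""
    return a[0] / b[0] - 1.0


if __name__ == "__main__":
    for name, block in [("Steamroller", STEAMROLLER_MODULE),
                        ("Steamroller, half L2$", STEAMROLLER_HALF_L2),
                        ("Jaguar", JAGUAR_CLUSTER)]:
        print(f"{name:22s} {block[0]:6.2f} mm²  "
              f"{area_per_thread(block):5.2f} mm²/thread")
    print(f"Steamroller module vs. Jaguar: "
          f"{relative_size(STEAMROLLER_MODULE, JAGUAR_CLUSTER):+.1%}")
```

Per thread, Jaguar is much denser; a Steamroller module spends more than twice the area on each of its two threads, which is why the comparison only works out if you look at the whole block.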
And it's definitely the best Zen 2 die shot worldwide.
AMD didn't even publish a Zen 2 die shot.
For Zen 1, AMD showed a relatively high-resolution die shot, but it still couldn't hold a candle to Fritzchens Fritz's excellent die shots.
I wonder why hobbyists have to provide the world with high-quality die shots, and why in many cases they are the only ones who provide die photos at all.
I might mention it in part 2, but if not: the L3$ area is of course interesting in relation to the consoles, since it's very likely that they will at least halve it from 16 MB to 8 MB.
That would shrink the CCX from ~31.28 mm² to ~24.23 mm², if you cut it together the way I did.
It would be a little smaller than a 26.2 mm² Jaguar CCX; kinda crazy how much horsepower you can get in the same area.
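The same kind of quick Python sketch for the L3$ cut, using only the numbers above. It assumes both blocks have 4 cores; the 24.23 mm² value is this post's estimate based on its own cut of the die shot, and the names are purely illustrative:

```python
# Zen 2 CCX areas quoted in the post (mm²), measured on the author's cut of
# the die shot; the 8 MB value is an estimate for a console-style CCX.
CCX_16MB_L3 = 31.28
CCX_8MB_L3 = 24.23
JAGUAR_CLUSTER = 26.2
CORES_PER_CLUSTER = 4  # assumption: both a Zen 2 CCX and Jaguar have 4 cores


def area_saved(full_mm2: float, reduced_mm2: float) -> float:
    """Area freed up by the smaller L3$, in mm²."""
    return full_mm2 - reduced_mm2


def per_core(area_mm2: float, cores: int = CORES_PER_CLUSTER) -> float:
    """Block area divided evenly across its cores, in mm²."""
    return area_mm2 / cores


if __name__ == "__main__":
    saved = area_saved(CCX_16MB_L3, CCX_8MB_L3)
    print(f"Area saved by halving the L3$: {saved:.2f} mm²")
    print(f"Zen 2 CCX (8 MB L3$) vs. Jaguar: "
          f"{CCX_8MB_L3 / JAGUAR_CLUSTER - 1:+.1%}")
    print(f"Per core: {per_core(CCX_8MB_L3):.2f} mm² (Zen 2) vs. "
          f"{per_core(JAGUAR_CLUSTER):.2f} mm² (Jaguar)")
```

Halving the L3$ frees about 7 mm², which is what pushes the Zen 2 CCX just below the Jaguar cluster in area.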
Thank you!
Although I don't think I will always be able to include so much cringy humour :P
Obviously 8TF. /s
Admit it, without the German text in the screenshots you never would have figured that out...