Next-gen PS5 and next Xbox speculation launch thread |OT5| - It's in RDNA

What do you think could be the memory setup of your preferred console, or one of the new consoles?


OP - I GOT, I GOT, I GOT

Mecha Meister

Member
Oct 25, 2017
495
I GOT I GOT I GOT I GOT




Previous threads:
PS5 and next Xbox launch speculation - timing and specifications

PS5 and next Xbox launch speculation - Post E3 2018

Next gen PS5 and next Xbox launch speculation - Secret sauces spicing 2019

Next-gen PS5 and next Xbox speculation launch thread - MY ANACONDA DON'T WANT NONE

Official PlayStation 5 Specifications Revealed so far:
  • 7nm, 8-core Zen 2 CPU (unknown clock speeds, and whether it will have SMT)
  • 7nm AMD Radeon NAVI GPU (Unknown clock-speeds and core count)
  • Ray tracing support (Hardware-accelerated? Software-accelerated?)
  • 8K Output support
  • SSD (allegedly faster than PC solutions available at the time of publication)
  • PS4 Backwards compatibility
  • Not coming 2019
Revealed by Mark Cerny in Wired Interview

Official Xbox Project Scarlett Specifications Revealed so far:
  • Zen 2 CPU (Unknown clock-speeds, and whether it will have SMT)
  • NAVI GPU (Unknown clock-speeds and core count)
  • GDDR6 Memory (it's speculated that it has 12 memory chips, potentially giving it a 384-bit bus and a total of 24GB of memory with 672GB/s of memory bandwidth; a quick sanity check of these numbers follows after this list)
    On that topic: in 2016 Microsoft showcased a render of the Xbox One X's board while the system was in development, and the number of memory chips in that render matched the number in the retail system. Of course, things may change before launch, so this is speculation mixed in with officially revealed information.
  • Up to 8K resolution support
  • Up to 120 fps support
  • Ray tracing support - Hardware-accelerated
  • Variable Refresh Rate Support
  • SSD (Can be used as virtual memory)
  • Backwards Compatible
Revealed in Xbox Project Scarlett - E3 2019 - Reveal Trailer
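
For what it's worth, the rumoured memory figures are internally consistent. Here's a quick sanity check (my assumptions, not confirmed specs: 32-bit chips and a 14Gbps GDDR6 speed grade):

```python
# Hypothetical sanity check of the rumoured Scarlett memory setup:
# 12 GDDR6 chips with a 32-bit interface each gives a 384-bit bus, and a
# 14Gbps-per-pin speed grade (assumed) yields the rumoured bandwidth.
chips = 12
bus_width_bits = chips * 32                     # 384-bit bus
pin_speed_gbps = 14                             # assumed GDDR6 speed grade
bandwidth_gbs = bus_width_bits * pin_speed_gbps / 8
print(f"{bus_width_bits}-bit bus -> {bandwidth_gbs:.0f} GB/s")  # 672 GB/s
```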

Patents
Be careful with patents: don't take everything you read in one to mean it will be implemented in a company's next product, as some of the things companies patent never come to fruition.

PS5 - a patent dive into what might be the tech behind Sony's SSD customisations (technical!)

The TLDR is

- some hardware changes vs the typical inside the SSD (SRAM for housekeeping and data buffering instead of DRAM)
- some extra hardware and accelerators in the system for handling file IO tasks independent of the main CPU
- at the OS layer, a second file system customised for these changes

all primarily aimed at higher read performance and removing potential bottlenecks for data that is written less often than it is read, like data installed from a game disc or download.

Other information regarding the PlayStation 5:

Richard Geldreich:
"Next-gen console games will be all about extremely efficient and optimized geometry and texture data distribution. I saw a public Sony demo that used extreme amounts of texture and other content in a dense city. Microsoft will have to keep up."



Next Xbox rumours:
There are rumoured to be two models, codenamed Lockhart and Anaconda, with one being more powerful than the other.

Windows Central: Xbox 'Scarlett,' 'Anaconda' and 'Lockhart:' Everything (we think) we know

What we know about RDNA:
  • Allegedly a new 7nm GPU architecture
  • New Compute Unit Design with improved efficiency and increased IPC allegedly offering 1.25x performance per clock
  • Features higher clock speeds and gaming performance at lower power requirements
  • First RDNA GPUs available in July, starting with the RX 5700 series GPUs
  • An RX 5700 GPU was shown performing 10% faster than the RTX 2070 in Strange Brigade (however, this game is known to perform better on AMD GPUs than NVIDIA GPUs, so it doesn't necessarily tell the whole story performance-wise)


RDNA Details:
The RX 5700 GPUs are not necessarily what's going to be used in the PS5 and Project Scarlett.
So far no ray tracing capabilities have been confirmed for these GPUs.

RX 5700 XT
Compute Units: 40
Core Count: 2560
Teraflops: Up to 9.75
Memory: 8GB GDDR6

Boost Clock: Up to 1905MHz
Game Clock: 1755MHz
Base Clock: 1605MHz

RX 5700
Compute Units: 36
Core Count: 2304
Teraflops: Up to 7.95
Memory: 8GB GDDR6

Boost Clock: Up to 1725MHz
Game Clock: 1625MHz
Base Clock: 1455MHz
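
As a side note, the teraflop figures above follow directly from the core counts and clocks, since each stream processor does two floating-point operations per cycle (a fused multiply-add):

```python
# TFLOPS = stream processors x 2 FLOPs per cycle (FMA) x clock in GHz
def tflops(cores: int, clock_mhz: int) -> float:
    return cores * 2 * clock_mhz / 1e6

print(tflops(2560, 1905))  # RX 5700 XT at boost: ~9.75 TF
print(tflops(2304, 1725))  # RX 5700 at boost:    ~7.95 TF
```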

RDNA GPU performance speculation
On the topic of IPC improvements, here is an RTX 2070 review with Strange Brigade tested on Page 2

Nvidia GeForce RTX 2070 Review: Page 2 - Shadow of the TR, Strange Brigade, Monster Hunter: World

It would be ideal to pull results from a review of the RX 5700 GPUs when they release, but for now all we have to work with are the numbers AMD has given us: the RX 5700 was shown performing around 10% faster than the RTX 2070 in Strange Brigade. Looking at TechSpot's review of that GPU, the RTX 2070 performs within 10% of the Vega 64, so an RX 5700 should theoretically perform similarly to, or better than, an air-cooled Vega 64.
In Anandtech's review of the Vega 64, it holds an average clock speed of 1523MHz across the 8 games they tested, which means it's theoretically delivering around 12.4 teraflops.

I don't recommend using a single game as a point of reference for determining the performance of the RX 5700, but it could be that the RX 5700 offers Vega 64 performance. Presuming RDNA's 1.25x performance per clock translates to these workloads, the RX 5700 in AMD's testing would be roughly a 10 TF GPU offering the performance of a 12.4 TF Vega GPU (12.4 ÷ 1.25 ≈ 9.9; see the sketch below). My calculation may be off, so take this with a grain of salt.
If anyone has anything to add to this then please do!
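
To make the arithmetic explicit, here's a rough sketch of the estimate above (assumptions as stated: 4096 shaders, the 1523MHz average clock from Anandtech, and AMD's 1.25x per-clock claim):

```python
# Vega 64: 4096 stream processors at the ~1523MHz average Anandtech measured
vega64_tf = 4096 * 2 * 1523 / 1e6      # ~12.5 TF delivered in practice
# If RDNA really does ~1.25x the work per clock, the RX 5700 would only need
# this many "RDNA teraflops" to match that level of Vega 64 performance:
rdna_equivalent_tf = vega64_tf / 1.25  # ~10 TF
print(f"Vega 64: {vega64_tf:.1f} TF -> RDNA equivalent: {rdna_equivalent_tf:.1f} TF")
```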

RDNA die sizes have been estimated at around 255mm² to 275mm², which could mean the RX 5700 offers Vega 64 performance at around, or almost, half the size, as the Vega 64 is a 495mm² GPU.

 
anexanhume's performance and die-size estimation

anexanhume

Member
Oct 25, 2017
4,017
Reposting from last thread

Indeed. But is there any chance we can derive the number of stream processors from the die size? (estimation)
Or do we believe the GPU sports 3072 stream processors?
We can try!

We have two unknown variables unfortunately. Clocks and CU count. For reference, I’ll be using Strange Brigade benchmarks from here.

The knowns are:

RTX 2070 + 10% performance in Strange Brigade at 4K. This puts it within a few % of Vega 64, so let’s call them equal for simplicity’s sake.

Architecture gain of 1.25x per clock based on a suite of 30 benchmarks at 4K. This is a good comparison because it’s more likely to stress any memory bandwidth disparities.

Perf/Watt gain of 1.5x over GCN at 14nm. I’ll assume this is Vega 64 and immediately discard the metric. Why? Because we already know Vega 20 enjoys a 1.25x perf/Watt boost over Vega 64, so this is AMD admitting Navi is running at some clock where there are no additional perf/Watt advantages.

I think we should assume a minimum of 40 CUs based on the various leaks, and no more than 52 based on AdoredTV’s numbers.

Vega 64 has a 1250MHz base clock and 1550MHz boost. To draw equal, Navi must make up any CU deficiencies not overcome by the 1.25x factor with clocks. This boosts 40 CUs to an effective 50, meaning a 64/50 ratio boost to clocks: 1600MHz base clock, 1984MHz boost clock. These don't seem totally far-fetched given AMD says Navi clocks better, and Nvidia designs can clock that high.

44CUs: 1450MHz base, 1800MHz boost.
48CUs: 1333MHz base, 1650MHz boost.
52CUs: 1250MHz base, 1550MHz boost. (No change)

What’s also interesting in my mind is CU sizing. If CUs have grown a lot, it really speaks to a lot of architecture rework. Vega 20 fits 64 CUs and a 4096 bit HBM2 controller in 332mm^2. I think it has more negative die area than strictly needed, and my personal belief is because this could have been a hurried refresh as a pipe cleaner, as well as current and power density meant it couldn’t be shrunk further due to IR loss and heat density concerns. Navi is sporting a 255mm^2 die.

Navi has a 256-bit GDDR6 interface, and we know per 128 bits, it’s 1.5-1.75x larger than a 1024 bit HBM2 interface. Let’s assume Navi’s 256 bit and Vega 20’s 4096-bit are roughly equal, rather than GDDR6 being 20-25% smaller. I do this because I assume Navi will have less negative die area.

That means the rest of the area should be roughly equal, and so we can do an approximate CU sizing.

40 CUs: Navi CUs are 23% larger than Vega 20.
44 CUs: 12%
48 CUs: 2%
52 CUs: -5%
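
Those percentages can be reproduced by simply dividing die area by CU count for each candidate (a rough sketch; it treats everything outside the CUs as cancelling out, as argued above):

```python
# Per-CU area comparison: Vega 20 packs 64 CUs into ~332mm^2, Navi 10 is ~255mm^2.
vega20_area, vega20_cus = 332, 64
navi_area = 255
for cus in (40, 44, 48, 52):
    ratio = (navi_area / cus) / (vega20_area / vega20_cus)
    print(f"{cus} CUs: Navi CUs {ratio - 1:+.0%} vs Vega 20")
# Prints roughly +23%, +12%, +2%, -5%, matching the list above.
```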

In any event, 255mm^2 is a good sign for consoles being able to include a full Navi 10 die along with 70mm^2 Zen 2 design, with some spare room for glue logic and misc. IO. If that leaked dev kit PS5 rumor is true (312mm^2), we’re clearly dealing with a cut down Navi 10 (or a LITE version with smaller CUs?)

Which outcome is better for consoles? I would argue the smaller CU is actually better for consoles, because it makes the clock situation a lot more favorable. I think the 52CU scenario is infeasible because there's no way AMD would market a GPU with clocks that low and make the statements they did. I think we are likely looking at 44CUs for Navi based on it giving us a 1800MHz boost, which lines up perfectly with Radeon 7, and gives us the 0% perf/Watt advantage of Navi over Vega 20 that we expect. Of course, 40CUs is the best case if you believe the Gonzalo leak, because it tells us that console GPU clocks are actually 10% under desktop GPU boost clocks.

RX 580 clocks are 1257MHz/1340MHz, which means Xbox One X comes within 7% of desktop base clocks and 13% of boost clocks, and so I think Gonzalo clocks are completely believable for a 40CU Navi with ~2000MHz boost clock. They are far-fetched for anything beyond 44CUs.

And drum roll, teraflop time!

Given the clocks are scaled based on CU count, all above configurations have the same metrics: 8.2TF base, 10.1TF boost. This puts us right in the TF band we expect for consoles (if a bit on the low side). I suspect the RX 5700 is not the top-end Navi SKU though (I would expect an 8 or 9 in the name), and there's probably a full-die version with 4-8 more CUs enabled, meaning all the above calculations are going to move up. Given consoles will most likely disable CUs for yield, this may absolutely still be a comparable situation. Conclusion: I remain team 10TF, but they punch like 12.5TF.
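
For anyone who wants to reproduce those numbers, a minimal sketch of the scaling rule (assuming 64 stream processors per CU and 2 FLOPs per cycle; exact scaling gives ~10.2TF boost, while the rounded clocks quoted above give 10.1):

```python
# Scale clocks so that CUs x 1.25 (RDNA per-clock gain) x clock matches
# Vega 64's 64 CUs at 1250MHz base / 1550MHz boost, then compute TFLOPS.
VEGA_CUS, VEGA_BASE, VEGA_BOOST = 64, 1250, 1550

def tflops(cus: int, clock_mhz: float) -> float:
    return cus * 64 * 2 * clock_mhz / 1e6  # 64 SPs per CU, 2 FLOPs per cycle

for cus in (40, 44, 48, 52):
    scale = VEGA_CUS / (cus * 1.25)        # clock boost needed to draw equal
    base, boost = VEGA_BASE * scale, VEGA_BOOST * scale
    print(f"{cus} CUs: {base:.0f}/{boost:.0f}MHz -> "
          f"{tflops(cus, base):.1f}/{tflops(cus, boost):.1f} TF")
# Every configuration lands on the same TF by construction (~8.2 base / ~10.2 boost).
```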
 
Colbert's Next Gen Predictions

Colbert

Member
Oct 27, 2017
3,618
Germany
New thread - old habits

I'm reposting my most recent prediction - bookmark it, as I will update this prediction every once in a while ...





Remarks/Assumptions:
  • Clock speeds derived from TBP of Navi PC GPUs
  • RDNA Cache architecture allows for lower VRAM bandwidth
  • Both platform holders have the chance to wear the performance crown
  • Expecting at least a 15% increase in game performance at the same computational power (TFs), i.e. gains beyond what the raw number itself suggests
Change Log:
Rev 9.3: Reworked Lockhart prediction based on NAVI 10 scheme
Rev 9.2: DF video happened ;) New prediction layers: The Bottom Line, The Hope, The Dream
Rev 9.1: Changed lower bound to a different setup (Setup A) with a lower CU count but higher clocks. Upper bound still Setup B (16-06-2019)
Rev 9.0: Incorporated E3 Info from MS, AMD and some secret sauce from my side that will not be disclosed (14-06-2019)
Rev 8.2: Lower clocks for the lower bounds (27-05-2019)
Rev 8.1: Eliminated the difference between PS5/Anaconda but kept a 2-tier prediction for those consoles
Rev 8: New NAVI architecture makes CUs obsolete. Consolidated to ECC vote as more likely now after Computex info (27-05-2019)
Rev 7.1: Included the Endless Cycle Committee prediction vote into the table after a RfC (25-05-2019)
Rev 7: Completely changed prediction scheme: Lower and Upper bounds provide the range I expect the consoles will land (16-05-2019)
Rev 6.1: Changed from Baseline to Ballpark, Modified specs based on new information available to me
Rev 6: Consolidated to 1 baseline per console, adjusted specs & pricing for all consoles
Rev 5: Added 3rd tier to adapt on AdoredTV table of Navi GPUs
Rev. 4.3: SSD for Lockhart, Lockhart now $399 instead of $349
Rev. 4.2: Increased memory clock instead of 448Gbps
 
Colbert's HDD vs SSD vs NVME Speed Comparison: Part 1

Colbert

Member
Oct 27, 2017
3,618
Germany
I think this is a good moment to repost the SSD testing I did some weeks ago ...

In the picture below you see 3 benchmarks, one for each type of storage you could use in a console except Intel Optane. The tests were performed on my own PC. While the drives differ in size, this should still give you an idea of a realistic speed estimate.

I'm only talking about read operations!

Test system:
Motherboard: MSI B450 Tomahawk | CPU: AMD R7 2700X @ stock | Memory: 32GB DDR4-3000MHz | GPU: MSI RX Vega 56 Airboost OC watercooled by an Eiswolf GPX Pro 120 | Storage Drives: NVMe = Samsung 970 EVO 500GB (4x PCIe Gen3), SSD = SanDisk Ultra 2TB (SATA3), HDD = Seagate Barracuda 4TB 7200rpm (SATA3)




If you're not familiar with this kind of benchmark, here's what the 4 tests are about (a minimal sketch for reproducing the worst case follows the list):
  • The first test is a sequential read of 1GB of data with 32 I/O queues by 1 thread (not a valid use case for streaming data into a game, but it matches initial loading)
  • The second test is random reads of 4KB data objects from a 1GB data file with 8 I/O queues done by 8 threads (not likely in a console because of the 8 threads)
  • The third test is random reads of 4KB data objects from a 1GB data file with 32 I/O queues done by 1 thread (a typical console use case)
  • The fourth test is random reads of 4KB data objects from a 1GB data file with 1 I/O queue done by 1 thread (the worst-case scenario)
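If you want to try the worst-case pattern (test 4) yourself, here is a minimal Python sketch. Caveats: Linux/macOS only (os.pread), testfile.bin is a placeholder for any large file on the drive under test, and the OS page cache will inflate the numbers unless the cache is cold:

```python
import os, random, time

# Worst-case pattern: random 4KB reads, 1 thread, 1 queue (one read in
# flight at a time), against an existing large file on the drive under test.
PATH = "testfile.bin"   # placeholder: any multi-GB file on the tested drive
BLOCK = 4096            # 4KB reads
READS = 4096            # number of random reads to sample

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
offsets = [random.randrange(0, size - BLOCK) // BLOCK * BLOCK
           for _ in range(READS)]

start = time.perf_counter()
for off in offsets:
    os.pread(fd, BLOCK, off)   # blocking read = queue depth 1, single thread
elapsed = time.perf_counter() - start
os.close(fd)

print(f"{READS * BLOCK / elapsed / 2**20:.1f} MB/s random 4KB QD1")
```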
Analysis:
  • A single non-RAID HDD is 50 times worse than any decent SATA3 SSD at random reads, i.e. everything other than sequential reads, which aren't your typical in-game streaming use case. So any SSD will already be a huge jump in streaming-to-game capabilities, even unoptimized! And that is just the worst-case scenario; test pattern 3 looks even better, where we see 100 times the performance. I repeat: 100 times the performance on the most common access pattern you will find on any system.
  • Test pattern 1 shows a 6.4 times increase in speed in favor of the NVMe. The open question is how often you will see that pattern in your games other than initial loading or copying/moving data from drive A to drive B ...
  • In test pattern 2 you still see a 4.35 times difference, but I question whether this is a valid use case for consoles, because you would be running 8 dedicated threads while having to maintain your frame rate. Maybe someone with deeper knowledge can shed some light on it ...
  • In test pattern 3 the NVMe advantage is reduced to a mere 37%, on a pattern which is normally your bread-and-butter access pattern on a PC, and maybe on a console too.
  • In test pattern 4, the worst-case scenario, my NVMe is just 22.5% better than a SATA3 SSD.
Conclusion:

I am aware that a real-world access pattern would be a mix of the tested patterns, but the differences in speed between PC SSDs are not as high once you leave the realm of best-case scenarios. There is also a high chance that a next-gen console game optimized for SSD speeds would target nearer to the worst case than to the best case, so that it can hit its streaming budget almost 100% of the time.

TL;DR
The jump between HDD and SSD is a huge generational leap. The differences in SSD speeds are not as high as many expect them to be.
 
Colbert's HDD vs SSD vs NVME Speed Comparison: Part 2

Colbert

Member
Oct 27, 2017
3,618
Germany
About SSD speeds:

Because of a comment from chris 1515 that my first drive speed test didn't address bigger data block sizes (fewer blocks in an allocation table), I went looking for a tool that would let me alter those block sizes for the speed tests.

I was able to run a couple of tests with the same NVMe, SSD and HDD as the first time, and the diagram below shows the results:

Test system:
Motherboard: MSI B450 Tomahawk | CPU: AMD R7 2700X @ stock | Memory: 32GB DDR4-3000MHz | GPU: MSI RX Vega 56 Airboost OC watercooled by an Eiswolf GPX Pro 120 | Storage Drives: NVMe = Samsung 970 EVO 500GB (4x PCIe Gen3), SSD = SanDisk Ultra 2TB (SATA3), HDD = Seagate Barracuda 4TB 7200rpm (SATA3)



You can see that a normal SSD stagnates much earlier in its speed; the SSD is at most 2.4 times faster than my HDD.
It's a different story with the NVMe: beginning at a 32KB block size it already delivers more than triple the performance (3.6 times faster than my HDD), and this goes up to 12.7 times faster than my HDD at a 2MB block size.

While very big block sizes are good for the speed of the system, the bigger they are, the more space gets allocated for smaller files (a 10KB file would still occupy a full 256KB block, for example). This means you need to find a balance between wasting storage space and speed.

Conclusion:
To reach gains like those shown in the PS5 demo, you will probably need a storage implementation around 2000 MB/s. Personally, I would put the sweet spot at a 256KB data block, without knowing the typical file sizes on a console.

Test method:
Random Reads with 1 thread and 1 queue from a 1GB file!
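
For anyone wanting to reproduce a block-size sweep like the one above, a minimal sketch along the same lines (same caveats as the earlier snippet: Unix-only os.pread, placeholder file path, and a cold cache matters):

```python
import os, random, time

# Random reads with 1 thread and 1 queue from one large file, sweeping
# the read block size the way the diagram above does.
PATH = "testfile.bin"   # placeholder: any multi-GB file on the tested drive
TOTAL = 256 * 2**20     # read 256MB in total per block size

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
for block in (4096, 32768, 262144, 2097152):   # 4KB up to 2MB
    reads = TOTAL // block
    offsets = [random.randrange(0, size - block) // block * block
               for _ in range(reads)]
    start = time.perf_counter()
    for off in offsets:
        os.pread(fd, block, off)
    mbps = TOTAL / (time.perf_counter() - start) / 2**20
    print(f"{block // 1024:>5} KB blocks: {mbps:8.1f} MB/s")
os.close(fd)
```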
 
Colbert's thoughts about NAVI GPU setups for next gen consoles

Colbert

Member
Oct 27, 2017
3,618
Germany
💬 NAVI GPU layouts analyzed 💬

Some more info on how I landed on the number of stream processors in my most recent prediction after E3:

With the new RDNA architecture, the old way of laying out a GPU no longer applies: fewer but wider shader engines. So I asked myself in what steps you could actually expand the number of CUs without producing gaps in die space that would end up as dark silicon.

My results are shown with the below picture after the wall of text ;)

Setup A shows a possible design with up to 22 WGPs (48 CUs), but as usual I assume 1 WGP deactivated per SE for yield purposes (NA).

Setup B is the next iteration of expanding on WGPs (and CUs).

With Setup C I also followed anexanhume's idea of 3 shader engines, but I rejected it because it comes with a lot of tiles for just 1 WGP.

If we assume each block consumes the same die space (which I know is not the case, but it works as an approximation), we get the following results:
  • Setup A adds 12.5 % tile space for additional 4 active CUs
  • Setup B adds 25.0 % tile space for additional 12 active CUs
  • Setup C adds 50.0 % tile space for additional 14 active CUs
So looking at the numbers, I had to go with Setup B as it gives you the most bang for the buck (CU gain per tile-space increase)! A small calculation follows below.
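
Here is that bang-for-the-buck ratio spelled out (same equal-area approximation as above):

```python
# Active CUs gained per percent of additional tile space, using the
# approximation above that every block costs the same die area.
setups = {"A": (4, 12.5), "B": (12, 25.0), "C": (14, 50.0)}
for name, (added_cus, added_space_pct) in setups.items():
    print(f"Setup {name}: {added_cus / added_space_pct:.2f} active CUs per % tile space")
# Setup B wins at 0.48, vs 0.32 for Setup A and 0.28 for Setup C.
```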

Btw, the inactive WGPs (NA) were picked randomly.

 