Next-gen PS5 and next Xbox speculation launch thread |OT5| - It's in RDNA

What do you think could be the memory setup of your preferred console, or one of the new consoles?


OP - I GOT, I GOT, I GOT

Mecha Meister

Member
Oct 25, 2017
495
I GOT I GOT I GOT I GOT




Previous threads:
PS5 and next Xbox launch speculation - timing and specifications

PS5 and next Xbox launch speculation - Post E3 2018

Next gen PS5 and next Xbox launch speculation - Secret sauces spicing 2019

Next-gen PS5 and next Xbox speculation launch thread - MY ANACONDA DON'T WANT NONE

Official PlayStation 5 Specifications Revealed so far:
  • 7nm, 8-core Zen 2 CPU (unknown clock speeds, and whether it will have SMT)
  • 7nm AMD Radeon NAVI GPU (Unknown clock-speeds and core count)
  • Ray tracing support (Hardware-accelerated? Software-accelerated?)
  • 8K Output support
  • SSD (allegedly faster than PC solutions available at the time of publication)
  • PS4 Backwards compatibility
  • Not coming 2019
Revealed by Mark Cerny in Wired Interview

Official Xbox Project Scarlett Specifications Revealed so far:
  • Zen 2 CPU (Unknown clock-speeds, and whether it will have SMT)
  • NAVI GPU (Unknown clock-speeds and core count)
  • GDDR6 Memory (it's speculated that it has 12 memory chips, potentially giving it a 384-bit bus and a total of 24GB of memory with 672GB/s of memory bandwidth; a quick sanity check of these numbers follows after this list)
    On that topic: in 2016 Microsoft showcased a render of the Xbox One X's board while the system was in development, and the number of memory chips in that render matched the number in the retail system. Of course, things may change before launch, so this is speculation mixed in with officially revealed information.
  • Up to 8K resolution support
  • Up to 120 fps support
  • Ray tracing support - Hardware-accelerated
  • Variable Refresh Rate Support
  • SSD (Can be used as virtual memory)
  • Backwards Compatible
Revealed in Xbox Project Scarlett - E3 2019 - Reveal Trailer
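
For what it's worth, the rumoured memory figures are internally consistent. Here's a quick sanity check (my assumptions, not confirmed specs: 32-bit chips and a 14Gbps GDDR6 speed grade):

```python
# Hypothetical sanity check of the rumoured Scarlett memory setup:
# 12 GDDR6 chips with a 32-bit interface each gives a 384-bit bus, and a
# 14Gbps-per-pin speed grade (assumed) yields the rumoured bandwidth.
chips = 12
bus_width_bits = chips * 32                     # 384-bit bus
pin_speed_gbps = 14                             # assumed GDDR6 speed grade
bandwidth_gbs = bus_width_bits * pin_speed_gbps / 8
print(f"{bus_width_bits}-bit bus -> {bandwidth_gbs:.0f} GB/s")  # 672 GB/s
```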

Patents
Be careful with patents: don't take everything you read in one to mean it will be implemented in a company's next product, as some of the things companies patent never come to fruition.

PS5 - a patent dive into what might be the tech behind Sony's SSD customisations (technical!)

The TLDR is

- some hardware changes vs the typical inside the SSD (SRAM for housekeeping and data buffering instead of DRAM)
- some extra hardware and accelerators in the system for handling file IO tasks independent of the main CPU
- at the OS layer, a second file system customised for these changes

all primarily aimed at higher read performance and removing potential bottlenecks for data that is written less often than it is read, like data installed from a game disc or download.

Other information regarding the PlayStation 5:

Richard Geldreich:
"Next-gen console games will be all about extremely efficient and optimized geometry and texture data distribution. I saw a public Sony demo that used extreme amounts of texture and other content in a dense city. Microsoft will have to keep up."



Next Xbox rumours:
There are rumoured to be two models, codenamed Lockhart and Anaconda, with one being more powerful than the other.

Windows Central: Xbox 'Scarlett,' 'Anaconda' and 'Lockhart:' Everything (we think) we know

What we know about RDNA:
  • Allegedly a new 7nm GPU architecture
  • New Compute Unit Design with improved efficiency and increased IPC allegedly offering 1.25x performance per clock
  • Features higher clock speeds and gaming performance at lower power requirements
  • First RDNA GPUs available in July, starting with the RX 5700 series GPUs
  • An RX 5700 GPU was shown performing 10% faster than the RTX 2070 in Strange Brigade (however, this game is known to perform better on AMD GPUs than NVIDIA GPUs, so it doesn't necessarily tell the whole story performance-wise)


RDNA Details:
The RX 5700 GPUs are not necessarily what's going to be used in the PS5 and Project Scarlett.
So far no ray tracing capabilities have been confirmed for these GPUs.

RX 5700 XT
Compute Units: 40
Core Count: 2560
Teraflops: Up to 9.75
Memory: 8GB GDDR6

Boost Clock: Up to 1905MHz
Game Clock: 1755MHz
Base Clock: 1605MHz

RX 5700
Compute Units: 36
Core Count: 2304
Teraflops: Up to 7.95
Memory: 8GB GDDR6

Boost Clock: Up to 1725MHz
Game Clock: 1625MHz
Base Clock: 1455MHz
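
As a side note, the teraflop figures above follow directly from the core counts and clocks, since each stream processor does two floating-point operations per cycle (a fused multiply-add):

```python
# TFLOPS = stream processors x 2 FLOPs per cycle (FMA) x clock in GHz
def tflops(cores: int, clock_mhz: int) -> float:
    return cores * 2 * clock_mhz / 1e6

print(tflops(2560, 1905))  # RX 5700 XT at boost: ~9.75 TF
print(tflops(2304, 1725))  # RX 5700 at boost:    ~7.95 TF
```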

RDNA GPU performance speculation
On the topic of IPC improvements, here is an RTX 2070 review with Strange Brigade tested on Page 2

Nvidia GeForce RTX 2070 Review: Page 2 - Shadow of the TR, Strange Brigade, Monster Hunter: World

It would be ideal to pull results from a review of the RX 5700 GPUs when they release, but for now all we have to work with are the numbers AMD has given us: the RX 5700 was shown performing around 10% faster than the RTX 2070 in Strange Brigade. Looking at TechSpot's review of that GPU, the RTX 2070 performs within 10% of the Vega 64, so an RX 5700 should theoretically perform similarly to, or better than, an air-cooled Vega 64.
In Anandtech's review of the Vega 64, it holds an average clock speed of 1523MHz across the 8 games they tested, which means it's theoretically delivering around 12.4 teraflops.

I don't recommend using a single game as a point of reference for determining the performance of the RX 5700, but it could be that the RX 5700 offers Vega 64 performance. Presuming RDNA's 1.25x performance per clock translates to these workloads, the RX 5700 in AMD's testing would be roughly a 10 TF GPU offering the performance of a 12.4 TF Vega GPU (12.4 ÷ 1.25 ≈ 9.9; see the sketch below). My calculation may be off, so take this with a grain of salt.
If anyone has anything to add to this then please do!
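
To make the arithmetic explicit, here's a rough sketch of the estimate above (assumptions as stated: 4096 shaders, the 1523MHz average clock from Anandtech, and AMD's 1.25x per-clock claim):

```python
# Vega 64: 4096 stream processors at the ~1523MHz average Anandtech measured
vega64_tf = 4096 * 2 * 1523 / 1e6      # ~12.5 TF delivered in practice
# If RDNA really does ~1.25x the work per clock, the RX 5700 would only need
# this many "RDNA teraflops" to match that level of Vega 64 performance:
rdna_equivalent_tf = vega64_tf / 1.25  # ~10 TF
print(f"Vega 64: {vega64_tf:.1f} TF -> RDNA equivalent: {rdna_equivalent_tf:.1f} TF")
```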

RDNA die sizes have been estimated at around 255mm² to 275mm², which could mean the RX 5700 offers Vega 64 performance at around, or almost, half the size, as the Vega 64 is a 495mm² GPU.

 
anexanhume's performance and die-size estimation

anexanhume

Member
Oct 25, 2017
4,017
Reposting from last thread

Indeed. But is there any chance we can derive the number of stream processors from the die size? (estimation)
Or do we believe the GPU sports 3072 stream processors?
We can try!

We have two unknown variables unfortunately. Clocks and CU count. For reference, I’ll be using Strange Brigade benchmarks from here.

The knowns are:

RTX 2070 + 10% performance in Strange Brigade at 4K. This puts it within a few % of Vega 64, so let’s call them equal for simplicity’s sake.

Architecture gain of 1.25x per clock based on a suite of 30 benchmarks at 4K. This is a good comparison because it’s more likely to stress any memory bandwidth disparities.

Perf/Watt gain of 1.5x over GCN at 14nm. I’ll assume this is Vega 64 and immediately discard the metric. Why? Because we already know Vega 20 enjoys a 1.25x perf/Watt boost over Vega 64, so this is AMD admitting Navi is running at some clock where there are no additional perf/Watt advantages.

I think we should assume a minimum of 40 CUs based on the various leaks, and no more than 52 based on AdoredTV’s numbers.

Vega 64 has a 1250MHz base clock and 1550MHz boost. To draw equal, Navi must make up any CU deficiencies not overcome by the 1.25x factor with clocks. This boosts 40 CUs to an effective 50, meaning a 64/50 ratio boost to clocks: 1600MHz base clock, 1984MHz boost clock. These don't seem totally far-fetched given AMD says Navi clocks better, and Nvidia designs can clock that high.

44CUs: 1450MHz base, 1800MHz boost.
48CUs: 1333MHz base, 1650MHz boost.
52CUs: 1250MHz base, 1550MHz boost. (No change)

What’s also interesting in my mind is CU sizing. If CUs have grown a lot, it really speaks to a lot of architecture rework. Vega 20 fits 64 CUs and a 4096 bit HBM2 controller in 332mm^2. I think it has more negative die area than strictly needed, and my personal belief is because this could have been a hurried refresh as a pipe cleaner, as well as current and power density meant it couldn’t be shrunk further due to IR loss and heat density concerns. Navi is sporting a 255mm^2 die.

Navi has a 256-bit GDDR6 interface, and we know per 128 bits, it’s 1.5-1.75x larger than a 1024 bit HBM2 interface. Let’s assume Navi’s 256 bit and Vega 20’s 4096-bit are roughly equal, rather than GDDR6 being 20-25% smaller. I do this because I assume Navi will have less negative die area.

That means the rest of the area should be roughly equal, and so we can do an approximate CU sizing.

40 CUs: Navi CUs are 23% larger than Vega 20.
44 CUs: 12%
48 CUs: 2%
52 CUs: -5%
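
Those percentages can be reproduced by simply dividing die area by CU count for each candidate (a rough sketch; it treats everything outside the CUs as cancelling out, as argued above):

```python
# Per-CU area comparison: Vega 20 packs 64 CUs into ~332mm^2, Navi 10 is ~255mm^2.
vega20_area, vega20_cus = 332, 64
navi_area = 255
for cus in (40, 44, 48, 52):
    ratio = (navi_area / cus) / (vega20_area / vega20_cus)
    print(f"{cus} CUs: Navi CUs {ratio - 1:+.0%} vs Vega 20")
# Prints roughly +23%, +12%, +2%, -5%, matching the list above.
```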

In any event, 255mm^2 is a good sign for consoles being able to include a full Navi 10 die along with 70mm^2 Zen 2 design, with some spare room for glue logic and misc. IO. If that leaked dev kit PS5 rumor is true (312mm^2), we’re clearly dealing with a cut down Navi 10 (or a LITE version with smaller CUs?)

Which outcome is better for consoles? I would argue the smaller CU is actually better for consoles, because it makes the clock situation a lot more favorable. I think the 52CU scenario is infeasible because there's no way AMD would market a GPU with clocks that low and make the statements they did. I think we are likely looking at 44CUs for Navi based on it giving us a 1800MHz boost, which lines up perfectly with Radeon 7, and gives us the 0% perf/Watt advantage of Navi over Vega 20 that we expect. Of course, 40CUs is the best case if you believe the Gonzalo leak, because it tells us that console GPU clocks are actually 10% under desktop GPU boost clocks.

RX 580 clocks are 1257MHz/1340MHz, which means Xbox One X comes within 7% of desktop base clocks and 13% of boost clocks, and so I think Gonzalo clocks are completely believable for a 40CU Navi with ~2000MHz boost clock. They are far-fetched for anything beyond 44CUs.

And drum roll, teraflop time!

Given the clocks are scaled based on CU count, all above configurations have the same metrics: 8.2TF base, 10.1TF boost. This puts us right in the TF band we expect for consoles (if a bit on the low side). I suspect the RX 5700 is not the top-end Navi SKU though (I would expect an 8 or 9 in the name), and there's probably a full-die version with 4-8 more CUs enabled, meaning all the above calculations are going to move up. Given consoles will most likely disable CUs for yield, this may absolutely still be a comparable situation. Conclusion: I remain team 10TF, but they punch like 12.5TF.
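
For anyone who wants to reproduce those numbers, a minimal sketch of the scaling rule (assuming 64 stream processors per CU and 2 FLOPs per cycle; exact scaling gives ~10.2TF boost, while the rounded clocks quoted above give 10.1):

```python
# Scale clocks so that CUs x 1.25 (RDNA per-clock gain) x clock matches
# Vega 64's 64 CUs at 1250MHz base / 1550MHz boost, then compute TFLOPS.
VEGA_CUS, VEGA_BASE, VEGA_BOOST = 64, 1250, 1550

def tflops(cus: int, clock_mhz: float) -> float:
    return cus * 64 * 2 * clock_mhz / 1e6  # 64 SPs per CU, 2 FLOPs per cycle

for cus in (40, 44, 48, 52):
    scale = VEGA_CUS / (cus * 1.25)        # clock boost needed to draw equal
    base, boost = VEGA_BASE * scale, VEGA_BOOST * scale
    print(f"{cus} CUs: {base:.0f}/{boost:.0f}MHz -> "
          f"{tflops(cus, base):.1f}/{tflops(cus, boost):.1f} TF")
# Every configuration lands on the same TF by construction (~8.2 base / ~10.2 boost).
```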
 
Colbert's Next Gen Predictions

Colbert

Member
Oct 27, 2017
3,618
Germany
New thread - old habits

I'm reposting my most recent prediction - bookmark it, as I will update this prediction every once in a while ...





Remarks/Assumptions:
  • Clock speeds derived from TBP of Navi PC GPUs
  • RDNA Cache architecture allows for lower VRAM bandwidth
  • Both platform holders have the chance to wear the performance crown
  • Expecting at least a 15% increase in game performance at the same computational power (TFs), i.e. gains beyond what the raw number itself suggests
Change Log:
Rev 9.3: Reworked Lockhart prediction based on NAVI 10 scheme
Rev 9.2: DF video happened ;) New prediction layers: The Bottom Line, The Hope, The Dream
Rev 9.1: Changed lower bound to a different setup (Setup A) with a lower CU count but higher clocks. Upper bound still Setup B (16-06-2019)
Rev 9.0: Incorporated E3 Info from MS, AMD and some secret sauce from my side that will not be disclosed (14-06-2019)
Rev 8.2: Lower clocks for the lower bounds (27-05-2019)
Rev 8.1: Eliminated the difference between PS5/Anaconda but kept a 2-tier prediction for those consoles
Rev 8: New NAVI architecture makes CUs obsolete. Consolidated to ECC vote as more likely now after Computex info (27-05-2019)
Rev 7.1: Included the Endless Cycle Committee prediction vote into the table after a RfC (25-05-2019)
Rev 7: Completely changed prediction scheme: Lower and Upper bounds provide the range I expect the consoles will land (16-05-2019)
Rev 6.1: Changed from Baseline to Ballpark, Modified specs based on new information available to me
Rev 6: Consolidated to 1 baseline per console, adjusted specs & pricing for all consoles
Rev 5: Added 3rd tier to adapt on AdoredTV table of Navi GPUs
Rev. 4.3: SSD for Lockhart, Lockhart now $399 instead of $349
Rev. 4.2: Increased memory clock instead of 448Gbps
 
Colbert's HDD vs SSD vs NVME Speed Comparison: Part 1

Colbert

Member
Oct 27, 2017
3,618
Germany
I think this is a good moment to repost the SSD testing I did some weeks ago ...

In the picture below you see 3 benchmarks, one for each type of storage you could use in a console except Intel Optane. The tests were performed on my own PC. While the drives differ in size, this should still give you an idea of a realistic speed estimate.

I'm only talking about read operations!

Test system:
Motherboard: MSI B450 Tomahawk | CPU: AMD R7 2700X @ stock | Memory: 32GB DDR4-3000MHz | GPU: MSI RX Vega 56 Airboost OC watercooled by an Eiswolf GPX Pro 120 | Storage Drives: NVMe = Samsung 970 EVO 500GB (4x PCIe Gen3), SSD = SanDisk Ultra 2TB (SATA3), HDD = Seagate Barracuda 4TB 7200rpm (SATA3)




If you're not familiar with this kind of benchmark, here's what the 4 tests are about (a minimal sketch for reproducing the worst case follows the list):
  • The first test is a sequential read of 1GB of data with 32 I/O queues by 1 thread (not a valid use case for streaming data into a game, but it matches initial loading)
  • The second test is random reads of 4KB data objects from a 1GB data file with 8 I/O queues done by 8 threads (not likely in a console because of the 8 threads)
  • The third test is random reads of 4KB data objects from a 1GB data file with 32 I/O queues done by 1 thread (a typical console use case)
  • The fourth test is random reads of 4KB data objects from a 1GB data file with 1 I/O queue done by 1 thread (the worst-case scenario)
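If you want to try the worst-case pattern (test 4) yourself, here is a minimal Python sketch. Caveats: Linux/macOS only (os.pread), testfile.bin is a placeholder for any large file on the drive under test, and the OS page cache will inflate the numbers unless the cache is cold:

```python
import os, random, time

# Worst-case pattern: random 4KB reads, 1 thread, 1 queue (one read in
# flight at a time), against an existing large file on the drive under test.
PATH = "testfile.bin"   # placeholder: any multi-GB file on the tested drive
BLOCK = 4096            # 4KB reads
READS = 4096            # number of random reads to sample

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
offsets = [random.randrange(0, size - BLOCK) // BLOCK * BLOCK
           for _ in range(READS)]

start = time.perf_counter()
for off in offsets:
    os.pread(fd, BLOCK, off)   # blocking read = queue depth 1, single thread
elapsed = time.perf_counter() - start
os.close(fd)

print(f"{READS * BLOCK / elapsed / 2**20:.1f} MB/s random 4KB QD1")
```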
Analysis:
  • A single non-RAID HDD is 50 times worse than any decent SATA3 SSD at random reads, i.e. everything other than sequential reads, which aren't your typical in-game streaming use case. So any SSD will already be a huge jump in streaming-to-game capabilities, even unoptimized! And that is just the worst-case scenario; test pattern 3 looks even better, where we see 100 times the performance. I repeat: 100 times the performance on the most common access pattern you will find on any system.
  • Test pattern 1 shows a 6.4 times increase in speed in favor of the NVMe. The open question is how often you will see that pattern in your games other than initial loading or copying/moving data from drive A to drive B ...
  • In test pattern 2 you still see a 4.35 times difference, but I question whether this is a valid use case for consoles, because you would be running 8 dedicated threads while having to maintain your frame rate. Maybe someone with deeper knowledge can shed some light on it ...
  • In test pattern 3 the NVMe advantage is reduced to a mere 37%, on a pattern which is normally your bread-and-butter access pattern on a PC, and maybe on a console too.
  • In test pattern 4, the worst-case scenario, my NVMe is just 22.5% better than a SATA3 SSD.
Conclusion:

I am aware that a real-world access pattern would be a mix of the tested patterns, but the differences in speed between PC SSDs are not as high once you leave the realm of best-case scenarios. There is also a high chance that a next-gen console game optimized for SSD speeds would target nearer to the worst case than to the best case, so that it can hit its streaming budget almost 100% of the time.

TL;DR
The jump between HDD and SSD is a huge generational leap. The differences in SSD speeds are not as high as many expect them to be.
 
Colbert's HDD vs SSD vs NVME Speed Comparison: Part 2

Colbert

Member
Oct 27, 2017
3,618
Germany
About SSD speeds:

Because of a comment from chris 1515 that my first drive speed test didn't address bigger data block sizes (fewer blocks in an allocation table), I went looking for a tool that would let me alter those block sizes for the speed tests.

I was able to run a couple of tests with the same NVMe, SSD and HDD as the first time, and the diagram below shows the results:

Test system:
Motherboard: MSI B450 Tomahawk | CPU: AMD R7 2700X @ stock | Memory: 32GB DDR4-3000MHz | GPU: MSI RX Vega 56 Airboost OC watercooled by an Eiswolf GPX Pro 120 | Storage Drives: NVMe = Samsung 970 EVO 500GB (4x PCIe Gen3), SSD = SanDisk Ultra 2TB (SATA3), HDD = Seagate Barracuda 4TB 7200rpm (SATA3)



You can see that a normal SSD stagnates much earlier in its speed; the SSD is at most 2.4 times faster than my HDD.
It's a different story with the NVMe: beginning at a 32KB block size it already delivers more than triple the performance (3.6 times faster than my HDD), and this goes up to 12.7 times faster than my HDD at a 2MB block size.

While very big block sizes are good for the speed of the system, the bigger they are, the more space gets allocated for smaller files (a 10KB file would still occupy a full 256KB block, for example). This means you need to find a balance between wasting storage space and speed.

Conclusion:
To reach gains like those shown in the PS5 demo, you will probably need a storage implementation around 2000 MB/s. Personally, I would put the sweet spot at a 256KB data block, without knowing the typical file sizes on a console.

Test method:
Random Reads with 1 thread and 1 queue from a 1GB file!
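
For anyone wanting to reproduce a block-size sweep like the one above, a minimal sketch along the same lines (same caveats as the earlier snippet: Unix-only os.pread, placeholder file path, and a cold cache matters):

```python
import os, random, time

# Random reads with 1 thread and 1 queue from one large file, sweeping
# the read block size the way the diagram above does.
PATH = "testfile.bin"   # placeholder: any multi-GB file on the tested drive
TOTAL = 256 * 2**20     # read 256MB in total per block size

fd = os.open(PATH, os.O_RDONLY)
size = os.fstat(fd).st_size
for block in (4096, 32768, 262144, 2097152):   # 4KB up to 2MB
    reads = TOTAL // block
    offsets = [random.randrange(0, size - block) // block * block
               for _ in range(reads)]
    start = time.perf_counter()
    for off in offsets:
        os.pread(fd, block, off)
    mbps = TOTAL / (time.perf_counter() - start) / 2**20
    print(f"{block // 1024:>5} KB blocks: {mbps:8.1f} MB/s")
os.close(fd)
```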
 
Colbert's thoughts about NAVI GPU setups for next gen consoles

Colbert

Member
Oct 27, 2017
3,618
Germany
💬 NAVI GPU layouts analyzed 💬

Some more info on how I landed on the number of stream processors in my most recent prediction after E3:

With the new RDNA architecture, the old way of laying out a GPU no longer applies: fewer but wider shader engines. So I asked myself in what steps you could actually expand the number of CUs without producing gaps in die space that would end up as dark silicon.

My results are shown with the below picture after the wall of text ;)

Setup A shows a possible design with up to 22 WGPs (48 CUs), but as usual I assume 1 WGP deactivated per SE for yield purposes (NA).

Setup B is the next iteration of expanding on WGPs (and CUs).

With Setup C I also followed anexanhume's idea of 3 shader engines, but I rejected it because it comes with a lot of tiles for just 1 WGP.

If we assume each block consumes the same die space (which I know is not the case, but it works as an approximation), we get the following results:
  • Setup A adds 12.5 % tile space for additional 4 active CUs
  • Setup B adds 25.0 % tile space for additional 12 active CUs
  • Setup C adds 50.0 % tile space for additional 14 active CUs
So looking at the numbers, I had to go with Setup B as it gives you the most bang for the buck (CU gain per tile-space increase)! A small calculation follows below.
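
Here is that bang-for-the-buck ratio spelled out (same equal-area approximation as above):

```python
# Active CUs gained per percent of additional tile space, using the
# approximation above that every block costs the same die area.
setups = {"A": (4, 12.5), "B": (12, 25.0), "C": (14, 50.0)}
for name, (added_cus, added_space_pct) in setups.items():
    print(f"Setup {name}: {added_cus / added_space_pct:.2f} active CUs per % tile space")
# Setup B wins at 0.48, vs 0.32 for Setup A and 0.28 for Setup C.
```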

Btw, the inactive WGPs (NA) were picked randomly.

 