I'm not a game programmer, but I do develop GPU-accelerated scientific models, so I have a few years' experience optimizing for different GPU architectures. I have a sense of what makes something fast or slow, as I've seen it play out on various hardware.
The notion of a 'sustained TF rate' doesn't really make sense. The TF number is a peak theoretical figure that will in all likelihood never actually be hit on either console (possibly occasionally for a few milliseconds at a time, but generally they won't be computing at that rate). It's not a benchmark; it's just a way of folding component counts and clocks into a single number, much like the way a business estimates the number of 'man hours' a task will take. The actual computational throughput will be determined by things like thread occupancy and how much shared and local memory each thread requires. Since both consoles use the same architecture, in theory this affects both equally, but it's not nonsense to suggest the higher clock rate gives the PS5 a bit of help here: it can work through local data (the data sitting in the GPU cache) and shuffle it out for the next piece of data it needs more readily. It's an effective bandwidth increase on the RAM->cache->computation pipeline.
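To make that concrete: the TF figure really is just multiplication. A quick sketch using the publicly stated CU counts and clocks for the two consoles (64 ALUs per CU and 2 ops per clock for a fused multiply-add are the standard assumptions baked into the marketing number):

```python
# Peak TFLOPS is pure arithmetic: units x lanes x ops-per-clock x clock.
# 2 ops per clock assumes every ALU issues a fused multiply-add (FMA)
# every single cycle, which is exactly the "never actually hit" ideal case.
def peak_tflops(cus, clock_ghz, alus_per_cu=64, ops_per_clock=2):
    return cus * alus_per_cu * ops_per_clock * clock_ghz / 1000.0

xsx = peak_tflops(52, 1.825)  # ~12.15 TF
ps5 = peak_tflops(36, 2.23)   # ~10.28 TF
```

Nothing in that formula knows whether the ALUs actually have data to chew on, which is the whole point below.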
More technical version:
The biggest factor in speeding up a GPU is how much you can saturate the compute units (those CUs we keep hearing about). If you can get a concurrent thread on each ALU within each CU, with minimal or no need to reach back to VRAM mid-kernel to swap data in and out of the CU cache, then you can get pretty close to your peak throughput. In practice this is not common, as the available local storage within a CU is tiny: a few tens of KB shared between all the ALUs. For example, in the Nvidia Volta architecture (which I'm most familiar with), there is a single 256KB block of memory (arranged in 32-bit registers) for every thread running on that SM to use for data exclusive to that thread. In a perfect world, every one of the 64 CUDA cores in a single Volta SM would have its own thread, meaning each one gets ~4KB to store useful data. (There is 96KB of shared memory as well, but I'll ignore this for the moment; it's extremely useful but somewhat immaterial for this explanation.)

This is not generally practical, in my experience, so you're left with two options, not mutually exclusive: you can reduce the number of concurrent threads, or you can periodically swap data in and out of registers by calling back to VRAM. The former is what Cerny was alluding to when he said it's harder to fill more CUs than fewer, although I somewhat disagree with his characterization in the case where all CUs on both platforms have access to the same relative register and shared memory. I'm not familiar with RDNA2, though, so I don't want to comment on that too much. In the latter case, which is almost always necessary to some degree, you can think of an analogy to screen tearing as to what happens under the hood.
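The per-thread register budget is simple division. A sketch of the numbers above (the 2048-thread figure is Volta's hardware maximum for resident threads per SM, which I'm adding here for contrast; everything else is from the text):

```python
# Volta SM register budget: a 256 KB register file shared by
# however many threads are resident on that SM at once.
REGISTER_FILE_KB = 256
CUDA_CORES_PER_SM = 64
MAX_THREADS_PER_SM = 2048  # Volta's limit on resident threads per SM

def regs_per_thread_kb(resident_threads):
    return REGISTER_FILE_KB / resident_threads

one_per_core = regs_per_thread_kb(CUDA_CORES_PER_SM)      # 4.0 KB each
fully_occupied = regs_per_thread_kb(MAX_THREADS_PER_SM)   # 0.125 KB = 32 registers
```

Run the occupancy up to hide latency and the per-thread budget collapses from 4KB to 32 registers; that tension is exactly the tradeoff described above.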
Several threads are going about their business making computations, and thread A says 'oh, I need something from VRAM'. Thread A then gets paused, and thread B gets moved into its place to keep computing while the data for thread A is fetched. Meanwhile, thread B finishes its work, and thread A either is ready to keep going or isn't; this is determined by the latency of access to VRAM. If thread A isn't ready, most likely another thread gets moved into thread B's place and keeps going. If thread A is ready, it gets shuffled back in to pick up the computation where it left off with its new data. The analogy to screen tearing is this: if you have ever played with v-sync on a 60Hz monitor, and then on a 144Hz monitor, you've probably noticed that screen tearing is far less noticeable than at the slower refresh rate. That's because the gap between getting data and being able to use it is smaller. A similar analogy holds with clock speeds in a GPU: a faster clock will generally lead to less 'down time' in any given ALU, as it is more likely to be ready to go sooner when the requisite data is available.
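The refresh-rate analogy can be sketched as a toy model: data comes back from VRAM at some arbitrary instant, but a stalled thread can only resume on a clock edge, so on average it sits idle for about half a cycle. The clocks below are the public console figures; the arrival times are made up, and the model is a deliberate oversimplification (real warp/wavefront schedulers are far more sophisticated), just to show that a shorter cycle means a shorter average gap:

```python
import math
import random

def resume_gap_ns(data_ready_ns, clock_ghz):
    """Dead time between data arriving and the next clock edge,
    assuming a stalled thread can only resume on an edge."""
    period_ns = 1.0 / clock_ghz
    next_edge = math.ceil(data_ready_ns / period_ns) * period_ns
    return next_edge - data_ready_ns

# Average the gap over many random arrival instants; it hovers around
# half a clock period, so the faster clock wastes less time per stall.
random.seed(0)  # made-up arrival times, fixed for repeatability
arrivals = [random.uniform(0.0, 100.0) for _ in range(10_000)]
avg_xsx = sum(resume_gap_ns(t, 1.825) for t in arrivals) / len(arrivals)
avg_ps5 = sum(resume_gap_ns(t, 2.23) for t in arrivals) / len(arrivals)
```

Roughly 0.27ns of average dead time per stall at 1.825GHz versus roughly 0.22ns at 2.23GHz: tiny per event, but it compounds across millions of stalls per frame.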
What I want to point out is that NONE of this shows up in a TF metric. The underlying reality of swapping data in and out, and all the bottlenecks and tradeoffs that come with it, is presumed to essentially not exist when discussing TF. Yet this is one of the biggest considerations when optimizing, because you have to account for these facts of life about threads 'stalling', so to speak.
Will this make the PS5 faster in computations than the XSX? In a few cases, possibly, but in general, no. However, it does mean the story of what that gap is isn't as simple as many here are claiming. I expect the PS5 may generally run at a slightly lower resolution (some quick calculations put 3504x1971 at roughly a 17% reduction in pixel count), but in many cases I think the gap will be closer than you'd expect from a raw TF count, because the higher clock speed helps 'in the real world' in a somewhat non-linear fashion compared to raw TF numbers: it shrinks the penalty of moving data in and out of local storage. It's not a HUGE difference (at least not in most cases), but it's not nothing.
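For what it's worth, the pixel-count arithmetic checks out, assuming a 3840x2160 baseline against the author's speculative 3504x1971 target:

```python
# Pixel counts for the two render targets from the text.
full = 3840 * 2160      # 8,294,400 pixels (native 4K)
reduced = 3504 * 1971   # 6,906,384 pixels (speculated PS5 target)
reduction = 1 - reduced / full  # ~0.167, i.e. about a 17% cut
```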
This was a good read, thanks for that.
It's just sad that posts like these get buried beneath the huge pile that is console warring, drive-by snark and concern trolling.