Next-gen PS5 and next Xbox speculation launch thread - MY ANACONDA DON'T WANT NONE

Overall maximum teraflops for next-gen launch consoles?

  • 8 teraflops

    Votes: 43 1.9%
  • 9 teraflops

    Votes: 56 2.4%
  • 12 teraflops

    Votes: 978 42.5%
  • 14 teraflops

    Votes: 525 22.8%
  • Team ALL THE WAY UP +14 teraflops

    Votes: 491 21.3%
  • 10 teraflops (because for some reason I put 9 instead of 10)

    Votes: 208 9.0%

  • Total voters
    2,301
Status
Not open for further replies.

gofreak

Member
Oct 26, 2017
2,216
It's quite a lot, and dunno if everyone's interested in the nitty gritty. But here you go:

I found several Japanese SIE patents from Hideyuki Saito along with a single (combined?) US application that appear to be relevant.

The patents were filed across 2015 and 2016.

Note: This is an illustrative embodiment in a patent application. i.e. Maybe parts of it will make it into a product, maybe all of it, maybe none of it.

But it perhaps gives an idea of what Sony has been researching. And does seem in line with what Cerny talked about in terms of customisations across the stack to optimise performance.

http://www.freepatentsonline.com/y2017/0097897.html

There's quite a lot going on, but to try and break it down:

It talks about the limitations of simply using an SSD 'as is' in a games system, and a set of hardware and software stack changes to improve performance.

Basically, 'as is', an OS uses a virtual file system, designed to virtualise a host of different I/O devices with different characteristics. Various tasks of this file system typically run on the CPU - e.g. traversing file metadata, data tamper checks, data decryption, data decompression. This processing, and interruptions on the CPU, can become a bottleneck to data transfer rates from an SSD, particularly in certain contexts e.g. opening a large number of small files.

At a lower level, SSDs typically employ a data block size aimed at generic use. They distribute blocks of data around the NAND memory to distribute wear. In order to find a file, the memory controller in the SSD has to translate a request to the physical addresses of the data blocks using a look-up table. In a regular SSD, the typical data block size might require a look-up table 1GB in size for a 1TB SSD. An SSD might typically use DRAM to cache that lookup table - so the memory controller consults DRAM before being able to retrieve the data. The patent describes this as another potential bottleneck.
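To make those table sizes concrete, here's a quick back-of-envelope sketch in Python (the 4-byte entry size is my assumption; the patent only gives the resulting table sizes):

def lookup_table_size(capacity_bytes, block_bytes, entry_bytes=4):
    # one entry maps one logical block to a physical NAND address
    return (capacity_bytes // block_bytes) * entry_bytes

TB = 1024 ** 4
# generic SSD, ~4KiB mapping granularity -> ~1GiB table for a 1TB drive
print(lookup_table_size(TB, 4 * 1024) / 1024 ** 3)    # 1.0 (GiB)
# write-once install data at ~128MiB granularity -> a 32KiB table
print(lookup_table_size(TB, 128 * 1024 ** 2) / 1024)  # 32.0 (KiB)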

Here are the hardware changes the patent proposes vs a 'typical' SSD system:

- SRAM instead of DRAM inside the SSD for lower latency and higher throughput access between the flash memory controller and the address lookup data. The patent proposes using a coarser granularity of data access for data that is written once, and not re-written - e.g. game install data. This larger block size can allow for address lookup tables as small as 32KB, instead of 1GB. Data read by the memory controller can also be buffered in SRAM for ECC checks instead of DRAM (because of changes made further up the stack, described later). The patent also notes that by ditching DRAM, reduced complexity and cost may be possible.

- The SSD's read unit is 'expanded and unified' for efficient read operations.

- A secondary CPU, a DMAC, and a hardware accelerator for decoding, tamper checking and decompression.

- The main CPU, the secondary CPU, the system memory controller and the IO bus are connected by a coherent bus. The patent notes that the secondary CPU can be different in instruction set etc. from the main CPU, as long as they use the same page size and are connected by a coherent bus.

- The hardware accelerator and the IO controller are connected to the IO bus.

An illustrative diagram of the system: [image]

At a software level, the system adds a new file system, the 'File Archive API', designed primarily for write-once data like game installs. Unlike a more generic virtual file system, it's optimised for NAND data access. It sits at the interface between the application and the NAND drivers, and the hardware accelerator drivers.

The secondary CPU handles prioritisation of access to the SSD. When read requests are made through the File Archive API, all other read and write requests can be prohibited to maximise read throughput.

When a read request is made by the main CPU, it sends it to the secondary CPU, which splits the request into a larger number of small data accesses. It does this for two reasons - to maximise parallel use of the NAND devices and channels (the 'expanded read unit'), and to make blocks small enough to be buffered and checked inside the SSD SRAM. The metadata the secondary CPU needs to traverse is much simpler (and thus faster to process) than under a typical virtual file system.
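As a rough sketch of that splitting idea (the chunk size and channel count here are my own assumptions, not numbers from the patent):

NUM_CHANNELS = 8     # assumed NAND channel count
CHUNK = 64 * 1024    # assumed access size that fits the controller's SRAM buffer

def split_read(offset, length):
    # carve one large request into small accesses striped across channels,
    # so every NAND device can work in parallel
    chunks = []
    pos = offset
    while pos < offset + length:
        size = min(CHUNK, offset + length - pos)
        chunks.append(((pos // CHUNK) % NUM_CHANNELS, pos, size))
        pos += size
    return chunks

print(len(split_read(0, 1024 * 1024)))  # a 1MiB read becomes 16 parallel chunks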

The NAND memory controller can be flexible about what granularity of data it uses - for data requests sent through the File Archive API, it uses granularities that allow the address lookup table to be stored entirely in SRAM for minimal bottlenecking. Other granularities can be used for data that needs to be rewritten more often - user save data for example. In these cases, the SRAM partially caches the lookup tables.

When the SSD has checked its retrieved data, it's sent from SSD SRAM to kernel memory in the system RAM. The hardware accelerator then uses a DMAC to read that data, do its processing, and then write it back to user memory in system RAM. The coordination of this happens with signals between the components, without involving the main CPU. The main CPU is then finally signalled when data is ready, but is uninvolved until that point.

A diagram illustrating data flow: [image]

Interestingly, for a patent, it describes in some detail the processing times required of these various components in order to meet certain data transfer targets - what you would need in terms of timings from each of the secondary CPU, the memory controller and the hardware accelerator in order for them not to be a bottleneck on the NAND data speeds. Though I wouldn't read too much into this, it lays out a table of targets from 1GB/s to 20GB/s; in most examples it talks about what you would need to support an end-to-end transfer rate of 10GB/s.
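Still, the arithmetic behind those budgets is simple enough to sketch (my numbers, not the patent's):

TARGET = 10e9      # 10GB/s end-to-end target, in bytes per second
CHUNK = 64 * 1024  # assumed size of one small access
# each stage (secondary CPU, memory controller, accelerator) gets roughly
# this long per chunk before it becomes the bottleneck:
print(CHUNK / TARGET * 1e6)  # ~6.55 microseconds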

The patent is also silent on what exactly the IO bus would be - that would obviously be a key bottleneck itself on transfer rates out of the NAND devices.

Once again, this is one described embodiment - not necessarily exactly what the PS5 solution will look like. But it is an idea of what Sony's been researching in how to customise an SSD and software stack for faster read throughput for installed game data.
 
Last edited:

MrDeveus

Member
Apr 26, 2019
70
Navi will be more efficient, giving more graphical power for the same number of watts. But Vega will apparently continue as AMD's flagship gaming card, with Navi launching as a mid-tier part. (I don't know specifically why that is, but multiple sources say so.) Microsoft want to be the undisputed performance leader, so that might make Vega attractive to them. Also, Vega features capable FP64 calculation, which is rare for gaming but useful for machine-learning AI and other advanced tasks. And Microsoft have officially said they want to use the same hardware in the next Xbox and their gaming servers.

It's important to note that this imagined scenario where Sony use Navi and Microsoft use Vega wouldn't have an inevitable winner. It could go all sorts of ways. Maybe Microsoft does get the highest-TF card, and their Xbox One X experience lets them cool it effectively without much extra cost. There are rumors that Navi isn't as efficient or power-thrifty as intended, and maybe the PS5 could end up needing an extra cooling redesign and downclocks at the last minute, making it just as expensive as the Xbox but notably less powerful.

Or maybe Vega has higher TF but lacks efficiency, and the use of chip space for new server-specific features makes the next Xbox less performant than expected. Meanwhile, a careful Navi design with smartly customized gaming-relevant features allows the PS5 to produce graphics basically indistinguishable from Anaconda's, but at a much lower price.

In other words, neither choice is necessarily bad. Both have potential drawbacks, but both also have potential advantages for what the different platforms hope to accomplish. And no matter what they design for, some tech bets simply don't pan out.
This is the best explanation I have heard in the last few pages of the Next Xbox Vega vs PS5 Navi situation. Thank you
 

Flutter

Member
Oct 25, 2017
4,651
[gofreak's SSD patent breakdown quoted in full - see the post above]
Yeah, having a sub-CPU isn't that far of a reach for PlayStation - the PS4 had an ARM processor.

Which is what I was thinking: what if the VR breakout box function and this SSD work could be done on one multipurpose processor?
 

anexanhume

Member
Oct 25, 2017
4,018
How much do you estimate, ballpark, for 16GB of GDDR6 on a 256-bit bus and 24GB of GDDR6 on a 384-bit bus? With the memory controller included too.
Conservatively I would say 35W and 50W, respectively.

This is the best explanation I have heard in the last few pages of the Next Xbox Vega vs PS5 Navi situation. Thank you
The only way Vega is a superior solution is if Navi missed its design targets or MS insisted on having something more capable of double precision and AI for Azure, with Navi being a pure gaming GPU.
 

modiz

Member
Oct 8, 2018
3,769
[gofreak's SSD patent breakdown quoted in full - see the post above]
This is certainly interesting! Considering how detailed it is, I think it would be worth creating a thread on this.
Also, thank you for the great summary of the patent!
 

Deleted member 38397

User requested account closure
Banned
Jan 15, 2018
838
Am I the only one who hasn't bought games for, say, the last 10 months, waiting for the PS5?
Nope, I haven't picked up Spider-Man or Days Gone yet. Nor will I buy Ghost of Tsushima, Death Stranding, or The Last of Us Part II, because I will wait to see if there are PS5 versions. It's unlikely the PS Store will see another penny from me until the PS5 launches now. I've got a massive backlog on PC which should keep me occupied for the next 18 months (and I'm hoping MS launch Game Pass for PC at E3).
 

KennyX

Member
Nov 21, 2018
1,510
But RAM TDP is a cost. It reduces what you can do with other elements of the console.

On PS4, 8GB of GDDR5 was 40 to 50 watts. 16GB of GDDR6 will be more than this, probably between 60 and 70 watts, and 24GB of GDDR6 on a 384-bit bus would be 90 to 100 watts. This is crazy.

It will limit the growth potential of the GPU. I will try to find it, but I read an Nvidia study about a new type of RAM coming for when HBM reaches its consumption limit. It was not recent, because it compared HBM against GDDR5, but they said the main challenge for GPUs is RAM power consumption.
What? Maybe on the very first model using 16 GDDR5 chips, in the worst-case scenario, but not for a normal RAM configuration.
24GB of GDDR6 in non-clamshell will not be close to 100 watts.

Half of that is a better estimate.
 

DrKeo

Member
Mar 3, 2019
479
Israel
Again, read my answer - you made the calculation yourself: 16GB of GDDR6 on a 256-bit bus and 24GB of GDDR6 on a 384-bit bus are already a huge problem for power consumption at console scale...

8GB of GDDR5 was already a huge part of the PS4's TDP.
I think that it will be easier to calculate
Conservatively I would say 35W and 50W, respectively.
No interest in HBM2 - and why do people keep saying HBM2 is more interesting than GDDR6? Is the Rambus engineer nuts? For low-speed HBM2 they have less performance per watt than GDDR6.



The only way Vega is a superior solution is if Navi missed its design targets or MS insisted on having something more capable of double precision and AI for Azure, with Navi being a pure gaming GPU.
If the X has 12 GDDR5 chips in a 384-bit setup, shouldn't GDDR6 with the same setup actually draw less power than the X's setup? Every article I've read about GDDR6 claimed that it draws 15% less power than the same GDDR5 setup. What always felt ambiguous to me is whether that 15% lower power consumption figure is before or after GDDR6 is ramped up to 14 or 16Gb/s. From what I know about GDDR6, it doubles its bandwidth mainly because of QDR - that's how a 3000MHz GDDR6 chip is 12Gb/s while a 3000MHz GDDR5 chip is 6Gb/s.

So in theory, because the Xbox One X uses 12 GDDR5 chips running at 6.8Gb/s on a 384-bit bus, an Anaconda with 12 GDDR6 chips running at 13.6Gb/s (underclocked 14Gb/s chips?) on the same 384-bit bus should provide 652.8GB/s of bandwidth while drawing ~15% less power. I mean, I might have gotten everything totally wrong, but that's what every GDDR6 story that I've ever read has alluded to.
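For anyone who wants to sanity-check those numbers, the arithmetic is just per-pin rate times bus width (my sketch, not an official formula from any spec):

def bandwidth_gbs(gbps_per_pin, bus_bits):
    # GB/s = data rate per pin (Gb/s) * bus width (bits) / 8 bits per byte
    return gbps_per_pin * bus_bits / 8

print(bandwidth_gbs(6.8, 384))   # 326.4 - Xbox One X GDDR5
print(bandwidth_gbs(13.6, 384))  # 652.8 - same bus with double-rate GDDR6
print(bandwidth_gbs(14.0, 256))  # 448.0 - 14Gbps GDDR6 on a 256-bit bus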

https://www.anandtech.com/show/13832/amd-radeon-vii-high-end-7nm-february-7th-for-699

AMD are the stupidest people in the world - they paid 320 dollars for 16GB of HBM2 when GDDR6 on a 256-bit bus or a custom bus would have done better.... Only 36GB/s between the two solutions and 5 watts... So stupid...
Vega 64 and 56 predate GDDR6, and Radeon VII was a rush job from a server card - I don't think we've seen a real AMD card on the market while GDDR6 was available. Navi is rumored to use GDDR6, not HBM. For Vega 56 and 64 it made sense: they had to go above 400GB/s, and there's no way in hell you can get there with GDDR5 given GCN's crazy power draw.
 
Last edited:

chris 1515

Member
Oct 27, 2017
1,944
Barcelona Spain
[DrKeo's GDDR6 bandwidth and power reply quoted in full - see above]
I asked a youtuber who can measure the difference with some cards to take some real-world measurements and do the maths. Because otherwise it looks like all the datacenter engineers and the Rambus guy are stupid.
 

Trieu

Member
Feb 22, 2019
554

DrKeo

Member
Mar 3, 2019
479
Israel
I asked a youtuber who can measure the difference with some cards to take some real-world measurements and do the maths. Because otherwise it looks like all the datacenter engineers and the Rambus guy are stupid.
HBM2 can get to over 1TB/s with less power draw than most GDDR6 setups. If you want to go there with GDDR6, you will need loads of chips and very high clock speeds, so yeah, HBM2 makes sense for servers. If ~700GB/s is good enough for your product, GDDR is probably a better choice. Two years ago GDDR5's bandwidth was just too low for Vega, but GDDR6 is fast enough for it. I'm guessing that's why Navi will use GDDR6. Someday HBM will be cheaper and the whole GPU industry will move to HBM, but we will probably have to wait until a pretty mature point in HBM3's life cycle to get there, so it will take some time.

GDDR6 doubled GDDR5's bandwidth using QDR, so it can get double the performance for the same clock speed and 15% less power draw. That doesn't mean GDDR6 can suddenly go to 1TB/s+ levels of bandwidth, because its clocks would have to be too high, but HBM can.
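To put rough numbers on that (my own arithmetic, using 32 bits of bus per GDDR6 chip and 1024 bits per HBM stack):

import math

def gddr6_chips_needed(target_gbs, gbps_per_pin):
    per_chip = gbps_per_pin * 32 / 8  # GB/s contributed by one 32-bit chip
    return math.ceil(target_gbs / per_chip)

print(gddr6_chips_needed(1000, 14))  # 18 chips at 14Gbps for ~1TB/s
print(gddr6_chips_needed(1000, 16))  # 16 chips at 16Gbps, i.e. a 512-bit bus
# HBM2 gets there with far fewer packages: 4 stacks * 1024 bits * 2Gb/s / 8 = 1024GB/s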
 
Last edited:

chris 1515

Member
Oct 27, 2017
1,944
Barcelona Spain
Yes, a Fortune 500 company are the stupidest people in the world.

... that's just not how it works.
Like I said, I am suspicious. I asked for real-world measurements from a guy with a Titan V and a Radeon VII who is also able to do the measurement for 16GB of 3200 DDR4. All with the memory controller included. I hope the guy will answer that he can help me, and not give only estimations.


HBM2 can get to over 1TB/s with less power draw than any GDDR6 setup. If you want to go there with GDDR6, you will need loads of chips and very high clock speeds, so yeah, HBM2 makes sense for servers. If ~700GB/s is good enough for your product, GDDR is probably a better choice. But someday HBM will be cheaper anyway; we will probably have to wait until a pretty mature point in HBM3's life cycle to get there, so it will take some time.
Here it is low-speed HBM2...
 

Trieu

Member
Feb 22, 2019
554
Like I said, I am suspicious. I asked for real-world measurements from a guy with a Titan V and a Radeon VII who is also able to do the measurement for 16GB of 3200 DDR4. All with the memory controller included. I hope the guy will answer that he can help me, and not give only estimations.
We are derailing the thread.
It doesn't matter, because AMD couldn't have done the Radeon VII, like you brilliantly suggested, with 16GB of GDDR6, because it was based on the MI60 workstation card, which was released earlier, came with HBM2, and wasn't meant for gaming.

Also what has 16GB of 3200 DDR4 to do with any of it? Are you even aware that VRAM and RAM are different?
 

chris 1515

Member
Oct 27, 2017
1,944
Barcelona Spain
We are derailing the thread.
It doesn't matter, because AMD couldn't have done the Radeon VII, like you brilliantly suggested, with 16GB of GDDR6, because it was based on the MI60 workstation card, which was released earlier, came with HBM2, and wasn't meant for gaming.

Also what has 16GB of 3200 DDR4 to do with any of it? Are you even aware that VRAM and RAM are different?
It is for one of the PS5 rumors - the 8GB HBM2 + 16GB DDR4 with HBCC rumor - compared to 16GB of GDDR6 on a 256-bit bus with 14Gbps modules and 24GB of GDDR6 on a 384-bit bus with 14Gbps modules.
 

DrKeo

Member
Mar 3, 2019
479
Israel
Like I said, I am suspicious. I asked for real-world measurements from a guy with a Titan V and a Radeon VII who is also able to do the measurement for 16GB of 3200 DDR4. All with the memory controller included. I hope the guy will answer that he can help me, and not give only estimations.
Let's hope he can get us real numbers.

Here it is low-speed HBM2...
The low-speed HBM2 thing is just a rumor. We have no idea what kind of impact this move will have, or by how much the price will drop if it happens. Considering that HBM2 is ~2x the price of GDDR6, the price would have to drop by a huge margin. I'm not saying that it's impossible, but it does seem a bit extreme.
 

DrKeo

Member
Mar 3, 2019
479
Israel
It is for one of the PS5 rumors - the 8GB HBM2 + 16GB DDR4 with HBCC rumor - compared to 16GB of GDDR6 on a 256-bit bus with 14Gbps modules and 24GB of GDDR6 on a 384-bit bus with 14Gbps modules.
X had 12 GDDR5 chips running @3400MHz. If Anaconda has 12 GDDR6 chips running @3400MHz it will have 24GB with a 652.8GB/s bandwidth while drawing 15% less power than the X. At least that's what the GDDR6 PR claims.
 

cooldawn

Member
Oct 28, 2017
674
Data rate on a per-chip basis has to do with the clock rate of each chip used. That 8 times discrepancy you see is reflected in the base clock that each memory format runs at - so something like 500MHz for HBM but 4000MHz for GDDR6. That is also why GDDR6 needs more power to drive it.

Why you end up with a lot more bandwidth with HBM is because the interface bus is much, much bigger. Each HBM chip has 2 x 128-bit channels. Each chip of GDDR6 has only 32 bits. So with a 4-Hi stack of HBM you are looking at 256-bit x 4 = 1024-bit bus per stack, and there are typically at least 2 stacks.

For GDDR6, the more chips that make up the configuration, the higher your bus width becomes. So just multiply the number of chips used by 32 and you get your bus size. With HBM, first find out the number of chips in a stack (usually there are 4), then multiply that by 256-bit to get the bus width of each stack, then multiply that by the number of stacks present to get total bus width.
Sorry, it's taken a few days to catch up again.

Gotcha. Thank you for the clarification. It's an excellent post because it all makes sense now.
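For reference, the quoted rules boil down to two one-liners (a sketch of the arithmetic described above):

def gddr6_bus_bits(chips):
    return chips * 32  # each GDDR6 chip contributes a 32-bit slice

def hbm_bus_bits(dies_per_stack, stacks):
    return dies_per_stack * 2 * 128 * stacks  # 2 x 128-bit channels per die

print(gddr6_bus_bits(12))  # 384 - e.g. an Xbox One X style setup
print(hbm_bus_bits(4, 2))  # 2048 - two 4-Hi stacks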

New Reddit Leak

PS5 Details and Launch
  • $449/€449/£399.
  • ~12-13TF.
  • 24GB RAM.
  • 2TB storage (SSD + custom).
  • BC with all previous PlayStations.
  • Minor changes to DS5, also has IR camera.
  • Launches worldwide September 2020.
  • PSVR2 coming later.

  • Big first party year 1 games are Gran Turismo and Horizon 2. GT will launch with the console. There are other titles for year 1, but these are the two "big" ones at the reveal event.
  • Japan Studio will be there at launch with a new IP, which will also be shown off at the event, but they have begun work on a AAA title too.
  • San Diego is the only other first party studio planned to appear at the event.

  • No plans for any upcoming PS4 first party titles to be cross gen at release.
  • All of Ghost, TLOU2, and Death Stranding are planned to release by mid-2020 as PS4 games.
  • TLOU2 late 2019, Ghost early 2020, and Death Stranding after that, barring any delays.
  • No PS5 versions are decided at the moment. Focus is on delivering the PS4 versions. Discussions about how to proceed with hypothetical PS5 versions - if you launch a significantly upgraded PS5 game for retail/digital in year 1, then letting PS4 owners pay a small upgrade fee, since the PS4 disc is usable and playable on PS5 anyway, seems to be the favored route. Alternatively, you just play the normal PS4 disc on the PS5.

  • Square Enix will be at reveal event with Luminous Production new IP (RPG) and Final Fantasy 16.
  • FF16 was originally planned as a current gen title but due to the screw up with FF7 Remake which delayed it from the planned release of 2018, it was moved to next gen.
  • FF7 Remake is not a PS5 release. Current gen only and no plans for a PS5 version right now.
  • LP is also making a FF15 PS5 Edition for launch.
  • DQXI with added content will be on PS5.
  • Capcom, Warner Bros, and Rockstar (RDR2 next gen) will be at the reveal event.

Those are the only things confirmed for now as planning is underway.
Considering they name Gran Turismo as a year one title, I'd call this one 'hard to believe'.

Gran Turismo is known for pushing technology, but it's not known for getting to market in a reasonable time. Having a Gran Turismo available as soon as possible makes sense because it'll showcase PlayStation 5, but I can't see it happening...unless what this leak actually means is a 'slice' of the full game, a bit of a taster with all the technology in place.

[MrDeveus' Navi vs Vega explanation quoted in full - see above]
Another 'Thanks' for the explanation.

A simple, clean overview of what the differences could be.
 

Silenti

Member
May 29, 2018
53
I don't have it on me right now, but it's basically a memory setup of 8GB of HBM and 16GB of DDR4, with 4GB of the DDR4 used for the OS. The remaining memory is shared between the CPU and GPU, and with the use of HBCC it appears to devs as 20GB of memory. There are a couple of advantages to this: the first is that the PS4 had the issue where the CPU took a disproportionate chunk of the bandwidth, and this split memory setup avoids that. Second, the HBM will use significantly less TDP than GDDR6 would, meaning you can push higher GPU clock speeds. Another point is that HBM prices are expected to go down faster than GDDR6, so it would be more cost-efficient in the future.
Wouldn't this leak and the specific RAM leak conflict with each other? This leak says 4GB reserved for the OS, with the remaining 20GB seen as a single pool for game use. But there was another leak that was broken down to mean that the SSD tables would be stored in main memory rather than using an expensive existing NVMe controller - which alone would take up 3GB at a minimum. I may be getting the specific leaks confused.
 

Flutter

Member
Oct 25, 2017
4,651
Doesn't the devkit leak on reddit correlate with the one on 4chan?

Reddit:
  • monolithic die ~22.4mm by ~14.1mm
  • 16 Samsung K4ZAF325BM-HC18 in clamshell configuration
  • memory VRM seems like overkill, with multiple Fairchild/ON Semiconductor FDMF3170 power stages controlled by an MP2888 from MPS
  • 3 Samsung K4AAG085WB-MCRC, 2 of those close to the NAND acting as DRAM cache (unusual 2GB DRAM per 1 TB NAND)
  • 4 NAND packages soldered to the PCB which are TH58LJT2T24BAEG from Toshiba
  • PS5016-E16 from Phison
4chan:
Hmm, noticed on the Beyond3D forums that there's a newer 4Chan "leak" about the PS5 devkit. Anyone care to give it a quick look to see if it's even remotely plausible?

Big PC towers.
Pretty loud, fans seem to be running at max rpm all the time. Bug?
GPU dump has memory at 18432mb, bandwidth 733GB/s, core clock at 1850mhz.
CPU shows up as Zen 7. According to docs only the GPU on the SOC is being used in this iteration of the devkit.
64GB of system ram.
4TB SSD

Source

Yeah, dunno what to make of that "Zen 7" bit, it seems to be the most glaring oddity
16GB GDDR6 RAM + 2GB DDR4 (the other 4GB is used as DRAM cache)
 

DrKeo

Member
Mar 3, 2019
479
Israel
If the guy answers positively, it will not be estimations but some real-world values...
It would be nice if he could check GDDR6 vs the same GDDR5 setup at the same clock speed. I have a pretty shitty motherboard and no diagnostic tools, but I do have an RTX 2080, so if you know of any way for me to check the isolated power draw of the memory alone then I can do that.
 

ArmGunar

Member
Oct 30, 2017
2,304
With that new thread about the SSD, I wondered: what does Sony want to achieve with their ultra-reduced load times? Is it something which works on every device with an SSD, or is it something achievable only thanks to their custom SSD/architecture?
 

Deleted member 56784

User requested account closure
Banned
May 16, 2019
68
Random question: Does the superfast SSD solution have any impact on the perception of speed in a new Gran Turismo, like seen in the Spider-Man demo? Or is this something which is determined by other factors - framerate, graphical effects, general speed of cars?
I mean, Gran Turismo 7 with 4K/60fps RT reflections and a new kind of speed perception thanks to the SSD would be a killer showcase.
And what would it mean in comparison to Forza, if Playground also needs to consider a PC version? Could we see some drastic differences?
 

Pheonix

Member
Dec 14, 2018
975
St Kitts
So 40cu x1850 = 9.47tf
Where are you getting 40CU from?
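For anyone wondering where figures like that come from, the teraflop maths for a GCN/Navi-style GPU is simple (my sketch; 64 shaders per CU and 2 ops per clock are the usual AMD assumptions):

def teraflops(cus, clock_mhz):
    # CUs * 64 shaders * 2 ops per clock (FMA) * clock, scaled to TFLOPs
    return cus * 64 * 2 * clock_mhz / 1e6

print(teraflops(40, 1850))  # 9.472 - the "9.47tf" in the quote
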
Random question: Does the superfast SSD solution have any impact on the perception of speed in a new Gran Turismo, like seen in the Spider-Man demo? Or is this something which is determined by other factors - framerate, graphical effects, general speed of cars?
I mean, Gran Turismo 7 with 4K/60fps RT reflections and a new kind of speed perception thanks to the SSD would be a killer showcase.
And what would it mean in comparison to Forza, if Playground also needs to consider a PC version? Could we see some drastic differences?
It's stuff like that that represents one of the benefits of having an SSD set up like that. So yes. There is a 'but' though, but let's just leave it as yes for simplicity's sake.
 

ArmGunar

Member
Oct 30, 2017
2,304
Probably just way better, that's all.
Ok, thanks!

I know we don't know anything yet, but just as an example of what we could expect from this, if we take the example of Spider-Man:
- PS4: 8 sec
- PS5: 0.8 sec
- Device with an SSD natively: 3-4 sec?

With a device including an SSD, in this example, can we expect 3-4 sec load times, or less than that (or more)?
 
Oct 27, 2017
2,847
Somewhere South
Random question: Does the superfast SSD solution have any impact on the perception of speed in a new Gran Turismo, like seen in the Spider-Man demo?
The "perception of speed" in the Spiderman demo is more about being able to load assets so fast that you can actually travel at much higher speeds, streaming new areas in, without hiccups. AFAIK, that's not how closed circuit games usually do it - they usually load entire levels to memory, what isn't possible with open world games.

That said, if GT where to implement streaming, they'd be able to have much richer, densely packed tracks, with a huge variety of assets, as they wouldn't be limited to what they can fit into the memory.
 
Status
Not open for further replies.