
gofreak

Member
Oct 26, 2017
7,733
This will be one for people interested in some potentially more technical speculation. I posted in the next-gen speculation thread, but was encouraged to spin it off into its own thread.

I did some patent diving to see if I could dig up any likely candidates for what Sony's SSD solution might be.

I found several Japanese SIE patents from Saito Hideyuki along with a single (combined?) US application that appear to be relevant.

The patents were filed across 2015 and 2016.

Caveat: This is an illustrative embodiment in a patent application. i.e. Maybe parts of it will make it into a product, maybe all of it, maybe none of it. Approach it speculatively.

That said, it perhaps gives an idea of what Sony has been researching, and it does seem in line with what Cerny talked about in terms of customisations across the stack to optimise performance.

http://www.freepatentsonline.com/y2017/0097897.html

There's quite a lot going on, but to try and break it down:

It talks about the limitations of simply using an SSD 'as is' in a games system, and a set of hardware and software stack changes to improve performance.

Basically, 'as is', an OS uses a virtual file system, designed to virtualise a host of different I/O devices with different characteristics. Various tasks of this file system typically run on the CPU - e.g. traversing file metadata, data tamper checks, data decryption, data decompression. This processing, and interruptions on the CPU, can become a bottleneck to data transfer rates from an SSD, particularly in certain contexts e.g. opening a large number of small files.
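
To put the 'lots of small files' point in concrete terms, here's a toy back-of-the-envelope model (my own invented numbers, nothing from the patent) of how a fixed per-file CPU cost can dwarf the raw transfer time:

```python
# Toy model: per-file CPU-side cost (metadata traversal, checks, decompression)
# vs. raw transfer time. Both constants below are invented for illustration.
SSD_BANDWIDTH = 3.5e9          # bytes/s of raw drive throughput (assumed)
CPU_COST_PER_FILE = 50e-6      # seconds of file-system work per file (assumed)

def read_time(total_bytes, file_count):
    transfer = total_bytes / SSD_BANDWIDTH
    overhead = file_count * CPU_COST_PER_FILE
    return transfer, overhead

# 1 GiB read as one big file vs. 65,536 files of 16 KiB each
for files in (1, 65_536):
    t, o = read_time(2**30, files)
    print(f"{files:>6} files: transfer {t*1000:7.1f} ms, CPU overhead {o*1000:8.1f} ms")
```

The transfer time is identical in both cases; it's the per-file CPU work that blows up.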

At a lower level, SSDs typically employ a data block size aimed at generic use. They distribute blocks of data around the NAND memory to spread wear. In order to find a file, the memory controller in the SSD has to translate a request to the physical addresses of the data blocks using a lookup table. In a regular SSD, the typical data block size might require a lookup table 1GB in size for a 1TB SSD. An SSD will typically use DRAM to cache that lookup table - so the memory controller consults DRAM before being able to retrieve the data. The patent describes this as another potential bottleneck.
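
For a rough sense of where that 1GB figure comes from, this is the usual arithmetic, assuming a typical 4KB mapping unit and 4-byte table entries (my assumptions, not numbers from the patent):

```python
# Back-of-the-envelope for the "1 GB table per 1 TB" figure.
capacity = 2**40               # 1 TiB of NAND
page     = 4 * 2**10           # 4 KiB mapping granularity (assumed typical value)
entry    = 4                   # bytes per lookup-table entry (assumed)
print((capacity // page) * entry / 2**30, "GiB")   # -> 1.0 GiB
```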

Here are the hardware changes the patent proposes vs a 'typical' SSD system:

- SRAM instead of DRAM inside the SSD, for lower latency and higher throughput access between the flash memory controller and the address lookup data. The patent proposes using a coarser granularity of data access for data that is written once and not re-written - e.g. game install data. This larger block size can allow for address lookup tables as small as 32KB, instead of 1GB (see the sketch after this list). Data read by the memory controller can also be buffered in SRAM for ECC checks instead of DRAM (because of changes made further up the stack, described later). The patent also notes that ditching DRAM may reduce complexity and cost, and that cost will scale better with larger SSDs, which would otherwise need e.g. 2GB of DRAM for 2TB of storage, and so on.

- The SSD's read unit is 'expanded and unified' for efficient read operations.

- A secondary CPU, a DMAC, and a hardware accelerator for decoding, tamper checking and decompression.

- The main CPU, the secondary CPU, the system memory controller and the IO bus are connected by a coherent bus. The patent notes that the secondary CPU can be different in instruction set etc. from the main CPU, as long as they use the same page size and are connected by a coherent bus.

- The hardware accelerator and the IO controller are connected to the IO bus.
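
On the 32KB lookup table mentioned in the first bullet: here's the same back-of-the-envelope arithmetic with a coarse, write-once granularity. The 128MB block size is my assumption - simply the value that reproduces the 32KB-per-1TB figure:

```python
# Same arithmetic as before, with a coarse granularity for install data.
capacity = 2**40               # 1 TiB of NAND
block    = 128 * 2**20         # 128 MiB mapping granularity (assumed)
entry    = 4                   # bytes per lookup-table entry (assumed)
print((capacity // block) * entry / 2**10, "KiB")  # -> 32.0 KiB
```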

An illustrative diagram of the system:

[Image: uS6bo2P.png]




At a software level, the system adds a new file system, the 'File Archive API', designed primarily for write-once data like game installs. Unlike a more generic virtual file system, it's optimised for NAND data access. It sits at the interface between the application and the NAND drivers, and the hardware accelerator drivers.

The secondary CPU handles prioritisation of access to the SSD. When read requests are made through the File Archive API, all other read and write requests can be prohibited to maximise read throughput.

When a read request is made by the main CPU, it sends it to the secondary CPU, which splits the request into a larger number of small data accesses. It does this for two reasons - to maximise parallel use of the NAND devices and channels (the 'expanded read unit'), and to make blocks small enough to be buffered and checked inside the SSD SRAM. The metadata the secondary CPU needs to traverse is much simpler (and thus faster to process) than under a typical virtual file system.

The NAND memory controller can be flexible about what granularity of data it uses - for data requests sent through the File Archive API, it uses granularities that allow the address lookup table to be stored entirely in SRAM for minimal bottlenecking. Other granularities can be used for data that needs to be rewritten more often - user save data, for example. In these cases, the SRAM partially caches the lookup tables.

When the SSD has checked its retrieved data, it's sent from SSD SRAM to kernel memory in system RAM. The hardware accelerator then uses a DMAC to read that data, do its processing, and write it back to user memory in system RAM. This coordination happens via signals between the components, without involving the main CPU. The main CPU is then finally signalled when the data is ready, but is uninvolved until that point.
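
A rough sketch of that read path (my own pseudocode with invented names, not anything lifted from the patent), mainly to show which component does what and how late the main CPU actually gets involved:

```python
# Conceptual trace of a read issued through the File Archive API.
def main_cpu_issue_read(path):
    print(f"main CPU   : request read of {path}, then go do other work")
    return {"path": path}

def secondary_cpu_split(request, chunk_count=8):
    print(f"second CPU : split request into {chunk_count} small NAND accesses")
    return [f"{request['path']}#chunk{i}" for i in range(chunk_count)]

def ssd_read(chunk):
    # Address lookup hits the small table held entirely in controller SRAM;
    # data is buffered and ECC-checked in SRAM, then DMA'd to kernel memory.
    print(f"SSD ctrl   : look up + read {chunk}, ECC check in SRAM, DMA to kernel RAM")

def accelerator_process():
    print("accelerator: DMA from kernel RAM, tamper check + decrypt + decompress,")
    print("             DMA result into user RAM")

def signal_main_cpu():
    print("second CPU : signal main CPU that the data is ready in user memory")

request = main_cpu_issue_read("/game/assets/level3.pak")   # path is made up
for chunk in secondary_cpu_split(request):
    ssd_read(chunk)
accelerator_process()
signal_main_cpu()
```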

A diagram illustrating data flow:

[Image: yUFsoEN.png]


Interestingly, for a patent, it describes in some detail the processing targets required of these various components in order to meet certain data transfer rates - what you would need in terms of timings from each of the secondary CPU, the memory controller and the hardware accelerator in order for them not to be a bottleneck on the NAND data speeds:

[Image: CYH6AMw.png]


Though I wouldn't read too much into this, in most examples it talks about what you would need to support an end-to-end transfer rate of 10GB/s.
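
As a rough illustration of the kind of budget that implies (my own numbers, not the patent's table - the 128KB per-request unit is purely an assumption):

```python
# Timing budget implied by a 10 GB/s end-to-end target.
target = 10e9                  # bytes/s end to end
unit   = 128 * 2**10           # bytes handled per request by each stage (assumed)

units_per_sec = target / unit
print(f"{units_per_sec:,.0f} units/s -> about {1e6 / units_per_sec:.1f} us per unit")
# Any stage (secondary CPU, memory controller, accelerator) that takes longer
# than that per unit becomes the bottleneck, regardless of raw NAND speed.
```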

The patent is also silent on what exactly the IO bus would be - that would obviously be a key bottleneck itself on transfer rates out of the NAND devices. Until we know what that is, it's hard to know what the upper end of the transfer rates could be, but it seems a host of customisations are possible to try to maximise whatever that bus will support.

Once again, this is one described embodiment. Not necessarily exactly what the PS5 solution will look like. But it is an idea of what Sony's been researching in terms of how to customise an SSD and software stack for faster read throughput for installed game data.
 
OP
gofreak

Member
Oct 26, 2017
7,733
Could any of this be useful for PC SSDs? What would the drawback be?

The use of SRAM is very much aimed at data access that is write-once, read-many. Which dovetails nicely with game installs - but would not necessarily suit general applications. A PC has to be ready for anything.

Some of the other proposals could be done - secondary CPUs and accelerators - but would need OS support.
 

melodiousmowl

Member
Jan 14, 2018
3,774
CT
Great research. Thank you! Might need a TLDR/TLDU summary though...

for me

if i'm reading this right, the tldr is it's a patent to make ssds tailored to the workload a video game would need vs generic io.


edit: for me, this kind of misses the most exciting thing ssds could be used for: nearline ram - there's lots of data that could potentially be moved from main ram pool into something slower. don't know if there's a practical application
 

modiz

Member
Oct 8, 2018
17,808
Great job on finding and compiling all the patent info, OP. This patent is very interesting.
 

Aztechnology

Community Resettler
Avenger
Oct 25, 2017
14,131
Sounds kind of similar to the stuff about Unified Memory Architectures I remember reading about back in like 2013-14, but with a secondary processing unit that's not the GPU but the SSD and its processor. But I also can barely follow what I'm reading here, in reality. So you tell me.
 
OP
gofreak

Member
Oct 26, 2017
7,733
Great research. Thank you! Might need a TLDR/TLDU summary though...

for me

The TLDR is

- some hardware changes vs the typical inside the SSD (SRAM for housekeeping and data buffering instead of DRAM)
- some extra hardware and accelerators in the system for handling file IO tasks independent of the main CPU
- at the OS layer, a second file system customised for these changes

all primarily aimed at higher read performance and removing potential bottlenecks for data that is written less often than it is read, like data installed from a game disc or download.
 
Nov 5, 2017
240
Thanks for the write up, very interesting. NAND has way more potential than we can access right now with nvme and PCIe 3.0.
 

Andromeda

Member
Oct 27, 2017
4,839
Based on this:

- There would be no need for GPU decompression as the custom SSD has its own hardware accelerator.
- The solution means PS4 games via BC could all benefit from fast loading by default: all PS4 games on PS5 could have super quick loading, like that shown in the Spider-Man demo.
 

Gamer17

Banned
Oct 30, 2017
9,399
Based on this:

- There would be no need for GPU decompression as the custom SSD has its own hardware accelerator.
- The solution means PS4 games via BC could all benefit from fast loading by default: all PS4 games on PS5 could have super quick loading, like that shown in the Spider-Man demo.
If PS4 games don't need a patch to use this SSD, it will be major

Imagine playing Witcher 3 without worrying about dying and then waiting 45 seconds for a load
 

Yerffej

Prophet of Regret
Member
Oct 25, 2017
23,448
Complicated but interesting. Thanks for that.

Does this leave an ounce of space for a packet of secret sauce somewhere?
 

Carn

Member
Oct 27, 2017
11,905
The Netherlands
The explanation was succinct and doesn't ramble on, and doesn't contain 10 paragraphs of nonsensical speculation. I can actually understand what is being said here; with Rigby threads you're trying to grasp the mental model of the thread creator and why he thinks the way he does.

haha, well put. I do kinda miss him tho. I know he posted for a while on a different account on ERA, I think?
 

Thraktor

Member
Oct 25, 2017
570
Thanks for compiling this, it's very interesting. I had speculated that they may use Optane (along with a HDD), as it sidesteps a lot of these issues by being directly RAM-addressable (plus just has much lower latency than NAND), but there's quite a bit to be gained by optimising the SSD controller for a games console, and at a much lower price per GB than Optane.

The patent is also silent on what exactly the IO bus would be - that would obviously be a key bottleneck itself on transfer rates out of the NAND devices. Until we know what that is, it's hard to know what the upper end of the transfer rates could be, but it seems a host of customisations are possible to try to maximise whatever that bus will support.

The advantage of designing your own SoC and SSD is that you can use whatever interface you like. PCIe 4 is a possibility, but they could create their own interface based on PCIe/M-PHY/etc if they felt they need the bandwidth. Hell, they could even stack the SSD on top of the SoC and use TSVs if they wanted to go nuts with bandwidth (and considerably increase the price of the console in the process).
 

modiz

Member
Oct 8, 2018
17,808
I want to know how the thing is going to work when you play from a regular external HDD.
My guess is that the most recently played games will be kept on the SSD, and if you want to play something new and there isn't enough space on the SSD, then the least recently played title on the SSD will be swapped out.
 
OP
gofreak

Member
Oct 26, 2017
7,733
So, this means a custom SSD, so you can't just buy a PC SSD and put it in the PS5? I hope it's not too expensive...

If they went with this exactly as described, I think probably yes.

Based on this:

- There would be no need for GPU decompression as the custom SSD has its own hardware accelerator.
- The solution means PS4 games via BC could all benefit from fast loading by default: all PS4 games on PS5 could have super quick loading, like that shown in the Spider-Man demo.

The 'hardware accelerator' could actually be logic on the GPU die designed for these tasks. It's not really prescriptive about where that hardware would live, as long as it's connected up correctly to the IO bus and controller.

Your second point, yeah - as long as the PS4 environment can make use of the 'File Archive API' I guess. If applications have to specify their use of that, then legacy games might not be able to take quite as full advantage.
 

Thatguy

Banned
Oct 27, 2017
6,207
Seattle WA
The TLDR is

- some hardware changes vs the typical inside the SSD (SRAM for housekeeping and data buffering instead of DRAM)
- some extra hardware and accelerators in the system for handling file IO tasks independent of the main CPU
- at the OS layer, a second file system customised for these changes

all primarily aimed at higher read performance and removing potential bottlenecks for data that is written less often than it is read, like data installed from a game disc or download.
So decreased load times and world render times? I guess my biggest question is: will it enable games that are not possible on other platforms, including streaming platforms?
 

RevengeTaken

Banned
Aug 12, 2018
1,711
Based on this:

- There would be no need for GPU decompression as the custom SSD has its own hardware accelerator.
- The solution means PS4 games via BC could all benefit from fast loading by default: all PS4 games on PS5 could have super quick loading, like that shown in the Spider-Man demo.
So the GPU decompression that hmqqg told us about was bullshit, right?
 

melodiousmowl

Member
Jan 14, 2018
3,774
CT
also, keep in mind that sata 3 at max in a ps4 was ~500MB/s real world (with an appropriate ssd), much lower depending on the workload. spinning hard drives probably topped out at 120MB/s but usually sustain around 50.

top tier nvme devices saturate their buses (i believe this is the case at least, or it's pretty close) at ~3500MB/s

and this patent addresses decompression, which was a lot of the loading time - waiting for the cpu to crunch all the data off your hdd going into ram
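
for a rough sense of scale, here's the raw transfer time for 5GB of game data at those rates, plus the 10GB/s figure from the patent's examples (ignoring all the cpu-side work this thread is about):

```python
# Raw transfer time for 5 GB of game data at various quoted rates.
data = 5e9  # bytes
for name, rate in [("HDD (sustained)", 50e6),
                   ("SATA 3 SSD",      500e6),
                   ("Top-tier NVMe",   3500e6),
                   ("Patent example",  10e9)]:
    print(f"{name:<16}: {data / rate:6.1f} s")
```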
 

modiz

Member
Oct 8, 2018
17,808
also, keep in mind that sata 3 at max in a ps4 was ~500MB/s real world, much lower depending on the workload. spinning hard drives probably topped out at 120MB/s but usually sustain around 50.

top tier nvme devices saturate their buses (i believe this is the case at least, or it's pretty close) at ~3500MB/s
Also, because the HDD was swappable, developers needed to design their games around an abysmally low "minimum spec" (see the Spider-Man technical postmortem GDC video), so technically the jump in bandwidth is significantly bigger
 
OP
gofreak

Member
Oct 26, 2017
7,733
So the GPU decompression that hmqqg told us about was bullshit, right?


Hmm, no, not necessarily. This is just a patent. Research done 2-3 years ago. Things can be different and can evolve. What's described in a patent as 'a hardware accelerator' could in the end wind up being a set of accelerators in different places. Or those tasks might even wind up not being hardware accelerated at all.

Don't take this as a definitive specification of what is going on in the PS5. Take it as a look at some of the things Sony has researched, things that might be involved in their customisation of the PS5 solution and how they might be achieving some of what Cerny has been claiming about its read throughput.

So decreased load times and world render times? I guess my biggest question is: will it enable games that are not possible on other platforms, including streaming platforms?

Yes on the first bit.

On the second question, it depends entirely on what's in other hardware. Sony seems like it might go to great lengths to try to maximise whatever they're using. But another system could come along and, though less customised, still match or beat its performance purely through higher raw bandwidths or the like. We'll have to see how Sony's solution ends up comparing in absolute terms.
 

Corralx

Member
Aug 23, 2018
1,176
London, UK
Based on this:

- There would be no need for GPU decompression as the custom SSD has its own hardware accelerator.
- The solution means PS4 games via BC could all benefit from fast loading by default: all PS4 games on PS5 could have super quick loading, like that shown in the Spider-Man demo.

What GPU decompression are you talking about? Assets are usually decompressed on the CPU (if needed), or not decompressed at all if you're using a format the GPU can understand.

Also, PS4 would not benefit from that because you need to use the new File Archive API. The system cannot distinguish between a normal read and a read to immutable data (otherwise there would be no need for an additional API in the first place).
 

stryke

Member
Oct 28, 2017
2,347
Thanks for the insight.

Going to miss being able to swap out internal storage if a lot of this pans out.
 

melodiousmowl

Member
Jan 14, 2018
3,774
CT
Also, because the HDD was swappable, developers needed to design their games around an abysmally low "minimum spec" (see the Spider-Man technical postmortem GDC video), so technically the jump in bandwidth is significantly bigger

Yeah, well, to a point - remember, a LOT of loading was/is CPU bound, so at some point you hit a limit before you can fully utilize the SSD.

I think you can find me on a past thread on here saying that nvme was inevitable for the next round of consoles because of the minimum spec problem.

BUT, here's the rub: if this patent is implemented, there's a very low chance you will be able to put in your own drive. Either there will be hardware on the SSD, or SSDs will have to meet a minimum spec, and that would be a support nightmare (NVMe ranges from 1000-3500 MB/s now, and read and write speeds can vary wildly)

edit: this is making me think the storage might be soldered on, but hopefully there's room to add your own
 
Oct 27, 2017
3,583
It makes sense. SSDs right now are all about trade-offs in terms of capacity, price, max theoretical speed, and real-world speed. For a console with a very specific use-case (that can be enforced through software), Sony can make different trade-offs to a device that would be used in a PC, and they'll no doubt be able to get a bunch more performance out of it.

Also, PS4 would not benefit from that because you need to use the new File Archive API. The system cannot distinguish between a normal read and a read to immutable data (otherwise there would be no need for an additional API in the first place).

I don't see why the cache of the disc contents on the hard drive couldn't use this accelerated method.
 
Nov 30, 2018
2,078
The explanation was succinct and doesn't ramble on, and doesn't contain 10 paragraphs of nonsensical speculation. I can actually understand what is being said here; with Rigby threads you're trying to grasp the mental model of the thread creator and why he thinks the way he does.

Still waiting for that 4K Blu-ray update for PS4
 
OP
gofreak

Member
Oct 26, 2017
7,733
I don't see why the cache of the disc contents on the hard drive couldn't use this accelerated method.

I suppose it depends on the PS4 file IO API. If the runtime environment 'knows' this is a file operation suited to the new API, it could map it to that without the application knowing or caring. I guess.
 

Corralx

Member
Aug 23, 2018
1,176
London, UK
It makes sense. SSDs right now are all about trade-offs in terms of capacity, price, max theoretical speed, and real-world speed. For a console with a very specific use-case (that can be enforced through software), Sony can make different trade-offs to a device that would be used in a PC, and they'll no doubt be able to get a bunch more performance out of it.



I don't see why the cache of the disc contents on the hard drive couldn't use this accelerated method.

That bit would ofc work, just like games getting better loading times by default because of the SSD vs a mechanical drive. I was talking about the more complex system + new file API described in the OP. That requires some support from the developer.
 

Deleted member 17952

User requested account closure
Banned
Oct 27, 2017
1,980
What GPU decompression are you talking about? Assets are usually decompressed on the CPU (if needed), or not decompressed at all if you're using a format the GPU can understand.

Also, PS4 would not benefit from that because you need to use the new File Archive API. The system cannot distinguish between a normal read and a read to immutable data (otherwise there would be no need for an additional API in the first place).
Wouldn't the PS4 emulator automatically detect installation/ROM as immutable data?
 

Thera

Banned
Feb 28, 2019
12,876
France
- The solution means PS4 games via BC could all benefit from fast loading by default: all PS4 games on PS5 could have super quick loading, like that shown in the Spider-Man demo.
It depends on how the BC is done. If it is some sort of virtual machine, it will keep the file system as it is on the current PS4.
So you will have faster loading because it is on a crazy fast SSD, but you will have a bottleneck earlier in the process. It could still be a factor of 3, 4 or 5.
Don't forget that most recent games hide loading times with slow sequences (climbing, for example) or other tricks (opening a heavy door). So it won't be a killer thing for that.

Like Cerny said, the improvement will be in data streaming. Imagine an open world where you control what time period the world is in, and you do that with a simple button press, and 1 second later the world you are in is 50 years later (or earlier).
You can imagine portals à la Dr Strange, where you go from world to world in one second (the time required to make the portal).

It is really a major change for console gaming, something we haven't had in a long time. For the first time, the PC will be the bottleneck for game dev.
 

Fafalada

Member
Oct 27, 2017
3,065
Hmm, no, not necessarily. This is just a patent. Research done 2-3 years ago. Things can be different and can evolve. What's described in a patent as 'a hardware accelerator' could in the end wind up being a set of accelerators in different places. Or those tasks might even wind up not being hardware accelerated at all.
Worth noting that PS4/XB CPUs have built-in LZ-class compression support and many software stacks don't use it at all, and on PS3 we had 'hardware accelerated' LZ via SPEs, and again - only a subset of software actually utilized that.

I'd argue that a 'catch-all' approach to compression isn't actually all that effective if your aim is to minimize data transfers/storage use - more bespoke solutions can still be orders of magnitude better (any form of procedural generation is a class of compression, for instance).
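
A toy illustration of that last point (my own example, nothing to do with any real engine): for procedurally generated content, shipping the recipe (seed plus parameters) beats any generic LZ pass, since the generator itself already ships with the game:

```python
# Generic LZ on generated data vs. just storing the recipe that regenerates it.
import random, struct, zlib

def generate_chunk(seed, n=1_000_000):
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))   # stand-in for generated content

data   = generate_chunk(seed=1234)
lz     = zlib.compress(data, 9)
recipe = struct.pack("<Ii", 1234, 1_000_000)             # seed + element count

print(f"raw data : {len(data):>9,} bytes")
print(f"zlib -9  : {len(lz):>9,} bytes")   # pseudo-random bytes barely shrink
print(f"recipe   : {len(recipe):>9,} bytes")
```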
 
Oct 25, 2017
1,205
So, this means a custom SSD, so you can't just buy a PC SSD and put it in the PS5? I hope it's not too expensive...

I'm thinking it'll be a bit of a hybrid solution in that regard, much like most modern PCs. Long-term, big-ass spinning-disc storage that's user-changeable, and you can "install" your most recently used 2 or 3 games at a time to the super-fast SSD. So the initial install time is much like how games install on a PS4 nowadays, but then once you're in, boom, no loading times ever.
 

bombshell

Banned
Oct 27, 2017
2,927
Denmark
gofreak Thanks for the insight, nothing in that seems too good to be true. Exciting times ahead.

I want to know how the thing is going to work when you play from a regular external HDD.
My bet is that playing games from an external HDD will not be allowed, but it could be allowed as data storage from which you can copy game installs to the internal SSD.
With the way Mark Cerny chose to highlight how the bandwidth bottleneck of HDDs meant a strict limit on the maximum possible web-swinging speed in Spider-Man, allowing play from an HDD on PS5 would not just mean longer load times - it would have a direct impact on the gameplay design of PS5 games. So you can't get the game design evolutions Cerny talked about if you allow playing from a slow HDD.
 

chris 1515

Member
Oct 27, 2017
7,074
Barcelona Spain
also, keep in mind that sata 3 at max in a ps4 was ~500MB/s real world (with an appropriate ssd), much lower depending on the workload. spinning hard drives probably topped out at 120MB/s but usually sustain around 50.

top tier nvme devices saturate their buses (i believe this is the case at least, or it's pretty close) at ~3500MB/s

and this patent addresses decompression, which was a lot of the loading time - waiting for the cpu to crunch all the data off your hdd going into ram

And no use of the GPU for decompression...