That's not true. Titan Volta is 815 mm^2, 2080 Ti is 754 mm^2. The addition of the RT cores didn't make the chips bigger. It's the Tensor Cores. So if the Switch Pro has Tensor Cores it costs nothing to have RT cores as well.
It should be noted that the "40% performance boost" is from rendering at a lower resolution, as DLSS is an upscaling technique. The fact that they call it super-sampling is marketing nonsense, as it's actually doing exactly the opposite of super-sampling (which renders at a higher resolution and downscales).
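To make the distinction concrete, here's some quick back-of-the-envelope pixel math (plain Python; the 1440p internal resolution is just an assumed example, not Nvidia's actual figure):

```python
# Pixel-count arithmetic: true super-sampling vs. DLSS-style upscaling.
# The 1440p internal resolution is an assumption for illustration only.
target = 3840 * 2160                    # 4K output

supersample = (2 * 3840) * (2 * 2160)   # render 2x per axis, then downscale
dlss_internal = 2560 * 1440             # render low, then upscale to 4K

print(supersample / target)    # 4.0   -> 4x the shading work per output pixel
print(dlss_internal / target)  # ~0.44 -> less than half the shading work
```

Shading fewer than half the pixels per output frame is where that kind of headline "boost" comes from.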
That said, it should be quite a good upscaling technique, which could be useful to Nintendo if they want to be able to run third-party games at a lower internal resolution than PS4/XBO. There are also plenty of other potential uses for neural-net accelerators in games, as I've discussed before, but I doubt Nintendo could squeeze it in their silicon budget here.
I've been thinking about that Foxconn leak, and although it definitely wasn't a near-final devkit of any specific piece of hardware (it's almost two years old, so such a device would have released by now), it might make sense as an early catch-all devkit for any future Switch devices.
Pre-launch of Switch, Nintendo's hardware planning would probably have been in flux, with future development heavily dependent on how the market reacted to Switch. They may have planned for new hardware in 2019, but with a variety of different form factors and performance levels considered. If people just weren't interested in the hybrid approach they could release distinct portable and home consoles again (while maintaining the same library), or if the hybrid approach worked well, but the system didn't appeal next to the more powerful PS4/XBO, they could release a more powerful Switch, and so on. The choice of ARM CPU and Nvidia GPU gave them a lot of flexibility in this regard, as they can release anything from a Vita-sized handheld to an XB1X-level home console and maintain binary compatibility with Switch games.
If they wanted to develop games with that future hardware in mind, then a devkit like the one described in the Foxconn leak makes sense. The GP106 could be configured to match pretty much any possible new piece of hardware Nintendo could release. Disable all but two SMs, clock them up past 1GHz, and you've got a decent approximation of a die-shrink of the existing Switch with higher clock speeds. Configure it to use 4 SMs if you want to try out one of the Switch Pro configurations people are speculating on, or push it up to 10 SMs for a more powerful home console config. Ditto with ROPs, memory bandwidth, etc, as these can all be limited in software, to test out different possible configurations.
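As a rough sense of the range such a devkit could cover, here's the simple FP32 throughput math (GFLOPS = SMs × 128 cores × 2 ops per clock × clock; the SM counts and clocks below are my illustrative assumptions, not leaked figures):

```python
# FP32 throughput for hypothetical GP106 devkit configurations.
# SM counts and clocks are illustrative assumptions, not leaked specs.
def gflops(sms, clock_mhz):
    return sms * 128 * 2 * clock_mhz / 1000  # 128 cores/SM, 2 ops/clock (FMA)

print(gflops(2, 768))    # ~393:  current Switch docked, for reference
print(gflops(2, 1150))   # ~589:  die-shrunk Switch at higher clocks
print(gflops(4, 1150))   # ~1178: a speculative "Switch Pro" config
print(gflops(10, 1200))  # ~3072: a much beefier home-console config
```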
That also makes some degree of sense with the timescale. If they want to release a new device in 2019 they'd need to finalize SoC design in 2018, but they could spend 2017 getting feedback from developers (both internal and external) on the hardware they're considering.
Titan Volta also has full FP64 support at half the rate of FP32 (compared to I think 1/16th on 2080Ti), which would consume non-trivial amounts of die space. They're quite different architectures, though, so it's hard to make comparisons.
Knowing absolutely nothing about Tensor cores, are they different enough from CUDA cores to potentially break compatibility? Or is this something that would be an additional core rather than a replacement?
Either way it sounds a bit too bleeding edge to be in a ~$300-400 Nintendo device in 2019.
I like that theory, that makes a lot of sense. The only hiccup really is that it's odd we haven't heard of such a devkit from any insiders if they were making as many as 2000 of them. We've seen photos of devkits from various developers here and there but a year and a half after launch we haven't heard anyone mention a more powerful "all in one" devkit like you're describing. Maybe these have a much harsher NDA attached?
My point here and the point I've made in the past is that in the same space that Xavier has allocated to GPU (90 mm^2 at 12nm) the Switch Pro could fit a Turing GPU with 512 CUDA cores and their corresponding Tensor Cores and RT cores. Of course, they could have more CUDA cores in the same space if they only had traditional rasterization, or they could have the same CUDA cores, but at higher clocks. In this line of thinking, perhaps it'd be better to have 512 CUDA cores in a 7nm node for a small, cheap chip that can clock very high than to have a big 12nm chip for the same price with all the Tensor cores and RT cores.
The Tensor cores take up too much room. I forgot the die size for Titan Volta was so big, but my point still stands that Nintendo would be better served with more CUDA cores than Tensor cores. For instance, the GTX Titan X(p) has 3840 CUDA cores on a 471mm^2 die, while the RTX 2080 Ti has 4352 CUDA cores at 754mm^2. Tensor cores made a huge difference in die space, and they are useless for running games in a traditional sense; they basically act as REALLY good AA. That isn't worth the space over more cores here: you could have a Tegra die at ~140mm^2 with 512 CUDA cores, and adding Tensor cores would increase that to ~175-180mm^2, just to have ~1/10th of the cores available in an RTX 2080 Ti. Would DLSS even work well with so few cores? That is a lot to ask when a chip that size could probably house 1024 CUDA cores. Remember the GTX 1060 is only 200mm^2.
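Running those published die sizes through the same linear logic (the attribution is speculative, since the 2080 Ti delta also includes bigger caches, RT cores, and other Turing changes):

```python
# Naive per-core scaling from Titan X(p) to 2080 Ti (public die sizes).
xp_cores, xp_mm2 = 3840, 471   # Pascal: no Tensor or RT cores
ti_cores, ti_mm2 = 4352, 754   # Turing: Tensor + RT cores

mm2_per_core = xp_mm2 / xp_cores
predicted = xp_mm2 + (ti_cores - xp_cores) * mm2_per_core
print(round(predicted))           # ~534 mm^2 predicted from core count alone
print(ti_mm2 - round(predicted))  # ~220 mm^2 unexplained: the space being
# attributed here to Tensor/RT hardware (plus caches and other Turing changes)
```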
I do think you are probably right, Thraktor, it's what I've speculated all this time. As for Skittzo0413, I believe it's possible that a dev wouldn't know they have a super-powered devkit, and even if they did, it really isn't out of the ordinary for devkits to use a piece of hardware that mimics the performance but is much bigger, like the Wii U devkits using R700 HD 4800 series GPUs. It's also probably worth noting that devkits can't be opened by a developer, so whatever hardware is inside a devkit is only known through whatever specs Nintendo sends them, and Nintendo could very well disable "x" SMs on a GPU and list the devkit with whatever is left available.
So no to Tensor cores being likely, IMO. One last nail in its coffin is power: just having Tensor cores there costs power, and when they are used, draw would be much higher than without them. Since the biggest constraints on the device are heat and power, Tensor cores as a performance solution make very little sense over more CUDA cores.
If DLSS scales well with resolution, then maybe the tensor cores could be worth it, since Ray tracing does scale linearly with resolution. But there may be other factors to this.
A big one, actually, is how mature and widespread Nvidia's RT implementation is going to be. Because so far PlayStation, Xbox, and most PCs won't have it; it's locked to DirectX and to Nvidia's proprietary software, etc.
Just like Nvidia and AMD pioneered different forms of GPU compute and it took quite a while until it was more than a footnote, we might have a similar situation here. Look at PhysX. It lives on through GPU compute and now everyone uses that, but it took an independent and vendor-agnostic solution.
That's what I say. The way I see it, they can have a "wide and slow" chip at 12nm with the Tensor cores and the RT cores and so forth, for ~200mm^2 (which by late 2019 will be cheap as hell), or a "narrow and fast" chip without Tensor cores or RT cores, but the same 512 CUDA cores, on 7nm and running higher clocks.

It's an additional functional unit built into the SMs (streaming multiprocessors) alongside the regular logic. If you didn't explicitly call on the tensor core to do something, you wouldn't know it was there.
It's possible that the vast majority of them were for use inside Nintendo. If they were just looking for feedback on possible hardware configurations then they may only send out a handful of them to key third parties, which means less likelihood of leaks.
I disagree with the bolded, I think there are plenty of unexplored uses for neural-net accelerators in games, particularly in visual processing. An obvious one is the use of style-transfer or similar techniques for non-realistic rendering in games, which is something which I think could be particularly appealing to Nintendo. I'm just not sure it's something Nintendo will be willing to go for just yet given their die size and power constraints.
Keep in mind that the entire Tegra X1 die size is 121mm2. Even if they could cut down the remainder of Xavier's 350mm2 by swapping the Carmel cores out for A53/A73s, dropping the automotive-specific hardware, etc., they'd still likely be on the north side of 200mm2 on a 12nm process, which is a big die for a device like the Switch. At 7nm it becomes more reasonable (TSMC claim 70% die size reduction compared to 16/12nm processes, but I'm always kind of skeptical of those claims), but even a ~120mm2 7nm die wouldn't be cheap in 2019, it would place it alongside the top of the line mobile SoCs from Apple, Qualcomm, etc.
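For what it's worth, the arithmetic behind those two figures (taking the hypothetical ~200mm2 12nm die above as the starting point):

```python
# How a ~200 mm^2 12nm die shrinks at 7nm under different assumptions.
die_12nm = 200  # mm^2, the hypothetical cut-down SoC above

print(die_12nm * (1 - 0.70))  # ~60 mm^2 if TSMC's ~70% claim held perfectly
print(die_12nm * (1 - 0.40))  # ~120 mm^2 with a more skeptical ~40% shrink
# SRAM and analog/IO shrink far less than logic, which is why real designs
# tend to land nearer the skeptical figure.
```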
Nvidia's PC ray-tracing uses the DirectX Ray-Tracing API, as far as I'm aware. In any case, I just don't see RT hardware hitting the performance levels necessary to be useful in a Switch-like device any time soon, as much as I'd love to see consoles with dedicated ray-tracing hardware.
One thing I don't see people talking about is changes to the joycon. Do you think we could see the shoulder buttons changed? Remember that one patent with the scrolling clickable wheels as a shoulder button? Maybe that could be a thing.
If you've seen it in a patent it's pretty unlikely they're going to use it in the future. If they were they wouldn't want to publish it beforehand.
It wouldn't be unprecedented, with Wii MotionPlus and the Circle Pad Pro both coming out in less than 3 years, buuut I can't say I'm expecting anything this time around.
I see. It looked so cool. I would love to see evolutions like that.
Still waiting on that official Joycon-L with a dpad they patented over a year back. :/
They were licensed to Hori.
was it actually licensed?
Hori's design is slightly different. No gyro, wireless or SR/SL. Also not a Nintendo dpad.
I thought yes, because Hori sold the D-pad joycon under the license with Nintendo, right?
I see.
There's a reason you don't see phone and tablet SoCs much bigger than 100mm2 or so, and it's not just the cost. Die size is directly proportional to static leakage, which means increased idle power consumption, and a reduced benefit from lower clock speeds. Even at 12nm, I'm not sure if they'd be able to clock a 200mm2 chip low enough to fit within the ~5W TDP they'd be looking for in portable mode, which would mean a battery life hit compared to the current model.
Docked would be a different matter, and if Nintendo wanted to release a stand-alone home console then it'd give them a nice affordable competitor to PS4 and XBO, but I think a 4x SM configuration is just too big on 12nm to work in a hybrid form-factor.
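To put an illustrative number on the leakage point above (the leakage density below is invented purely for illustration; the proportionality to die area is the actual point):

```python
# Static leakage scales with die area, independent of clock speed.
leak_w_per_mm2 = 0.01  # assumed leakage density, NOT a measured figure

for die_mm2 in (100, 121, 200):
    static_w = die_mm2 * leak_w_per_mm2
    print(die_mm2, "mm^2 ->", static_w, "W leaked before any work is done")
# Against a ~5 W portable TDP, a 200 mm^2 die gives up twice as much of the
# budget to leakage as a 100 mm^2 die, no matter how low you clock it.
```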
not really. licensed in that Nintendo officially allows them to use their stamp of approval. but licensing a patent is different.
They are not. They aren't even allowed to call them Joycon. It is simply called a D-pad controller.
If they aren't suing Hori for infringement then yes they have effectively licensed the patent.
or they aren't using the patented design, as (almost) all third party controllers do.
Couple problems here, IMO. Tegra X1's CUDA cores are in 2 blocks of 128, at only ~6mm^2 apiece; yes, there are other components to the GPU, but Xavier is not a good indicator of such a GPU's size. Volta in the Titan V is 815mm^2 with 5120 CUDA cores, while the GTX 1080's die, with half the CUDA cores (2560), is only 314mm^2. The GTX 1060 is 200mm^2 with 1280 CUDA cores. If we just look linearly at CUDA cores against die size, 1280 CUDA cores plus other components took up an extra 114mm^2, which would reflect less than 24mm^2 for another 256 CUDA cores at 16nm over Tegra X1's 256. It seems at 12nm, even without a size reduction (and TSMC claims there is some), you'd only be at ~144mm^2. As for leakage, 20nm is far worse than the 3D-transistor (FinFET) processes. I don't see how a reasonable estimate for a Maxwell/Pascal GPU SoC, even with the large A57 cores (A73 cores are what, 30% as big?), could ever hit 200mm^2. We should be looking at ~23mm^2 for the extra CUDA cores at most.
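Spelling out that linear estimate, using the post's own figures (deliberately crude, as it treats all non-core area as fixed):

```python
# The post's linear CUDA-core/area estimate, using its own figures.
extra_mm2, extra_cores = 114, 1280  # GP106's extra area/cores over the TX1 GPU

per_256 = extra_mm2 / extra_cores * 256
print(round(per_256))  # ~23 mm^2 for another 256 CUDA cores at 16nm
# Crude on purpose: memory controllers, display logic and other uncore
# don't scale with core count, so this is an upper bound on the core cost.
```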
I'm going into the hospital in a few, my wife is due with our second child, so I probably won't be keeping up with this thread very much for the next week or so.

Dude, you don't need to say "excuse me" on an internet forum, we are just speculating anyway. Congrats on your child.
Thanks. I'm just saying that I might not get back to this thread to help with my estimations in the future, but yeah, it is a bit ridiculous, though I've known some of the people in this thread for 7+ years now.
Congrats on becoming a dad my dude.
So, unless Nintendo goes directly to 7nm, we're fucked in terms of hoping for Xavier-like specs? So it's either a 16nm X2, or 7nm for a chance at 512 CUDA cores?
I think with the Switch+ we'll get Docked Mode resolution/graphics in portable mode (downsampled to the 720p screen) with better performance, and better performance overall while docked, with games using dynamic res like Zelda, DOOM, etc. staying at their highest res.

I play handheld only, so I can't vouch for what I'm about to say, but I've read more than a few times, for various games, that the handheld version runs better than the docked version (so the developers optimized performance for handheld, not for docked).
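The pixel math on the docked-res-in-portable idea is straightforward, since both resolutions are known:

```python
# Docked-resolution rendering shown on the 720p portable screen.
docked = 1920 * 1080
portable = 1280 * 720

print(docked / portable)  # 2.25: effectively 2.25x supersampling on the
# handheld screen, i.e. free anti-aliasing rather than a resolution bump
```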
With current speculation trending towards a DS Lite-esque revision (as per Liam), I'm wondering how much they could even change.

I am not familiar with what Liam said. Was that pure speculation, or speculation based on info/hints he has?
I'm basing this off people transcribing his paid podcast (ironic, no?) in another thread. He seems to characterize it as more like the DS Lite: a form change rather than a spec bump like the DSi or n3DS.
This is where Nintendo shines. Never expect the unexpected.
Well, if it's indeed more of a form change, then regardless of power level, that sounds to me like the 3DS successor I expect to come in time for "Classic" Pokemon and especially Animal Crossing - a lower cost handheld that plays Switch games.
yup. i'll trust insiders here instead.
One thing I noticed is that both Matt and John said that such a version exists, not when it is coming, or even if it is coming. For all we know, Nintendo could have been playing around with a pro version from the outset (including giving it to a select few major developers to play around with) and then decided not to release it commercially, and this would not contradict what these insiders said. They only said "yes, it exists". Or there could be two versions coming, a "lite" in spring 2019 and a "pro" holiday 2019 or some other date, and again this would not contradict what these insiders said.
insiders never said what the revision is, just that there is one
in either case, a Lite and Pro would be based on the same SoC, so one would naturally beget the other.
Ordinarily I'd expect a DS Lite-like revision from Nintendo at around this point, but there's not much Nintendo can do in that department while retaining compatibility with the existing joycons (and I very much doubt they're going to break compatibility with them after just two years). I can see a smaller portable-only device as a definite possibility, but it would be a separate device alongside the Switch, not a revision of the current model.

Aye, this is the only way I can see a Lite model happening. Even with a shrink to 12nm, there's not much that could be changed size-wise unless you use a smaller battery, and the capacity would have to drop a lot for a non-negligible shrink.
Liam Robertson is a terrible "insider". He's been hilariously wrong (that last-minute Star Fox Zero delay, anyone?) over the years.
Want N to take inspiration from this and call it... Micro Switch.

I'll buy two.
More than likely is.
I could see a Lite-style revision maintaining size and shape while improving in other ways, internally and externally, from two things: improving the screen and moving to a smaller process node. With those, a Lite unit could be lighter and brighter, slightly increase screen size/reduce bezels, move to passive cooling, and increase battery life.