
Thraktor

Member
Oct 25, 2017
571
It all comes down to how much power 8 A78 cores would draw at different frequencies, I think. You mention that 12 A78 cores could match Zen 2 at 3 GHz - an unrealistic setup, of course. Well, how about 8 at 1.8 GHz? That would give you core count parity (which I could see making porting code easier than having fewer cores) and about 40% of the CPU performance of the 12-core 3 GHz A78 rival to the 8-core Zen 2 setup, which would bring us much closer to next-gen performance than the Switch was (which was what, 23% of XB1?). Some current-gen ports are possible even with the wide disparity in CPU power between Switch and XB1/PS4 (it was pretty interesting to learn how Saber Interactive modified Witcher 3 to run on the Switch), so closing that gap should progressively make more and more ports possible, even if some of the most demanding games may still turn out to be unfeasible. You've mentioned it's difficult to find power draw figures for a specific core (because cores are usually combined into an actual SoC, which does more than just run the CPU), so it's difficult to say what power draw we'd be looking at. But you're probably right that the 1.8W budget for the CPU needs to be expanded a bit in order to be a little more competitive. I believe z0m3le did some informed guesstimating on a possible power profile for such an A78 setup a while ago; perhaps he can reiterate that once more.

Quite a while ago I did some calculations across a range of scenarios based on Geekbench 5 FP scores (far from the best comparison, but the only one available for all the CPUs in question), and came up with a best case scenario of a Switch 2 (on 5nm) hitting about 30%-40% of the CPU performance of the PS5/XBSX. (Can't quite find the post right now, this is just off the top of my head) It's impressive given the huge difference in power consumption, and close enough for some ports, but far enough that any large scale systems-based game, say most of Ubisoft's open world games, would have significant difficulty being ported.

Besides, my point wasn't that Nintendo can't build a device which could get some PS5/XBSX ports, or even that they won't. My point was that Nintendo won't design a new Switch with the specific intention to get PS5/XBSX ports. If we're trying to speculate about what hardware Nintendo might release as a Switch "pro" or Switch 2, or whatever, then we should ignore PS5/XBSX, think about what would work best for Nintendo and their partners, and if that ends up being something which can get some next-gen ports then that's a nice bonus.

While I'm not really an experienced practitioner with neural nets, I can provide you with a theoretical answer.

The comparison between image classification and image upscaling is not really apt, since the first is a classification problem while the second is regression, leading to a different (loss/cost) function to optimize. In addition, DLSS does not really add data to the input, since it is, at its core, an autoencoder, which by design strives to approximate the input as closely as possible under constrained conditions (lossy media encoders are the closest analogue). The reason the output image has a higher resolution than the input is that one output image is created from multiple consecutive input images (plus motion vectors, which is why DLSS works so closely with TAA in the rendering pipeline), as shown below:
[Image: Nvidia DLSS 2.0 architecture diagram]

Besides, my guess for why DLSS can "add" more detail at far distances compared to rendering at native res without DLSS is simply the graphics settings used to generate the ground-truth images, e.g. turning LOD and/or draw distance up to maximum. As a result, end users will always see a difference, since they can never achieve such settings on their machines: images on their computers must be rendered in real time, while the ground-truth images used to train DLSS need not be.
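
To make the shape of that pipeline concrete, here is a toy sketch of a temporally-fed upscaler: a small convolutional autoencoder that regresses a higher-resolution frame from the current low-res frame, the previous (motion-warped) output, and the motion vectors. Everything here (layer sizes, channel counts, the 2x factor) is made up for illustration; it is not Nvidia's actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTemporalUpscaler(nn.Module):
    """Illustrative only: regresses a 2x-resolution frame from temporal inputs."""
    def __init__(self, scale=2):
        super().__init__()
        in_ch = 3 + 3 + 2  # current frame + warped previous output + motion vectors
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))  # rearranges channels into a scale-x larger image

    def forward(self, lowres, warped_prev, motion):
        x = torch.cat([lowres, warped_prev, motion], dim=1)
        return self.decoder(self.encoder(x))

# Regression against a high-quality "ground truth" render (MSE here),
# rather than the cross-entropy loss a classifier would use.
net = ToyTemporalUpscaler()
lowres = torch.rand(1, 3, 360, 640)   # 640x360 current frame
warped = torch.rand(1, 3, 360, 640)   # previous output, warped by motion vectors
motion = torch.rand(1, 2, 360, 640)   # per-pixel motion vectors
target = torch.rand(1, 3, 720, 1280)  # 1280x720 "ground truth"
loss = F.mse_loss(net(lowres, warped, motion), target)
```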

About pruning, I concur with your assertion that it will reduce the accuracy of the algorithm. I think it is a cost NVIDIA chose to take in order to get 9x resolution scaling in their next version of DLSS (vs. up to 4x scaling for DLSS 2.0 at the moment), per their Reddit Q&A session. What caught me off guard was, as I stated in my post, the method they used to achieve sparsity in their neural nets: my initial guess was something akin to L1 regularization during the training phase, then clamping a certain number of the resulting weights to 0. But it turned out to be a little different: the first training phase happens normally, then NVIDIA just straight up cuts weights out of the result, in a nearly checkerboard pattern, and compensates for the loss by putting the remaining weights through a second training phase.
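
For anyone who wants to see what that prune-then-retrain recipe looks like in practice, here is a toy version (my own sketch, not NVIDIA's actual tooling): within every group of 4 consecutive weights, keep the 2 largest magnitudes and zero the rest, which produces the 2:4 pattern discussed further down.

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero the two smallest-magnitude weights in every group of 4 (illustrative)."""
    w = weights.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(w), axis=1)[:, :2]   # indices of the 2 smallest per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

rng = np.random.default_rng(0)
dense = rng.standard_normal((8, 16))              # a stand-in weight matrix
sparse = prune_2_of_4(dense)
assert (sparse.reshape(-1, 4) != 0).sum(axis=1).max() <= 2
# In the real flow, the surviving weights would now go through a second
# training phase (with the zeroed positions held at zero) to recover accuracy.
```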

Edit: after looking at all the available info about structural sparsity again, I think my original claim that DLSS 2.0 (and later versions) will see increased performance on Ampere may have been a little premature, since the supposed gain in computation throughput from Ampere's tensor cores clouded my judgement about the performance/quality trade-off for DLSS. The main problem with the claim above is that the current neural net in DLSS 2.0 may well be dense. So if NVIDIA wants to take advantage of Ampere's sparsity for DLSS 2.0, they will have to run a second training phase for all of their neural nets (one for each mode: quality/balanced/perf) as described in the paragraph above, and this cannot be verified without benchmarks on current DLSS 2.0-supported games. Unfortunately I have not yet found any RTX 3080 reviews with a deep dive into DLSS performance gains relative to Turing cards. So in summary:
- It is unknown whether NVIDIA has reworked their DLSS 2.0 algorithm to utilize Ampere's structural sparsity.
- If NVIDIA wants to use the Ampere architecture in the next Switch and at the same time utilize structural sparsity for DLSS, they will have to refine their current (2.0-era) neural nets.
- Whether such refinements to the existing neural nets (for the 2x, 3x, and 4x resolution scaling modes) are silently introduced into DLSS 2.0, or will arrive in later DLSS versions, is an open question. However, it is likely that the neural nets for the upcoming 9x scaling mode, expected with later DLSS versions, will have this change implemented.

Thanks, that's quite illuminating. It had slipped my mind that multiple frames (including motion vectors) are passed in, hence a lot more input data than I was thinking of.

Regarding the use of L1 regularisation vs a checkerboard-like pattern for the sparsity, my initial thought was that this was down to how the sparsity support is implemented in hardware. Being able to achieve a high speed-up with arbitrary sparsity patterns would likely require very complex logic to be implemented in silicon, which would increase size and power draw, and largely defeat the purpose of the tensor cores (which achieve high performance by implementing relatively simple logic on a large scale).

I've done a bit of reading on it, and this diagram is quite instructive (from this post on Nvidia's developer site):

[Image: Nvidia 2:4 structured sparsity diagram (white entries are zero)]


Basically, Ampere's tensor cores operate on 4-element vectors of the weight matrix at once. The sparsity acceleration is achieved by having zero entries for 2 out of the 4 weights in each vector, which allows them to combine two vectors into a single vector for calculation, doubling performance.

So they basically just zero-out the two smallest weights from each 4-element vector and then re-train. Using L1 regularisation wouldn't result in a weight matrix which fits the 2:4 sparsity requirements for Ampere's sparsity acceleration, so simply cutting weights seems like the only way to go about it. I would assume Nvidia have done enough testing to be confident that this can still achieve accurate results. It certainly seems to be a very silicon-efficient way of doing it, in any case.
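
Here's a small sketch of why that 2:4 constraint halves the work - this is my reading of the diagram, not a description of the actual silicon: each 4-wide group stores only its 2 non-zero weights plus 2-bit column indices, and the dot product only touches those kept values, with the indices used to gather the matching activations.

```python
import numpy as np

def compress_2_of_4(sparse_row):
    """Store only the 2 non-zero weights per group of 4, plus their positions."""
    groups = sparse_row.reshape(-1, 4)
    idx = np.argsort(groups == 0, axis=1, kind='stable')[:, :2]  # non-zero positions
    vals = np.take_along_axis(groups, idx, axis=1)               # half the values
    return vals, idx                                             # each index fits in 2 bits

def sparse_dot(vals, idx, activations):
    """Multiply only the kept weights against the activations they point at."""
    acts = np.take_along_axis(activations.reshape(-1, 4), idx, axis=1)
    return (vals * acts).sum()

rng = np.random.default_rng(1)
row = rng.standard_normal(16)
row.reshape(-1, 4)[:, :2] = 0.0          # force a 2:4 pattern for the demo
acts = rng.standard_normal(16)
vals, idx = compress_2_of_4(row)
assert np.isclose(sparse_dot(vals, idx, acts), row @ acts)  # same result, half the MACs
```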

If Nvidia can apply this sparsity technique to DLSS and achieve good accuracy (theoretically there must be some loss in accuracy, but it may be small enough that a player would not notice), then I would certainly imagine they would re-train their DLSS neural nets to leverage Ampere's sparsity speedup, if they haven't done so already. From the point of view of a hypothetical new Nintendo device, then I'm sure it would be worth re-training, as the cost would certainly be lower than the cost of adding double the tensor cores to the hardware, not to mention increased bandwidth requirements, reduced battery life, etc.

One other interesting point is that it appears the consumer Ampere cards do actually have full-size tensor cores (ie 2x Turing, like A100) on the silicon, but are limited to half that in the GeForce line (so presumably Quadro cards will come out with full-rate support). If this is true, and if DLSS can leverage sparsity acceleration with minimal impact on visual quality, then it changes my expectations on what kind of hardware Nintendo would need to release a DLSS-enabled device in 2021.

I posted a short while ago about what Nintendo might need if they wanted to design a device around using DLSS to achieve 4K output resolution. Based on no sparsity acceleration and the idea that consumer Ampere tensor cores offered the same performance as Turing (without sparsity acceleration), it seemed very unlikely that an affordable portable device could achieve DLSS at 60fps and 4K output resolutions while having to render and perform DLSS sequentially (ie render the frame for approx 75% of the frame time, and spend approx 25% running DLSS). My suggestion was that the shader logic and tensor cores could be decoupled into separate SMs, allowing them to run concurrently (ie run DLSS on one frame while the next is rendering), which would allow 100% utilisation of the tensor cores at the cost of 1 extra frame of latency.

However, assuming sparsity acceleration can be applied to DLSS, and that such a device would get full-rate use of those larger Ampere tensor cores, it actually becomes feasible to achieve DLSS on a standard Ampere tensor core setup within ~4.2ms (1/4 of a frame at 60fps), without the need for customisations to the architecture or additional latency. By my estimates, a 6 SM Ampere GPU at 1.3GHz, or an 8 SM GPU at 1GHz would be able to manage it, which isn't out of the realms of possibility for a Tegra chip designed for a 2021 Switch (for comparison, Xavier uses an 8 SM GPU).

One limitation which might still be there (actually increased, due to the shorter time DLSS has to operate over) would be memory bandwidth. Assuming RTX 2060 is fully bandwidth-limited while running DLSS at 4K (it's probably not, but worst-case scenario), even with a 2x reduction in bandwidth from sparsity, an Ampere GPU would need just over 100GB/s to run it within ~4.2ms. RTX 2060 probably isn't bandwidth-limited, but you're still probably looking at a 128-bit LPDDR4X RAM setup (about 68GB/s) as a realistic requirement.
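
For anyone who wants to sanity-check the figures in the last couple of paragraphs, here's the back-of-the-envelope version; the numbers are only as good as the assumptions above (25% of the frame for DLSS, "just over 100GB/s" needed, full-rate sparsity).

```python
# Rough arithmetic behind the frame-time and bandwidth figures above.
frame_time_ms = 1000 / 60                 # 16.7 ms per frame at 60 fps
dlss_budget_ms = frame_time_ms * 0.25     # ~4.2 ms if DLSS gets ~25% of the frame
print(round(dlss_budget_ms, 1))

# Tensor throughput scales roughly with SM count x clock, so the two
# suggested configurations land in the same ballpark:
print(6 * 1.3, 8 * 1.0)                   # 7.8 vs 8.0 "SM-GHz"

# Needing "just over 100 GB/s" for ~4.2 ms implies roughly this much
# memory traffic per upscaled frame (worst case):
print(round(100e9 * dlss_budget_ms / 1000 / 1e9, 2), "GB per frame")

# For reference, a 128-bit LPDDR4X bus (4266 MT/s) vs two x64 LPDDR5-6400 chips:
print(128 * 4266 / 8 / 1000, "GB/s")      # ~68 GB/s
print(2 * 64 * 6400 / 8 / 1000, "GB/s")   # 102.4 GB/s
```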

In fact, I would have ruled out LPDDR5 not that long ago, both on cost and because it was only available in large capacities, meaning a 2x8GB setup would be the only option if they wanted a 128-bit bus, which is well past what I'd expect in a 2021 device. I just had a quick look, though, and it seems Micron are currently sampling a variety of LPDDR5 parts in smaller capacities. What this tells me is (a) that it's actually possible for Nintendo to go as low as 8GB total memory on a 128-bit LPDDR5 bus, but also (b) that there's likely to be a migration of the mid and low end of the smartphone market to LPDDR5 over the next year or so. In particular, they've got a 2GB part with a 32-bit interface, which only really makes sense for the lower end of the market. Nintendo haven't really hesitated to use expensive RAM in the past (they used LPDDR4X on the 2019 Switch and Switch Lite purely for the improved power efficiency), so I wouldn't rule out LPDDR5 either. Two of the 4GB x64 LPDDR5 chips Micron are sampling would give 8GB of RAM and 102.4GB/s of bandwidth, which would be a huge boost over the original Switch. Of course they could go with a single 6GB/8GB module for 51.2GB/s, but I'd be somewhat concerned about DLSS becoming bandwidth-limited at that point.

A hypothetical DLSS-enhanced Switch model in 2021 could actually look pretty nice:

7nm manufacturing process (much like TX1 vs desktop Maxwell, I think there's a benefit to going with a smaller node for power efficiency)
4x A78 @1.6GHz (let's say) for games, with some A55 cores for OS/sleep mode
6x Ampere SMs @1.3GHz docked (maybe ~500MHz portable?)
8GB LPDDR5 on a 128-bit bus for 102.4GB/s

Aside from obviously the DLSS, it's quite a good upgrade from the original Switch, with a substantial CPU boost, probably 2-3x the GPU performance pre-DLSS, and obviously the RAM capacity and bandwidth upgrade.

On top of that, it actually makes sense as a SoC Nvidia might want to position for edge AI applications. Assuming sparsity support and ~1.5GHz peak GPU clock, it would hit 74 TOPS INT8, which is about 2.5x Xavier (without having to mess with it being split between tensor cores and a DLA). It should also be a lot smaller and more power-efficient than Xavier, and potentially cheaper than Xavier given the smaller CPU/GPU configurations. It would also support up to 32GB of RAM, or more as larger LPDDR5 modules come along.
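
The 74 TOPS figure is easy to reproduce if you assume full-rate (A100-style) tensor cores plus the sparsity doubling - the per-SM rate below is my assumption based on that, so treat it as illustrative rather than confirmed.

```python
# ~74 TOPS INT8: 6 SMs, ~1.5 GHz, assuming full-rate tensor cores
# (4096 dense INT8 ops per SM per clock) doubled again by 2:4 sparsity.
sms = 6
clock_ghz = 1.5
int8_ops_per_sm_per_clk = 4096 * 2     # assumed full rate, x2 for sparsity
tops = sms * clock_ghz * int8_ops_per_sm_per_clk / 1000
print(round(tops, 1))                  # ~73.7 TOPS INT8
```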

That's not to say I'd expect such a chip, but it's not entirely outside the realms of possibility for a reasonably-priced Nintendo device in 2021, and it's interesting to consider.
 

UltraMagnus

Banned
Oct 27, 2017
15,670
I think Nintendo would like to have major Japanese 3rd party games like FFXVI, Resident Evil Village, MH World 2 for Switch 2 next time out. Western stuff is kinda like a "well whatever you want to put there is fine".
 

ShadowFox08

Banned
Nov 25, 2017
3,524
Ah. You see my major premise (although I've been happy to argue other scenarios) is that Nintendo could launch an A78/Ampere-based console in 2021, and get that big boost sooner. Then they could launch an ARM2022/Hopper-based console in 2023 that was 75% faster. Then in 2025 launch an ARM2024/NV2024-based console that's 75% faster than the 2023 version. Repeat ad infinitum. If we start with 2021 as a baseline, the progression looks like this:


[Image: chart of the projected performance progression across biennial releases]



The progression really adds up after a bit, and at the same time it never becomes the end of a generation. It's just a progression. It would never be a really bad time to buy a Switch. Very few games should require the latest and greatest. You could easily pick up new units every other generation. Nintendo could theoretically launch Odyssey 2 in 2021 and then Odyssey 3 in 2027, and have Odyssey 3 target generation 3 and later. Any big game worth maintaining could continue to be maintained. Every generation BotW could become a bit prettier and have a slightly more stable frame rate.

All it would really take is for Nintendo to start treating hardware like Samsung treats their flagship phones.

I really also need to find a different word than "generation" to describe these biennial releases.

75% may or may not be the right number and may differ from biennial release to biennial release. I would compare it to a slower version of Moore's law, but that's about transistor density.
If we get a performance revision every 2 years, it's going to be a nightmare. Devs and consumers alike are not going to like this. Devs already have to deal with multiple closed performance profiles as it is. And I don't think consumers want a potential replacement/upgrade every 2 years like a phone - even then, most consumers don't change phones more often than every 3-4 years anyway. Nintendo should not emulate phones. It's going to be suicide and will be hated.
 

bmfrosty

Member
Oct 27, 2017
1,896
SF Bay Area
If we get a performance revision every 2 years, it's going to be a nightmare. Devs and consumers alike are not going to like this. Devs already have to deal with multiple closed performance profiles as it is. And I don't think consumers want to have a potential replacement/upgrade like a phone every 2 years.
That's the beauty: you wouldn't have to every 2 years. Every 6 should be fine, unless you're a person who wants to play GTA VI on the go or something and it requires the latest version.
 

z0m3le

Member
Oct 25, 2017
5,418
Quite a while ago I did some calculations across a range of scenarios based on Geekbench 5 FP scores (far from the best comparison, but the only one available for all the CPUs in question), and came up with a best case scenario of a Switch 2 (on 5nm) hitting about 30%-40% of the CPU performance of the PS5/XBSX. (Can't quite find the post right now, this is just off the top of my head) It's impressive given the huge difference in power consumption, and close enough for some ports, but far enough that any large scale systems-based game, say most of Ubisoft's open world games, would have significant difficulty being ported.

Besides, my point wasn't that Nintendo can't build a device which could get some PS5/XBSX ports, or even that they won't. My point was that Nintendo won't design a new Switch with the specific intention to get PS5/XBSX ports. If we're trying to speculate about what hardware Nintendo might release as a Switch "pro" or Switch 2, or whatever, then we should ignore PS5/XBSX, think about what would work best for Nintendo and their partners, and if that ends up being something which can get some next-gen ports then that's a nice bonus.



Thanks, that's quite illuminating. It had slipped my mind that multiple frames (including motion vectors) are passed in, hence a lot more input data than I was thinking of.

Regarding the use of L1 regularisation vs a checkerboard-like pattern for the sparsity, my initial thought was that this was down to how the sparsity support is implemented in hardware. Being able to achieve a high speed-up with arbitrary sparsity patterns would likely require very complex logic to be implemented in silicon, which would increase size and power draw, and largely defeat the purpose of the tensor cores (which achieve high performance by implementing relatively simple logic on a large scale).

I've done a bit of reading on it, and this diagram is quite instructive (from this post on Nvidia's developer site):

[Image: Nvidia 2:4 structured sparsity diagram (white entries are zero)]


Basically, Ampere's tensor cores operate on 4-element vectors of the weight matrix at once. The sparsity acceleration is achieved by having zero entries for 2 out of the 4 weights in each vector, which allows them to combine two vectors into a single vector for calculation, doubling performance.

So they basically just zero-out the two smallest weights from each 4-element vector and then re-train. Using L1 regularisation wouldn't result in a weight matrix which fits the 2:4 sparsity requirements for Ampere's sparsity acceleration, so simply cutting weights seems like the only way to go about it. I would assume Nvidia have done enough testing to be confident that this can still achieve accurate results. It certainly seems to be a very silicon-efficient way of doing it, in any case.

If Nvidia can apply this sparsity technique to DLSS and achieve good accuracy (theoretically there must be some loss in accuracy, but it may be small enough that a player would not notice), then I would certainly imagine they would re-train their DLSS neural nets to leverage Ampere's sparsity speedup, if they haven't done so already. From the point of view of a hypothetical new Nintendo device, then I'm sure it would be worth re-training, as the cost would certainly be lower than the cost of adding double the tensor cores to the hardware, not to mention increased bandwidth requirements, reduced battery life, etc.

One other interesting point is that it appears the consumer Ampere cards do actually have full-size tensor cores (ie 2x Turing, like A100) on the silicon, but are limited to half that in the GeForce line (so presumably Quadro cards will come out with full-rate support). If this is true, and if DLSS can leverage sparsity acceleration with minimal impact on visual quality, then it changes my expectations on what kind of hardware Nintendo would need to release a DLSS-enabled device in 2021.

I posted a short while ago about what Nintendo might need if they wanted to design a device around using DLSS to achieve 4K output resolution. Based on no sparsity acceleration and the idea that consumer Ampere tensor cores offered the same performance as Turing (without sparsity acceleration), it seemed very unlikely that an affordable portable device could achieve DLSS at 60fps and 4K output resolutions while having to render and perform DLSS sequentially (ie render the frame for approx 75% of the frame time, and spend approx 25% running DLSS). My suggestion was that the shader logic and tensor cores could be decoupled into separate SMs, allowing them to run concurrently (ie run DLSS on one frame while the next is rendering), which would allow 100% utilisation of the tensor cores at the cost of 1 extra frame of latency.

However, assuming sparsity acceleration can be applied to DLSS, and that such a device would get full-rate use of those larger Ampere tensor cores, it actually becomes feasible to achieve DLSS on a standard Ampere tensor core setup within ~4.2ms (1/4 of a frame at 60fps), without the need for customisations to the architecture or additional latency. By my estimates, a 6 SM Ampere GPU at 1.3GHz, or an 8 SM GPU at 1GHz would be able to manage it, which isn't out of the realms of possibility for a Tegra chip designed for a 2021 Switch (for comparison, Xavier uses an 8 SM GPU).

One limitation which might still be there (actually increased, due to the shorter time DLSS has to operate over) would be memory bandwidth. Assuming RTX 2060 is fully bandwidth-limited while running DLSS at 4K (it's probably not, but worst-case scenario), even with a 2x reduction in bandwidth from sparsity, an Ampere GPU would need just over 100GB/s to run it within ~4.2ms. RTX 2060 probably isn't bandwidth-limited, but you're still probably looking at a 128-bit LPDDR4X RAM setup (about 68GB/s) as a realistic requirement.

In fact, I would have ruled out LPDDR5 not that long ago, both on cost and because it was only available in large capacities, meaning a 2x8GB setup would be the only option if they wanted a 128-bit bus, which is well past what I'd expect in a 2021 device. I just had a quick look, though, and it seems Micron are currently sampling a variety of LPDDR5 parts in smaller capacities. What this tells me is (a) that it's actually possible for Nintendo to go as low as 8GB total memory on a 128-bit LPDDR5 bus, but also (b) that there's likely to be a migration of the mid and low end of the smartphone market to LPDDR5 over the next year or so. In particular, they've got a 2GB part with a 32-bit interface, which only really makes sense for the lower end of the market. Nintendo haven't really hesitated to use expensive RAM in the past (they used LPDDR4X on the 2019 Switch and Switch Lite purely for the improved power efficiency), so I wouldn't rule out LPDDR5 either. Two of the 4GB x64 LPDDR5 chips Micron are sampling would give 8GB of RAM and 102.4GB/s of bandwidth, which would be a huge boost over the original Switch. Of course they could go with a single 6GB/8GB module for 51.2GB/s, but I'd be somewhat concerned about DLSS becoming bandwidth-limited at that point.

A hypothetical DLSS-enhanced Switch model in 2021 could actually look pretty nice:

7nm manufacturing process (much like TX1 vs desktop Maxwell, I think there's a benefit to going with a smaller node for power efficiency)
4x A78 @1.6GHz (let's say) for games, with some A55 cores for OS/sleep mode
6x Ampere SMs @1.3GHz docked (maybe ~500MHz portable?)
8GB LPDDR5 on a 128-bit bus for 102.4GB/s

Aside from obviously the DLSS, it's quite a good upgrade from the original Switch, with a substantial CPU boost, probably 2-3x the GPU performance pre-DLSS, and obviously the RAM capacity and bandwidth upgrade.

On top of that, it actually makes sense as a SoC Nvidia might want to position for edge AI applications. Assuming sparsity support and ~1.5GHz peak GPU clock, it would hit 74 TOPS INT8, which is about 2.5x Xavier (without having to mess with it being split between tensor cores and a DLA). It should also be a lot smaller and more power-efficient than Xavier, and potentially cheaper than Xavier given the smaller CPU/GPU configurations. It would also support up to 32GB of RAM, or more as larger LPDDR5 modules come along.

That's not to say I'd expect such a chip, but it's not entirely outside the realms of possibility for a reasonably-priced Nintendo device in 2021, and it's interesting to consider.
So Ampere is 128 CUDA cores per SM; 6 SMs seems like a lot, as that is 2 TFLOPS at 1.3GHz on the GPU. I've been suggesting 512 CUDA cores, giving a 1.3 TFLOPS GPU at 1.3GHz.

And while it is true that Ampere TFLOPS are less potent than Turing TFLOPS, you are still adding VRS and other GPU features that would speed up GPU processing vs Maxwell, meaning that is more like 5x the docked performance of the Switch before DLSS is taken into account.

I think they will go with 64 CUDA cores per SM and have tensor cores in a Turing configuration; an Ampere configuration would probably just result in clocks lower than 1.3GHz. I could see 1.1GHz in such a scenario, giving 1.7 TFLOPS. We probably won't hear too much until 2021; GTC Online could give us a hint.

Nvidia could still use 8 A78 cores with conservative clocks thanks to DynamIQ: they can simply make sure the CPU is locked to a wattage instead of worrying about clocks so much. Giving the CPU 3 to 3.5 watts and offsetting the loss in GPU wattage with DLSS is a smart move. PS4 will continue to get games for the next 3 years imo, the economy is going to limit next-gen growth compared to what they were expecting, and what we've been talking about blows away the PS4 in CPU. Matching 8 cores is a no-brainer, as the entire industry has been on 8 cores for nearly the last 10 years. Nintendo themselves wouldn't use more than 1 core for the OS, and an A55 would need to be clocked somewhat high to match the 1GHz A57 core used in the current Switch - not a problem for the processor, but they could probably get away with a 400MHz A78, which would add very little power consumption to the overall budget.

Overall I like your system speculation. That configuration would be a lot more powerful than what I've been talking about, though, and I think even though Nintendo has mentioned cutting-edge tech, they will still want decent battery life, and ~800 GFLOPS portable + DLSS is probably a bit too high performance. I've been suggesting 600 GFLOPS + DLSS, which would result in something on par with the base current-gen consoles. Then again, if Joy-Con batteries could be used, the battery life is probably there for the more powerful console; I just think they will push that extra power consumption to the CPU, as they will be happy with PS4-like GPU performance in a portable.
 
Apr 11, 2020
1,235
So Ampere is 128 CUDA cores per SM; 6 SMs seems like a lot, as that is 2 TFLOPS at 1.3GHz on the GPU. I've been suggesting 512 CUDA cores, giving a 1.3 TFLOPS GPU at 1.3GHz.

Nvidia could still use 8 A78 cores with conservative clocks thanks to DynamIQ: they can simply make sure the CPU is locked to a wattage instead of worrying about clocks so much. Giving the CPU 3 to 3.5 watts and offsetting the loss in GPU wattage with DLSS is a smart move.
It all comes down to the node used for this new chip. 8x A78 and 8 Ampere SMs would be about the size of the original TX1 on 20 nm - maybe a little bigger, but still in line with the size of the E9820 made on the same node. It would only be limited by power consumption. Both die size and power consumption could be addressed by the use of a more expensive 7 nm chip, especially with higher-density libraries and lower clocks.

I'm wondering if Nvidia are making smaller chips for a new Switch alongside GA100 in order to get the most out of their wafer costs.
 

Dakhil

Member
Mar 26, 2019
4,459
Orange County, CA
I'm wondering if Nvidia are making smaller chips for a new Switch alongside GA100 in order to get the most out of their wafer costs.
kopite7kimi mentioned that Nvidia might have planned on designing a new chip that's fabricated on a newer fabrication node and is in a smaller package (GA103?) that can compete with Big Navi without being too expensive. But I don't know if that really means anything, to be honest.


 
Apr 11, 2020
1,235
kopite7kimi mentioned that Nvidia might have planned on designing a new chip that's fabricated on a newer fabrication node and is in a smaller package (GA103?) that can compete with Big Navi without being too expensive. But I don't know if that really means anything, to be honest.



Yes, but he also mentioned the new Tegra chip on 8 nm (he actually called it exactly that, rather than 'Drive' or 'AGX').
 

Dakhil

Member
Mar 26, 2019
4,459
Orange County, CA
Yes, but he also mentioned the new Tegra chip on 8 nm (he actually called it exactly that, rather than 'Drive' or 'AGX').
That's why I said "But I don't know if that really means anything, to be honest."

But considering there's a rumour that the 13k of TSMC's 7 nm capacity originally prebooked by Huawei is now being taken up by AMD for the fabrication of the APUs of the PlayStation 5, Xbox Series X, and Xbox Series S, I don't know if Nvidia is able to reserve enough of TSMC's 7 nm EUV capacity for the fabrication of the Tegra SoC on the next Nintendo Switch model, or if Samsung manages to improve 7 nm EUV yields to the point where Nvidia wants to prebook capacity on Samsung's 7 nm EUV node for that SoC.


And I don't know if Nvidia categorises the Tegra SoC on the next Nintendo Switch model as the entry model or the high end model.
 

dgrdsv

Member
Oct 25, 2017
12,123
kopite7kimi mentioned that Nvidia might have planned on designing a new chip that's fabricated on a newer fabrication node and is in a smaller package (GA103?) that can compete with Big Navi without being too expensive. But I don't know if that really means anything, to be honest.
I doubt that it will be anything else besides the same 8N. There aren't a lot of choices other than 8N and N7, and the latter is certainly not cheaper than 8N, so making a GA103 on it because it's "very expensive" to compete on 8N doesn't make any sense.
 

Onix555

Member
Apr 23, 2019
3,381
UK
It got stealth-revealed in a completely separate press release by some Chinese car company that Orin is 7nm. Just in case anyone was wondering.

wccftech.com

Li Auto (NASDAQ: LI) To Equip Its Vehicles With NVIDIA’s Orin SoC Chipset To Facilitate Autonomous Driving Under a New Partnership Agreement

Li Auto is enhancing the autonomous driving capabilities of its upcoming vehicles by utilizing NVIDIA's Orin SoC chipset under a new deal

"Orin uses a 7-nanometer production process to achieve a computing power of 200 TOPS, 7 times that of the previous generation, the Xavier SoC. Even with the significant improvement in computing performance, Orin's baseline power consumption is just 45 watts, relatively equivalent to the low power consumption of the previous generation SoC."
 

Thraktor

Member
Oct 25, 2017
571
So Ampere is 128 CUDA cores per SM; 6 SMs seems like a lot, as that is 2 TFLOPS at 1.3GHz on the GPU. I've been suggesting 512 CUDA cores, giving a 1.3 TFLOPS GPU at 1.3GHz.

And while it is true that Ampere TFLOPS are less potent than Turing TFLOPS, you are still adding VRS and other GPU features that would speed up GPU processing vs Maxwell, meaning that is more like 5x the docked performance of the Switch before DLSS is taken into account.

I think they will go with 64 CUDA cores per SM and have tensor cores in a Turing configuration; an Ampere configuration would probably just result in clocks lower than 1.3GHz. I could see 1.1GHz in such a scenario, giving 1.7 TFLOPS. We probably won't hear too much until 2021; GTC Online could give us a hint.

Nvidia could still use 8 A78 cores with conservative clocks thanks to DynamIQ: they can simply make sure the CPU is locked to a wattage instead of worrying about clocks so much. Giving the CPU 3 to 3.5 watts and offsetting the loss in GPU wattage with DLSS is a smart move. PS4 will continue to get games for the next 3 years imo, the economy is going to limit next-gen growth compared to what they were expecting, and what we've been talking about blows away the PS4 in CPU. Matching 8 cores is a no-brainer, as the entire industry has been on 8 cores for nearly the last 10 years. Nintendo themselves wouldn't use more than 1 core for the OS, and an A55 would need to be clocked somewhat high to match the 1GHz A57 core used in the current Switch - not a problem for the processor, but they could probably get away with a 400MHz A78, which would add very little power consumption to the overall budget.

Overall I like your system speculation. That configuration would be a lot more powerful than what I've been talking about, though, and I think even though Nintendo has mentioned cutting-edge tech, they will still want decent battery life, and ~800 GFLOPS portable + DLSS is probably a bit too high performance. I've been suggesting 600 GFLOPS + DLSS, which would result in something on par with the base current-gen consoles. Then again, if Joy-Con batteries could be used, the battery life is probably there for the more powerful console; I just think they will push that extra power consumption to the CPU, as they will be happy with PS4-like GPU performance in a portable.

This is why I just referred to SMs rather than Gflops/Tflops, as with Ampere that gives people an unrealistic view of the performance if they're not familiar with the architecture. In general, focusing on flops as a relative measure of performance between two different architectures isn't great, but in Ampere's case particularly it's just not even worth comparing, given the FP setup in Ampere SMs. As a quick example, the RTX 3080 has 2.2 times the theoretical floating point performance of the RTX 2080 Ti, but the actual performance increase is around 15-30% depending on resolution. If you compare performance per SM per clock (they both have 68 SMs), you get maybe a 10% or 20% increase in performance over Turing SMs, which is in line with what you'd expect for a generational upgrade. This doesn't necessarily translate directly from a big 68 SM desktop GPU to a tiny 6 SM mobile one, but it's a better point of comparison than flops.

So a 6 SM Ampere GPU (that theoretically hits 2Tflops) would offer, let's say, 15% better performance than a 6 SM Turing GPU, which would be a 1Tflop part, which would itself provide perhaps 20% better performance than a 3 SM Maxwell part that would also hit a theoretical 1Tflop. You also have to consider that as much as 25% of the frame time would be taken up by DLSS, hence why I would consider such a GPU to be about 2-3 times as powerful as the Switch GPU. Of course this is a very rough estimate which may only have the slightest approximation to reality, but for idle speculation it's probably ok.
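
Plugging in the public specs makes the per-SM-per-clock point concrete; the "actual performance" uplift below is just the rough 15-30% range from above, not a measurement of mine.

```python
# RTX 3080 vs RTX 2080 Ti, both 68 SMs (official boost clocks).
tflops_3080   = 8704 * 2 * 1.71e9 / 1e12    # ~29.8 TFLOPS theoretical
tflops_2080ti = 4352 * 2 * 1.545e9 / 1e12   # ~13.4 TFLOPS theoretical
print(round(tflops_3080 / tflops_2080ti, 2))   # ~2.2x on paper

actual_uplift = 1.25                           # mid-point of the rough 15-30% range
clock_ratio = 1.71 / 1.545
print(round(actual_uplift / clock_ratio, 2))   # ~1.13x per SM per clock
```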

The other thing is that I'm looking at the theoretical case where Nintendo and Nvidia are specifically designing a SoC around getting Switch games to output at 4K via DLSS. If a GPU comprising 6 Ampere SMs is what it takes to make that possible, then 6 Ampere SMs is what they get. Ditto on the memory side, where a 128-bit bus becomes necessary for DLSS's bandwidth requirements, LPDDR5 is the current memory standard, and 8GB is the lowest capacity they can get with a 128-bit bus, so they end up with 8GB at 102GB/s. On the CPU front, there's no need for more performance, but newer cores and slightly higher clock speeds pretty much come as default on a new SoC with a smaller manufacturing process, so 4 higher-clocked A78s is what they get.

If they're not specifically targeting DLSS at 4K/60, then we would probably end up with a very different chip. Eight big CPU cores is possible, but not something I'm really expecting. A smaller GPU is definitely possible, and we could see something like 6GB of LPDDR5 on a 64-bit bus for 51GB/s of bandwidth.
 

z0m3le

Member
Oct 25, 2017
5,418
This is why I just referred to SMs rather than Gflops/Tflops, as with Ampere that gives people an unrealistic view of the performance if they're not familiar with the architecture. In general, focusing on flops as a relative measure of performance between two different architectures isn't great, but in Ampere's case particularly it's just not even worth comparing, given the FP setup in Ampere SMs. As a quick example, the RTX 3080 has 2.2 times the theoretical floating point performance of the RTX 2080 Ti, but the actual performance increase is around 15-30% depending on resolution. If you compare performance per SM per clock (they both have 68 SMs), you get maybe a 10% or 20% increase in performance over Turing SMs, which is in line with what you'd expect for a generational upgrade. This doesn't necessarily translate directly from a big 68 SM desktop GPU to a tiny 6 SM mobile one, but it's a better point of comparison than flops.

So a 6 SM Ampere GPU (that theoretically hits 2Tflops) would offer, let's say, 15% better performance than a 6 SM Turing GPU, which would be a 1Tflop part, which would itself provide perhaps 20% better performance than a 3 SM Maxwell part that would also hit a theoretical 1Tflop. You also have to consider that as much as 25% of the frame time would be taken up by DLSS, hence why I would consider such a GPU to be about 2-3 times as powerful as the Switch GPU. Of course this is a very rough estimate which may only have the slightest approximation to reality, but for idle speculation it's probably ok.

The other thing is that I'm looking at the theoretical case where Nintendo and Nvidia are specifically designing a SoC around getting Switch games to output at 4K via DLSS. If a GPU comprising 6 Ampere SMs is what it takes to make that possible, then 6 Ampere SMs is what they get. Ditto on the memory side, where a 128-bit bus becomes necessary for DLSS's bandwidth requirements, LPDDR5 is the current memory standard, and 8GB is the lowest capacity they can get with a 128-bit bus, so they end up with 8GB at 102GB/s. On the CPU front, there's no need for more performance, but newer cores and slightly higher clock speeds pretty much come as default on a new SoC with a smaller manufacturing process, so 4 higher-clocked A78s is what they get.

If they're not specifically targeting DLSS at 4K/60, then we would probably end up with a very different chip. Eight big CPU cores is possible, but not something I'm really expecting. A smaller GPU is definitely possible, and we could see something like 6GB of LPDDR5 on a 64-bit bus for 51GB/s of bandwidth.
This is an end-of-generation type device, not something they would continuously build on like some are speculating based on Iwata's comments and the following president's reassurance that he didn't change the plans Iwata had in mind. 8 cores makes sense in the latter approach. Also, it's faulty to compare the RTX 2080 Ti to the RTX 3080 for the architectural performance difference; we have a better comparison with the RTX 3070, which Nvidia says outperforms the RTX 2080 Ti. If we take that at face value, we can see that Ampere has ~30% less performance per flop, since they only doubled the shader count. The main reason the 3080 is a bad choice is because of other bottlenecks that could be observed, such as CPU or memory bandwidth.

In the end there is little difference between using 8 SMs with Turing's configuration or 6 SMs with Ampere's configuration, nor do I think there is an issue with using Ampere with 64 CUDA cores per SM; the end result is very similar at the same clock, whether you have 2 TFLOPS (Ampere) or 1.3 TFLOPS (Turing), just like the RTX 3070's 20 TFLOPS vs the RTX 2080 Ti's 13.4 TFLOPS. The configuration you have is superior to what I'm suggesting; both would beat out the base PS4 when docked, without DLSS being taken into account.
 
Last edited:

Zedark

Member
Oct 25, 2017
14,719
The Netherlands
Do we know why Ampere SMs are so much less efficient at attaining their theoretical peak performance numbers than the Turing SMs? Normally, architectural improvements close the gap between theoretical and actual performance. From what I'm reading, it seems they put double the number of cores in each SM. Is that the cause of the decreased efficiency?
 

z0m3le

Member
Oct 25, 2017
5,418
Do we know why Ampere SMs are so much less efficient at attaining their theoretical peak performance numbers than the Turing SMs? Normally, architectural improvements close the gap between theoretical and actual performance. From what I'm reading, it seems they put double the number of cores in each SM. Is that the cause of the decreased efficiency?
Yes, Turing SM vs Ampere SM, Ampere SM is far better per clock.

In a Turing SM you have 64 Cuda cores, 8 Tensor cores and 1 RT core.
In an Ampere SM you have 128 Cuda cores, 8 Tensor cores and 1 RT core.

Even though the tensor cores doubled in some performance metrics and the RT cores are seemingly faster as well, you aren't doubling the other components - cache, TMUs - so while the flops double, performance doesn't scale perfectly with them. However, because we have an RTX 3070 with 20 TFLOPS that is supposed to beat the RTX 2080 Ti by a small margin according to Nvidia's graphs, we can see that the performance difference is somewhere around 30% less per flop than Turing, which is why the 2 TFLOPS Switch Thraktor lists above isn't unreasonable, as it's somewhere around 1.4 TFLOPS of Turing.
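
Taking Nvidia's "3070 slightly beats the 2080 Ti" claim at face value, the arithmetic behind those two figures is roughly this (boost-clock paper specs, so treat it as ballpark only):

```python
tflops_3070   = 5888 * 2 * 1.725e9 / 1e12    # ~20.3 TFLOPS theoretical
tflops_2080ti = 4352 * 2 * 1.545e9 / 1e12    # ~13.4 TFLOPS theoretical
per_flop = tflops_2080ti / tflops_3070
print(round(per_flop, 2))        # ~0.66, i.e. roughly 30-35% less per flop
print(round(2.0 * per_flop, 2))  # a 2 TFLOPS Ampere GPU ~ 1.3-1.4 TFLOPS of Turing
```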

However, Ampere could use Turing's configuration with more SMs instead; the benefit there is more tensor cores. Instead of just 48 tensor cores at 1.3GHz, they would have 64 tensor cores with my configuration, which makes more sense for DLSS, even though raw performance (1.3 TFLOPS) would be less.
 
Last edited:

Zedark

Member
Oct 25, 2017
14,719
The Netherlands
Yes, Turing SM vs Ampere SM, Ampere SM is far better per clock.

In a Turing SM you have 64 Cuda cores, 8 Tensor cores and 1 RT core.
In an Ampere SM you have 128 Cuda cores, 8 Tensor cores and 1 RT core.

Even though the tensor cores doubled in some performance metrics and the RT cores are seemingly faster as well, you aren't doubling the other components - cache, TMUs - so while the flops double, performance doesn't scale perfectly with them. However, because we have an RTX 3070 with 20 TFLOPS that is supposed to beat the RTX 2080 Ti by a small margin according to Nvidia's graphs, we can see that the performance difference is somewhere around 30% less per flop than Turing, which is why the 2 TFLOPS Switch Thraktor lists above isn't unreasonable, as it's somewhere around 1.4 TFLOPS of Turing.
I see. Do we know if a similar thing applies to the RT/Tensor core performance as well, or do those scale better than the CUDA cores? I believe that the tensor cores have a 4x performance improvement compared to last year's model, right? So theoretically, tensor core performance per SM is 4x before considering other components potentially holding it back.
 

z0m3le

Member
Oct 25, 2017
5,418
I see. Do we know if a similar thing applies to the RT/Tensor core performance as well, or do those scale better than the CUDA cores? I believe that the tensor cores have a 4x performance improvement compared to last year's model, right? So theoretically, tensor core performance per SM is 4x before considering other components potentially holding it back.
They have the same configuration as Turing within the SM: 8 tensor cores per SM, 1 RT core per SM.
 

z0m3le

Member
Oct 25, 2017
5,418
Right, but if the tensor cores are 4x as performant, then they might need to get more data to work on within the same time span than the Turing tensor cores would, right? So a bottleneck in cache could still occur as a result I think?
I believe Nvidia claims over 2 times performance per tensor core, which is good because that would mean around 6ms for 4K via DLSS, leaving around 10ms for rendering 720p at 60fps or 27ms for 30fps, quite a bit of rendering time there to do 4K.
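
A quick sanity check on that frame-time split, assuming the ~6ms DLSS cost above:

```python
dlss_ms = 6.0
print(round(1000 / 60 - dlss_ms, 1))   # ~10.7 ms left for rendering at 60 fps
print(round(1000 / 30 - dlss_ms, 1))   # ~27.3 ms left at 30 fps
```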
 

Ryoku

Member
Oct 28, 2017
460
Could Ampere's performance oddity have something to do with how the CUDA cores are set up?
It's not literally 2x the CUDA cores, I think.
It's just that the INT32 calculation hardware portion of the SM (forgot what it's called, don't hurt me) got repurposed to also be able to handle FP32 operations. So that portion isn't really doing anything unless it is told to do so? Or is the data load supposed to simply scale across all of it?

Could it have something to do with SIMD vs MIMD?
 

Hermii

Member
Oct 27, 2017
4,751
This is an end-of-generation type device, not something they would continuously build on like some are speculating based on Iwata's comments and the following president's reassurance that he didn't change the plans Iwata had in mind. 8 cores makes sense in the latter approach. Also, it's faulty to compare the RTX 2080 Ti to the RTX 3080 for the architectural performance difference; we have a better comparison with the RTX 3070, which Nvidia says outperforms the RTX 2080 Ti. If we take that at face value, we can see that Ampere has ~30% less performance per flop, since they only doubled the shader count. The main reason the 3080 is a bad choice is because of other bottlenecks that could be observed, such as CPU or memory bandwidth.

In the end there is little difference between using 8 SMs with Turing's configuration or 6 SMs with Ampere's configuration, nor do I think there is an issue with using Ampere with 64 CUDA cores per SM; the end result is very similar at the same clock, whether you have 2 TFLOPS (Ampere) or 1.3 TFLOPS (Turing), just like the RTX 3070's 20 TFLOPS vs the RTX 2080 Ti's 13.4 TFLOPS. The configuration you have is superior to what I'm suggesting; both would beat out the base PS4 when docked, without DLSS being taken into account.
Whether this is a device designed to play Switch games at 4K, designed to be an incremental upgrade, or something else entirely is anyone's guess at this point.
 

Zedark

Member
Oct 25, 2017
14,719
The Netherlands
I believe Nvidia claims over 2 times performance per tensor core, which is good because that would mean around 6ms for 4K via DLSS, leaving around 10ms for rendering 720p at 60fps or 27ms for 30fps, quite a bit of rendering time there to do 4K.
Hmm, so translated to tensor core performance per CUDA core, the tensor core performance has remained stable? That's not quite the jump I thought it was, then.
 
Oct 26, 2017
7,981
Whether this is a device designed to play Switch games at 4K, designed to be an incremental upgrade, or something else entirely is anyone's guess at this point.
Yeah, I feel like we're back in the "IT'S NOT A HYBRID" era if we try to rigidly determine what's happening in the future based on what Iwata said. If anything, they've been touting their flexibility in being able to quickly adapt to different market conditions.
 

Dakhil

Member
Mar 26, 2019
4,459
Orange County, CA
It got stealth-revealed in a completely separate press release by some Chinese car company that Orin is 7nm. Just in case anyone was wondering.

wccftech.com

Li Auto (NASDAQ: LI) To Equip Its Vehicles With NVIDIA’s Orin SoC Chipset To Facilitate Autonomous Driving Under a New Partnership Agreement

Li Auto is enhancing the autonomous driving capabilities of its upcoming vehicles by utilizing NVIDIA's Orin SoC chipset under a new deal

"Orin uses a 7-nanometer production process to achieve a computing power of 200 TOPS, 7 times that of the previous generation, the Xavier SoC. Even with the significant improvement in computing performance, Orin's baseline power consumption is just 45 watts, relatively equivalent to the low power consumption of the previous generation SoC."
Interesting. I wonder if the Tegra SoC that kopite7kimi mentioned has to do with Nano Next if Orin is indeed fabricated at 7 nm nodes (I think whether it's TSMC's or Samsung's 7 nm nodes is unknown.).
 
Last edited:

ShadowFox08

Banned
Nov 25, 2017
3,524
That's the beauty: you wouldn't have to every 2 years. Every 6 should be fine, unless you're a person who wants to play GTA VI on the go or something and it requires the latest version.
Yeah we don't have to, but I don't think dedicated game consoles are ready for 2-year refreshes that change the gaming profiles/clock speeds. Do you think consumers and developers (especially third party) would be happy with that?

Interesting. I wonder if the Tegra SoC that kopite7kimi mentioned has to do with Nano Next if Orin is indeed fabricated at 7 nm nodes (I think whether it's TSMC's or Samsung's 7 nm nodes is unknown.).
Regarding a 2021 Switch Pro, I think Orin is out of the question if its release date is 2022 as planned.
 

bmfrosty

Member
Oct 27, 2017
1,896
SF Bay Area
Yeah we don't have to, but I don't think dedicated game consoles are ready for 2-year refreshes. Do you think consumers and developers (especially third party) would be happy with that?
Are companies that make games for PC happy that what they are targeting can have an infinite number of configurations?

Supporting multiple Switch configurations has got to be easier. They don't even have to do that much in most cases. Making it work at some baseline and letting dynamic resolution handle the rest would be the minimum. The worst part would be testing on multiple Switch platforms.
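
Roughly what that "baseline plus dynamic resolution" approach looks like - a toy sketch with made-up names and thresholds, not any particular engine's implementation:

```python
def update_render_scale(scale, last_frame_ms, target_ms=16.7,
                        min_scale=0.5, max_scale=1.0):
    """Nudge the render resolution each frame based on how close we are to budget."""
    if last_frame_ms > target_ms * 1.05:      # missed the budget: render fewer pixels
        scale *= 0.95
    elif last_frame_ms < target_ms * 0.90:    # comfortable headroom: sharpen back up
        scale *= 1.05
    return max(min_scale, min(max_scale, scale))

# The same code runs on every hardware tier; a faster model simply settles
# at a higher resolution scale for the same content.
scale = 1.0
for frame_ms in (20.0, 19.0, 17.5, 16.0, 15.0, 14.5):
    scale = update_render_scale(scale, frame_ms)
print(round(scale, 2))
```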

A developer putting more time and money into it may have a different texture set that can be available for higher end models, or additional shaders.

The big difference between this and PC is that this would have fewer settings to test.
 

Dakhil

Member
Mar 26, 2019
4,459
Orange County, CA
I don't know if the job listing means anything since it's from Nintendo Technology Development in Redmond, Washington. And I don't know if Nintendo Technology Development has made significant contributions to Nintendo hardware since I imagine that the Nintendo Platform Technology Development in Kyoto, Japan, makes the final calls when it comes to Nintendo hardware. But if it does, I wonder if any of the display technologies listed in the job listing would be featured in the "Nintendo Switch 2".

www.linkedin.com

Nintendo hiring Senior Engineer – Multimedia (NTD) in Redmond, Washington, United States | LinkedIn

Nintendo Technology Development

The worldwide pioneer in the creation of interactive entertainment, Nintendo Co., Ltd., of Kyoto, Japan, manufactures and markets hardware and software for its Nintendo Switch™ system and the Nintendo 3DS™ family of portable systems. Since 1983, when it launched the Nintendo Entertainment System™, Nintendo has sold more than 4.4 billion video games and more than 700 million hardware units globally, including Nintendo Switch and the Nintendo 3DS family of systems, as well as the Game Boy™, Game Boy Advance, Nintendo DS™ family of systems, Super NES™, Nintendo 64™, Nintendo GameCube™, Wii™ and Wii U™ systems. It has also created industry icons that have become well-known, household names, such as Mario, Donkey Kong, Metroid, Zelda and Pokémon. A wholly owned subsidiary, Nintendo Technology Development, based in Redmond, Washington, creates future hardware/software technology and researches North American-based technologies.

Description Of Duties
  • Define end-to-end display architecture.
  • Responsible for various aspects of display software development including design, implementation, and debug.
  • Partner with both internal teams and external teams on display software architecture and implementation.
  • Investigate and evaluate new display technologies and/or features.
  • Make recommendations for best display technologies to use for the product.
Summary Of Requirements
  • 5+ years of experience.
  • Experience working with display infrastructure and pipeline.
  • Knowledge of different display technologies, such as LCD, OLED, HDR, etc.
  • Experience with display measurement & performance evaluation.
  • Excellent software development and debugging skills in C and C++ embedded development.
  • Excellent software design, problem solving, and debugging skills.
  • Experience in developing and debugging of multi-process and multi-threaded applications.
  • Good verbal and written communication skills.
  • Degree in Computer Science, Electrical Engineering or a related field.

We are an equal opportunity employer of individuals with disabilities and protected veterans....valuing diversity…celebrating strengths.
www.resetera.com

Nintendo looking to hire Senior Engineer with HDR support Rumor

https://www.linkedin.com/jobs/view/2148930242/ Lock mods if not enough info or worthy of thread. However it can further push the Switch Pro rumors or maybe this is for Switch 2? Who knows.
 
Last edited:

ShadowFox08

Banned
Nov 25, 2017
3,524
Are companies that make games for PC happy that what they are targeting can have an infinite number of configurations?

Supporting multiple Switch configurations has got to be easier. They don't even have to do that much in most cases. Making it work at some baseline and letting dynamic resolution handle the rest would be the minimum. The worst part would be testing on multiple Switch platforms.

A developer putting more time and money into it may have a different texture set that can be available for higher end models, or additional shaders.

The big difference between this and PC is that this would have fewer settings to test.
They aren't making an infinite number of configurations on PC and mobile. PC/mobile is open ended, and it's designed by software engineers in such a way that consumer computers automatically detect the hardware requirements of a game, and adjust to run it, while also allowing customers to adjust the graphical settings (resolution, textures, etc) as well.

This is different than closed systems like game consoles that have specific CPU and GPU profile speeds for developers to work with, and must be optimized/fine tuned to run smoothly for multiple consoles as well.

I'm not saying developing PC games is easy by any means, but to the best of my knowledge, it's open ended and isn't constrained/bound to specific hardware profiles like dedicated consoles that have to often be optimized (but at least guaranteed to run). With PC, it's up to the consumer to know if a game can run on their PC for one. If your PC specs are suboptimal to your standards, then the games you will play will perform poorly and you need better parts to run it.

So I can't imagine developers having to develop and fine-tune games on all these closed systems, with more profiles to work with, every 2 years for the same platform. It's going to take up a lot more resources, especially for AAA game ports, which tend to sell more on consoles.

We can agree to disagree. I just can't imagine dedicated consoles with closed systems refreshing with new specs/hardware profiles every 2 years like phones, especially with all of them needing simultaneous support. It's going to mean more effort, time and money spent by devs, especially third parties on Switch, when many are already stretched so thin and struggle to make a profit. Regular gamers will be annoyed and it will likely confuse consumers. This could backfire tremendously. I think 3 years should be the minimum between refreshes/new specs for dedicated gaming consoles.

It's a radical concept, and I don't know how people would react to it in the long run. Games on consoles are on a different playing field than games on mobile phones as well, since more money is invested by devs for home consoles, and many phone games are typically f2p. And phones and PCs can be used outside of gaming.
 
Last edited:

Sqrt

Member
Oct 26, 2017
5,938
It would be bad for the next Switch to be on 8nm. Discrete PC cards can get away with lower power efficiency, but that's not an option for portable systems.
 

dgrdsv

Member
Oct 25, 2017
12,123
It would be bad for the next Switch to be on 8nm. Discrete PC cards can get away with lower power efficiency, but that's not an option for portable systems.
There is no indication that 8N has inherently bad power efficiency.
A design which pushes for maximum performance will always have bad power efficiency.
 

Dakhil

Member
Mar 26, 2019
4,459
Orange County, CA



It seems that Nano Next (or whatever Tegra SoC the "Nintendo Switch Pro" or the "Nintendo Switch 2" [if Nintendo foregoes releasing the "Nintendo Switch Pro" in favour of the "Nintendo Switch 2" in 2021] uses) will be fabricated on Samsung's 8 nm node, unless Nvidia for some reason managed to prebook enough capacity on TSMC's 7 nm node, which I doubt. I wish Samsung's nodes weren't so underwhelming.
But I guess if the "Nintendo Switch 2" is planned to be released in 2023, the Tegra SoC on the "Nintendo Switch 2" seems pretty likely to be fabricated on TSMC's 5 nm node.
 
Last edited:

z0m3le

Member
Oct 25, 2017
5,418



It seems that Nano Next (or whatever Tegra SoC the "Nintendo Switch Pro" or the "Nintendo Switch 2" [if Nintendo foregoes releasing the "Nintendo Switch Pro" in favour of the "Nintendo Switch 2" in 2021] uses) will be fabricated on Samsung's 8 nm node, unless Nvidia for some reason managed to prebook enough capacity on TSMC's 7 nm node, which I doubt. I wish Samsung's nodes weren't so underwhelming.
But I guess if the "Nintendo Switch 2" is planned to be released in 2023, the Tegra SoC on the "Nintendo Switch 2" seems pretty likely to be fabricated on TSMC's 5 nm node.

If next year's model does 4K, we won't see a Switch 2 in 2023. That's a fully new SoC design; they will probably keep it around and even shrink it before they move on, so the upgrade after next year's model would come in late 2024 or even 2025, which is why a Switch '2' makes sense next year.
 
Apr 11, 2020
1,235
And it seems they won't be able to get a good price on 5/4 nm or even 7/6 nm (Samsung or TSMC) before 2024-2025. Apple essentially locks up every TSMC node advancement for the first year or two.
 

Dakhil

Member
Mar 26, 2019
4,459
Orange County, CA
I don't think people are going to be happy if Nintendo releases the "Nintendo Switch 2" in 2021, especially if third party developers abandon the Nintendo Switch in favour of "Nintendo Switch 2" exclusives, and while it's still almost impossible to get a Nintendo Switch at its original MSRP.
 

SharpX68K

Member
Nov 10, 2017
10,606
Chicagoland
If the 2021 Switch model (Switch Pro or whatever) has a new Nvidia SoC and is a lot more powerful than the current Switch then I would not expect Nintendo's next-gen/future hybrid system until 2024 or 2025.
 
Apr 11, 2020
1,235
I don't think people are going to be happy if Nintendo releases the "Nintendo Switch 2" in 2021, especially if third party developers abandon the Nintendo Switch in favour of "Nintendo Switch 2" exclusives, and while it's still almost impossible to get a Nintendo Switch at its original MSRP.
3rd party devs can't jump ship if they never boarded it in the first place. MHR could run atrociously on the original hardware; we don't know yet.
 

Plankton2

Member
Dec 12, 2017
2,670
Sorry, there is zero chance they cut off a 70M user base next year with a successor. They've pretty much publicly said otherwise.

If they have a new SoC, I'd say it's more likely they put an artificial cap on it and somehow balance battery improvements with potential AI upscaling rather than go for raw power. Or maybe they just experiment with future design choices on it, like AR features, a camera, or a complete Joy-Con overhaul, something like that, but they are not leaving the people who are buying a Switch this year out in the cold.

Is it possible they let 3rd parties do exclusives? Sure... but just think about the economics at play. By the end of '21 there might be 75-80M Switches and 5-10M Switch Pros. You're just not going to spend a significant amount of money to target that small a fraction of the user base.
 

fwd-bwd

Member
Jul 14, 2019
726
If they have a new SoC, I'd say it's more likely they put an artificial cap on it and somehow balance battery improvements with potential AI upscaling rather than go for raw power. Or maybe they just experiment with future design choices on it, like AR features, a camera, or a complete Joy-Con overhaul, something like that, but they are not leaving the people who are buying a Switch this year out in the cold.
Agreed, this is the likelier scenario: spend part of the extra power on a decent performance bump and 4K upscaling, and use the rest to prolong battery run time and perhaps enable some "pro" OS features such as streaming support.

To alleviate a potential backlash from existing owners and prevent 3rd parties from abandoning the old models, I think Nintendo should introduce new profiles for the Mariko Switch and Lite to narrow the performance gap. Likewise, when the next Switch revision comes in, say, three years, they can add new profiles for the 2021 model that unlock its full power in exchange for battery run time.
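
For what it's worth, here's a purely illustrative sketch of how a game might branch on whichever performance profile the OS reports; the enum, struct and every number below are invented and don't reflect any real Nintendo API or actual clock/resolution figures:

```cpp
#include <iostream>

enum class DeviceProfile { BaseHandheld, BaseDocked, ProHandheld, ProDocked };

struct RenderTargets {
    int width;
    int height;
    int targetFps;
    bool aiUpscaling;  // DLSS-style reconstruction on the newer hardware
};

// Hypothetical per-profile targets; a real title would tune these per game.
RenderTargets targetsFor(DeviceProfile p) {
    switch (p) {
        case DeviceProfile::BaseHandheld: return {1280, 720, 30, false};
        case DeviceProfile::BaseDocked:   return {1600, 900, 30, false};
        case DeviceProfile::ProHandheld:  return {1280, 720, 60, true};
        case DeviceProfile::ProDocked:    return {1920, 1080, 60, true};
    }
    return {1280, 720, 30, false};  // defensive fallback
}

int main() {
    RenderTargets t = targetsFor(DeviceProfile::ProDocked);
    std::cout << t.width << "x" << t.height << " @ " << t.targetFps << " fps\n";
}
```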
 

Pokemaniac

Member
Oct 25, 2017
4,944
Sorry, there is zero chance they cut off a 70M user base next year with a successor. They've pretty much publicly said otherwise.

If they have a new SoC, I'd say it's more likely they put an artificial cap on it and somehow balance battery improvements with potential AI upscaling rather than go for raw power. Or maybe they just experiment with future design choices on it, like AR features, a camera, or a complete Joy-Con overhaul, something like that, but they are not leaving the people who are buying a Switch this year out in the cold.

Is it possible they let 3rd parties do exclusives? Sure... but just think about the economics at play. By the end of '21 there might be 75-80M Switches and 5-10M Switch Pros. You're just not going to spend a significant amount of money to target that small a fraction of the user base.
Nintendo has explicitly said they don't want to keep starting from zero in terms of audience every generation. At this point, I think the main distinction between a '2' and a 'Pro' will largely come down to what sort of hardware upgrades the system ends up having.

Also, 5-10 million is greater than zero, which is the install base currently reached by the games that aren't getting ported to Switch for technical reasons. You could make the exact same argument for PS5/XSX compared to their predecessors.
 
Apr 11, 2020
1,235
Sorry, there is zero chance they cut off a 70M user base next year with a successor. They've pretty much publicly said otherwise.

If they have a new SoC, I'd say it's more likely they put an artificial cap on it and somehow balance battery improvements with potential AI upscaling rather than go for raw power. Or maybe they just experiment with future design choices on it, like AR features, a camera, or a complete Joy-Con overhaul, something like that, but they are not leaving the people who are buying a Switch this year out in the cold.

Is it possible they let 3rd parties do exclusives? Sure... but just think about the economics at play. By the end of '21 there might be 75-80M Switches and 5-10M Switch Pros. You're just not going to spend a significant amount of money to target that small a fraction of the user base.
Any AR features, camera, or Joy-Con overhaul would actually cut the 70M first-gen Switch owners off from the new model, whereas a pure performance upgrade would only add one power profile (optionally two, with AI upscaling). 75-80M first-gen Switches alongside 5-10M second-gen units would really look like the Wii to Wii U transition all over again... and one of the reasons for the Wii U's failure was its lack of raw power compared to the PS4/XB1.
 

Hermii

Member
Oct 27, 2017
4,751
So, Nintendo could update older games (BOTW, Mario Odyssey, etc.) and cut down on some jaggies?

Were there ever graphical upgrades to their games between DS models apart from resolution?

1. Yes, theoretically they can.
2. I think it has happened before, but I also don't think past platforms are a very good reference: scalability is a core feature of the Switch, whereas on past platforms it was tacked on.
 