As you mention in the first paragraph, TFLOPs can be an expression of SM count × clock as long as you label the architecture; it's an overall expression of that architecture's performance at a given clock.
The thing is, we can compare 2 TFLOPs of Ampere to 1.34 TFLOPs of Turing and know that Ampere is faster in general. Those gains weren't just a doubling of CUDA cores; Ampere also increased the performance of the tensor and RT cores, along with other changes we'd need to dive into the white paper for. The overall result is that the RTX 3070, a 20 TFLOPs Ampere card, is faster than the RTX 2080 Ti, a 13.4 TFLOPs Turing card. That means your configuration is around 1.4 TFLOPs of Turing performance. As you mentioned, it's also faster than Maxwell; one of the big new features, VRS, can offer a 20% uplift, so when comparing to Maxwell, you're talking about over 1.7 TFLOPs of Maxwell performance on average.
When someone adds the architecture to a TFLOPs number, they are (whether they know it or not) expressing the overall performance of that architecture's SM/CU count × clock, and since these architectures maintain their SM/CU configurations, the number is an accurate representation of performance.
To use your performance metric, Ampere SMs are 40%+ faster than Turing SMs.
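To make the SM × clock point concrete, here's a rough sketch of how theoretical FP32 TFLOPs fall out of SM count, FP32 units per SM, and clock (the clock figures are approximate boost clocks, used only for illustration):

```python
# Illustrative sketch: theoretical FP32 TFLOPs from SM count, FP32
# units per SM, and clock. Each FMA counts as 2 floating-point ops.
def fp32_tflops(sms, fp32_per_sm, clock_ghz):
    return sms * fp32_per_sm * 2 * clock_ghz / 1000

# Turing SM: 64 FP32 units; Ampere SM: 128 (64 FP32 + 64 FP32/INT32).
# Clocks below are approximate boost clocks, for illustration only.
print(fp32_tflops(68, 64, 1.545))   # RTX 2080 Ti: ~13.4 TFLOPs
print(fp32_tflops(46, 128, 1.725))  # RTX 3070: ~20.3 TFLOPs
```

This is why the same "TFLOPs" label can mean very different things per SM once Ampere doubled the FP32 units.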
Probably should have a big asterisk next to that "faster than an RTX 2080 Ti", as that's based on Nvidia's marketing claims, which didn't exactly pan out for the RTX 3080 (the 3080 is still a very capable card, just not quite hitting Nvidia's claims). Going by actual measured performance of the RTX 3080 vs the RTX 2080 Ti (a straightforward comparison, as they're both 68 SM GPUs), the 3080 provides approximately 19% more performance per clock at 4K (less at lower resolutions, but then you're more likely to be CPU-limited). Of course this won't necessarily translate exactly to a far smaller GPU, or a console use-case vs a PC use-case, but it's as good a rough estimate as we'll get at the moment.
In any case, my point about using flops as a measure post-Ampere wasn't about people who are well-informed enough to know how the metric differs between architectures. It's that many people don't keep a close eye on these things and either don't consider a difference in architecture at all, or use rough rules of thumb like "Nvidia flops are better than AMD flops" which don't apply to Ampere (and honestly didn't really apply to Turing/RDNA1 either). So if I were to say "Nintendo might release a 2 TFLOPs Switch Pro next year", many people would misinterpret that as being 5x as powerful as the original Switch, more powerful than the PS4, or half as powerful as the Xbox Series X, none of which are necessarily true. I'd simply like to avoid those misunderstandings while posting any speculation about hypothetical Ampere-based Switch hardware.
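To show how misleading the raw number can be, here's a back-of-envelope conversion of a hypothetical 2 TFLOPs Ampere figure into "Turing-equivalent" TFLOPs, using the measured ~19% per-SM-per-clock gain mentioned above (the exact ratio is an assumption; real workloads will vary):

```python
# Hedged sketch: why a raw "2 TFLOPs" Ampere figure overstates things.
# Ampere doubled FP32 units per SM on paper, but the measured per-clock
# gain over Turing was only ~19% (3080 vs 2080 Ti at 4K, both 68 SMs).
ampere_tflops = 2.0
measured_per_clock_gain = 1.19  # measured at 4K, not theoretical
paper_fp32_ratio = 2.0          # Ampere's on-paper FP32 doubling
turing_equivalent = ampere_tflops * measured_per_clock_gain / paper_fp32_ratio
print(round(turing_equivalent, 2))  # ~1.19 "Turing TFLOPs"
```

So a 2 TFLOPs Ampere part lands closer to 1.2 TFLOPs of Turing-class performance than to what the headline number implies.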
Every game running at 4K is a pipe dream; I don't care about DLSS.
I think even most games, or perhaps any game, running at 4K may be a pipe dream too. We'll see next year.
Saying you "don't care about DLSS" kind of misses the point, though; the entire purpose of DLSS is to allow games to not run at 4K resolution, but still be scaled up to 4K with acceptable image quality. DLSS 2.0 on PC has shown good results scaling a 1080p internal resolution to 4K, and if Nintendo were to implement it in a new Switch model, I'd expect them to target around a 1080p rendering resolution in docked mode. Many games already come close to that, and the 50%-100% performance improvement necessary to take most of Nintendo's titles to 1080p is hardly a stretch at this point.
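The pixel math behind that argument is simple enough to sketch (just raw pixel counts; it ignores upscaling overhead and per-pixel cost differences):

```python
# Back-of-envelope: pixel counts behind the DLSS argument above.
# Rendering at 1080p and upscaling to 4K shades only 1/4 of the pixels
# a native 4K render would.
pixels_4k = 3840 * 2160      # 8,294,400 pixels
pixels_1080p = 1920 * 1080   # 2,073,600 pixels
print(pixels_4k // pixels_1080p)  # 4 -> 1080p is 25% of the 4K workload
```

That 4x reduction in shaded pixels is why a modest GPU paired with DLSS can plausibly produce a 4K output.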
Basically we have two data points which would suggest a DLSS-enabled Switch model outputting at 4K resolution. The first is Bloomberg (who I'd expect to be very reliable), stating that developers have been told to prepare for 4K output resolution. This could mean a number of things, whether it's lower resolutions (eg 1440p) scaled to 4K, some kind of checkerboard setup, or DLSS. The second is the job posting on Nvidia's website looking for a software developer to work on implementing AI technologies like DLSS for a console using a Tegra chip with a proprietary API. Combined, these two would heavily suggest that Nintendo plans to release a device which leverages DLSS to output at 4K resolutions.
It's not even particularly challenging from a technical point of view. As I pointed out on a previous page, if Nvidia's tensor sparsity acceleration applies to DLSS, then Nintendo could implement DLSS for a 4K/60 output with a 6 SM Ampere GPU. That's just 50% "bigger" than the original Switch GPU, which is hardly a huge leap for a device releasing 4+ years after the original.
That's not to say I fully expect Nintendo to release such a device, but given the evidence available to us, it's definitely a possibility.