To shift the discussion to something I personally find more interesting (and would like help figuring out): what is the likely per-SM power draw of Ampere?
I think the best cards to compare for this are the 3070 and the 3060 Ti. They have the same type of RAM (GDDR6), the same amount (8GB) and the same memory clock (1750MHz); their SM counts and core clocks, however, are different.
3070 has 46 SMs with a base clock of 1500MHz and a boost clock of 1725MHz. TFLOPS: ~20.
3060 Ti has 38 SMs with a base clock of 1410MHz and a boost clock of 1665MHz. TFLOPS: ~16.
The reason is that I think we can extrapolate at least some information about the Switch from this, even if things don't scale exactly linearly.
Let's take the RTX3060Ti. It consumes "up to 200W", so let's calculate with that and the boost clock. Our target for the GPU power draw is 3W in handheld mode. Let's explore 2 possibilities: a 6SM GPU and a 4SM GPU.
For 6SM, we scale the power draw by 6/38 -> 31.6W at 1665 MHz, for 2.53 TFLOPS. Using the squared frequency rule of thumb (power draw goes roughly with the square of the clock), we have to reduce the wattage by a factor of 31.6/3 and therefore drop the frequency by a factor of sqrt(31.6/3), giving us 1665/sqrt(31.6/3) = 513 MHz, which represents a 0.78 TFLOPS GPU in handheld mode.
For 4SM, we scale the power draw by 4/38 -> 21.05W at 1665 MHz, for 1.68 TFLOPS. We must reduce the wattage further by a factor of 21.05/3, or about 7, giving a frequency of 1665/sqrt(7) = roughly 630 MHz, which represents a 0.64 TFLOPS GPU.
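Written out, the rule of thumb behind these numbers looks roughly like this. The symbols are my own shorthand: N and n are the SM counts of the desktop card and the hypothetical small GPU, f_ref is the desktop boost clock, P_ref its rated power and P_target the handheld budget (3W here). It assumes the whole board power scales with SM count, which is a simplification.

```latex
\begin{align*}
P(n, f) &\approx P_{\text{ref}} \cdot \frac{n}{N} \cdot \left(\frac{f}{f_{\text{ref}}}\right)^{2} \\
f_{\text{handheld}} &\approx f_{\text{ref}} \sqrt{\frac{P_{\text{target}}}{P_{\text{ref}}} \cdot \frac{N}{n}} \\
\text{TFLOPS}(n, f) &\approx \text{TFLOPS}_{\text{ref}} \cdot \frac{n}{N} \cdot \frac{f}{f_{\text{ref}}}
\end{align*}
```

For the 3060 Ti at 6SM that gives 1665 * sqrt((3/200) * (38/6)) = about 513 MHz, matching the figure above.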
A good rule of thumb would be to double those numbers to get docked GPU performance, although you could probably get away with quite a bit more in docked mode if you wanted to. But yeah, we need to be careful about assuming that these simple computations hold completely, since there is no guarantee that things scale linearly (with SM count) and quadratically (with clock) in this manner down to a much smaller GPU. On the other hand, the Switch might tolerate a TDP slightly above 3W, since TDP is a peak figure rather than the average power draw. So I dunno, apply plenty of caveats, but the above is not necessarily unreasonable if we do get an Ampere GPU.
Edit: if we do the same for the RTX3070 (TDP=220W), we would get:
For 6SM, the wattage at 1725 MHz is 28.7W. We must drop the frequency by a factor of sqrt(28.7/3) to 558 MHz, giving a 0.84 TFLOPS GPU in handheld mode.
For 4SM, the wattage at 1725 MHz is 19.1W. We must drop frequency by sqrt(19.1/3) to 683 MHz, giving a 0.69 (nice) TFLOPS GPU in handheld mode.
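If anyone wants to redo these numbers with other inputs, here is a minimal sketch of the same arithmetic in Python. The function name and the dictionary are made up for illustration, and it bakes in the same assumptions as above: power scales linearly with SM count and with the square of the clock, and TFLOPS scale linearly with both.

```python
import math

def handheld_estimate(tdp_w, sm_count, boost_mhz, tflops, target_sms, target_w=3.0):
    """Scale a desktop card down to target_sms SMs and a target_w power budget.

    Assumption: power ~ SM count * clock^2, TFLOPS ~ SM count * clock.
    Returns (watts at boost clock, handheld clock in MHz, handheld TFLOPS).
    """
    share = target_sms / sm_count
    watts_at_boost = tdp_w * share          # linear scaling with SM count
    tflops_at_boost = tflops * share
    # Square rule: cutting power by a factor k cuts the clock by sqrt(k)
    clock_mhz = boost_mhz / math.sqrt(watts_at_boost / target_w)
    handheld_tflops = tflops_at_boost * clock_mhz / boost_mhz
    return watts_at_boost, clock_mhz, handheld_tflops

# (TDP in W, SM count, boost clock in MHz, approx. TFLOPS) as quoted in this post
cards = {
    "RTX 3060 Ti": (200, 38, 1665, 16),
    "RTX 3070": (220, 46, 1725, 20),
}

for name, spec in cards.items():
    for sms in (6, 4):
        watts, clock, tf = handheld_estimate(*spec, target_sms=sms)
        print(f"{name} -> {sms} SM: {watts:.1f} W at boost, {clock:.0f} MHz, {tf:.2f} TFLOPS")
```

Changing target_w to something like 3.5 or 4 is an easy way to see how sensitive the result is to the handheld power budget.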
Edit 2: Let's also do it for the RTX3050, which has a TDP of 90W, a boost clock of 1740 MHz, 18 SMs, and a peak performance of 8 TFLOPS:
For 6 SM, we have a power draw of 30W at 1740 MHz. We must drop the frequency by a factor of sqrt(30/3) = sqrt(10) to 550 MHz, giving a 0.84 TFLOPS GPU in handheld mode.
For 4 SM, we have a power draw of 20W at 1740 MHz. We drop the frequency by a factor of sqrt(20/3) to 674 MHz, giving a 0.69 TFLOPS GPU in handheld mode.
This is a pretty interesting result, since it's in line with the RTX3070. No guarantees, of course, but it could lend some credence to the idea that the power consumption scales pretty linearly to lower SM counts.
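One quick way to sanity-check the linearity idea is to divide each card's rated power by its SM count. This is only a rough sketch: it attributes the whole board power (memory, fans and all) to the SMs and ignores that the three boost clocks differ slightly, and the TDP and SM figures are simply the ones quoted in this post.

```python
# (TDP in watts, SM count) as quoted in the post
cards = {
    "RTX 3060 Ti": (200, 38),
    "RTX 3070": (220, 46),
    "RTX 3050": (90, 18),
}

# Naive per-SM figure: whole-board TDP divided by SM count
per_sm = {name: tdp / sms for name, (tdp, sms) in cards.items()}
for name, watts in per_sm.items():
    print(f"{name}: {watts:.2f} W per SM at boost")

# How far apart the highest and lowest per-SM figures are
spread = max(per_sm.values()) / min(per_sm.values())
print(f"max/min ratio: {spread:.2f}")
```

All three land at around 5W per SM at boost clocks, within about 10% of each other, which is at least consistent with roughly linear scaling.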
Edit 3: Also, if this is based on Orin, then it'd lose the RT cores in favour of bigger Tensor cores, so it's hard to say whether the FLOPS number and power draw would remain unaffected by that.