I think you're mistaken. Phone AR is still a bit janky, but you're MASSIVELY underestimating the technology if you think we still lack the power to accurately interpret visual data and map things onto it. Consider the lowly Snapchat AR filter - performant, with surprisingly advanced spatial understanding and vision-based tracking, easy to use and quite effective. Consider, again, that the FIRST HoloLens games were able to interpret your sofa, your chair, your table, your floor, your walls - and then literally play in, with, and among them.
Heck, HoloLens (again, 1.0) could already store the spatial map of your entire house and remember where you put holograms and apps, floating or pinned to your walls. Life-sized characters having conversations sitting in your chair or around your table. Alien bugs bursting out of your walls. It wasn't even just room-scale - it was entire-building scale. You could walk around or even go outside freely in ways that just aren't feasible with a VR headset on, even with passthrough.
Your Pokémon game wasn't just possible back then. It was actively being worked on (well, more like a Yu-Gi-Oh! Duel Disk game). And it worked, if a bit unpolished. It was stable, it was immersive, it looked pretty good. And that was almost 5 years ago.
I truly believe that we have had the technology for full-potential AR gaming for years. We just haven't had the accessibility, the support, and ultimately the players to justify pursuing it.