If you keep in mind that an HDR image primarily falls within the range of an SDR image, the areas at 100 nits and above are mostly reserved purely for things like light sources and glints of sunlight. HDR10 and Dolby Vision use absolute values, so if the content says a specific pixel is 1000 nits, then the TV tries to illuminate it to exactly 1000 nits.
So this graph shows the ST 2084 PQ curve, which both HDR10 and Dolby Vision use. On the left you can see the code value (0-1023), and along the bottom you can see the nit value that each code value is supposed to generate.
In the 8-bit RGB world the equivalent would be 0-255, 0 being black and 255 being white.
Rather than saying 0 is black and 1023 is white, with the extra data used in between to reduce gradations, roughly half of the data is used to go from black (0) to white at 100 nits (code value 520); the other code values, 521-1023, are reserved for things that are brighter than 100 nits.
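The curve itself is defined by the SMPTE ST 2084 spec. A small sketch below evaluates the standard PQ EOTF (the published constants) to show how code values map to nits, and that roughly half the 10-bit codes sit at or below 100 nits:

```python
# SMPTE ST 2084 (PQ) EOTF: maps a code value to absolute luminance in
# nits, with the top of the scale pinned at 10,000 nits.
m1 = 2610 / 16384          # 0.1593017578125
m2 = 2523 / 4096 * 128     # 78.84375
c1 = 3424 / 4096           # 0.8359375
c2 = 2413 / 4096 * 32      # 18.8515625
c3 = 2392 / 4096 * 32      # 18.6875

def pq_to_nits(code, bit_depth=10):
    """Convert an integer code value to absolute luminance in nits."""
    e = code / (2 ** bit_depth - 1)                 # normalize to 0.0-1.0
    ep = e ** (1 / m2)
    y = (max(ep - c1, 0.0) / (c2 - c3 * ep)) ** (1 / m1)
    return 10000 * y                                # absolute luminance

print(round(pq_to_nits(520)))    # code 520 lands at ~100 nits
print(round(pq_to_nits(1023)))   # top code is 10,000 nits

# Roughly half of all 1024 codes describe the 0-100 nit range:
print(sum(pq_to_nits(c) <= 100 for c in range(1024)))   # ~520 codes
```

This is why the distribution is so lopsided on the graph: the PQ curve deliberately spends its precision where the bulk of the picture lives.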
So a piece of paper on a table on an overcast day reflects back less light than the same piece of paper on a very sunny day; this additional data can be used to display that difference more accurately.
Likewise, in SDR land you might use the same white to represent a piece of paper and the sun, which of course is nothing like real life.
This distribution of the HDR code values is also why colour banding can still exist: there are only about twice as many code values covering the bulk of the image as in 8-bit SDR, so banding is roughly halved rather than eliminated.
This graph shows the difference between an SDR image and a 10,000 nit HDR image. As you can see, the bulk of what makes up the picture falls within that 100 nit range.
Obviously the problem is if the TV physically can't deliver that brightness, how should it handle it?
The metadata sent alongside HDR content helps the TV understand how the SDR image would look (it's often created by running the SDR and HDR grades through software that compares them). So in a situation where the TV can't display the information it is being presented with, it knows how to make the image look correct and can map the brightness down to a level it can actually display. The goal is to make use of the additional data and create an image with more dynamic range than the SDR version would have had - extended dynamic range, if you like.
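The actual mapping curves are proprietary, but the idea can be sketched with a toy highlight roll-off: leave the bulk of the image untouched and compress only the highlights into the panel's range. This is purely illustrative - not Dolby's or any manufacturer's algorithm - and the 1000 nit content peak / 650 nit display peak numbers are just examples:

```python
def tone_map(nits, content_peak=1000.0, display_peak=650.0):
    """Toy tone map: pass through everything below a knee point, then
    smoothly compress [knee, content_peak] into [knee, display_peak]."""
    knee = 0.75 * display_peak           # roll-off starts at 487.5 nits here
    if nits <= knee:
        return nits                      # the bulk of the image is untouched
    # Normalize the highlight range and apply a simple rational curve so
    # bright detail is compressed rather than clipped away.
    x = (nits - knee) / (content_peak - knee)
    return knee + (display_peak - knee) * (2 * x / (1 + x))

# Two distinct highlights remain distinct, and the content peak lands
# exactly on the display peak:
print(tone_map(700), tone_map(950), tone_map(1000))
```

The key property is that two highlights at different brightnesses stay distinguishable after mapping, which is exactly what hard clipping fails to do.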
Both HDR10 and DV give this information. The primary difference between the two is that DV can give more metadata to make the tone mapping algorithm more granular (frame by frame or shot by shot; HDR10+ will do this too), and DV has a preset set of rules for how you tone map downwards. With HDR10+ those rules are set by the manufacturer of the TV and may even be unique to the model. Ultimately they are trying to achieve the same thing: interpret data that cannot be displayed in a way that looks right, within the constraints of what the hardware can display.
Dolby Vision is the theatrical standard because it is more predictable (the TV manufacturer's own interpretation is taken out of the equation).
So in the absence of metadata, the TV will simply stop trying to display highlights beyond what it can reach: on a 650 nit TV, anything between 650 nits and 1000 nits will display at 650 nits, which results in lost detail.
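That clipping behaviour is trivial to illustrate (the 650 nit panel is just an example figure):

```python
# Hard clipping: without metadata, every highlight above the panel's
# peak lands at the same output value, so the detail between them is lost.
def clip(nits, display_peak=650):
    return min(nits, display_peak)

# Two distinct highlights in the grade become identical on screen:
print(clip(700), clip(950))   # both come out at 650
print(clip(400))              # anything within range passes through
```

Compare that with tone mapping, where those two highlights would still come out at different brightnesses.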
This becomes less of an issue as TVs get better and better; you can already get TVs that approach 2,000 nits, so the chances of a major perceivable loss of detail become smaller and smaller.
So in a nutshell: metadata is only useful when you have a TV that can't hit the nit values of the content. As time goes by, TVs will get better and the need for metadata will approach zero.
At which point Dolby will remove the 4,000 nit cap and push it up to 10,000, requiring firmware updates / new chips / new TVs...
With video games the content is created on the fly, so the game can be told to make sure the sun never becomes brighter than the maximum capability of the display, with everything else in the game that produces or reflects light working relative to that (these are the games with HDR sliders). The game operates within a fixed range set by the user, and the TV displays exactly what it is told to display. No metadata is required.
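The slider mechanism can be sketched in a few lines. The function name, the linear scale, and the scene units are all hypothetical - real engines use more elaborate curves - but the principle is the same: the user-set peak becomes the ceiling everything is rendered against:

```python
# Hypothetical sketch: a game renders in scene-referred units and scales
# its output so the brightest element never exceeds the peak the user
# set with the HDR slider. No real engine's API is being quoted here.
def scene_to_display(scene_value, scene_max, slider_peak_nits):
    """Map a scene-referred value (0..scene_max) onto 0..slider_peak_nits."""
    return min(scene_value, scene_max) / scene_max * slider_peak_nits

# The user calibrated the slider to their TV's peak (say, 800 nits):
sun = scene_to_display(16.0, scene_max=16.0, slider_peak_nits=800)
paper = scene_to_display(1.0, scene_max=16.0, slider_peak_nits=800)
print(sun, paper)   # 800.0 50.0 - everything stays within the panel's range
```

Because the sun is pinned to the slider value and everything else scales relative to it, the TV never receives a value it can't display, and no tone mapping metadata is needed.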