18-part thread, so you'll have to dive into the Tweet to read the full thing.
Skipping ahead:
Finale:
By this time, the game had come out, and all hopes that this was a weird fluke only a couple of devs would ever see were dashed, as players all over the place started reporting their companion quests failing.
Wow, I only had that bug happen to me once with Parvati. Getting off the ship fixed the failed state of her companion quest.
That's the bug: the NPC doesn't stop climbing while you're in the conversation, gets way too high above the ground, then you stop talking, the NPC gets off the ladder and falls to their death.

I don't get the explanation. Yes, no new interactions can be started while in conversation, but what about when the conversation ends?
Interaction 1: enter ladder
Conversation starts, NPC cannot exit ladder because it's a new interaction
Conversation ends
Interaction 2: sweet, let's go
There exists what is known as an event handler in games that works as a queue. Event handlers are good because they let multiple parts of the game throw all their actions into a giant list all at once, then let different modules "gobble" up each event as it goes from one handler to another. This is how things like joystick polling are done: your joystick throws an event, then the game says, like, "are we in a menu? If so, menu handler, gobble this event. If not, are we on the ground? If so, ground interaction handler, gobble this event. If not, are we in the air? If so, gobble this event," and so forth. It's done so that the queue shrinks as different modules handle the events relevant to them. Events are "gobbled" (i.e. removed from the queue), and that specific handler has code deciding what to do when it gobbles such an event.

He says that they block the event when in a conversation, which reads to me as the conversation handler "gobbling" all events regarding furniture interactions to prevent them from happening. So when a character enters a ladder, an event is put into the queue, and the "ladder handler" says, "OK, make the character start climbing up." When the character reaches the top of the ladder, it throws an event, and the "ladder handler" is supposed to say, "OK, now stop climbing."

What happens is that the "conversation handler" gobbles the event first, which it's supposed to do, and says "do nothing." So the event that causes the character to stop climbing never reaches the right handler. And apparently the event to stop climbing is only thrown when you reach the top of the ladder. Since the character never exits the ladder, it keeps climbing and never throws the event again. It just climbs infinitely, never reaching another point where it'd send another event to stop climbing. The conversation handler ate the ladder handler's lunch.
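Here's a minimal sketch of that pattern, assuming hypothetical handler and event names (this illustrates the commenter's description, not Obsidian's actual code):

```python
# The "gobble" pattern: handlers get first refusal on each event, and a
# handler that consumes an event stops it from reaching anyone after it.
from collections import deque

class Event:
    def __init__(self, kind):
        self.kind = kind  # e.g. "ENTER_LADDER", "EXIT_LADDER"

class LadderHandler:
    def handle(self, event):
        if event.kind == "ENTER_LADDER":
            print("start climbing")
            return True            # gobbled
        if event.kind == "EXIT_LADDER":
            print("stop climbing, step off")
            return True            # gobbled
        return False               # not ours, pass it along

class ConversationHandler:
    def __init__(self):
        self.in_conversation = False
    def handle(self, event):
        # While a conversation is running, swallow furniture events so the
        # NPC can't start or finish an interaction mid-dialogue.
        if self.in_conversation and event.kind in ("ENTER_LADDER", "EXIT_LADDER"):
            return True            # gobbled: "do nothing"
        return False

def dispatch(queue, handlers):
    while queue:
        event = queue.popleft()
        for handler in handlers:
            if handler.handle(event):
                break              # first handler to gobble it wins

# The bug in miniature: the conversation handler sits ahead of the ladder
# handler, so the one-shot EXIT_LADDER event fired at the top of the ladder
# is eaten and never re-sent -- the NPC climbs forever.
conversation = ConversationHandler()
handlers = [conversation, LadderHandler()]

dispatch(deque([Event("ENTER_LADDER")]), handlers)  # prints "start climbing"

conversation.in_conversation = True                 # player starts talking
dispatch(deque([Event("EXIT_LADDER")]), handlers)   # silently gobbled; "stop" never runs
```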
For important one-off events, we therefore also implement polling whenever possible. The event is used 99.99% of the time to make the process real-time, with a polling solution (usually once per hour or so) to catch glitches and automatically fix them. Twice the work, but simultaneously more robust and more effective.
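A rough sketch of that belt-and-braces idea, with made-up names and a game-loop-style timer (nothing here is from a real engine API):

```python
# Fast path: react to the event immediately. Slow path: a periodic poll
# that detects "impossible" state left behind by a missed event and fixes it.

class ClimbState:
    def __init__(self):
        self.on_ladder = False
        self.height = 0.0

    def on_exit_ladder_event(self):
        # Fires instantly in the 99.99% case.
        self.on_ladder = False

    def poll(self, max_ladder_height):
        # If the event above was ever swallowed, this catches the
        # impossible state and auto-repairs it.
        if self.on_ladder and self.height > max_ladder_height:
            self.on_ladder = False

class PollingWatchdog:
    def __init__(self, state, interval=5.0):
        self.state = state
        self.interval = interval   # seconds between polls
        self.elapsed = 0.0

    def update(self, dt):
        self.elapsed += dt
        if self.elapsed >= self.interval:
            self.elapsed = 0.0
            self.state.poll(max_ladder_height=10.0)

state = ClimbState()
state.on_ladder, state.height = True, 40.0  # the stuck, ever-climbing NPC
watchdog = PollingWatchdog(state)
watchdog.update(dt=5.0)                     # next poll tick repairs it
print(state.on_ladder)                      # False
```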
Debugging is nothing simple at all. Anytime someone says "lazy devs" when talking about bugs, I want to punch my computer monitor.
People don't expect games to ship bug-free, especially huge ones; I don't remember seeing any complaints about the state OW shipped in. There are still other games that shouldn't have been shipped when they did, and the first thing most people blame is the publisher not giving QA enough time, rather than the devs.

Every time someone says "Did they not have QA?" or "How did QA pass this?", I wanna strangle people.
and the first thing most people blame is the publisher not giving QA enough time, rather than the devs.
Dunno, I see it thrown around a lot (mostly stupidly) at other things like graphics, shitty animation, missing Pokémon, etc. But rarely at bugs; it's always QA and publishers that get blamed for that!
I spend most of my time debugging software; I can only imagine what kind of hell it is to debug such a massive game, with many edge cases like this one.
As someone with even just some basic programming experience, I empathize. Debugging is a fucking nightmare.

Several years ago, I wrote a GPU-accelerated tilemap system using OpenGL shaders. I wanted the entire rendering process to happen on the GPU, outside of the CPU. The concept was a simple atlas system. I would have tiles that were 8x8 pixels in a large tilemap in VRAM, then pass a second texture called a block map. The block map was a series of what looked like random pixels. Each pixel would be point-sampled by the GPU iterating through, left to right, top to bottom. The red subchannel of the pixel would be a lookup-table value used to draw the appropriate tile. So, pixel 1 on the block map might return a red color value of "1", meaning it would draw the first 8x8 tile in the tilemap. Pixel 2 might return a red color value of "3", which meant it would draw the third 8x8 tile in the tilemap. So far, so good. I would repeat this step: 8x8 tiles -> 64x64 blocks -> 256x256 chunks -> 1024x1024 level segments. This let me build extremely large levels while using just a tiny amount of VRAM; I could swap level segments in and out to create huge, seemingly endless levels with no breaks in between them.

It worked fine on AMD and Nvidia GPUs, and seemingly most Intel GPUs. However, one day, while testing on a random integrated Intel GPU core, I started noticing black lines running through my maps. Now, this is sometimes a common error caused by floating-point division precision where you hit a fringe area. The way the GPU reads pixels is through a process known as point sampling: you tell the GPU an area of a texture in memory to look at, but not in precise terms like "look at the pixel at location 1,1". The way sampling works on GPUs is that you feed in a ratio and tell it to look at the sample there, i.e. look "1%" from the top and "1%" down. This is commonly known as UV sampling. I thought the error might be a weird rounding error, but I was stumped as to why it would only present on a single type of integrated graphics core, and only Intel.
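For illustration, here's a CPU-side Python sketch of the lookup the shader performs; the real thing ran in a fragment shader with normalized UV coordinates, and all the names and data here are made up:

```python
# Block-map indirection: each block-map texel's red channel names which
# 8x8 tile of the atlas to draw at that spot on screen.

TILE = 8  # tiles are 8x8 pixels

def sample_tilemap(frag_x, frag_y, block_map, atlas):
    # Which block-map texel governs this screen pixel?
    bx, by = frag_x // TILE, frag_y // TILE
    tile_index = block_map[by][bx][0]        # red channel = tile number

    # Where does that tile live in the atlas (tiles laid out in a row here)?
    # A real shader would turn this into normalized UVs aimed at texel
    # centers, e.g. u = (atlas_x + 0.5) / atlas_width.
    atlas_x = tile_index * TILE + frag_x % TILE
    atlas_y = frag_y % TILE
    return atlas[atlas_y][atlas_x]

# Tiny demo: tile 0 is all 0s, tile 1 is all 1s, laid out in a row.
atlas = [[0] * TILE + [1] * TILE for _ in range(TILE)]
block_map = [[(1, 0, 0)]]                    # one block: red says "tile 1"
print(sample_tilemap(3, 3, block_map, atlas))  # 1
```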
MONTHS of testing to fix this problem. I consulted many, many graphics engineers, people from Blizzard, Valve, etc. People were honestly stumped. Eventually, I stumbled upon the answer: it's a very, very weird bug involving a very, very specific driver for an uncommon Intel graphics core, where point sampling is off by a small percentage. The first thing everyone suggested was "aim for the center of the pixel," which is what I'd been doing. I'd sample at non-whole-number increments, i.e. not looking at the pixel at (1,1), but rather (0.5, 0.5), but it wouldn't fix the problem.

Normally, when hunting down bugs like this on the CPU, you can use all sorts of debugging tools, like gdb, to stop execution and look at memory. But GPUs don't work like that; they are basically separate machines running side by side with your computer. You, the programmer, control the CPU and send commands to the GPU, but the GPU is its own box. Now, there ARE tools that let you debug the GPU, like RenderDoc, but this specific Intel driver would crash RenderDoc, so it was useless. The only way I could debug the GPU was guerrilla methods. When you have no debug tools available on the CPU side of things, you usually resort to placing "printf" statements throughout your code, a command which makes your program output text to a console. You don't have printf on your GPU. The best equivalent is drawing single pixels to the screen. So I would draw a pixel to the screen whose color channels would represent values I wanted to test. If I wanted to see what I was sampling at a certain point in detail, I would have to relay that data as a series of pixels, take a screenshot, then open it up in GIMP and read the colors with the color-picker tool to look at the numbers hidden inside.
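As a rough illustration of that "printf via pixels" trick, here's how a value might be packed into a pixel's 8-bit color channels and decoded later from a screenshot's color readout (hypothetical helper names, not the commenter's actual code):

```python
# Pack a non-negative float (here < 256) into 24 bits spread across the
# R/G/B channels as base-256 "digits", then decode it back from the color.

def encode_debug_pixel(value, scale=65536.0):
    v = int(value * scale) & 0xFFFFFF
    return ((v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF)

def decode_debug_pixel(rgb, scale=65536.0):
    r, g, b = rgb
    return ((r << 16) | (g << 8) | b) / scale

uv = 0.7503                     # e.g. the sampling coordinate under suspicion
rgb = encode_debug_pixel(uv)    # draw a pixel with this color, screenshot it
print(rgb, decode_debug_pixel(rgb))  # (0, 192, 19) -> ~0.7503
```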
Long story short, the sampling position on this driver was off by 0.25 in both directions. If I sampled at (0.5, 0.5), it would think I was sampling at (0.75, 0.75). This was juuuust enough, in very weird situations, to hit what is known as the diamond-exit rule. So even though my math worked on other drivers, it wouldn't work right on that specific Intel driver. And while newer Intel drivers weren't subject to this error, not all older Intel integrated GPUs could run the newer driver.
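A toy illustration of why a 0.25 shift matters (the numbers are made up to mirror the comment's description): a sample aimed safely at a texel center stays correct, but one that scaling math has pushed near a texel's edge gets bumped into the neighboring texel, i.e. the first row of the next tile, which shows up as a dark seam:

```python
# Point sampling truncates to the texel containing the sample position;
# a buggy driver silently adding 0.25 moves edge-hugging samples over.

def sampled_texel(x, driver_offset=0.0):
    return int(x + driver_offset)

for x in [0.5, 3.5, 7.5, 7.9]:   # 7.9: a sample pushed near a texel edge
    print(x, sampled_texel(x), sampled_texel(x, driver_offset=0.25))

# 0.5, 3.5, 7.5 hit the same texel either way, but 7.9 reads texel 7
# normally and texel 8 on the buggy driver -- the neighboring tile.
```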
So, after months of figuring out the bug, the solution was... it was unsolvable. Drawing maps this way would have required a custom shader for that specific graphics core. I said fuck that and instead wrote a software rasterizer from scratch and presented it as an option, because if a bug like that could exist in one driver, it could exist in multiple drivers, and I might never catch them all. So, in case of extreme fringe errors, I provide a software fallback. After all my debugging and error finding, I still had to basically write an entirely separate graphics module to solve it.
This was for a "simple" 2D game.
Debugging is nothing simple at all. Anytime someone says "lazy devs" when talking about bugs, I want to punch my computer monitor.