ratrosaw, Administrator, Jun 5, 2018, #1

Hi! #tech team here.

Many members have inquired about the two recent service interruptions we experienced after last Thursday's server upgrade. With ResetEra's first E3 coming in less than two weeks, we wanted to detail our upgrade plan a little bit more and explain the reasons behind the technical difficulties. Hopefully this will address the common concerns.

About the Server Upgrade

We are always looking for ways to improve ResetEra's performance and reliability, with a focus on dealing with traffic fluctuations during major events. So far we have completed two major architectural upgrades to ResetEra, migrating the site from one single server node to a multi-machine cluster. The architecture, very loosely speaking, works as follows:

As you can see from the diagram, ResetEra consists of two parts, an app server part, for serving all the requests from browsers, and a database part, for storing the data. Our February upgrade was designed to make the database backend automatically scalable, while the update last week grants the app server part the ability to automatically scale itself. More specifically, the app server cluster is now capable of monitoring its own usage, and will allocate/deallocate resources automatically in response to fluctuations in traffic. The allocation/deallocation process happens behind the scenes and will not interrupt your browsing experience at all, which we believe is a perfect solution for our E3 challenge.

That all sounds great, so why did the service interruptions happen?

Scalable though it is, the database server cluster, due to some technical restrictions, cannot resize itself in the same way that the app server cluster does. It is true that we can add/remove resources to/from the database backend at any time; however, every time we add resources to the database backend, it takes some time to replicate the data.
As a result, we still need to carefully plan how many resources to allocate and prepare accordingly beforehand. Unfortunately (or fortunately, depending on how you look at it), with last week's migration we underestimated ResetEra's growth over the past 7 months and the scale of pre-E3 traffic. During peak hours, the app server cluster scaled as expected, yet the database backend got overloaded. Failing to get a response from the database backend, the app server cluster became unresponsive as well. Our auto-healing system then kicked in and tried to solve the problem by rebooting the app servers, hence the on-and-off behavior many members experienced.

We apologize for the oversight. It took some time for us to adjust to the new architecture, and we have learned a lot from the two incidents. More resources have been allocated to the database backend and we have updated our provisioning plan for E3. Although it is impossible to promise that ResetEra will never go down at any point during E3, we do want to assure you that we are prepared.

Sincerely,
#tech

P.S. We are aware that some members are experiencing random logout issues. A fix for this will be applied today. Until that fix is online, you can avoid the problem by making sure that you are visiting www.resetera.com instead of resetera.com.
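
For anyone curious about the difference between the two scaling styles described in this post, here is a rough Python sketch. It is not ResetEra's actual setup: the function names, thresholds, and traffic numbers are all hypothetical and chosen only for illustration. The first function sizes the app server tier from the load observed right now, which can be re-evaluated at any moment. The second sizes the database tier ahead of time from a traffic forecast plus headroom, since newly added database nodes need time to replicate data before they can take load.

```python
import math


def desired_app_nodes(requests_per_sec: float, capacity_per_node: float,
                      min_nodes: int = 2, max_nodes: int = 20) -> int:
    """Reactive sizing (hypothetical): pick a node count from current load."""
    needed = math.ceil(requests_per_sec / capacity_per_node)
    # Clamp to the allowed range so the cluster never shrinks to zero
    # or grows without bound.
    return max(min_nodes, min(max_nodes, needed))


def planned_db_nodes(forecast_peak_rps: float, capacity_per_node: float,
                     headroom: float = 0.5, min_nodes: int = 1) -> int:
    """Ahead-of-time sizing (hypothetical): plan for forecast peak plus headroom.

    The headroom covers forecast error, because capacity cannot be added
    instantly once the peak has already arrived.
    """
    needed = math.ceil(forecast_peak_rps * (1 + headroom) / capacity_per_node)
    return max(min_nodes, needed)


if __name__ == "__main__":
    # App tier: re-evaluated continuously against observed traffic.
    print(desired_app_nodes(requests_per_sec=900, capacity_per_node=100))
    # Database tier: decided before the event from a forecast.
    print(planned_db_nodes(forecast_peak_rps=1200, capacity_per_node=400))
```

If the forecast passed to the second function is too low, the database tier ends up undersized while the app tier keeps scaling up in front of it, which is the situation described above.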