{"id":34,"date":"2024-02-13T14:03:23","date_gmt":"2024-02-13T11:03:23","guid":{"rendered":"http:\/\/caitiem.com\/?p=34"},"modified":"2024-02-13T15:28:56","modified_gmt":"2024-02-13T12:28:56","slug":"clients-are-jerks-aka-how-halo-4-dosed-the-services-at-launch-how-we-survived","status":"publish","type":"post","link":"http:\/\/www.caitiem.com\/2015\/06\/23\/clients-are-jerks-aka-how-halo-4-dosed-the-services-at-launch-how-we-survived\/","title":{"rendered":"Clients are Jerks: aka How Halo 4 DoSed the Services at Launch & How We\u00a0Survived"},"content":{"rendered":"\n
At 3am PST on November 5th, 2012, I sat fidgeting at my desk at 343 Industries, watching graphs of metrics stream across my machine. Halo 4 was officially live in New Zealand, and the number of concurrent users began to gradually increase as midnight gamers came online and began to play. Two hours later, at 5am, Australia came online and we saw another noticeable spike in concurrent users.<\/p>\n\n\n\n
With AAA video games, especially multiplayer games, week one is when you see the most concurrent users. As with blockbuster movies, large marketing campaigns, trade shows, worldwide release dates, and press all converge to create excitement around launch. Everyone wants to see the movie or play the game with their friends the first week it is out. The energy around a game launch is intoxicating. However, running the services powering that game is terrifying. There is nothing like production data, and we were about to get a lot of it over the next few days. To be precise, Halo 4 saw 4 million unique users in the first week, who racked up 31.4 million hours of gameplay.<\/p>\n\n\n\n
At midnight PST on November 6th I stood in a parking lot outside a Microsoft Store in Seattle, surrounded by 343i team members and fans who had come out to celebrate the launch with us and get the game at midnight. I checked in with the on-call team: Europe and the East Coast of the US had also come online smoothly. In addition, the real-time Cheating & Banning system I wrote in a month and a half before launch had already caught and banned 3 players with modded Xboxes in the first few hours. I was beyond thrilled. Everything was going according to plan, so after a few celebratory beers I headed back into the office to take over the graveyard shift and continue monitoring the services. The next 48 hours were critical, and likely when we would see our peak traffic.<\/p>\n\n\n\n
As the East Coast of the United States started playing Halo after work on launch day, we hit higher and higher numbers of concurrent users. Suddenly one of our APIs related to Cheating & Banning was hitting an abnormally high failure rate and starting to affect other parts of the Statistics Service. As the owner of the Halo 4 Statistics Service and the Cheating & Banning Service, I OK\u2019d throwing the kill switch on the API and then began digging in.<\/p>\n\n\n\n
The game was essentially DoSing us. We were receiving 10x the expected number of requests on this particular API, due to a bug in the client that reported suspicious activity for almost all online players. The increased number of requests caused us to blow through our IOPS limit in Azure Storage, which correctly throttled and rejected our exorbitant number of operations. This caused the request from the game to fail, and the game would then retry the request three times, creating a retry storm that only exacerbated the attack.<\/p>\n\n\n\n