2015: A Year in Review Posted on 13.02.202413.02.2024 By caiti335 2015 has been a whirlwind of a year, which started off in a new city, with a new job as the Tech Lead of Observability at Twitter. The year was full of travel spanning 10 states, 3 different countries, and 2 continents. This year I also had numerous opportunities to share my experiences with programming and distributed systems via, talks, blog posts, podcasts, and articles. Below is the recap. Articles Clients are Jerks: aka how Halo 4 DoSed the Services at Launch and How We Survived in Caitiem.com [June 23rd 2015 ] The Verification of a Distributed System in ACM Queue [Nov/Dec 2015] Interviews & Podcasts Caitie McCaffrey on Scaling Halo 4 Services, the Orleans Actor Framework, Distributed Programming on InfoQ [April 24th 2015] Taming Distributed Architectures with Caitie McCaffrey on Software Engineering Daily [September 10th 2015] Programming Committees Taming Distributed Architectures Track for QconSF [Nov 18th 2015] Talks Orleans: A Framework for Cloud Computing Presented at Papers We Love SF: Video & Slides [February 19th 2015] Abstract Caitie McCaffrey stops by and talks about the Orleans: Distributed Virtual Actors for Programmability and Scalability paper by Bernstein, Bykov, Geller, Kliot, and Thelin. Orleans is a runtime and programming model for building scalable distributed systems, based on the actor model. The Orleans programming model introduces the abstraction of Virtual Actors. Orleans allows applications to obtain high performance, reliability, and scalability. This technology was developed by the eXtreme Computing Group at Microsoft Research and was a core component of the Azure Services that supported that powered Halo 4, the award winning video game. Building the Halo 4 Services with Orleans Abstract Halo 4 is a first-person shooter on the Xbox 360, with fast-paced, competitive gameplay. To complement the code on disc, a set of services were developed to store player statistics, display player presence information, deliver daily challenges, modify playlists, catch cheaters and more. As of June 2013 Halo 4 had 11.6 million players, who played 1.5 billion games, logging 270 million hours of gameplay. Orleans, Distributed Virtual Actors for Programmability & Scalability, is an actor framework & runtime for building high scale distributed systems. It came from the eXtreme computing group in Microsoft Research, and is now Open Source on Github. For Halo 4, 343 Industries built and deployed a new set of services built from the ground up to support high demand, low latency, and high availability using using Orleans and running in Window Azure. This talk will do an overview of Orleans, the challenges faced when building the Halo 4 services, and why the Actor Model and Orleans in particular were utilized to solve these problems. Architecting & Launching the Halo 4 Services Presented as the Closing Keynote of SRECon15: Video & Slides [March 17th 2015] Abstract The Halo 4 services were built from the ground up to support high demand, low latency, and high availability. In addition, video games have unique load patterns where the majority of the traffic and sales occurs within the first few weeks after launch, making this a critical time period for the game and supporting services. Halo 4 went from 0 to 1 million users on day 1, and 4 million users within the first week. This talk will discuss the architectural challenges faced when building these services and how they were solved using Windows Azure and Project Orleans. In addition, we’ll discuss the path to production, some of the difficulties faced, and the tooling and practices that made the launch successful. The Saga Pattern Presented at Craft Conf 2015 & Goto: Chicago 2015 Video & Slides [April 23rd 2015 & May 12th 2015] Abstract As we build larger more complex applications and solutions that need to do collaborative processing the traditional ACID transaction model using coordinated 2-phase commit is often no longer suitable. More frequently we have long lived transactions or must act upon resources distributed across various locations and trust boundaries. The Saga Pattern is a useful model for long lived activities and distributed transactions without coordination. Sagas split work into a set of transactions whose effects can be reversed even after the work has been performed or committed. If a failure occurs compensating transactions are performed to rollback the work. So at its core the Saga is a failure Management Pattern, making it particularly applicable to distributed systems. In this talk, I’ll discuss the fundamentals of the Saga Pattern, and how it can be applied to your systems. In addition we’ll discuss how the Halo 4 Services successfully made use of the Saga Pattern when processing game statistics, and how we implemented it in production. Scaling Stateful Services Presented at StrangeLoop 2015 Video & Slides [September 25th 2015] This talk was incredibly well received, and I was flattered to see write-ups of it featured in High Scalability and InfoQ Abstract The Stateless Service design principle has become ubiquitous in the tech industry for creating horizontally scalable services. However our applications do have state, we just have moved all of it to caches and databases. Today as applications are becoming more data intensive and request latencies are expected to be incredibly low, we’d like the benefits of stateful services, like data locality and sticky consistency. In this talk I will address the benefits of stateful services, how to build them so that they scale, and discuss projects from Halo and Twitter of highly distributed and scalable services that implement these techniques successfully.d On the Order of Billions Abstract Every minute Twitter’s Observability stack processes 2+ billion metrics in order to provide Visibility into Twitter’s distributed microservices architecture. This talk will focus on some of the challenges associated with building and running this large scale distributed system. We will also focus on lessons learned and how to build services that scale that are applicable for services of any size. So We Hear You Like Papers Presented as the Evening Keynote at QconSF with Ines Sombra: Video, Slides, Resources, & Moment [November 16th 2015] Abstract Surprisingly enough academic papers can be interesting and very relevant to the work we do as computer science practitioners. Papers come in many kinds/ areas of focus and sometimes finding the right one can be difficult. But when you do, it can radically change your perspective and introduce you to new ideas. Distributed Systems has been an active area of research since the 1960s, and many of the problems we face today in our industry have already had solutions proposed, and have inspired new research. Join us for a guided tour of papers from past and present research that have reshaped the way we think about building large scale distributed systems. Tech Insights
Origin Story: Becoming a Game Developer Posted on 13.02.202413.02.2024 Over the past few weeks I have been asked over a dozen times how I got into the Games Industry, so I thought I would write it down. TLDR; My first Console was a SNES. I learned to program in High School. I attended Cornell University and got a B.S…. Read More
Recommended Engineering Management Books Posted on 13.02.202413.02.2024 Over the past 3.5 years my career has grown and transformed from Individual Contributor (IC) to an Engineering Manager of multiple teams, and all the roles in between as I built the Azure Sphere Security Services (AS3) Team from 2 people to 20 people. I undertook this journey in the… Read More
2017 a Year in Review Posted on 13.02.202413.02.2024 2017 was a year of change, personal and professional. I started the year in San Francisco, working at Twitter as an Individual Contributor, and in a long term relationship. I ended the year in Seattle, working at Microsoft Research as a Lead, sans that long term relationship, and a brand… Read More