2015: A Year in Review

2015 has been a whirlwind of a year, which started off in a new city, with a new job as the Tech Lead of  Observability at Twitter.  The year was full of travel spanning 10 states, 3 different countries, and 2 continents.  This year I also had numerous opportunities to share my experiences with programming and distributed systems via, talks, blog posts, podcasts, and articles.  Below is the recap.


Interviews & Podcasts

Programming Committees


Orleans: A Framework for Cloud Computing

Presented at Papers We Love SF: Video & Slides [February 19th 2015]


Caitie McCaffrey stops by and talks about the Orleans: Distributed Virtual Actors for Programmability and Scalability paper by Bernstein, Bykov, Geller, Kliot, and Thelin.

Orleans is a runtime and programming model for building scalable distributed systems, based on the actor model.  The Orleans programming model introduces the abstraction of Virtual Actors.  Orleans allows applications to obtain high performance, reliability, and scalability.  This technology was developed by the eXtreme Computing Group at Microsoft Research and was a core component of the Azure Services that supported that powered Halo 4, the award winning video game.

Distributed Systems Track at Goto Chicago: Neha Narula, Caitie McCaffrey, Chris Meiklejohn, Kyle Kingsbury

Building the Halo 4 Services with Orleans

Presented at Qcon London: Video & Slides [March 5th 2015]


Halo 4 is a first-person shooter on the Xbox 360, with fast-paced, competitive gameplay. To complement the code on disc, a set of services were developed to store player statistics, display player presence information, deliver daily challenges, modify playlists, catch cheaters and more. As of June 2013 Halo 4 had 11.6 million players, who played 1.5 billion games, logging 270 million hours of gameplay.

Orleans, Distributed Virtual Actors for Programmability & Scalability, is an actor framework & runtime for building high scale distributed systems. It came from the eXtreme computing group in Microsoft Research, and is now Open Source on Github.

For Halo 4, 343 Industries built and deployed a new set of services built from the ground up to support high demand, low latency, and high availability using using Orleans and running in Window Azure. This talk will do an overview of Orleans, the challenges faced when building the Halo 4 services, and why the Actor Model and Orleans in particular were utilized to solve these problems.

Architecting & Launching the Halo 4 Services

Presented as the Closing Keynote of SRECon15: VideoSlides [March 17th 2015]


The Halo 4 services were built from the ground up to support high demand, low latency, and high availability.  In addition, video games have unique load patterns where the majority of the traffic and sales occurs within the first few weeks after launch, making this a critical time period for the game and supporting services. Halo 4 went from 0 to 1 million users on day 1, and 4 million users within the first week.

This talk will discuss the architectural challenges faced when building these services and how they were solved using Windows Azure and Project Orleans. In addition, we’ll discuss the path to production, some of the difficulties faced, and the tooling and practices that made the launch successful.

On stage during Strange Loop 2015 at the Peabody Opera House

The Saga Pattern

Presented at Craft Conf 2015 & Goto: Chicago 2015 Video & Slides [April 23rd 2015 & May 12th 2015]


As we build larger more complex applications and solutions that need to do collaborative processing the traditional ACID transaction model using coordinated 2-phase commit is often no longer suitable. More frequently we have long lived transactions or must act upon resources distributed across various locations and trust boundaries. The Saga Pattern is a useful model for long lived activities and distributed transactions without coordination.

Sagas split work into a set of transactions whose effects can be reversed even after the work has been performed or committed. If a failure occurs compensating transactions are performed to rollback the work. So at its core the Saga is a failure Management Pattern, making it particularly applicable to distributed systems.

In this talk, I’ll discuss the fundamentals of the Saga Pattern, and how it can be applied to your systems. In addition we’ll discuss how the Halo 4 Services successfully made use of the Saga Pattern when processing game statistics, and how we implemented it in production.

Scaling Stateful Services

Presented at StrangeLoop 2015 Video & Slides [September 25th 2015]

This talk was incredibly well received, and I was flattered to see write-ups of it featured in High Scalability and InfoQ


The Stateless Service design principle has become ubiquitous in the tech industry for creating horizontally scalable services. However our applications do have state, we just have moved all of it to caches and databases. Today as applications are becoming more data intensive and request latencies are expected to be incredibly low, we’d like the benefits of stateful services, like data locality and sticky consistency. In this talk I will address the benefits of stateful services, how to build them so that they scale, and discuss projects from Halo and Twitter of highly distributed and scalable services that implement these techniques successfully.

Ines Sombra & Caitie McCaffrey’s Evening Keynote  at QconSF

On the Order of Billions

Presented at Twitter Flight: Video & Slides [October 21st 2015]


Every minute Twitter’s Observability stack processes 2+ billion metrics in order to provide Visibility into Twitter’s distributed microservices architecture. This talk will focus on some of the challenges associated with building and running this large scale distributed system. We will also focus on lessons learned and how to build services that scale that are applicable for services of any size.

So We Hear You Like Papers

Presented as the Evening Keynote at QconSF with Ines Sombra: Video, Slides, Resources, & Moment [November 16th 2015]


Surprisingly enough academic papers can be interesting and very relevant to the work we do as computer science practitioners. Papers come in many kinds/ areas of focus and sometimes finding the right one can be difficult. But when you do, it can radically change your perspective and introduce you to new ideas.

Distributed Systems has been an active area of research since the 1960s, and many of the problems we face today in our industry have already had solutions proposed, and have inspired new research. Join us for a guided tour of papers from past and present research that have reshaped the way we think about building large scale distributed systems.


Comments are closed.