2020 a Year in Review

Even writing this feels weird, but so many of the year’s rituals have been upended or discarded that it seemed necessary to continue with one that I can easily do on my own, so here we go. 2020 presented a lot of challenges; however, I was incredibly fortunate and privileged to have been able to work from home this whole year, stay in touch with family and friends via Zoom, Teams, and FaceTime, and escape into the outdoors often to recharge.


The little Microsoft Research (MSR) team I joined in the summer of 2017 continues to grow and hit major milestones. On February 24th, 2020, Azure Sphere became a Generally Available product.

I also appeared on the Azure Friday show talking about End-to-end IoT device security with Azure Sphere, and on the Internet of Things Show to give an Introduction to the Azure Sphere Security Service. Both of these were recorded pre-Covid, hence the lack of social distancing and masks.

My team, the Azure Sphere Security Services Team, continued to grow, hiring 8 new developers and 2 new PMs. I couldn’t be more proud of this team for its resilience, collaboration, and willingness to tackle challenges while caring for each other.


I wrote two blog posts in 2020.

I also wrote two essays for the book 97 Things Every SRE Should Know:

  • On-Call Health: The Metric You Could Be Measuring
  • Helping Leaders Prioritize On-Call Health


In February I spent a long weekend on Oahu, HI playing ultimate frisbee with my women’s masters team Mint at the Kaimana Klassik. The sun, sand, and surf were a welcome change from gray, rainy Seattle. We played on fields lined with palm trees and flanked by mountains in the distance. There was a considerable amount of wind, which made moving the disc on offense a struggle at times, but overall spirits were high. So high that we won the tournament’s team Spirit Award, a highly coveted prize!

I fondly remember one of the evenings sitting around playing cards, laughing so hard that my stomach hurt and my cheeks ached from smiling. The sport of ultimate frisbee and its community continue to bring me an immense amount of joy. Even during Covid, when we could not play on the fields together, we found ways to support each other remotely, and took up a new form of virtual competition via weekly Zoom Trivia. Team ConfineMINT has had several strong showings this year.

Covid kept me grounded and confined to Washington state for the vast majority of the year, as all travel got indefinitely postponed.  Thankfully Washington state has so much beauty and variety to offer, that once I got over my initial angst at having to cancel highly anticipated trips, I felt an immense amount of gratitude to live in such a stunning locale.

I relied heavily on escapes into nature this summer to reset and recharge from the constantly online life I was now living. From July through November I was perpetually escaping into the backcountry, hiking over 150 miles, many of them carrying a 30+ pound pack. We hiked to alpine lakes, watched sunsets at fire lookouts, and camped on top of ridgelines. The days and nights I spent in the backcountry this year were some of the best parts of 2020.

I started a public Instagram account, IntrepidTechie, to practice and share my photography and post pictures from my explorations. Feel free to follow along.


I read 36 books this year. As I often did as a teenager, I escaped into books as a way to explore while stuck in a particular spot. Some favorites by genre include:

  • Memoir – Thirst: 2600 Miles to Home: Tells the story of Heather Anderson’s Fastest Known Time (FKT) record attempt on the Pacific Crest Trail. Anderson is a masterful storyteller, and this was a gripping account of her attempt. I did not know the outcome when I picked up the book, and found myself rooting and cheering for her as the book progressed, and feeling waves of elation when she reached the northern terminus.
  • Fiction – Disappearing Earth: Julia Phillips weaves a tale of two girls in northeastern Russia who go missing. The story is told through a series of chapters, each told from a different perspective and exploring a different set of characters on the Siberian peninsula. Each chapter is a masterful short story in and of itself, with remote and rugged vistas and nuanced characters. Phillips deftly links all the stories together in the final chapter in a surprising, satisfying conclusion.
  • Non Fiction – The End of Everything (Astrophysically Speaking): Katie Mack is an astrophysicist and describes the leading theories on how the universe will end. This book is exceptionally well written, and Mack clearly and passionately describes the physics, in simplified terms, to take the reader on a journey from big bang to universe destruction. I found myself giddy reading this book, remembering my summer working at Stanford’s Linear Accelerator Center as an intern. There I heard about quantum physics and dark matter for the first time, and they broke and stretched my brain simultaneously. Mack’s book brought back all the excitement, wonder, and awe I felt that summer, along with new knowledge and feelings of amazement at the workings and mystery of the universe around us.

What’s Next

As 2020 comes to a close and 2021 begins, a lot will still not be back to normal, so it’s difficult to say what’s next. I am eagerly anticipating the return of ultimate frisbee, hopefully at some point in 2021. I am also looking forward to next summer and backpacking, as my list of hikes and wilderness explorations continues to grow.

Upon reflection, the opportunity of 2020 was a lesson in resilience. I learned to continue, to change, to move forward; to practice not forming attachments and living in the present, even as we desire so feverishly to get back to that fictional normal state. It gave me the opportunity to practice gratitude even when it was challenging. There were plenty of opportunities to feel the angst, the despair, the discomfort; to acknowledge it, feel it fully, and then figure out how to move past it. With all the challenges it presented, it gave me a chance to take stock, to think, and to reflect on what’s important.

So with all that said I know that 2021 will come with its own set of challenges and opportunities to explore and experience.

Recommended Engineering Management Books

Over the past 3.5 years my career has grown and transformed from Individual Contributor (IC) to an Engineering Manager of multiple teams, and all the roles in between, as I built the Azure Sphere Security Services (AS3) Team from 2 people to 20 people. I undertook this journey in the summer of 2017 to help transform a Microsoft Research project, Project Sopris, into a Generally Available (GA) product, Azure Sphere.

As the AS3 Team grew, I went through a massive amount of personal growth and learning as well.  In December 2017 I stepped into a manager role for the first time in my career.  I had been a professional software engineer for over a decade at this point, and was up for a totally new challenge.  I knew that the skills and job of growing and managing a team and then an organization were totally different than what I had been actively developing over the last decade of my career.  As I went through this period of growth I was lucky enough to have several friends, coaches, and mentors share their experiences with me, and recommend some great books to help me along my learning journey. 

Below is my curated list of the most influential and impactful books that helped me along the way, and that I highly recommend to Engineering Managers.

The Manager’s Path: A Guide for Tech Leaders Navigating Growth & Change by Camille Fournier

This is a must read for anyone considering managing engineering teams; it is as good as or better than the hype surrounding it. The book starts off from the individual contributor perspective, and then each subsequent chapter explores the next level of management complexity. Each chapter focuses on an engineering management role, like technical lead, manager of people, manager of multiple teams, or manager of managers.

Over the past 3.5 years I’ve come back to this book several times, re-reading the chapters around my newest role. For instance, when I transitioned into a manager of managers role, I re-read that chapter and the ones before and after it. These were great reminders of what I should be focused on, what challenges my directs were facing, and what challenges and motivations my boss was focused on.

One of the sections I really loved was on debugging dysfunctional teams.  As an engineer I am a great debugger, often able to piece together logs, metrics, and weird behavior to diagnose what’s going wrong in a system.  This was the first time I had seen the term debugging applied to people and organizations, and it helped frame the work of how to start investigating dysfunction in a team, and how to discover the source of the problem. 

Honestly I recommend this book to every engineer, solely for the chapter on how to be managed, and what you should and can expect from your manager.  Even if you never plan on taking on a management role this book provides an excellent overview of the challenges and motivations of folks in varying levels of an engineering organization and will help you better navigate your org. 

Thanks for the Feedback by Douglas Stone & Sheila Heen

Prior to managing I had given peer feedback as part of performance reviews at various companies, but I did not consider this a strength of mine. Now giving feedback was a critical skill for my role as a manager. Early on I sought out several resources on how to give and receive feedback well, and found “Thanks for the Feedback” to be a tremendous resource.

Thanks for the Feedback is framed as a resource for receiving and processing the multitudes of feedback you receive. However, anyone who reads this book will also learn how to be a more skilled feedback giver.

I highly recommend reading the book in its entirety, but I wanted to share one of the fundamental ideas in this book that resonated with me.  Feedback is really three different things, with three different purposes:

  • Appreciation: the goal is to help people feel appreciated. Sometimes they may just need motivation and encouragement to keep going when tackling tough problems.
  • Coaching: the goal is to help the receiver expand their knowledge, sharpen skills, and increase confidence.
  • Evaluation: the goal is to rate or rank the receiver’s work against a set of standards to align expectations and inform decision making.

This idea that feedback serves a variety of different purposes was eye opening, and helped me think more clearly about how and when to provide feedback. Taking this perspective along with the multitude of lessons learned from this book was tremendously helpful to me.

The Hard Thing About Hard Things: Building a Business When There are No Easy Answers by Ben Horowitz

The book starts off acknowledging a problem with most management books: “they attempt to provide a recipe for challenges that have no recipes.  There’s no recipe for really complicated, dynamic situations.” This instantly resonated with me as I was reading it in November 2019. I had been a manager for about two years, and had grown the AS3 team from 2 to 15 people. We’d spent the past two years turning a research project into what would soon be a GA’d product in February 2020. There is no recipe for how to do this; what I was doing was a hard thing.

Horowitz packs the book with a variety of entertaining stories from his career as a venture capitalist and entrepreneur to emphasize his message and points on leadership and various challenges.  In addition there are a few key takeaways that have stayed with me.

  • Take care of the people, the product, and the profits, in that order. This quote resonated with me and the kind of organization I want to build and run. Chapter 5, which bears this quote as a title, goes into some of the challenges that can arise when setting out with this mission and how to overcome them, like hiring, training, and management debt.
  • Part of management is training your people. In big companies this is often neglected, as it’s seen as the role of HR or corporate to produce training programs. These can sometimes be valuable but are often overly generic. As a manager, make sure you are training your people for the specific job they are doing: things like how to work in your specific code base, what the architectural standards and guidelines are, what makes a good design doc, etc. Make sure you are intentionally designing onboarding programs, meetings, and workshops to facilitate learning.

Accelerate: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, PhD, Jez Humble, and Gene Kim

Accelerate is a summary of the research and learnings discovered by Dr. Forsgren and her colleagues from the 2014-2017 State of DevOps reports. This book is a must read for any engineering leader, as it gives you a clear outline on how to set your team up for success by investing in 24 key capabilities which will drive improvement in your team’s software delivery performance.

There are some unsurprising findings, like using version control being important to team performance, and other not so obvious ones, like focusing on continuous integration and software delivery performance positively impacting culture.

This book and the ongoing research released by Dr. Forsgren et al. have had a huge influence on the priorities of my team. We embraced Accelerate principles early on in the development of the Azure Sphere Security Services, and that investment has paid huge dividends over time.

Dare to Lead: Brave Work.  Tough Conversations.  Whole Hearts. by Brene Brown

I’m a huge fan of Brene Brown’s work as a shame and vulnerability researcher. In October 2018 she released Dare to Lead, applying her research to leadership. This quote sums up why this book is important:

“There’s an old saying that I lead by now: ‘People don’t care how much you know until they know how much you care.’” Brene Brown

This book dives into what it takes to become a brave and courageous leader. Spoiler: a lot of it involves embracing vulnerability and living authentically, something that is far easier to say than do. The book defines vulnerability as “the emotion we experience during times of uncertainty, risk, and emotional exposure…Vulnerability is not winning or losing.  It’s having the courage to show up when you can’t control the outcome.”

One section I continually return to is the definitions of the two leadership styles: armored leadership and daring leadership. Brown’s research uncovered 16 traits for each of these leadership styles, and she breaks each one of them down. I highly recommend going through these periodically and checking in on your own leadership style and where there is room for growth and improvement.

Another great exercise that Brown presents in the book is one around values.  She challenges readers to pick one to two core values, not ten or fifteen, because as Jim Collins said “If you have more than three priorities, you have no priorities.”  Having clarity around your core values allows you to live and lead more authentically. 

Overall this book is one of the best for dealing with the “soft skills” of leadership. Showing up wholeheartedly, and helping create a team that can do the same, is one of the biggest challenges of taking on a management role.

Switch: How to Change Things When Change is Hard by Chip Heath & Dan Heath

This book is an awesome resource for understanding the psychology and sociology behind what motivates people and how to be successful enacting change from small to epic scale. I loved this book as it provides real practical tips for how to enact change across a team or organization through researched principles and real world examples. 

The book starts by acknowledging that people have both emotional and rational sides, and motivating each is important for effective change. In engineering orgs we often only motivate the rational side, because the analytical brain is all important. Sometimes we fall into the trap of “motivating” people with data, reason, and education and then wonder why these initiatives fail. But here’s the thing: engineers are human beings too, and we are all motivated by and susceptible to decisions made by our emotional brain.

The authors lay out three keys to behavior change:

  • Give clear direction to reduce mental paralysis.
  • Find an emotional connection to motivate people.
  • Shape the path. Make doing the right thing easy by removing obstacles, modifying the environment, and building habits.

Atomic Habits: An Easy & Proven Way to Build Good Habits by James Clear

This book is another favorite as it provides a lot of practical advice. The premise of this book is that small improvements accumulate into remarkable results over time. In addition, building systems and processes is the most effective way to achieve your goals.

A habit is simply a behavior that has been repeated so many times it becomes automatic and therefore requires less time and energy. Clear spends the book describing research around habits and how to successfully create new good ones and break old bad ones. By leveraging habits we can create systems that more easily help us achieve our goals.

I love this book for two reasons: it’s generally applicable to many aspects of your life, and it helped me think about how to help my team establish and build good habits, or potentially break bad ones.

Design Docs, Markdown, and Git

About a year ago my software engineering team, the Azure Sphere Security Services (AS3) team, found ourselves struggling with our design document process.  So we ran an experiment, moving all our design documents to be written in Markdown, checked into Git, and reviewed via a pull request (PR). The experiment has been incredibly successful, so we’ve iterated and refined it, and have even expanded it to the broader Azure Sphere team.  The goal of this blog post is to share our process and what we learned along the way.  

Our original design doc process involved writing a Microsoft Word document and sharing it via SharePoint.  Feedback was gathered via in person reviews, doc comments, and emails. Approval was then done over email. To signal that a document was the “approved plan of record” versus “an under review draft”, we toggled a property on the document.  Users could filter documents on the SharePoint by this property to disambiguate between the two states.  

This worked fine when we were a small team with a small number of documents, but became challenging as the team grew. For context, the Azure Sphere team started out as a handful of people working in Microsoft Research and has grown rapidly over the past 3 years as we’ve gone from research project to Generally Available product.


Some specific challenges were identified via the AS3 retrospective process.  When evaluating new options we kept these pain points in mind:

  • Comments: Tracking comments across the Word document, in person reviews, and emails became cumbersome. It was hard to tell when comments had been resolved, what the resolution was, and when forward progress on the document could be made. Finally, once a doc was approved all the comments were hidden, which often lost valuable context.
  • Approval Process: It was often unclear how to get approval or who had approved a document. The Word document did not contain a section on reviewers and approvers, and as the team grew there was some ambiguity about how the approval process worked. In addition, many approved design documents did not have the property set to approved. This resulted in the SharePoint becoming a mixture of approved, under review, and abandoned documents, with no clear way to tell which state each was in.
  • Context Switching: For the Individual Contributor (IC) engineers, switching contexts and using a different tool chain, Word & SharePoint, was a barrier to writing and reviewing design docs. It added just enough friction that design docs felt like a bigger challenge than they needed to be.
  • Versioning:  Word and SharePoint do not provide a way to easily version a document.  As the team and the product grew, there was a need to update and version documents.

The Experiment

To address some of these challenges the AS3 team began writing design documents in Markdown and checking them into a new EngineeringDocs Git repo in Azure DevOps (ADO). Reviews are conducted via pull requests by adding comments, pushing changes, and then resolving comments. Approval is given by signing off on a pull request; anything in master is considered the approved plan of record. Versioning is also greatly simplified, as anyone can submit a pull request to update the document.

Single Repo vs Next to Code

One of the first decisions we made was where in the codebase design documents should live. We discussed two options:

  • Single Repo: Create a new Engineering Docs Repo where all design documents would be checked in.
  • Same Repo as Code: Design Docs should be checked into the repo for the code they implement.  

We chose to use a Single Repo for several reasons:

  • Discoverability:  a downside of having the design docs living next to the code is finding all the existing design docs became challenging.  With a quickly growing team we wanted to prioritize discoverability of design decisions to help onboard new team members quickly. 
  • Large Designs: For designs that didn’t neatly map to a single service or library it was ambiguous where these design docs should live.  For example where do design docs that span multiple microservices live?  
  • Unconstrained Design:  If an author first had to pick the code repository where the design doc would live, this initial choice would artificially constrain the design, to only consider changes to that part of the code base.  By having the design docs checked into a single docs repository this artificial constraint was eliminated from the early design process. This frees up the design document author to think about the best way to implement their feature across the system as a whole.  

Conducting a Design Review

The Azure Sphere team uses the OARP model for making decisions, so the below section describes approval and stakeholders in this context.  I recommend having a well defined decision making process and integrating whatever that is for your team into the design document process.  

Identify Reviewers and Approvers via a Pull Request

The first step in our Design process is identifying the stakeholders.  The first pull request includes the title of the Design Doc and a table listing the OARP assignments for this document.  The pull request author is always the Owner.    

This serves a few purposes:  

  • Makes it clear how the decision is being made.
  • Informs the team that this is a problem we are attempting to solve now.
  • Gives people the chance to opt out.  If you are listed as an Approver or a Reviewer, and would like to delegate your responsibility or opt out of the process you can.  
  • Gives people the chance to opt in.  By notifying the broader team via pull request, teammates not initially listed in the OARP model can request to be included in the design process if they are interested, or feel they have expertise to add.  This prevents people from feeling excluded or coming in late to the review process with a lot of additional feedback.

Once the stakeholders are all identified, the Approver approves the pull request, and the Owner checks in the pull request.
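To make the shell document concrete, here is a minimal Python sketch of what that first pull request might contain: a title plus an OARP table rendered as Markdown. The class name, function name, table layout, and the example people are all hypothetical and purely illustrative; this is not the AS3 team's actual template.

```python
from dataclasses import dataclass, field


@dataclass
class OarpAssignments:
    """Stakeholders for one design doc (hypothetical structure)."""
    owner: str
    approver: str
    reviewers: list[str] = field(default_factory=list)
    participants: list[str] = field(default_factory=list)


def render_shell_doc(title: str, oarp: OarpAssignments) -> str:
    """Return Markdown for a shell design doc: a title and an OARP table."""

    def names(people: list[str]) -> str:
        # Show a placeholder when a role has nobody assigned yet.
        return ", ".join(people) if people else "(none yet)"

    lines = [
        f"# {title}",
        "",
        "| Role | People |",
        "| --- | --- |",
        f"| Owner | {oarp.owner} |",
        f"| Approver | {oarp.approver} |",
        f"| Reviewers | {names(oarp.reviewers)} |",
        f"| Participants | {names(oarp.participants)} |",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example values are made up for illustration.
    example = OarpAssignments(owner="alice", approver="bob", reviewers=["carol", "dan"])
    print(render_shell_doc("Example Service Design", example))
```

Keeping the roles in a table at the top of the document means the opt in and opt out discussion happens as ordinary pull request comments on those few lines.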

Writing the Design Document

To author the design document the owner creates a new branch modifying the checked in shell document. It is highly recommended that input from Reviewers and Approvers is informally gathered prior to writing the document. This can be via white board sessions, chats, hallway conversations, etc. This ensures that the design review process is more collaborative, and there are few surprises during the formal review process.

Design docs are written in Markdown.  Architectural diagrams are added to the design doc by checking in images or using Mermaid.  The AS3 team often generates architectural images using Microsoft Visio.  It is highly recommended that these Visio diagrams are checked in as well for ease in modifying later.  

Once the design doc is ready for review, the engineer submits a new pull request.  All members of the OARP model are listed as reviewers on the pull request.  

Design Pull Request

Once the pull request has been submitted, design review stakeholders can read and submit feedback via comments on the pull request.  All comments must be addressed and marked as either resolved via document updates or won’t fix.  

The document can be committed to master once the Approver has approved the pull request.  This design is now considered a plan of record.
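The two merge conditions described above (every comment resolved or marked won’t fix, and the Approver signed off) can be expressed as a small check. This is a sketch over a plain in-memory model with hypothetical names; it does not call the Azure DevOps API, where branch policies would normally enforce rules like these.

```python
from dataclasses import dataclass
from enum import Enum


class CommentStatus(Enum):
    ACTIVE = "active"
    RESOLVED = "resolved"
    WONT_FIX = "won't fix"


@dataclass
class DesignPullRequest:
    """Hypothetical, simplified view of a design doc pull request."""
    approver: str
    approved_by: set[str]
    comment_statuses: list[CommentStatus]


def blocking_reasons(pr: DesignPullRequest) -> list[str]:
    """Return the reasons the design doc cannot be committed to master yet."""
    reasons = []
    # Any comment that is neither resolved nor won't fix still blocks the merge.
    open_comments = sum(
        1 for status in pr.comment_statuses if status is CommentStatus.ACTIVE
    )
    if open_comments:
        reasons.append(f"{open_comments} comment(s) still open")
    # Sign-off from other reviewers does not replace the Approver's approval.
    if pr.approver not in pr.approved_by:
        reasons.append(f"waiting on approval from {pr.approver}")
    return reasons


if __name__ == "__main__":
    pr = DesignPullRequest(
        approver="bob",
        approved_by={"carol"},
        comment_statuses=[CommentStatus.RESOLVED, CommentStatus.ACTIVE],
    )
    for reason in blocking_reasons(pr):
        print(reason)
```

An empty list means the design can become the plan of record; a non-empty list tells the Owner exactly what is still outstanding.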

Design Review Meeting

Design review meetings are not required but are often held. A meeting invite is sent out ahead of time. Owners, Approvers, and Reviewers are considered Required attendees; Participants are considered optional.

The meeting invite should be updated with a link to the pull request for the design doc to be reviewed, at least one business day prior to the meeting.  The first 10-15 minutes of the meeting are set aside for folks to read the document and add comments to the pull request if they have not done so already.  In either scenario feedback is added via comments on the pull request.  

We provide two ways for folks to review the document, ahead of time or in the meeting, to accommodate multiple working styles on the team. Some folks prefer to digest and think about a design document for a while before providing feedback; others are more comfortable providing feedback on the spot.

After the reading period the design review meeting spends time focusing on and discussing the comments. The owner takes notes and records the in-room decisions in the pull request comments.

Updating the Design

Throughout the course of the project design docs may need to be updated.  This can happen after design if a major change was made in implementation, or could be later in the life of the project as a new feature or requirement requires a modification.  

Updating the design doc follows a very similar process. A pull request with the proposed changes is submitted. The original Owner and Approver should be considered required reviewers.


The AS3 team considers the experiment incredibly successful, so much so that the broader Azure Sphere team has begun adopting it, including the Program Managers.

To summarize, all the challenges we experienced with Word Documents and SharePoint were addressed by using Git and Markdown.

  • Comments: Adding comments via the pull request makes it really clear which ones have been addressed, and which ones are still open.  In addition the resolution of the comment is clear, either Resolved or Won’t Fix. The comments and discussion in them are also not lost once the pull request is submitted, as you can always go back and look at the history of the document. 
  • Approval Process: By having an initial pull request identifying the stakeholders, how we are making the decision and who is involved is incredibly clear.  In addition the participants are durably recorded, as is the act of signing off, approving the pull request.  
  • Context Switching: IC engineers no longer have to switch tool chains to participate in the design process. The reviews they need to do, whether code or design, are easy to find in one place.
  • Versioning:  Versioning documents via Git and pull requests is incredibly easy.  So much so that engineers often go and make updates to the document once they finish implementing the feature.  In addition, having the history of the document has been incredibly valuable.  

By utilizing a toolchain that developers already use day to day the process feels more lightweight, and writing design documents feels like a less arduous process.  The Program Management team has also been incredibly receptive to using Markdown and Git. While these are new tools for some of them, they’ve embraced our growth mindset culture and dove right in.  

One of the biggest benefits I’ve observed is the clarity it has brought to how decisions are made, and the durable record of when things are done. On a fast growing team like Azure Sphere, clarity and durable communication are key to successfully scaling the business and the team.

Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.

2019 A Year In Review

My last blog post talked about 2017 being a year of change.  2018 and 2019 have been ones of intense growth and discovery.  It’s felt like a whirlwind, so let’s catch up on the basics. 


I’m still living in and loving Seattle. Moving back here proves over and over again to be the right decision for me. I was recently asked, “If you could live anywhere in the world, where would you live?” As participants went around the circle answering the icebreaker with exotic locales like New Zealand, Italy, and France, I pondered the question. I’ve lived in 5 cities in the U.S. and traveled to dozens more domestically and internationally, but for me the answer is Seattle.

Plus, that house I bought is amazing. I’ve settled in, made it my own, and created a mini indoor jungle in the living room full of tropical plants, including my giant Calathea, which just bloomed for the first time in two years. The roof deck continues to be my favorite feature of the house. I’ve watched dozens of sunsets up there. I often would rush home from frisbee practice, or being out with friends, and climb the three stories to the roof just in time to catch the sky burst into pinks and oranges, staying up there til grays, blues, and purples took over and I was ready for bed. The roof deck has also been host to some incredibly fun parties this year, and I’m looking forward to hosting even more now that I’ve finally purchased a grill.


I’m also still working for that little team I joined in Microsoft Research (MSR) back in June of 2017, however we are no longer in MSR and we are no longer so little.  In April of 2018 we announced the Azure Sphere project to the world at RSA, a major security industry conference.  Azure Sphere is an end to end solution for securing Internet of Things (IoT) devices.  It’s a vertically integrated solution including custom silicon, operating systems, and cloud services.  It is by far the most full stack project I’ve ever worked on. 

Since joining the team I’ve not only architected and implemented the cloud services, but stepped into a role leading the Azure Sphere Security Services team as well.  In the past 2.5 years I went from Individual Contributor (IC) to Dev Lead, to my current role as Dev Manager of multiple teams.  The team’s grown to include multiple disciplines including Developers, Project Managers, and Site Reliability Engineers.  I’ve had the great privilege of hiring everyone on the current Azure Sphere Security Services team with the exception of one developer who hired me :).  The services team is a kind, caring, collaborative, fearless, and inclusive bunch of people, and I love working with them. 

It has certainly been a wild ride so far on such a fast growing project and team. We are currently in Public Preview, and have customers like Starbucks. Azure Sphere MCUs are currently available from one silicon manufacturer, MediaTek, and in 2019 we announced that Qualcomm and NXP will be making Azure Sphere certified chips as well. We ship our services daily, and new features in our operating system quarterly. We are moving fast, and currently working towards our General Availability date in February 2020!

Ultimate Frisbee

In 2018 I began playing with a Masters (30+) Women’s Ultimate Frisbee team in Seattle called Mint.  In 2019 I began my second season with the Minties.  The camaraderie and friendships I’ve experienced on this team have been truly special.  We work hard, have fun, and push each other to be better.  We party like we are back in college together, chow down together, and run wind sprints together.  Growing up and pursuing Computer Science, I have frequently been surrounded by groups of men; often I’m the only woman in the room.  Playing with Mint has been the antidote for the gender imbalance in my life.  

Post Practice Socializing

Perhaps the most special moment occurred at the end of Nationals this year, played on an incredibly hot weekend at the end of July in Aurora Colorado.  We had just won our game to claim 9th place and hold seed.  It was a battle from behind the entire game.  At one point we were down 9-12.  But this team is gritty.  We went on a tear scoring a streak of 4 goals to win the game.  After the cheers, spirit circle, and dancing to Lizzo’s song Phone, we squeezed together sitting cross-legged under a shade tent for a round of popcorn.  Popcorn is an activity where each person picks a teammate and gives them a compliment.  This continues until everyone on the team has been praised.  Typically this takes 15 minutes, but this session went on for over an hour, as love and admiration for our teammates poured out.  A single compliment from a single person wasn’t sufficient.  Teammates plus one’d, snapped, and piled on espousing everything from killer lay out Ds and Hucks at the tournament, to loving someone’s laugh and kind spirit, to their mad dance skills.  By the end we were all nursing aching stomachs from laughing too hard and wiping tears away from our eyes.  My heart felt so full, I could not love this group of women more. 


Even though I love Seattle, I’ve continued to fuel my wanderlust resuming my international and domestic travels in 2019.

Nor Cal

I traveled back to San Francisco and Northern California twice this year.  In February I spent a long weekend practicing yoga, lounging in a cozy cabin, eating vegetarian food, and disconnecting from the internet at a Tibetan Buddhist Retreat Center, Ratna Ling Lodge.  I’ve been practicing yoga on and off for a decade now, and I always find the time flowing through vinyasas incredibly centering, no matter how long I’ve been away from my mat. 

Scribe Winery

In October I spent another long weekend celebrating my 33rd birthday with friends from San Francisco and Seattle in Sonoma.  We spent the weekend wine tasting and lounging at the Carneros Resort and Spa.  It was fun to get the two groups together, and to catch up with my San Francisco people. 

Galapagos & Peru

In April I set off on a bigger adventure heading back to South America on an REI Adventures Trip.  I spent the first week island hopping in the Galapagos hiking, snorkeling, kayaking, and even driving a boat.  The Galapagos have been a dream destination of mine for a long time.  Biology and evolution have always fascinated me.  I even took Evolutionary Biology in college, incurring the ire of several pre-med students as I messed up the curve on a course that was required for their major.  Traveling to the islands which inspired Darwin’s Theory of Evolution blew away all of my expectations.  Being in the Galapagos is like living inside a Nature Documentary.  Within an hour of arriving I saw a Darwin Finch perched atop a prickly pear cactus, and then in another moment, a sea lion, a sally crab, and a blue footed booby all perched on the same rock outcrop. 

A Darwin Finch perched on a Prickly Pear Cactus

I snorkeled with sea turtles and sea lions.  The sea lions there are just playful water puppies with no fear of humans, and so we dove and swam circles around each other.  One even played chicken with me as we swam towards each other blowing bubbles, then rolled away to swim to the side or under me at the last moment.

Isabela Island

On the second leg of my South America trip I headed to Cusco, Peru, to explore the Sacred Valley and visit Machu Picchu.  In the Sacred Valley, Incan ruins and alpacas were a common feature of the landscape.  The Andes surrounded us, soaring towards the sky.  The equator allows for giant green covered mountains which tower at 13,000 – 14,000 ft above sea level.  Snow and glaciers were reserved for only the tallest peaks pushing 19,000 ft of elevation or more. 

One night before heading to Machu Picchu we stayed at a resort where I had a private casita nestled at the base of a mountain in the Sacred Valley.  The rooms were incredibly comfortable but I loved the quiet and the calm even more.  The darkness settled in, blanketing us and revealing the southern constellations, including the Southern Cross.  I’ve been to the southern hemisphere before, but on this trip seeing the Southern Cross made me realize and consider how limited our perspectives are.  Something we have always known to be true, what stars are in the sky when we look up, is not quite so universal, and all it requires is a change of location. 

Incan Ruins and Alpacas

Visiting Machu Picchu was everything I imagined and more.  We took the train in and spent two days in the ruins.  I had seen all the pictures, and wasn’t quite sure what it would be like to finally experience this much photographed place in person.  It was stunning.  Photos cannot truly capture the etherealness of the place.  A giant Incan city perched atop a mountain top, with a river wrapping around its base.  It is hard to fathom that it was possible to construct, even though the proof of its existence is directly in front of you. 

On our first afternoon, we spent our time exploring the city, passing by iconic view points, and learning about the mountain gods that surround the city and the totems the Incans built to honor them.  As we wound through the various sections a light rain broke out, and then another Incan god, the Rainbow graced us with her presence.  Standing in that city, it was easy to see how the Incans could worship the surrounding mountains, rainbows, and thunder as deities. 

Rainbows at Machu Picchu

On our second afternoon, I headed up Montana Machu Picchu.  This is the taller of the two mountains you can climb, and it overlooks the city.  The climb is strenuous.  You start at 8,000ft of elevation and climb 2,000ft straight up, since the Incans were so kind as to build stone staircases up the mountain.  During the trek I could definitely feel the elevation, my breath would become short, and my legs would start to feel wobbly, like Jello.  When this happened a quick 5 minute break to catch my breath would make me feel as good as new!  It was all worth it though.  The views from the top were outstanding.  It was a clear blue sky day with almost no clouds, so we had 360 degree visibility of the Andes, and Machu Picchu looking like a doll house tucked next to Huayna Picchu below.  We sat and took in the view for 30 minutes or so before heading back, but I could have stayed for hours. 

Machu Picchu viewed from Montana Machu Picchu

Todos Santos Mexico

In November as the days grew shorter and darker in Seattle, my boyfriend, Will, and I migrated south to spend a week on the Baja Peninsula.  We spent the week lounging pool side at Hotel San Cristobal.  We had no agenda for this trip besides to relax, and eat all of the tacos of course.  We consumed 52 tacos between the two of us from 7 different places and I kept a running log on Instagram of our Tacumentary. 

Balandra Beach

In between consuming tacos we managed to explore a lot of the wonderful things in Baja.  We took a day trip to La Paz to snorkel with Whale Sharks and swim in the calm waters of Balandra Beach.  We rented ATVs and went cruising inland towards the Sierra de la Laguna mountains using arroyos, dried riverbeds, as our roads.  We explored a hidden beach which required a journey down an unmarked dirt road and a short trek through a Palm Tree forest.  We watched many spectacular sunsets.  One from a restaurant aptly named the Mirador, which sat on a high point looking out over the water.  We watched another while sitting in the hotel’s hot tub sipping a frozen watermelon margarita and watching a local band play, and yet another from the comfort of our room’s balcony.  All of them were stunning. 

2017 a Year in Review

2017 was a year of change, personal and professional.  I started the year in San Francisco, working at Twitter as an Individual Contributor, and in a long term relationship.  I ended the year in Seattle, working at Microsoft Research as a Lead, sans that long term relationship, and a brand new home owner.  

Change can be terrifying, especially when you are comfortable, when you are content.  Nothing was terribly wrong, but I got the nagging feeling that perhaps nothing was going terribly right either.  I was no longer content with being content.  So in 2017 I began to change some things up to make space for new opportunities.  

I made a conscious effort in 2017 to be less busy, to travel and speak a bit less.  2016 was a year of constant travel visiting 19 cities, 7 countries, and 3 continents.  I visited Twitter offices, spoke 15 times at conferences and meetups, and managed to squeeze in trips to see family and friends.  It was an amazing experience, but not a sustainable one for me.

So I made a conscious effort to slow down and was incredibly selective about the talks and travel I took on.  I declined several opportunities to speak and travel to great conferences and locations this year.   I wanted to take a moment to thank all the conference organizers who reached out, I greatly appreciate all of the invitations and fantastic opportunities and unfortunately did not have the bandwidth to do more this past year.  


I gave versions of my The Verification of Distributed Systems talk to larger audiences at Devoxx San Jose in March, and Velocity San Jose in June.  While I’ve given this talk numerous times, I think it’s perennially important, and people consistently tell me how much they learn from it.  

I wrote a brand new talk Distributed Sagas: A Protocol for Coordinating Microservices which I gave at J on the Beach in May and at Dot Net Fringe in June.  This was a passion project for me, as I’d been exploring the ideas for multiple years, and wanted to share the progress I had made.  

I also wrote another new talk for the inaugural Deconstruct Conf in Seattle, The Path Towards Simplifying Consistency in Distributed Systems.  This conference was my favorite of the year.  A single track filled with excellent speakers that focused not only on technology, but the culture and community in tech.  The cherry on top was its location The Egyptian theater in Seattle’s Capitol Hill neighborhood, my old stomping grounds.  

I also spoke at two chapters of Papers We Love, San Francisco and Seattle.  I presented Barbara Liskov’s paper Distributed Programming in Argus.  This brings my total times speaking at Papers We Love chapters to 7, which I think once again makes me the record holder :).  All joking aside, Papers We Love is one of my favorite organizations and I love attending and speaking at the meetups because of the community it fosters bringing together academia and industry and the culture of curiosity it inspires.  


I wrote a single blog post in 2017, Resources for Getting Started with Distributed Systems, which is a collection of materials that have greatly influenced me, and attempts to answer the perennial question I get asked: “How do I get started with Distributed Systems?”


Earlier this year an old colleague recommended I take a phone call with a group at Microsoft Research.  After a couple phone calls, and an onsite interview, I was convinced that this was a rare opportunity with an amazing team and an industry defining project.  So in June, after 2.5 years of working at Twitter, I decided to leave the flock.  

Working at Twitter was a truly great experience.  It was an incredible ride where I got to learn and work on so many amazing projects including being the Tech Lead of the Observability team, speaking at Twitter Flight, digging into Distributed Build, shipping Abuse Report Notifications, and facilitating TWIG (Twitter’s Engineering Leadership Program).  I also feel very fortunate to have worked with and met so many incredible people.

In July I started as a Principal Software Engineer in Microsoft Research, and have loved every minute of it.  I’m getting to stretch, learn, and grow every day on a project that I truly believe will change the world.  I also adore my teammates; this is by far the smartest and nicest team I have ever worked on.  We consistently talk and live our cultural values of trust, kindness, and fearlessness.  I couldn’t ask for a better team.  And just in case that wasn’t enough change for one year, in November I stepped into the Lead role, a hybrid Tech Lead and People Manager, for the Services Team, which is another new exciting challenge and opportunity that I’m loving.  


Leaving San Francisco felt inevitable.  I moved to San Francisco to experience the tech scene, to live the cultural phenomenon.  But after 2.5 years I was ready to move on.  San Francisco was not my forever home, our words just did not match.  

Moving back to Seattle was an easy decision.  I first fell in love with Seattle when I moved here after college, and still love it.  Even after all my nomadic wanderings and travel when I visited Seattle in April for Deconstruct Conf I instantly felt like I was home.  I also realized I was quite nostalgic for Seattle earlier in the year when I began marathoning episodes of Grey’s Anatomy again.  

And if all the warm and fuzzy feelings about Seattle weren’t enough, the stars magically aligned and within a week of moving back I made an offer on a house, and it was accepted!  New job, new/old city, and a new homeowner too!

I jokingly tell friends that I blew up my whole life earlier this year, which isn’t entirely untrue.  The top three stressors in life are commonly reported as job change, relationship change, and moving.  I did all three within the span of about two months.  I’d like to take a quick moment to thank my community of family, friends, and colleagues who helped and supported me through this whirlwind transition.  I could not have done it without your support.  

Even with all the stressors I honestly could not be happier (with my personal and professional life, the political nightmare of 2017 still fills me with dread, despair, and anger).  I no longer feel comfortable or content.  In fact I often feel decidedly uncomfortable, but in the way that signals learning and growth.  And instead of contentment I often feel a wild unbridled joy and excitement.  I’m energized to go to work every day.  I’ve sung and danced and laughed until my stomach hurts more times than I can count since blowing up my life.  So I guess the lesson once again is, “You are braver than you believe, stronger than you seem, and smarter than you think.”  Oh and always take the phone call :).  


Resources for Getting Started with Distributed Systems

I’m often asked how to get started with Distributed Systems, so this post documents my path and some of the resources I found most helpful.  It is by no means meant to be an exhaustive list.

It is worth noting that I am not classically trained in Distributed Systems.  I am mostly self taught via independent study and on the job experience.  I do have a B.S. in Computer Science from Cornell, but focused mostly on graphics and security in my specialization classes.  My love of Distributed Systems and education in it came once I entered industry.  The moral of this story is that you don’t need formal academic training to learn and excel at distributed systems.

Books on Theory & Background

  • Introduction to Reliable and Secure Distributed Programming: This book is an excellent introduction to the fundamentals of distributed computing.  It definitely takes an academic approach.  But is a good place to start to understand the terminology and challenges in the field.
  • Replication: Theory and Practice: This book is a summary of 30 years of distributed systems research on replication up to 2007.  It’s a great starter and contains all the references to the original work.  Each chapter is incredibly dense, and led me down multiple paper rabbit holes.


This is by no means an exhaustive list, but these papers I keep coming back to, and they have significantly shaped the way I think about Distributed Systems.

A Note on Reading Papers

A note on reading papers: I start with the Abstract, and if I find it interesting I’ll proceed onto the Introduction, then the Conclusion.  Only then, if I am incredibly interested in the implementation or details, will I read the whole thing.  Also the References are a gold mine; they cite related and foundational work.  Often times reading papers is a recursive process.  I’ll start on one, then find a concept I’m unfamiliar with or don’t understand, so I’ll read the referenced paper and so on.  This often results in going down paper rabbit holes, and one time resulted in me reading a dissertation from the 1980s, but it is a great way to learn.

I also highly recommend Michael Bernstein’s blog post “Should I Read Papers?” for more on the motivations and how to read an academic paper.

Blog Posts & Talks

Below is a list of some of my favorite blog posts and talks that shaped how I think about building Distributed Systems.  Most of these are old, but I keep coming back to them, and still find them relevant today.

Learning from Industry

The art of building, operating, and running distributed systems in industry is orthogonal to the theory of Distributed Systems.  I truly believe that the best way to learn about Distributed Systems is to get hands on experience working on one.

In addition Post Mortems are another great source of information.  Large tech companies, like Amazon, Netflix, Google, and Microsoft, often publish a post mortem after a major outage.  These are usually pretty dry to read, but contain some hard learned lessons.





2016: A Year in Review

2016 was a year of constant movement: I visited 19 cities, in 7 countries, on 3 continents.  In the middle of the year I switched teams inside of Twitter and began working on a new challenge, Distributed Build.  I also spoke 15 times at conferences and meetups.  It’s been a really long, wonderful and exhausting year, filled with great people, travel to many exciting places, and new challenges.

Below is a summary of the articles, interviews, talks, and programming committees I participated in in 2016.

Articles & Books

My article “The Verification of a Distributed System” was published in  the February issue of Communications of the ACM.  I also began writing Go this year and wrote a Blog Post “A Quick Guide to Testing in Golang” describing the testing methodologies and project setup I use with my team.

I also was a Technical Editor for James Turnbull’s The Art of Monitoring.

Interview & Podcasts

At QConNY I recorded a podcast about Engineering Effectiveness at Twitter and Verifying Distributed Systems.  I was also honored to be included in Helena Dagmar’s Techies Project, which captured the images and stories of Silicon Valley tech employees who are usually underrepresented.

Myself & Ines Sombra giving a Keynote at Velocity Santa Clara

Programming Committees

I participated in several programming committees this year both academic and industry focused.  I was an industry member of the Principles and Practices of Computing for Distributed Data Workshop co-located with EuroSys.  I also hosted a track on Data & Distributed Systems at GOTO Chicago 2016, and finally I was on the inaugural programming committee for Systems We Love, a one day conference inspired by Papers We Love, but focused on Computer Systems.


After a lot of positive feedback and based on numerous requests I gave versions of my talk Scaling Stateful Services at CraftConf, CurryOn, and at Nike Tech Talks.

I turned my article The Verification of Distributed Systems into a talk, now with more rantifestos, and gave it at GOTO Chicago, QCon New York, and YOW Melbourne, Brisbane, and Sydney.

At the end of June I was honored to give a keynote at Velocity Santa Clara alongside Ines Sombra called So We Hear You Like Papers 2 in which we discuss the academic paper Unreliable Failure Detectors for Reliable Distributed Systems and how it applies to our jobs in industry.

Also at the end of June I spoke at Monitorama about Tackling Alert Fatigue, which summarized the strategies my team used over the past half year to reduce the number of critical alerts the Twitter Observability Services fired by 50%.

Members of my Distributed Systems Track at GOTO Chicago: Christopher Meiklejohn, Myself, Boaz Avital, Peter Bailis, and Aysylu Greenberg

Christopher Meiklejohn and I gave an academic and industry perspective on the Remote Procedure Call in a talk at CodeMesh called A Brief History of Distributed Programming: RPC, and a second time at YOW Brisbane.

I also spoke at three different chapters of Papers We Love this year, bringing my total PWL talks to 5.  In April I filled in last minute for a cancelled speaker at the San Francisco chapter and spoke about Sagas.  While in New York for QCon I also spoke at a special edition of Papers We Love New York, along with Eric Brewer, Evelina Gabasova, and Ines Sombra, on my favorite paper of 2014 Simple Testing Can Prevent Most Critical Failures.  Finally, while in Portland for Monitorama I spoke at the newly formed PDX chapter about Detection of Mutual Inconsistency in Distributed Systems.

I also combined all my Github talk repos into one repo CaitieM20/Talks, which contains references and links to slides and videos for each talk.

Tech WCW #4 – Jean Jennings Bartik


On February 14th 1946 the Electronic Numerical Integrator and Computer (ENIAC) was unveiled to the public.  It was the first general purpose electronic digital computer and it was Turing Complete.  The project was funded by the United States Military to speed up mathematical tasks, most notably artillery firing tables for the Army for World War II.

ENIAC could calculate the trajectory of a shell that took 30 seconds to reach its target, in 20 seconds.  Prior to ENIAC this computation would take a “Computer” (a person who did the mathematical calculation by hand) 1 to 2 days to complete.  Not only was ENIAC faster than a speeding bullet it was orders of magnitude faster than humans at the task.

While ENIAC was completed too late to be used in World War II it was a huge advance in the field of computing, and proclaimed a huge success.  The two engineers, John Mauchly and J. Presper Eckert, who designed the computer were widely celebrated along with the rest of the hardware engineering team.  However the six original Programmers of the ENIAC were widely unknown until Kathy Kleiman started the ENIAC Programmers Project in the 1980s to share the story of Frances “Betty” Snyder Holberton, Kathleen McNulty Mauchly Antonelli, Marlyn Wescoff Meltzer, Ruth Lichterman Teitelbaum, Frances Bilas Spence, and Jean Jennings Bartik.

Jean Jennings Bartik was born on December 27th 1924 on a farm in Gentry County, Missouri.  She read voraciously as a child and longed to leave Missouri.  She saw marriage as an impediment to her desire for adventure, claiming, “Why would I want to get married?  I haven’t been anywhere.  I haven’t done anything.”

At the age of 16 she began college at Northwest Missouri State Teachers College, majoring in math with a minor in English.  While Bartik came from a long line of teachers, she did not want to become a teacher herself, so her professor brought her job advertisements, one for a “systems service girl” at IBM and one for a “computer” at Aberdeen Proving Grounds.

Bartik was offered the job of a computer at Aberdeen and jumped on the next train at the beginning of 1945.  Along with dozens of other women she began calculating artillery firing tables by hand.  In June of 1945 Bartik applied for a position as a programmer for the newly formed ENIAC project.  She was chosen along with five other women.

Bartik along with the other ENIAC Programmers went through a two month training in Aberdeen to learn how to wire up the boards for the punch-card machines, before returning to The University of Pennsylvania in August 1945.  Once there they were given the Operational Manual for the ENIAC.  Bartik and Holberton holed up in a classroom and began the arduous task of understanding how the machine worked without any access to the machine itself.

[Diagrams from the ENIAC Operating Manual]

In September of 1945 all the ENIAC Programmers reconvened in one room and the project began in earnest.  The acceptance test for the project was calculating trajectories.  While all the programmers had done this by hand, translating this task to run on the ENIAC was not simple.  The ENIAC had twenty accumulators which could each be programmed separately and then a master programmer terminal which would coordinate the sub-programs.  While the ENIAC could be programmed to perform complex operations like loops and branches, none of these existed as stored programs; the Programmers had to create these from scratch for every program by wiring up the machine correctly and setting the right switches.

Programming the ENIAC was an arduous task.  Designing the program on paper often took weeks, and then setting the machine to run it could take a day or more, followed by another day or so spent verifying and debugging the program, which often required crawling into the machine.  Due to the lengthy set up time, programs were only changed after numerous calculations had been performed.

After the war Dr. Richard Clippinger and John von Neumann began work on stored program computers, computers that store program instructions in electronic memory.  Clippinger recruited Bartik to set up a group of four or five programmers at The University of Pennsylvania under his direction, who would program the ENIAC and help him turn it into a stored program computer.  Von Neumann and Clippinger would work on the instruction set at Princeton and consult with Bartik and her group every couple of weeks.  Bartik advocated for shrinking the instruction set to make it simpler at first to get it working on the ENIAC.  While they were working on the theory she helped keep them grounded in the reality of the hardware the instructions had to run on.  By 1948 the ENIAC was successfully transformed into a stored program computer which could run programs sequentially.

Meanwhile Eckert and Mauchly had left the University of Pennsylvania to form their own company, the Eckert-Mauchly Computer Corporation (EMCC).  They recruited Bartik to program their new computers, the BINAC and the UNIVAC.

Bartik and Spence programming the ENIAC

The BINAC, the Binary Automatic Computer, was the world’s first stored program computer.  Unlike the ENIAC it used binary instead of the decimal system.  It also used magnetic tape to store data and programs instead of punch cards.  Bartik programmed a guidance system to run on it for Northrop Aircraft.

The UNIVAC, the Universal Automatic Computer, was designed for business and administrative use.  The United States Census Bureau purchased one as well as CBS which used it to predict the results of the 1952 presidential election in the United States.  Bartik was responsible for the design of the UNIVAC’s logical circuits.

In 1951 Bartik left the computer industry to raise her three children.  In 1967 she re-entered the field working for a series of companies: Auerbach Corporation, Interdata, Systems Engineering Lab in Florida, Honeywell, and Data Decisions.  When Data Decisions closed in 1986 Bartik was unable to find another job in the industry due to gender and age discrimination.

Bartik died on March 23rd 2011 in Poughkeepsie, New York, USA.

In 2009 Bartik received a Pioneer Award from the IEEE Computer Society.  In 2008 she was named a fellow by the Computer History Museum in Mountain View, California.  Her alma mater Northwest Missouri State named the Jean Jennings Bartik Computing Museum after her.

Her autobiography Jean Jennings Bartik and the Computer that Changed the World was published in 2013.


Read more about what TechWCW is, and check out all of my Tech Woman Crush Wednesdays.


A Quick Guide to Testing in Golang

When I started writing Go in May, I found a lot of useful documentation on Getting Started with Go.  However, I found recommendations on testing best practices lacking.  So I decided to write down what I pieced together, and create a Github Repo of a base project with examples.  Essentially this is the guide I wish had existed, and is a record for future me when I invariably forget this information.  In this blog post I’ll walk through a simple example of writing and testing a FizzBuzz application using unit tests, property based testing, mocks & fakes.

Code: FizzBuzz

So let’s start by writing a basic function FizzBuzz, which takes in a number and returns a string according to the following rules.

For multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Here is my version (fizzbuzz.go), pretty simple right?  Now that we’ve written our code, we need to test it.
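
The linked fizzbuzz.go is not reproduced in this text, so here is a minimal sketch of what such a function might look like.  The name fizzBuzz and its int-to-string signature match how the function is used later in this post; the body, the package name, and the small main function are my own assumptions, not the original file.

```go
// fizzbuzz.go (illustrative sketch, not the original file from the repo)
package main

import (
	"fmt"
	"strconv"
)

// fizzBuzz returns "FizzBuzz" for multiples of both three and five,
// "Fizz" for multiples of three, "Buzz" for multiples of five,
// and the decimal form of the number otherwise.
func fizzBuzz(num int) string {
	switch {
	case num%15 == 0:
		return "FizzBuzz"
	case num%3 == 0:
		return "Fizz"
	case num%5 == 0:
		return "Buzz"
	default:
		return strconv.Itoa(num)
	}
}

// main is only here so the sketch can be run directly with `go run`.
func main() {
	for _, n := range []int{2, 3, 5, 15} {
		fmt.Println(n, fizzBuzz(n))
	}
}
```

Checking the multiple-of-fifteen case first keeps the other two cases simple, since Go's switch takes the first matching case.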

Unit Test Cases

Basic testing in Go is easy, and well documented.  Go test cases are usually placed in the same directory as the code they are testing and typically named <filename>_test.go, where filename is the name of the file with the code under test.

There are four basic outputs we expect from FizzBuzz: Fizz, Buzz, FizzBuzz, and the input number.  These can all be covered by 4 basic test cases that I wrote in fizzbuzz_test.go which provide the input 3, 5, 15, and 2 to the fizzBuzz function and validate the result.
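
As a rough illustration of those four cases, here is a sketch of what a fizzbuzz_test.go could contain.  It uses a table-driven layout, which is a common Go convention but an assumption on my part; the original file may well use four separate test functions.  The package name, the fizzBuzzCases table, and the copy of fizzBuzz are hypothetical and included only so the sketch compiles on its own.

```go
// fizzbuzz_test.go (illustrative sketch, not the original file from the repo)
package main

import (
	"strconv"
	"testing"
)

// fizzBuzz is repeated here only so this sketch compiles by itself.
// In a real project it lives in fizzbuzz.go and this copy should be deleted.
func fizzBuzz(num int) string {
	switch {
	case num%15 == 0:
		return "FizzBuzz"
	case num%3 == 0:
		return "Fizz"
	case num%5 == 0:
		return "Buzz"
	default:
		return strconv.Itoa(num)
	}
}

// fizzBuzzCases holds one input for each kind of output we expect.
var fizzBuzzCases = []struct {
	name string
	in   int
	want string
}{
	{"multiple of three", 3, "Fizz"},
	{"multiple of five", 5, "Buzz"},
	{"multiple of both", 15, "FizzBuzz"},
	{"neither", 2, "2"},
}

// TestFizzBuzz runs each table entry as its own named subtest.
func TestFizzBuzz(t *testing.T) {
	for _, tc := range fizzBuzzCases {
		t.Run(tc.name, func(t *testing.T) {
			if got := fizzBuzz(tc.in); got != tc.want {
				t.Errorf("fizzBuzz(%d) = %q, want %q", tc.in, got, tc.want)
			}
		})
	}
}
```

With subtests, the verbose output lists each case by name, which makes a failing input easy to spot.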

Helpful Commands

go test -v -race ./...

-v prints out verbose test results.  This will show a pass or fail for every test case run.

-race runs the Golang race detector, which will detect when two goroutines access the same variable concurrently and at least one of the accesses is a write.
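
To make that definition concrete, here is a small sketch of the kind of code the race detector cares about.  The countTo function is a hypothetical example, not something from the FizzBuzz project.  As written, the mutex guards the shared counter, so there is no race; if the Lock and Unlock calls were removed, several goroutines would write the same variable concurrently, which is the situation -race is designed to report.

```go
package main

import (
	"fmt"
	"sync"
)

// countTo starts n goroutines that each increment a shared counter once.
// The mutex makes each read-modify-write exclusive. Without the Lock and
// Unlock calls the goroutines would write counter concurrently, which is
// a data race.
func countTo(n int) int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		counter int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++
			mu.Unlock()
		}()
	}
	wg.Wait() // all increments are finished before we read counter
	return counter
}

func main() {
	fmt.Println(countTo(100)) // prints 100
}
```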

Continuous Integration

Continuous Integration is crucial for fast & safe development.  Using a tool like Travis CI or Circle CI makes it easy for developers to ensure all submitted code compiles and passes test cases.  I set up my project to run gated checkins using TravisCI, starting with the golang docs, and then adding some modifications. My .travis.yml file ensures the following:

  • The code compiles
  • The code is formatted correctly (gofmt)
  • The code passes go vet
  • All test cases pass with the -v & -race flag
  • Code Coverage of test cases is uploaded to codecov.io

Code Coverage

Code Coverage is another important tool that I include in every project where possible.  While no percentage of code coverage will prove that your code is correct, it does give you more information about what code has been exercised.

I personally use code coverage to check if error cases are handled appropriately.  Anecdotally I find that code coverage gaps occur around error handling.  Also in Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems, the authors discovered that the majority of catastrophic failures are caused by inappropriate error handling and that  “In 23% of catastrophic failures … the incorrect error handling in these cases would be exposed by 100% statement coverage testing on the error handling logic.”

Testing and verifying distributed systems is hard but this paper demonstrates that rigorously testing the error handling logic in our program dramatically increase our confidence that the system is doing the right thing.  This is a huge win.  I highly recommend using Code Coverage in your Go projects.

There are a variety of Code Coverage tools out there.  I set up my repo to use CodeCov.io.  It easily integrates with TravisCI and is free for public repos.  CodeCov.yml is my project's configuration file, and testCoverage.sh is a script which runs all the tests in the project and creates a coverage.txt file, which is uploaded and parsed by CodeCov to create coverage reports.

Property Based Testing

Now we have 100% test coverage of the current implementation with our unit test cases, however we have only covered 9.31e-10% of all possible inputs.  That’s a very small percentage of the inputs to be validated.  If the code were more complicated, or we had to test it in a black box manner, our confidence that the code does the correct thing for all inputs would be low.

One way to explore more of the input state space is to use property based testing.  In a property based test, the programmer specifies logical properties that a function should fulfill.  The property testing framework then randomly generates input and tries to find a counterexample, i.e. a bug in the code.  The canonical property testing framework is QuickCheck, which was written by John Hughes.  It has since been re-implemented in numerous other languages, including Go (Gopter is the GOlang Property TestER).  While Property Based testing cannot prove that the code is correct, it greatly increases our confidence that the code is doing the right thing, since a larger portion of the input state space is explored.

The docs for Gopter are rather extensive, and explain all the bells and whistles, so we shall just go through a quick example.  Property based tests can be specified like any other test case.  I placed mine in fizzbuzz_prop_test.go for this example, but typically I include them in the <filename>_test.go file.

properties.Property("FizzBuzz Returns Correct String", prop.ForAll(
    func(num int) bool {
      str := fizzBuzz(num)

      switch str {
      case "Fizz":
        return (num % 3 == 0) && !(num % 5 == 0)
      case "Buzz":
        return (num % 5 == 0) && !(num % 3 == 0)
      case "FizzBuzz":
        return (num % 3 == 0) && (num % 5 == 0)
      default:
        // Anything else must be the number itself, converted to a string.
        expectedStr := strconv.Itoa(num)
        return !(num % 3 == 0) && !(num % 5 == 0) && expectedStr == str
      }
    },
    gen.Int(), // generator that supplies the random num argument
  ))

This test passes the randomly generated number into fizzBuzz, then for each case ascertains that the output adheres to the defined properties, i.e. if the returned value is “Fizz” then the number must be divisible by 3 and not by 5, etc.  If any of these assertions does not hold, a counter-example will be returned.

For instance, say a zealous developer on the FizzBuzz project added an “!” to the end of the converted number string.  The property based tests would then fail with the following message:

! FizzBuzz Returns Correct String: Falsified after 3 passed tests.
ARG_0: 11
ARG_0_ORIGINAL (31 shrinks): 406544657
Elapsed time: 200.588µs

Now we have a counter example and can easily reproduce the bug, fix it and move on with development.

Where Gopter & QuickCheck excel beyond random input and fuzz testing is that they will try to shrink a failing input down to a minimal input that still causes the error.  In the output above, the original failing input 406544657 was reduced to 11 after 31 shrinks.  While our example only takes one input, this is incredibly valuable for more complex tests.

I find Property Based testing incredibly valuable for exploring large state spaces of input, especially things like transformation functions.  I regularly use them in addition to unit tests, and often find them just as easy to write.  
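If you want to try the idea before installing anything, here is a minimal sketch that checks the same property with testing/quick from the Go standard library instead of Gopter.  This is a swapped-in tool, not what the repo uses, and testing/quick does not shrink counter-examples the way Gopter does.  The fizzBuzz body below is an assumed implementation for illustration; the names fizzBuzzProperty and checkFizzBuzz are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"testing/quick"
)

// fizzBuzz is an assumed implementation, written here only so the sketch
// is self-contained.
func fizzBuzz(num int) string {
	switch {
	case num%15 == 0:
		return "FizzBuzz"
	case num%3 == 0:
		return "Fizz"
	case num%5 == 0:
		return "Buzz"
	default:
		return strconv.Itoa(num)
	}
}

// fizzBuzzProperty states what must hold for any input: the returned
// string has to agree with the divisibility of num.
func fizzBuzzProperty(num int) bool {
	str := fizzBuzz(num)
	switch str {
	case "Fizz":
		return num%3 == 0 && num%5 != 0
	case "Buzz":
		return num%5 == 0 && num%3 != 0
	case "FizzBuzz":
		return num%3 == 0 && num%5 == 0
	default:
		return num%3 != 0 && num%5 != 0 && str == strconv.Itoa(num)
	}
}

// checkFizzBuzz runs the property against randomly generated ints using
// the default configuration and returns an error on the first failure.
func checkFizzBuzz() error {
	return quick.Check(fizzBuzzProperty, nil)
}

func main() {
	if err := checkFizzBuzz(); err != nil {
		fmt.Println("property falsified:", err)
		return
	}
	fmt.Println("property held for all generated inputs")
}
```

In a real project the quick.Check call would live inside a normal Test function in a _test.go file, next to the unit tests.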

Helpful Commands

go get github.com/leanovate/gopter

Install Gopter to get started with property based testing in Go.

Code: FizzBuzz Handler

The project scope has increased!  Now we need to provide FizzBuzz as a service and/or command line tool.  Our FizzBuzz calculator may now be long lived and can take advantage of caching results that users have already requested.

In order to do this I added a new interface, Cache, which allows the user to provide their Cache of choice.  That could be a simple in-memory cache backed by a dictionary or perhaps a durable cache like Redis, depending on their requirements.

type Cache interface {
  Put(key int, value string)
  Get(key int) (string, bool)
}

And a new file fizzBuzzHandler.go, with a method RunFizzBuzz, which takes an array of strings (presumably numbers), tries to convert them to integers, and then gets the FizzBuzz value for each of them, either from the cache or by calculating FizzBuzz via our previously defined method.
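The handler described above could look roughly like the sketch below.  This is my reconstruction from the description, not the code in the repo: the Handler struct, its field, the error wording, the mapCache type, and the fizzBuzz body are all assumptions.  Only the Cache interface, NewHandler, and the RunFizzBuzz call shape come from the post.

```go
package main

import (
	"fmt"
	"strconv"
)

// Cache is the interface from the post.
type Cache interface {
	Put(key int, value string)
	Get(key int) (string, bool)
}

// mapCache is a hypothetical in-memory Cache backed by a map.
type mapCache map[int]string

func (m mapCache) Put(key int, value string) { m[key] = value }

func (m mapCache) Get(key int) (string, bool) {
	value, ok := m[key]
	return value, ok
}

// fizzBuzz is an assumed implementation of the previously defined method.
func fizzBuzz(num int) string {
	switch {
	case num%15 == 0:
		return "FizzBuzz"
	case num%3 == 0:
		return "Fizz"
	case num%5 == 0:
		return "Buzz"
	default:
		return strconv.Itoa(num)
	}
}

// Handler is a hypothetical type that holds the user-supplied cache.
type Handler struct {
	cache Cache
}

func NewHandler(cache Cache) *Handler {
	return &Handler{cache: cache}
}

// RunFizzBuzz converts each argument to an int and returns its FizzBuzz
// value, reading from the cache first and storing any value it computes.
func (h *Handler) RunFizzBuzz(args []string) ([]string, error) {
	results := make([]string, 0, len(args))
	for _, arg := range args {
		num, err := strconv.Atoi(arg)
		if err != nil {
			return nil, fmt.Errorf("invalid number %q: %w", arg, err)
		}
		value, ok := h.cache.Get(num)
		if !ok {
			value = fizzBuzz(num)
			h.cache.Put(num, value)
		}
		results = append(results, value)
	}
	return results, nil
}

func main() {
	handler := NewHandler(mapCache{})
	fmt.Println(handler.RunFizzBuzz([]string{"3", "5", "15", "7"}))
}
```

Because the handler only depends on the Cache interface, the same code works with a map, a mock, or a networked cache.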


Now we have new code that needs to be tested, so we create fizzBuzzHandler_test.go.  Testing bad input is once again a simple unit test case.  We can also simply test that the correct value of FizzBuzz is returned for a variety of supplied numbers when RunFizzBuzz is called, however, FizzBuzz returning the correct value has already been extensively tested above.  

What we really want to test is the interaction with the Cache.  Namely that values are stored in the cache after being calculated, and that they are retrieved from the cache and not re-calculated.  Mocks are a great way to test that code interacts in the expected way, and to easily define inputs and outputs for calls. 

Go has a package golang/mock.  In Go only interfaces can be mocked.  Mocks in Go are implemented via codegen.  The mockgen tool will generate an implementation of a mock based on your interface.  Then in a unit test case, a mock interface object can be created, expected method calls specified, and return values defined.

func Test_RunFizzBuzz_CacheMiss(t *testing.T) {
  mockCtrl := gomock.NewController(t)
  defer mockCtrl.Finish()

  mockCache := NewMockCache(mockCtrl)
  mockCache.EXPECT().Get(5).Return("", false)
  mockCache.EXPECT().Put(5, "Buzz")

  handler := NewHandler(mockCache)
  str, err := handler.RunFizzBuzz([]string{"5"})

  if err != nil {
    t.Error("Unexpected error returned", err)
  }
  if str[0] != "Buzz" {
    t.Error("Expected returned value to be Buzz", str)
  }
}

In the above code, I create a mockCache with the NewMockCache command, and define that I expect a Cache miss to occur, followed by a Put with the calculated value.  I then simply call RunFizzBuzz and verify the output.  This not only validates that the correct value is returned from RunFizzBuzz, but also that the cache was successfully updated.

Code Generated mocks should be checked into the code base, and updated when the interface changes as part of a code review.

Helpful Commands

go generate ./...

will run the code gen command specified in files with the comment:

//go:generate <cmd>

For instance, to generate cache_mock.go when running go generate ./... the following comment is added at the top of the file.

//go:generate mockgen -source=cache.go -package=fizzbuzz -destination=cache_mock.go


A fake is a test implementation of an interface, which can also be incredibly useful in testing, especially for integration tests or property based tests.  Specifying all the expected calls on a mock in a property based test is tedious, and may not be possible in some scenarios.  At these points Fakes can be very useful.  I implemented cache_fake.go, a simple in-memory cache to use with fizzBuzzHandler_prop_test.go to ensure there is no unintended behavior when the cache is used with numerous requests.

Tests that utilize fakes can also easily be repurposed as integration or smoke-tests when an interface is used to abstract a network interaction, like with the FizzBuzz Cache.  Running this test with the desired cache implementation can greatly increase our confidence that the interaction with the physical cache is correct, and that the environment is configured correctly.
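As a rough illustration of what such a fake can look like, here is a sketch of an in-memory cache that also counts hits and misses, so a test can assert that repeated requests are served from the cache.  This is not the cache_fake.go from the repo.  The fakeCache type, its counters, and the lookup helper (a stand-in for the handler's get-or-compute step) are hypothetical.

```go
package main

import "fmt"

// Cache is the interface from the post.
type Cache interface {
	Put(key int, value string)
	Get(key int) (string, bool)
}

// fakeCache is a hypothetical fake: a real, working in-memory cache that
// additionally records how it was used.
type fakeCache struct {
	data   map[int]string
	hits   int
	misses int
}

func newFakeCache() *fakeCache {
	return &fakeCache{data: make(map[int]string)}
}

func (c *fakeCache) Put(key int, value string) {
	c.data[key] = value
}

func (c *fakeCache) Get(key int) (string, bool) {
	value, ok := c.data[key]
	if ok {
		c.hits++
	} else {
		c.misses++
	}
	return value, ok
}

// lookup returns the cached value for key, or computes and stores it on a
// miss. It stands in for the cache interaction inside the handler.
func lookup(c Cache, key int, compute func(int) string) string {
	if value, ok := c.Get(key); ok {
		return value
	}
	value := compute(key)
	c.Put(key, value)
	return value
}

func main() {
	cache := newFakeCache()
	computed := 0
	compute := func(n int) string {
		computed++
		return fmt.Sprintf("value-%d", n)
	}
	for _, n := range []int{3, 5, 3, 15, 5} {
		lookup(cache, n, compute)
	}
	// Three distinct keys are computed once each; the two repeats are hits.
	fmt.Println(cache.hits, cache.misses, computed) // prints: 2 3 3
}
```

Unlike a mock, nothing about the expected call sequence has to be declared up front, which is what makes a fake a better fit for randomly generated requests.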


The golang ecosystem provides numerous options for testing and validating code.  These tools are free & easy to use.  By using a combination of the above tools we can obtain a high degree of confidence that our system is doing the correct thing.

I’d love to hear what tools & testing setups you use, feel free to share on Twitter, or submit a pull request to the repo.


2015: A Year in Review

2015 has been a whirlwind of a year, which started off in a new city, with a new job as the Tech Lead of Observability at Twitter.  The year was full of travel spanning 10 states, 3 different countries, and 2 continents.  This year I also had numerous opportunities to share my experiences with programming and distributed systems via talks, blog posts, podcasts, and articles.  Below is the recap.


Interviews & Podcasts

Programming Committees


Orleans: A Framework for Cloud Computing

Presented at Papers We Love SF: Video & Slides [February 19th 2015]


Caitie McCaffrey stops by and talks about the Orleans: Distributed Virtual Actors for Programmability and Scalability paper by Bernstein, Bykov, Geller, Kliot, and Thelin.

Orleans is a runtime and programming model for building scalable distributed systems, based on the actor model.  The Orleans programming model introduces the abstraction of Virtual Actors.  Orleans allows applications to obtain high performance, reliability, and scalability.  This technology was developed by the eXtreme Computing Group at Microsoft Research and was a core component of the Azure Services that powered Halo 4, the award winning video game.

Distributed Systems Track at Goto Chicago: Neha Narula, Caitie McCaffrey, Chris Meiklejohn, Kyle Kingsbury

Building the Halo 4 Services with Orleans

Presented at Qcon London: Video & Slides [March 5th 2015]


Halo 4 is a first-person shooter on the Xbox 360, with fast-paced, competitive gameplay. To complement the code on disc, a set of services were developed to store player statistics, display player presence information, deliver daily challenges, modify playlists, catch cheaters and more. As of June 2013 Halo 4 had 11.6 million players, who played 1.5 billion games, logging 270 million hours of gameplay.

Orleans, Distributed Virtual Actors for Programmability & Scalability, is an actor framework & runtime for building high scale distributed systems. It came from the eXtreme computing group in Microsoft Research, and is now Open Source on Github.

For Halo 4, 343 Industries built and deployed a new set of services built from the ground up to support high demand, low latency, and high availability using Orleans and running in Windows Azure. This talk will do an overview of Orleans, the challenges faced when building the Halo 4 services, and why the Actor Model and Orleans in particular were utilized to solve these problems.

Architecting & Launching the Halo 4 Services

Presented as the Closing Keynote of SRECon15: Video & Slides [March 17th 2015]


The Halo 4 services were built from the ground up to support high demand, low latency, and high availability.  In addition, video games have unique load patterns where the majority of the traffic and sales occurs within the first few weeks after launch, making this a critical time period for the game and supporting services. Halo 4 went from 0 to 1 million users on day 1, and 4 million users within the first week.

This talk will discuss the architectural challenges faced when building these services and how they were solved using Windows Azure and Project Orleans. In addition, we’ll discuss the path to production, some of the difficulties faced, and the tooling and practices that made the launch successful.

On stage during Strange Loop 2015 at the Peabody Opera House

The Saga Pattern

Presented at Craft Conf 2015 & Goto: Chicago 2015 Video & Slides [April 23rd 2015 & May 12th 2015]


As we build larger, more complex applications and solutions that need to do collaborative processing, the traditional ACID transaction model using coordinated 2-phase commit is often no longer suitable. More frequently we have long lived transactions or must act upon resources distributed across various locations and trust boundaries. The Saga Pattern is a useful model for long lived activities and distributed transactions without coordination.

Sagas split work into a set of transactions whose effects can be reversed even after the work has been performed or committed. If a failure occurs, compensating transactions are performed to roll back the work. So at its core the Saga is a failure management pattern, making it particularly applicable to distributed systems.

In this talk, I’ll discuss the fundamentals of the Saga Pattern, and how it can be applied to your systems. In addition we’ll discuss how the Halo 4 Services successfully made use of the Saga Pattern when processing game statistics, and how we implemented it in production.
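The compensation idea in the abstract above can be sketched in a few lines of Go.  This is a minimal single-process illustration under my own assumptions, not the Halo 4 implementation: the step type, runSaga, runExampleSaga, and the step names are all hypothetical, and a real saga would also need durable logging and retries for compensations that fail.

```go
package main

import (
	"errors"
	"fmt"
)

// step pairs a unit of work with the compensating action that reverses it.
type step struct {
	name       string
	do         func() error
	compensate func()
}

// runSaga executes the steps in order. If one fails, the steps that already
// completed are compensated in reverse order and the failure is returned.
// The failed step itself is not compensated because it never completed.
func runSaga(steps []step) error {
	var completed []step
	for _, s := range steps {
		if err := s.do(); err != nil {
			for i := len(completed) - 1; i >= 0; i-- {
				completed[i].compensate()
			}
			return fmt.Errorf("saga aborted at step %q: %w", s.name, err)
		}
		completed = append(completed, s)
	}
	return nil
}

// runExampleSaga runs three made-up steps, the last of which fails, and
// returns the order in which work and compensations were performed.
func runExampleSaga() ([]string, error) {
	var log []string
	record := func(msg string) { log = append(log, msg) }
	steps := []step{
		{"reserve", func() error { record("reserve"); return nil }, func() { record("undo reserve") }},
		{"charge", func() error { record("charge"); return nil }, func() { record("undo charge") }},
		{"ship", func() error { return errors.New("carrier unavailable") }, func() { record("undo ship") }},
	}
	err := runSaga(steps)
	return log, err
}

func main() {
	log, err := runExampleSaga()
	fmt.Println(log)
	fmt.Println(err)
}
```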

Scaling Stateful Services

Presented at StrangeLoop 2015 Video & Slides [September 25th 2015]

This talk was incredibly well received, and I was flattered to see write-ups of it featured in High Scalability and InfoQ


The Stateless Service design principle has become ubiquitous in the tech industry for creating horizontally scalable services. However, our applications do have state; we have just moved all of it to caches and databases. Today, as applications are becoming more data intensive and request latencies are expected to be incredibly low, we’d like the benefits of stateful services, like data locality and sticky consistency. In this talk I will address the benefits of stateful services, how to build them so that they scale, and discuss projects from Halo and Twitter of highly distributed and scalable services that implement these techniques successfully.

Ines Sombra & Caitie McCaffrey’s Evening Keynote at QconSF

On the Order of Billions

Presented at Twitter Flight: Video & Slides [October 21st 2015]


Every minute Twitter’s Observability stack processes 2+ billion metrics in order to provide Visibility into Twitter’s distributed microservices architecture. This talk will focus on some of the challenges associated with building and running this large scale distributed system. We will also focus on lessons learned and how to build services that scale that are applicable for services of any size.

So We Hear You Like Papers

Presented as the Evening Keynote at QconSF with Ines Sombra: Video, Slides, Resources, & Moment [November 16th 2015]


Surprisingly enough, academic papers can be interesting and very relevant to the work we do as computer science practitioners. Papers come in many kinds and areas of focus, and sometimes finding the right one can be difficult. But when you do, it can radically change your perspective and introduce you to new ideas.

Distributed Systems has been an active area of research since the 1960s, and many of the problems we face today in our industry have already had solutions proposed, and have inspired new research. Join us for a guided tour of papers from past and present research that have reshaped the way we think about building large scale distributed systems.