Resources for Getting Started with Distributed Systems
Posted on 13.02.2024 by caiti335

I'm often asked how to get started with Distributed Systems, so this post documents my path and some of the resources I found most helpful. It is by no means meant to be an exhaustive list.

It is worth noting that I am not classically trained in Distributed Systems. I am mostly self-taught, through independent study and on-the-job experience. I do have a B.S. in Computer Science from Cornell, but my specialization classes focused mostly on graphics and security. My love of Distributed Systems, and my education in them, came once I entered industry. The moral of this story is that you don't need formal academic training to learn and excel at distributed systems.

Books on Theory & Background

Introduction to Reliable and Secure Distributed Programming: This book is an excellent introduction to the fundamentals of distributed computing. It definitely takes an academic approach, but it is a good place to start to understand the terminology and challenges in the field.

Replication: Theory and Practice: This book summarizes 30 years of distributed systems research on replication, up to 2007. It's a great starting point and contains all the references to the original work. Each chapter is incredibly dense, and led me down multiple paper rabbit holes.

Papers

This is by no means an exhaustive list, but these are papers I keep coming back to, and they have significantly shaped the way I think about Distributed Systems.
Time, Clocks, and the Ordering of Events in a Distributed System
Impossibility of Distributed Consensus with One Faulty Process
Unreliable Failure Detectors for Reliable Distributed Systems
CAP Twelve Years Later: How the "Rules" Have Changed
Harvest, Yield, and Scalable Tolerant Systems
Dynamo: Amazon's Highly Available Key-value Store
The Chubby Lock Service for Loosely-Coupled Distributed Systems
Fallacies of Distributed Computing

A Note on Reading Papers

I start with the Abstract; if I find it interesting, I'll proceed to the Introduction, then the Conclusion. Only then, if I am incredibly interested in the implementation or details, will I read the whole thing. Also, the References are a gold mine: they cite related and foundational work.

Reading papers is often a recursive process. I'll start on one, then find a concept I'm unfamiliar with or don't understand, so I'll read the referenced paper, and so on. This often sends me down paper rabbit holes, and one time it ended with me reading a dissertation from the 1980s, but it is a great way to learn. I also highly recommend Michael Bernstein's blog post "Should I Read Papers?" for more on the motivations and on how to read an academic paper.

Blog Posts & Talks

Below is a list of some of my favorite blog posts and talks that shaped how I think about building Distributed Systems. Most of these are old, but I keep coming back to them and still find them relevant today.

Notes on Distributed Systems for Young Bloods by Jeff Hodges
Jepsen Blog Posts by Kyle Kingsbury
Everything Will Flow: Distributed Queues & Backpressure by Zach Tellman
Bad As I Wanna Be: Coordination and Consistency in Distributed Systems by Peter Bailis

Learning from Industry

The art of building, operating, and running distributed systems in industry is orthogonal to the theory of Distributed Systems.
I truly believe that the best way to learn about Distributed Systems is to get hands-on experience working on one. Post-mortems are another great source of information. Large tech companies, like Amazon, Netflix, Google, and Microsoft, often publish a post-mortem after a major outage. These are usually pretty dry to read, but they contain some hard-learned lessons.