2017 a Year in Review

2017 was a year of change, personal and professional.  I started the year in San Francisco, working at Twitter as an Individual Contributor, and in a long term relationship.  I ended the year in Seattle, working at Microsoft Research as a Lead, sans that long term relationship, and a brand new home owner.  

Change can be terrifying, especially when you are comfortable, when you are content.  Nothing was terribly wrong, but I got the nagging feeling that perhaps nothing was going terribly right either.  I was no longer content with being content.  So in 2017 I began to change some things up to make space for new opportunities.  

That coding life pic.twitter.com/SW6KoTazDv

— Caitie McCaffrey (@caitie) September 13, 2017

I made a conscious effort in 2017 to be less busy, to travel and speak a bit less.  2016 was a year of constant travel visiting 19 cities, 7 countries, and 3 continents.  I visited Twitter offices, spoke 15 times at conferences and meetups, and managed to squeeze in trips to see family and friends.  It was an amazing experience, but not a sustainable one for me.

So I made a conscious effort to slow down and was incredibly selective about the talks and travel I took on.  I declined several opportunities to speak and travel to great conferences and locations this year.  I wanted to take a moment to thank all the conference organizers who reached out; I greatly appreciate all of the invitations and fantastic opportunities, and unfortunately did not have the bandwidth to do more this past year.

Talks

I gave versions of my The Verification of Distributed Systems talk to larger audiences at Devoxx San Jose in March, and Velocity San Jose in June.  While I’ve given this talk numerous times, I think it’s perennially important, and people consistently tell me how much they learn from it.  

I wrote a brand new talk Distributed Sagas: A Protocol for Coordinating Microservices which I gave at J on the Beach in May and at Dot Net Fringe in June.  This was a passion project for me, as I’d been exploring the ideas for multiple years, and wanted to share the progress I had made.  

I also wrote another new talk for the inaugural Deconstruct Conf in Seattle, The Path Towards Simplifying Consistency in Distributed Systems.  This conference was my favorite of the year: a single track filled with excellent speakers that focused not only on technology, but the culture and community in tech.  The cherry on top was its location, the Egyptian Theatre in Seattle’s Capitol Hill neighborhood, my old stomping grounds.

I also spoke at two chapters of Papers We Love, San Francisco and Seattle.  I presented Barbara Liskov’s paper Distributed Programming in Argus.  This brings my total times speaking at Papers We Love chapters to 7, which I think once again makes me the record holder :).  All joking aside, Papers We Love is one of my favorite organizations, and I love attending and speaking at the meetups because of the community it fosters, bringing together academia and industry, and the culture of curiosity it inspires.

Writing

I wrote a single blog post in 2017: Resources for Getting Started with Distributed Systems, which is a collection of materials that have greatly influenced me and attempts to answer the perennial question I get asked, “How do I get started with Distributed Systems?”

Work

Earlier this year an old colleague recommended I take a phone call with a group at Microsoft Research.  After a couple phone calls, and an onsite interview, I was convinced that this was a rare opportunity with an amazing team and an industry defining project.  So in June, after 2.5 years of working at Twitter, I decided to leave the flock.  

Working at Twitter was a truly great experience.  It was an incredible ride where I got to learn and work on so many amazing projects including being the Tech Lead of the Observability team, speaking at Twitter Flight, digging into Distributed Build, shipping Abuse Report Notifications, and facilitating TWIG (Twitter’s Engineering Leadership Program).  I also feel very fortunate to have worked with and met so many incredible people.

Today is my last day at Twitter. What an incredible ride the last 2.5 years, so grateful for this experience & the folks I met along the way pic.twitter.com/pWKQTd27yo

In July I started as a Principal Software Engineer in Microsoft Research, and have loved every minute of it.  I’m getting to stretch, learn, and grow every day on a project that I truly believe will change the world.  I also adore my teammates; this is by far the smartest and nicest team I have ever worked on.  We consistently talk and live our cultural values of trust, kindness, and fearlessness.  I couldn’t ask for a better team.  And just in case that wasn’t enough change for one year, in November I stepped into the Lead role, a hybrid Tech Lead and People Manager, for the Services Team, which is another new and exciting challenge and opportunity that I’m loving.

Personal

Leaving San Francisco felt inevitable.  I moved to San Francisco to experience the tech scene, to live the cultural phenomenon.  But after 2.5 years I was ready to move on.  San Francisco was not my forever home; our worlds just did not match.

Moving back to Seattle was an easy decision.  I first fell in love with Seattle when I moved here after college, and still love it.  Even after all my nomadic wanderings and travel when I visited Seattle in April for Deconstruct Conf I instantly felt like I was home.  I also realized I was quite nostalgic for Seattle earlier in the year when I began marathoning episodes of Grey’s Anatomy again.  

And if all the warm and fuzzy feelings about Seattle weren’t enough, the stars magically aligned and within a week of moving back I made an offer on a house, and it was accepted!  New job, new/old city, and a new homeowner too!

I jokingly tell friends that I blew up my whole life earlier this year, which isn’t entirely untrue.  The top three stressors in life are commonly reported as job change, relationship change, and moving.  I did all three within the span of about two months.  I’d like to take a quick moment to thank my community of family, friends, and colleagues who helped and supported me through this whirlwind transition.  I could not have done it without your support.

Even with all the stressors I honestly could not be happier (with my personal and professional life; the political nightmare of 2017 still fills me with dread, despair, and anger).  I no longer feel comfortable or content.  In fact I often feel decidedly uncomfortable, but in the way that signals learning and growth.  And instead of contentment I often feel a wild unbridled joy and excitement.  I’m energized to go to work every day.  I’ve sung and danced and laughed until my stomach hurt more times than I can count since blowing up my life.  So I guess the lesson once again is, “You are braver than you believe, stronger than you seem, and smarter than you think.”  Oh and always take the phone call :).

A Quick Guide to Testing in Golang

When I started writing Go in May, I found a lot of useful documentation on Getting Started with Go.  However, I found recommendations on testing best practices lacking.  So I decided to write down what I pieced together, and create a Github Repo of a base project with examples.  Essentially this is the guide I wish had existed, and is a record for future me when I invariably forget this information.  In this blog post I’ll walk through a simple example of writing and testing a FizzBuzz application using unit tests, property based testing, mocks & fakes.

Code: FizzBuzz

So let’s start by writing a basic function FizzBuzz, which takes in a number and returns a string according to the following rules.

For multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

Here is my version (fizzbuzz.go), pretty simple right?  Now that we’ve written our code, we need to test it.
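The original fizzbuzz.go listing isn’t reproduced in this archive, so here is a minimal sketch of what such an implementation might look like.  The package name is taken from the mockgen example later in the post; the exact structure of the original file may differ.

package fizzbuzz

import "strconv"

// fizzBuzz returns "Fizz" for multiples of three, "Buzz" for multiples of five,
// "FizzBuzz" for multiples of both, and the number itself otherwise.
func fizzBuzz(num int) string {
    switch {
    case num%15 == 0:
        return "FizzBuzz"
    case num%3 == 0:
        return "Fizz"
    case num%5 == 0:
        return "Buzz"
    default:
        return strconv.Itoa(num)
    }
}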

Unit Test Cases

Basic testing in Go is easy, and well documented.  Go test cases are usually placed in the same directory as the code they are testing and typically named <filename>_test.go, where filename is the name of the file with the code under test.

There are four basic outputs we expect from FizzBuzz: Fizz, Buzz, FizzBuzz, and the input number.  These can all be covered by 4 basic test cases that I wrote in fizzbuzz_test.go which provide the input 3, 5, 15, and 2 to the fizzBuzz function and validate the result.
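For reference, a minimal sketch of what those four cases might look like as a table-driven Go test follows; the original fizzbuzz_test.go may be structured differently.

package fizzbuzz

import "testing"

func Test_FizzBuzz(t *testing.T) {
    cases := []struct {
        input    int
        expected string
    }{
        {3, "Fizz"},
        {5, "Buzz"},
        {15, "FizzBuzz"},
        {2, "2"},
    }

    for _, c := range cases {
        // Each input exercises one of the four possible outputs.
        if got := fizzBuzz(c.input); got != c.expected {
            t.Errorf("fizzBuzz(%d) = %q, expected %q", c.input, got, c.expected)
        }
    }
}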

Helpful Commands

go test -v -race ./...

-v prints out verbose test results.  This will show a pass/fail for every test case run.

-race runs the Golang race detector, which will detect when two goroutines access the same variable concurrently and at least one of the accesses is a write.

Continuous Integration

Continuous Integration is crucial for fast & safe development.  Using a tool like Travis CI or Circle CI makes it easy for developers to ensure all submitted code compiles and passes test cases.  I set up my project to run gated check-ins using TravisCI, starting with the golang docs, and then adding some modifications. My .travis.yml file ensures the following:

  • The code compiles
  • The code is formatted correctly (gofmt)
  • The code passes go vet
  • All test cases pass with the -v & -race flag
  • Code Coverage of test cases is uploaded to codecov.io

Code Coverage

Code Coverage is another important tool that I include in every project where possible.  While no percentage of code coverage will prove that your code is correct, it does give you more information about what code has been exercised.

I personally use code coverage to check if error cases are handled appropriately.  Anecdotally I find that code coverage gaps occur around error handling.  Also in Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-Intensive Systems, the authors discovered that the majority of catastrophic failures are caused by inappropriate error handling and that  “In 23% of catastrophic failures … the incorrect error handling in these cases would be exposed by 100% statement coverage testing on the error handling logic.”

Testing and verifying distributed systems is hard, but this paper demonstrates that rigorously testing the error handling logic in our program dramatically increases our confidence that the system is doing the right thing.  This is a huge win.  I highly recommend using Code Coverage in your Go projects.

There are a variety of Code Coverage Tools out there.  I set up my repo to use CodeCov.io.  It easily integrates with TravisCI and is free for public repos.  CodeCov.yml is my project’s configuration file, and testCoverage.sh is a script which runs all the tests in the project and creates a coverage.txt file, which is uploaded and parsed by CodeCov to create coverage reports.
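If you just want to poke at coverage locally, outside of CI, the standard go test tooling can produce and render a profile.  A typical pair of commands looks something like the following (the output file name is just an example):

go test -covermode=atomic -coverprofile=coverage.txt

go tool cover -html=coverage.txt

The second command opens an annotated, per-statement coverage view in your browser, which is a quick way to spot untested error handling paths.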

Property Based Testing

Now we have 100% test coverage of the current implementation with our unit test cases; however, we have only covered 9.31e-10% of the possible inputs.  That’s a very small percentage of all possible inputs to be validated.  If the code were more complicated, or we had to test this in a black box manner, then our confidence that our code was doing the correct thing for all inputs would be low.

One way to explore more of the input state space is to use property based testing.  In a property based test, the programmer specifies logical properties that a function should fulfill.  The property testing framework then randomly generates input and tries to find a counterexample, i.e. a bug in the code.  The canonical property testing framework is QuickCheck, written by John Hughes; it has since been re-implemented in numerous other languages, including Go (Gopter is the GOlang Property TestER).  While Property Based testing cannot prove that the code is correct, it greatly increases our confidence that the code is doing the right thing since a larger portion of the input state space is explored.

The docs for Gopter are rather extensive, and explain all the bells and whistles, so we shall just go through a quick example.  Property based tests can be specified like any other test case; I placed mine in fizzbuzz_prop_test.go for this example, but typically I include them in the <filename>_test.go file.

properties.Property("FizzBuzz Returns Correct String", prop.ForAll(
    func(num int) bool {
        str := fizzBuzz(num)
        switch str {
        case "Fizz":
            return (num % 3 == 0) && !(num % 5 == 0)
        case "Buzz":
            return (num % 5 == 0) && !(num % 3 == 0)
        case "FizzBuzz":
            return (num % 3 == 0) && (num % 5 == 0)
        default:
            expectedStr := strconv.Itoa(num)
            return !(num % 3 == 0) && !(num % 5 == 0) && expectedStr == str
        }
    },
    gen.Int(),
))

This test passes the randomly generated number into fizzBuzz then for each case ascertains that the output adheres to the defined properties, i.e. if the returned value is “Fizz” then the number must be divisible by 3 and not by 5, etc…  If any of these assertions do not hold a counter-example will be returned.

For instance, say a zealous developer on the FizzBuzz project added an “!” to the end of the converted number string; the property based tests would fail with the following message:

! FizzBuzz Returns Correct String: Falsified after 3 passed tests.
ARG_0: 11
ARG_0_ORIGINAL (31 shrinks): 406544657
Elapsed time: 200.588µs

Now we have a counter example and can easily reproduce the bug, fix it and move on with development.

Where Gopter & QuickCheck excel beyond random input and fuzz testing is that they will try to shrink the input that causes the error down to a minimal set of inputs.  While our example only takes one input, this is incredibly valuable for more complex tests.

I find Property Based testing incredibly valuable for exploring large state spaces of input, especially things like transformation functions.  I regularly use them in addition to unit tests, and often find them just as easy to write.  

Helpful Commands

go get github.com/leanovate/gopter

Install Gopter to get started with property based testing in Go.

Code: FizzBuzz Handler

The project scope has increased!  Now we need to provide FizzBuzz as a service and/or command line tool.  Our FizzBuzz calculator may now be long lived and can take advantage of caching results that users have already requested.

In order to do this I added a new interface Cache, this allows the user to provide their favorite Cache of choice.  That could be a simple in-memory cache backed by a dictionary or perhaps a durable cache like Redis, depending on their requirements.

type Cache interface {
    Put(key int, value string)
    Get(key int) (string, bool)
}

I also added a new file, fizzBuzzHandler.go, with a method RunFizzBuzz, which takes an array of strings (presumably numbers), tries to convert them to integers, and then gets the FizzBuzz value for each, either from the cache or by calculating FizzBuzz via our previously defined method.
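The handler itself isn’t reproduced here, but a rough sketch of what RunFizzBuzz might look like follows.  The Handler struct name and the error message are my own assumptions; NewHandler, RunFizzBuzz, and the Cache interface are taken from how they are used in the mock test below.

package fizzbuzz

import (
    "fmt"
    "strconv"
)

// Handler computes FizzBuzz values, consulting a Cache before recalculating.
type Handler struct {
    cache Cache
}

func NewHandler(cache Cache) *Handler {
    return &Handler{cache: cache}
}

// RunFizzBuzz converts each input string to an integer and returns its
// FizzBuzz value, using the cache when a value has already been computed.
func (h *Handler) RunFizzBuzz(nums []string) ([]string, error) {
    results := make([]string, 0, len(nums))
    for _, s := range nums {
        num, err := strconv.Atoi(s)
        if err != nil {
            return nil, fmt.Errorf("input %q is not a number", s)
        }

        value, ok := h.cache.Get(num)
        if !ok {
            value = fizzBuzz(num)
            h.cache.Put(num, value)
        }
        results = append(results, value)
    }
    return results, nil
}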

Mocks

Now we have new code that needs to be tested, so we create fizzBuzzHandler_test.go.  Testing bad input is once again a simple unit test case.  We can also simply test that the correct value of FizzBuzz is returned for a variety of supplied numbers when RunFizzBuzz is called; however, FizzBuzz returning the correct value has already been extensively tested above.

What we really want to test is the interaction with the Cache.  Namely that values are stored in the cache after being calculated, and that they are retrieved from the cache and not re-calculated.  Mocks are a great way to test that code interacts in the expected way, and to easily define inputs and outputs for calls. 

Go has a package, golang/mock.  In Go only interfaces can be mocked, and mocks in Go are implemented via codegen.  The mockgen tool will generate a mock implementation based on your interface.  Then in a unit test case, a mock interface object can be created, expected method calls specified, and return values defined.

func Test_RunFizzBuzz_CacheMiss(t *testing.T) {
    mockCtrl := gomock.NewController(t)
    defer mockCtrl.Finish()

    mockCache := NewMockCache(mockCtrl)
    mockCache.EXPECT().Get(5).Return("", false)
    mockCache.EXPECT().Put(5, "Buzz")

    handler := NewHandler(mockCache)
    str, err := handler.RunFizzBuzz([]string{"5"})
    if err != nil {
        t.Error("Unexpected error returned", err)
    }
    if str[0] != "Buzz" {
        t.Error("Expected returned value to be Buzz", str)
    }
}

In the above code, I create a mockCache with the NewMockCache command, and define that I expect a Cache miss to occur, followed by a Put with the calculated value.  I then simply call RunFizzBuzz and verify the output.  This not only validates that the correct value is returned from RunFizzBuzz, but also that the cache was successfully updated.

Code Generated mocks should be checked into the code base, and updated when the interface changes as part of a code review.

Helpful Commands

go generate ./...

will run the code gen command specified in files with the comment: //go:generate <cmd>

For instance, to generate cache_mock.go when running go generate ./..., the following comment is added at the top of the file:

//go:generate mockgen -source=cache.go -package=fizzbuzz -destination=cache_mock.go

Fakes

A fake is a test implementation of an interface, which can also be incredibly useful in testing, especially for integration tests or property based tests.  Specifying all the expected calls on a mock in a property based test is tedious, and may not be possible in some scenarios.  At these points Fakes can be very useful.  I implemented cache_fake.go, a simple in-memory cache to use with fizzBuzzHandler_prop_test.go to ensure there is no unintended behavior when the cache is used with numerous requests.
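A fake cache along these lines can be as simple as a map behind the Cache interface.  Here is a sketch of roughly what cache_fake.go might contain; the type and constructor names are my own assumptions.

package fizzbuzz

// FakeCache is a simple in-memory Cache implementation backed by a map,
// intended for use in tests.
type FakeCache struct {
    values map[int]string
}

func NewFakeCache() *FakeCache {
    return &FakeCache{values: make(map[int]string)}
}

func (f *FakeCache) Put(key int, value string) {
    f.values[key] = value
}

func (f *FakeCache) Get(key int) (string, bool) {
    value, ok := f.values[key]
    return value, ok
}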

Tests that utilize fakes can also easily be repurposed as integration or smoke-tests when an interface is used to abstract a network interaction, like with the FizzBuzz Cache.  Running this test with the desired cache implementation can greatly increase our confidence that the interaction with the physical cache is correct, and that the environment is configured correctly.

Conclusion

The golang ecosystem provides numerous options for testing and validating code.  These tools are free & easy to use.  By using a combination of the above tools we can obtain a high degree of confidence that our system is doing the correct thing.

I’d love to hear what tools & testing setups you use, feel free to share on Twitter, or submit a pull request to the repo.

2015: A Year in Review

2015 has been a whirlwind of a year, which started off in a new city, with a new job as the Tech Lead of Observability at Twitter.  The year was full of travel spanning 10 states, 3 different countries, and 2 continents.  This year I also had numerous opportunities to share my experiences with programming and distributed systems via talks, blog posts, podcasts, and articles.  Below is the recap.

Articles

  • Clients are Jerks: aka how Halo 4 DoSed the Services at Launch and How We Survived in Caitiem.com [June 23rd 2015 ]
  • The Verification of a Distributed System in ACM Queue [Nov/Dec 2015]

Interviews & Podcasts

  • Caitie McCaffrey on Scaling Halo 4 Services, the Orleans Actor Framework, Distributed Programming  on InfoQ [April 24th 2015]
  • Taming Distributed Architectures with Caitie McCaffrey on Software Engineering Daily [September 10th 2015]

Programming Committees

  • Taming Distributed Architectures Track for QconSF [Nov 18th 2015]

Talks

Orleans: A Framework for Cloud Computing

Presented at Papers We Love SF: Video & Slides [February 19th 2015]

Abstract

Caitie McCaffrey stops by and talks about the Orleans: Distributed Virtual Actors for Programmability and Scalability paper by Bernstein, Bykov, Geller, Kliot, and Thelin.

Orleans is a runtime and programming model for building scalable distributed systems, based on the actor model.  The Orleans programming model introduces the abstraction of Virtual Actors.  Orleans allows applications to obtain high performance, reliability, and scalability.  This technology was developed by the eXtreme Computing Group at Microsoft Research and was a core component of the Azure Services that powered Halo 4, the award-winning video game.

Building the Halo 4 Services with Orleans

Abstract

Halo 4 is a first-person shooter on the Xbox 360, with fast-paced, competitive gameplay. To complement the code on disc, a set of services were developed to store player statistics, display player presence information, deliver daily challenges, modify playlists, catch cheaters and more. As of June 2013 Halo 4 had 11.6 million players, who played 1.5 billion games, logging 270 million hours of gameplay.

Orleans, Distributed Virtual Actors for Programmability & Scalability, is an actor framework & runtime for building high scale distributed systems. It came from the eXtreme computing group in Microsoft Research, and is now Open Source on Github.

For Halo 4, 343 Industries built and deployed a new set of services built from the ground up to support high demand, low latency, and high availability using Orleans and running in Windows Azure. This talk will give an overview of Orleans, the challenges faced when building the Halo 4 services, and why the Actor Model and Orleans in particular were utilized to solve these problems.

Architecting & Launching the Halo 4 Services

Presented as the Closing Keynote of SRECon15: Video & Slides [March 17th 2015]

Abstract

The Halo 4 services were built from the ground up to support high demand, low latency, and high availability.  In addition, video games have unique load patterns where the majority of the traffic and sales occurs within the first few weeks after launch, making this a critical time period for the game and supporting services. Halo 4 went from 0 to 1 million users on day 1, and 4 million users within the first week.

This talk will discuss the architectural challenges faced when building these services and how they were solved using Windows Azure and Project Orleans. In addition, we’ll discuss the path to production, some of the difficulties faced, and the tooling and practices that made the launch successful.

The Saga Pattern

Presented at Craft Conf 2015 & Goto: Chicago 2015 Video & Slides [April 23rd 2015 & May 12th 2015]

Abstract

As we build larger more complex applications and solutions that need to do collaborative processing the traditional ACID transaction model using coordinated 2-phase commit is often no longer suitable. More frequently we have long lived transactions or must act upon resources distributed across various locations and trust boundaries. The Saga Pattern is a useful model for long lived activities and distributed transactions without coordination.

Sagas split work into a set of transactions whose effects can be reversed even after the work has been performed or committed. If a failure occurs, compensating transactions are performed to roll back the work. So at its core the Saga is a failure management pattern, making it particularly applicable to distributed systems.

In this talk, I’ll discuss the fundamentals of the Saga Pattern, and how it can be applied to your systems. In addition we’ll discuss how the Halo 4 Services successfully made use of the Saga Pattern when processing game statistics, and how we implemented it in production.

Scaling Stateful Services

Presented at StrangeLoop 2015 Video & Slides [September 25th 2015]

This talk was incredibly well received, and I was flattered to see write-ups of it featured in High Scalability and InfoQ

Abstract

The Stateless Service design principle has become ubiquitous in the tech industry for creating horizontally scalable services. However our applications do have state; we have just moved all of it to caches and databases. Today as applications are becoming more data intensive and request latencies are expected to be incredibly low, we’d like the benefits of stateful services, like data locality and sticky consistency. In this talk I will address the benefits of stateful services, how to build them so that they scale, and discuss projects from Halo and Twitter of highly distributed and scalable services that implement these techniques successfully.

On the Order of Billions

Abstract

Every minute Twitter’s Observability stack processes 2+ billion metrics in order to provide visibility into Twitter’s distributed microservices architecture. This talk will focus on some of the challenges associated with building and running this large scale distributed system. We will also focus on lessons learned and how to build services that scale, lessons that are applicable to services of any size.

So We Hear You Like Papers

Presented as the Evening Keynote at QconSF with Ines Sombra: Video, Slides, Resources, & Moment [November 16th 2015]

Abstract

Surprisingly enough academic papers can be interesting and very relevant to the work we do as computer science practitioners. Papers come in many kinds/ areas of focus and sometimes finding the right one can be difficult. But when you do, it can radically change your perspective and introduce you to new ideas.

Distributed Systems has been an active area of research since the 1960s, and many of the problems we face today in our industry have already had solutions proposed, and have inspired new research. Join us for a guided tour of papers from past and present research that have reshaped the way we think about building large scale distributed systems.

A WebSocket Primer

Over the past year, prior to leaving 343, I spent a large amount of time working with the WebSockets protocol and upgrading the Halo Services to support it.  In order to solidify my knowledge and provide a handy refresher for when this information invariably gets context switched out of my brain in the future, I decided to write a primer on WebSockets.  Hopefully other people will find the introduction to this new protocol useful as well.

Overview

In December 2011 the IETF standardized the WebSocket protocol.  Unlike the typical Request/Response messaging patterns provided by HTTP, this network protocol provides a full-duplex communication channel between a host and a client over TCP.  This enables server sent events, reactive user experiences, and real time components.

The WebSocket protocol provides some advantages over the traditional HTTP protocol.  Once the connection has been established, there is a point to point system of communication where both devices can communicate with one another simultaneously.  This enables server sent events without using a work around like Comet or Long Polling.  While these technologies work well, they carry the overhead of HTTP, whereas WebSocket frames have a wire-level overhead of as little as two bytes per frame.  The full-duplex communication and low packet overhead make it an ideal protocol for real-time low latency experiences.

An important note:  The WebSocket protocol is not layered on top of HTTP, nor is it an extension of the HTTP protocol.  The WebSocket protocol is a lightweight protocol layered on top of TCP.  The only part HTTP plays is in establishing a WebSocket connection via the HTTP Upgrade request.  Also, the HTTP Upgrade request is not specific to WebSockets but can be used to support other handshakes or upgrade mechanisms which will use the underlying TCP connection.

Open a WebSocket Connection

A client can establish a WebSocket connection by initiating a client handshake request.  As mentioned above the HTTP Upgrade request is used to initiate a WebSocket connection.

GET /chat HTTP/1.1
HOST: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13

If all goes well on the server and the request can be accepted then the server handshake will be returned.

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

If an error occurs and the server cannot accept the request, then an HTTP 500 should be returned to indicate that the request has failed and that the protocol is still HTTP.

Once the client server handshake is completed the TCP connection used to make the initial HTTP request has now been upgraded to a WebSocket connection.  Messages can now be sent from either the client to the server or the server to the client.

Code

As a developer most of the nuances of the WebSocket handshake are hidden away by the platform specific APIs and SDKs.  In the .NET world Windows 8 and Windows Server 2012 introduced native support for the WebSocket protocol.  In addition Internet Explorer 10 introduced native support for the WebSocket protocol as well. Also a variety of other platforms support WebSockets.

Client

Using the .NET 4.5 Framework, the client code to establish a WebSocket connection in C# would look like this:

ClientWebSocket webSocket = new ClientWebSocket();
await webSocket.ConnectAsync(new Uri("ws://localhost/Echo"), CancellationToken.None);

Once the connection succeeds on the client the ClientWebSocket object can be used to receive and send messages.

Server

Using the .Net 4.5 Framework on a simple server using HttpListener, the C# code to accept a WebSocket request and complete the handshake would look like this:

HttpListenerContext listenerContext = await httpListener.GetContextAsync();
if (listenerContext.Request.IsWebSocketRequest)
{
    // No subprotocol is requested here, so null is passed for subProtocol.
    WebSocketContext webSocketContext = await listenerContext.AcceptWebSocketAsync(subProtocol: null);
    WebSocket webSocket = webSocketContext.WebSocket;
}
else
{
    // Return a 426 – Upgrade Required Status Code
    listenerContext.Response.StatusCode = 426;
    listenerContext.Response.Close();
}

The call to AcceptWebSocket request returns after the server handshake has been returned to the client.  At this point the WebSocket object can be used to send and receive messages.

WebSocket Messages

WebSocket messages are transmitted in “frames.”  Each WebSocket frame has an opcode, a payload length, and the payload data.  Each frame has a header.  The size of the header is between 2-14 bytes.  As you can see the header overhead is much smaller than the text based HTTP headers.

 Headers

Each row below represents 16 bits of the frame header:

  • Row 1: Final (1 bit), Reserved Bits (3 bits), OpCode (4 bits), Mask (1 bit), Payload Indicator (7 bits)
  • Row 2: Extended payload length (present if payload is longer than 125 bytes)
  • Rows 3-5: Extended payload length (present if payload length is >= 2^16)
  • Rows 6-7: Masking Key (present if masking bit is set)

The first 9 bits sent in every WebSocket frame are defined as follows:

  • Final Bit (1 bit) – Indicates whether the frame is the final fragment of a message, as a large message can be broken up and sent over multiple frames.  A message that is one frame long would also set this bit to 1.
  • Reserved (3 bits) – These must be 0, and are currently reserved for extensions.
  • OpCodes (4 bits) – Opcodes define how the payload data should be interpreted
  • Masking (1 bit) – Indicates if the payload data is masked.  The WebSocket protocol specifies that all messages sent from a client to a server must be XOR masked.

The variable length of a WebSocket header is based on the size of the payload and the masking-key

  • Payload Length (7 bits, 7 + 16 bits, 7 + 64 bits) – Bits 10-16 of the header are the payload indicator bits. The number of bits used to encode the payload length varies based on the size of the payload data.
    • 0-125 bytes: payload length encoded in the payload indicator bits
    • 126 – 65,535 bytes: The payload indicator bits are set to 126, and the next two bytes are used to encode the payload length.
    • >65,535 bytes: 127 is encoded in the payload indicator bits, and the next 8 bytes are used to specify the payload length.
  • Masking-key (0 or 32 bits) – If the masking bit is set, then the 32 bit integer used to mask the payload is specified in this field.  If the masking bit is not set then this is omitted.  (A short decoding sketch follows this list.)
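To make the variable-length encoding concrete, here is a rough sketch of decoding the payload length from a frame header.  The protocol is language-agnostic, so Go is used here purely for illustration; the function name and error messages are my own, and masking-key extraction is omitted.

package websocket

import (
    "encoding/binary"
    "errors"
)

// payloadLength decodes the payload length from the start of a WebSocket
// frame header, following the 7 / 7+16 / 7+64 bit scheme described above.
// It returns the length and how many header bytes were consumed so far
// (not counting any masking key).
func payloadLength(header []byte) (length uint64, headerBytes int, err error) {
    if len(header) < 2 {
        return 0, 0, errors.New("header too short")
    }

    indicator := header[1] & 0x7F // low 7 bits are the payload indicator
    switch {
    case indicator <= 125:
        return uint64(indicator), 2, nil
    case indicator == 126:
        if len(header) < 4 {
            return 0, 0, errors.New("header too short for 16-bit length")
        }
        return uint64(binary.BigEndian.Uint16(header[2:4])), 4, nil
    default: // indicator == 127
        if len(header) < 10 {
            return 0, 0, errors.New("header too short for 64-bit length")
        }
        return binary.BigEndian.Uint64(header[2:10]), 10, nil
    }
}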

 OpCodes

The following table below defines WebSocket frame OpCodes.  Applications should only set the Text or Binary OpCodes to specify how the payload data in the frame is interpreted.

  • 0x0 Continuation Frame – The payload in this frame is a continuation of the message sent in a previous frame that did not have its final bit set
  • 0x1 Text Frame – Application Specific – The payload is encoded in UTF-8
  • 0x2 Binary Frame – Application Specific – The payload is a binary blob
  • 0x8 Close Connection Frame – Specifies that the WebSocket connection should be closed
  • 0x9 Ping Frame – Protocol Specific – sent to check that the client is still available
  • 0xA Pong Frame – Protocol Specific – response sent after receiving a ping frame.  Unsolicited pong messages can also be sent.

Code

Sending and receiving WebSocket messages is easy using the .NET Framework APIs.

Receiving a Message

byte[] receiveBuffer = new byte[receiveBufferLength];
while (webSocket.State == WebSocketState.Open)
{
    WebSocketReceiveResult receiveResult = await webSocket.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), CancellationToken.None);
}

The WebSocketReceiveResult object contains the information sent in one WebSocket frame, including the OpCode, Final Bit Setting, Payload Length, and CloseStatus & Reason if it’s a Close Connection Frame.  The receiveBuffer will be populated with the data sent in the payload.

Sending a Message

Sending a message is also simple, and an async method is provided in the .NET 4.5 Framework.  The code below echoes the received message back over the channel.  The data, Message Type, and Final Bit are specified in the parameter list, along with a cancellation token.

await webSocket.SendAsync(new ArraySegment<byte>(receiveBuffer, 0, receiveResult.Count), WebSocketMessageType.Binary, receiveResult.EndOfMessage, CancellationToken.None);

Close a WebSocket Connection

Either endpoint can close the WebSocket connection.  In order to do this the endpoint starts the WebSocket Closing Handshake.  The initiating end point sends a WebSocket message with a closing status code, and an optional close reason (text), and sets the Opcode in the message to the Close Connection Frame (0x8).  Once the message is sent the endpoint will close the WebSocket connection by closing the underlying TCP connection.

As an application developer it is important to note that either endpoint, server or client, can initiate the closing handshake.  Practically this means both endpoints need to handle receiving the close frame.  It also means that some messages may not be delivered, if the connection is closed while the messages are in transit.

Connection Close Code

Connection Close frames should include a status code, which indicates the reason the WebSocket connection was closed.  These are somewhat analogous to HTTP Status Codes.

  • 1000 Normal Closure – The purpose for which the connection was established has been fulfilled
  • 1001 Endpoint Unavailable – A server is going down, or a browser has navigated away from a page
  • 1002 Protocol Error – The endpoint received a frame that violated the WebSocket protocol
  • 1003 Invalid Message Type – The endpoint has received data that it does not understand.  Endpoints which only understand text may send this if they receive a binary message and vice versa
  • 1004-1006 Reserved – Reserved for future use
  • 1007 Invalid Payload Data – The payload contained data that was not consistent with the type of message
  • 1008 Policy Violation – Endpoint received a message that violates its policy
  • 1009 Message Too Big – Endpoint received a message that is too big for it to process
  • 1010 Mandatory Extension – An endpoint is terminating the connection because it expected to negotiate one or more extensions
  • 1011 Internal Error – The server is terminating the connection because it encountered an unexpected error
  • 1015 TLS Handshake – Used to designate that the connection closed because the TLS handshake failed

Connection Close Code Ranges

  • 0-999 – Not Used
  • 1000-2999 – Reserved for use by the Protocol Definition
  • 3000-3999 – Reserved for use by libraries, frameworks & applications.  These should be registered with IANA
  • 4000-4999 – Reserved for private use and can’t be registered

 Code

Once again most of the details are dealt with by WebSocket libraries in your framework of choice.  Application developers must decide when the connection should be closed, should set the appropriate connection close code and may also set a connection close reason.

The .Net Framework makes this very easy, by providing an asynchronous method which takes in the connection close code and close reason as parameters.

await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Normal Closure", CancellationToken.None);

Microsoft WebSocket Implementations

As mentioned before Windows 8 and Windows Server 2012 introduced native support for the WebSocket protocol.  Also because the Xbox One is running a variant of the Windows 8 operating system it also has built in support for WebSockets.

.Net 4.5

Version 4.5 of the .NET framework introduced support for WebSockets through the System.Net.WebSockets namespace.  The underlying connection is passing through HTTP.sys in the kernel so timeout settings in the HTTP.sys layer might still apply.

WinRT

WinRT only exposes APIs for creating a WebSocket client connection.  There are two classes to do this in the Windows.Networking.Sockets namespace, MessageWebSocket & StreamWebSocket.

 Win32 (WinHTTP)

The WinRT API is also available to C++ developers.  For developers that want more control, WinHTTP provides a set of APIs for sending the WebSocket upgrade request, and sending and receiving data on WebSocket connections.

 JavaScript

All the latest versions of common browsers, with the exception of Android, support the WebSocket protocol and API as defined by the W3C.

SignalR

The ASP.NET team has built a high-level bi-directional communication API called SignalR.  Under the hood SignalR picks the best protocol to use based on the capabilities of the clients.  If WebSockets are available it prefers to use that protocol, otherwise it falls back to other HTTP techniques like Comet and Long Polling.  SignalR has support for multiple languages including .NET, Javascript, and iOS and Android via Xamarin.  It is an open source project on GitHub.

Conclusion

WebSockets are a great new protocol to power real time applications and reactive user experiences due to their lightweight headers and bi-directional communication.  It is also a great protocol for implementing Pub/Sub messaging patterns between servers and clients.  However WebSockets are not a silver bullet for networked communications.  WebSockets are incredibly powerful but do also have their drawbacks.  For instance, because WebSockets require a persistent connection, they consume resources on the server and require the server to manage state.  HTTP and RESTful APIs are still incredibly useful and valid in many scenarios, and developers should consider the uses of their APIs and applications when choosing which protocol to use.

Origin Story: Becoming a Game Developer

Over the past few weeks I have been asked over a dozen times how I got into the Games Industry, so I thought I would write it down.

TLDR; My first Console was a SNES.  I learned to program in High School. I attended Cornell University and got a B.S. in Computer Science.  My first job out of college was as a network tester on Gears of War 2 & 3.  I joined 343 Industries as a Web Services Developer in January of 2010, and recently shipped Halo 4 on November 6th 2012.

In the Beginning

My story starts out in the typical fashion: I fell in love with Video Games after my parents got me an SNES as a kid.  However, here is where my story diverges: my career in the games industry was not decided at 7.

In fact I had already chosen my career a few years earlier.  When I was 5, I announced to my mother that I did not need to learn math because I was going to be a writer when I grew up.  I had an active imagination, and loved exercising it by writing stories of my own.  My first major work was a story about ponies entitled “Hores.”  Luckily my parents would not let me give up on math, and helped me with my spelling.

It turned out that I actually did enjoy math, I just was ahead of my classmates in comprehension which is why I found it boring in grade school.  In Middle School I was placed into the Advanced Math program along with about 25 other students selected to take accelerated courses.  I enjoyed the problem sets and challenges, and more importantly I excelled at them.  This put me on Mrs. Petite’s short list of students to recruit.

The Way of the Code

Mrs. Petite taught Computer Science at my High School, and she notoriously recruited any advanced math or science student to take her class.  She was stubborn and didn’t take no for an answer so Sophomore year instead of having an extra period of study hall, like I originally intended, I was in her Intro to programming class, writing a “Hello World” application in Visual Basic.

Mrs. Petite quickly  became my favorite teacher and I took AP level Computer Science classes Junior and Senior year learning C++ and Java, respectively.  We learned programming basics, object oriented programming, and simple data structures with fun assignments like writing AI for a Tic-Tac-Toe competition, programming the game logic in Minesweeper, and creating a level in Frogger.

During High School I began to realize that I wasn’t just good at programming, but I truly enjoyed it.  Computer Science wasn’t just a science, it was a means of creation.  Like writing, programming gave me the power to start with a blank canvas and bring to life anything I could imagine.

“Programming gave me the power to start with a blank canvas and bring to life anything I could imagine.”

Throughout Middle School and High School I played my fair share of video games.  Most notably I acquired a PlayStation and raided dozens of tombs with Lara Croft, and played Duke Nukem 3D, my first First Person Shooter, but games were still not my main focus.  I ended up spending more of my time programming, playing lacrosse, singing in choir, participating in student council, and spending time with my friends.  Video Games were great, but I still had not decided to pursue a career in the Games Industry.

I graduated from High School not only having learned to program in Visual Basic, C++, and Java, but with a passion for programming.  In the Fall of 2004 I decided to continue on my coding adventure by enrolling in the Engineering School at Cornell University focusing on Computer Science.

College

I entered Cornell University expecting to major in Computer Science, but to be sure I dabbled in other subjects (Philosophy, Evolutionary Biology, and Civil Engineering) before declaring my major.  To this day I still have a diverse set of interests and I enjoyed all of these subjects immensely, but none of them lived up to the joys of coding.

We Made It!
Computer Science Best Friends at Graduation

College was this beautiful, wonderful, stressful blur.  I ran on massive amounts of caffeine and memories of crazy weekends spent with friends.  We worked really hard, but played really hard too.  Even with all the pressure, stress, and deadlines I was having the time of my life.  The classes were fast paced, I was being challenged, and I was learning an immense amount from Data Structures to Functional Programming to Graphics to Security.

Sophomore year I declared myself for CS, and also became a Teaching Assistant for CS 211 (Object Oriented Data Structures and Programming).  In addition another immensely important event happened in the fall of my Sophomore year: I bought an Xbox 360, and Gears of War.  I loved the game, and spent many nights during winter break staying up till 2am chainsawing locusts.  I also spent a significant amount of time playing Viva Piñata that break, like I said diverse set of interests.  This new console, some fantastic games, and the Xbox Live enabled social experiences reignited my passion for gaming.  Now I began to consider Game Development as a career.

Internships

After Sophomore year I took a somewhat unconventional but completely awesome internship at Stanford’s Linear Accelerator Center (SLAC).  I lived in a house with 20 brilliant physics majors, learned about black holes, dark matter, and quantum computing while helping to manage the Batch farm which provided all the computing power for the physicists working at the center.  It was an absolutely amazing experience.

After Junior year I once again went West for the summer, this time to Redmond, Washington as a Microsoft intern working on Windows Live Experiences (WEX).  During that summer I got to exercise my coding chops and, most importantly, fully solidified the opinion that I wanted to be a developer.  I left the Pacific Northwest at the end of summer with two job offers in WEX, but by then I knew I really wanted to work on games.  So after some negotiation and another round of interviews I managed to secure a 3rd offer in Microsoft Game Studios as a Software Engineer in Test working on the Networking and Co-op of Gears of War 2.  I was beyond thrilled.

I graduated from Cornell in 2008 with a Bachelors of Science in Computer Science from the Engineering School.  It was a bittersweet moment, I had loved my time at Cornell and most of my friends were staying on the East Coast, but I knew exciting things were waiting for me in Seattle.

The Real World (Seattle)

In July of 2008 I moved out to Seattle, and joined the Microsoft Game Studios team working on Gears of War 2.  I quickly was thrown into the fire as I was assigned ownership of testing the co-op experience.  It was terrifying and exciting to be given so much responsibility right away.  I eagerly jumped into the project and joined the team in crunching immediately after starting.

The first few months in Seattle were a whirlwind as we pushed to get the game through to launch.  The hours were long but I was passionate about the project and I was learning a lot.  It was an amazingly gratifying experience the day Gears of War 2 went Gold.  When the game launched I had another immensely satisfying moment; my computer science best friend from college and I played through the game in co-op and at the end we saw my name in the credits. Life Achievement Unlocked!

Midnight Launch Halo 4

I love social game experiences, both collaborative and competitive, so post launch I focused a lot of my energy on improving my skills in the areas of networking and services.  As we moved into sustain on Gears of War 2 I began focusing on the matchmaking and networking experience.  I spent my free time diving through the Xbox XDK, learning about the networking stack, and playing around with Xbox Live Services.  As work began on Gears of War 3 I took ownership of testing the matchmaking code and became very involved in dedicated servers for multiplayer.

In the Fall of 2009 I was asked to temporarily help the fledgling 343 Industries studio ship one of the first Xbox Title Applications, Halo Waypoint.  I knew it would mean extra hours and a lot of work, but the opportunity to work on new technology, and make connections in other parts of Microsoft Game Studios, was too good to pass up.  I dove headfirst into the transport layer of the Waypoint Console app, and helped get them through launch in November 2009.

The next few months I began to evaluate what I wanted to do next in my career.  Working on Gears of War 3 was a great opportunity, but I really wanted to be a developer.  The parts of my testing job that I found most satisfying were designing systems, coding internal tools, and researching new technology.  So when the opportunity to join 343 Industries as a developer appeared in January 2010 I jumped at it.  It was a perfect fit.  After reaching out to my contacts in 343 and then participating in a full round of interviews I was offered a position on the team as a web services developer to write code that would power the Halo Universe and enable social experiences; I excitedly accepted!

One of my first tasks at the studio was working on the Spartan Ops prototype.  I was elated that I got to utilize both my technical and creative skills to help create a brand new experience; my Spartan adventures were off to an amazing start!  The rest is history, and a few years later we shipped Halo 4.  After launch I once again had an intense moment of elation after playing through Halo 4 on co-op with my college bff and seeing my name in the credits.  It never gets old.

Final Thoughts

Some thoughts, all my own and anecdotal. To be successful as a Game Developer first and foremost you have to be passionate about what you do, whether it is programming, art, design, writing, or something else.  You need to be passionate about games and your chosen field.  In addition I believe my love of learning has been a huge asset in my career development and growth.  I am not afraid to dive into new technologies, or get my hands dirty in a code base I do not understand.  I believe doing this helped me get into the industry, and continuing to do so makes me valuable.  Lastly do not be afraid to ask for what you want, no one is going to just hand you your dream job.  Of course there is a bit of luck and timing involved in breaking into the Industry, but working incredibly hard is the best way I know to help create those opportunities.

Design Docs, Markdown, and Git

About a year ago my software engineering team, the Azure Sphere Security Services (AS3) team, found ourselves struggling with our design document process.  So we ran an experiment, moving all our design documents to be written in Markdown, checked into Git, and reviewed via a pull request (PR). The experiment has been incredibly successful, so we’ve iterated and refined it, and have even expanded it to the broader Azure Sphere team.  The goal of this blog post is to share our process and what we learned along the way.  

Our original design doc process involved writing a Microsoft Word document and sharing it via SharePoint.  Feedback was gathered via in person reviews, doc comments, and emails. Approval was then done over email. To signal that a document was the “approved plan of record” versus “an under review draft”, we toggled a property on the document.  Users could filter documents on the SharePoint by this property to disambiguate between the two states.  

This worked fine when we were a small team, with a small number of documents, but became challenging as the team grew.  For context, the Azure Sphere team started out as a handful of people working in Microsoft Research and has grown rapidly over the past 3 years as we’ve gone from research project to Generally Available Product.

Challenges

Some specific challenges were identified via the AS3 retrospective process.  When evaluating new options we kept these pain points in mind:

  • Comments: Understanding when comments were resolved and when forward progress on the document could be made was challenging. Tracking Comments in the Word Document, in person reviews, and across emails became cumbersome.  It also was unclear when comments were resolved and what the resolution was. Finally once a doc was approved, all the comments were hidden, and this often lost valuable context. 
  • Approval Process:  It was often unclear how to get approval or who had approved a document.  The Word Document did not contain a section on reviewers and approvers. As the team grew there was some ambiguity on how the Approval process worked.  In addition, many approved design documents did not have the property set to approved; this resulted in the SharePoint becoming a mixture of approved, under review, and abandoned documents, and it was unclear which state each was in.
  • Context Switching:  For the Individual Contributor (IC) engineers switching contexts and using a different tool chain, Word & SharePoint, was a barrier to writing and reviewing design docs.  It added just enough friction, that design docs felt like a bigger challenge than they needed to.
  • Versioning:  Word and SharePoint do not provide a way to easily version a document.  As the team and the product grew, there was a need to update and version documents.

The Experiment

To address some of these challenges, the AS3 team began writing design documents in Markdown and checking them into a new EngineeringDocs Git repo in Azure DevOps (ADO).  Reviews are conducted via pull requests by adding comments, pushing changes, and then resolving comments. Approval is given by signing off on the pull request, and anything in master is considered the approved plan of record.  Versioning was also greatly simplified, as anyone can submit a pull request to update a document.  
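In practice the day-to-day mechanics are just a normal Git workflow. As a rough sketch (the organization, project, repo, branch, and file names below are made up for illustration, not our actual ones):

    git clone https://dev.azure.com/contoso/sphere/_git/EngineeringDocs
    cd EngineeringDocs
    git checkout -b design/telemetry-pipeline
    # ...write docs/telemetry-pipeline.md...
    git add docs/telemetry-pipeline.md
    git commit -m "Design: telemetry pipeline"
    git push -u origin design/telemetry-pipeline
    # then open a pull request in Azure DevOps and add the stakeholders as reviewers

Once the pull request is approved and completed, the version of the doc in master is the plan of record.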

Single Repo vs Next to Code

One of the first decisions we made was where design documents should live.  We discussed two options:

  • Single Repo: Create a new Engineering Docs Repo where all design documents would be checked in.
  • Same Repo as Code: Design Docs should be checked into the repo for the code they implement.  

We chose to use a Single Repo for several reasons:

  • Discoverability:  A downside of having design docs live next to the code is that finding all the existing design docs becomes challenging.  With a quickly growing team we wanted to prioritize discoverability of design decisions to help onboard new team members quickly. 
  • Large Designs: For designs that don’t neatly map to a single service or library, it was ambiguous where the design doc should live.  For example, where do design docs that span multiple microservices live?  
  • Unconstrained Design:  If an author first had to pick the code repository where the design doc would live, this initial choice would artificially constrain the design to only consider changes to that part of the code base.  By checking design docs into a single docs repository, this artificial constraint is eliminated from the early design process, freeing the author to think about the best way to implement their feature across the system as a whole.  

Conducting a Design Review

The Azure Sphere team uses the OARP model (Owner, Approver, Reviewer, Participant) for making decisions, so the section below describes approval and stakeholders in that context.  I recommend having a well-defined decision-making process and integrating whatever that is for your team into the design document process.  

Identify Reviewers and Approvers via a Pull Request

The first step in our Design process is identifying the stakeholders.  The first pull request includes the title of the Design Doc and a table listing the OARP assignments for this document.  The pull request author is always the Owner.    

This serves a few purposes:  

  • It makes it clear how the decision is being made.  
  • It informs the team that this is a problem we are attempting to solve now. 
  • It gives people the chance to opt out.  If you are listed as an Approver or a Reviewer and would like to delegate your responsibility or opt out of the process, you can.  
  • It gives people the chance to opt in.  By notifying the broader team via the pull request, teammates not initially listed in the OARP table can request to be included in the design process if they are interested or feel they have expertise to add.  This prevents people from feeling excluded or coming in late to the review process with a lot of additional feedback.

Once the stakeholders are all identified, the Approver approves the pull request and the Owner checks it in.  
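For illustration, the OARP table in that first pull request might look something like the snippet below; the feature and names are hypothetical, and your table can be as simple or as detailed as your team needs.

    # Telemetry Pipeline Design

    | Role         | People                   |
    | ------------ | ------------------------ |
    | Owner        | @doc-author              |
    | Approver     | @team-lead               |
    | Reviewers    | @engineer-a, @engineer-b |
    | Participants | @as3-team                |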

Writing the Design Document

To author the design document, the Owner creates a new branch and modifies the checked-in shell document.  It is highly recommended that input from Reviewers and Approvers is informally gathered prior to writing the document, via whiteboard sessions, chats, hallway conversations, etc. This makes the design review process more collaborative and means there are few surprises during the formal review.   
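The exact contents of the shell document are specific to the team; ours is essentially a Markdown skeleton. A minimal sketch of what such a template might contain is below (the section headings are my illustration, not the AS3 team’s exact template):

    # <Feature Name> Design

    ## OARP
    | Role | People |
    | ---- | ------ |

    ## Problem Statement
    ## Requirements
    ## Proposed Design
    ## Alternatives Considered
    ## Open Questions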

Design docs are written in Markdown.  Architectural diagrams are added to the design doc by checking in images or using Mermaid.  The AS3 team often generates architectural images using Microsoft Visio, and it is highly recommended that these Visio diagrams are checked in as well, for ease of modification later.  
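Because a Mermaid diagram is just text inside the Markdown file, it gets reviewed, commented on, and versioned like everything else in the doc. A trivial example of what an embedded diagram might look like (the services named are hypothetical):

    ```mermaid
    graph TD
        Device -->|HTTPS| FrontEnd[Front End API]
        FrontEnd --> Identity[Identity Service]
        FrontEnd --> Telemetry[Telemetry Service]
    ```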

Once the design doc is ready for review, the engineer submits a new pull request.  All members of the OARP model are listed as reviewers on the pull request.  

Design Pull Request

Once the pull request has been submitted, design review stakeholders can read and submit feedback via comments on the pull request.  All comments must be addressed and marked as either resolved via document updates or won’t fix.  

The document can be committed to master once the Approver has approved the pull request.  This design is now considered a plan of record.

Design Review Meeting

Design review meetings are not required but are often held.  A meeting invite is sent out ahead of time. Owners, Approvers, and Reviewers are considered required attendees; Participants are considered optional.  

The meeting invite should be updated with a link to the pull request for the design doc to be reviewed at least one business day prior to the meeting.  The first 10-15 minutes of the meeting are set aside for folks to read the document and add comments to the pull request if they have not done so already.  In either case, feedback is added via comments on the pull request.  

We provide two ways for folks to review the document, ahead of time or in the meeting, to accommodate multiple working styles on the team.  Some folks prefer to digest and think about a design document for a while before providing feedback; others are more comfortable providing feedback on the spot.   

After the reading period, the design review meeting focuses on discussing the comments.  The Owner takes notes and records the in-room decisions in the pull request comments.  

Updating the Design

Throughout the course of a project, design docs may need to be updated.  This can happen right after the design phase, if a major change was made during implementation, or later in the life of the project, when a new feature or requirement requires a modification.  

Updating the design doc follows a very similar process.  A pull request with the proposed changes is submitted, and the original Owner and Approver should be considered required reviewers.

Conclusion

The AS3 team considers the experiment incredibly successful, so much so that the broader Azure Sphere team, including the Program Managers, has begun adopting it.  

To summarize, all the challenges we experienced with Word documents and SharePoint were addressed by using Git and Markdown.  

  • Comments: Adding comments via the pull request makes it really clear which ones have been addressed and which ones are still open.  In addition, the resolution of each comment is clear: either Resolved or Won’t Fix. The comments and the discussion in them are also not lost once the pull request is submitted, as you can always go back and look at the history of the document. 
  • Approval Process: By having an initial pull request identify the stakeholders, how we are making the decision and who is involved is incredibly clear.  In addition, the participants are durably recorded, as is the act of signing off (approving the pull request).  
  • Context Switching:  IC engineers no longer have to switch toolchains to participate in the design process.  Whatever reviews they need to do, code or design, are easy to find and discover.  
  • Versioning:  Versioning documents via Git and pull requests is incredibly easy, so much so that engineers often go and update the document once they finish implementing the feature.  In addition, having the history of the document has been incredibly valuable.  

By utilizing a toolchain that developers already use day to day, the process feels more lightweight and writing design documents feels less arduous.  The Program Management team has also been incredibly receptive to using Markdown and Git; while these are new tools for some of them, they’ve embraced our growth-mindset culture and dove right in.  

One of the biggest benefits I’ve observed is the clarity it has brought to how decisions are made, and the durable record of when things were decided.  On a fast-growing team like Azure Sphere, clarity and durable communication are key to successfully scaling the business and the team.

Creating RESTful Services using Orleans http://www.caitiem.com/2014/04/04/creating-restful-services-using-orleans/

After the announcement of the Orleans preview, there was a lot of discussion on Twitter.  One comment in particular caught my eye.

.NET’s actor model uses static factories, RPC Interfaces and code-gen client proxies for comms, WCF all over again: http://t.co/PyIq291Kvh

— Demis Bellot (@demisbellot) April 3, 2014

I think this is a bit of a misunderstanding of how Orleans can and should be used in production services, so this blog post is an attempt to clarify and demonstrate how to build RESTful, loosely coupled services using Orleans.

Orleans Programming Model

Orleans is a runtime and programming model for building distributed systems, based on the actor model.  In the programming model there are a few key terms.

  • Grains – The Orleans term for an actor.  These are the building blocks of Orleans based services.  Every actor has a unique identity and encapsulates behavior and mutable state.  Grains are isolated from one another and can only communicate via messages.  As a developer this is the level you write your code at.
  • Silos – Every machine Orleans manages is a Silo.  A Silo contains grains, and houses the Orleans runtime which performs operations like grain instantiation and look-up.
  • Orleans Clients – clients are non-silo code which makes calls to Orleans Grains.  We’ll get back to where this should live in your architecture later.

In order to create Grains, developers write code in two libraries: GrainInterfaces.dll and Grains.dll.  The GrainInterfaces library defines a strongly-typed interface for each grain.  The methods and properties must all be asynchronous (returning a Task), and they define what types of messages can be passed in the system. All grain interfaces must inherit from Orleans.IGrain.

    /// <summary>
    /// Orleans grain communication interface IHello
    /// </summary>
    public interface IHello : Orleans.IGrain
    {
        Task<string> SayHello();
        Task<string> SayGoodbye();
    }

The implementation of the grains is defined in the separate Grains library. Each grain implementation should implement its corresponding grain interface and inherit from Orleans.GrainBase.

    /// <summary>
    /// Orleans grain implementation class HelloGrain.
    /// </summary>
    public class HelloGrain : Orleans.GrainBase, HelloWorldInterfaces.IHello
    {
        Task<string> HelloWorldInterfaces.IHello.SayHello()
        {
            return Task.FromResult("I say: Hello! " + DateTime.UtcNow.ToLongDateString());
        }

        Task<string> HelloWorldInterfaces.IHello.SayGoodbye()
        {
            return Task.FromResult("I say: Goodbye! " + DateTime.UtcNow.ToLongDateString());
        }
    }

At compile time, code is generated in the GrainInterfaces dll to implement what the Silos need to perform message passing, grain look-up, etc. By default this code will be under GrainInterfaces/properties/orleans.codegen.cs.  There are a lot of interesting things happening in this file, and I recommend taking a look if you want to understand the guts of Orleans a bit more.  Below I’ve pulled out snippets of the generated code.

Every grain interface defined in the library will have a corresponding factory class and GrainReference class generated.  The factory class contains GetGrain methods, which take in the unique grain identifier and create a GrainReference.  If you look below you will see that the generated HelloReference has SayHello and SayGoodbye methods with the same signatures as the interface.

    public class HelloFactory
    {
        public static IHello GetGrain(long primaryKey)
        {
            return Cast(GrainFactoryBase.MakeGrainReferenceInternal(typeof(IHello), 1163075867, primaryKey));
        }

        public static IHello Cast(IAddressable grainRef)
        {
            return HelloReference.Cast(grainRef);
        }

        [System.SerializableAttribute()]
        [Orleans.GrainReferenceAttribute("HelloWorldInterfaces.IHello")]
        internal class HelloReference : Orleans.GrainReference, IHello, Orleans.IAddressable
        {
            public static IHello Cast(IAddressable grainRef)
            {
                return (IHello) GrainReference.CastInternal(typeof(IHello), (GrainReference gr) => { return new HelloReference(gr); }, grainRef, 1163075867);
            }

            protected internal HelloReference(GrainReference reference) : base(reference) { }

            public System.Threading.Tasks.Task<string> SayHello()
            {
                return base.InvokeMethodAsync<System.String>(-1732333552, new object[] {}, TimeSpan.Zero);
            }

            public System.Threading.Tasks.Task<string> SayGoodbye()
            {
                return base.InvokeMethodAsync<System.String>(-2042227800, new object[] {}, TimeSpan.Zero);
            }
        }
    }

In an Orleans client you would send a message to the HelloGrain using the following code.

    IHello grainRef = HelloFactory.GetGrain(0);
    string msg = await grainRef.SayHello();

So at this point, if you are thinking this looks like RPC, you are right. Orleans clients and Orleans grains communicate with one another via remote procedure calls that are defined in the GrainInterfaces. Messages are passed over TCP connections between Orleans clients and grains, and grain-to-grain calls are also sent over a TCP connection if the grains are on different machines.  This is really performant and provides a nice programming model: as a developer you just invoke a method and don’t care where the code actually executes, one of the benefits of location transparency.

Ok, stay with me, deep breaths.  Project Orleans is not trying to re-create WCF with hard-coded data contracts and tight coupling between services and clients. Personally I hate tight coupling; ask me about BLFs, the wire struct in the original Halo games, if you want to hear an entertaining story, but I digress…

RESTful Service Architectures

Orleans is a really powerful tool to help implement the middle tier of a traditional 3-tiered architecture: the front end, which is an Orleans client; the Silos running your grains and performing application-level logic; and your persistent storage.

On the front end you can define a set of RESTful APIs (or whatever other protocol you want, for that matter), which then route incoming calls to Orleans grains to handle application-specific logic, using the factory methods generated in the GrainInterfaces dll.  In addition, the front end can serialize/deserialize messages into the loosely coupled wire-level format of your choosing (JSON, Protocol Buffers, Avro, etc.).

By structuring your services this way, you are completely encapsulating the dependency on Orleans within the service itself, while presenting a RESTful API with a loosely coupled wire struct format.  This way the clients can happily communicate with your service without fear of tight coupling or RPC.

The code below uses ASP.NET Web API to create a front-end HTTP controller that interacts with the HelloGrain.

    public class HelloController : ApiController
    {
        // GET api/Hello/{userId}
        public async Task<string> Get(long userId)
        {
            IHello grain = HelloFactory.GetGrain(userId);
            var response = await grain.SayHello();
            return response;
        }

        // DELETE api/Hello/{userId}
        public async Task Delete(long userId)
        {
            IHello grain = HelloFactory.GetGrain(userId);
            var response = await grain.SayGoodbye();
            return;
        }
    }

While this is a contrived example, you can see how you can map your REST resources to individual grains.

This is the architectural approach the Halo 4 services took when deploying Orleans.  We built a custom, lightweight, super fast front end that supported a set of HTTP APIs.  HTTP requests were minimally processed by the front end and then routed to the Orleans grains for processing.  This allowed the game code and the services to evolve independently from one another.

The above example uses ASP.NET Web API; if you want something lighter weight, check out OWIN/Project Katana.

*HelloGrain Code Samples were taken from Project “Orleans” Samples available on Codeplex, and slightly modified.

Clients are Jerks: aka How Halo 4 DoSed the Services at Launch & How We Survived http://www.caitiem.com/2015/06/23/clients-are-jerks-aka-how-halo-4-dosed-the-services-at-launch-how-we-survived/

At 3am PST on November 5th, 2012, I sat fidgeting at my desk at 343 Industries watching graphs of metrics stream across my machine. Halo 4 was officially live in New Zealand, and the number of concurrent users gradually increased as midnight gamers came online and began to play.  Two hours later, at 5am, Australia came online and we saw another noticeable spike in concurrent users.

With AAA video games, especially multiplayer games, week one is when you see the most concurrent users.  As with blockbuster movies, large marketing campaigns, trade shows, worldwide release dates, and press all converge to create excitement around launch.  Everyone wants to see the movie or play the game with their friends the first week it is out.  The energy around a game launch is intoxicating.  However, running the services powering that game is terrifying.  There is nothing like production data, and we were about to get a lot of it over the next few days.  To be precise, Halo 4 saw 4 million unique users in the first week, who racked up 31.4 million hours of gameplay.

At midnight PST on November 6th I stood in a parking lot outside a Microsoft Store in Seattle, surrounded by 343i team members and fans who came out to celebrate the launch with us and get the game at midnight.  I checked in with the on-call team: Europe and the East Coast of the US had also come online smoothly.  In addition, the real-time Cheating & Banning system I had written in the month and a half before launch had already caught and banned 3 players who had modded their Xbox in the first few hours; I was beyond thrilled.  Everything was going according to plan, so after a few celebratory beers I headed back into the office to take over the graveyard shift and continue monitoring the services.  The next 48 hours were critical and likely when we would see our peak traffic.

As the East Coast of the United States started playing Halo after work on launch day, we hit higher and higher numbers of concurrent users.  Suddenly one of our APIs related to Cheating & Banning was hitting an abnormally high failure rate and starting to affect other parts of the Statistics Service.  As the owner of the Halo 4 Statistics Service and the Cheating & Banning Service, I OK’d throwing the kill switch on the API and then began digging in.

The game was essentially DoSing us.  We were receiving 10x the expected number of requests on this particular API, due to a bug in the client which reported suspicious activity for almost all online players.  The increased number of requests caused us to blow through our IOPS limit in Azure Storage, which correctly throttled and rejected our exorbitant number of operations.  This caused the requests from the game to fail, and the game would then retry each request three times, creating a retry storm that only exacerbated the attack.

Game over, right?  Wrong.  Halo 4 had no major outages during launch week, the time notorious for games to have outages.  The Halo 4 services survived because they were architected for maximum availability and graceful degradation.  The core APIs and components of the Halo services necessary to play the game were explicitly called out, and extra measures were taken to protect them.  We had a game plan to survive launch, which involved sacrificing everything that was not those core components if necessary.  Our team took full ownership of our core services’ availability; we did not just anticipate failure, we expected it.  We backed up our backups for statistics data, requiring multiple separate storage services to fail before data loss would occur; built in kill switches for non-essential features; and had a healthy distrust of our clients.

The kill switch I mentioned earlier saved the services from the onslaught of requests made by the game.  We had built a dynamically configurable switch into our routing layer, which could be tuned per API.  By throwing the kill switch, we essentially re-routed traffic to a dummy handler which returned a 200 and either dropped the data on the floor or logged it to a storage account for later analysis.  This stopped the retry storm, stabilized the service, and alleviated the pressure on the storage accounts used for Cheating & Banning.  In addition, the Cheating & Banning service continued to function correctly because we had more reliable data coming in via game events on a different API.
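To make the idea concrete, here is a minimal sketch of what a per-API kill switch in a routing layer can look like. This is illustrative only, not the actual Halo 4 code, and the type and member names are my own assumptions:

    using System;
    using System.Collections.Concurrent;
    using System.Threading.Tasks;

    // Illustrative sketch of a per-API kill switch, not the Halo 4 implementation.
    public class KillSwitchRouter
    {
        // apiName -> true when the kill switch is thrown; flipped at runtime via dynamic config.
        private readonly ConcurrentDictionary<string, bool> _switches =
            new ConcurrentDictionary<string, bool>();

        public void SetSwitch(string apiName, bool killed) => _switches[apiName] = killed;

        public Task<int> HandleAsync(string apiName, string payload, Func<string, Task<int>> realHandler)
        {
            if (_switches.TryGetValue(apiName, out bool killed) && killed)
            {
                // Lie to the client: acknowledge with a 200 and drop (or archive) the payload
                // instead of hitting the backing storage accounts.
                return Task.FromResult(200);
            }

            // Normal path: process the request for real.
            return realHandler(payload);
        }
    }

The important properties are that the switch can be flipped per API without a redeploy, and that the dummy path is cheap and always succeeds, which is what breaks the retry storm.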

The game clients were being jerks (a bug in the code caused an increase in requests), so I had no qualms about lying to them (sending back an HTTP 200 and then promptly dropping the data on the floor), especially since this API was not one of the critical components for playing Halo 4.  In fact, had we not built in the ability to lie to the clients, we most certainly would have had an outage at launch.

But the truth is the game devs I worked closely with over countless tireless hours leading up to launch weren’t jerks, and they weren’t incompetent.  In fact, they were some of the best in the industry.  We all wanted a successful launch, so how did our own in-house client end up DoSing the services? The answer is priorities.  The client developers for Halo 4 had a much different set of priorities: gameplay, graphics, and peer-to-peer networking were at the forefront of their minds and resource allocations, not how many requests per second they were sending to the services.

Client priorities are often very different from those of the services they consume, even for in-house clients.  This is true for games, websites, mobile apps, etc.  In fact, it is not limited to pure clients; it is even true for microservices communicating with one another.  These priority differences manifest in a multitude of ways: sending too much data on a request, sending too many requests, asking for too much data or for an expensive query to be run, and so on. The list goes on, because the developers consuming your service are often focused on a totally different problem and not on your failure modes and edge cases.  In fact, one of the major benefits of SOA and microservices is to abstract away the details of a service’s execution to reduce the complexity one developer has to think about at any given time.

Bad client behavior happens all over the place, not just in games. Astrid Atkinson just said in her Velocity Conf talk, “Google’s biggest DoS attacks always comes from ourselves.”  In addition, I’m currently working on fixing a service at Twitter which is completely trusting of internal clients, allowing them to make exorbitant requests.  These requests result in the service failing and a developer getting paged with no means of remediating the problem, and they were the inspiration for finally writing this post.  Misbehaving clients are common in all stacks, and they are not the bug.  The bug is the implicit assumption that because the clients are internal they will use the API in the way it was designed to be used.

Implicit assumptions are the killer of any Distributed System.

Truly robust, reliable services must plan for bad client behavior and explicitly enforce their assumptions.  Implicitly assuming that your clients will “do the right thing” makes your services vulnerable.  Instead, explicitly set limits and enforce them, either manually via alerting, monitoring, and operational runbooks, or automatically via backpressure and flow control.  The Halo 4 launch was successful because we did not implicitly trust our clients; instead, we assumed they were jerks.

Much thanks to Ines Sombra for reviewing early drafts

Recommended Engineering Management Books http://www.caitiem.com/2020/12/28/recommended-engineering-management-books/

Over the past 3.5 years my career has grown and transformed from Individual Contributor (IC) to Engineering Manager of multiple teams, and all the roles in between, as I built the Azure Sphere Security Services (AS3) team from 2 people to 20 people.  I undertook this journey in the summer of 2017 to help transform a Microsoft Research project, Project Sopris, into a Generally Available (GA) product, Azure Sphere. 

As the AS3 Team grew, I went through a massive amount of personal growth and learning as well.  In December 2017 I stepped into a manager role for the first time in my career.  I had been a professional software engineer for over a decade at this point, and was up for a totally new challenge.  I knew that the skills and job of growing and managing a team and then an organization were totally different than what I had been actively developing over the last decade of my career.  As I went through this period of growth I was lucky enough to have several friends, coaches, and mentors share their experiences with me, and recommend some great books to help me along my learning journey. 

Below is my curated list of the most influential and impactful books that helped me along the way, and that I highly recommend to Engineering Managers.

The Manager’s Path: A Guide for Tech Leaders Navigating Growth & Change by Camille Fournier

This is a must-read for anyone considering managing engineering teams; it is as good as or better than the hype surrounding it.  The book starts from the individual contributor perspective, and each subsequent chapter explores the next level of management complexity.  Each chapter focuses on an engineering management role: technical lead, manager of people, manager of multiple teams, and manager of managers.

Over the past 3.5 years I’ve come back to this book several times, re-reading the chapters around my newest role.  For instance, when I transitioned into a manager-of-managers role, I re-read that chapter and the ones before and after it.  These were great reminders of what I should be focused on, what challenges my directs were facing, and what challenges and motivations my boss was focused on. 

One of the sections I really loved was on debugging dysfunctional teams.  As an engineer I am a great debugger, often able to piece together logs, metrics, and weird behavior to diagnose what’s going wrong in a system.  This was the first time I had seen the term debugging applied to people and organizations, and it helped frame the work of how to start investigating dysfunction in a team, and how to discover the source of the problem. 

Honestly, I recommend this book to every engineer, solely for the chapter on how to be managed and what you can and should expect from your manager.  Even if you never plan on taking on a management role, this book provides an excellent overview of the challenges and motivations of folks at varying levels of an engineering organization and will help you better navigate your org. 

Thanks for the Feedback by Douglas Stone & Sheila Heen

Prior to managing I had given peer feedback as part of performance reviews at various companies, but I did not consider this a strength of mine.  Now giving feedback was a critical skill for my role as a manager.  Early on I sought out several resources on how to give and receive feedback well, and found “Thanks for the Feedback” to be a tremendous resource.

Thanks for the Feedback is framed as a resource for receiving and processing the multitude of feedback you receive.  However, anyone who reads this book will also learn how to be a more skilled feedback giver. 

I highly recommend reading the book in its entirety, but I wanted to share one of the fundamental ideas in this book that resonated with me.  Feedback is really three different things, with three different purposes:

  • Appreciation: the goal is to help people feel appreciated.  Sometimes they may just need motivation and encouragement to keep going when tackling tough problems.
  • Coaching: the goal is to help the receiver expand their knowledge, sharpen their skills, and increase their confidence. 
  • Evaluation: the goal here is to rate or rank the receiver’s work against a set of standards, to align expectations and inform decision making. 

This idea that feedback serves a variety of different purposes was eye-opening and helped me think more clearly about how and when to provide feedback.  Taking this perspective, along with the multitude of lessons learned from this book, was tremendously helpful to me.

The Hard Thing About Hard Things: Building a Business When There are No Easy Answers by Ben Horowitz

The book starts off acknowledging a problem with most management books: “they attempt to provide a recipe for challenges that have no recipes.  There’s no recipe for really complicated, dynamic situations.”  This instantly resonated with me as I was reading it in November 2019.  I had been a manager for about two years and had grown the AS3 team from 2 to 15 people.  We’d spent the past two years turning a research project into what would become a GA’d product in February 2020.  There is no recipe for how to do this; what I was doing was a hard thing. 

Horowitz packs the book with a variety of entertaining stories from his career as a venture capitalist and entrepreneur to emphasize his message and points on leadership and various challenges.  In addition there are a few key takeaways that have stayed with me.

  • Take care of the people, the product, and the profits, in that order.  This quote resonated with me and the kind of organization I want to build and run.  Chapter 5, which bears this quote as its title, goes into some of the challenges that can arise when setting out with this mission and how to overcome them, like hiring, training, and management debt. 
  • Part of management is training your people. In big companies this is often neglected, as it’s seen as the role of HR or corporate to produce training programs.  These can sometimes be valuable but are often overly generic.  As a manager, make sure you are training your people for the specific job they are doing: things like how to work in your specific code base, what the architectural standards and guidelines are, what makes a good design doc, etc. Make sure you are intentionally designing onboarding programs, meetings, and workshops to facilitate learning.

Accelerate: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, PhD, Jez Humble, and Gene Kim

Accelerate is a summary of the research and learnings discovered by Dr. Forsgren and her colleagues from the 2014-2017 State of DevOps reports.  This book is a must-read for any engineering leader, as it gives you a clear outline of how to set your team up for success by investing in 24 key capabilities which will drive improvement in your team’s software delivery performance. 

There are some unsurprising findings, like the fact that using version control is important to team performance, and other not-so-obvious ones, like the finding that focusing on continuous integration and software delivery performance positively impacts culture.

This book and the ongoing research released by Dr. Forsgren et al. have had a huge influence on the priorities of my team.  We embraced Accelerate principles early on in the development of the Azure Sphere Security Services, and that investment has paid huge dividends over time. 

Dare to Lead: Brave Work.  Tough Conversations.  Whole Hearts. by Brene Brown

I’m a huge fan of Brené Brown’s work as a shame and vulnerability researcher.  In October 2018 she released Dare to Lead, applying her research to leadership.  This quote sums up why this book is important:

“There’s an old saying that I lead by now: ‘People don’t care how much you know until they know how much you care.’” – Brené Brown

This book dives into what it takes to become a brave and courageous leader.  Spoiler: a lot of it involves embracing vulnerability and living authentically, something that is far easier to say than do.  The book defines vulnerability as “the emotion we experience during times of uncertainty, risk, and emotional exposure…Vulnerability is not winning or losing.  It’s having the courage to show up when you can’t control the outcome.”

One section I continually return to is the definitions of the two leadership styles: armored leadership and daring leadership.  Brown’s research uncovered 16 traits for each of these leadership styles and breaks each one of them down.  I highly recommend going through these periodically and checking in on your own leadership style and where there is room for growth and improvement. 

Another great exercise that Brown presents in the book is one around values.  She challenges readers to pick one to two core values, not ten or fifteen, because as Jim Collins said “If you have more than three priorities, you have no priorities.”  Having clarity around your core values allows you to live and lead more authentically. 

Overall this book is one of the best for dealing with the “soft skills” of leadership.  How to show up wholeheartedly and help create a team that can do the same is one of the biggest challenges of taking on a management role. 

Switch: How to Change Things When Change is Hard by Chip Heath & Dan Heath

This book is an awesome resource for understanding the psychology and sociology behind what motivates people and how to succeed at enacting change, from small to epic scale. I loved this book because it provides real practical tips for how to enact change across a team or organization, through researched principles and real-world examples. 

The book starts by acknowledging that people have both emotional and rational sides, and motivating each is important for effective change.  In engineering orgs we often only motivate the rational side, because the analytical brain is treated as all-important.  Sometimes we fall into the trap of “motivating” people with data, reason, and education and then wonder why these initiatives fail.  But here’s the thing: engineers are human beings too, and we are all motivated by and susceptible to decisions made by our emotional brain.
The authors lay out three keys to behavior change:

  • Give clear direction to reduce mental paralysis. 
  • Find an emotional connection to motivate people.
  • Shape the path: make doing the right thing easy by removing obstacles, modifying the environment, and building habits.

Atomic Habits: An Easy & Proven Way to Build Good Habits & Break Bad Ones by James Clear

This book is another favorite as it provides a lot of practical advice.  The premise of this book is that small improvements accumulate into remarkable results over time.  In addition building systems and processes is the most effective way to achieve your goals. 

A habit is simply a behavior that has been repeated so many times it becomes automatic and therefore requires less time and energy.  Clear spends the book describing research around habits and how to successfully create new good ones and break old bad ones.  By leveraging habits we can create systems that more easily help us achieve our goals.

I love this book for two reasons: it’s generally applicable to many aspects of your life, and it helped me think about how to help my team establish and build good habits, or potentially break bad ones. 

Resources for Getting Started with Distributed Systems http://www.caitiem.com/2017/09/07/getting-started-with-distributed-systems/

I’m often asked how to get started with Distributed Systems, so this post documents my path and some of the resources I found most helpful.  It is by no means meant to be an exhaustive list.

It is worth noting that I am not classically trained in distributed systems.  I am mostly self-taught via independent study and on-the-job experience.  I do have a B.S. in Computer Science from Cornell, but I focused mostly on graphics and security in my specialization classes.  My love of distributed systems, and my education in them, came once I entered industry.  The moral of this story is that you don’t need formal academic training to learn and excel at distributed systems.

Books on Theory & Background

  • Introduction to Reliable and Secure Distributed Programming: This book is an excellent introduction to the fundamentals of distributed computing.  It definitely takes an academic approach, but it is a good place to start to understand the terminology and challenges in the field.
  • Replication: Theory and Practice: This book is a summary of 30 years of distributed systems research on replication, up to 2007.  It’s a great starter and contains all the references to the original work.  Each chapter is incredibly dense and led me down multiple paper rabbit holes.

Papers

This is by no means an exhaustive list, but these are the papers I keep coming back to, and they have significantly shaped the way I think about distributed systems.

  • Time, Clocks, and the Ordering of Events in Distributed Systems
  • Impossibility of Distributed Consensus with One Faulty Process
  • Unreliable Failure Detectors for Reliable Distributed Systems
  • CAP Twelve Years Later: How the Rules Have Changed
  • Harvest, Yield and Scalable Tolerant Systems
  • Dynamo, Amazon’s Highly Available Key Value Store
  • The Chubby Lock Service for Loosely-Coupled Distributed Systems
  • Fallacies of Distributed Computing

A Note on Reading Papers

I start with the Abstract; if I find it interesting, I’ll proceed to the Introduction, then the Conclusion.  Only then, if I am incredibly interested in the implementation or details, will I read the whole thing.  Also, the References are a gold mine; they cite related and foundational work.  Oftentimes reading papers is a recursive process: I’ll start on one, find a concept I’m unfamiliar with or don’t understand, read the referenced paper, and so on.  This often results in going down paper rabbit holes, and one time resulted in me reading a dissertation from the 1980s, but it is a great way to learn.

I also highly recommend Michael Bernstein’s blog post “Should I Read Papers?” for more on the motivations and how to read an academic paper.

Blog Posts & Talks

Below is a list of some of my favorite blog posts and talks that shaped how I think about building Distributed Systems.  Most of these are old, but I keep coming back to them, and still find them relevant today.

  • Notes on Distributed Systems for Young Bloods by Jeff Hodges
  • Jepsen Blog Posts by Kyle Kingsbury 
  • Everything Will Flow: Distributed Queues & Backpressure by Zach Tellman
  • Bad As I Wanna Be: Coordination and Consistency in Distributed Systems by Peter Bailis

Learning from Industry

The art of building, operating, and running distributed systems in industry is orthogonal to the theory of distributed systems.  I truly believe that the best way to learn about distributed systems is to get hands-on experience working on one.

In addition, post mortems are another great source of information.  Large tech companies like Amazon, Netflix, Google, and Microsoft often publish a post mortem after a major outage.  These are usually pretty dry to read, but they contain some hard-learned lessons.
