A WebSocket Primer

Posted on 13.02.2024 by caiti335

Over the past year, prior to leaving 343, I spent a large amount of time working with the WebSocket protocol and upgrading the Halo Services to support it. To solidify my knowledge and provide a handy refresher for when this information inevitably gets context-switched out of my brain, I decided to write a primer on WebSockets. Hopefully other people will find this introduction to the protocol useful as well.

Overview

In December 2011 the IETF standardized the WebSocket protocol as RFC 6455. Unlike the typical request/response messaging pattern provided by HTTP, this network protocol provides a full-duplex communication channel between a server and a client over TCP. This enables server-sent events, reactive user experiences, and real-time components.

The WebSocket protocol has some advantages over traditional HTTP. Once the connection has been established, there is a point-to-point channel over which both endpoints can send to one another simultaneously. This enables server-sent events without a workaround like Comet or Long Polling. While those techniques work well, they carry the overhead of HTTP, whereas WebSocket frames have a wire-level overhead of as little as two bytes per frame. Full-duplex communication and low per-frame overhead make it an ideal protocol for real-time, low-latency experiences.

An important note: the WebSocket protocol is not layered on top of HTTP, nor is it an extension of HTTP. It is a lightweight protocol layered on top of TCP. The only part HTTP plays is in establishing a WebSocket connection via the HTTP Upgrade request. The HTTP Upgrade request is also not specific to WebSockets; it can be used to support other handshakes or upgrade mechanisms that reuse the underlying TCP connection.
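To make the upgrade step a little more concrete, here is a minimal sketch of the one computation the server performs during the handshake, as defined in RFC 6455: it appends a fixed GUID to the client's Sec-WebSocket-Key value, hashes the result with SHA-1, and returns the Base64 of the hash in the Sec-WebSocket-Accept header. The class and method names are hypothetical and chosen for illustration; in practice the platform APIs do this for you.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HandshakeSketch
{
    // Fixed GUID defined by RFC 6455 for the opening handshake.
    private const string WebSocketGuid = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Hypothetical helper: derives the Sec-WebSocket-Accept value from the
    // Sec-WebSocket-Key value sent by the client.
    public static string ComputeAcceptValue(string secWebSocketKey)
    {
        // Concatenate the key (as sent, without surrounding whitespace) and the GUID.
        byte[] input = Encoding.ASCII.GetBytes(secWebSocketKey.Trim() + WebSocketGuid);

        using (SHA1 sha1 = SHA1.Create())
        {
            // Base64 of the 20-byte SHA-1 digest is the accept value.
            return Convert.ToBase64String(sha1.ComputeHash(input));
        }
    }
}
```

With the sample key from RFC 6455, `HandshakeSketch.ComputeAcceptValue("dGhlIHNhbXBsZSBub25jZQ==")` yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`. The point of the exchange is only to prove that the server understood the request as a WebSocket handshake; it is not a security mechanism.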
Open a WebSocket Connection

A client establishes a WebSocket connection by sending a client handshake request. As mentioned above, the HTTP Upgrade request is used to initiate the connection.

```
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
```

If all goes well on the server and the request can be accepted, the server handshake is returned.

```
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

If the server cannot accept the request, it should return an appropriate HTTP error status instead (for example 400 Bad Request, or 500 if the server itself failed). Any status other than 101 indicates that the handshake has not completed and that the protocol is still HTTP.

Once the client/server handshake is complete, the TCP connection used for the initial HTTP request has been upgraded to a WebSocket connection. Messages can now be sent from the client to the server or from the server to the client.

Code

As a developer, most of the nuances of the WebSocket handshake are hidden away by platform-specific APIs and SDKs. In the .NET world, Windows 8 and Windows Server 2012 introduced native support for the WebSocket protocol. Internet Explorer 10 introduced native support as well, and a variety of other platforms also support WebSockets.

Client

Using the .NET 4.5 Framework, the client code to establish a WebSocket connection in C# would look like this.

```csharp
// Requires the System, System.Net.WebSockets and System.Threading namespaces.
ClientWebSocket webSocket = new ClientWebSocket();
await webSocket.ConnectAsync(new Uri("ws://localhost/Echo"), CancellationToken.None);
```

Once the connection succeeds, the ClientWebSocket object can be used to send and receive messages.

Server

Using the .NET 4.5 Framework on a simple server built on HttpListener, the C# code to accept a WebSocket request and complete the handshake would look like this.
```csharp
HttpListenerContext listenerContext = await httpListener.GetContextAsync();
if (listenerContext.Request.IsWebSocketRequest)
{
    // Completes the server side of the handshake; no sub-protocol is selected here.
    WebSocketContext webSocketContext = await listenerContext.AcceptWebSocketAsync(subProtocol: null);
    WebSocket webSocket = webSocketContext.WebSocket;
}
else
{
    // Return a 426 - Upgrade Required status code
    listenerContext.Response.StatusCode = 426;
    listenerContext.Response.Close();
}
```

The call to AcceptWebSocketAsync completes after the server handshake has been returned to the client. At this point the WebSocket object can be used to send and receive messages.

WebSocket Messages

WebSocket messages are transmitted in "frames." Each WebSocket frame has an opcode, a payload length, and the payload data. Each frame has a header of between 2 and 14 bytes, so the header overhead is much smaller than that of text-based HTTP headers.

Headers

| 16-bit word | Contents (bits 0 to F) |
|---|---|
| 1 | Final (bit 0), Reserved Bits (bits 1-3), OpCode (bits 4-7), Mask (bit 8), Payload Indicator (bits 9-F) |
| 2 | Extended payload length (present if payload is longer than 125 bytes) |
| 3-5 | Extended payload length (present if payload length is >= 2^16) |
| 6-7 | Masking Key (present if masking bit is set) |

The first 9 bits sent in every WebSocket frame are defined as follows:

Final Bit (1 bit) – Indicates whether the frame is the final fragment of a message, as a large message can be broken up and sent over multiple frames. A message that is one frame long also sets this bit to 1.
Reserved (3 bits) – These must be 0, and are currently reserved for extensions.
OpCode (4 bits) – Defines how the payload data should be interpreted.
Masking (1 bit) – Indicates whether the payload data is masked. The WebSocket protocol specifies that all frames sent from a client to a server must be XOR masked.
The variable length of a WebSocket header is determined by the size of the payload and the presence of the masking key:

Payload Length (7 bits, 7 + 16 bits, or 7 + 64 bits) – Bits 10-16 of the header are the payload indicator bits. The number of bits used to encode the payload length varies with the size of the payload data.
0-125 bytes: the payload length is encoded directly in the payload indicator bits.
126-65,535 bytes: the payload indicator bits are set to 126, and the next 2 bytes encode the payload length.
>65,535 bytes: the payload indicator bits are set to 127, and the next 8 bytes encode the payload length.
Masking Key (0 or 32 bits) – If the masking bit is set, the 32-bit value used to mask the payload is specified in this field. If the masking bit is not set, this field is omitted.

OpCodes

The table below defines the WebSocket frame OpCodes. Applications should only set the Text or Binary OpCodes to specify how the payload data in the frame is interpreted.

| Code | Meaning | Description |
|---|---|---|
| 0x0 | Continuation Frame | The payload in this frame is a continuation of the message sent in a previous frame that did not have its final bit set |
| 0x1 | Text Frame | Application Specific – The payload is encoded in UTF-8 |
| 0x2 | Binary Frame | Application Specific – The payload is a binary blob |
| 0x8 | Close Connection Frame | Specifies that the WebSocket connection should be closed |
| 0x9 | Ping Frame | Protocol Specific – Sent to check that the other endpoint is still available |
| 0xA | Pong Frame | Protocol Specific – Response sent after receiving a ping frame. Unsolicited pong messages can also be sent |

Code

Sending and receiving WebSocket messages is easy using the .NET Framework APIs.
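To make concrete what those APIs handle on your behalf, the header layout described above can be sketched as a small decoder. This is an illustrative sketch rather than how the .NET Framework implements it: the FrameHeader type and the FrameHeaderSketch helpers are hypothetical names, and the code assumes the buffer already contains the complete header.

```csharp
using System;

// Hypothetical container for the decoded header fields; not part of the .NET API.
public sealed class FrameHeader
{
    public bool Final;
    public int OpCode;
    public bool Masked;
    public ulong PayloadLength;
    public byte[] MaskingKey;   // null when the masking bit is not set
    public int HeaderLength;    // total header size in bytes (2 to 14)
}

public static class FrameHeaderSketch
{
    // Decodes the frame header at the start of the buffer.
    public static FrameHeader Parse(byte[] buffer)
    {
        var header = new FrameHeader();
        header.Final = (buffer[0] & 0x80) != 0;    // final bit
        header.OpCode = buffer[0] & 0x0F;          // low four bits of the first byte
        header.Masked = (buffer[1] & 0x80) != 0;   // masking bit
        int indicator = buffer[1] & 0x7F;          // seven payload indicator bits
        int offset = 2;

        if (indicator <= 125)
        {
            header.PayloadLength = (ulong)indicator;
        }
        else
        {
            // 126 means 2 extended length bytes, 127 means 8 (network byte order).
            int extendedBytes = indicator == 126 ? 2 : 8;
            ulong length = 0;
            for (int i = 0; i < extendedBytes; i++)
            {
                length = (length << 8) | buffer[offset + i];
            }
            header.PayloadLength = length;
            offset += extendedBytes;
        }

        if (header.Masked)
        {
            header.MaskingKey = new byte[4];
            Array.Copy(buffer, offset, header.MaskingKey, 0, 4);
            offset += 4;
        }

        header.HeaderLength = offset;
        return header;
    }

    // XOR masking is symmetric: the same call masks and unmasks a payload in place.
    public static void ApplyMask(byte[] payload, byte[] maskingKey)
    {
        for (int i = 0; i < payload.Length; i++)
        {
            payload[i] ^= maskingKey[i % 4];
        }
    }
}
```

Feeding it the masked "Hello" text frame from RFC 6455 (bytes 0x81 0x85 0x37 0xfa 0x21 0x3d followed by five payload bytes) gives a final Text frame with a 5-byte payload and a 6-byte header; applying the masking key to the payload bytes recovers the original text.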
Receiving a Message

```csharp
// receiveBufferLength is chosen by the application.
byte[] receiveBuffer = new byte[receiveBufferLength];
while (webSocket.State == WebSocketState.Open)
{
    WebSocketReceiveResult receiveResult = await webSocket.ReceiveAsync(
        new ArraySegment<byte>(receiveBuffer), CancellationToken.None);
}
```

The WebSocketReceiveResult object contains the information sent in one WebSocket frame, including the OpCode (MessageType), the Final Bit setting (EndOfMessage), the Payload Length (Count), and the CloseStatus and reason (CloseStatusDescription) if it's a Close Connection Frame. The receiveBuffer is populated with the data sent in the payload.

Sending a Message

Sending a message is also simple, and an async method is provided in the .NET 4.5 Framework. The code below echoes the received message back over the channel. The data, message type, and final bit are specified in the parameter list.

```csharp
await webSocket.SendAsync(
    new ArraySegment<byte>(receiveBuffer, 0, receiveResult.Count),
    WebSocketMessageType.Binary,
    receiveResult.EndOfMessage,
    CancellationToken.None);
```

Close a WebSocket Connection

Either endpoint can close the WebSocket connection. To do this the endpoint starts the WebSocket closing handshake. The initiating endpoint sends a WebSocket message with a closing status code and an optional close reason (text), and sets the OpCode in the message to the Close Connection Frame (0x8). The receiving endpoint replies with a Close frame of its own, and once both sides have sent and received a Close frame the WebSocket connection is closed by closing the underlying TCP connection.

As an application developer it is important to note that either endpoint, server or client, can initiate the closing handshake. Practically, this means both endpoints need to handle receiving the close frame. It also means that some messages may not be delivered if the connection is closed while they are in transit.

Connection Close Code

Connection Close frames should include a status code, which indicates the reason the WebSocket connection was closed. These are somewhat analogous to HTTP status codes.
| Code | Definition | Description |
|---|---|---|
| 1000 | Normal Closure | The purpose for which the connection was established has been fulfilled |
| 1001 | Endpoint Unavailable | A server is going down, or a browser has navigated away from a page |
| 1002 | Protocol Error | The endpoint received a frame that violated the WebSocket protocol |
| 1003 | Invalid Message Type | The endpoint has received data that it does not understand. Endpoints which only understand text may send this if they receive a binary message, and vice versa |
| 1004-1006 | Reserved | 1004 is reserved for future use; 1005 (no status received) and 1006 (abnormal closure) are reported locally and must not be sent in a Close frame |
| 1007 | Invalid Payload Data | The payload contained data that was not consistent with the type of message |
| 1008 | Policy Violation | Endpoint received a message that violates its policy |
| 1009 | Message Too Big | Endpoint received a message that is too big for it to process |
| 1010 | Mandatory Extension | A client is terminating the connection because it expected the server to negotiate one or more extensions |
| 1011 | Internal Error | The server is terminating the connection because it encountered an unexpected error |
| 1015 | TLS Handshake | Used to designate that the connection closed because the TLS handshake failed |

Connection Close Code Ranges

| Code | Definition |
|---|---|
| 0-999 | Not used |
| 1000-2999 | Reserved for use by the protocol definition |
| 3000-3999 | Reserved for use by libraries, frameworks and applications. These should be registered with IANA |
| 4000-4999 | Reserved for private use and can't be registered |

Code

Once again, most of the details are handled by the WebSocket library in your framework of choice. Application developers must decide when the connection should be closed, should set the appropriate connection close code, and may also set a connection close reason. The .NET Framework makes this very easy by providing an asynchronous method that takes the connection close code and close reason as parameters.
```csharp
await webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Normal Closure", CancellationToken.None);
```

Microsoft WebSocket Implementations

As mentioned before, Windows 8 and Windows Server 2012 introduced native support for the WebSocket protocol. Because the Xbox One runs a variant of the Windows 8 operating system, it also has built-in support for WebSockets.

.NET 4.5

Version 4.5 of the .NET Framework introduced support for WebSockets through the System.Net.WebSockets namespace. The underlying connection passes through HTTP.sys in the kernel, so timeout settings in the HTTP.sys layer might still apply.

WinRT

WinRT only exposes APIs for creating a WebSocket client connection. There are two classes for this in the Windows.Networking.Sockets namespace: MessageWebSocket and StreamWebSocket.

Win32 (WinHTTP)

The WinRT API is also available to C++ developers. For developers who want more control, WinHTTP provides a set of APIs for sending WebSocket upgrade requests and for sending and receiving data on WebSocket connections.

JavaScript

All the latest versions of common browsers, with the exception of Android, support the WebSocket protocol and API as defined by the W3C.

SignalR

The ASP.NET team has built a high-level bi-directional communication API called SignalR. Under the hood SignalR picks the best protocol to use based on the capabilities of the clients. If WebSockets are available it prefers that protocol; otherwise it falls back to other HTTP techniques like Comet and Long Polling. SignalR supports multiple platforms including .NET, JavaScript, and iOS and Android via Xamarin. It is an open source project on GitHub.

Conclusion

WebSockets are a great new protocol for powering real-time applications and reactive user experiences, thanks to their lightweight headers and bi-directional communication. They are also a great fit for implementing Pub/Sub messaging patterns between servers and clients.
However, WebSockets are not a silver bullet for networked communications. They are incredibly powerful but also have drawbacks. For instance, because WebSockets require a persistent connection, they consume resources on the server and require the server to manage state. HTTP and RESTful APIs are still incredibly useful and valid in many scenarios, and developers should consider how their APIs and applications will be used when choosing which protocol to use.