Decoding BlazorPack
TL;DR: I couldn’t make a custom BlazorPack editor work in Burp, so I used Mallet instead. From an indecipherable binary mess to this, in about 100 lines:
Decoded BlazorPack messages
For details on how to do this yourself, even for other protocols, read on!
On a recent assessment, Marianka ran into a website using BlazorPack. As Microsoft describes it: “Today’s modern apps are expected to deliver up-to-date information without hitting a refresh button. Add real-time functionality to your dashboards, maps, games and more.”
The initial login used Office365 credentials, via OAuth, downloaded some resources, then transitioned to WebSockets for the rest of the application. After a quick JSON-formatted protocol negotiation, the remainder of the communication was in a binary format, making it really difficult to try to tamper with it, or even to understand what was really going on.
Burp view of BlazorPack websockets traffic (note the 4096-byte packets!)
It turns out that there are two versions/implementations of Blazor: client-side and server-side. The client-side version transfers a large WebAssembly blob from the server, which then interacts with the server using a series of HTTP requests. The server-side version actually keeps the application state on the server, and simply sends a presentation layer down to the browser. Having done that, all user interaction (clicks, keystrokes, etc.) gets sent back to the server over WebSockets, and the server then sends new rendering instructions back to the browser. We were dealing with this server-side implementation of Blazor.
(You can find a demo site using the server-side approach here, which can be used to walk through the rest of this blog post.)
Now, while you can sometimes change individual bytes in a binary payload, without too much risk of breaking it horribly, actually figuring out which bytes do what can be quite a task. Ideally, we’d like to figure out how to decode this binary stream into something (somewhat) more comprehensible, and then how to re-encode any changes that we make.
A bit of research turned up the DotNet source code for the blazorpack protocol, part of the Microsoft DotNet repository on GitHub. From there I could see that the binary protocol was constructed using a Protobuf-style varint representing the length of the message, with the indicated number of bytes following being a MessagePack-encoded blob.
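To make that framing concrete, here is a minimal sketch in plain Java (no Netty) of how a buffer of varint-length-prefixed messages could be split into its payloads. The class and method names (`VarintFraming`, `readVarint32`, `split`) are made up for illustration and are not taken from the DotNet source or from Mallet; each payload would still need to be handed to a MessagePack decoder.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: split a buffer of varint-length-prefixed messages. */
final class VarintFraming {

    /** Reads a Protobuf-style base-128 varint (at most 5 bytes) from the buffer. */
    static int readVarint32(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            byte b = in.get();
            result |= (b & 0x7F) << shift; // the low 7 bits carry data
            if ((b & 0x80) == 0) {         // high bit clear marks the last byte
                return result;
            }
        }
        throw new IllegalArgumentException("varint32 longer than 5 bytes");
    }

    /** Splits a complete buffer into the payloads that follow each length prefix. */
    static List<byte[]> split(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        List<byte[]> payloads = new ArrayList<>();
        while (in.hasRemaining()) {
            int length = readVarint32(in);
            if (length < 0 || length > in.remaining()) {
                throw new IllegalArgumentException("truncated or malformed message");
            }
            byte[] payload = new byte[length];
            in.get(payload);               // this is the MessagePack-encoded blob
            payloads.add(payload);
        }
        return payloads;
    }
}
```

Note that a length of 300 is written as the two bytes `0xAC 0x02`, so the prefix itself is variable in size; this is why the decoder has to walk the header byte by byte.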
My initial thought was to use BurpSuite’s extension API to make an editor that would decode the various WebSocket frames and present them in a readable form, perhaps JSON-encoded for easy tampering. However, I was stumped almost immediately when I realised that the WebSocket frames shown by Burp were a maximum of 4096 bytes each, but the actual message could be far larger than that, spread over several frames. From what I could see, Burp had no support for aggregating multiple WebSocket frames (Continuation Frames) into a single entity, and so any attempt to decode a message that was spread over multiple frames would be doomed to fail. Perhaps PortSwigger could consider adding this to BurpSuite. (This post was written before PortSwigger announced their new API, but from what I can see of the Montoya API, aggregating WebSocket frames is still not supported. It would also be nice to see whether a WebSocket frame is Text or Binary, but I digress!)
Of course, this was not the end of the road! Mallet is a tool that I have been working on for several years, aimed at exactly this problem – proxying and intercepting arbitrary protocols!
To solve this problem, we’d need to put a few building blocks in place first. Mallet already had support for HTTP (1.0 and 1.1), as well as WebSockets. It also had support for decoding and encoding JSON-formatted messages – needed for the initial handshake. Of course, you could simply assume that the protocol negotiation proceeded as expected, and skip the first request and response before starting the BlazorPack decoding, but for completeness, actually handling the JSON messages would probably be good.
And then finally, we’d need a ProtobufVarint32FrameDecoder, that will break up the stream into actual message-sized chunks, by reading the preceding Varint32, and then that many bytes following. Fortunately, Netty already has that, along with the corresponding FrameEncoder. That just left decoding the MessagePack format itself.
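The idea behind such a frame decoder and encoder can be sketched in plain Java. This is not Netty's code, only a rough stand-in under the assumption that bytes arrive in arbitrary chunks (such as 4096-byte WebSocket frames); `LengthPrefixedReassembler`, `feed` and `encode` are hypothetical names chosen for the example.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch: buffer incoming chunks until whole messages are available. */
final class LengthPrefixedReassembler {
    private byte[] pending = new byte[0];

    /** Feeds one chunk and returns every message that this chunk completes. */
    List<byte[]> feed(byte[] chunk) {
        byte[] buf = Arrays.copyOf(pending, pending.length + chunk.length);
        System.arraycopy(chunk, 0, buf, pending.length, chunk.length);
        List<byte[]> messages = new ArrayList<>();
        int pos = 0;
        while (true) {
            int length = 0, shift = 0, i = pos;
            boolean headerComplete = false;
            while (i < buf.length && shift < 35) {   // a varint32 is at most 5 bytes
                byte b = buf[i++];
                length |= (b & 0x7F) << shift;
                shift += 7;
                if ((b & 0x80) == 0) { headerComplete = true; break; }
            }
            if (!headerComplete) {
                if (shift >= 35) throw new IllegalStateException("malformed length prefix");
                break;                               // header not fully received yet
            }
            if (length < 0) throw new IllegalStateException("negative length");
            if (buf.length - i < length) break;      // body not fully received yet
            messages.add(Arrays.copyOfRange(buf, i, i + length));
            pos = i + length;
        }
        pending = Arrays.copyOfRange(buf, pos, buf.length); // keep the unfinished tail
        return messages;
    }

    /** Prepends the varint32 length so a (possibly modified) payload can be re-sent. */
    static byte[] encode(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int length = payload.length;
        while ((length & ~0x7F) != 0) {
            out.write((length & 0x7F) | 0x80);       // more bytes follow
            length >>>= 7;
        }
        out.write(length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }
}
```

Feeding the first 4096 bytes of a 5000-byte message returns nothing; feeding the remainder returns the complete payload. Re-encoding after tampering recomputes the prefix, so a payload that grows or shrinks stays correctly framed.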
My first approach was to use the MessagePack Java implementation, and simply wrap it in a couple of Netty classes, to convert the Netty way of doing things to the MessagePack way. Unfortunately, I ran into the first problem: a round trip from bytes to a decoded Object and back again resulted in differently encoded output. Trying to make sense of the MessagePack library implementation, so that I could understand where the difference had crept in, also had me scratching my head in frustration. It seemed far more complicated than it needed to be!
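One general way a round trip can change the bytes (offered as an assumption, since the actual cause here was never pinned down) is that the MessagePack specification allows the same integer to be written in several formats. The sketch below uses hypothetical helper names and handles only three unsigned formats: it decodes any of them but always re-encodes in the shortest one.

```java
/** Illustrative sketch: one value, several valid MessagePack encodings. */
final class IntEncodings {

    /** Decodes an unsigned integer in positive fixint, uint 8 or uint 16 form. */
    static int decodeUInt(byte[] bytes) {
        int first = bytes[0] & 0xFF;
        if (first <= 0x7F) return first;                        // positive fixint
        if (first == 0xCC) return bytes[1] & 0xFF;              // uint 8
        if (first == 0xCD) {                                    // uint 16, big-endian
            return ((bytes[1] & 0xFF) << 8) | (bytes[2] & 0xFF);
        }
        throw new IllegalArgumentException("format not handled in this sketch");
    }

    /** Encodes using the shortest of the three formats above. */
    static byte[] encodeUInt(int value) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("out of range for this sketch");
        }
        if (value <= 0x7F) return new byte[] {(byte) value};
        if (value <= 0xFF) return new byte[] {(byte) 0xCC, (byte) value};
        return new byte[] {(byte) 0xCD, (byte) (value >>> 8), (byte) value};
    }
}
```

Here the three-byte input `0xCD 0x00 0x05` decodes to 5, but re-encoding 5 yields the single byte `0x05`. Both are valid MessagePack, yet a byte-for-byte comparison of the round trip fails.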
I then decided to try to implement my own MessagePack decoder and encoder, directly from the specification. It couldn’t be that hard, could it?
Famous last words, normally! But in this case, a few hundred lines of code in two classes later, I was decoding and encoding, round tripping back to the exact same input byte array! Fantastic!
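To give a feel for what decoding directly from the specification looks like, here is a deliberately incomplete sketch covering only a handful of MessagePack formats. It is not the codec described above: `MiniMsgPack` is a hypothetical name, and floats, 64-bit integers, the larger string/array/map formats and extension types are all left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: decode a small subset of MessagePack from a complete buffer. */
final class MiniMsgPack {

    static Object decode(ByteBuffer in) {   // ByteBuffer is big-endian by default
        int b = in.get() & 0xFF;
        if (b <= 0x7F) return (long) b;                                // positive fixint
        if (b >= 0xE0) return (long) (byte) b;                         // negative fixint
        if (b >= 0xA0 && b <= 0xBF) return readString(in, b & 0x1F);   // fixstr
        if (b >= 0x90 && b <= 0x9F) return readArray(in, b & 0x0F);    // fixarray
        if (b >= 0x80 && b <= 0x8F) return readMap(in, b & 0x0F);      // fixmap
        switch (b) {
            case 0xC0: return null;                                    // nil
            case 0xC2: return Boolean.FALSE;
            case 0xC3: return Boolean.TRUE;
            case 0xC4: return readBytes(in, in.get() & 0xFF);          // bin 8
            case 0xCC: return (long) (in.get() & 0xFF);                // uint 8
            case 0xCD: return (long) (in.getShort() & 0xFFFF);         // uint 16
            case 0xCE: return in.getInt() & 0xFFFFFFFFL;               // uint 32
            case 0xD0: return (long) in.get();                         // int 8
            case 0xD1: return (long) in.getShort();                    // int 16
            case 0xD2: return (long) in.getInt();                      // int 32
            case 0xD9: return readString(in, in.get() & 0xFF);         // str 8
            case 0xDC: return readArray(in, in.getShort() & 0xFFFF);   // array 16
            default:
                throw new UnsupportedOperationException(
                        String.format("format 0x%02x not handled in this sketch", b));
        }
    }

    private static byte[] readBytes(ByteBuffer in, int length) {
        byte[] out = new byte[length];
        in.get(out);
        return out;
    }

    private static String readString(ByteBuffer in, int length) {
        return new String(readBytes(in, length), StandardCharsets.UTF_8);
    }

    private static List<Object> readArray(ByteBuffer in, int count) {
        List<Object> out = new ArrayList<>(count);
        for (int i = 0; i < count; i++) out.add(decode(in));
        return out;
    }

    private static Map<Object, Object> readMap(ByteBuffer in, int count) {
        Map<Object, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            Object key = decode(in);
            out.put(key, decode(in));
        }
        return out;
    }
}
```

Because the whole message is already in the buffer, the decoder can simply recurse; there is no need to remember a position in the object tree and resume later. A hand-built sample such as `95 01 80 C0 A4 50 69 6E 67 92 2A C3` decodes to a five-element list holding 1, an empty map, null, the string "Ping", and the nested list [42, true].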
This is a great advantage of the Netty framework, and its philosophy. While the MessagePack library needed to cater for decoding in a streaming form, adding chunk after chunk, the Netty approach of knowing up front how many bytes to read before trying to decode simplified the decoder immensely! Not having to be able to record exactly where you are in the object tree, so that you can restart from that point, cuts out an enormous amount of complexity.
(I did decide to skip a few of the more esoteric MessagePack protocol extension features, though, so it isn’t an entirely complete MessagePack implementation, I’m afraid!)
Unfortunately, after getting it all set up in a pipeline, it turned out that I was doing something wrong in my encoding or decoding, and Blazor was reporting errors about “no object ID: 9”, and similar. I made a test suite, with a variety of object types and values, but all that did was confirm that I was decoding things the same way that I was encoding them! I even made use of the “official” MessagePack Java implementation to convert the objects to serialised bytes, pass those through my codec, and confirm both that the decoded object was the same as the original test object and that the re-encoded bytes were the same as those generated by the official library.
Eventually, still not knowing exactly what data type I was processing incorrectly, I realised that I had been using an older version of the MessagePack-Java library, because it had been renamed at some point to messagepack-core! Tearing out my own implementation, I wrapped the latest version of messagepack-java into a Netty codec, and we were in business! Everything was working, and no errors were being reported!
[...]