Progressive JSON
Why streaming isn't enough.
May 31, 2025
Do you know about Progressive JPEGs? The idea is that instead of loading the image top to bottom, the image starts out fuzzy and then progressively becomes more crisp.
What if we apply the same idea to transferring JSON?
Suppose you have a JSON tree with some data:
```js
{
  header: 'Welcome to my blog',
  post: {
    content: 'This is my article',
    comments: [
      'First comment',
      'Second comment',
      // ...
    ]
  },
  footer: 'Hope you like it'
}
```
Now imagine you want to transfer it over the wire. Because the format is JSON, you’re not going to have a valid object tree until the last byte loads. You have to wait for the entire thing to load, then call JSON.parse, and then process it.
The client can’t do anything with JSON until the server sends the last byte. If a part of the JSON was slow to generate on the server (e.g. loading comments took a slow database trip), the client can’t start any work until the server finishes all the work.
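To make the all-or-nothing behavior concrete, here's a small sketch: `JSON.parse` rejects a truncated payload outright, so a partially received response is useless to the client.

```js
const full = '{"header":"Welcome to my blog","footer":"Hope you like it"}';
const partial = full.slice(0, 30); // pretend the connection stalled here

console.log(JSON.parse(full).header); // works once every byte has arrived

let failed = false;
try {
  JSON.parse(partial); // a truncated payload is not valid JSON
} catch {
  failed = true;
}
console.log(failed); // true — nothing usable until the last byte
```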
Would you call that good engineering? And yet it’s the status quo—that’s how 99.9999%* of apps send and process JSON. Do we dare to improve on that?
* I made it up
Streaming JSON
We can try to improve this by implementing a streaming JSON parser. A streaming JSON parser would be able to produce an object tree from an incomplete input:
```js
{
  header: 'Welcome to my blog',
  post: {
    content: 'This is my article',
    comments: [
      'First comment',
      'Second comment'
```
If you ask for the result at this point, a streaming parser would hand you this:
```js
{
  header: 'Welcome to my blog',
  post: {
    content: 'This is my article',
    comments: [
      'First comment',
      'Second comment'
      // (The rest of the comments are missing)
    ]
  }
  // (The footer property is missing)
}
```
However, this isn’t too great either.
One downside of this approach is that the objects are kind of malformed. For example, the top-level object was supposed to have three properties (header, post, and footer), but the footer is missing because it hasn’t appeared in the stream yet. The post was supposed to have three comments, but you can’t actually tell whether more comments are coming or if this was the last one.
In a way, this is inherent to streaming—didn’t we want to get incomplete data?—but this makes it very difficult to actually use this data on the client. None of the types “match up” due to missing fields. We don’t know what’s complete and what’s not. That’s why streaming JSON isn’t popular aside from niche use cases. It’s just too hard to actually take advantage of it in the application logic which generally assumes the types are correct, “ready” means “complete”, and so on.
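A crude way to simulate this problem (purely illustrative — `parsePartial` is a made-up helper, not a real streaming parser): take the bytes received so far and close whatever brackets are still open before parsing. The result "parses", but its shape lies to you.

```js
// Hypothetical helper: naively "repair" a truncated JSON string by
// closing any brackets that are still open. Real streaming parsers are
// more sophisticated, but the resulting shape problem is the same.
function parsePartial(text) {
  const closers = [];
  let inString = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === '{') closers.push('}');
    else if (ch === '[') closers.push(']');
    else if (ch === '}' || ch === ']') closers.pop();
  }
  // Trim a dangling comma or colon, then close everything up.
  const trimmed = text.replace(/[,:\s]+$/, '');
  return JSON.parse(trimmed + closers.reverse().join(''));
}

const received =
  '{"header":"Welcome to my blog","post":{"comments":["First comment"';
const tree = parsePartial(received);
console.log(tree.post.comments.length); // 1 — but is that all of them?
console.log('footer' in tree);          // false — missing, not "pending"
```

The parsed object gives you no way to distinguish "one comment total" from "one comment so far", and a missing `footer` from a `footer` that is still on its way.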
In the analogy with JPEG, this naïve approach to streaming matches the default “top-down” loading mechanism. The picture you see is crisp but you only see the top 10%. So despite the high fidelity, you don’t actually see what’s on the picture.
Curiously, this is also how streaming HTML itself works by default. If you load an HTML page on a slow connection, it will be streamed in the document order:
```html
<html>
  <body>
    <header>Welcome to my blog</header>
    <article>
      <p>This is my article</p>
      <ul class="comments">
        <li>First comment</li>
        <li>Second comment</li>
```
This has some upsides—the browser is able to display the page partially—but it has the same issues. The cutoff point is arbitrary and can be visually jarring or even mess up the page layout. It’s unclear if more content is coming. Whatever’s below—like the footer—is cut off, even if it was ready on the server and could have been sent earlier. When we stream data in order, one slow part delays everything.
Let’s repeat that: when we stream things in order they appear, a single slow part delays everything that comes after it. Can you think of some way to fix this?
Progressive JSON
There is another way to approach streaming.
So far we’ve been sending things depth-first. We start with the top-level object’s properties, we go into that object’s post property, then we go into that object’s comments property, and so on. If something is slow, everything else gets held up.
However, we could also send data breadth-first.
Suppose we send the top-level object like this:
```js
{
  header: "$1",
  post: "$2",
  footer: "$3"
}
```
Here, "$1", "$2", "$3" refer to pieces of information that have not been sent yet. These are placeholders that can progressively be filled in later in the stream.
For example, suppose the server sends a few more rows of data to the stream:
```js
{
  header: "$1",
  post: "$2",
  footer: "$3"
}
/* $1 */
"Welcome to my blog"
/* $3 */
"Hope you like it"
```
Notice that we’re not obligated to send the rows in any particular order. In the above example, we’ve just sent both $1 and $3—but the $2 row is still pending!
If the client tried to reconstruct the tree at this point, it could look like this:
```js
{
  header: "Welcome to my blog",
  post: new Promise(/* ... not yet resolved ... */),
  footer: "Hope you like it"
}
```
We’ll represent the parts that haven’t loaded yet as Promises.
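One way to sketch this on the client (every name here — `createPending`, `hydrate`, `resolveRow` — is hypothetical, invented for illustration): walk each parsed chunk and swap every `"$n"` string for a Promise that a later row can resolve.

```js
// Hypothetical sketch: each "$n" placeholder becomes a Promise that is
// resolved when the corresponding row arrives on the stream.
function createPending() {
  const pending = new Map(); // id -> { promise, resolve }
  function promiseFor(id) {
    if (!pending.has(id)) {
      let resolve;
      const promise = new Promise((r) => { resolve = r; });
      pending.set(id, { promise, resolve });
    }
    return pending.get(id).promise;
  }
  function resolveRow(id, value) {
    promiseFor(id); // ensure the entry exists
    // The row's value may itself contain "$n" placeholders, so hydrate it too.
    pending.get(id).resolve(hydrate(value));
  }
  // Replace "$n" strings in a chunk with the corresponding Promises.
  // (A real format would need to escape user strings that start with "$".)
  function hydrate(chunk) {
    if (typeof chunk === 'string' && chunk.startsWith('$')) {
      return promiseFor(chunk);
    }
    if (Array.isArray(chunk)) return chunk.map(hydrate);
    if (chunk !== null && typeof chunk === 'object') {
      return Object.fromEntries(
        Object.entries(chunk).map(([k, v]) => [k, hydrate(v)])
      );
    }
    return chunk;
  }
  return { hydrate, resolveRow };
}

const { hydrate, resolveRow } = createPending();
const tree = hydrate({ header: '$1', post: '$2', footer: '$3' });
// `tree.header` is a pending Promise until the "$1" row arrives:
resolveRow('$1', 'Welcome to my blog');
tree.header.then((h) => console.log(h)); // "Welcome to my blog"
```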
Then suppose the server could stream in a few more rows:
```js
{
  header: "$1",
  post: "$2",
  footer: "$3"
}
/* $1 */
"Welcome to my blog"
/* $3 */
"Hope you like it"
/* $2 */
{
  content: "$4",
  comments: "$5"
}
/* $4 */
"This is my article"
```
This would “fill in” some of the missing pieces from the client’s perspective:
```js
{
  header: "Welcome to my blog",
  post: {
    content: "This is my article",
    comments: new Promise(/* ... not yet resolved ... */)
  },
  footer: "Hope you like it"
}
```
The Promise for the post would now resolve to an object. However, we still don't know what's inside the comments, so the comments are now represented by a Promise of their own.
Finally, the comments could stream in:
```js
{
  header: "$1",
  post: "$2",
  footer: "$3"
}
/* $1 */
"Welcome to my blog"
/* $3 */
"Hope you like it"
/* $2 */
{
  content: "$4",
  comments: "$5"
}
/* $4 */
"This is my article"
/* $5 */
["$6", "$7", "$8"]
/* $6 */
"This is the first comment"
/* $7 */
"This is the second comment"
/* $8 */
"This is the third comment"
```
Now, from the client’s perspective, the entire tree would be complete:
```js
{
  header: "Welcome to my blog",
  post: {
    content: "This is my article",
    comments: [
      "This is the first comment",
      "This is the second comment",
      "This is the third comment"
    ]
  },
  footer: "Hope you like it"
}
```
By sending data breadth-first in chunks, we gained the ability to progressively handle it on the client. As long as the client can deal with some parts being “not ready” (represented as Promises) and process the rest, this is an improvement!
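The producing side can be sketched as a queue: emit the root with placeholders for every still-pending value, then emit each row as its data becomes ready. This is an illustrative sketch, not a real serializer — the row format mirrors the examples above, and `serializeProgressive` is a made-up name.

```js
// Illustrative breadth-first serializer: Promises in the input tree
// become "$n" placeholders, and each settled Promise is emitted as a
// later row ("/* $n */" followed by its JSON).
async function* serializeProgressive(root) {
  let nextId = 0;
  const queue = []; // rows waiting to be emitted: { id, promise }

  // Replace every Promise in the tree with a "$n" placeholder and
  // remember it so its row can be emitted once it settles.
  function withPlaceholders(value) {
    if (value !== null && typeof value === 'object') {
      if (typeof value.then === 'function') {
        const id = `$${++nextId}`;
        queue.push({ id, promise: value });
        return id;
      }
      if (Array.isArray(value)) return value.map(withPlaceholders);
      return Object.fromEntries(
        Object.entries(value).map(([k, v]) => [k, withPlaceholders(v)])
      );
    }
    return value;
  }

  yield JSON.stringify(withPlaceholders(root));
  // Note: this simple version awaits rows in queue order; a real
  // implementation would race the pending Promises and emit whichever
  // settles first, so one slow row can't delay the others.
  while (queue.length > 0) {
    const { id, promise } = queue.shift();
    const value = withPlaceholders(await promise);
    yield `/* ${id} */\n${JSON.stringify(value)}`;
  }
}

(async () => {
  const rows = [];
  for await (const row of serializeProgressive({
    header: 'Welcome to my blog',
    post: Promise.resolve({ content: 'This is my article' }),
    footer: 'Hope you like it',
  })) {
    rows.push(row);
  }
  console.log(rows[0]); // {"header":"Welcome to my blog","post":"$1","footer":"Hope you like it"}
})();
```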
Inlining
Now that we have the basic mechanism, we’ll adjust it for more efficient output. Let’s have another look at the entire streaming sequence from the last example:
```js
{
  header: "$1",
  post: "$2",
  footer: "$3"
}
/* $1 */
"Welcome to my blog"
/* $3 */
"Hope you like it"
/* $2 */
{
  content: "$4",
  comments: "$5"
}
/* $4 */
"This is my article"
/* $5 */
["$6", "$7", "$8"]
/* $6 */
"This is the first comment"
/* $7 */
"This is the second comment"
/* $8 */
"This is the third comment"
```
We may have gone a little too far with streaming here. Unless generating some parts actually is slow, we don’t gain anything from sending them as separate rows.
Suppose that we have two different slow operations: loading a post and loading a post’s comments. In that case, it would make sense to send three chunks in total.
First, we would send the outer shell:
```js
{
  header: "Welcome to my blog",
  post: "$1",
  footer: "Hope you like it"
}
```
On the client, this would immediately become:
[...]