/* UPDATE: */

Found this old (from late 2005) post from Douglas Crockford himself on comments in JSON which I think validates the thinking I have presented here in this blog post. He says:

A JSON encoder MUST NOT output comments. A JSON decoder MAY accept and ignore comments.

This is exactly what I am advocating with this post. And JSON.minify() is simply one decent way to allow the JavaScript implementation of JSON.parse() to do the “MAY accept and ignore comments” thing.

{

Yesterday, I posted JSON.minify() as a little mini open-source code snippet. The idea behind JSON.minify() is to be able to take a JSON-like document (that is, strict JSON + some other “stuff” which I’ll explain in a moment) and strip out/minify the document into something that is strictly valid, parseable JSON. This may seem like a crazy or mis-directed idea on the surface, but I have my reasons, which I’ll also explain in a moment.

Anyway, I started a firestorm on Twitter when I posted it. What ensued was a day-long barrage of tweets back and forth with many different people, most of whom seemed to vehemently oppose the idea of there being any benefit to what I was trying to do. The mix seemed to be about 10% in favor, 90% opposed. Although, as the day and the tweet fest wore on, a few of the original “haters” did come to see some method to my madness.

I’ve said many times before that it’s kind of flattering if someone “out there” cares enough about what you say to take the time to vocally disagree with you. Days like yesterday, however, make me re-think that position a little bit. In retrospect, I think the problem with this assertion in the age of Twitter is that the barrier to responding with “You’re wrong” (the amount of energy it takes) is far lower than it was even a couple of years ago.

Even blog commenting usually involves entering in your name, email, and website URL, or logging in, so it takes a slight amount more effort to do so than it does to click the reply icon and fire away.

What “stuff”?

To boil it all down, JSON.minify() is designed to strip out comments (single line // and multi-line /* … */) from a JSON-like document. Oh, and it also takes out any unnecessary white-space, if any exists. The white-space is not all that important to me, nor does it matter to the official JSON.parse() parser. But the comments… those are a different story.
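To make the comment-stripping idea concrete, here is a minimal sketch of that kind of filter. It is not the actual JSON.minify() source: the name `stripJsonComments` and the sample document are made up for illustration, and unlike JSON.minify() this sketch leaves whitespace alone. The one thing it has to get right is to never touch `//` or `/*` when they appear inside a string literal.

```javascript
// Hypothetical helper (NOT the real JSON.minify() code): removes // and /* */
// comments that appear outside of string literals. Whitespace is left as-is.
function stripJsonComments(text) {
  let out = "";
  let i = 0;
  let inString = false;

  while (i < text.length) {
    const ch = text[i];
    const next = text[i + 1];

    if (inString) {
      out += ch;
      if (ch === "\\" && i + 1 < text.length) {
        // Copy the escaped character too, so \" does not end the string.
        out += next;
        i += 2;
        continue;
      }
      if (ch === '"') inString = false;
      i += 1;
    } else if (ch === '"') {
      inString = true;
      out += ch;
      i += 1;
    } else if (ch === "/" && next === "/") {
      // Single-line comment: skip up to (not including) the line break.
      while (i < text.length && text[i] !== "\n") i += 1;
    } else if (ch === "/" && next === "*") {
      // Multi-line comment: skip through the closing */ (or to end of input).
      const end = text.indexOf("*/", i + 2);
      i = end === -1 ? text.length : end + 2;
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

// Made-up JSON-like document with both comment styles.
const source = `{
  // where the server listens
  "port": 8080,
  "url": "http://example.com/*not-a-comment*/" /* trailing note */
}`;

const config = JSON.parse(stripJsonComments(source));
console.log(config.port); // 8080
console.log(config.url);  // http://example.com/*not-a-comment*/
```

The string-tracking flag is the whole trick: a plain regular expression that deletes `//…` would happily eat half of that URL.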

You may wonder to yourself, “why would I want comments in my JSON? that only makes the JSON more bloated when it gets transferred.” Hold onto that question, because I’ll get to it in a moment. But for now, lemme say: I completely agree, retaining comments in TRANSMITTED JSON messages is a ridiculous idea. As a performance optimization nerd, it shouldn’t come as any surprise that I would feel that way. But strangely, many people who know that about me seemed to forget that I would have a performance-minded opinion on this topic.

Crockford’s “comments”

Somewhat late in the day, @tobie (via @kangax) pointed me at the JSON saga talk by Mr. JSON himself, Douglas Crockford. I invite you to take a few minutes and read through the transcript (or take the 47 minutes to watch the video); there’s some interesting stuff there.

About 1/3 of the way through the transcript, you’ll see Doug explain several reasons for why comments originally were allowed in JSON but were later removed. He says, in short, “people were using comments wrongly, so I removed them. Also, handling comments made the parser harder to implement, so I removed them. Also, comments didn’t exist in YAML and I wanted JSON to look like YAML, so I removed them.” If you go to the VERY end of the transcript, and read the last couple of sentences, you’ll see him boil it all down:

The main reason I took comments out was that I saw people who were trying to control what the parser would do based on what was in the comments, and that totally broke interoperability. There’s no way I could control the way they were using comments, so the most effective fix was to take the comments out.

I’ll just be honest with you. Those seem like crap reasons to me. Why do I say that? Him saying that he couldn’t control what people did with the parser is a spurious argument because he was in fact in control of the JSON “standard” and if it was ever going to take off it was going to be the way he wanted it to be. Somehow he magically got people not to implement JSON.parse() with extensions to the standard, so I’m not sure at all why he couldn’t keep them (by way of stern scolding, of course!) from using comments inappropriately. He had enough influence to keep JSON parsers to his standard, all he would have had to do was add “And comments shall only be ignored and not parsed.” to his standard and that would have been that.

I know that’s easy to say as a “Monday-morning quarterback” (like a decade later!) and I’m sure it was valid and made sense to him (and maybe others) at the time. But in my opinion, it was the wrong decision. Particularly frustrating is that Doug asserts in this talk that JSON is never going to change again, because he intentionally didn’t version the format. So when he decided it was finished, he declared it sealed, and that’s just how it is. Imagine if WebKit got to decide “ok, <canvas> is now final, no more changes!”

So, if we ever wanted to add to JSON, like putting comments back in, we can’t. He in fact says specifically, someday JSON will be replaced by something else better, but JSON will always remain as it is now. That’s a bold and unwavering statement… but knowing the personality of Crockford (who also by the way claims that HTML5 is crap and should be scrapped), I am inclined to believe that he’ll go to his death bed someday defending the rigidness of JSON never changing. But I do hope the eventual outcome of my ramblings is not to suggest that we need JSON+ or JSONX to succeed Crockford’s JSON.

It’s true that the JSON standard itself has been quite stable for a long time. But it’s also interesting to note from the transcript that he admits that the parser “technology” has evolved on several occasions. Namely, he progressively discovered various different “security holes” in what the parser would allow through, and so he added different regular expressions as filters in JSON.parse() to make the parsing safer.

Comments don’t need to be part of the spec

Now’s a good time for me to make my first assertion. Adding a simple set of regular-expression and state machine logic to strip out comments would not necessarily change the JSON spec. Granted, this proposal is not strictly the same as the regular expression filters that Doug put in for security holes, but it’s in a similar vein. It’s also quite like the parts of the parsing which ignore whitespace.

I’m not suggesting there be anything added to the JSON spec in terms of what is functionally allowed to be declared, or how properties and values are interpreted syntactically. In fact, I’m saying the opposite… what is taken as strict JSON should not change at all. In the same way that the whitespace is ignored during parsing, I’m suggesting comments should be too.

Comments are, in my opinion, a special beast. They should not be considered to have the same weight at all as other parts of the grammar/syntax. Comments are almost universally ignored by compilers/interpreters. We all know the primary use of a comment is for developer-friendly maintenance.

Look through the whole spec for JSON on json.org and you’ll only find one tiny little statement at the bottom about whitespace: “Whitespace can be inserted between any pair of tokens.” He goes into quite some detail with 3 different types of specifications for each part of the grammar/syntax, a few pages’ worth of text and diagrams, and then has one little off-handed sentence about how whitespace is basically not an important part of the grammar and is thus ignored.

In fact, there’s nowhere in any of JSON that whitespace has any meaning at all (well, whitespace is preserved inside of string literals… but it still has no meaning to the language).

Comments ~= Whitespace

I’d argue the same is true in practicality, and could officially be true if Crockford were so inclined, of comments. In my mind a comment is no more important or affecting of the “language” than is whitespace. If we can be tolerant of, ignore, and/or remove whitespace from a JSON string before parsing, why can’t we do the same of comments? All Doug would have to say is, “parsers should ignore comments just like they ignore whitespace.” It’s no more complicated than that.
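Here’s a quick illustration of that asymmetry as it stands today, using nothing but the built-in JSON.parse(). The sample strings are made up; the behavior shown is just what the standard parser does with whitespace versus a comment in the same spot.

```javascript
// Whitespace between tokens is ignored by the built-in parser...
const spaced = JSON.parse(' {\n  "a" : [ 1 ,\t2 ]\n} ');
console.log(spaced.a.length); // 2

// ...while whitespace inside a string literal is preserved as data.
console.log(JSON.parse('"a  b"')); // a  b

// A comment sitting between the same tokens is a syntax error.
let rejected = false;
try {
  JSON.parse('{ "a": 1 /* note */ }');
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true
```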

In fact, most parsers/compilers have a pre-processing step where they go through a source document and remove all whitespace AND comments before they apply any syntactic meaning to the tokens found. That’s because in almost every other language on the planet, comments are found to be useful to developers but irrelevant to machines and thus can be safely ignored.

Can you imagine if Crockford had declared: “whitespace isn’t necessary in JSON, so the parser won’t handle/ignore it at all”?

Whitespace serves no purpose at all in JSON other than readability (if you hand-author your JSON documents… more on that in a little bit). The same is true of comments… they are for readability only. That is the heart of my argument.

So, all this is to say… I think JSON parsing could easily be extended to support stripping of comments without affecting in any way shape or form the spirit of the JSON spec. Would implementations have to change? Yes. But they had to change several times before when Crockford found security holes, so I don’t see this as much different.

I’ve also proven the logic to strip comments is pretty straightforward… it’s a few hundred bytes and is implementable in pretty much any language I can think of that I’m familiar with. Even Doug admits in that talk he was never quite sure why implementors had trouble dealing with comments. Hint: bad programmers.

Still not convinced.

It’s ok if you (and Crockford) still disagree with me that comments should be able to be ignored by JSON.parse(). That doesn’t hurt my feelings at all. I still think they should, but let’s just set that issue aside and agree to disagree.

That battle notwithstanding, in my opinion, comments in a JSON-like document still have some value in some cases.

JSON: file, document, message, database, etc?

Let’s take a step back and broaden our view of what JSON can really be used for. Clearly, it’s primarily used as interstitial messages — that is, short little bursts of data interchange between different entities (like a server and browser). In fact, it’d be rightly argued that this is BY FAR the overwhelming usage of JSON.

Just like with XML, which has similar (but even less) flexibility in usage, JSON can be used in ways other than data “transmission” in the traditional sense. For instance, JSON can be used as a database, a persistence layer for real data. There’s a whole slew of “no-SQL” type key-value pair databases that have cropped up recently to latch onto this usage of JSON. In this case, JSON is never really “transmitted” but rather just parsed/filtered/transformed as the data needs to be accessed or mutated.

Though these valid use cases do exist, it seems that most people who were outraged at my little JSON.minify() at the heart of things just don’t agree with using JSON in a document/file context. They believe JSON should only be used for message transmission, and in that particular use case, clearly comments have no place. It’s only if you broaden your viewpoint that you’ll see the other uses for JSON which I’m asserting can benefit from comments.

JSON for “data”

JSON is, according to Doug, just intended to be a clean way to express key-value pairs in a structured and useful way that is universal to every language. He continues to say that the primary (although I don’t think even he would assert only) use-case is for transmitting that data.

He goes to great lengths to stress that JSON was, by stroke of sheer genius or alignment of the stars, an outward expression of an almost “natural law” of the universe (i.e., computing/information science). He asserts that independently and at the same time, 3 different languages (JavaScript, Python, and NewtonScript) all arrived at exactly the same syntax for expressing a nestable data structure made up of keys and values.

The interesting thing about “data” is that it can take different forms as its uses vary. For instance, pure data is usually stored in a pure “database”. But this other animal, “meta-data”, which is usually used by programs to affect how they behave… it’s not so clear-cut how that data gets stored. But when those meta-data stores end up in files, it starts to make the usage of them a little bit more akin to a “document” than just a pure data store.

A perfectly valid and long-held usage for key-value pair setting is configuration files. Almost every major web application framework in existence today, and an enormous amount of the web-related software (web, DNS, etc) out there, have taken their cues from the earliest days of Unix/Linux and used simple key-value pair configuration files.

Consider PHP.ini or Apache’s httpd.conf

These files have been around forever, and have for the most part always been about exposing configuration variables and letting administrators tinker with their values. Of course, every different piece of software defines its key-value pair syntax slightly differently. Some use =, others use :, some use quotes, some don’t, etc.

But you know what the other almost universal characteristic of these file formats is, besides the key-value pair syntax? That’s right, you guessed it. Comments!

Why on earth would we do such a silly thing as to put comments in our configuration files? I heard various arguments to this effect on Twitter yesterday, amounting to “the variable name should be descriptive enough to explain its possible values” or “if documentation for variables/values is needed, it belongs in a separate document called a manual.”

Well, I can’t really explain why like 99% of all web software went with flat configuration files (with comments!), but they did. And I’ll tell you, as one who regularly maintains my own web servers and thus web server software, I definitely appreciate having comments in my configuration files. I can’t imagine how painful editing PHP.ini or httpd.conf would be if I didn’t have comments right there inline explaining the various valid values and the implications of each.

Configuration files and JavaScript

If we now step back and look at the world of server-side JavaScript, we’ll see the revolution of JavaScript taking over in a lot of web software roles. But the fundamental paradigm of needing to configure this software still exists. We can configure it with command-line arguments when we start up the software, but most people who manage web software prefer to store commonly used configuration parameters in, gasp, a configuration file.

Does it make sense for the server-side JavaScript world to go out and define an entirely new and proprietary format for such flat, key-value pair configuration files? Umm, plainly, no. We should use JSON, right? That just makes sense to me! I figured it would to others too, but apparently I was wrong.

I had many people point me to examples of SSJS configuration files which were .js files. The curious thing to me when I looked at those .js files was that they almost all had a very interesting characteristic. They looked exactly like a JSON document, except for a little bit of extra “stuff”. Was that “stuff” complex JavaScript looping or operators? Nope. Was that “stuff” if-statements with conditional logic? Nope.

That “stuff” was comments. Huzzah! Oh, and yeah, that “stuff” was also something like an assignment of the JSON literal to a global JavaScript variable, or in some cases passing the JSON literal to a global function JSON-P style.

But in spirit, this was a JSON document.

It was designed to store meta-data, like configuration variables, in a structured key-value pair way. It had comments to help the developer understand/remember/maintain the document when necessary. The variable assignment (or function call parameter passing) really had almost nothing to do with the spirit of this document. It was more like a concession made to satisfy a pragmatic concern: If I parse a .js file as JavaScript, and all that file has in it is a JSON literal, the .js file will parse and “execute” validly, but that JSON object in it won’t be referenced anywhere in memory that I can use.

You see, the paradigm of needing to put the JSON into some variable or function parameter is the only real reason this file is a .js file and not a .json file. They could just as easily, and I would argue more cleanly, have put just the JSON into the file, with no variable assignments or parameter/function calling, and then opened that file directly and read the contents into a string. At that point, with the JSON data in a string variable ready to be parsed, it’s no different than if we’d done an XHR call to grab that JSON into a string from the response text.

Regardless of how it gets into the string, they could then pass that string to JSON.parse(), which is built into pretty much all server-side JS environments/engines, and the return value would be the resulting object, which they could assign to any appropriate local variable or pass to any other function.
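As a rough sketch of that file-to-string-to-object flow, here is what it might look like, assuming a Node.js-style environment with the `fs`, `os`, and `path` modules. The file name and the settings in it are invented for the example, and the file is written to a temp directory first only so the snippet can run on its own.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical config file, created here only to keep the sketch self-contained.
const configDir = fs.mkdtempSync(path.join(os.tmpdir(), "json-config-"));
const configPath = path.join(configDir, "config.json");
fs.writeFileSync(configPath, '{ "port": 8080, "debug": false }');

// Read the file contents into a string, much like grabbing an XHR response text...
const text = fs.readFileSync(configPath, "utf8");

// ...then parse it straight into a local variable. No globals, no wrapper function.
const config = JSON.parse(text);
console.log(config.port);  // 8080
console.log(config.debug); // false

// Clean up the temporary file and directory.
fs.unlinkSync(configPath);
fs.rmdirSync(configDir);
```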

This would be cleaner, in my opinion, because it would not require any global variables or functions to make it work.

Oh yeah, it was also nice that the JavaScript file parsing on their configuration files allowed for the files to have comments.

Sounds familiar

This is exactly the scenario that I found myself in with some recent server-side JavaScript. Could I have chosen to configure my SSJS application with .js files? Sure. Could I have chosen to pull the configuration from a database/datastore instead? Sure. Could I have chosen to just inline my configuration right into the code? Sure.

But all of these seemed less than ideal to me. It seems so much cleaner to just have a simple JSON configuration file with key-value pairs to control my application’s behavior. And I foresaw that as my application grew in complexity, and as the possible valid values for those configuration parameters grew, I’d want to be able to rely on some inline comments to help keep the file more maintainable.

Not only that, but if someone else took and used my application some day, I’d want them to be able to configure it with relative ease. I certainly wouldn’t want them to have to be editing what looked like a code file (unless they too were a developer).

So, “config.json” file it was! Problem. If I put comments in, now I can’t parse the file. What do I do? Make a huge concession in the proper and clean design of my system and change to some other method of storing my configuration? Or do I take the lessened usability of my application and say that all “documentation” will just have to reside elsewhere, far away from the file it applies to?

No, I decided, very simply, that adding comments to what was otherwise in every way imaginable a JSON document was the right approach.

Call me crazy. String me up and lynch me. Let’s have a Salem witch trial. Let’s brand me for heresy.

And the simplest solution to my problem of comments being “invalid” in JSON was to create a simple filter, a “minifier” if you will, that could take otherwise valid JSON-like content, and strip it of comments so it was in fact valid pure JSON. This seemed like a decent and fairly graceful approach to my problem.

What followed on, and was the main point of the first half of this post, is that I truly believe that comments should be allowed in pure JSON documents. But since I’ll probably never win that argument with Crockford (or anyone else), the next best thing was that I defined this other thing which is INCREDIBLY SIMILAR TO JSON… namely JSON+Comments. I’m not much for silly overused acronyms, so I won’t go so far as to call it JSON+C. But you get the idea.

And here’s my very strict, and easy to enforce as a standard, definition:

JSON+Comments: valid JSON + valid JS comments.

That’s it. Nothing more. Go smoke on that pipe for a while.

I’m not trying to turn JavaScript into JSON. I’m trying to enhance JSON ever so slightly with comments. Oh, and I’m claiming that it’s so close to real JSON that this is a silly distinction/argument to make. But nonetheless, there you have it: JSON+Comments. And JSON.minify() is a handy tool to help “convert” your bastardized “JSON+C” into real JSON. Yay.

JSON-P

Let’s go back to JSON messages for just a moment. There’s actually more than one kind of JSON message. There’s strict JSON, which in practicality could only be transferred between server and browser through a direct Ajax call like with XMLHttpRequest().

But then there’s another emerged “standard” that we label JSON-P. JSON-P defines itself as JSON-with-padding. The “padding” is a little bit of a misnomer, in that it’s not “padding” (like, gasp, whitespace) but a function call that passes the JSON literal as a parameter. In fact, I’d argue it should have been JSON-W (JSON+wrapping) because the function call Wraps the JSON literal.

JSON-P is not parseable by JSON.parse() or by any parser I know of except the JavaScript engine. And yet most of us definitely still relate it more to JSON than to JavaScript.

When JSON-P came out, I’m sure some strict purists cringed and said (in a crotchety old voice) “That’s not JSON at all… it’s just JavaScript”. And I guess there are those who still feel that way. But in all practicality, because of same-origin security issues primarily, JSON-P has actually emerged as a de-facto superset standard of JSON, but still a much restricted subset of true, full-blown JavaScript. It’s proven its usefulness beyond argument as a very valid and commonly used solution to cross-domain “Ajax”.

CAN you do full JavaScript in a JSON-P message? Sure. But then that’s not really the intended spirit of JSON-P. Doing so would clearly put you in the realm of just plain ol’ JavaScript. No, I’d argue there is a strictly definable JSON-P (though it has lacked a formal definition, just the de-facto patterns of use) which is this:

JSON-P: valid JSON wrapped in a single function call.

There doesn’t have to be tolerance for JavaScript operations, boolean logic, try/catch’s, loops, or any of that other JavaScript goodness. JSON-P is a strict superset standard of JSON, and there’s no reason that it’s bad for doing so. It’s JSON + some other “stuff” that is helpful… in this case for bridging the cross-domain gap.

Just because browsers (and parsers) don’t officially have, at this time, some actual JSON-P grammar/standard to apply to parsing JSON-P doesn’t mean it couldn’t be that way if we wanted to.

For instance, we could quite easily standardize JSON-P like I’ve suggested, give it a website like “json-p.org” with fancy diagrams, and then take Doug’s JSON parser and add a few extra rules on top of it to allow for only the wrapping function call. The JSONP.parse() function could just check for the validity of the wrapping function call (“blah(….);”) and then pass what’s in between the ( ) to the JSON.parse() function. We could even get some help from browser vendors to define a MIME type like “application/json-p”.
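A minimal sketch of what such a check might look like follows. No JSONP.parse() exists in any standard, so the function name `parseJsonp`, the return shape, and the rule that the callback must be a plain identifier (no dotted names) are all my own assumptions for the sake of illustration.

```javascript
// Hypothetical parseJsonp(): accepts only `name(<json>)` with an optional
// trailing semicolon, then hands the text between the ( ) to JSON.parse().
function parseJsonp(text) {
  const match = /^\s*([A-Za-z_$][\w$]*)\s*\(([\s\S]*)\)\s*;?\s*$/.exec(text);
  if (!match) {
    throw new SyntaxError("Not a single wrapping function call");
  }
  // JSON.parse() throws a SyntaxError if the argument is not strict JSON,
  // which also rejects anything like a second statement or call.
  return { callback: match[1], data: JSON.parse(match[2]) };
}

const result = parseJsonp('handleData({"items":[1,2,3]});');
console.log(result.callback);          // handleData
console.log(result.data.items.length); // 3
```

Anything beyond the one wrapping call, such as `a(1);b(2)` or `blah(1 + 1)`, ends up as invalid JSON between the outermost parentheses and gets rejected, which is the point: valid JSON wrapped in a single function call, and nothing else.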

If a <script> tag is found with that type, the contents (or the remote source file’s contents) could be checked against my simple JSON-P definition: look for a valid conforming function call, and take the parameter contents and validate it as real JSON.

Wait a second

That’s kind of crazy talk, huh? That actually makes JSON-P sound a lot like my outlandish JSON+Comments. It’s valid JSON, “decorated” by a very small amount of other “stuff” that’s there for simple but valid reasons. For JSON-P, the P is there for crossing the same-origin gap. For “JSON+C”, the comments are there for cases where commenting a file inline helps its readability/maintainability.

}

I’ve rambled on for quite a bit. Let me try to close this post down in a sensible way.

My idea to allow comments in what is otherwise a perfectly valid, parseable JSON file has absolutely no bearing on my feeling that such bloated JSON should never be transmitted. I’m not in any way suggesting that the JSON messages you send between systems should have comments in them.

I’m only saying that JSON files (like, configuration files) which reside in a file system and are generally and primarily only opened/read and nothing else, can benefit from having comments in them, just like almost all other configuration files do.

If for now (or even forever) the tradeoff is that this file must be slightly sanitized before being sent through Crockford.parse(), that’s something I’m willing to accept. I think the slight payoff, for my use case, is higher than the slight cost involved in the minification.

Also, this doesn’t preclude that JSON.minify() could be used in build processes to take developer-friendly “JSON+C” files and build them to real JSON files for use by the actual production system.

Lastly, I would say that JSON.minify() is good at removing whitespace from JSON “documents” too. For instance, if you have a templating system that builds up your JSON documents before they are sent out, that JSON document may end up with extra whitespace in it. Calling JSON.minify() on it before sending it over-the-wire is going to make the file smaller and transmit more efficiently. This is a good thing, right?
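To get a feel for the size difference, here’s a tiny sketch. It does not use JSON.minify(); it leans on the built-in parse/stringify round trip instead, which only works when the input is already strict JSON (no comments) and which may normalize values along the way (a number written as 1.0 comes back as 1, for instance). The templated string is made up.

```javascript
// Made-up output of a text templating step: valid JSON with extra whitespace.
const templated = '{\n    "name" : "demo",\n\n    "tags" : [ "a" ,  "b" ]\n}\n';

// Round-tripping through the built-in parser drops the insignificant whitespace.
const compact = JSON.stringify(JSON.parse(templated));

console.log(compact);                           // {"name":"demo","tags":["a","b"]}
console.log(templated.length > compact.length); // true
```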

(yeah, I know, why would I build a JSON document from a text-based templating engine when I could have all my data in a data object variable and just call json_encode()…? You people just like to contradict everything. It’s possible even this crazy idea might have some merit, too, even if you wouldn’t dream of it.)

Now, I’m just wishing I’d called .minify() on this long post before publishing. Probably a lot of my “comments” here could/should have been stripped out. :)

/* the end. duh. */

This entry was written by getify, posted on Wednesday June 23 2010 at 12:06 pm, and filed under JavaScript, Misc.

2 Responses to “JSON+Comments”

  • Mike McNally says:

    Uhh … YAML does support comments. I use YAML for configuration stuff a lot, and I’ve also used JSON for the same purpose. Of course you can fake JSON comments with bogus “_” properties (that is, add a dummy property and put the comment in the string value for it), but that’s obviously lame.

  • getify says:

    Interesting. I have no experience with it, but my guess is, back then Doug was right, it didn’t have comments… interesting that YAML evolved to support comments, but JSON never did.
