Douglas Crockford: The JSON Saga

Uploaded by yuilibrary on 28.08.2011

>> DOUGLAS CROCKFORD: Good evening, I'm Doug Crockford of Yahoo! This is the JSON Saga,
this is going to be the true story about JSON.
First, a warning: I am a heretic. So if you don't want to hear any heresy, I recommend
you leave right now.
I discovered JSON. I do not claim to have invented JSON, because it already existed
in nature. What I did was I found it, I named it, I described how it was useful. I don't
claim to be the first person to have discovered it; I know that there are other people who
discovered it at least a year before I did. The earliest occurrence I've found was, there
was someone at Netscape who was using JavaScript array literals for doing data communication
as early as 1996, which was at least five years before I stumbled onto the idea.
So the idea's been around for a while. What I did was I gave it a specification,
and a little website. All the rest of it happened by itself, and I'm going to explain what that
The story, for me, starts in 2001. Chip Morningstar and I started a company which was going to
be developing application frameworks for what, today, would be called Ajax and Comet applications.
We didn't know what to call the company, so our working title was Veil - the idea that
we would unveil the company later, when we'd figured out what the name was going to be.
Even though it was only a temporary name, I designed this logo for it, and I was really
sorry that it was going to be a temporary throw-away name because I really like the
logo. I thought it turned out really well.
What we became was State Software. Our advertising agency came up with this logo: a couple of
frisky parameciums, and the negative space in between them kind of looks like an 'S',
as for State. So that's what we called it.
This is the very first JSON message. Not quite as momentous as "what hath God wrought?" or
"Mr Watson, come here, I need you!" We didn't know that we were making history at the time.
We were in Chip's garage. His server sent this message to my laptop, in response to
a form submit post, and this is what came back. The text in green was the very first
JSON message.
In our framework, objects were addressed to specific objects - in this case, it was addressed
to the session object. Usually the session object would be instantiating objects which
represent the application. In this case, it was just handling a test, so "do" was the
method that the test here.
We embedded it in an HTML document, because it came back and got put into a frame as part
of the submit process. We did it this way because it worked as well in IE as it did
on Netscape 4, and it was really important for us to work on Netscape 4 in 2001, because
it was still an important browser. There's a lot of talk about how awful IE 6 is,
but at that point in time, IE 6 was the best browser that had ever been. Netscape 4 was
so bad, it made Microsoft look brilliant and competent. That's just how bad it was. It
was a crime against humanity.
We wanted to be able to support it because there were a lot of technologically backward
companies that were stuck on it - they would not allow their employees to use IE 6 -- and
we wanted to do business with some of those, including Sun Microsystems and IBM. So this
was the scheme we came up with to do the communication at that time.
The document contained a script. The first line of the script set the document.domain,
so we could get around the same origin policy, and the second statement called the receive
method on the session object in the containing frame. That was how we caused the message
to get delivered, and then the message was included in the document. So it was a really
nice form.
I'd like to tell you that this worked, but it didn't. The very first message we sent
failed. The reason it failed was, we had a reserved word in there. It turns out "do"
is a reserved word in JavaScript. So it took us a while to figure out what happened. Did
you send it? Yeah, I sent it! Where'd it go? This produced a syntax error, and it took
us a couple of minutes to figure that out.
That was when we discovered the unquoted name problem. It turns out ECMAScript 3 has a
whack reserved word policy. Reserved words must be quoted in the key position, which
is really a nuisance.
When I got around to formalizing this into a standard, I didn't want to have to put all
of the reserved words in the standard, because it would look really stupid. At the time,
I was trying to convince people: yeah, you can write applications in JavaScript, it's
actually going to work and it's a good language. I didn't want to say, then, at the same time:
and look at this really stupid thing they did! So I decided, instead, let's just quote
the keys. That way, we don't have to tell anybody about how whack it is. That's why,
to this day, keys are quoted in JSON.
It also had a secondary benefit in that it significantly simplified the JSON grammar.
If you have names, that means you have to have some definition for what a letter is.
It turns out, when you're using Unicode, the question of what is a letter is surprisingly
complicated. By saying: hell with it, we're just going to quote everything, we completely
avoided all of that complexity.
Also, it turns out Python has the same notation built into it, and Python does require the
quoting of the keys. So that kind of aligned us with Python, and we thought it might make
us more attractive to the Python community.
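To make the quoted-key rule and the Python overlap concrete, here is a minimal sketch using Python's standard json and ast modules. The message text is a made-up example, not the original State Software message.

```python
import ast
import json

# Hypothetical message text; the key "do" is quoted, as JSON requires.
text = '{"to": "session", "do": "test", "text": "Hello world"}'

# The same characters are valid JSON and a valid Python dict literal.
from_json = json.loads(text)
from_python = ast.literal_eval(text)
print(from_json == from_python)

# A JSON parser rejects a key that is not quoted.
try:
    json.loads('{do: "test"}')
    unquoted_ok = True
except json.JSONDecodeError as err:
    unquoted_ok = False
    print("rejected:", err.msg)
```

The overlap only goes so far: JSON's true, false and null are not Python literals, so this trick works for strings, numbers, arrays and objects but not for every JSON text.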
Another problem we found in using HTML as the envelope for JSON was that if any of the
strings in your data happened to look like HTML -- and in particular, if it happened
to look like a script tag -- that would close the block right there. Then you would get
a syntax error, because you didn't get the whole thing delivered, which was a nuisance.
So, another thing I added to the JSON standard was tolerance of a backslash in front of a
slash, so that we could avoid that. Now you've got stuff that looks like HTML, but doesn't
look like HTML to the browser. That was necessary to get stuff through. JSON doesn't require
that you escape the slash, but it tolerates it, and this is why it tolerates it.
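The effect of that tolerance can be sketched in a few lines of Python. The payload and the simple text replacement are assumptions made for illustration; the point is only that a JSON parser reads backslash-slash as a plain slash.

```python
import json

# A string value that would end a <script> block if embedded as-is in HTML.
payload = {"html": "</script>"}

encoded = json.dumps(payload)         # Python's encoder leaves "/" unescaped
safe = encoded.replace("</", "<\\/")  # escape the slash before embedding

# The parser accepts the escaped slash and returns the original string.
decoded = json.loads(safe)
print(safe)
print(decoded == payload)
```

After the replacement, the text no longer contains anything that looks like a closing script tag, yet it decodes to exactly the same data.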
We decided to give it a name, so we called it JSML -- rhymes with dismal -- the JavaScript
Message Language. But it turned out there's another standard that nobody has ever heard
of in the Java world, called Java Speech Markup Language. So I was like OK, we need to come
up with another name, so we came up with JSON: JavaScript Object Notation. There's a lot
of argument about how you pronounce that, but I strictly don't care. I think probably
the correct pronunciation is [French accent] "Je son".
We found it worked really well. It was extremely effective for the thing that we invented it
for - being browser-server communication - but we also used it a lot for inter-server communication.
Our platform scaled hugely, so we could have lots and lots of boxes, and they needed to
be kept in sync, and we found JSON was perfect for sending messages between the servers.
We also used JSON to implement a simple database, so we just have keys, and for each key we'd
store some JSON data. It made it really efficient for storing stuff and getting it back.
We liked it a lot, and we tried to convince our customers that it was good. Our customers
said: well, we hate it, because we've never heard of it. Some of our customers said: oh,
I wish you'd told us this six months ago, because we just decided to go with XML, so
we can't consider anything else now. And some of the people we talked to said: it's not
a standard, so we can't use it. I said: it is a standard, it's a subset of ECMA 262.
They said: no, that's not a standard.
OK. So in order to use this, I had to declare that this is a standard. So that's what I
did. I decided it's going to be a standard from now on. So I bought
I put up a one page website that described JSON. And on that one page, I had the grammar
for the language three ways: as simplified BNF, in a format that Bill McKeeman of Dartmouth
recommended, Railroad Diagrams, which I really like, that date back to Burroughs, and informal
English. I figured anybody who's going to use this has got to be able to understand
at least one of those.
I included a Java reference implementation, just so that people could look at code that
actually parses JSON, and see how you did it.
This was very late in 2002, and by that time I decided to retire. We had spent the two
years previously trying to raise money in the post bubble, post 9/11 environment. It
was just way too hard to raise money, and by that point we had run out. I decided to
do something else for a while, so I went back into consumer electronics. I was doing consulting
on high definition television, and the digital conversion. I thought I'd let everybody else
worry about the internet for a few years.
That's all I did. Basically, I put a message format in a bottle, threw it into the internet,
and I was pretty much done with it at that point.
Over time, a number of people stumbled onto my webpage, and looked at it, and said: yeah,
that looks like something I could use, and they started using it. And then a few of them
started sending code back to me. I got contributors who said: I've done a port of the JSON stuff
to Ruby, or Python, can you put a link to my stuff on your page? So I said yeah, OK,
I could do that.
Over a while, I got support for all of these languages. One of the benefits of having a
really simple description of a data format is it doesn't take much code to implement
it. And when you've got code that's this easy to write, there are a lot of people who will
be willing to write it, and share it. So there's all this stuff out here for all these languages,
so you can have applications written in any pair of these languages, they can communicate
using JSON.
It's because JSON is the intersection of all of these languages, it's the intersection
of all modern programming languages. All languages have some sense of data, and structures of
data. They all have simple values like numbers, strings, and booleans. They all have some
sense of a sequence of values. Different languages will call it different things; some say it's
an array, some say it's a vector, some say it's a list, or some other thing. Every language
has some sense of a collection of named values; it might be an object, or a record, or a struct,
or a hash, or a property list, or something. All languages have these, these are universal
Every language expresses these differently, and will add a lot of other stuff on top of
it, like type systems, and semantics. But they all have the same idea about what the
data looks like, and JSON has the thing that's common to everything. By being at the intersection,
it turns out to be the thing that everybody can agree on, so it's really easy to pass
data back and forth.
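As one illustration of that intersection, here is how the common shapes land in Python when read with its standard json module. The sample text is invented for the example; other languages would map the same text onto their own equivalents.

```python
import json

# Invented sample covering each kind of JSON value.
text = '{"name": "JSON", "year": 2001, "stable": true, "tags": ["simple", "text"], "next": null}'

value = json.loads(text)

# Record which native type each member was mapped to.
kinds = {key: type(item).__name__ for key, item in value.items()}
print(type(value).__name__, kinds)
```

The object becomes a dict, the array a list, and the simple values become str, int, bool and None; nothing in the text had to say which language would read it.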
Prior data interchange formats tended to try to be the union of all the languages, and
that turns out to be horrendously complex, and very difficult to deal with. JSON, by
being so simple, actually became really easy to use.
On the JSON site, there are examples of how you can implement a JSON parser, and lots
of different techniques. This is a snippet from a recursive descent compiler; really,
really easy to write. This is a snippet from a finite state machine, using a push-down
automaton. Most of the work happens in the green statement, in which we go to a table
and get the current token, and the current state, and execute the function that's stored
there. Turns out JavaScript is brilliant for writing state machines, because you can put
functions right in the state transition tables. So, really, really nice for that.
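The slides are not part of this transcript. To give a rough idea of the recursive descent approach, here is a minimal sketch in Python; it is not the code from the JSON site, the function names are made up for illustration, and it is lenient in places (it allows raw control characters in strings and does not combine surrogate pairs).

```python
import re

NUMBER = re.compile(r"-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?")
STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')
ESCAPE = re.compile(r"\\(u[0-9a-fA-F]{4}|.)")
SIMPLE = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
          "n": "\n", "r": "\r", "t": "\t"}
WORDS = (("true", True), ("false", False), ("null", None))

def skip_ws(text, pos):
    """Return the index of the next non-whitespace character."""
    while pos < len(text) and text[pos] in " \t\n\r":
        pos += 1
    return pos

def unescape(match):
    """Translate one backslash escape found inside a string."""
    code = match.group(1)
    if len(code) == 5:  # "u" followed by four hex digits
        return chr(int(code[1:], 16))
    if code not in SIMPLE:
        raise ValueError("bad escape: \\" + code)
    return SIMPLE[code]

def parse_string(text, pos):
    match = STRING.match(text, pos)
    if not match:
        raise ValueError("expected a string at %d" % pos)
    return ESCAPE.sub(unescape, match.group(1)), match.end()

def parse_value(text, pos):
    """Parse one value starting at pos; return (value, next position)."""
    if pos >= len(text):
        raise ValueError("unexpected end of text")
    ch = text[pos]
    if ch == "{":
        return parse_object(text, pos + 1)
    if ch == "[":
        return parse_array(text, pos + 1)
    if ch == '"':
        return parse_string(text, pos)
    for word, value in WORDS:
        if text.startswith(word, pos):
            return value, pos + len(word)
    match = NUMBER.match(text, pos)
    if not match:
        raise ValueError("unexpected character at %d" % pos)
    literal = match.group()
    is_float = any(c in literal for c in ".eE")
    return (float(literal) if is_float else int(literal)), match.end()

def parse_array(text, pos):
    items = []
    pos = skip_ws(text, pos)
    if text[pos:pos + 1] == "]":
        return items, pos + 1
    while True:
        item, pos = parse_value(text, skip_ws(text, pos))
        items.append(item)
        pos = skip_ws(text, pos)
        if text[pos:pos + 1] == "]":
            return items, pos + 1
        if text[pos:pos + 1] != ",":
            raise ValueError("expected ',' or ']' at %d" % pos)
        pos += 1

def parse_object(text, pos):
    members = {}
    pos = skip_ws(text, pos)
    if text[pos:pos + 1] == "}":
        return members, pos + 1
    while True:
        key, pos = parse_string(text, skip_ws(text, pos))  # keys must be quoted
        pos = skip_ws(text, pos)
        if text[pos:pos + 1] != ":":
            raise ValueError("expected ':' at %d" % pos)
        value, pos = parse_value(text, skip_ws(text, pos + 1))
        members[key] = value
        pos = skip_ws(text, pos)
        if text[pos:pos + 1] == "}":
            return members, pos + 1
        if text[pos:pos + 1] != ",":
            raise ValueError("expected ',' or '}' at %d" % pos)
        pos += 1

def parse(text):
    """Parse a complete JSON text into Python values."""
    value, pos = parse_value(text, skip_ws(text, 0))
    if skip_ws(text, pos) != len(text):
        raise ValueError("unexpected trailing text at %d" % pos)
    return value
```

Each grammar rule (value, object, array, string) gets its own function, and the functions call each other the same way the rules refer to each other, which is why this style is so short to write.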
The way most people use JSON in JavaScript is either using the json2.js library, or something
very similar to it, in which you use eval to actually use the JavaScript compiler to
parse the JSON for you. That turns out to be really unsafe, so it's guarded by four
regular expressions. It started off as one, and someone said whoops, that got through,
and it's like OK. Add a second regular expression, whoops, that got through. In the end it took
four of them, which is kind of a nuisance. So we're not getting the full performance
benefit that we'd hoped to get from eval.
Fortunately, that's getting fixed in the fifth edition of ECMAScript. JSON.parse is now built
into the language. It's going to have its own compiler, which will be faster than the
Eval compiler, so performance should be really, really good. It'll be really safe, really
reliable. We expect to have ECMAScript Fifth Edition finished and approved this year, but
JSON.parse is available now in better browsers everywhere.
Another benefit of having a really simple description of the language is that it doesn't
take a lot of work to translate it into another human language. I was really happy to have
wonderful people from all over the world submitting translations to me, so now the JSON page is
available in all of these languages, which is just wonderful. If it turns out that you're
fluent in a language which isn't on the list, and you'd like to help out, that'd be really
The thing that really happened that caused people to take notice of JSON was Ajax. In
2005, Jesse James Garrett discovered that you could use web browsers to have fully interactive
applications without having to do a page replacement after every user interaction. A lot of us
had been doing that for five years previously, but it was really important when Jesse discovered
it, because suddenly everybody wanted to do it. We couldn't give it away in 2001, but
suddenly it was really hot in 2005.
A lot of web developers discovered that XML was really tedious to work with, but JSON
was really easy. So it was Ajax that pushed the popularity of JSON.
Now, there were some cranks at the time that said: wait a minute, Jesse James Garrett said
that the X stands for XML, so you can't use JSON, you have to use XML. That didn't last
for very long.
Now we had a growing community of people using the language, and I started observing things
people were doing, and went ugh, I didn't anticipate that. One of them was that people
were putting instructions to the parser in comments, which was a really bad thing, because
that would totally break interoperability, because there's this whole level of meta-language
which would be outside of the standard. So I revised the definition of JSON to remove
the comments. It had had slash-slash and slash-star comments, but those are gone now.
It also turned out they added a lot of unnecessary complexity. In our use, we'd never used the
comments; I just put them in initially because I thought it might be useful. It turned out
they weren't that useful. And for some of the ports to other languages, about half of
the complexity of doing the thing was just doing the comments. They were surprisingly
difficult. I never understood quite why. But taking them out made it easier to port JSON
into other languages, and that was desirable.
Also, there was another data interchange format called YAML, which stands for something funny.
YAML, coincidentally, was almost a proper superset of JSON; just similar ideas, and
came out almost the same. The biggest point of difference was that JSON had comments in
that style, and YAML didn't. So by taking the comments out, JSON became more closely
aligned with YAML, and there appeared to be some benefit in doing that.
Then the very last change was, I added scientific notation to numbers. When we were working at
State, we were doing business applications, and never realized the need for them. But
as Ajax got bigger, we found all sorts of things happening in Ajax, so I put them in,
and at that point, closed the door.
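Both of those late changes are easy to observe with any strict parser. The sketch below uses Python's standard json module as a convenient example; the helper name is_valid_json and the sample inputs are made up for illustration.

```python
import json

# Exponent notation is part of the number grammar.
big = json.loads("6.022e23")
print(big)

def is_valid_json(text):
    """Return True if a strict parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Comments are not part of JSON, so either comment style is rejected.
with_line_comment = '{"a": 1} // trailing note'
with_block_comment = '/* note */ {"a": 1}'
print(is_valid_json(with_line_comment), is_valid_json(with_block_comment))
```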
There'll be no more changes to JSON, ever. Because I never put a version number in it,
so there's no way to indicate what version of JSON you're using. So there's no safe way
to extend it, or redefine it. As a consequence, JSON will not be changed.
If you put a version number on it -- if there's a 1.0 -- you know there's going to
be a 1.1, and then a 2.0. And everything's crap until it's 3.0.
So we're just going to avoid that. We're not going to have any numbers on this thing, it's
just JSON. Stability is much more important than any feature we can think of. Over the
years, I've heard a lot of suggestions for stuff people could put into JSON, and it's
all useless. Everything you need to be able to do, you can do with it now.
I expect, some day, JSON will be replaced by something which is bigger, or more exotic,
or whatever, and I'm actually looking forward to that day in the future. But until then,
JSON will be just the way it is, and after that, JSON will stay the way it is too. JSON
will be the way it is until the end of time, so that there's at least one piece of the
stack you can depend on forever.
One of the key design goals behind JSON was minimalism. My idea was that the less we have
to agree on in order to inter-operate, the more likely we're going to be able to inter-operate
well. If the interfaces are really simple, we can easily connect, and if the interfaces
are really complicated, the likelihood that something's going to go wrong goes way, way
up. So I endeavored to make JSON as simple as possible.
I had a goal of being able to put the JSON standard on the back of a business card. And this is
the card. Come see me if you want one of these cards; it's the JSON card, it's got the JSON
standard on the back.
Now, I'm not suggesting that that should be a goal for every standard. There are some
standards that are just necessarily more complicated than what you can put on the back of a business
card. But I think it's a really nice thing to aspire to, so that when you're in the standards
committee meeting, going: gee, is there any way we could simplify this more, so that we
could actually fit it on a card? Because generally, standards committees don't think about that.
It's easy to make things bigger, it's hard to make things better.
JSON had a lot of influences on its design. It didn't just come out of my head. It's based
on a lot of things that I had observed over the years.
The first -- maybe the greatest -- influence was Lisp, John McCarthy's work out of MIT
in 1958. Lisp was built on a textual representation of simple binary trees. It was really powerful,
and syntactically almost nothing, but it was kind of visually confusing because it's tons
and tons of nested parentheses.
The thing that was brilliant about Lisp was it used exactly the same representation for
programs and data. Originally, the idea was that you would have programs that could act
on themselves as data, and do interesting things.
There were people who recommended that S-expressions should become a standard data interchange format,
which would have been a good idea but it was never going to happen for the same reason
that Lisp never became a mainstream language. Which is, the mainstream likes syntax, and
Lisp is just too goofy looking. So that never happened.
Another influence was Rebol. Rebol's a more modern language, but with some very similar
ideas to Lisp, in that it's all built upon a representation of data which is then executable
as programs. But it's a much richer thing syntactically. Rebol is a brilliant language,
and it's a shame it's not more popular, because it deserves to be.
Obviously JavaScript was a huge influence, because JSON is JavaScript; that's where it
came from. I seem to be making a career out of finding little bits of goodness in JavaScript
-- like, I wrote this pamphlet on the good parts of JavaScript. JSON is another of the
good parts of JavaScript.
There's some good stuff in that language, and it's not by accident. Brendan Eich, who
is the designer of the language, is a brilliant guy, and there's brilliant stuff in the language.
There's other stuff too, but you don't need to use that.
One surprising thing is that JavaScript, Python, and NewtonScript were all designed at about the
same time, all in isolation. None of the three designers were paying attention to what the
other guys were doing. They all came up with exactly the same notation for doing nested
objects and arrays. That could be an amazing coincidence, or I think it may be an indication
that this is just a natural idea that's been in the air for a long time, and they all put
it together at the same time.
Another example of this is at NeXT, working on the OpenStep platform in '93. They had
something called Property Lists, which were basically JSON structures. Syntactically they
were slightly different -- they had equal signs instead of colons, and they used semi-colons
instead of commas -- but basically, it was JSON, it was the same idea. They got it right
in '93, then threw it away later with OS X. But they had it right in the idea that we
can express data structures, and keep our data in this form which is comfortable for
people, and really efficient for machines. That's part of the core idea of this stuff.
It's been around for a long time, JSON just gave it a name.
Then there's XML. The interesting thing, for me, about XML is not any of its characteristics,
but how it became a standard, and how it became so popular so quickly. The world rejected
it as a document format back when it was called SGML. XML changed some aspects, but didn't
repair any of the things that made it a bad document format.
I'll offer as evidence of that the fact that XHTML has totally failed to displace HTML.
If XML were a superior document format, XHTML should easily have won over HTML, and it hasn't.
HTML is still dominant, XHTML is failing. So, XML in the first place isn't a very good
document format, and it's an even worse data interchange format.
Given that it doesn't really effectively do any of the things that it was intended to
do, how did it become so popular? Its roots were in HTML. Now, HTML is also based on SGML.
But HTML actually improved significantly on SGML by simplifying it, took a lot of crap
out of it, reduced it down to basics, and also made it more resilient.
It turns out one of the things which is bad about this document format is it's really
difficult to get it right; just getting all the things to balance, and getting everything
quoted, is apparently really hard. I don't know why it's so hard, but the evidence is
that nobody has ever done it right. Nobody can open up a text editor and write HTML and
get it right. So the browsers, from the beginning, had to be extremely resilient and forgiving
and intelligent about trying to make sense of the markup. Any approach which
says, if we find the slightest error anywhere, kill it and show nothing, it's just death
to the web, and that never took off.
But at the time that the web was emerging, there were a lot of Grade-A CTOs and technologists
who looked at it and said: well, this is obviously not going to work. This is deficient in so
many ways, this is obviously not going to work, this is bad, let's wait for the next
thing. But there were a lot more B-level and C-level technologists who said: wow, this
looks great! And then they got it, and that created the avalanche effect, and eventually
HTML won.
Those A-list CTOs, they weren't wrong, because we're suffering still, every day, from the
problems that they identified. Everything they cited as deficient was correct, they
just asked the wrong question. They shouldn't have asked if it was good enough, they should
have asked is it going to be popular enough. When XML came out, it's from the people who
gave us HTML, and it's got angle brackets - well, no-brainer, it's obviously going to
win. So they stepped out of the way and let it go.
In April 2002 I saw John Seely Brown talking at the CTO Forum. Brown ran Xerox PARC for
many, many years. He was in charge there when they came up with object-oriented programming,
graphical user interfaces, local area networking, laser printers -- a whole lot of stuff that
we take for granted today happened on his watch. Brilliant guy.
He was talking about how the next generation was going to be made out of loosely coupled systems,
and he thought XML was going to be the thing that would bind them together. He said: maybe
only something this simple could work. It was a really interesting talk.
A couple months later, I went to another conference and heard another guy talking, who was a little
closer to the ground, also talking about XML. He said: maybe only something this complicated
could work.
And that really struck me. In just a couple of months, it went from something that was
so simple to something that was so complicated. What does that indicate? What should we learn
from this? It occurred to me that it's complicated because it doesn't fit; it solves the wrong
problem. It doesn't really adapt itself to doing the thing that we need to do well.
There were other people who noticed this too. For example, there was a popular site called
XML Sucks. The title of the site was: "Why XML is technologically terrible, but you have
to use it anyway."
So, there are basically two schools of thought about XML. One which said this is perfection.
We started with SGML, which the world loved, and then we got it right, perfect. And then
there's the school that said it's awful. But there was one thing they could both agree
on, and that is: XML is the standard, so shut up. Shut up!
But not everybody shut up. There were a lot of tinkerers who were all aware that there's
something wrong here, and started trying to fix it.
This is a list of XML alternatives. Each one of these has a crazy inventor behind it who
had observed that there was something really deeply wrong with XML, and he thought that
he could fix it. This list was compiled by a guy named Paul T. I don't know who he is,
but he was one of the guys, and he was hoping that his would float to the top, and it didn't.
When mine floated to the top he said OK, it's done, and stopped keeping it up to date.
Each of these guys was right in that they saw that XML was deficient, but there's no
way you could build a community out of this stuff. There's probably no guy on this list
who would look at someone else's and say yeah, he got it better. No one would do that, it's
just a bunch of crackpots. None of them could rise above their own noise except for one,
basically because of the Ajax effect. So Ajax won.
The XML community took notice of the ascendance of JSON. They had, early on, been happy about
being a disruptive technology, and then were very unhappy that they were starting to be
disrupted themselves, and tried to stop it. Early on, there were vague threats -- weren't
quite threats, more like stuttering, like: "you'll rue the day you ever questioned the
technological superiority of XML!" You know, that kind of stuff. I'll rue the day someday,
I'm sure that's true.
As JSON started ascending, they started getting a little bit more nasty. "OK, your little
web application, JSON, that's fine -- I know we said it wouldn't work, but OK, you got
that working, that's good. But if you're doing real applications, manly applications, you
need the complexity of XML. That complexity is there for a reason, and if you don't have
it, you will fail." They could never articulate exactly why you would fail, but they were
pretty confident that you would.
Since then, a lot of manly applications have been written with JSON, and what happened
was they didn't fail, they just got faster.
Finally, there were the death threats. Yeah, death threats. For example, Dave Winer, just
before Christmas in 2006, had just discovered JSON, and wrote: "It's not even XML! Who did
this travesty? Let's find a tree and string them up. Now." What an ugly thing to say.
Fortunately, nobody listens to Dave Winer.
James Clark, who was one of the principal architects of XML, a few months later wrote:
"any damn fool could produce a better data format than XML." Which, it turns out, is
So somehow, in the whole XML hysteria, we'd forgotten the first rule of workmanship, which
is: use the right tool for the right job. Instead, we got distracted on this other thing,
which was one tool to rule them all. That's not good engineering, that's not good craftsmanship,
that's not the way you do things. It might be desirable to have one super tool that did
everything, but there's never been such a tool, and tools have always been specialized.
Part of the craft of engineering is determining, of all the tools available to you, what is
the best tool for solving any problem. There's this weird period of time where we forgot
how to do that.
One of the benefits of JSON becoming tolerable is that we're now allowed to consider the
best tool. JSON isn't necessarily the best tool for every job, but for the ones it is,
you can use it. And for the ones that it isn't, there are other tools out there that you can
use. So good engineering has become popular again; I think that's a nice benefit.
That made me think -- where did the idea come from that data should be represented by a
document format? For me, looking back on it, it doesn't make any sense. Where did that
idea come from? It seemed a really powerful idea, because for a while everybody bought
into it. But it just doesn't make any sense.
So I started looking back through the fossil record to try to figure out where this idea
came from, going all the way back to a program called RUNOFF. This started off at MIT, and
then found its way through Tech Systems, and Multics, and a bunch of other mainframes.
This was in the mainframe era.
Some of the first versions of this program used punch cards. In those days, punch cards
came only in upper case, so you could insert special codes to indicate which of the upper
case letters were intended to be lower case, so you could print out nice documents. Then
a card which started with a letter was going to have text on it, and the text would get
filled into paragraphs, and the cards that start with a period in column one are commands,
which indicate that we're going to skip one blank line, or we're going to tab over four
So there's a lot of explicit control going on here, and it was sufficient for making
manuals and things like that, but there were obviously better things you could do with
Charles Goldfarb from IBM got the idea of doing something he called generic markup language.
Some of these tag names should be eerily familiar to you. He started with a piece of unexpected
punctuation in column one. And then he also came up with the idea of having a closing
punctuation, so that you could then put text on the same line.
One thing you might not recognize is the EOL tag, which doesn't map exactly onto anything
we use now, but you might guess as to what it meant. As Goldfarb was playing with this,
he went through this evolution where it was first a special purpose tag, and then he generalized
it, and then he stumbled onto the idea of angle brackets.
One place we can still see this stuff in HTML today is in entities. An entity has got some
crazy piece of punctuation, and then some letters, and then another piece of crazy punctuation.
How could that have ever made sense? This is where it came from. He ran out of angle
brackets at that point, he didn't have anything else to wrap them in.
The first place where document systems were done right was in Brian Reid's Scribe, which
he developed at Carnegie-Mellon and published in 1980. Scribe was the first document
formatter that separated document structure from formatting, and did it brilliantly.
Not only that, he had a really nice notation for expressing the document, which was much
easier to write than HTML, and much easier to get right than HTML, and certainly easier
than SGML. He only had one reserved character, and that was the at-sign. So if you want to
have a literal at-sign, you just do double at-sign, and that was that. At-sign followed
by a word, that was a tag.
Generally the tag was followed by a block of stuff with a begin character and an end
character, and within that block you can't use the begin character and end character,
literally, but any other character you could use. He had six sets of begin and end characters
that you could have, so that you didn't have the Lisp problem of having all these parens
that you had to balance.
You had something that was much more tractable visually -- including, I should point out,
angle brackets. Goldfarb saw that and went oh yeah, angle brackets.
He also had a nice form where, for something that was really long, like a chapter, or a
table or something, you could say begin and end and the argument of it would be the tag.
So in this form, you could have anything except end quote in there, and you don't have to
worry about confusion of characters. So it was a really resilient format, syntactically
really simple, just one of these brilliant ideas. Reid was a really brilliant guy.
It's really a shame that Tim Berners-Lee hadn't been more knowledgeable of document formats.
If he had based his World Wide Web on Scribe instead of on SGML, the World Wide Web would
be a better place today. But this doesn't quite answer the question I took you on this
journey with -- there's one more thing to look at.
Scribe also had support for bibliographies. Here we have a description of a tech report,
a description of a book, and within those we've got data. In fact, it looks like JSON.
It's a name value pair separated by colons. While it's in a document, this is data. This
is data describing documents. I believe this is the first time when a document was used
to represent data.
Scribe had a big influence on Goldfarb, but unfortunately not a big enough influence.
He took these things, and these became the attributes in SGML, but he just didn't get
the rest of it right. That took the meme into the SGML community that yeah, we can represent
data in the document format, because Reid did it, and we can do it. That idea survived
into the XML age.
When I put the reference implementation onto the website, I needed to put a software license
on it. I looked up all the licenses that are available, and there were a lot of them. I
decided the one I liked the best was the MIT license, which was a notice that you would
put on your source, and it would say: "you're allowed to use this for any purpose you want,
just leave the notice in the source, and don't sue me." I love that license, it's really
But this was late in 2002, we'd just started the War On Terror, and we were going after
the evil-doers with the President, and the Vice-President, and I felt like I need to
do my part.
So I added one more line to my license, which was: "The Software should be used for Good,
not Evil." I thought I'd done my job. About once a year I'll get a letter from a crank
who says: "I should have a right to use it for evil!"
"I'm not going to use it until you change your license!" Or they'll write to me and
say: "How do I know if it's evil or not? I don't think it's evil, but someone else might
think it's evil, so I'm not going to use it." Great, it's working. My license works, I'm
stopping the evil doers!
>> AUDIENCE MEMBER: If you ask for a separate license, can you use it for evil?
>> DOUGLAS: That's an interesting point. Also about once a year, I get a letter from a lawyer,
every year a different lawyer, at a company -- I don't want to embarrass the company by
saying their name, so I'll just say their initials -- IBM…
…saying that they want to use something I wrote. Because I put this on everything
I write, now. They want to use something that I wrote in something that they wrote, and
they were pretty sure they weren't going to use it for evil, but they couldn't say for
sure about their customers. So could I give them a special license for that?
Of course. So I wrote back -- this happened literally two weeks ago -- "I give permission
for IBM, its customers, partners, and minions, to use JSLint for evil."
[laughter and applause]
And the attorney wrote back and said: "Thanks very much, Douglas!"
I've got to wrap up now, but before I do that I want to talk about the logo. When I put
the web page up in 2002, I decided I should have a logo to class up the page, and make
it look more substantial. I came up with this thing. It's based on a famous optical illusion
called the Impossible Torus, which is sort of related to the Ambihelical Hexnut. What
I did was I took it, I made it round, I reoriented it, and gave it some nice shading.
I liked it for a number of reasons. One was: if you look at it as a two dimensional figure,
it's made up of two components which are identical but out of phase, so it kind of suggests the
two sides of a conversation, because it keeps going around and around. Also, I could see
letter forms in it -- there's a J in there, and an N maybe. Clearly an O is in there.
So it had most of the initials that were in the name of the thing.
But after looking at this for several years, I noticed something: it's not impossible.
What it is, is a square which is extruded in a circle, and as it goes around, it does
one rotation and comes back.
>> AUDIENCE MEMBER: It's like a Mobius strip.
>> DOUGLAS: Except it's a full rotation. Otherwise it would be like a Mobius strip, but it does
a full rotation. It does one rotation and one orbit.
So it's not an impossible shape, it's actually a simple shape. It's a square and a circle
with a twist. I think it works really nicely as a symbol for JSON.
Once I figured that out it was like OK, so I can put a mathematical model behind it.
So I rendered this in JavaScript using Canvas, and put some extreme shading on it for a t-shirt
design. It's kind of nice that JavaScript is now powerful enough to render its own logos.
This is the design that I did for the business card. I wanted something that looked like
it could have been around for 100 years, so JSON: the data interchange format mothers
have learned to trust for many generations.
Then finally, the last one for the night. This one was inspired by Shepard Fairey's
Obama poster. I call it: Data Interchange We Can Believe In.
Thank you, and good night.
What do I think would make HTML better? Making it more extensible. Having to fit everything
into the limited set of tags that we have just doesn't work. The thing you mentioned
about making headings work in documents which are not ‘heading-ful' doesn't fit.
I would like to be able to use CSS to say: I want a new thing which is a title, or an
ad, or a controller, or whatever. I can give it the name that I want, and just specify
in CSS what it's supposed to do. That is all I need to do in order to extend the language,
to make it map my application. That'd be a trivial thing to do, but HTML 5 is going off
in a different direction.
Is there case sensitivity in the Unicode hex characters with a backslash U?
>> DOUGLAS: No, you can use upper or lower case.
>> AUDIENCE MEMBER: Do you think you could add that to your spec? I was using JSON [xx],
and it doesn't know about that JSON sensitivity. And then it turns out that the webpage doesn't
know about it, either.
>> DOUGLAS: Huh. I'll have to look at that. I wasn't aware of that.
>> DOUGLAS: In the meantime, you should use lower case.
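The answer above can be checked with a minimal sketch, assuming a JavaScript engine that provides the built-in `JSON.parse` (the talk does not name a particular parser, and the variable names here are illustrative). It parses the same escaped character written with lower case and upper case hex digits and compares the results.

```javascript
// Each argument is JSON text for a one-character string: U+00E9 (e with acute accent).
// The doubled backslash keeps the \u escape inside the JSON text instead of
// letting the JavaScript string literal consume it.
const lower = JSON.parse('"\\u00e9"'); // lower case hex digits
const upper = JSON.parse('"\\u00E9"'); // upper case hex digits

// Both spellings should decode to the same character.
const sameCharacter = lower === upper && lower === "\u00e9";
console.log(sameCharacter); // true
```

A parser that rejects one of the two spellings is stricter than the built-in one, which is the situation the audience member describes.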
What would I like to see replace JSON? We're seeing templating languages for JSON now.
Like, the JSONT language I think is absolutely brilliant.
>> DOUGLAS: That doesn't need to be in the format. One of the biggest weaknesses in the
JSON format is also a weakness in XML, which is that it cannot easily represent cyclical
structures, and can't represent general DAGs. For most applications that's not a requirement,
but there's some applications that'd be really desirable for.
I felt bad about leaving that out of JSON, but I had to leave it out because it wasn't
in JavaScript, and one of my other design rules was it had to be a subset of JavaScript.
So, I missed that boat.
Someday I'd like to be able to take the quotes off the keys, because it looks stupid.
>> AUDIENCE MEMBER: What is an example that you could give of an application that couldn't
use the…
>> DOUGLAS: OK. The simplest thing that you cannot encode in JSON. Make an array, so A
equals empty array, A sub zero equals A. That's a cyclical structure, and if you ask JSON
to serialize that, you'll get an infinite number of open brackets. Then you'll die before
you generate the first closing bracket.
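The array described above can be written out as a short sketch. One assumption to flag: it uses the built-in `JSON.stringify` found in current JavaScript engines, which detects the cycle and throws a `TypeError` rather than emitting open brackets forever as a naive recursive serializer would. The second half illustrates the DAG limitation mentioned earlier: an object referenced twice comes back as two separate copies.

```javascript
// The cyclical structure from the talk: an array whose first element is itself.
const a = [];
a[0] = a;

let cycleError = null;
try {
  JSON.stringify(a);
} catch (err) {
  // The built-in serializer refuses cycles instead of recursing without end.
  cycleError = err;
}
console.log(cycleError instanceof TypeError); // true

// A DAG: one object reachable through two array slots.
const shared = { x: 1 };
const roundTripped = JSON.parse(JSON.stringify([shared, shared]));

// After a round trip the two slots hold equal but distinct objects,
// so the sharing in the original structure is lost.
const sharingSurvived = roundTripped[0] === roundTripped[1];
console.log(sharingSurvived); // false
```

Applications that need cycles or shared references have to layer their own convention on top of the format, for example by assigning ids and storing references to them.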
What do I think about schemas and DTDs? I don't care. If you want to do that, that's
fine. There are some very clever people who have been working on schemas for JSON. Kris
Zyp over at Dojo has done some really good work.
I considered doing schemas for JSON very early on, because as JSON was starting to ascend,
a lot of people coming in from the XML old world saying: "we can't do it. We can't use
it until it's got schemas." And they would say that for about a month, until they figured
out how JSON worked, and then it's "oh, never mind."
So we never got to that point. I had designed a schema, but I never implemented it because
there really didn't seem to be much need for it. Some people think there is a need for
it, and I'm happy to have them go off and do that, but the core data format itself doesn't
have to change in order to make that useful.
The main reason I took comments out was that I saw people who were trying to control what
the parser would do based on what was in the comments, and that totally broke interoperability.
There's no way I could control the way they were using comments, so the most effective
fix was to take the comments out.
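As a rough illustration of where that decision leaves the format, the sketch below assumes the built-in `JSON.parse` of a current JavaScript engine. The helper `isValidJson` and the sample texts are hypothetical, made up for this example. It shows that a text containing a comment is rejected, and that the quotes on keys mentioned earlier are still required.

```javascript
// Hypothetical helper: reports whether the built-in parser accepts a text.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false; // JSON.parse throws a SyntaxError on invalid input
  }
}

const plain = isValidJson('{"debug": true}');
const withComment = isValidJson('{"debug": true /* parser: strict */}');
const unquotedKey = isValidJson('{debug: true}');

console.log(plain, withComment, unquotedKey); // true false false
```

Because a conforming parser never sees comments, there is nothing in them for a sender to use as a parsing directive.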