Crockford on JavaScript - Act III: Function the Ultimate




Uploaded by yuilibrary on 20.09.2011

Transcript:
Tonight is Act III: Function the Ultimate. We're going to be talking about functions
tonight. Functions are the very best part of JavaScript. It's where most of the power
is, it's where the beauty is. Like everything else in JavaScript, they're not quite right,
but you can work around that, and there's a lot of good stuff here.
Tonight, unlike the previous two nights, I'm going to be showing you quite a lot of code.
Because we're talking about functions, you need to see how they work. I personally tend
to fall asleep in presentations that put a lot of code on the screen; it's just kind
of not a good time, so I have a lot of examples and I tried to make them all fit on one screen
in big type. They're all going to be simple, but they should be interesting and useful.
Let's begin.
Function is the key idea in JavaScript. It's what makes it so good and so powerful. In
other languages you've got lots of things: you've got methods, classes, constructors,
modules, and more. In JavaScript there's just function, and function does all of those things
and more. That's not a deficiency, that's actually a wonderful thing — having one
thing that can do a lot, and can do it brilliantly, at scale, that's what functions do in this
language.
Here's what a function is. A function is the word 'function'. It optionally has a name,
which can be used to allow it to call itself. It can have a set of parameters, which are
wrapped in parens, containing zero or more names which are separated by commas. It can
have a body which is wrapped in curly braces, containing zero or more statements. A function
expression like that produces an instance of a function object. Function objects in
this language are first class, which means that they can be passed as an argument to
another function, they may be returned as a return value from a function, they can be
assigned to a variable, and they can be stored in an object or an array.
Anything you can do with any other kind of value in this language, you can do with a
function. A function expression is like an object literal in that it produces a value,
except in this case it produces something that inherits from Function.prototype. It
may seem kind of strange that a function can inherit methods from something else, but it
can. So in this language, functions have methods. That may sound odd, but we've got that. I'll
show you some examples of that.
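The slides are not part of this transcript, so here is only a minimal sketch of what first class means in practice. The names square, tools, list, and compose are made up for illustration.

```javascript
// A function expression produces a value, which is assigned to a variable.
var square = function (x) {
    return x * x;
};

// The same function value stored in an object and in an array.
var tools = {op: square};
var list = [square];

// Passed as an argument, and returned as a return value.
function compose(f, g) {
    return function (x) {
        return f(g(x));
    };
}
var fourth = compose(square, square);

// Functions inherit methods, such as call, from Function.prototype.
var viaCall = square.call(null, 5);
var inherits = Function.prototype.isPrototypeOf(square);
```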
We have a var statement which allows us to declare and initialize variables within a
function. Because JavaScript is not a strongly typed language you don't specify types in
the var statement, you just give a name for the variable. Any variable can contain any
value that's expressible in the language. A variable that's declared anywhere within
a function is visible everywhere within the function; we don't respect block scope.
The way var statements work is the var statement gets split into two pieces. The declaration
part gets hoisted to the top of the function and is initialized with undefined. Back at
the place where the original var statement was, it gets turned into an assignment statement
so that the var gets removed. Here we have an example. I've got myVar = 0 and myOtherVar.
What that does is, at the top of the function it defines myVar and myOtherVar and sets them
both to undefined. Then at the point in the function where the original var statement
was, we have an assignment statement. The separation and the hoisting operation changes
the way you might think of the scoping of variable names.
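A small sketch of that split, using the names myVar and myOtherVar from the talk. The wrapper function hoistingDemo and the array it returns are assumptions added so the effect can be observed.

```javascript
function hoistingDemo() {
    // The declarations have already been hoisted, so myVar exists here
    // and holds undefined instead of causing a ReferenceError.
    var before = myVar;

    // Only the assignment myVar = 0 stays at this point in the function.
    var myVar = 0, myOtherVar;

    return [before, myVar, myOtherVar];
}

var observed = hoistingDemo();
```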
We also have a function statement. Unfortunately, the function statement looks exactly like
a function expression. The only difference is that the name, instead of being optional,
is now mandatory. But in all other respects it looks exactly the same, and it is confusing
to have both. Why do we have both? Well, the function statement was the older thing, and the function expression,
which is really the more useful form, was added to the language later. What the function
statement does is it expands into a var statement which creates a variable and assigns a function
value to it. That expansion, because it's actually a var statement, splits into two
things. Except unlike the ordinary var statement that we saw earlier, both pieces of it are
hoisted to the top of the function, so things are not necessarily declared in the order
that you think they are.
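To make the difference concrete, here is a sketch under the assumption that we compare a function statement with a var that is assigned a function expression. The names order, declared, and expressed are hypothetical.

```javascript
function order() {
    // For a function statement, both the name and the function value are hoisted.
    var early = typeof declared;

    // For a var, only the declaration is hoisted; the value arrives later.
    var late = typeof expressed;

    function declared() {
        return "statement";
    }
    var expressed = function () {
        return "expression";
    };

    return [early, late, declared(), expressed()];
}

var seen = order();
```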
It's confusing having both function expressions and function statements, so how do you know
which is which when you're looking at it? The rule is, if the first token of a statement
is function, then it's a function statement. Otherwise, it's a function expression. Generally,
function expressions are easier to reason about. For example, you can't put a function
statement inside of an if statement because of the hoisting stuff. You might want to have
a different function being defined if you take the else branch or the then branch, but
hoisting doesn't look at branching, and it happens before we know the result of the if,
so the language definition says that you can't do that. It turns out every browser lets you
do that anyway, but because the language definition doesn't tell you what it's supposed to do,
they all do something different. That's one of those edge cases that you want to stay
away from.
In this language we have function scope. In most other languages that have C syntax we
have block scope, but because of the way vars get hoisted, block scope doesn't work in this
language. In JavaScript, blocks do not have scope. Scope means that, in another language
such as Java, if you declare a variable inside of curly braces, it's visible only inside
of the curly braces and not outside. But that doesn't happen in JavaScript because of hoisting.
The variable declaration gets pulled out of the if statement and moved to the top of the
function, so the variables will be visible everywhere within the function. Only functions,
in this language, have scope. If you declare a variable in a function, that variable is
not visible outside of the function, but it's still visible everywhere within the function.
If you're coming from other languages, this can be confusing. For example, a function
like this will work in most other languages and will fail in JavaScript without an error.
What you'll find is that it will run forever, and that's because the programmer thinks he's
created two i variables, but in fact there's only one i variable. So the inner loop is
constantly resetting the i value so that the outer loop will never finish. That's something
to be aware of: in JavaScript, you can't be depending on block scope.
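The function from the slide is not in the transcript, so this is only a sketch of the hazard and one way around it. The names leak and countPairs are made up, and the looping version is left in a comment because it would never finish.

```javascript
// A var declared inside a block is still visible after the block.
function leak() {
    if (true) {
        var inner = "declared in a block";
    }
    return inner;
}

// The hazard: both loops below would share a single i, so the inner loop
// keeps resetting it and the outer loop never reaches 10.
//     for (var i = 0; i < 10; i += 1) {
//         for (var i = 0; i < 5; i += 1) { /* work */ }
//     }

// One variable per loop, declared at the top of the function.
function countPairs(rows, cols) {
    var i, j, count = 0;
    for (i = 0; i < rows; i += 1) {
        for (j = 0; j < cols; j += 1) {
            count += 1;
        }
    }
    return count;
}
```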
Because of hoisting, because of the way that variable statements and function statements
work, I recommend that you declare all variables at the top of the function and declare all
functions before you call them. In other languages the prevailing style is to declare variables
near the site of their first use, and in languages which have block scope that's good advice,
but I don't recommend it in this language.
We have a return statement. A return statement allows a function to return early, and also
indicates what value the function should be returning. There are two forms of it: there's
one that takes an expression, and one that does not. If there's no expression, then the
value that gets returned is undefined. It turns out, every function in JavaScript returns
a value, and if you don't explicitly say what the value is, it will return the undefined
value. Unless it was called as a constructor, in which case it will return the new object
that you're constructing. One other note: you cannot put a line break between the word
return and the expression. Semicolon insertion will go in and turn it into a statement that
returns undefined, which is tragically awful.
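A sketch of that trap, with the hypothetical names broken and fixed. In broken, a semicolon is inserted right after return, and the next line is then read as a block that is never reached.

```javascript
function broken() {
    return
    {ok: true};
}

function fixed() {
    // The expression starts on the same line as return.
    return {
        ok: true
    };
}
```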
There are two pseudo parameters that every function can receive. One is called arguments,
and the other has the unfortunate name of this. Let's look at arguments first. When a function
is invoked, in addition to the parameters that it declares, it also gets a special parameter
called arguments. It contains all of the arguments that were actually specified in the invocation.
It is an array-like object, but it is not an array, which is unfortunate. I'll show
you some examples of why that's unfortunate. It's array-like in that it has a length property,
so you can ask arguments how many arguments were actually passed to this function, which
might be different than the number of parameters that you specify.
It also has very weird interaction with parameters. If you change one of the elements of the arguments
array, you may change one of the parameters that it's associated with. If you do something
really scary like splicing on the arguments array, you may scramble and reassign all of
your parameters. Generally, you don't want to mess with the arguments array. While the
language doesn't require you to treat it as a read-only structure, I highly recommend
that you treat it as a read-only structure.
OK, let's look at an example. I want to have a function in which I can pass it some number
of numbers and it will then add them all and return the result. The way I do that is I
first look at arguments.length to find out how many numbers I'm going to be adding. Then
I will have a loop which will go through each of the members of the arguments pseudo
array and figure out the total, and then when it's done it returns the total. This is how
you would write that in ES3, or in the third edition of the ECMAScript Standard. This gets
a little bit nicer in the fifth edition. In the fifth edition, arguments is more array-like
than before. It's more array-like in that it actually inherits, now, from Array.prototype,
and Array.prototype now contains some interesting functions like reduce. I can call arguments.reduce
and pass it a function that does adding, and the result of that will be to add up all the
members of that array and return it. I think it's a more elegant way of expressing the
same program.
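Here is a sketch of both versions; the names sum and sumReduce are assumptions. One caution: as far as I know, the ES5 specification as finally published did not make arguments inherit from Array.prototype, so the second version borrows reduce with call rather than writing arguments.reduce directly.

```javascript
// ES3 style: walk the arguments pseudo array with a loop.
function sum() {
    var i,
        n = arguments.length,
        total = 0;
    for (i = 0; i < n; i += 1) {
        total += arguments[i];
    }
    return total;
}

// ES5 style: borrow reduce from Array.prototype and apply it to arguments.
function sumReduce() {
    return Array.prototype.reduce.call(arguments, function (a, b) {
        return a + b;
    }, 0);
}
```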
Then we have the this parameter. I'm discovering that I don't like the name 'this' because
it makes it really difficult to talk about it. My first sentence: 'the this parameter…'
Already you're in trouble. I mean, it's just hard to talk about it in doing code reviews:
'oh, I see your problem, this is wrong.'
[laughter]
Well, you might be right.
So what is this? The this parameter contains a reference to the object of invocation. This
allows a method to know what object it is concerned with. It allows a single instance
of a function object to serve as many functions. You can take a single function object and
store it in lots of different objects, or put it in lots of prototypes, and allow it
to be inherited by even more objects. There's just one instance of the function in the system,
but all of those objects think that they have that method, and they will do the right thing
with it because they use this to figure out what object they should actually be manipulating.
So this is the key to prototypal inheritance. Prototypal inheritance works in this language
because of this.
We have the parens suffix operator, which is used for invoking, or calling, or executing
the function. It surrounds zero or more comma separated expressions which will become the
arguments of the function, and those arguments will be bound to the parameters of the function.
If a function is called with too many arguments, the extra arguments are ignored. You don't
get an error for that, they're just ignored. But they'll still go into the arguments array,
so if you want to find out about them they're still accessible to you. If a function is
called with too few arguments, that's not an error either. It will fill in undefined
for any things that you did not include. There's no implicit type checking at all, so if the
types of the parameters are important to you then you need to check them yourself within
your function.
There are four ways to call a function. There's the function form, the method form, the constructor
form, and the apply form. They differ in what they do with this. In the method form, we
have an object, and then we say dot and a function name, or a subscript with some method name, and then
pass it arguments. That will call the function, and it will associate this with whatever that
object was. That will allow the function, then, to manipulate this.
Then there's the function form, in which we simply take a function value and call it immediately.
In this case there's no object to associate this to, so in ES3 this was set to the global
object, which was just awful. In ES5/Strict we improve that a little bit: we now bind
this to undefined, which is less awful. But one problem with this form shows up when
you have an inner function inside of an outer method, and that method wants the inner
function to have access to this. The inner function doesn't have access to it because it has
its own this, which is different from the outer this. So in order to make this visible to the inner
function, the outer function can declare a variable, perhaps called that, assign this
to it, and then the inner function will have access to that.
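A minimal sketch of that workaround. The object counter and its addAll method are hypothetical; the variable name that is the one suggested in the talk.

```javascript
var counter = {
    count: 0,
    addAll: function (values) {
        var that = this;    // capture the method's this in an ordinary variable

        values.forEach(function (v) {
            // This inner function is called in the function form, so its own
            // this is not counter. It reaches the object through that.
            that.count += v;
        });
        return this.count;
    }
};

var total = counter.addAll([1, 2, 3]);
```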
We have a constructor form, which looks like the function form except we have the new prefix.
Now when the function is called, this is bound to a new object that inherits from the function's
prototype member. Then if the function does not explicitly return a value, that new object
will be returned. This is used very much in the pseudo classical style which we'll look
at a little bit later.
Then finally there's the apply form in which we use either the function's apply method
or its call method. What they have in common is they both allow us to specify what this
is. The value that this should have will be the first parameter. The difference between
them is that apply takes an array of arguments and call takes zero or more individual parameters,
which will become the arguments.
I'll show how to define call in terms of apply, and also show a little bit of the ugliness
that's caused by the fact that arguments is not a real array. What I want to do to implement
call is to take all of the parameters that were passed except for the first one,
and I'd do that by using the slice method, except arguments doesn't have a slice method in
ES3, so instead I have to go out and find it. I know that I can find it at Array.prototype,
so I go Array.prototype.slice.apply, and then I can take that piece of arguments. Really
awful. Again, we fix that in ES5 a little bit.
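A sketch of the idea, written as a standalone helper so that the built-in call is left alone. The names callWith and greet are assumptions; the slide presumably defined the method on Function.prototype instead, which is why it skipped one argument where this sketch skips two.

```javascript
function callWith(func, thisObject) {
    // arguments has no slice of its own, so borrow the one on Array.prototype
    // and skip the first two entries (func and thisObject).
    var rest = Array.prototype.slice.apply(arguments, [2]);
    return func.apply(thisObject, rest);
}

function greet(greeting, mark) {
    return greeting + ", " + this.name + mark;
}

var text = callWith(greet, {name: "Ada"}, "Hello", "!");
```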
To summarize, this is a bonus parameter, and its value depends on the calling form. If
it's called as a function, it's bound to either the global object in ES3, or to undefined
in ES5/Strict. If it's called as a method it's bound to the object containing the method.
If it's called as a constructor, it's bound to the new object being constructed. And if
it's called in the apply form, then we explicitly pass in an argument that determines what this
is going to be.
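The summary can be checked with one small function that simply returns its this. The names reveal, holder, and other are hypothetical, and the function opts into strict mode so that the function form yields undefined.

```javascript
function reveal() {
    "use strict";
    return this;
}

var holder = {reveal: reveal};
var other = {};

var asFunction = reveal();                 // function form: undefined in strict mode
var asMethod = holder.reveal();            // method form: the object holding the method
var asConstructor = new reveal();          // constructor form: a new object
var viaApply = reveal.apply(other, []);    // apply form: whatever we pass first
var viaCall = reveal.call(other);          // call is the same, with individual arguments
```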
We call these things functions, but they don't behave exactly like mathematical functions.
In a mathematical function you would expect that every time you use a function with a
particular set of inputs, you should get exactly the same outputs. There are some programming
languages in which people are trying to match that ideal and there is some attractiveness
in doing that, because the behavior of programs written that way is more predictable and it's easier to
reason about them, but they're also a lot harder to write. Because it turns out that
programs, in order to be interesting, are interacting with the world, and the world
is always different. The functions are always going to be dealing with different things,
so they'll tend to want to keep state, and to manipulate that state and to mutate things.
So functions will tend to have side effects. In JavaScript you can program in the pure
functional sense, in that you can assume: OK, I'm never going to assign to a variable,
and I'm never going to change any object once it's created. And the language will let you
do that, but you're going to find it's really hard. We tend to change things a lot, because
it's just an easier style of programming.
Where did functions come from? Originally, there was something called the subroutine.
The subroutine began life back in the assembly language era, where you'd want to be able
to define your own op codes, and you could take a bunch of instructions that you used
frequently and create a pseudo op and call it. Subroutines were born. They introduced
the idea of call and return, where we call the thing and when it's finished it comes
back and we resume from where we were. That idea has been in virtually every language
since then. In different languages they've been called subs, procedures, procs, funcs,
functions, lambdas, but it's all the same idea of taking some specification of computation
and packaging it so that it can be re-used conveniently.
The first motivation for subroutines was code reuse. The first generations of computers
had really small memories, so in order to get programs to fit you'd want to take pieces
of the program that were recurring and factor them out so that they were only there once,
and then call them. That was the only way you could hope to get it to fit. It turned
out that was such a good idea that it was then used in the design of programs. Treating a
program as a single, monolithic list of instructions was too difficult to reason about, so if we'd
divide and conquer that program into smaller components then we can think about those components
more easily. A subroutine or function was a natural form for doing that.
The next step was using them to do modular things — for example, to create libraries
of routines that could be loaded with any program so that you could have stuff that
could be reused from one program to another. That led to a sense of expressiveness where,
in thinking about how to design an application, you would first think of what set of subroutines
would make it really easy to write this application, essentially designing a programming language
expressed as subroutine calls, which are ideal for implementing this application, and then
write those subroutines. The next step up was higher order functions, in which we're
going to do things with functions which couldn't be done otherwise. That's when some of the
power of the language really starts to work for you.
One of the cases where that occurs is in recursion. Recursion is when a function calls itself,
or is defined in terms of itself. Now, at first this didn't make sense to some programmers.
For example, if you were working in FORTRAN, FORTRAN couldn't do this. It was not possible
for a function to call itself in FORTRAN. A lot of very good programmers looked at the
idea of recursion, and reasoned: well, I've never used it, and I don't understand the
need for it, therefore it could not be very important. It turns out it's actually really
important, and when you learn to think recursively you become a much stronger programmer.
One of the classic algorithms for a recursive solution is the Quicksort, which was invented
basically because ALGOL was invented. Expressing this in a recursive programming language turned
out to be really easy. There are basically two steps in Quicksort. The first is you divide
an array into two groups: all the big values and all the small values. One way you could
do that is you could have two pointers that are going through the array, and one starts
on the small end and one starts on the big end, and when either finds something that's
in the wrong group they swap them, and then continue scanning in until they meet. When
they meet, you're done.
Then you go to step two, where you take each of those groups and call Quicksort on those
groups, and you're done. That's the whole sort, and it's really fast. There are more
optimizations you can do to it that make it even faster, but just doing what I was describing
in the average case is n log n, which is really good for a sort, and you hardly do anything.
It's just really, really simple. So once you can learn to think recursively, a lot of really
interesting things fall out.
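The two steps can be sketched as follows. This is one common formulation with a middle pivot and two scanning indexes, not necessarily the version on the slide; the parameter names lo and hi are assumptions.

```javascript
function quicksort(a, lo, hi) {
    var i, j, pivot, t;
    if (lo === undefined) {
        lo = 0;
        hi = a.length - 1;
    }
    if (lo >= hi) {
        return a;
    }

    // Step one: divide into small values and big values around a pivot.
    pivot = a[Math.floor((lo + hi) / 2)];
    i = lo;
    j = hi;
    while (i <= j) {
        while (a[i] < pivot) {
            i += 1;
        }
        while (a[j] > pivot) {
            j -= 1;
        }
        if (i <= j) {
            // Both indexes found a value in the wrong group, so swap them.
            t = a[i];
            a[i] = a[j];
            a[j] = t;
            i += 1;
            j -= 1;
        }
    }

    // Step two: sort each group by calling quicksort on it.
    quicksort(a, lo, j);
    quicksort(a, i, hi);
    return a;
}
```

The array is sorted in place and also returned, so quicksort([3, 1, 2]) can be used directly in an expression.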
Here's another kind of recursion. You might recognize these from the JSON language. We've
got the syntax diagrams for values and arrays, and you might notice that there's a dependence
issue going on here where a value can be an array, but an array can contain a value. A
naïve programmer might struggle, thinking how do I organize my functions in order to
parse something like this? I've got this circular dependency and that's really hard, but it
turns out if you have recursion working for you, there's a trivial solution to this.
Here I have two functions. Each exactly implements one of those syntax diagrams. I've got the
value which, when it sees a square bracket, will call the array function,
and return whatever it returns. Then I've got my array function, and for each of the
things that it finds it calls value to find out what it is. Here I've got mutual recursion
going on, and it all works out. You don't have to think about how to manage the transition
from one to another, just the ordinary function plumbing does all of that work for you. You
don't even have to think about it.
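A sketch of that mutual recursion on a deliberately tiny grammar: a value is either a run of digits or an array, and an array is a bracketed, comma separated list of values with no whitespace. The wrapper parse and the index variable at are assumptions; a real JSON parser handles much more.

```javascript
function parse(text) {
    var at = 0;    // index of the next character to read

    function array() {
        var result = [];
        at += 1;                            // skip the opening bracket
        if (text.charAt(at) === "]") {
            at += 1;
            return result;
        }
        while (true) {
            result.push(value());           // array calls value
            if (text.charAt(at) === "]") {
                at += 1;
                return result;
            }
            if (text.charAt(at) !== ",") {
                throw new SyntaxError("Expected ',' or ']' at " + at);
            }
            at += 1;
        }
    }

    function value() {
        var start = at;
        if (text.charAt(at) === "[") {
            return array();                 // value calls array
        }
        while (text.charAt(at) >= "0" && text.charAt(at) <= "9") {
            at += 1;
        }
        if (at === start) {
            throw new SyntaxError("Expected a value at " + at);
        }
        return Number(text.slice(start, at));
    }

    return value();
}
```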
Lisp had this stuff going on in 1958, ALGOL had it in '60, but it took a while to get into
the mainstream languages, partly because people couldn't think about how to implement it efficiently.
At that time, the way subroutine calls worked, on most machines you had self modifying instructions,
so when you called a function it would destroy whatever is in the first word of the function
and replace it with a jump back to the place that it was called from, and you couldn't
do that recursively because once you've clobbered that address there's no getting back. So you
needed some other place to keep the return addresses, and eventually that turned out to
be a stack.
All modern CPUs now have support for that, usually in the form of auto incrementing or
auto decrementing pointer instructions. These assembly language notions eventually found
their way into programming languages, so these things come straight out of assembly language
but found their way into C and then into Java, and everything else. I don't like them. They
look way too primitive and fish brained to me, if you know what I mean. I think we can
do better than that.
One of the other key ideas which was, again, alien to people who'd never used it, was closure.
People who were working in languages in which closure was not an option were like 'I've
been programming for years without it, I don't understand why you'd ever want it.' But it
turns out JavaScript's got it, and it's really, really good. That's where we're going to be spending
most of our time tonight. It's sometimes called lexical scoping, sometimes called static scoping.
It has to do with how variable names are resolved in nested functions. The context of an inner
function includes the scope of the outer functions, so all of the variables that are in the outer
function are available to the inner function, and this continues even after the parent function
has returned. That sounds kind of weird, so I've got a lot of examples to show you what
this means.
I'll start with a simple one. I've got a function called digit_name, and digit_name will take
a number as an argument, and will return the name of that number in English. It will take
advantage of an array of strings that is stored in names. As you can see, it's a really simple
function. Unfortunately, the way I've defined it here, names is a global variable. The problem
with that is, if there's anything else in the environment that is also a global variable
that has that name, they're going to interfere with each other and will probably cause this
to fail. That's something you cannot test for, because it's impossible to test with
everything that might be loaded on a page. For example, it might be that a third party
ad gets loaded one day that happens to have a global variable called names, and now your
page died. That's intolerable, so we want to, as much as possible, reduce our dependence
on global variables.
One way we could do that is to rewrite this program so that names is now a local variable
of the digit_name function. And that works. It's a local variable, we have function scope,
names is not visible on the outside, so even if an evil ad comes in and has a names variable
it will not interfere with this one, so that's good. This is a much more reliable version
of the function. Unfortunately, every time we call the function, we're going to allocate
a new array and stuff ten things into it, which is going to take some time. We don't
want to do that; that's a terrible waste. In this case it's a fairly trivial thing,
but we might have a more complicated function with a more complicated initialization, so
we want to be able to factor that out. Closure provides a really nice way to do that.
Now I have a function and it has a private names variable, and it returns a function.
The function it returns is assigned to digit_name. The important thing is, notice at the bottom,
we're invoking the function now. We're invoking the function immediately, so what I'm storing
in digit_name is not the whole function, it is the function that it returns. OK? This
is really important. In order to give the reader a clue that there's something interesting
going on here — because assigning a function looks almost the same as assigning a function
that's immediately invoked — I wrapped it in parens. The whole thing is wrapped in the
golden parens. That's a clue to the reader; it's not required by the language, but I think
it is required by humans. It gives us a clue that there's something really interesting
going on here.
We assign the return value of the outer function to digit_name. The outer function has now
returned, digit_name now contains a function, which is the green function. That green function
still has access to names, even though names is a private variable of a function that's
already returned. That's closure: one function closes over the variables of another function.
This turns out to be one of the most important features in JavaScript; this is the thing
that it got amazingly right. This is the thing that makes JavaScript one of the world's brilliant
programming languages.
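Reconstructed from the description above, the closure form would look roughly like this. The exact slide is not in the transcript, so treat the layout as an assumption; digit_name and names are the names used in the talk.

```javascript
var digit_name = (function () {
    // names is private to the outer function and is built exactly once.
    var names = ["zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"];

    // This inner function is what ends up stored in digit_name. It keeps
    // access to names even after the outer function has returned.
    return function (n) {
        return names[n];
    };
}());
```

The parens around the whole invocation are the clue to the reader that the result of the call, not the outer function itself, is being assigned.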
There's another pattern going around called lazy function definition. I show you this
as a warning. Don't do this. The idea here is that, in this form, I unconditionally initialize
the function before we're going to start calling it. But what if the initialization is really
expensive, so we don't want to do it unless we know the function is going to end up getting
called at least once? This lazy pattern attempts to do that. What it does is it assigns to
digit_name a function, and when that function is called it will then store another function
into the same variable. So it'll replace itself, it'll modify itself. The idea here is that
that allows us to avoid having to initialize the thing, if we don't need to do it.
But it comes at a cost, and the cost is confusion. digit_name is no longer first class in that
if I were to pass it to a function and let that function call it, or if I were to assign
it to an object and let someone call it as a method, every time it gets called from that
point on it will do the initialization and stuff a new function into digit_name. Instead
of making it faster we've actually made it slower. It's slower than the slow case we
started off with. Now, the counter-argument is, OK, you've got to be really careful to
not do that, so one of the rules we'll put in the documentation is that this can only
be called from the global variable, you can't use the function value as a function value
except to call it immediately, and that it's worth it because we're saving the initialization
cost. It turns out that analysis is wrong. All we're saving is the cost of an if per
iteration, and let me show you why that's the case.
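To see the problem, here is a sketch of the lazy pattern with a hypothetical counter, initCount, added to record how often the expensive initialization runs. The variable saved stands in for passing the function value to someone else.

```javascript
var initCount = 0;

var digit_name = function (n) {
    initCount += 1;     // counts every run of the initialization
    var names = ["zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine"];

    // The function replaces itself in the digit_name variable.
    digit_name = function (n) {
        return names[n];
    };
    return digit_name(n);
};

// Anyone holding the original function value keeps calling the slow version.
var saved = digit_name;
saved(1);
saved(2);
saved(3);
```

After these three calls the initialization has run three times, once per call through saved, while calls through the digit_name variable itself no longer initialize anything.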
Here we're going back to the closure form, except I put an if statement in it, so that
if names hasn't been initialized yet, we'll initialize it now, and then we'll do what
we always do. The cost of this compared to the previous one was one if statement per
invocation, which is in the noise, it's not even measurable. The optimization that we
were hoping to get in the lazy form just doesn't pay off, and we get weirdness instead. Now,
an argument about that might be: well, suppose we call this function a million times, or
a gazillion times. A gazillion if statements, that starts to add up to something. You can
go yeah, maybe that's true. But if you think you're really going to call this a gazillion
times, we shouldn't be optimizing the case where we're not going to call it at all.
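A sketch of the closure form with the if statement, reconstructed from the description since the slide is not in the transcript.

```javascript
var digit_name = (function () {
    var names;    // not initialized until the first call

    return function (n) {
        // One cheap test per invocation replaces the self-modifying trick.
        if (!names) {
            names = ["zero", "one", "two", "three", "four",
                    "five", "six", "seven", "eight", "nine"];
        }
        return names[n];
    };
}());
```

Unlike the lazy form, this function value can be passed around or stored as a method, and the initialization still happens only once.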
[laughter]
I thought I heard some applause there. Maybe not.
[laughter]
OK, here's another example. A fade function. This is something you might do in an Ajax
application. I want to take some object — maybe a div or something — and have it fade from
yellow to white, maybe as an indication to the user that something changed and they should
pay attention to it. I've got my fade function. First thing I do is find a DOM element and
create a variable called level, which I'll set initially to 1. Then I'll define a step
function, and then I will call setTimeout, passing that step function with a time, so
it'll fire in a tenth of a second. And then it returns. Done. That's the end of fade.
Then suddenly, a tenth of a second later approximately, the step function executes. It will first
define a variable H, and initialize it with level.
What is level? Level is the variable of fade. It's not the value level had when step was created,
it is the current value, it is the current variable. It does the same thing with DOM
— it gets access to the DOM variable and uses that to change the background color of
that DOM node. It then looks at level, and if it's less than 15 — which it will be,
at this point — it will add 1 to it. It's adding 1 to the level variable of the fade
function that's already returned, and then it will call setTimeout, and in a tenth of
a second will do this again. It will keep doing it until eventually we reach 15, and
then we stop.
Now, suppose we had three things on the page and we wanted them all to fade simultaneously.
We call fade 1, 2, 3, with three different IDs at the same time — are those three executions
going to interfere with each other? No, not at all. Because each invocation of fade has
its own unique set of variables: its own DOM, its own level, creates its own step functions,
and they do not interfere with each other at all. So this works, again, because of closure.
Because step is able to close over the DOM and level variables, it just works. Everybody
still with me?
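A sketch of fade along the lines described. To keep it runnable outside a browser, this version makes two assumptions that are not in the talk: it takes the element itself rather than looking it up by ID, and it accepts an optional schedule function that defaults to setTimeout. The color format is also a guess.

```javascript
function fade(element, schedule) {
    var dom = element,
        level = 1;
    schedule = schedule || setTimeout;

    function step() {
        // step reads the current value of fade's level variable,
        // even though fade returned long ago.
        var h = level.toString(16);
        dom.style.backgroundColor = "#FFFF" + h + h;
        if (level < 15) {
            level += 1;
            schedule(step, 100);
        }
    }

    schedule(step, 100);
}
```

In a page this might be called as fade(document.getElementById("status")), where "status" is a made-up ID. Each call gets its own dom, level, and step, so several fades can run at once.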
OK, one more example along these lines. I want to make a later method. It's like setTimeout
except more object oriented, so I want it to be a method of all objects. I can take
any object, call later, and give it the number of milliseconds to wait. It doesn't
actually wait, it puts it on the timer queue, and eventually it'll get around to dispatching
it. Give it the name of a method, or perhaps pass in a function which will be treated as
a method, and then the other parameters that method would need. On the next screen
I'll show you what it looks like.
But again, I'll point out the problem with arguments. What I'm going to want to be able
to say is: arguments.slice(2), so that I can take all of the parameters that were passed
except for the first two and make a nice little array out of it. I can't do that in ES3, instead
I have to write Array.prototype.slice.apply(arguments, [2]), which is pretty nasty. So when you see
that on the next screen, you'll know why that is. In ES5, you can do the simpler thing.
I'm going to add this to Object.prototype. I could add it to any of the ancestors of
my application. This is one place to put it. Object.prototype is a global object and all
of the problems you have with global variables you have with global prototypes as well, so
this is something you want to do really cautiously. You want to do it conditionally, just in case
the language ever actually adds later as standard equipment, so that you're not going to be
replacing the official version with your version. Generally you don't want to be doing this
in applications, although it's sometimes a reasonable thing to be doing in Ajax libraries.
In this case, if we don't already have an Object.prototype.later method, we're going
to define one. We're going to pass in the number of milliseconds and the method, and
then we'll create an array of the additional arguments. We're binding 'that' to 'this'; it's
doing the thing I showed you before, because in the green function we're going to want
access to 'this', but 'this' doesn't work. 'This' is not captured in closure. But 'that' is, and
so that's how we get that into it. That will call setTimeout, and will cause that
method to get invoked at that time.
One other thing I'm doing here is when later is finished, which happens immediately, it
returns the value of 'that', which is also 'this'. The advantage of doing that is it allows us
to then cascade on that. So if I had several things that I wanted to have happen later
but at different times, I could say myObject.later(5, ...).later(10, ...).later(20, ...), and so on. I could just cascade all these
things one after another because each returns its own object, so we can then go right on
and invoke the next one. There are a lot of Ajax libraries that carry this idea to excess,
but it's a really nice pattern, and I think it works really nicely in this language.
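A sketch of the later method as described. One deliberate difference, stated as an assumption: instead of augmenting Object.prototype, it installs later on a hypothetical ancestor object called base, so that running the example does not touch a global prototype. The names my_object, note, and log are also made up for illustration.

```javascript
// Hypothetical ancestor object used in place of Object.prototype.
var base = {};

// Define later only if it is not already there.
if (typeof base.later !== 'function') {
    base.later = function (msec, method) {
        var that = this,
            args = Array.prototype.slice.apply(arguments, [2]);
        if (typeof method === 'string') {
            method = that[method];      // look the method up by name
        }
        setTimeout(function () {
            method.apply(that, args);   // 'that' is captured by closure; 'this' is not
        }, msec);
        return that;                    // returning the object allows cascading
    };
}

var my_object = Object.create(base);
my_object.log = [];
my_object.note = function (text) {
    this.log.push(text);
};

// Cascade: each call to later returns the same object.
var returned = my_object
    .later(5, 'note', 'first')
    .later(10, function (text) {
        this.log.push(text);
    }, 'second');
```

Nothing runs at the time of the calls; both entries are appended to log only after the timers fire.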
Another example: partial application. We're starting to get a little theoretical now.
Partial application says I'll take a function and a parameter and return another function
which doesn't execute that yet, but will when it's supplied with additional parameters.
Let's start with the example first.
Using a function called curry, I'm going to pass it an add function — which takes two
arguments and adds them together — and I'm going to pass it 1. It will return a function
which will add 1 to whatever gets passed to it. I'm going to store it in increment, because
that's a good name for that, and then I can call it. So if I now pass a 6 to inc, I get
7. This is called partial application. The implementation of it is, I'll first get an
array of arguments, except for the first one, because the first one is the function and
I don't need that one. In this case I'm assuming I'm on ES5, so I'm not doing the awful Array.prototype.slice.apply
trick. Then curry returns a function, and that function will apply the arguments to
the function.
One bit of weirdness that's left over from arguments not being a real array is that if
I pass arguments as a parameter to concat, it doesn't recognize that it's an array and
then take all the members of it and concatenate them to the other thing. It will append
the whole thing as a single element, which is not what we want, in this case. We need to turn it
into a real array so that concat will do the right thing to it, and we do that by calling
its slice method. ES5 has the slice method, so slice returns an array, and that will work.
But we shouldn't have had to do that; there's still some things left to get fixed in future
editions. Everybody still with me?
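A minimal sketch of curry along the lines described. As an assumption made for portability, it converts arguments with Array.prototype.slice.call rather than calling slice on arguments directly, since arguments is not a true array in the engines this was checked against.

```javascript
function curry(func) {
    var slice = Array.prototype.slice,
        args = slice.call(arguments, 1);    // everything except func itself

    return function () {
        // Turn arguments into a real array so that concat flattens it.
        return func.apply(null, args.concat(slice.call(arguments)));
    };
}

function add(a, b) {
    return a + b;
}

var inc = curry(add, 1);
var seven = inc(6);     // 7
```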
OK, here's one other. Suppose we've got a process which cannot be resolved immediately.
Maybe it's going to require a lot of computation, maybe it has to go out to a worker pool and
do something, maybe it has to go back to the server and get some stuff. But we'd like to
be able to return something immediately that we can start acting on, even though it's not
going to be real for a while; we don't know when that while is yet. A service that's doing
something like that could return something that's called a promise, and the promise is
an object which allows us to call methods on the thing. If we know what the thing is
then it will immediately get executed. But if we don't know what the thing is yet, it'll
get queued up. It will finally get executed when we know what the thing is. That turns
out to be a really useful pattern for doing a lot of things, particularly when you're
doing a lot of communications.
Here we're going to implement a promise maker, and the promise maker will return a set of
five functions: when, fail, fulfill, smash, and status. You could pass any one, or any fraction of
these functions to someone else. For example, you might have a service, and I want to return
something to you immediately. I give you back an object containing a when and a fail method.
You can then pass to when functions that you want called when the thing is fulfilled. You
can also pass functions to fail for the case where a failure comes back. It'll just sit
on all those things until it knows what the disposition is.
And then the creator of the service might hang on to the fulfill and smash methods.
Fulfill he'll call and pass a value in when he knows what the value finally is, and that's
the thing that will get delivered to the functions. If it turns out that it's going to be an error,
at this point it turns out it's too late to throw an exception because that was a long
time ago, and the other guy's not in your call stack anymore, so instead you smash the
promise, you break the promise, and that will cause all of his fail methods, now, to run.
The way these things work is they depend on the vouch and resolve methods, which are private
to the promise maker. But again, it closes over, so it'll always have access to those
functions and the state that they refer to. Let me show you implementations of vouch and
resolve.
First we've got a few more variables. We've got status, which initially is unresolved,
and eventually could be fulfilled or failed. We've got the outcome, so when we know what
the value is we'll stick it in there. We've got the waiting list of functions that were
registered with when. And we've got the dreading list for the functions that were registered
with fail. Then vouch will take a deed and a function and then it'll look at the status.
If the status is still unresolved, then it will put it onto one of those lists. Which
list it will put it on will depend on what the deed is. But if the current state of the
promise matches the deed, then we can execute it immediately.
Then the other piece of this is resolve. If the status has already been resolved then
we throw an error, because we can only do it once. Otherwise, we'll go through and use
one of the nice things in ES5 now: we've got a forEach method. We'll figure out which of
those two arrays of functions we've got, and we'll say for each one of those functions,
'call this function'. This function will then go and call each of those with the value.
We had to wrap it in a try catch, because if any of those functions should throw, we
don't want that to interfere with the other functions getting a chance to run. OK, everybody
still with me?
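The promise maker described above could be sketched roughly as follows. The name make_promise and the exact status strings are assumptions; the five returned functions and the private vouch and resolve helpers follow the description.

```javascript
function make_promise() {
    var status = 'unresolved',
        outcome,
        waiting = [],       // functions registered with when
        dreading = [];      // functions registered with fail

    function vouch(deed, func) {
        switch (status) {
        case 'unresolved':
            (deed === 'fulfilled' ? waiting : dreading).push(func);
            break;
        case deed:
            func(outcome);  // already resolved this way: run immediately
            break;
        }
    }

    function resolve(deed, value) {
        if (status !== 'unresolved') {
            throw new Error('The promise has already been resolved: ' + status);
        }
        status = deed;
        outcome = value;
        (deed === 'fulfilled' ? waiting : dreading).forEach(function (func) {
            try {
                func(outcome);
            } catch (ignore) {
                // one throwing function must not stop the others
            }
        });
        waiting = null;
        dreading = null;
    }

    return {
        when: function (func) {
            vouch('fulfilled', func);
        },
        fail: function (func) {
            vouch('failed', func);
        },
        fulfill: function (value) {
            resolve('fulfilled', value);
        },
        smash: function (reason) {
            resolve('failed', reason);
        },
        status: function () {
            return status;
        }
    };
}

var promise = make_promise();
var delivered = [];
promise.when(function (value) {
    delivered.push(value);
});
promise.fulfill('ready');       // runs the queued function
```

A service would typically hand out only when and fail, and keep fulfill and smash for itself.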
We'll look at one more: sealers and unsealers. Sometimes we'd like to be able to pass secret
information around through the application. Say that I give to you a secret envelope and
tell you to give it to the cashier, and the cashier will take care of you. I want you
to be able to take that envelope to the cashier and get reimbursed, and I'd like you to be
able to give that envelope to someone else and allow them to be reimbursed. But I don't
want you to be able to open it yourself, I don't want you to be able to tamper with it,
and I want the cashier to be able to verify that it is, in fact, the original un-tampered-with
thing. We can do that really easily in JavaScript, it turns out. It sounds like something you'd
need cryptography to be able to do, but that doesn't really work inside of an application.
But it turns out there is a much simpler solution.
The way it works is I've got a sealer maker which will return a pair of functions, a sealer
and unsealer, and they have to be used in pairs. I will keep the sealer, and I will
give the unsealer to the cashier, and then I can call the sealer with the value that
I want to give to you, and it will return to me a box which I can then give to you.
The box is useless to you, except that if you can give it to someone who's got an unsealer,
they can reclaim the original object. This function is a tiny bit harder to write than
it should be, because in JavaScript object keys have to be strings, they can't be objects.
If they could be objects, this function would be totally trivial. As it is, it's just slightly
trivial.
What I will do is I'll create the box, the secret container, which is just an empty object.
It's really just a token; I'm not actually giving you a real box, but it acts like a
box. I'll store it in my boxes array, and right next to it I will store in my values
array the value that it represents, and then return the box to you. That was really easy.
Then the unsealer uses the new indexOf method that we have in arrays, and goes looking for
that box in the list of boxes. If it finds it then it returns the corresponding value,
and then we've got it. If something goes wrong, if you pulled a substitution, gave an object
that was not sealed, you get undefined back, which is how it should be.
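A sketch of the sealer maker using two parallel arrays, as described. The name make_sealer and the sample values are assumptions.

```javascript
function make_sealer() {
    var boxes = [],     // the tokens handed out
        values = [];    // the secret value for each token, at the same index

    return {
        sealer: function (value) {
            var i = boxes.length,
                box = {};           // an empty object acting as the box
            boxes[i] = box;
            values[i] = value;
            return box;
        },
        unsealer: function (box) {
            // indexOf gives -1 for an unknown box, and values[-1] is undefined.
            return values[boxes.indexOf(box)];
        }
    };
}

var pair = make_sealer();
var envelope = pair.sealer({amount: 20});
var opened = pair.unsealer(envelope);       // the original object
var forged = pair.unsealer({amount: 1000}); // undefined
```

The box carries no data of its own, so holding it reveals nothing; only the matching unsealer can map it back to the value.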
We're going to shift slightly and start looking at inheritance, but we're still going to reflect
it back onto what we can do with closure. Here's an example of how you can do things
with what I call pseudoclassical inheritance. This was the inheritance scheme that was designed
for the language, and I really don't care for it at all. I don't think it looks very
good.
Here we're defining a gizmo, and you can see the gizmo's constructor. Then we add to the
gizmo's prototype the methods that we want the instances to inherit. This just looks
really weird. We're sort of used to the idea of a class containing all of its stuff, and
in this case it's kind of hanging on the end of it in a haphazard way. It also induces
people to do things incorrectly. For example, I've seen people trying to assign functions
to prototypes inside of the constructor because it just seems like that's where you should
do it, and doing it on the outside just feels wrong even though that's how you're supposed
to do it.
It gets even worse in the case of the hoozit where I want the hoozit to inherit from the
gizmo. The way I specify that in the language is I replace hoozit's prototype with a new
instance of gizmo, and that just looks crazy. And it's potentially dangerous. If it turned out
that the gizmo constructor would throw when there were no parameters, then it would actually
fail. But this is the way the language was intended to be used, and it's because the
language itself is confused about its prototypal nature. I think there's a better way to do
this. So let me suggest another formulation of exactly these same objects.
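For reference, the pseudoclassical formulation being criticized looks roughly like this. The method bodies are assumptions chosen to keep the sketch small.

```javascript
function Gizmo(id) {
    this.id = id;
}
// Methods hang off the prototype, outside the constructor.
Gizmo.prototype.toString = function () {
    return 'gizmo ' + this.id;
};

function Hoozit(id) {
    this.id = id;
}
// Inheritance by replacing the prototype with a new instance of Gizmo.
// Note that Gizmo is called here with no parameters.
Hoozit.prototype = new Gizmo();
Hoozit.prototype.test = function (id) {
    return this.id === id;
};

var h = new Hoozit(7);
var text = String(h);       // 'gizmo 7'
```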
Here I'm going to make a gizmo, and to make it for me I'm going to call my new_constructor
function. It will make the new instance of gizmo, or the new definer of gizmo. I will
pass to it Object because I want gizmo to inherit from Object. I'm going to pass to
it the constructor function, and I'm going to pass to it an object containing the methods
that it should add to its own prototype. This does exactly the same thing that we saw on
the other screen, but I think it's just more pleasant looking.
Then it gets even better with the hoozit. With the hoozit I call new_constructor, pass
in the gizmo that says I want hoozit to inherit from gizmo, and I also pass it a constructor.
I'll also pass it an object containing additional methods that I want it to add to its prototype.
To my eye, this looks a whole lot more rational than that did, with all the stuff hanging
out and the weird replacement. The language doesn't provide the new_constructor function
that you need to do this, but it turns out it's a really easy function to write. So let's
write that function.
Function new_constructor takes three parameters: extend, initializer, and methods. The first
thing it does is it creates the prototype object, which it makes by calling Object.create.
Then if there are methods available it will call the keys method — this is a new thing
in ES5 — which will return an array of all of the own keys of that object, which is really
nice because an array has a forEach method, so it will then call that. That will allow
us to easily copy all of the methods into the prototype. It's a really nice construction.
Then we'll create the function itself, which we'll use to make our hoozits or whatever,
and you can see that closure's working in there because it has access to prototype,
and it has access to the initializer.
So it will create a new instance of the prototype using Object.create, which makes a new object
that inherits from the object that you pass in. It will then call the initializer, passing
that same object in, and when it's done it will return the object that we just created.
So this does the same thing as new, except we don't use new. Then a little bit of extra
plumbing: we don't really need to do this, but just to be nice we'll set the function's
prototype property to the prototype, because in the case of the hoozit, the prototype got
replaced, so we lost the constructor value. We'll fix that there, as well. Again, we're
using closure in order to implement a classical pattern, and I think this works really nicely
in the language.
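A sketch of new_constructor following the steps just described, with gizmo and hoozit defined through it. The method bodies are assumptions, and this version expects extend to always be supplied.

```javascript
function new_constructor(extend, initializer, methods) {
    var func,
        prototype = Object.create(extend.prototype);

    if (methods) {
        // Copy the own keys of methods into the new prototype.
        Object.keys(methods).forEach(function (key) {
            prototype[key] = methods[key];
        });
    }
    func = function () {
        var that = Object.create(prototype);    // closes over prototype
        if (typeof initializer === 'function') {
            initializer.apply(that, arguments); // closes over initializer
        }
        return that;
    };
    // The extra plumbing: keep prototype and constructor consistent.
    func.prototype = prototype;
    prototype.constructor = func;
    return func;
}

var gizmo = new_constructor(Object, function (id) {
    this.id = id;
}, {
    toString: function () {
        return 'gizmo ' + this.id;
    }
});

var hoozit = new_constructor(gizmo, function (id) {
    this.id = id;
}, {
    test: function (id) {
        return this.id === id;
    }
});

var my_hoozit = hoozit(7);      // no new required
```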
Another thing we can do with functions is to create modules. We'd like to be able to
minimize using global variables because of the conflicts that they can create, and functions
provide a very nice way of doing that. Here I want to create a singleton object — there'll
just be one instance of it — so you don't want to have to create a class to define something
there's just going to be one instance of; that'd be silly. So I'm going to assign to
singleton not that function, but the consequence of calling that function. Again, I'm wrapping
the whole function and the invocation in parens as a sign to the reader that there's something
bigger going on than just assignment of a function.
There are some people who would put the golden paren around the function, and not around
the whole invocation. That doesn't make sense to me, because what we're trying to tell the
user is: look at the whole thing. Putting parentheses around just part of it is, I think,
counterproductive. I think the whole thing needs to be wrapped in parens. The outer function
has variables and functions, and it will return an object using an object literal,
and the object will contain some methods. Those methods will be closed over the private
stuff. We're returning, in this case, two functions. In the earlier cases we returned
one function, but this time we're returning two. We could return as many as we want. And
they share their access; they're both closed over the variables of the parent function.
So they can communicate through that shared state without corrupting the global space.
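A small sketch of the singleton module. The private counter and the method names are invented for illustration; the shape (an immediately invoked function, wrapped whole in parens, returning an object literal) is the pattern described.

```javascript
var singleton = (function () {
    var count = 0;              // private variable

    function bump(by) {         // private function
        count += by;
        return count;
    }

    // Both methods close over the same private state.
    return {
        increment: function () {
            return bump(1);
        },
        current: function () {
            return count;
        }
    };
}());

singleton.increment();
singleton.increment();
var total = singleton.current();    // 2
```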
A related pattern to this is if we want to have a common global object where we'll keep
our whole application. At Yahoo! we keep a lot of stuff in a global Yahoo! object, so
everything that's ours we keep in one common namespace. I want to add a new thing to my
global object called methodical, which will have my two methods in it. Just as before,
I'm going to be assigning the result of my function into that object.
Now, sometimes I want to be adding not a new object but just a couple of methods to that
structure. I can do that as well. Here's another variation on the same pattern. I've got a
function, and it's got the private stuff, and then I'm going to assign to GLOBAL.firstMethod
my first method, and to GLOBAL.secondMethod my second method, the other one. Again, the
whole thing is wrapped in the golden parentheses. In this case, the parentheses are syntactically
required, and that's because I want this to be a function expression and not a function
statement. If it were a function statement, I couldn't immediately execute it, and I want
to immediately execute it. Everybody still with me?
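A sketch of the variation that adds methods to an existing global object. GLOBAL stands for the application's single namespace object; the private counter is an assumption for illustration.

```javascript
var GLOBAL = {};    // the one common namespace object for the application

// The leading paren makes this a function expression, so it can be
// invoked immediately.
(function () {
    var calls = 0;  // private, shared by both methods

    GLOBAL.firstMethod = function () {
        calls += 1;
        return calls;
    };
    GLOBAL.secondMethod = function () {
        return calls;
    };
}());

GLOBAL.firstMethod();
var seen = GLOBAL.secondMethod();   // 1
```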
I can take this module pattern and very easily turn it into a constructor pattern. It's the
same basic idea, I'm just going to make lots of instances, not just one instance. Here's
the recipe. Step one: make an object using any of the techniques available in the language.
I can use an object literal, I can use new, I can use Object.create, I can call another
of these power constructors and use the thing that it returns. Then step two: I define some
variables and functions, and these will be the private members of the object that I'm
about to make. Step three: I augment the object with privileged methods. A privileged method
is a method which has access to that private state, that closes over the private state.
And step four, I return the object. Really simple recipe, but it's a little abstract,
so let me turn it into a template that's a little easier to follow.
Step one. This is going to be my new power constructor, and I'm going to create a variable
called 'that'. I can't call it 'this', because 'this' is a reserved word. I will initialize
it somehow; somehow I'll turn it into an object. Then step two, I declare secrets, the secret
variable, stuff that's going to be available to my privileged method. Step three, I create
my privileged methods and assign them to that. Step four, I return that. So it's really simple.
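The four steps can be filled in with a small concrete case. The counter example and its names are assumptions, not what was on the slide.

```javascript
function counter(start) {
    // Step one: make an object, here with an object literal.
    var that = {};

    // Step two: the secrets, visible only inside this function.
    var count = start;

    // Step three: privileged methods, which close over the secrets.
    that.next = function () {
        count += 1;
        return count;
    };

    // Step four: return the object.
    return that;
}

var a = counter(10);
var b = counter(100);
a.next();               // 11
var a_second = a.next();    // 12
var b_first = b.next();     // 101
```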
Here's gizmo and hoozit again. This is how we would write it, again, in the classical
style, pseudoclassical style. It so bothers me how all this stuff's hanging out. Also,
gizmo's got a constructor, and hoozit's got a constructor, and they both do the same thing.
So even though one inherits from the other, we don't get the advantage of that code reuse.
There's some redundant waste going on there. I want to apply this functional system instead
of doing this. This is how we'd write it. I've got my gizmo, it returns an object literal,
done. That was really easy. Then my hoozit calls gizmo to create an instance, it augments
that, adding its test method, and returns that. Done. So it's really simple.
But there are some other benefits that come from writing in this style. One is that we've
got privacy. Right now, with the way it's written, the ID is a public property of the
object, so anybody could go in and get the ID directly or modify it. Maybe I don't want
them to be able to do that, maybe the integrity of my object depends on nobody being able
to mess with the ID. Writing this in the functional style, we can do that — not only can we
do that, the code gets simpler. We just don't have the ID property in the object. We're
referring now to the ID parameter, and because of closure, our toString method always has
access to that parameter. So we just took the 'this's out, and it's done. We do a similar
thing with hoozit. So again, it just became simpler.
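A sketch of gizmo and hoozit in the functional style with the ID kept private. The method bodies are assumptions.

```javascript
function gizmo(id) {
    // id is only a parameter; it is never stored on the object.
    return {
        toString: function () {
            return 'gizmo ' + id;
        }
    };
}

function hoozit(id) {
    var that = gizmo(id);       // reuse gizmo instead of repeating it
    that.test = function (testid) {
        return testid === id;
    };
    return that;
}

var my_hoozit = hoozit(7);
var description = String(my_hoozit);    // 'gizmo 7'
```

There is no id property to read or overwrite, so the only access to it is through the methods.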
There are other things we could do, too. We could have a shared secret which we pass between
all of the constructors, which could be used to simulate something like a package relationship,
where they all contain something that they know. You can get arbitrarily complicated
with this stuff; you usually don't need to get anywhere near this fancy, but it's nice
knowing that you can, if the need should ever arise.
When I started working with this language, I spent a lot of time thinking about how to
simulate things that we did in the classical languages, like how do we get super functions?
In the pseudoclassical model there's no easy way to write super functions, but in the functional
style it's really easy. Just capture a super function from the thing that I'm inheriting
from, keep that in the closure, and then I can call it at any time I want. It turns out,
though, in my career with this language I've never once written a super function. I just
think about things in a different way so that that style of dependency that I've come from,
I just haven't found the need for it. So if you find yourself wanting to have super functions,
you might step back and figure out: why do I think I need that? Maybe there's a simpler
way to think about this.
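If a super function is wanted, capturing it in the closure might look like this. The overriding toString and its text are hypothetical.

```javascript
function gizmo(id) {
    return {
        toString: function () {
            return 'gizmo ' + id;
        }
    };
}

function hoozit(id) {
    var that = gizmo(id),
        super_toString = that.toString;     // capture before overriding

    that.toString = function () {
        return 'hoozit, formerly ' + super_toString.apply(that);
    };
    return that;
}

var label = String(hoozit(3));  // 'hoozit, formerly gizmo 3'
```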
Here's another thing we can do. I want to have a memoizer, which will remember the result
of previous callings of a function — particularly recursive functions — so that we can avoid
doing some work. For example, factorial can be given a recursive definition in which it's
the product of the value and of calling factorial on the value diminished by 1. If you're computing
a table of factorials, you could spend a lot of time going over the same ground over and
over and over again, and this function will prevent that.
What I'm going to pass to the memoizer is an array containing some of the values that
we're going to remember. The results for factorial of 0 and factorial of 1 will be 1 and 1, so
we'll pass that in to get it started, and then we'll also pass in a function that defines
what a factorial step is. In this case, it's multiplying n times the recurrence on n minus 1.
When we go up to the memoizer it takes that memo array and it takes the formula we just
passed in, and it will create a recurrence function, which is the thing that we'll call
for each iteration, which will first look to see if we already have the result that
we need in the memo array. If we do, then we're done. If not, then we will call the formula
passing in itself, its own recurrence function, so that it can do the next step.
Where this is a big win is in computing Fibonacci, because Fibonacci recurs on two legs at the
same time, so it gets explosive. If you do a Fibonacci of 40, say, it's in the hundreds of millions
of iterations, and this gets it down into the tens. So even though the program looks
a little bit more complicated, it's hugely more efficient. Again, this is happening because
of closure, because the recur function closes over the memo array and over the formula that
we're recurring on.
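A sketch of the memoizer with both factorial and Fibonacci defined through it. The formula_calls counter is an addition for illustration, to make the saving visible.

```javascript
function memoizer(memo, formula) {
    var recur = function (n) {
        var result = memo[n];
        if (typeof result !== 'number') {
            result = formula(recur, n);     // formula gets recur so it can take the next step
            memo[n] = result;
        }
        return result;
    };
    return recur;
}

var factorial = memoizer([1, 1], function (recur, n) {
    return n * recur(n - 1);
});

var formula_calls = 0;      // illustration only
var fibonacci = memoizer([0, 1], function (recur, n) {
    formula_calls += 1;
    return recur(n - 1) + recur(n - 2);
});

var f5 = factorial(5);      // 120
var fib40 = fibonacci(40);  // 102334155
```

With the memo array in place, computing fibonacci(40) runs the formula 39 times, once for each n from 2 to 40.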
One bit of warning about functions: don't declare functions in a loop. Don't make functions
in a loop, for two reasons. One is it can be wasteful, because a new function object
is created for each iteration. It's just wasteful. JavaScript compilers tend not to do any kind
of loop-invariant analysis, so anything you're doing in a loop that doesn't change
over each iteration, you probably want to move it out of the loop anyway just to make
it go a little faster. But the bigger reason is that it gets really confusing, because
you think that you're closing over the values the loop variables had when the function was made, but you're actually
closing over the variables themselves, so you see their current values, usually their final values, and that's almost always not
what you want.
Let me show you an example of a really common error. Say you've got an array of divs and
you want to attach an event handler to each one. You go through the array in a loop and
for each one you want to add an onclick handler which will display its ID number when it's
clicked on. What you find is that they all come up with the same number, and it's the
wrong number. You wonder, how did that happen? It's because when you add the function to
onclick it's closing over div ID, which is constantly changing. By the time you finally
get around to clicking on them you're going to be getting the final value, which was the
value that kicked you out of the loop.
The way you get around that is by creating a separate function which you're going to
use to assign the functions to the event handler. Here I have a function called make_handler
which will take the div ID and return the event handler function. Then within the loop
we call make_handler and take its result and stuff it into onclick. By doing that, we avoided
creating any functions inside of the loop, and that way we avoided the confusion that
came from that problem with closure.
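Both the error and the fix can be reproduced without a browser. In this sketch the divs are plain objects standing in for DOM nodes, the ID number is simply the loop index, and the handlers return the number instead of displaying it; all three are assumptions made so the example is runnable anywhere.

```javascript
var divs = [{}, {}, {}];    // stand-ins for DOM nodes
var i;

// The common error: every handler closes over the same variable i.
for (i = 0; i < divs.length; i += 1) {
    divs[i].onclick = function () {
        return i;
    };
}
var wrong = divs.map(function (div) {
    return div.onclick();
});                         // [3, 3, 3]

// The fix: a separate function makes each handler, so each handler
// closes over its own div_id parameter.
function make_handler(div_id) {
    return function () {
        return div_id;
    };
}
for (i = 0; i < divs.length; i += 1) {
    divs[i].onclick = make_handler(i);
}
var right = divs.map(function (div) {
    return div.onclick();
});                         // [0, 1, 2]
```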
Here I have two versions of the factorial function that do exactly the same thing. The
only difference is that one of them uses a variable, and the other uses a parameter to
represent result. Otherwise, they're exactly the same. R. D. Tennent wrote a book called
'The Principles of Programming Languages' in which he demonstrated the Principle of
Correspondence, which was a correspondence between variables and parameters. JavaScript
demonstrates it really well. This shows that you could imagine a subset of JavaScript which
didn't have variables — would that still be a useful language? It turns out yes, and
this is the proof that anything you can write with variables you can write without variables.
You can use a function closure instead to do the same thing.
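The two versions of factorial could be written like this. The function names are assumptions chosen to tell them apart.

```javascript
// result is a variable.
function factorial_with_variable(n) {
    var result = 1;
    while (n > 1) {
        result *= n;
        n -= 1;
    }
    return result;
}

// result is a parameter of an immediately invoked function.
function factorial_with_parameter(n) {
    return (function (result) {
        while (n > 1) {
            result *= n;
            n -= 1;
        }
        return result;
    }(1));
}

var by_variable = factorial_with_variable(5);   // 120
var by_parameter = factorial_with_parameter(5); // 120
```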
We can take that thought experiment one crazy step-off-the-edge farther. Suppose we had
a language in which we didn't have variables and in which we didn't have assignment, and
we didn't have named functions. Could we still do recursion? It turns out you can. I'm not
sure you'd want to, but you can. Here is the strangest artifact in computer science: it's
called the Y Combinator. It's a function. It's a really complicated function, although
it's not very big. It's incredibly nested; functions within functions, calling themselves,
passing themselves as parameters to themselves. I call Y passing in a factorial formula. It
returns a function, and the function it returns is the recursive factorial function.
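One way to write the Y combinator in JavaScript is sketched below. The variable names and the exact form of the factorial formula are assumptions; the names y and factorial are bound here only for convenience, and the recursion itself never refers to them.

```javascript
function y(le) {
    return (function (f) {
        return f(f);                // pass the function to itself
    }(function (f) {
        return le(function (x) {
            return f(f)(x);         // rebuild the recursive function on demand
        });
    }));
}

// The formula receives the function to recur with; it never names itself.
var factorial = y(function (fac) {
    return function (n) {
        return n <= 1 ? 1 : n * fac(n - 1);
    };
});

var number120 = factorial(5);       // 120
```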
This is really wild stuff. If you can figure this out, you can call yourself a computer
scientist, because this is the really good stuff. You can express this stuff in JavaScript
— I mean, JavaScript is right up there with Lisp and Scheme. It is a functional language.
You can do this stuff. While this may have little practical value, in terms of increasing
your powers as a programmer, this is the stuff to be playing with. You can get really, really
deep. I see a lot of people playing with their Ajax stuff, or wanting to show off — look
at all the stuff I can do — and sometimes doing things which are probably reckless and
ultimately not very smart. If you want to show that you're really smart, you ought to
be doing this stuff. You know, off to the side, where you're not going to hurt anybody.
[laughter]
JavaScript has good parts. It has really good parts. And these, I think, are the best of
the parts. Again, this comes as a big surprise, because when JavaScript was introduced nobody
expected there was anything good about it at all. The stuff that is good about this
language is in there intentionally, by design, it wasn't accidental. You don't get stuff
this good by accident. This is an amazingly good language. And that's why Ajax happened
— we'll be talking a lot more about Ajax next week.
The reason I was able to discover that JavaScript had good parts was because I knew something
about functions. The place where I first learned about functions was in a little book called
'The Little LISper', which I highly recommend to you. The current edition of it is called
'The Little Schemer' — it was updated to be about Scheme. It's not really about Scheme;
there isn't very much Scheme in the book. It's mostly about functions, and it's really,
really good.
It turns out that everything in the book can be written in JavaScript. Although Scheme
and JavaScript couldn't be more different syntactically, at their roots they're surprisingly
similar. There's a simple transformation from one language to the other; it's surprisingly
simple. If you go to this web page, it'll show you exactly what they are, and that'll
give you enough to be able to read and write the examples in the book. I highly, highly
recommend that you go out and get this book. It will change the way you think, and there
are very few books that do that. This is one of those books.
Next time we meet: The Metamorphosis of Ajax. It'll be awful.
[laughter]
See you here. Thank you, and good night.
[applause]