"Verbatim" - Erin McKean speaks at Google


Uploaded by Google on 16.07.2007

Transcript:

SPEAKER 1: Erin and I met over a shared enthusiasm for Buffy
the Vampire Slayer.
Erin has a wide-ranging interest in popular culture, a
passion for E. Phillips Oppenheim, an interest in
the American corpus of spoken and written language, and her
own website www.dressaday.com, which addresses the importance
of 1950s novelty fabrics and circle skirts.

Erin's interests are quirky and wide-ranging, and her
scholarship is thorough.
I think you'll enjoy hearing from her.
Ladies and gentlemen, Erin McKean.

ERIN MCKEAN: Wow.
Betsy's now my official PR person.
I am really, really, really happy to be here, because I
feel about Google the way most people feel about rock stars.
I am this close to writing Google fanfic.

So when I was talking to Pamela about what I could
possibly talk about today, I gave her a list. And she said,
talk about everything.
So that's what I'm going to try and do.
I'm not going to talk about everything in alphabetical
order, which is what I prefer.
But before I started, I was at a design conference in
Pasadena a couple days ago for the Art
Center College of Design.
It was overwhelming.
It was really exciting.
And I got to meet a lot of designers and
very creative people.
And one of them was someone who worked at Applied Minds.
And she said that before I get started ever talking to
anybody, I should always tell them what my favorite word is.
And when somebody who works with Danny Hillis from Applied
Minds tells you to do something, you do it.
So my favorite word, I'm a little embarrassed to admit it
because it's a little egotistical, because my first
name is Erin and my favorite word also starts with those
four letters.
It's erinaceous, which means of, like, or
pertaining to hedgehogs.

And once you know about that word's existence, you just
look for reasons to use it.
You know, I really wanted to be nice to him, but he was so
erinaceous.
Very prickly.
I like to use it about prickly people.
I like to use it about people who like to go through
underpasses under streets in Britain, which is also things
that hedgehogs do.
But the first thing I wanted to talk about today is 10
things that I wish everybody knew about dictionaries.
This is a very selfish topic to speak on, because every
group that I, quote unquote, "educate"
makes my life easier.
But the first thing that I wish everybody knew about
dictionaries is that there is, in fact, not one book called
The Dictionary.
You will never find a book on a bookshelf where the spine
says The Dictionary.
People often believe there's this platonic ideal of the
dictionary that is actually instantiated that all
lexicographers in the world work on the same tome, and
that there is this one size fits all.
Every word that's ever been in any dictionary is in The
Dictionary somewhere.
They never have actually managed to point me towards
the physical location of The Dictionary.
But one size does not fit all with dictionaries.
Because a dictionary is a tool.
How many people here have more than one screwdriver?
Everybody's got more than one screwdriver,
because it's a tool.
You need Phillips head.
You need flat head.
You need those really cool ones that help
you fix your computer.
You have plenty of them.
Because you want one for every job.
And a dictionary is also a tool.
And you want one for every job.
How many people here have more than one dictionary?
Hey.
Oh my god, you are my people.

So you guys already know this.
I could just cross this one off the list. But you'd be
surprised how many people write to me and say, I need to
use this word, and it wasn't in my dictionary.
And this is often by email.
So I email them back, and I say, what dictionary was it?
And they say, oh, well, it's my breast pocket dictionary.
And I'm like, well, when was it published?
And they say, 1978.
And often, the word they're trying to find is internet.

And so, I was like, OK.
You have just tried to drive in a nail with a screwdriver.
You used the wrong tool for the job.

People often come up to me and say, what
dictionary should I use?
And they often expect me to say, well, everyone should use
the OED, the Oxford English Dictionary.
That's the wrong tool for a whole bunch of jobs.
For one thing, they started revising it in M. So anything
prior to M has been fairly spottily revised since 1989,
and in some cases, since 1936, and in some
cases since the 1800's.
So there are a lot of gaps there.
It's a very specialized tool.
But it's not the right tool for everybody.
And then, some people then expect me to say, in a very
kind of booster-ish marketing way, you should use the Oxford
American Dictionary.
Of course, that's the best dictionary for all
circumstances, all purposes.
But I really feel that people should use the dictionary that
fits well with them.
Find the one that you like.
Find the one that feels good to use.
Because a tool that you don't use might as well be a
paperweight.
And I know that people use their dictionaries as door
stops or to press flowers, or to prop up toddlers on chairs
at dinner time.
And these are all good and reasonable uses for a
dictionary.
But it's not the intended use.
It's all off-label use of a dictionary.
So find a dictionary that fits with you.
Look up one of your favorite words, and read the
definition.
See how it feels to you.
Look up a word you know a lot about.
Does the dictionary jibe with your way of thinking?
You want a dictionary that fits with your mindset.
Look up a word that you don't know anything about,
especially that you don't know how to pronounce.
Cover up the head word, and read the pronunciation.
And see can you guess how to pronounce it from the
pronunciation?
If you don't understand the pronunciation key, it's a tool
that's going to fail you.
So find the one that fits for you.

Nobody buys a car without test driving it first. Test drive a
couple dictionaries.
Because hearing from me that the New Oxford American
Dictionary is the best dictionary for you is
ridiculous unless you test drive it first. Now, if you do
test drive and you love it, I will be really,
really, really happy.
But if you buy it on my say-so and you hate it and you never
touch it, that's worse than never buying it
at all, in my opinion.

Once you drive that dictionary off the lot, the next thing I
wish people knew about dictionaries is you have to
read the manual.
People don't believe or understand or think that
dictionaries come with a manual.
But every one does.
And it's called the front matter.
And it is a universally-held and absolutely true belief
that nobody reads the front matter.
I could put obscenities in the front matter in 18-point type.
It would be 15 years before somebody wrote me a letter.
I've been sorely tempted.
I know all of you are going to go flip through it now.
Did she?
Did she?
But people don't read the front matter, because
everybody believes I know how to use the dictionary.
I don't know.
Maybe you have the same problem with tools that you
build that nobody ever reads the user manual, nobody ever
reads the user guide.
And it's true.

If something's just barely above the baseline of
well-designed, you're like, I know how to use that.
And then, when things go wrong, they don't blame
themselves for not reading the user guide.
Luckily, dictionaries hardly ever burst into flame.
So the downside to not reading the user guide is lower than
for, say, some pieces of electronic equipment.
But there's a lot of complicated stuff going on in the
dictionary.
There's a lot of information that is being compressed.
It's so compressed.
Someday I worry that the dictionary will actually reach
critical mass and will implode.
That the data will be so tight and put in such a small space
that we'll have a black hole.
Luckily, all the printing is done very far away
from where I live.
So if it does black hole, I won't be
within the event horizon.
So there's a lot of stuff in the dictionary that is not
really as transparent as it looks.
For instance, some of the other dictionaries, not the
Oxford American, they arrange their definitions by the
oldest first. Eight people out of 10 believe that the most
important definition is first. So if you look up the
word mystery in this dictionary, the first
definition is not a puzzle.
The first definition is the sorrowful
mystery of the rosary.
And that freaks people out.
Are people really talking about the rosary that much for
it to be the most common definition?
No.
And if they'd read the front matter, they would understand.
The front matter tells you what we believe the purpose of
the dictionary is.
It tells you what assumptions we've made.
It tells you the method that we've used
to put entries together.
It tells you our intent.
For instance, when we label certain things as dialect or
humorous or poetic, we explain what all that means.
So I still believe nobody's going to
read the front matter.
But if you have 10 spare minutes, just flip through it.
We've included really good essays in the Oxford American
Dictionary's front matter.
There's one by an etymologist. There's one by a phonologist
that explains how American voices are.
The etymologist explains what we can and can't know about
the history of words.
Nobody ever touches those, which is very sad-making.
The next thing I wish people knew about the dictionary is
that inclusion in the dictionary is not the legion
of word honor.
I am not the curator of the word museum.

The dictionary is not the word's social register.
I don't give out word vouchers for Almack's
for the dictionary.
Words are not Pinocchio.
They are not trying to be a real boy.
The reason that a word is in the dictionary is because it's a
good tool for reading and writing.
It's useful.
We have evidence that it exists, except for the ones we
put in as copyright traps.
But the idea of being in the dictionary is not something
that words aspire to.
What we're trying to do is make a collection of words
that we think readers and writers and speakers need to
know about.
We're making a toolbox.
And if you have the toolbox full of really beautiful,
hand-made, tiny, left-handed monkey wrenches, that's not a
useful toolbox.
So a lot of people will lobby for words that are just
beautiful, that are just gorgeous.
There's a professor in Michigan who wants the word
presticogitation to be in the dictionary.
And presticogitation is fancy fast sleight of thought.
You're thinking so fast that you've confused everybody in
the room, like you're a magician.
It's a great word.
It is useful to I don't know who.
So until a lot of people show us that they're taking it up
and that they're using it, it's not going to make it in
the dictionary.
And a lot of people feel like if a word's not in the
dictionary, is it really a word at all?
And I always say, think about dogs.
You can have the best pet ever.
He can fetch your slippers.
He can lie at the foot of your bed at night.
He can play with your children, and be like a world
champion Frisbee dog.
Would you say that dog wasn't a dog if it wasn't pedigree?
If it wasn't registered by the AKC, does that make
it less of a dog?
No.
A dog is as a dog acts.
And a word is as a word acts.
And being in the dictionary is an accidental thing.
It's so very close to random.
We found it.
First, lexicographers have to find the word.
Then, we have to figure out how people use the word.
And then, we have to see if it fits.
So there's a whole rigamarole that words have to go through to be
in the dictionary.
And there are lots and lots of very useful real words that
just haven't made it in yet.
And so, don't use the dictionary as a rule of thumb
as to whether or not you should use a word.
As to how you should spell it?
Yes.
As to whether or not you should use it?
No.
Because if you love a word, and you use it, and you get
other people to use it, then eventually it will go in the
dictionary.
And you can affect that.
So just because something happens to be included in its
correct alphabetical order in a 2,000-page book,
it's not a significant achievement for that
particular word.
And the fourth thing I wish people knew about dictionaries
is that lexicographers don't love all the
words that we put in.
And we put in a lot of words that are,
quite frankly, horrible.
Sometimes they sound terrible, like phlegm.
Sometimes they're words that just grate on the nerves of
people who have a highly-developed language
sense, like irregardless.
But we put them in.
Because people use them.
And there's important information that you have to
know about these words.
And people say, well, why can't you leave these
horrible words out?
Just ignore them.
But ignoring bad things and hoping they go away is a
strategy of failure.

Our theory, our philosophy is it is worse to ignore than it
is to warn.
And a great example of this is the words flammable and
inflammable.
So a while back, they realized that when they labeled things
inflammable, people tended to throw lit cigarette butts in
them, with the consequences that come from that.
And so, I know that there are people who were just really
upset at the change from inflammable to flammable.
But is it really better to hold on to that language
distinction?
Or should you let people immolate themselves?
So when we find words that we feel are
problematic in this way--
They offend large groups of people when they are used.
They offend usage mavens.
They offend eighth grade English teachers--
we make notes.
Because this is important information about the word.
It's just as important as the pronunciation.
It's just as important as the spelling.
It's just as important as the meaning and the etymology and
the example sentences.
Unfortunately, we can't print them in quite as big a type as
we like, or quite as direct as we like.
The note for inflammable does not say, don't throw
cigarettes in this, idiot.
It's not quite that blunt.
But we do warn.
And we do that through usage notes and we do
that through labels.
Because it's better to tell, to share information, than to
just put it aside and hope it goes away.
Another thing, the fifth thing I wish people knew about
dictionaries, specifically dictionary entries, is you
should treat every dictionary entry like it's the movie The
Usual Suspects.
You gotta stay all the way to the end to figure out what's
going to happen.
A lot of people don't like to use a dictionary.
So they look at an entry, and the first thing that's even on
the same continent as the meaning they're trying to
find, they grab.
They slam the book shut, and they run away.
And a lot of times, the meaning that they're really
looking for, the understanding that they're really seeking,
is three or four lines down the page.
And most dictionary entries are pretty short.
Of course, set runs to several columns.
But you really don't have to fight your way
through all that much.
Reading even the biggest dictionary entry should take
you less than half an hour.

Everybody in this room, I feel confident making that
projection.
Now, I talked to you before about the dictionary that's
arranged in historical order and some dictionaries arranged
in rough frequency.
New Oxford American Dictionary has an arrangement that is
unlike any other American dictionary.
And we call it core sense, subsense.
So what this means is that we are looking for what the most
psychologically salient meaning of a word is.
It may not be the most frequent.
It may not be the oldest. But this is what is in your direct
field of vision when you're thinking about the word.
And then, from that core sense, we show the subsenses
that are in your peripheral vision.
And the way to think about this is we'll go back to dogs.
Everybody in the room, think of a dog.
Did anybody think of a St. Bernard?
Did anybody think of a Teacup Chihuahua?
No.
You thought of a canonical dog.
That doesn't mean St. Bernards and Teacup
Chihuahuas are less doggy.
It just means they're further off the dog continuum than
whatever dog you thought about.
Probably Labrador, Golden Retriever, medium-sized dog,
Jack Russell Terrier.
You thought of a very doggy dog.
So with the core sense, you're thinking about the core
of the word.
And the subsenses are related.
They're metaphorical extensions.
They're newer senses of something that's a core sense.
When you're thinking about a clock, a lot of people think
of an analog clock.
But we also have the digital clock sense connected to that.
Now, if it were by historical order, analog clock would be
way up towards the top.
Digital clock would be way towards the bottom.
How is that helpful?
You want to group like with like.
So one of the examples I use for this is the word belt.
The core sense is the thing that holds up your pants.
But then, you have subsenses that are
things like a fan belt.
And a black belt in karate.
And to hit somebody with a belt.
So those are all subsenses related to the core sense of
the word belt.
And sometimes we get into trouble like, how we divide up
the core senses.
For instance, asteroid belt is a separate core sense of belt.
Because, quite frankly, there are no pants in space.
So we had to separate that out a little bit.
So that's how we arrange the entries.
And when you're looking at a core sense, subsense entry,
you can play hot and cold.
You look at the core sense, and you're like, is this close
to what I want to know?
And if it isn't, skip all the subsenses, because we've
already arranged them semantically for you.
Then you can go on to the next core sense.
But I really advise you to read to the end.
Because especially if it's a word that was unusual enough
just to drive you to the dictionary, it's much more
likely to have a twist at the end than you expect.
Another thing I wish people knew about dictionaries that
is hopeless.
Nobody's ever going to do this.
But you know how the fire department every year when you
put your clocks back tells you to replace the batteries in
your smoke detector?
I kind of wish that there was a day like that every year for
dictionaries.
Where it was, how old is your dictionary?
Is it time to replace your dictionary?
Of course, we couldn't have the great graphics of, like,
cute kids in their pyjamas outside on the street while
their house is burning down to motivate people to change
their batteries.
But I do wish that people knew that dictionaries have an
expiration date.
Now, obviously, an online dictionary, like the OED--
oh, and I totally, totally recommend the online OED over
the print OED.
The print OED is snazzy and is a much better indicator of
class and status.
But it's huge.
And it hasn't been updated since 1989.
And the online dictionary, you can do all
sorts of cool searches.
Like, you can look for every word that the first cited
author is Samuel Taylor Coleridge, that came from
Greek, that has a Q in it, and see if that gets you any hits.
You could search it nine ways from Sunday.
And it's updated quarterly.
So maybe we should start selling lapel pins that say "I
have access to the online OED" so that people who want to buy
one to decorate their house, that kind of thing, would also
have that satisfied.
But print dictionaries have an expiration date.
And there are all sorts of things that change that you
don't notice that they change.
A lot of retronyms, which is the digital and analog clock, the
electric guitar and acoustic guitar.
When we invent new things, they may not get new names,
but we have to differentiate them in some way from how they
are different from what went before.
Now, the dictionary expiration date is not like milk.
You don't have like, two years.

Probably a good example is that if you got one for your
high school graduation or when you went to college, and you
go to a reunion and everybody's bald, it's time
for a new dictionary.
But you don't have to throw away the old one.
A lot of people say, oh, if I get a new dictionary, I'll
have to throw away the old one where I made all these notes.
And I'm like, how small of a house do you live in?
One does not replace the other.
You get to keep it.
But I wish that people would update their
dictionaries more.
And not from a purely selfish reason, I would like to sell
more dictionaries.
But because also, it's a better tool.

I have a cordless drill.
I also have a drill with a really, really
long cord on it.
I keep them both.
But when I got that cordless drill, that was
a much better tool.
It was really an improvement.
And I think people will sometimes have that same
experience, especially if they, quite possibly, happen
to be going from another company's dictionary to mine.

The next thing, the seventh thing that I wish people
understood about dictionaries.
I call this the facts are good thing that people should know
about dictionaries.
Dictionaries should be based on observation.
There's a long tradition in dictionary making of
introspection.
And the ideal is that a lexicographer can just sit
there, in a tastefully-appointed room and
inspect the library of their own mind and say, this is how
this word is used.
This is what this word means.
That is bunk.
I spend more time thinking about words even more than I
spend thinking about dresses, which is an awful lot of time.
I spend so much time thinking about words.
And the more time I spend thinking about words, the less
I know that I know.
I have this theory that omniscience is tied to
physical mobility.
So the more mobile you are, the less omniscient you are.
And the more omniscient you are, the more likely you are
to be a head in a jar.
And I am so far away from head in a jar.
I don't know everything, not even with Google.
And so me basing the dictionary on what I know from
my own thinking, and from trying to puzzle out what
people do is stupid.
You don't want a faith-based dictionary.
You want a dictionary to be evidence-based, like
evidence-based medicine.
We look and we see how do people really use this word?
And how we're doing that is we're building a corpus, which
is a big, big database of running text.
And I often get people say, well, why are you doing that?
Why don't you just use Google?
'Cause we're trying to build a corpus of
about a billion words.
And people say, even if you take all the porn out of
Google, you're still going to have more
than a billion words.
But we're also looking for a balanced text.
And webpages tend to be made by people who
really like the internet.
And it's not really indicative of all the different speakers
of English.
So I'm looking for journal articles, magazine articles,
newspapers, fanfic--
That I find a lot of online.

We have a database of direct mail solicitation letters.
Because that's a real use of English.
I'm looking for texts of comic books, novels, nonfiction.
Anything that's printed.
Anything that's online.
All sorts of stuff.
Because I'm trying to get a good cross section picture.
It's like an ice core, basically.
I'm drilling down through English to figure out what all
the different layers are, like rings on a tree.
And by looking at that, we get a fairly decent picture of
what English is.
And it's still not as balanced as we'd like, because we don't
have as much spoken language.
And that's really hard.
People don't like it when you follow them
around with a tape recorder.
But we're working hard on it.
And that's what we're using as our evidence.
And of course, we still have citation cards.
Dictionaries used to be almost completely
based on citation cards.
And the difference between a citation card where somebody
finds a word and an interesting use, they type it
up on a three by five card with the head word and the
sentence, and then they send it to us.
I always explain it as people used to study
butterflies this way.
They'd go somewhere exotic.
They'd find a butterfly.
They'd kill it.
They'd pin it to a card.
They'd send it back to the museum.
And somebody at the museum would study it there.
And contrast that with studying the
butterfly out in the field.
There's so much more information to be gathered
from seeing the butterfly in its natural habitat than
finding the one that stinks of formaldehyde on a card.
And so, the usages that trigger people to send us
citation cards, those are still valuable to us.
But we don't get the full picture until we go out and
find that same butterfly in running text, until we find it
in context.
And that's really what the corpus is about is sending us
off to the word Amazon to investigate the word
butterflies.

I wanted to give people an idea of what a
billion words mean.
So I've invented a new unit of measurement.
And I call it the ulysses.

Ulysses has about 260,000 words.
So if you think of a billion-word corpus--
and please correct me if my math is wrong--
that's like, 384 Ulysses.
So that's nothing, right?
If you think of that.
We could be building more and more and more than that.
But just put it in that context.
A billion words sound like a lot.
But I'd like to get even more.
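
As a rough check of that arithmetic, here is a minimal sketch. It assumes only the round figures quoted in the talk (a one-billion-word corpus, about 260,000 words in Ulysses). The constant and function names are made up for illustration. On those figures the result is close to 3,800 ulysses, not 384.

```python
# Round figures quoted in the talk; both are approximations.
CORPUS_WORDS = 1_000_000_000   # target corpus size in words
ULYSSES_WORDS = 260_000        # approximate length of Ulysses in words


def in_ulysses(words: int) -> float:
    """Express a word count in 'ulysses' units (hypothetical helper)."""
    return words / ULYSSES_WORDS


print(round(in_ulysses(CORPUS_WORDS)))  # 3846
```
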
The eighth thing that I wish people knew about dictionaries
is that they're like icebergs.
90% of a dictionary is below the surface.
The 10% that shows up in print--

for every character that shows up in print, there are eight
or nine characters of tagging.
The XML tags that let us sort the dictionary, the ontology,
the taxonomy.
And so, people think, well, you've just done 2,000 pages.
That's not very much text.
But all the tagging that supports this and the tagging
that we keep augmenting, a dictionary can
never be tagged enough.
I don't think I'm going to be happy until there are a dozen
characters for each character that prints.
Till there's two dozen characters for each character
that prints.
Because there's a lot of information that can't really
be disseminated in that linear way that the dictionary uses
that we need to know.
When we're doing semantic sets, when we're doing that
kind of tagging, it would be nice if you could hit a button
on your dictionary and it switched
itself into a thesaurus.
We can't do that yet.
But maybe some day we can.
And I'm kind of tagging
for the future.
I'm trying to lay a foundation in the data that when the
physical format of the dictionary reaches the
singularity, that point where I can't predict what a
dictionary's going to look like any more, I can at least
hope that I will have done enough groundwork so that some
poor lexicographer in 2030 doesn't have to go back and
retag everything.
The ninth thing that I wish everybody knew about
dictionaries is only kind of tangentially about
dictionaries.
And I would like to call this beware the good etymology.

The better the story is behind a particular word's origin,
the more likely it is to be completely made up.
Because most etymologies are boring.
They say Latin.
That's what we know.

The better the story, the less likely it is to be true.
Think about how many people are in your extended family.
How many of them are eccentrics?
Right?
It's not everybody.
Not everybody can come from a family that everybody's weird.
There's got to be at least one mundane person.
Extrapolate that to etymologies.
And there's a long tradition of folk etymology.
I'm particularly fond of the group that we call CANOE,
which is the Committee to Assign Nautical Origins to
Everything.

Find a word you don't know?
It's from boats.
They'll invent nautical technology just to give a word
a nautical origin.
I'd love to go to one of their meetings one day.
Or everything's got to be an acronym.
I'm sorry, posh is not an acronym.
The f-word is not an acronym.
We call them backcronyms. People see a word and they're
like, how can I torture a sentence to make it even be
vaguely the origin of this particular word?
We don't even see backcronyms start until the '50s.
So any word that comes from Anglo-Saxon, they were not
wandering around, fighting the Normans, trying to come up
with acronyms. It just didn't happen.
And the last thing I wish that people knew about dictionaries
is the right way and the wrong way to create a word.
It's much, much easier to tell you what the wrong way is.
The wrong way to get a word in the dictionary is to send me
lots of email about it.
It's even wronger if you'd like to copy the
president of Oxford.

It's almost the perfect storm of wrong if you say, I
understand you pay $1,000 per definition.

Unfortunately, this is absolutely true.
This happens quite a bit.
We don't pay $1,000.
That's not the part that's true.
The part that's true is people think that this is the right
way to go about it.
The right way is to find a real need, find a lexical gap,
find some word that doesn't exist. One kind of
traditional example of a lexical gap is that we have a word
for people that have lost their parents.
They're orphans.
But we don't have a word for parents who have suffered the
unbelievable trauma of losing a child.
I don't know why this is in English.
We don't have the parallel word to orphan.
Now, is this perhaps we just don't talk about it as much?
Who knows why there's lexical gaps in English.
But you want to find a gap.
And you fill it.
Or a really interesting new metaphorical way
to talk about something.
New metaphor is like, you know those faults on the bottom of
the ocean that keep bubbling up forever.
And they spit chemicals into the water, and how they're
like teeming with life?
Metaphor is that for the English language.
When you think about an oblique way, if you're saying,
this is actually this, that creates new words.
Don't be too funny.
Funny words hardly ever really make it.
And you're much more likely to be successful with something
that's funny peculiar than funny haha.
Even if you think about the funniest new coinage of like
the last 50 years, that word is couch potato.
Nobody laughs out loud at couch potato really.
It was kind of a cute joke.
There's an analogy with boob tuber, if you
think a tuber's a potato.
Any joke you have to explain at that length--
Yeah.
So the less funny you are, the more successful you'll be.
So what you want to do is if you've created a word and you
think it meets all these qualifications--
it's not funny, it's pretty useful--
send it to all your journalist friends and see if they bite.
Because getting a lexicographer to notice it is
by getting it into print.
And if you are going to email me about it, send me 15
citations that aren't all under your byline.

Show me that real people are using it
out in the real world.
And then, you wait.
Wait a long time.
Because, quite frankly, we're backlogged.
And also, I want to see if it's got staying power.
I want to see if it's going to hang around for a while.
I'm not going to put in an alternate spelling of hot with
multiple T's, even though I have plenty of
evidence for it.
Because I don't think it's going to be around
in a year or so.
Even though when I was downstairs looking at where
all the Google searches were, Paris Hilton came up 10 times
in four minutes.
But not hott.
It didn't say Paris Hilton hott.

Now, this is kind of a bonus thing that I wish that you
guys knew about dictionaries specifically is
that dictionary making is not a closed guild.
We are quite embarrassingly grateful for any help.
If you're interested in something, if you think
something would be a great problem.
Right now, we're thinking about--
We get a lot of Google alerts because we have found phrases
that journalists use to set off new words, like scientists
call this x.
So we search on the phrase scientists call it in Google
News, and then they send us an alert.
But what we're trying to do, and which we're completely
hopeless at because we don't have the skills at all, is to
set up some kind of Bayesian filter where the ones that are
really productive bubble up to the top, and we
don't have to read--
Grant Barrett who's the project editor for the
Historical Dictionary of American
Slang set up this program.
I call it a program only in the very loosest
sense of the word.
And since he started working at Oxford about two years ago,
he got 180,000 Google alerts.
And so it gets a lot every week.
And we're trying to figure out which are productive.
And we'll go and we'll look through them.
And some of them drop off the list. And
some of them get added.
But we could do this better.
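
[Editor's sketch, not part of the talk: one way the "Bayesian filter" idea described above might look is a small naive Bayes scorer that learns from alerts an editor has already marked as productive or unproductive, then sorts new alerts so the promising ones bubble up to the top. This is only an illustration under stated assumptions, not the program Grant Barrett set up. The class name AlertFilter, the training snippets, and their labels are all invented for the example.]

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a snippet and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class AlertFilter:
    """Hypothetical naive Bayes scorer: productive vs. unproductive alerts."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, text, productive):
        """Record one alert that an editor has already judged."""
        self.doc_counts[productive] += 1
        self.word_counts[productive].update(tokenize(text))

    def score(self, text):
        """Log-odds that an alert is productive (higher is more promising)."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = self.doc_counts[True] + self.doc_counts[False]
        log_odds = 0.0
        for label, sign in ((True, 1), (False, -1)):
            # Class prior, with add-one smoothing.
            prior = (self.doc_counts[label] + 1) / (total_docs + 2)
            log_odds += sign * math.log(prior)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Add-one smoothing so an unseen word never gives probability zero.
                p = (self.word_counts[label][word] + 1) / (total_words + len(vocab) + 1)
                log_odds += sign * math.log(p)
        return log_odds

    def rank(self, alerts):
        """Sort alerts so the most promising ones come first."""
        return sorted(alerts, key=self.score, reverse=True)


# Invented training data: True means the alert led to a real new word.
filt = AlertFilter()
filt.train("scientists call it a quasicrystal, a new kind of solid", True)
filt.train("researchers coined the term and scientists call it biofilm", True)
filt.train("scientists call it a day after budget talks stall", False)
filt.train("scientists call it quits on the funding fight", False)

new_alerts = [
    "scientists call it quits after talks",
    "scientists call it a new kind of crystal",
]
ranked = filt.rank(new_alerts)
print(ranked[0])
```

[With labels like these, alerts that use "scientists call it" idiomatically score low and alerts that introduce a term score high. A real filter would need far more labeled alerts than four.]
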
We often ask people who are experts in a particular field,
who wants to be our falconry expert?
There aren't as many falconers as there used to be.
So if you know people with unusual hobbies who would be
interested in looking at a particular set of words and
get their name in the front matter that nobody reads, this
is the kind of thing that we're doing.
So really, we would be embarrassingly, desperately
shake your hand so hard it falls off--
That's our reaction to people who say, how could I help?
What kind of cool things could I do?
Did you ever think of x?
Because, quite frankly, there's not untold wealth in
lexicography.
And it really only tends to attract people that are
interested in the problem.
And we're chronically underfunded and understaffed.
But there's always room for another person who's
interested in the problem.
So that's pretty much the top 10 things, plus the bonus,
that I wish people knew about dictionaries.
And Pamela also said that I could maybe talk for a couple
minutes about how lexicographers use Google.
And I was thinking about that.
And I don't know if we use it that much differently than
anybody else.
I did notice that there were a lot of single-word Google
searches downstairs.
I can't remember the last time I put in just a one-word
search term.
We do use Google Book Search a lot.
I love Google Book Search.
We use it a lot when we're looking for an example
sentence, especially for a word.
We're trying to find a real sentence that somebody who was
not a lexicographer thought up.
Because we do try and take all of our example sentences
from the corpus, from real English.
But oftentimes, those sentences have identifying
details that we need to strip out.
We have to generify the sentence.
And so, if we've generified a sentence to the point where it
is made entirely of tapioca and conveys no meaning,
sometimes we'll check that at Google Book Search.
We look for words all the time there.
Mostly, we use Google the way a junkie uses heroin.

I really consider a day where I don't touch
Google a wasted day.
I'm working on an hour.
I think the longest non-sleeping amount of time
that I haven't touched Google was--
non-sleeping, non-trapped-in-a-meeting-room
time 'cause Oxford's not quite as wired as this building--
is maybe like 45 to 50 minutes.
But we have to be very careful, of course.
We know that Google is a map.
It's not the terrain.
So what we find in Google, we always check someplace else in
our corpus and other databases.
And we are also very conscious of the fact that when all you
have is a hammer, everything looks like a nail.
So we try not to get into that mindset about it either.
I was at this design conference, and I was talking
to Jimmy Wales, founder of Wikipedia.
I have no idea why I was at this conference.
Everybody there was so much cooler than me.
And I was talking to Jimmy Wales.
And we were talking about what would you
give up to keep Google?
TiVo.
I'd rather have Google than TiVo.
Everybody I asked said, yes, I would give up the internal
combustion engine.

We were getting to the point where we were like, OK, your
non-dominant hand.

So really the way we feel about Google is kind of like,
you know when you're growing up, and you had a friend who
had a really obnoxious little brother or sister that
followed you around everywhere and was like, look at me.
I can do a handstand.
That's kind of the way we feel about Google.
We're like, wow.
What are they going to do next?
What's so cool?
How can we warp this for dictionary use?

We know that you guys do a lot of stuff
that you don't announce.
So we're always trying like regular expression searches to
see if they magically start working.

Really, how lexicographers use Google is if you think about a
non-word-obsessed user and you just
add in the word obsession.
No.
We surf for MP3s.
We do that kind of stuff too.
But really, what we're trying to find is using Google as a
lens to focus our attention on the English language.
And if Google went away, if I could no longer use any of the
electronic, the technical, the computorial tools that I use
now, if we had the big electromagnetic pulse and it
killed all the computers, I would not be a lexicographer.
I would give up dictionary work if those tools went away.
Because it would be like traumatic brain injury.
I would not know what to do.
I would have to relearn everything from a completely
different perspective.
Although, I have to say, that the one high point, I think,
of traumatic brain injury is that you read all the books
you loved again for the first time.
I mean, after you learn how to read again, obviously.
But if I couldn't have these tools, I would stop working on
dictionaries.
I think I would be a mailman in Portland, Oregon.
That's pretty much the number two choice.

I guess what I'm trying to convey is a really deep and
abiding love.
I want to thank all you guys for just doing what you do.
I can't believe that there could be anything more fun
other than my job.
So Pamela wanted me to leave a little
bit of time for questions.
I'd be happy to entertain any of them.
But I would only like to field questions that
can't be Google smacked.
So if you ask me something that you could have Googled, I
mean really.

Uh-huh.
AUDIENCE: How do you feel about Dictionary.com?
ERIN MCKEAN: Well.
AUDIENCE: Can you repeat the question?
ERIN MCKEAN: How do I feel about Dictionary.com?

I really don't think that there's such a thing as a bad
dictionary.

I'm not going to say that Dictionary.com is bad.
But there are two things I would like to
point out to you.
Dictionary.com is not about delivering dictionary data.
Dictionary.com is about delivering advertising and
about delivering eyeballs to advertising.
Now, this is a good goal.
The dictionary that they are using, for the most part, is
out of copyright, which means it's 1918.
Now, I said that the dictionary expiration date was
not like that of milk.
But considering there are time capsules that have been put in
and opened since 1918, that's just something I wish people
would keep in mind.
But they're really nice guys.
I like them, personally.
Uh-huh.
AUDIENCE: How does a proper name get into the dictionary?
ERIN MCKEAN: How does a proper name get into the dictionary?
There are kind of like two paths to getting into the
dictionary.
American dictionaries put in a lot more proper names than
other dictionaries around the world.
Because Americans really expect a lot of encyclopedic
information from the dictionary.
I once called the people who do the Q ratings, which
measure how famous people are, and said, hey, wouldn't you
like to give me all your data so I can use that to decide
who goes in the dictionary?
And they said, no.
Click.
But what we really do is when we're thinking about the
biographical entries, we look at people who have historical
importance.
If you were president, you're in.
If you were vice president, you're in.
If you were first lady, kind of.
You had to be a fairly famous first lady.

For pictures, we put in pictures that are free.
So Secretaries-General of the United Nations get a lot of
pictures 'cause they're out of copyright.
And also astronauts.
Man, NASA?
They are so good with the pictures.
'Cause we have no money.
And we want to spend all the money on words.
The pictures are kind of an afterthought unless they're
like things that can't be described in words, like
various gears and so forth.
We try and put in people that have achieved a certain level
of notoriety as well.
People might want to look them up to find out details of
their lives.
One of the most nerve-wracking moments I ever had was
watching the O.J. chase.
I had just finished work on a children's dictionary, and I
was like, did we put O.J. in?
Please let it be no.
'Cause children's dictionaries are a place where you don't
really want notorious people.
And I was in an airport.
And it was going to be 22 hours before I could get back
and check the book.
And there was nobody that I could call.
My husband was like, just stop thinking
about it for 30 seconds.
Think about the poor people that lost their lives.
Stop thinking about was he in the dictionary.
We would like to do a fuller biographical dictionary.
But we haven't really seen a demand for a
print product on that.
And, of course, Oxford has the American National Biography
and the Dictionary of National Biography, which covers just
about everybody important who ever died.
They have to be dead.
Because those don't change.

Anybody else?
Yes.
AUDIENCE: What point do you release a new edition of the
dictionary?
What goes into that decision?
ERIN MCKEAN: Oh.
Well, for the print version, we wait till we
run out in the warehouse.
So it's actually not a very real--
What point do we release a new edition of the dictionary?
I forgot to mention that the New Oxford American Dictionary
and the Oxford American Writer's Thesaurus are the
dictionaries in Apple OS X.
So we have an arrangement with them where we're trying to
send them updates on a regular basis.
But there's like one guy in a bunker in Cupertino who
decides when it gets updated.
And I'm not quite sure when he makes that decision.
Or she.
I don't know.
My email goes through like three levels of email washing
before it ever gets to anybody.
But Apple is really fun to work with.
So we're trying to update on a quarterly basis.
And once we launch our online version, that will get updated
on a quarterly basis.
Not just because we like to set words free
as soon as we can.
But also because it gives us more chances to make people
think about the dictionary.
If you're talking about new words every three months, some
of those will stick to people.
And some of them will say, oh, maybe I should
look something up.
But other than waiting till we've run out in the
warehouse, there's nothing quantitative.
We don't wait until we get 1,000 new words and then, hey,
update time.

Yeah?
Pam.
Do we have to stop?
AUDIENCE: No.
I have a question.
ERIN MCKEAN: Oh, OK.
AUDIENCE: Do you ever retire words from the dictionary?
And, if so, how do you make that decision?
ERIN MCKEAN: Oh yeah.
We do retire words from the dictionary.
I haven't retired very many yet, because the New Oxford
American Dictionary is only in its second edition, which, in
dictionary terms, it's a toddler.
It's the newest American dictionary.
It's even newer than Encarta, believe it or not.

It's kind of like when you move into a new house.
We haven't used up all the storage yet.
We haven't had to rent the mini-locker for the words that
we can't fit in anymore.
But really it's the same decision about putting a word
into the dictionary in reverse.
Is it useful anymore?
And some words sometimes get useful to smaller and smaller
groups of people.
And depending how vocal that constituency is, it may stick
around a little bit past the point.
There's a lot of inertia involved.
Unfortunately, we like words so much that we want to hang
on to as many as possible.
So occasionally, we make compromises in terms of type
size and leading and [UNINTELLIGIBLE]
to squeeze more people in. It's like, oh yeah, you'll
just sit on their lap, trying to fit more people into a car.
And occasionally, it gets just as dangerous.
So some words will die in the next edition.
But they'll not really die.
They'll just be put in dictionary limbo.
But if you kept your old one, you'll have it forever.

Yeah?
AUDIENCE: I recently looked up the word your, Y-O-U-R,
because I was curious how you could describe what it means.
And the definition included the word yourself--
ERIN MCKEAN: Yes.
AUDIENCE: --which includes the word your.
How do you decide which definitions can actually
include the word they're defining? 'Cause it's just
sort of obvious.
ERIN MCKEAN: I'm sorry.
I was asked, how do you decide what definitions can include
forms of the word being defined?
So I don't think you looked it up in my dictionary.
Because we don't have yourself in any of the definitions.
We define it as "a possessive adjective"-- oh, you can put
the whole dictionary in your Treo or your BlackBerry.
The big one, the 2,000 page one.
There's a disc in the back of the book to load it on.
I would rather give up the phone part of my phone than
the dictionary part of my phone, which means I should
just get a PalmPilot.

So we define it as "a possessive adjective belonging
to or associated with the person or people that the
speaker is addressing." And the example "What is your
name?" But the circular definition
problem is very real.
And I have to say that in function words and pronouns
and prepositions, we are much more likely to be circular,
just because these words have such narrow functional
meanings that it's hard--
There are not a lot of synonyms for on.

What are you going to do?
However, we really want to avoid the circular situation
where you look up bog and it says, a fen.
And you look up fen and it says, a bog.
That we think is completely horrible.
But we try not to go too far down this road
philosophically.
Because if we start thinking about who's actually looking
this up, then we get really depressed.
Because we have two motives.
We want to be a useful tool for people who are looking up
the meanings of words.
And we also want to be a semi-complete record of the
English language.

A lot of nonnative speakers do use dictionaries and look up
words like this.
I do think that nonnative speakers should be using
learner's dictionaries at least for a while until like
seven out of ten of the words you look up aren't in the
learner's dictionary.
That's the point where you move to using a monolingual
dictionary as well.
But the whole idea of not defining a word with a word
that's associated with it is sometimes a false stricture in
that, as long as the definition of that secondary
word is so common that we feel like people would really
understand it better, we don't want to unnecessarily
complicate the definition.
And so, this is like an Iron Chef constraint.
They give you some weird ingredient and they're like,
OK, make ice cream out of this.
If you constrain yourself too hard from not doing anything
that's even related to the word in question, you
sometimes make a definition that's unintelligible.
So it's really a judgment call.
I did not answer that at all.
I'm sorry.

You had a question?
AUDIENCE: Do you ever come across an example of usage of
a word in which you say, oh, the writer of this word just
used this word incorrectly?
ERIN MCKEAN: Yes.

Because it's often very clear from the context.

Words have natural environments.
And sometimes you see a word which is a woman in a
ballgown at a 7-11.
And you're like, that's completely out of context.
And you look at it again, and you're like, is there any
reason for her to be here?
Is there any kind of artistic merit?
Is there any kind of story that you can weave around the
woman in the ballgown?
Is she buying cigarettes and beer?
What's going on?
And if there's no narrative, if there's no story--
and especially if the word is very similar to one that would
have been right.
If it was the guy in the ski mask and the gun instead of
the woman in the ballgown, then you'd have a really good
narrative for the 7-11.
But the problem with that is, if enough women in
ballgowns start hanging out at the 7-11, that
makes it their context.
So if enough people use words like say, enormity, which
means horrible, well, enormity, it started to show
up at the 7-11.
And it now means very large as well.
So it got confused with enormous.
Most people who are very careful make this distinction.
But most people aren't very careful.

Things that are not metaphorical extensions,
things that aren't playful use of language, they go from
being wrong all the time, to being wrong some of the time,
to not being wrong.
And this is a process that can take a long, long time.
And everybody in the middle part of the process is
uncomfortable and itchy.
Because I have to say, well, we're just
going to wait and see.
And the people who are really upset about it say, kill it,
kill it, kill it now.
But it's impossible.
You can't kill it.
If enough people get something wrong in this
way, it becomes right.
Or at least becomes another layer on the word.

But yeah.
People make mistakes in language all the time.
It's not perfect.

Oh.
There's one more guy.
Sorry.

AUDIENCE: So I'm curious how your company has decided how
it's going to handle print version versus electronic
versions versus a website?
If you have different target audiences or different
expiration dates that you're trying to build into your
publishing scheme?
ERIN MCKEAN: That's a really good question.
I'm being asked how Oxford is deciding between print and
portable electronic device versions and website versions.
And really, the answer is Oxford's a
500-year-old company.
We decide things really slowly.
And also, we have a lot of different dictionary groups
all around the world.
There's Australia.
There's South Africa.
There's the UK group.
There's Canada.
So we have to coordinate with all the different units.
Because especially for online versions, there are no
national boundaries really anymore.
People who go to the Oxford Dictionaries UK website are
often upset that they can't find an
American dictionary there.
It was the first URL they hit.
Why shouldn't all of Oxford everywhere be the same?
So I would like to have a lot of market differentiation.
I would like to have a dictionary to fit every need
and every possible user.
Because I know there are a lot of people out there for whom
the idea of opening a print book that has 2,000 pages in
it is unconscionable.
They're like, why would I do that?
Because it's on the internet, right?
Well, when those people say, it's on the internet, I want
my answer to be yes.
And when people say, I love using the dictionary, but I
travel a lot.
I can't carry that book around.
I haul this thing out, and I say, you're going to have to
carry one of these anyway.
Why not make it to be something you can look
something up on, and, incidentally, play Scrabble?

Well, I won't show you Scrabble.
Everybody knows what Scrabble looks like.

What I'm doing now, though, is I'm designing the data
primarily for electronic use.
And the print will be a subset of the electronic use. 'Cause
there are three constraints on making a dictionary.
Space, time, and money.
So the other thing people say is, oh, if you put it online,
you can include every word.
I'm like, well, the space constraint went
away, more or less.
But I still have the same amount of time and money.
And we don't just have like an underground well full of
definitions that we pump them up from and pour
them into the book.
People have to make those.
So if I don't have any more time and money, the dictionary
online is going to be the same size.
So what I'm trying to do is add things that before we
would pare away, like extra example sentences, semantic
set coding, that wouldn't show up in the print book.
This stuff will show up online.
So right now I'm designing for online use and electronic
version use.
And print is going to be a subset.
It's going to be more and more incidental.
But we'll still keep making them.
Because there's a real good feeling to the print book.
When you look something up online, you have a target and
an objective.
When you look something up in a book, you have a target.
You have an objective.
Looking something up online is like a commando raid.
You drop off out of the helicopter.
You grab the target.
And you get pulled back up.
Looking up something in a print book is an overland invasion.
You acquire a lot of targets between where you start and
where you stop.
And perhaps you're not on a war of acquisition.
But if you have the extra time, you can
get a lot more stuff.
It's more serendipitous.
So I hope that answers your question.
Because I went on an awful long time if it didn't.
Thank you guys so much for being such a wonderful
audience, and for letting me be here.
Thank you.