Learning from StackOverflow.com

Uploaded by GoogleTechTalks on 29.04.2009

>> CHRIS: Hi, everyone. Welcome. I'm Chris; I run the open source group here at Google.
So, when I--when I sent off--well, actually, Allen did the hard work of setting up
the room and setting up the announcement that Joel was coming. I got a lot of email.
One was like, "Wow, it is really fine that Joel's coming here on the middle of Perth."
And I thought that was scary and funny at the same time, because I was in the [INDISTINCT]
when I got that. And then the other one was, "Can we get a copy of StackOverflow for our
internal use?" And I was like, "Well, you can ask him at the thing." Which brings up
questions: we have a [INDISTINCT] set up, if you'd like to use it. If you don't feel like
using your own voice and would rather just type, that's fine, and it's also for our friends
who are VC'ing in from 700 offices worldwide. So just search for Spolsky; I'm sure you'll
find the [INDISTINCT] page, you're smart people. So that's about it. With no further ado,
Joel Spolsky. Thanks.
>> SPOLSKY: Thanks. Thanks for coming. Everybody can--can people hear me? Is this working?
I'm trying to stand over here because I understand that these scanners are trying to take an
image of my brain, which they're going to upload into a computer and run to do searches
for the best sushi restaurant in Palo Alto for the rest of my life. I'm going to talk
today about StackOverflow. Let me introduce myself a little bit if you don't know the
whole story. Together with Michael Pryor, who's sitting here, I'm the co-founder of Fog
Creek Software, which is a software company in--I'm going to walk around a little bit,
because when I stand here the camera goes on me. It's a software company in New York
City. We started about eight years ago. It makes FogBugz, which is bug tracking and project
management software used by development teams around the world to make better software.
Another project that we launched recently, working with [INDISTINCT], who's a blogger
at the website [INDISTINCT].com, is called StackOverflow, and that's what I want to talk
about. StackOverflow is a website where people ask questions and get answers. You may not
be able to see that, so I'll back up just to make sure--yeah, that's [INDISTINCT] getting
the whole screen here. There are a lot of question and answer sites, and there are a couple
of things I want to talk to you about today, since this is Google, that have to do with
search, because I actually feel like search engines are kind of failing in a particular
realm: expert questions and answers, where you could ask an expert something and the expert
would be able to give you a true and correct answer. And the search engines, for various
reasons that I'm going to get into, are just not finding it. And a lot of the companies
organized around search have tried to make question and answer types of portals, and Yahoo
is famous for these, like the "How is babby formed?" question. Most of what you find on
Yahoo Answers appears to be adolescents asking questions about reproductive science, just
to put it politely. And for various other reasons, the stuff is just not working out that
well on the search engines. But what I'm really going to do today is tell you a little bit
of a story, and it's a story about anthropology and sociology. And the story is about how,
when you have a group of people and you give them an environment--you don't even have to
have the people, you just create an environment--those people will come into the environment
and behave according to what you built, in certain very, very subtle ways that you probably
didn't think about. So, these are the Spanish Steps in Rome, and they're meant to go from
the Bourbon Spanish Embassy at the base of the steps up to the Trinità dei Monti church at
the top. And so they were built to be stairs, but they became sort of this living
room for backpackers in the middle of Rome, which many of you probably went to in your
gap year. And partially it has to do with the steps being the perfect, comfortable height
to sit on. The steps are very wide, so you're not blocking anyone if you're sitting there.
And you have a fantastic view of the piazza down at the bottom when you're sitting on
these steps. And so they became sort of a living room because of their shape, and this was
completely unintentional. Some things are a little bit more intentional, where you create
an environment with the goal of maybe steering people, or letting people wait on line, or
whatever it is that you're doing. And some things are even more precisely designed to create
a certain type of behavior, whatever it may be. And one of the things that we're learning
as we move from the era of computing into the era of the internet is that we're no longer
worrying about computer-human interaction, because that's kind of a solved problem.
Usability--I don't want to say solved, but usability is something which is no longer a
major impediment, and there are a lot of known solutions. What's not known so well is what
happens when you're using a computer and there are other people involved, because it's an
email program, it's a social program, social networking, or it's Web 2.0--no, I'm not
allowed to say that. I take that back. You have to think about human-to-human interaction,
and that means you have to be an anthropologist and you have to think about how humans work.
You have to think about the stuff that ethnographers and cultural anthropologists like to
think about all the time. So that's really what we thought about. It is very clear in
anthropology that the environment you create influences people and how they behave.
And similarly, the user interface you create for your applications will influence how people
behave, and you have to think about it as an anthropologist to be able to do a good job.
Now I'm going to talk about how we did that with StackOverflow and why it works and so on.
So when you have an app like Yahoo Answers--let's just--I just want to poke around some
of the others. This is what we did in the early days of StackOverflow: we poked around
the competition, the question and answer websites, to try to see what the heck is going on
here in terms of user interface, and some of this may be accidental. "Ask, answer,
discover." What is it about this website that means you really do get teenagers asking
questions like, "What eats the ants if I squish?" At least it's not "who eats the ants."
Squish--Mahalo Answers, I don't know if you've heard of this, but this is the website from
[INDISTINCT], who has an inordinate belief in micropayments for some reason, and so the
website is all about earning a dollar, a Mahalo dollar, doing this, and spending a Mahalo
dollar doing that. And that creates a certain environment, and it creates questions that
are all kind of like scams where you can earn $7 or get a coupon good for $5 off at
Supercuts or whatever.
There are much more serious forums. This is the sort of standard--I'm going to call it
the phpBB look. Almost all web forums are either using software that looks exactly like this,
or they're actually using phpBB itself, which looks exactly like this, with this idea of
topics and replies and who knows what, and it all looks very serious. This particular
website, whose name I will not mention, has got a big hyphen in the middle of
the URL, and we'll call it the hyphen website from now on. And it is--I don't want to
officially say it's the raison d'être for StackOverflow, but it certainly pushed me over
the edge, because this is a website where you ask a question, usually typing it into
Google, about a specific engineering or [INDISTINCT] topic. And you go--the answer appears
to exist there, and when you go there you don't see the answer. You see a lot of pages
saying, "Please sign up, give us money, and you will see the answer," and you say, "Wait a
minute, that's not right. They must be showing a different page to Google than they're
showing to regular people," and there's a secret, which I won't let you in on because I
want them to burn in hell. But that particular site looks like what it looks like
because--I mean, they created a site. What does this look like? This looks like one of
those corporate enterprise start-ups, right? Like, we won't even tell you how much it costs;
just go to the bank, get all your money, bring it to us, give it to us, and then go back to
the bank so you can get a loan. Amazon sort of kept reading all the articles in Newsweek
magazine about how Google was just a big search box, so they made a big search box where
you can ask a question right at the top. And I don't know if that really makes sense, but
people ask a whole lot of questions in there, and they don't answer. A heck of a lot of
questions, and that's just part of the design decision to emphasize asking rather than
answering. So we built StackOverflow
and it looks kind of bizarre, and there are all kinds of little bits and pieces all over the
place, which makes it [INDISTINCT] to programmers. But I want to go over--here's a little bit
of the story. The concept was from about the beginning of 2008. We started coding almost
exactly a year ago, we launched in September 2008, we had, like, a four-week [INDISTINCT]
or something, and we've been around for about six months, and we're currently running six
million visits a month. And I'll go into more details on that. [INDISTINCT] there are certain
reasons that we thought search engines were failing with various queries. This is not
actually a particularly bad example of it, but knowing that LI is an HTML tag is something
that search engines don't necessarily know. I'll go more into it. This is my list of problems
with search that StackOverflow is trying to solve, and we're trying to solve all these
problems at the same time. Number one, the sign-up scams. So that's the website with the
hyphen, where they tell you that you have to sign up and pay if you want to see the answer,
and actually you don't. There are also these little road bumps, like register. This is from
SQL Server Central, but a lot of them have this: it's like, "Hey, we've got the answer for
you, just one--wait--just please register." And that's just a little bit frustrating, and
that actually reduces participation, I think dramatically.

Now we're going to get into some serious things here. Search engines get you a lot of wrong
answers when you search for highly technical questions, and that's the biggest problem.
So here are three popular categories. There's the security hole: a lot of times you find a
piece of sample code where somebody has said, "Here's the answer to your problem," and it's
something like "turn off your firewall," or more specifically, you know, just the typical
XSS vulnerabilities and SQL injection vulnerabilities that you see in so much sample code.
You do a search, you get a result, and you say, "Whoa, there's an XSS vulnerability there,
I've got to tell this guy." There's no way to do it, there's no way to change it, there's no
way to fix it on typical forum software, because you're looking at a discussion that took
place four years ago. And then there are the "hard" problems. This is a--I don't know how
popular it is, but a Google interview question: implement Rand(7) in terms of Rand(5).
I won't go into the details here, but there are a lot of wrong answers to this question,
which is awesome, because I can use this as an interview question for college students, and
even if they've researched this question they will come in with a wrong answer, because you
have to have a certain level of smarts to be able to distinguish between all those people
providing wrong answers and the people providing right answers. Incidentally, StackOverflow
has the right answer. Multiple answers is much more common: you get to a discussion forum
and somebody says, "Have you tried this?" "Have you tried that?" "Yo, I think there's an
article about that. Here's a [INDISTINCT] based article," et cetera, et cetera, et cetera.
You have no choice but to try every answer and see if it works, usually in the dark, until
you accidentally hit upon something that works. So there's no PageRank among answers to
highly technical questions.
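For reference, here is one standard correct answer to the Rand(7)-from-Rand(5) puzzle mentioned above, as a minimal Python sketch. It uses rejection sampling: two calls to Rand(5) give 25 equally likely outcomes, the first 21 split evenly into 7 groups of 3, and the remaining 4 are thrown away and redrawn. The rand5 stand-in built on random.randint is an assumption for illustration, and the talk does not say which answer StackOverflow settled on.

```python
import random
from collections import Counter


def rand5() -> int:
    """Stand-in for the puzzle's given generator: uniform over 1..5."""
    return random.randint(1, 5)


def rand7() -> int:
    """Uniform over 1..7 built only from rand5(), using rejection sampling."""
    while True:
        # Two independent draws give 25 equally likely values, 0..24.
        n = 5 * (rand5() - 1) + (rand5() - 1)
        # 0..20 is 21 values: three full copies of the 7 outcomes.
        if n < 21:
            return n % 7 + 1
        # 21..24 would bias the result toward 1..4, so draw again.


if __name__ == "__main__":
    counts = Counter(rand7() for _ in range(70_000))
    # Each of 1..7 should come up roughly 10,000 times.
    print(sorted(counts.items()))
```

A tempting wrong answer such as (rand5() + rand5()) % 7 + 1 is not uniform, because sums of two draws cluster around the middle values, which is exactly the kind of answer that is hard to tell apart from a right one at a glance.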
A lot of times you get obsolete results because there's no PageRank. When you have a very,
very narrow technical question about some very, very obscure idea that's not working
in some very obscure way on some obscure platform that you're programming for, there are
only a hundred people who are ever going to look at that question, and therefore nobody
is ever going to link to the question, and certainly you can't use PageRank to get
to the appropriate answer. Nobody's going to write a blog post about how the best answer
to this obscure question is such and such. So the authority of all these sites and PageRank
is [INDISTINCT], unless, you know, maybe the site has some rank, but it's just a discussion
forum. So it's very, very hard for a search engine to see what the right answers are.
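To make the PageRank point concrete, here is the textbook power-iteration form of PageRank on a hypothetical four-page web (all page names are invented). Every page gets a baseline of (1 - d)/N each round and gains more only through inbound links, so two forum threads that nobody links to end up tied at exactly (1 - 0.85)/4 = 0.0375, whether they hold the right answer or the wrong one. This is the simplified published formulation, not Google's production ranking, and it assumes every page has at least one outgoing link.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    links maps each page to the pages it links to. This toy version assumes
    every page has at least one outgoing link (no dangling-node handling).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page starts each round with the same baseline (1 - d) / n ...
        new = {p: (1 - d) / n for p in pages}
        # ... and only gains more through inbound links.
        for page, targets in links.items():
            for target in targets:
                new[target] += d * rank[page] / len(targets)
        rank = new
    return rank


# Hypothetical toy web: two well-linked pages plus two forum threads that
# link out but that nobody links to.
web = {
    "popular_blog": ["reference_docs"],
    "reference_docs": ["popular_blog"],
    "thread_with_right_answer": ["reference_docs"],
    "thread_with_wrong_answer": ["reference_docs"],
}

for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(f"{page:26s} {score:.4f}")
```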
It doesn't have any way of using its normal methods, and in particular you get certain
common problems. Obsolete results is a really, really popular one: Google, for example,
knowing nothing else about two pages, will take the older one and say, well, it's been on
the internet for longer, and give it a little bit of a push. And that turns out to be
fundamentally wrong with a lot of technology. So a very, very common problem we find is,
"How do I do such and such on my iPhone?" and then you get to a lot of [INDISTINCT]: "No,
it's just not possible. There is absolutely no way to write an app for the iPhone unless
you make a web page and access it with Safari." And that was true for a year, and it's not
true anymore, but there are a lot of pages that still say that. Similarly, if you search
right now for interior photography on Google, your number one result is an awesome article
about interior photography written by Phil Greenspun before anybody had digital cameras.
So here are my four key reasons why search engines are struggling with these highly
technical "I've got a programming question" problems. Numbers one and two: there are too
few views, so there's no authority--that's really number two. There are too many ways to
phrase the same problem: a lot of times you're struggling to find the right words that
will happen to coincide with the words that the person describing your problem happened to
use. And that means that your search space basically gets diffused among all the different
possible synonyms for a particular problem. And then there's a preference for old links.
So here are nine things that we did in StackOverflow to try to build a site that got around
all these problems. These are really the nine building blocks of the social engineering
that we did to try to create a site that was anthropologically correct, that would cause
people to behave in a way that would work, and that would basically solve all these
problems. I'll go through them one at a time, because here's the key part. Voting. This is
the easiest thing--I mean, these things are really, really easy.
Every single one of these ideas is copied from somewhere else. This one is copied from
Reddit [INDISTINCT]. Voting is the idea that you vote up the answers that are good, and
it's actually astonishing on StackOverflow how quickly the voting gets you the right answer,
because other people come in and vote up the right answers. This is a question that was
asked an hour ago, and within an hour the number one answer already had nine votes. So as
you can see, it happens usually within a matter of minutes for common questions.
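The answer ordering described here, together with the accepted-answer override that comes up next, fits in a single sort key. This is a minimal sketch with hypothetical names (Answer, score, accepted, posted_order); the vote counts 78 and 26 echo the example in the talk, and the tie-break on posting order is an assumption, since the talk doesn't say how ties are handled.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    author: str
    score: int               # net votes: upvotes minus downvotes
    accepted: bool = False   # chosen by the person who asked the question
    posted_order: int = 0    # hypothetical tie-breaker: lower = posted earlier


def display_order(answers: list[Answer]) -> list[Answer]:
    """Accepted answer pinned first, then highest score, then earliest posted."""
    # False sorts before True, so "not accepted" puts the accepted answer first.
    return sorted(answers, key=lambda a: (not a.accepted, -a.score, a.posted_order))


answers = [
    Answer("alice", score=78, posted_order=0),
    Answer("bob", score=26, accepted=True, posted_order=1),
    Answer("carol", score=3, posted_order=2),
]
print([a.author for a in display_order(answers)])  # ['bob', 'alice', 'carol']
```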
You get a bunch of answers sort of flowing in, and the best ones get voted up. There's one
little tweak to the voting algorithm, which is that the person who originally asked the
question has the special power to bestow upon one answer the status of official answer,
which goes to the top no matter what the votes are. And right below that you'll see the
number one community-voted answer. So here's an example of a question where the person who
asked the question decided that the answer over there marked 26 was actually better than
what the community thought, or solved his problem better than what the community thought
was the best answer, which is the one marked 78. Rather than picking a particular topic of
programming and attacking that first, or creating a hierarchy like Usenet, which is Web 1.0
technology, I guess, an old-fashioned technology, we decided to use tags. And if you look at
the questions on StackOverflow, people are pretty good at putting tags on their questions,
and there are usually two or three obvious tags to put on any question. So it allows you to
say, "I'm asking this question from a VB.NET perspective, not a C# perspective," or "I'm
asking this question, you know, I'm using--" Like, a lot of times you'll see these
questions, or you'll be searching for a question and get to a result that doesn't even
apply to the technology that you're actually using; it's the same thing in a different
domain. So we put the tags in, but we did really, really neat stuff with the tags. So number
one, we wanted you to be able to create a view. If you're a Python programmer, we wanted you
to be able to make StackOverflow into a great [INDISTINCT] resource. You can tell it what
questions you're interested in, what tags you're interested in, what technology you're
interested in. We also wanted you not to have to see things that you're sick and tired of
hearing about. Like, let's say you just really never want to hear about .NET programming
ever again. So you can list ignored tags, which you'll never see. But then actually
StackOverflow is doing all kinds of really smart stuff with these tags. So, for example, my
program [INDISTINCT] is exceedingly obsolete. I worked on a little product called Excel,
where I was in charge of the macro language strategy, which created the thing called Visual
Basic for Applications, which is a way of programming Excel. It's very, very old, but people
still use it. And so those are the tags that I picked--you can see up there, I picked VBA
and Excel. And when I come to the site now, it's going to look for questions that I can
answer, not because I said I was interested in those tags, but because I have successfully
answered questions with those tags in the past, because StackOverflow recognizes me as
somebody who can answer these obsolete questions on Excel VBA. Whenever there is a question
on Excel VBA, StackOverflow is going to try to show it to me, because that's the best way
they can get it answered.
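One way to picture the tag routing described here is a filter-and-rank pass over open questions: drop anything carrying an ignored tag, then rank the rest by explicit interests plus the user's track record of accepted answers in each tag. The function name, data shapes, and weighting are all hypothetical; the talk does not describe how StackOverflow actually combines these signals. The usage mirrors the Excel VBA example: no explicit interests, .NET ignored, and a history of answering Excel and VBA questions.

```python
from collections import Counter


def questions_for_user(
    questions: list[tuple[str, set[str]]],
    interesting: set[str],
    ignored: set[str],
    answered_history: list[set[str]],
) -> list[str]:
    """Rank open questions for one user by tag affinity (hypothetical scheme).

    questions:        (title, tags) pairs
    interesting:      tags the user explicitly asked to see
    ignored:          tags the user never wants to see
    answered_history: tag sets of the user's past accepted answers
    """
    # How often the user has successfully answered in each tag.
    expertise = Counter(tag for tags in answered_history for tag in tags)

    ranked = []
    for title, tags in questions:
        if tags & ignored:
            continue  # an ignored tag hides the question entirely
        # Assumed weighting: one explicit interest counts the same as one
        # past accepted answer in that tag.
        score = len(tags & interesting) + sum(expertise[t] for t in tags)
        if score > 0:
            ranked.append((score, title))
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in ranked]


open_questions = [
    ("Loop over cells in a range", {"excel", "vba"}),
    ("Async/await in C#", {"c#", ".net"}),
    ("Format a date in Excel", {"excel"}),
]
print(questions_for_user(
    open_questions,
    interesting=set(),
    ignored={".net"},
    answered_history=[{"excel", "vba"}, {"vba"}],
))  # ['Loop over cells in a range', 'Format a date in Excel']
```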
As a result of this intelligence, which is sort of tag-based, and our knowledge of the
participants in the site and what their skills are, we're able to achieve more than 90% of
questions getting answered with an accepted answer. We have editing, and here's a little
diff showing a typical edit that occurred. The purpose of editing was originally so that
questions could get better and answers could get better and better and better, rather than
freezing.
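The revision diff on the slide is essentially a line-based text diff. A rough equivalent using Python's standard difflib module follows; the two revisions are invented for illustration (a spelling fix plus an added line), and this is not a claim about how StackOverflow renders its own diffs.

```python
import difflib


def edit_diff(old: str, new: str) -> list[str]:
    """Unified diff between two revisions of a post, one entry per line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="revision 1",
            tofile="revision 2",
            lineterm="",
        )
    )


# Hypothetical revisions: a typo fix plus an added line of explanation.
revision_1 = "Use a well-tested library; don't role your own."
revision_2 = (
    "Use a well-tested library; don't roll your own.\n"
    "Here is a short example of calling it, with error handling."
)

print("\n".join(edit_diff(revision_1, revision_2)))
```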
With most forum software, the discussion thread takes place in 2004, four or five people
participate, and then it just remains as this frozen artifact on the internet until the
end of time, and that's why questions get out of date. That's also why you can't remove
errors when you discover them, and the questions never get better. So we kind of looked at
Wikipedia and said, what if we let everybody come in and edit these entries and make them
better and better and better? The model for StackOverflow--you know, one of the early
pitches was--it is a Wikipedia where every topic is an extremely specific, very, very
long-tail programming question, where we'll have the question, we'll have the answers, and
the answers will be sorted.
Anybody can come in and edit. So if you discover that the number one answer has
a flaw in it, you can hit the edit button and fix it. And if you discover that the question
is poorly written, or the question was written too narrowly, or the question was not
explicit enough, you can go in and edit the question to make it a better question. And our
hope is that a large body of the questions on StackOverflow will become the canonical
place on the internet to learn about very, very narrow, specific questions about very,
very narrow and specific programming topics. So in this particular case, this guy Martin
very quickly answered the question, hoping that he would help the person as quickly as
possible--just get something out there, so maybe that will help the guy solve his problem
right away. And then he came back two days later, because he was disappointed that he
wasn't earning enough points for his answer, because his answer was just a couple of lines
saying, "Go use [INDISTINCT], don't role your own," which he spelled wrong. Somebody fixed
"role" later, and it's not shown here. And because he wasn't earning enough points, and he
obviously wanted to earn some more points--and they're just points--he went and edited his
answer and provided sample code. And it got better, and he earned more points because he
provided sample code. But a lot of times people will edit each other's questions, and these
things get edited a lot. Sometimes they get edited to be worse, which is a problem, but in
general the answers to questions are getting better and better and better. In order to get
people to do the things that we want them to do on the site, we have a concept of badges,
and this is sort of like achievements on Xbox 360; it's that sort of model. It's based on
the quote from Napoleon saying that--I don't remember exactly--a soldier will fight long and
hard for a little piece of colored ribbon. So we give people these little pieces of colored
ribbon, these badges. If you look at my name up at the top of the screen, you can see next
to it the 4,800; that's how many points I have. And then there's a little 15, and that means
I have 15 silver badges, and a little 28 says that I have 28 bronze badges. So you're
earning a whole bunch of badges, and anybody who sees my name anywhere it appears on the
site will know that I have a certain amount of credibility, and they can see what things
I've done, and people will actually go and try to earn badges to complete the set. You too
can get every single badge on this list except for beta tester, because the beta's over.
Karma is based on the philosophy that you can't
just pay people to answer questions. And I believe that that is one of the reasons why
Google Answers never really worked, and Mahalo Answers, I think, is going to fail for the
same reason. There's something fundamental going on here: there are things that people are
willing to do for free but are not willing to do for small amounts of money. They may be
willing to do them for large amounts of money. If you ask me how much it would cost to
provide a day of my consulting, there is no price, but I've spent at least a day answering
people's questions on StackOverflow. So we, you know, sort of fundamentally understood that
people are not going to want to--there is no--if you try to clear a market using what I call
the Econ 101 management method, with very, very small payments, like, "I could give you a
dollar if you can help me with my problem. Okay, nobody is answering it, let's try 2
dollars," the market does not clear at these little levels. But people will do things for
free in order to contribute to the world. So we have karma. And karma is just points. You
earn points for all kinds of things, but mainly for getting your questions and answers voted
up. And so the best thing you can do on the site to earn karma is to come and vote some
things up. As you earn more and more karma, you get more and more privileges on the system.
They're usually pretty nominal privileges, but they do encourage you to--it's sort of like a
human captcha. Until you've asked a question and answered a question, you probably don't
have enough points to comment on questions or to edit questions. And so if you come in and
you're like, "I want to comment on this," you'll hit comment and you'll discover that you
can't, because you need a little bit of karma. And that will encourage you to go do
something to earn a little bit of karma. And there are all kinds of easy ways to do it. In
fact, following the Wikipedia model, one thing that I've actually encouraged people to do,
you know, very explicitly, is this: if you want to earn some quick karma and you don't know
anything about programming, go find a question that seems to be kind of popular, with a lot
of people looking at it. Look at the top 5 answers and merge them into one big glorious
answer created from those top 5 answers--you're allowed to cut and paste. Put them all
together, edit it really well, write some sample code, test the sample code, and submit
that as your answer. And you will get voted up, and you'll earn a lot of points for that,
and that's exactly the behavior that we want to see. We want to see people improving the
site, making questions better and better, and not obsessing over the ownership of a
particular question or answer. We put in pre-search--I don't know if there's a technical
term for this--but when you start typing the title of your question, which I've done here,
and you hit tab, we'll look through the words that are most likely to be keywords, and then
we'll do a quick search of already-asked questions, and we'll present them to you there.
And you can click on one and see the answer. That's meant as a deduping mechanism; it's
meant to prevent duplicates. This is not a forum for answering the same things again and
again and again, with people saying that this has already been asked here. We're trying to
create canonical questions and answers, and so we want there to be one. It works really,
really well, which is sort of surprising, because we don't have very good search
technology; we just have SQL Server, but it still works. And one of the things--I mean, the
way I knew that StackOverflow was going to work is that on the second day of the beta I
logged on with an actual programming question. I typed the programming question into the
title, I hit tab, and I saw the question had already been asked. And I clicked on it, and it
had been asked, you know, 8 minutes ago, and there were already a whole bunch of answers.
And the answers had already been voted on. And the most highly voted answer was actually the
solution to my problem. And that's a phenomenon that happens again and again and again with
StackOverflow. We designed this not with the idea of SEO, which I think is a little bit too
naïve. We designed our site with the assumption that our homepage is Google. The front page
of StackOverflow is: you go to Google and type a question. And that's how most people find
StackOverflow, because they typed the question. And so we optimized--I don't only want to
say optimized; we built everything around this assumption.
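The pre-search described above (pull the likely keywords out of a draft title, then look for already-asked questions that share them) can be approximated in a few lines. In this toy sketch the stop-word list and the keyword-overlap scoring are assumptions; per the talk, the real feature ran on SQL Server rather than anything like this. Note that it would miss "parsing" versus "parse", which is the synonym problem raised earlier about search in general.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "how", "do", "i", "in", "to", "of", "is",
              "my", "what", "can", "with", "for", "on"}


def keywords(title: str) -> set[str]:
    """Lower-case the title, split it into word-like tokens, drop stop words."""
    # Keep '#', '+' and '.' so tags like c#, c++ and .net survive tokenizing.
    tokens = re.findall(r"[a-z0-9#+.]+", title.lower())
    # Strip sentence-ending periods ("app." -> "app") but keep a leading dot.
    tokens = [t.rstrip(".") for t in tokens]
    return {t for t in tokens if t and t not in STOP_WORDS}


def similar_questions(draft: str, existing: list[str], limit: int = 5) -> list[str]:
    """Already-asked titles, ranked by how many keywords they share with the draft."""
    wanted = keywords(draft)
    scored = []
    for title in existing:
        overlap = len(wanted & keywords(title))
        if overlap:
            scored.append((overlap, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


already_asked = [
    "How do I parse XML in Python?",
    "Best way to parse a large XML file",
    "How to center a div with CSS",
]
print(similar_questions("How to parse XML in Python", already_asked))
```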
Our URL's have the name of the question in the URL. And they're, you know, permanent
and clean. We have whatever meditags we have to have. We got the site maps. We've got everything,
we've done everything to make our pages looks as reasonable as possible to a search engine
because we knew that that's were a traffic is going to come from. Indeed, as you can
see here 86.6% of traffic is coming in specifically from Google questions. There's a little bit
of direct traffic and that's the smaller group of users of the site, who hang out on the
site all the time. And for some mysterious reasons the other search engines are--there--while
they're there, you can look number 9, is a live.com. I have no idea why it's only 0.2%
of our visitors, it doesn't really make sense, oh well. So, that was our design, that's the
front end. We were really obsessed about performance, we know their getting fast answers and the
site being snippy and snappy and quick and stuff like that was important so here's a
technology stack that we use--that is actually built on a Microsoft stack. The performance
I know you guys don't use it that much here but the performance of Csharp which is a compiled
language is just ridiculously good. This entire site is serving six minutes--16 million pages
a month and we're doing it off two servers which are almost completely unloaded so we've
got ton of head room on this two servers, one server is a web server, the other server
is running Microsoft SQL server 2008 and they're both [INDISTINCT] zions but the--and you know
there were a lot of optimizations that went in there, but you know no matter what people
say this is a pretty good stack and one of the things that I've always been concerned
about is if you start building a technology like this, using a Microsoft stack you are
going to pay for a window server license. That's a lot of SQL server licenses which
are 5,000 bucks for every box that you put out and the idea that you could possibly use
a larger number of cheaper computers and use open source products which are free, it suddenly
occurred to us on the other hand, when I compare our performance to similar sites like that
are running on the open source stack we're using about one tenth of the hardware that
they are unfortunately and maybe that's because they are not good programmers but in terms
of just the tight queries that we're doing and stuff, the Microsoft stack is actually
appears to be paying for itself, in terms of reduced hard work. We thought that a really
important part of making StackOverflow happen, was get people getting questions answered
on day one, and it was really important to me that we get a critical mass on there on
day one, so that when people tried it out, there was actually somebody there and the questions
actually got answered. I think a lot of these other sites, like Amazon, Ask, Google
Answers, Yahoo Answers, et cetera, whatever marketing power they had behind them, because
those were sites about all possible questions and they didn't really have a way to get all
possible people involved in the site on day one, suffered from what I call the empty
restaurant syndrome. You're walking, you know, up and down University Avenue in downtown Palo
Alto, trying to pick a restaurant to go into, and one of them is empty, and you will not
go in there under any circumstances. So a restaurateur knows that on day one they've got to have
their friends at least fill up the window tables, so that it looks like there's somebody
in there. That was pretty important to us, and it's one of the reasons why I asked Jeff
Atwood to be a partner in this site: between Joel on Software and Coding Horror, those
are the number one and number two most high-traffic technology blogs that are written
by an individual who lives in either New York or California. Hey, just kidding. We
actually have a lot of traffic. I had about a million visitors a month and
he had probably about a third of that, but that was a lot of programmers. So we started
doing this podcast, which you can listen to. It's a weekly podcast where we're
basically designing the site, and he's giving me status reports every week. The
podcast is on IT Conversations, and it was just another way of making sure that when we launched
there would be, you know, maybe 20,000 to 30,000 people listening. So here's
the status report, here's what happened with StackOverflow, here's how it worked out. As
of about a week ago, 136,579 questions have been asked, and 91% have been answered. My
definition of answered is not just that somebody typed in an answer, but that somebody typed
in an answer and somebody else upvoted it: an answer has to get a vote for the question
to count as answered. This to me is really good. If you look at the questions that are
unanswered, they're either on obscure topics or sometimes they're just way too hard.
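Spolsky's definition of "answered" is concrete enough to state as a predicate. What follows is a minimal sketch, assuming a question counts as answered when at least one of its answers has an upvote from someone other than that answer's author. The `Question` and `Answer` classes and the `is_answered` and `answered_rate` helpers are hypothetical names for illustration, not StackOverflow's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    author: str
    # Hypothetical: the set of users who upvoted this answer.
    upvoters: set[str] = field(default_factory=set)


@dataclass
class Question:
    title: str
    answers: list[Answer] = field(default_factory=list)


def is_answered(question: Question) -> bool:
    """A question counts as answered only if some answer was upvoted by
    somebody other than the person who wrote it."""
    return any(answer.upvoters - {answer.author} for answer in question.answers)


def answered_rate(questions: list[Question]) -> float:
    """Fraction of questions that meet the 'answered' definition above."""
    if not questions:
        return 0.0
    return sum(is_answered(q) for q in questions) / len(questions)


if __name__ == "__main__":
    typed_only = Question("Why does this crash?", [Answer("bob")])
    upvoted = Question("How do I parse XML?", [Answer("carol", {"dave"})])
    print(is_answered(typed_only), is_answered(upvoted))  # False True
    print(answered_rate([typed_only, upvoted]))  # 0.5
```

Under this definition, a question with answers but no outside votes still counts toward the unanswered 9%.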
The number of posts has been growing more or less linearly; we're at about 800,000, where a post
is either a question or an answer. Similarly, the number of registered users: this is the
smaller community of people that come to the site all the time. They create
accounts themselves, and we actually use (this is awesome) OpenID, just because we
wanted to push it really, really strongly, and it was actually successful. We don't have
any login mechanism other than OpenID, but as you can see you can log in with just two
clicks if you have, say, a Google account, which is nice. So the number of registered users
who have created accounts on the system is around 60,000 right now, and those are sort of the
core group of people, and then we've got a larger group of people who just visit
the site. The uniques are now over three million a month, over three million unique visitors
a month. Mostly, probably, a unique just means we set a cookie and it's still there, but
this is Google [INDISTINCT], so it's whatever Google Analytics means by unique visitors.
The best information we've sort of triangulated from a number of sources: the people who
work in developer relations at Microsoft have a number that they use, which is nine million
professional programmers in the world. There are nine million professional programmers
in the world. So I don't know how good that three million is, but it looks like we're getting
about 30% of them using StackOverflow in some way [INDISTINCT]. Actually,
this is a large crowd. How many of you have ever used StackOverflow or seen StackOverflow in
search results? This is kind of a good group; I'm guessing about 40%. When I asked
the same question at the MIX conference in Las Vegas, it was about 30% in that room.
So I have a feeling that in about the first six months, we've gotten to about 30% of the
professional English-speaking programmers. We don't have any non-English sites. The number
of visits is over 6 million a month, page views are running about 16 million, and
of course growing pretty steadily. Here's where the traffic comes from. The
larger part, 86%, comes directly from Google. And then, you know, there's obviously a certain
number of referrals. But direct traffic is the people that I think of as the participants
in the site. These are the people who like to hang out and answer questions in order
to earn karma. So, the last slide: where we're going in the future. StackOverflow was designed
to be specifically about programming questions, and if you try to ask a question that doesn't
involve code, like "Why does my computer blue-screen every night at midnight?", you'll
get kicked out and the question will get closed. So we're building a new site called Server
Fault, which is the system admin version of the site. It's for system administrator questions.
Our goal is specifically professional system administrators. We
don't want the user at home who's trying to figure out how to plug
their iPod directly into the printer. We're trying to get the actual people working at
their jobs, because what we found, mostly from the Yahoo Answers experience, is that this kind
of site only works when it's people who are doing something professionally. In other words,
they really care about getting the right answers, they care about expanding their professional
knowledge, and they're taking this seriously at a level that the teenagers asking sex questions
on Yahoo don't necessarily care about. Server Fault is really just a couple of weeks in the
future; that thing's going to launch. There are a few other things further out, none of them
clearly determined. In the long term, StackOverflow could go in different directions. One
direction is, you know, more towards a software product. Another direction is creating lots and
lots of vertical StackOverflows in all kinds of highly technical industries, for example a
StackOverflow for tax accountants in the United States who have questions about the tax law.
And then, you know, another possible direction is as a recruiting marketplace, which I won't go
into in too much detail, but we're actually accumulating quite a lot of knowledge about
who's good at what technologies in the programming realm, knowledge that could be very useful
to programmers who want to get hired or programmers who want to hire other programmers. So that's
really all I have. I'll turn it over to questions if anybody wants to... how do I do this? Start
in the front row?
>> [INDISTINCT]
>> SPOLSKY: Where you enter tags, it's just like any other tags that you type in;
it works. But there's autocomplete, so as you start typing, you'll
find the existing tag, and I think there's a mechanism for merging existing tags. I think
we have some kind of official question system.
>> [INDISTINCT]
>> SPOLSKY: Got it. Yes?
>> Are you seeing any bad [INDISTINCT] policing so far?
>> SPOLSKY: The question was, are we seeing any bad user behavior, or is the community
pretty self-policing so far? We're definitely seeing
some bad user behavior, and we're constantly adjusting algorithms to try to fix it, and
I would say, you know, we're trying to take a Google-like approach of looking for algorithms
that solve problems, rather than individually plucking out users or problems.
So we've got all kinds of methods. For example, if you
behave badly and we warn you and you don't recover from that, then we put you in a penalty
box for a week, where you don't get your points until the end of the week. There's all
kinds of stuff going on. It mostly has to be self-policing, because with this number
of questions, it has to be reasonably automatic. But we've pretty much found most of them, and
we're pretty much cured at this point. We're pretty much on top of the typical problems like spam.
Yes?
>> So on Dory, we have a question.
>> SPOLSKY: Okay, sorry.
>> [INDISTINCT] Dory room.
>> SPOLSKY: Okay, Dory it is.
>> What aspect of StackOverflow's user interface design are you most proud of?
>> SPOLSKY: What aspect of StackOverflow's user interface...
>> What makes you so great?
>> SPOLSKY: ...am I most proud of? I'll just pick one thing. Well, you know, really it's
every time I hear somebody saying, "StackOverflow is awesome, I asked a question and got an
answer. When I ask questions, it just blows me away that you start seeing
answers coming in within, like, three or four minutes." And the truth
is, as a programmer, I used to not have the patience to post a question to a
discussion forum, because I knew it would take half a day to a day to get an answer.
I would really just sit there and Google for a long, long time trying to find the appropriate
answer rather than asking. But what's awesome about StackOverflow is that the act of
asking a question is also the act of searching for an answer. So if an answer is there,
you'll find it, and if it isn't, it's just one extra click to actually commit to asking
that question, and then you will actually get an answer within the usual amount of time.
I mean, you don't have to go to lunch; the answers will start coming in pretty quickly. Yes?
>> [INDISTINCT]
>> SPOLSKY: The question was, have we considered licensing this to large corporations for
internal use, as an internal technology? We don't really have the resources to do that right
now. We're talking to various people that might be interested in providing various resources
[INDISTINCT]. I won't go into too much detail, but that's something we don't have
the immediate time to do, just because we're about four people working on this. But in the
long run it completely makes sense.
>> The next question is from Devon Mollins: what aspect of StackOverflow's initial design are
you least proud of? What makes you say...
>> SPOLSKY: That sounds like the first one: what am I least proud of in the design. Boy,
I'm stumped. Let's see. Personally, it sort of bothers me that right now we are
closing questions that are off topic without having a place to send people. So if somebody
asks a question that isn't really a programming question, but StackOverflow is the perfect
technology to get an answer to that question, the community will
jump on it and close it aggressively, in a way that's not necessarily newbie-friendly,
and say, "Go somewhere else." I don't know where, but go somewhere else. We just don't have
a place to send them right now, so we're hoping that Server Fault will give all those system
administrator questions a home. But that's an aspect of the
community that's a little bit too focused, in my opinion, on the purity of the home page
only being programming questions. Questions from the room? Yes.
>> [INDISTINCT]
>> SPOLSKY: What are our current or future plans for monetization? We don't really talk
about that too much. I should say that we have a little bit of advertising on the site, pretty
much only from people that have aggressively banged down our door and literally deposited the
money in our bank account without our permission, so they get little advertisements, and that
is enough to keep us running. Like I said, it's four people and two servers; it doesn't take a
lot to run this, and our primary goal is getting answers to programmers' questions. On the
other hand, with a community of, you know, basically a third of the world's great programmers,
we think there will be other monetization opportunities. We are never, ever, ever going to charge
to go to the site, we're never going to have interstitials, and we're never going to have
advertising that has any kind of motion in it, like flashing, ever: not just Flash, but animated
GIFs either. And the basic core aspects of the site will always remain free for everybody,
permanently. We've got another Dory one.
>> Yeah, yeah, from Guido van Rossum: I'm sure you're often referring people to your blog post
"Things You Should Never Do, Part I." Was there ever a Part II?
>> SPOLSKY: "Things You Should Never Do, Part I" is a riff on History of the World, Part I, the
Mel Brooks movie. There is no Part II; that's the joke, and you may have to wait until the end of
time to determine if there is a "Things You Should Never Do, Part II." That article, for those in
the audience [INDISTINCT], I wrote a long, long time ago, saying: if you have working code that
is large, that a lot of people are depending on, and that does a lot of things, you don't just
start from scratch. You've got to somehow find a way of moving the existing body of code you
have into new code. Maybe you rewrite every line, but you don't start from scratch, because you
lose too much knowledge; you had to fight pretty hard for that working code you have. At the
time I wrote that, probably the number one victim of that was Netscape: the Mozilla rewrite,
which in my mind lost them about three years, a time in which Internet Explorer gained almost
9% market share. It is true that Firefox eventually came out and kind of recovered and sort of
caught up, but the only reason Firefox caught up is because the Internet Explorer team had been
disbanded for five years and wasn't working, so they kind of lucked out. But you don't rewrite
things from scratch. I think at the time I got an email from somebody saying, "Have you looked
into Perl 6?" This was eight years ago that I wrote this, and that was a complete rewrite of
[INDISTINCT] that I believe is not yet shipping. I'm not really a Perl person, but
[INDISTINCT] has been pretty much forgotten. So, yeah, there is nothing you can do that is as
bad to a software product as deciding to start over. Yes?
>> [INDISTINCT] hardcore professional programmers?
>> SPOLSKY: So the question is what kind of UI changes we would make if we were focused on
sophisticated consumers rather than hardcore professionals. The first thing is, I don't think
this site is going to work for gardening questions, ever. It might work for gardener questions,
from professionals, but for gardening questions this is probably not the right kind of site. It
has to be professionals who are sitting at their desks doing something professional, and a
professional at their desk doing something professional will probably learn a slightly
complicated UI [INDISTINCT], so they'll probably overcome this. That said, there's a bunch of
programming-specific stuff in StackOverflow. For example, programmers seem to be really, really
good at tagging their questions, and when they're not so good at tagging their questions, anybody
can edit them and add tags. Surprisingly, we didn't even have to explain what tags were.
We didn't have to tell anybody or guide them on what tags should be. They get in there
and they're tagging, and they're tagging correctly, and they just sort of understand
it. I don't think that would apply in other fields; you'd have to come up with some
other method than tagging, probably, or something a little simpler. Or, I'm not sure,
maybe the Flickr experience would be different. The other thing is that because
we're focused on programmers, we knew that a lot of these questions would have sample code in
them. So we have this big old control that uses a Markdown syntax and shows you a
preview of what you're going to get from your Markdown. So you're actually writing in
a slightly awkward language called Markdown, which is sort of a simplified HTML. But
it's not easy by any stretch of the imagination, and we have not had a single
programmer complain about it or have any kind of problems with it in any way [INDISTINCT].
But I don't think that Markdown is necessarily the right approach for gardeners.
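One Markdown rule matters most for questions with sample code: a block whose lines are all indented by four spaces is rendered as a code block, with characters like `<` escaped so they display literally. The following is a toy preview renderer that handles only that rule plus plain paragraphs. The `render_preview` name is hypothetical, and StackOverflow's actual editor and renderer are not described in the talk; this is a sketch of the idea, not their implementation.

```python
import html


def render_preview(markdown_text: str) -> str:
    """Toy Markdown preview: blank-line-separated blocks whose lines are all
    indented by four spaces become code blocks; anything else becomes a
    paragraph. Real Markdown supports far more than this."""
    rendered = []
    for block in markdown_text.strip("\n").split("\n\n"):
        lines = block.split("\n")
        if all(line.startswith("    ") for line in lines):
            # Indented block: strip the four-space indent and escape HTML so
            # characters like < and & display literally in the preview.
            code = "\n".join(line[4:] for line in lines)
            rendered.append(f"<pre><code>{html.escape(code)}\n</code></pre>")
        else:
            # Ordinary text: join wrapped lines into one paragraph.
            rendered.append(f"<p>{html.escape(' '.join(lines))}</p>")
    return "\n".join(rendered)


if __name__ == "__main__":
    question = "Why does this never print?\n\n    if x < 1:\n        print('small')"
    print(render_preview(question))
```

Escaping inside the code block is the part programmers notice: without it, a snippet like `a < b` would be swallowed as an HTML tag in the preview.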
>> Juxtapose, on February 27th, suggested that the number of questions and answers posted per
month isn't increasing as you would expect. Does this worry you, or is a steady increase
in the data bank of pertinent questions and answers good enough? That's from John Street.
>> SPOLSKY: Yeah, I'm not really sure about this, because I haven't dug enough into the
numbers and I don't know enough statistics. All I have is the growth charts that I was showing
you a minute ago. I don't know: is that linear? Is that
exponential? Is that the beginning of an exponential curve? I'm not really sure. It doesn't worry
me at all as long as I see... I mean, literally every week I go and look at Google Analytics,
and, you know, we're kind of doubling every four months, and it's been that way since,
well, the fourth month. Let me take a question from the far back, because I've
just been doing the front rows here. Is anybody... anybody in the back? No, too quiet.
Here's one over here. Yes?
>> [INDISTINCT]
>> SPOLSKY: Right, so obviously there are old questions that are boring and nobody ever
goes to. There are old questions that have become the canonical place on the internet
to ask a particular, very interesting question about a very common programming problem that
people hit. And a lot of the questions are like, "I'm seeing this particular crash, what
could it be?" "Oh, you've got this bug in your code." Okay, and that's never going to be
interesting to anybody ever again. So there are sort of different classes of questions. The
only tracking that we really have for that is that when you see a question, we track the number
of views and show it to you right there: how many views that question has ever gotten.
Occasionally you'll see questions that have tens of thousands of views, and those
are usually popular questions like "What is your favorite programmer cartoon?" So let
me bring that up. "What is your favorite programmer cartoon?" That one's got 585 votes. It's
been viewed 120,000 times. I'm surprised at the kinds of questions that come out.
First... I kind of like the little Bobby Tables one, personally. So,
yeah, questions like this will certainly get Slashdotted and Dugg and
all that kind of stuff, and they'll rank high in PageRank even, because everybody's
linking to them. So there will be questions that are super, super popular, and
questions that just die an innocuous death. One thing that we are seeing is that we are
becoming the place of choice for new programming technology questions to be asked. For example,
when the iPhone came out, Apple made the iPhone developers [INDISTINCT] and said that they
couldn't talk to one another, and provided them with a very, very crappy forum that nobody
liked for asking each other questions. So they all came to StackOverflow, because
basically they had just launched, and we're kind of the number one resource for iPhone
programming.
What you see a lot of these days is, when a new programming technology comes out, rather
than trying to build their own forum or their own place to get questions answered, or their own
community support for users, they'll just guide their users by saying, "Hey, use the tag,"
whatever it may be, whatever the technology is called. So newer technologies tend
to just sort of organize around StackOverflow, around the tag, rather than trying to make
their own forums, and we expect to see a lot more of that. Whereas older technologies like
Visual Basic 6, which is now completely and utterly obsolete and not even supported anymore,
still have a gigantic body of knowledge on the website with a hyphen and not very much
knowledge on StackOverflow, and that'll probably be that way forever. I don't even know
if we have any COBOL questions. Just for the really intensely obsolete technologies... yeah,
we've got 44 COBOL questions.
>> Once you have a large expert user base, I think it's very hard to add new features or
change the product. Are you guys facing this problem, and if so, how do you experiment with
new features?
>> SPOLSKY: Once you have a large user base... okay, that was on the microphone, so I don't
have to repeat that. I kind of feel like at this point we're sort of tweaking, and
we have a couple of places where the conversation about new features takes place; basically
three places. We don't want StackOverflow to have questions about StackOverflow, because
that's off topic by definition: it's not a programming question.
So there's a UserVoice site where feature requests come in, and there's some conversation
about those feature requests on UserVoice. There's the podcast that Jeff and I do, where people
will call in with questions or email us, and we'll talk about new features
as they develop. And there's a StackOverflow blog where new features are introduced, and
the comments on the StackOverflow blog are really where most of the discussion about
new features takes place. But at this point we haven't done anything that I
think would really shock people, like making them unable to use the site. We haven't made
it a Twitter clone on the homepage. So, anything else from the room? Yes.
>> [INDISTINCT]
>> SPOLSKY: When somebody edits a response that was highly marked, how do you know it's still
good? Well, you can't. But it's like Wikipedia: you can look at the history and sort of see.
There is this weird concept, because we're a combination: in some ways we're like a blog or
a personal site, where there is authorship, and questions are written by a particular person.
So all these questions here sort of have an ownership,
and the answers have an ownership. But we have this concept over here called
community wiki. A community wiki is a question or an answer that has either been put into
community ownership, where nobody [INDISTINCT] the points and nobody owns the answer, or has
been forced into community wiki mode for various reasons. One thing that we
don't want to happen is somebody asks "What's your favorite programmer cartoon?" and then
gets 585... oh, this wasn't the number one answer, I'm sorry, this is the question itself. This
is the number one answer, 812, and that one wasn't a recognized... that's a number one user
answer. Things will eventually get into community wiki mode, because we don't want somebody
earning a billion points of karma just because they asked a really popular and slightly
off-topic question that got onto Digg. So if a question gets edited too much or an answer gets
edited too much, and there are [INDISTINCT] for all these things that are very complicated and
all documented somewhere, we'll take away the personal ownership for that one.
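The "edited too much" rule can be pictured as a one-way flag on a post that also switches off reputation for its upvotes. This is a minimal sketch under loud assumptions: the thresholds (10 edits by the owner, or 5 distinct other editors) and the 10-point upvote value are hypothetical placeholders, since the talk only says the real rules are complicated and documented elsewhere. The `Post` class and helper functions are illustrative names, not StackOverflow's code.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds for illustration only; the talk says the real rules
# are "very complicated" and documented elsewhere, and does not give them.
MAX_OWNER_EDITS = 10
MAX_OTHER_EDITORS = 5
POINTS_PER_UPVOTE = 10  # hypothetical reputation value


@dataclass
class Post:
    owner: str
    editors: list[str] = field(default_factory=list)  # one entry per edit
    community_wiki: bool = False


def should_become_wiki(post: Post) -> bool:
    """'Edited too much': many edits by the owner, or many distinct other editors."""
    owner_edits = post.editors.count(post.owner)
    other_editors = {e for e in post.editors if e != post.owner}
    return owner_edits >= MAX_OWNER_EDITS or len(other_editors) >= MAX_OTHER_EDITORS


def record_edit(post: Post, editor: str) -> None:
    post.editors.append(editor)
    # In this sketch conversion is one-way: once wiki, a post stays wiki.
    if should_become_wiki(post):
        post.community_wiki = True


def points_for_upvote(post: Post) -> int:
    """Community wiki posts have no personal owner, so upvotes earn nobody points."""
    return 0 if post.community_wiki else POINTS_PER_UPVOTE
```

Keying the rule on edit history rather than on popularity alone is one plausible design; the talk also mentions popular, off-topic questions as a motivation, which a real system might handle with separate checks.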
Anything else?
>> MARR: Yeah, Ivan Marr here. This is an anonymous question: why are you saying
that kids at Google and Microsoft are working on "hopeless and useless architecture
astronautics," unquote, as if it were something bad?
>> SPOLSKY: I don't understand. I don't have the source for that.
>> MARR: Use any search engine you like.
>> SPOLSKY: Yeah, I'm going to try to find a search engine here, if there is one. Does anybody
have a search engine [INDISTINCT]? "Why are kids at Google architecture astronauts"... what the
heck, I can't spell.
>> MARR: I'm sure it'll suggest it.
>> SPOLSKY: That's not it, though. I wouldn't say that. I can't remember. I'm sorry, I'm
not going to be able to answer that one right now. Is that it? But this isn't me, this is
my discussion [INDISTINCT]. "Why Google and Microsoft..." let's see what this guy says. He's
got a link; oh, this was it. On Groove, okay, I'll click on Groove. Live Mesh, got to get
rid of them. Oh, yeah: Google paying [INDISTINCT] salaries to kids with ultimate [INDISTINCT]
experience in Python. That's actually true: your starting salaries are too high, guys,
especially out of college. Yeah, I don't know; this was maybe overthinking it
a bit. But Microsoft and Google are really like gigantic vacuum cleaners at Stanford
and Berkeley and UW. And it really is, and I'm going to stand by this, it really is the
case that no matter how useful the stuff is that you're working on at Microsoft or Google, to
a large extent it's much harder for small companies to get good people because of the
presence of Microsoft and Google, which for a long time were in a mode of hiring as many
people as they could possibly get under all circumstances. And honestly, I bet you have
colleagues that are working on things that you think are probably not as important as
what some other start-ups you may know are doing. So that's the best I can do to defend that
one.
>> You've probably got time for one more question.
>> SPOLSKY: Sure, one more.
>> There's one from the audience, then we're done.
>> SPOLSKY: Anyone who wants to raise a question? Does anyone want to know what the starting
salary is at Google for a new [INDISTINCT] this year? Yes.
>> [INDISTINCT]
>> SPOLSKY: Would we want to embed executable code samples safely in the answers? Right now, I
don't exactly know what you're proposing, but I think we're talking about code, and we do have a
way to embed source code. We'd definitely be interested in ways to make embedding
source code better. We're using a little tool from Google, actually. Having actual executable
source code on the page sounds like a pretty bad idea, even if it's safe. No, not
necessarily. But when you think about the type of questions that people are asking here,
seeing the code run in the browser would probably be applicable only to, you
know, CSS questions or JavaScript questions, not to the majority of the type of code that
people are really asking about here. All right, well, thank you very much for coming. I
appreciate the attention.