GDL Presents: Women Techmakers with bitly


Uploaded by GoogleDevelopers on 09.11.2012

Transcript:

[MUSIC PLAYING]

APRIL ANDERSON: Welcome to part two of our last episode
in this week's Women Techmaker series.
It's been really rewarding for us to talk to so many talented
women who are innovating in the tech space this week.
And today, we're excited to have yet another talented
woman techmaker, Hilary Mason, who's the chief scientist at
bitly, joining us from New York City.
Before we turn it over to Hilary, by way of
introduction, my name's April Anderson.
I'm an industry director here at Google.
And I'm working to bring Google's advertising solutions
to our top partners in the retail market.
I've been at Google now for about 10 years, and went to
school just down the road at Stanford University.
I'm joined here by my colleague Kathryn Hurley, and
I'll turn it over to her so she can do a quick introduction.
KATHRYN HURLEY: Hey, everyone.
My name's Kathryn Hurley.
I'm a developer programs engineer at Google.
And specifically, I work on Google Compute Engine.
And as a developer programs engineer, I help developers
use the Compute Engine API.
So for example, I write sample code.
I speak at conferences.
You might have seen me at London Strata
this year, for example.
And I answer questions on the forum and Stack Overflow.
And prior to Google, I got a master's in web science at the
University of San Francisco.
And so now, at this point, I'd love to turn it over to
Hilary, so she can tell us a little bit about herself and
give us a little bit about her background.
HILARY MASON: Sure.
Hi.
First, thanks for inviting me.
It is really exciting to be talking to both of you about
things I think we're all pretty interested in.
So this is really cool.
And just to introduce myself briefly, I'm Hilary Mason.
I'm the chief scientist at bitly.
Bitly is a URL shortening and analytics company.
But the work we really do is about understanding human
behavior through the lens of social sharing.
So we look into the data generated by our analytic
service and try and learn things about humans, which is
pretty interesting.
Because my academic background is in computer science, but my
whole life I've also built things.
And so finally I'm in a position where I'm both
building things and doing interesting research.
I also co-founded a nonprofit called hackNY that helps young
developers find interesting opportunities in the New York
creative tech economy.
The tagline actually is from my co-founder Chris Wiggins.
And we say, "to save kids from the street," but
we mean Wall Street.
And I'm also involved in making trouble in a bunch of
other ways in New York on Mayor
Bloomberg's advisory council.
And also, I'm a member of the hackerspace NYC Resistor.
KATHRYN HURLEY: Wow, that is a really awesome background.
You do a lot of amazing work.
So you briefly touched upon bitly.
But do you mind going into a little bit more detail about
what the company does, maybe how it got started?
HILARY MASON: Sure.
So bitly was a total accident.
It was actually a feature of another product that was a web
product around intellectual discourse, around shared media
experiences.
But it totally failed.
But what did succeed was this little piece of it for sharing
bits of content called bitly.
And we actually have a sister company called Chartbeat that
uses some of the same technologies.
And Chartbeat is real-time web analytics for publishers.
So nobody really planned bitly as a company.
It's not like someone woke up one day and said, oh, I have a
great idea for the world's best URL shortener.
But once that happened, we started to build it.
And we realized that the value would always
be in the data set.
And so eventually, we developed a
business around that.
And that's our enterprise analytics business, where folks
who run large brands, or celebrities,
basically anyone who either publishes a lot of content to
the social web or has a lot of content about them published
to the social web pay us for branding on their shortlinks.
And also, a whole set of analytics about how their
stuff is doing.
So we have these two sides, the data and the business.
And the work I do sort of sits in the middle.
KATHRYN HURLEY: Very cool.
And how would you characterize the bitly user experience?
For example, what do you think that bitly is doing that
contributes to the seamless content sharing
that we see on bitly?
HILARY MASON: Yeah, so bitly is really the best way to
outsource your memory and the content you want to save and
find later.
So you can go to bitly as a consumer--
of course, it's all free.
And you can save things, tag things, put notes on things.
And then you can find out how many other people also saved
those things or were interested, or clicked on
those things.
And you can come back and search for that stuff later.
And you can share easily out to many different networks.
So we see bitly as the hub that sits between all of your
social sharing behavior.
APRIL ANDERSON: Hilary, I'm really interested to learn
more about what it means to be a data scientist.
Can you tell us a little bit more about what that means?
What does a data scientist do?
And in your role as a chief scientist at bitly, what does
your typical day look like?
Take us through that.
HILARY MASON: OK.
So those are two different questions.
So I'll start with the one about data science, which has
only really been around as a profession for a few years.
And I'm pretty privileged to consider myself part of the
larger worldwide conspiracy to create the profession, or at
least help support it.
I think data science deserves a new job title because it's
not anything that we've never been able to do before.
But it's three behaviors that we've never combined into one
professional before.
And those three behaviors are first, being able to do math
and build models that represent the real world.
The second one is being able to write code or engineer the
systems that support those models.
And sometimes to do it at large scale, as you guys
certainly do, and as we do as well.
And then third, to ask good questions and be creative
enough to communicate what you've
learned back to anybody.
And that's often the hardest thing to find, especially
combined with someone who knows math and is
an engineer as well.
But that's sort of the practice that I've seen evolve
over the last couple years.
As to what I actually do, that's a very good question.
I'm really privileged to work with a brilliant team of
scientists, engineers, and artists.
And my job is mainly to coordinate the various
research questions that we're investigating, the systems
we're working on, and to work with our product and business
to prioritize our work appropriately.
So I see our role as really laying down the path where our
company will go in the next six months or the next year.
APRIL ANDERSON: That's great.
And I definitely want to hear more about that, but also want
to hear from Kathryn, who has a few questions
for you on big data.
Because that's a very interesting
topic for us as well.
KATHRYN HURLEY: Yes.
If you don't mind switching gears a little bit.
So as a data scientist, you work with lots of data.
What I'm first interested in hearing about is what kind of
data is gathered by bitly from your users.
HILARY MASON: So we primarily see three kinds of data.
We see the save and share events,
so when somebody shortens a link or saves it in bitly.com. We
see the entire set of clicks on that link
once it's shared out.
And that includes things like the location of where people
are clicking on things from, their referral, which social
network it's getting traffic from, the user agent, so what
kind of device and stuff like that.
And then we actually go and crawl the full
content of each link.
And we would do a bunch of analysis on that as well.
So we primarily work with those three types of data.
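
The three kinds of data described above can be sketched as simple record types. This is only an illustration: the class names, field names, and the clicks_by_referrer helper are hypothetical and are not taken from bitly's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional
from urllib.parse import urlparse


@dataclass
class SaveEvent:
    """Someone shortened a link or saved it (hypothetical fields)."""
    short_url: str
    long_url: str
    timestamp: float


@dataclass
class ClickEvent:
    """One click on a shared link (hypothetical fields)."""
    short_url: str
    timestamp: float
    country: Optional[str]      # where the click came from
    referrer: Optional[str]     # page or network that sent the traffic
    user_agent: Optional[str]   # hints at the kind of device


@dataclass
class CrawledContent:
    """The page content fetched for a link (hypothetical fields)."""
    long_url: str
    text: str


def clicks_by_referrer(clicks: Iterable[ClickEvent]) -> Counter:
    """Count clicks per referring host; clicks without a referrer count as 'direct'."""
    counts: Counter = Counter()
    for click in clicks:
        host = urlparse(click.referrer).netloc if click.referrer else ""
        counts[host or "direct"] += 1
    return counts
```

Grouping clicks by referring host is one way to answer the "which social network is it getting traffic from" question from records shaped like these.
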
KATHRYN HURLEY: OK.
That's some pretty interesting data that you gather.
HILARY MASON: Definitely.
KATHRYN HURLEY: Yeah.
And just to give our audience an idea of just how much data
bitly has and stores, and how much data you get to
play with every day.
HILARY MASON: Yeah, so we see tens of millions of links a
day, hundreds of millions of clicks on those links, and
then a lot of content.
I don't remember off the top of my head how big it is, but
it's moderately big.
KATHRYN HURLEY: Moderately big?
It sounds pretty big to me.
And so that brings up a good question.
A hot topic right now is big data.
And I know you have an opinion about big data because I
watched some of your talks.
What do you think about big data?
HILARY MASON: So I think the term "big data" is a little
difficult because it's so hard to define.
So some people think big data is when it won't
fit in Excel anymore.
And some people think big data is when you need more than one
node to actually operate over something or store it.
What I think is actually fundamentally new and
interesting about big data is that it reduces the friction
to working with data.
So it lets you ask a question and get the answer back before
you've forgotten why you've asked that question in the
first place.
So I think the advantages to using these big-data type
technologies are really human advantages.
They're not so much technical advantages.
And I also, every so often, find myself at a conference
with astrophysicists or people who work in bioinformatics.
And they look at our data, and they say, ha.
You thought that was big.
We had to invent a new kind of hard drive to store our stuff.
So those are the people who are really pushing
the edge of big data.
KATHRYN HURLEY: And so it sounded like you have a couple
tools that you like to use to work with big data.
Can you share what some of those are?
HILARY MASON: Sure, though they're fairly embarrassing
and old-fashioned.
My favorite way to work with data whenever possible is on
the command line using the bash tools, so grep and awk.
If you look on bitly's GitHub page, we actually have some
nice scripts you can run to do histograms on the command line
and do some really basic sampling and get a sense of
your data just from that.
That's not always possible, though, so I'm
a huge Python nerd.
We use NumPy, SciPy, and scikit-learn.
We also do have a Hadoop cluster here at bitly.
It's not my favorite piece of infrastructure,
but we do use it.
And it works very well for what we use it for.
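
As a rough idea of what a command-line histogram can look like, here is a minimal sketch using only the Python standard library. It is not bitly's script; the function name text_histogram and its width parameter are assumptions made for this example.

```python
from collections import Counter


def text_histogram(values, width=40):
    """Return the lines of a text bar chart of value counts, most common first."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    label_width = max(len(str(value)) for value in counts)
    lines = []
    for value, n in counts.most_common():
        # Scale each bar against the largest count; always draw at least one mark.
        bar = "#" * max(1, round(width * n / top))
        lines.append(f"{str(value):<{label_width}} {n:>6} {bar}")
    return lines


# Small fixed sample standing in for one column of a log file.
for line in text_histogram(["US", "GB", "US", "DE", "US", "GB"], width=20):
    print(line)
```

Fed with the stripped lines of standard input instead of a fixed list, the same function could sit at the end of a grep and awk pipeline.
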
KATHRYN HURLEY: Cool.
Thanks for sharing.
HILARY MASON: Of course.
KATHRYN HURLEY: And so working with all this data, you
probably have a lot of insights, make a lot of
interesting discoveries with this data.
Can you share what your most interesting discovery is with
all the bitly data?
HILARY MASON: It's really hard to pick just one.
But one thing we put up on our blog today was we looked at
the half-life of content by topic.
And it turns out that most content actually has very
similar half-lives.
And a half-life is the time it takes for a link to get half
the clicks it'll ever get.
So just sort of looking at how attention to it is
modified over time.
But one kind of content is much
shorter, and that's sports.
So once sports news is out of date, nobody cares about it.
APRIL ANDERSON: That's a point.
That's very true.
HILARY MASON: Not really a surprise, but good to see it
proven in the data.
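
The half-life definition given above can be sketched in a few lines. This assumes the full click history of a link is available as timestamps in seconds and that the clock starts at the first click. Both are assumptions of this example, and measuring from a different origin, such as the moment the link was created, would give different numbers.

```python
import math


def half_life(click_times):
    """Seconds from the first click until half of all clicks have arrived."""
    times = sorted(click_times)
    if not times:
        raise ValueError("need at least one click")
    # Index of the click at which the running total first reaches 50%.
    halfway_index = math.ceil(len(times) / 2) - 1
    return times[halfway_index] - times[0]


# Hypothetical link: three quick clicks, then a long tail of stragglers.
print(half_life([0, 10, 20, 400, 3600, 86400]))
```

Comparing this number across links grouped by topic is the kind of analysis described here.
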
KATHRYN HURLEY: That's so cool.
And you make these discoveries, but how do they
actually influence bitly as a product?
Do you see any changes in the UI or in the back end?
Or are you just working with the data and having fun?
HILARY MASON: Well, it's all of the above, actually.
So we tend to ask a research question.
And that might be something like, can we build a model of
canonical social sharing?
So is there a consistent way that content is shared through
different social networks?
And so we'll go out and say, OK, how do we
know when we've won?
How do we know when we've done that, when we've actually
answered that question?
And then two, is it scalable?
So can we actually build a system to do this?
Most of our products are real-time analytics products.
So is it possible to do this in real-time?
And then, if so, what should the product look like?
And so that question was actually not hypothetical.
It was what we spent most of last year working on.
And we actually released an experimental product called
Realtime, which is at the domain rt.ly, where you can go
in and look.
And it's showing you the stories that are getting that
initial spike of attention.
And you can now go in and see the actual maps of
attention over time.
So that's an example of something that has gone from
research to infrastructure to experiment.
And soon, to bitly product.
KATHRYN HURLEY: Awesome.
It's always fun to see the work that you do get put into
the product, I think.
HILARY MASON: Absolutely.
We're also opening up some of the APIs there as well, which
I'm pretty excited about.
KATHRYN HURLEY: Awesome.
I like APIs.
That's what I work on every day.
All right, one more question for you, and
then I'll hand it--
I feel like I'm hogging the show.
But finally, have you hit any walls recently?
Have you come across any challenges with bitly's data?
And how have you overcome those challenges?
HILARY MASON: So that's a difficult question because I
think most of my time is spent overcoming challenges.
And a lot of the time, they're human challenges.
So somebody is after something different
than what we're after.
Or we're not quite coordinated within the company about what
we're trying to do.
And sometimes there are technical challenges.
And I can think of examples of both just from today.
One being that we defined some key business metrics for one
of our apps.
And it turned out they were getting written to one data
store and not another.
And so I spent a long time trying to track that down.
In general, the ways I try to work around these things are
to always talk to people in a very positive way.
So not to get frustrated or upset but rather to say, OK,
here's our problem.
How do we solve this problem?
And I'm really fortunate, again, to work with a lot of
smart people who are really nice.
In fact, our number one hiring rule has always been, don't
hire assholes.
And so that tends to work out pretty well.
KATHRYN HURLEY: That's a good rule.
APRIL ANDERSON: Yes, it's a good mantra.
So Hilary, I would love to hear more about your role and
some of those challenges.
And also, some of those opportunities as you work as
the chief scientist at bitly.
How do you really balance wearing the technical hat and
the technical aspects of your role with the things that are
more business-oriented?
HILARY MASON: That's a really good question because it's
something I'm still learning how to do.
And when I started at bitly three years ago, I was writing
code every day.
We were such a small team that everyone knew what
everyone was doing.
And it was pretty clear where we were going.
We had only one thing to do, and that was keep scaling.
And now we've got a couple of different products, a larger
business, a lot more people.
And so it has been a challenge to learn how to grow with the
company and to grow my team from just
me to now nine people.
The ways I try to do that are generally--
it sounds really trite, but actually applying the
scientific method.
So even in product or business problems saying,
OK, here's my theory.
What experiments can we do given the data we have
available, or the people we have available, or the
resources we have available to answer this question?
Can we try something and see what happens, and then try
something else?
And that is generally how I've thought about the world.
And it has worked pretty well.
APRIL ANDERSON: That's great.
And how do you really infuse that with your team?
This is an industry and certainly a product that
requires constant iteration, constant innovation.
How do you really foster that creativity with your teams?
HILARY MASON: I try and lead by example.
And what I mean there is to have lots of bad ideas.
Because you have to have a lot of ideas to
have any good ideas.
And if you don't even try to articulate
those, the bad ones--
if you're afraid to do it or you're going to be
embarrassed, then you're going to miss out on the good ones.
Or when you say something that might not be the best idea,
maybe one of your colleagues can say, well,
that's total bullshit.
But this other twist on it might actually work.
And so we try and keep that kind of open environment.
And also, keep in mind that a lot of the best ideas don't
come from our researchers with PhDs in math.
They come from the people who are working on our community
support team because they're actually there working with
customers all the time.
And so within the company, we try and create a culture where
ideas come from anywhere.
Questions can come from anywhere.
No one should be afraid to have bad ones
or ask silly questions.
Yeah, I do a lot of that.
APRIL ANDERSON: No, I think it's something we all kind of
strive to do, right?
Keep that customer voice infused in
everything that we do.
As a bit of a tangent on data, I'd love to hear, how do you
actually keep track of your own data within bitly?
When you get this kind of feedback, what are the
mechanisms that you use to really make sure that you're
keeping your finger on the pulse of how the product's
doing and what the customers are saying about it?
Your own sort of big data, as it may be.
HILARY MASON: Right.
So one of our responsibilities is the not so sexy area of
business analytics.
That is, even figuring out what metrics correlate to the
kinds of behaviors we want to see.
And for some things, it's really easy.
So you have a certain amount of money, and you
watch that go up.
And that's great, right?
Your problem's solved.
But for a lot of things, it's not that simple.
So one of the projects I'm working on is growing our
platform and working on API design.
And we start to think about, OK, what kind of platform
users do we want?
How do we want to understand their behavior on our system?
On the technical side-- if there are any technical folks
in the audience--
we have a consistent development metaphor within
bitly for all our systems, which is you
have a stream of data.
You have a queuing system.
You have queue readers that read out of that queue.
You have some kind of API layer.
And you have an application.
And what that lets us do is take those streams of data and
put them in places where we can analyze them pretty
easily, or hang things off of them that can fire events when
metrics happen.
And so we track everything we can in StatsD and Graphite.
We also have our own dashboard called the Aquarium.
We're big on the fish theme.
And we just try and make that available to everyone in the
company, so that anyone can ask a question and find that
answer without having to ask someone else.
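
The stream, queue, and queue reader pattern described above can be sketched with the Python standard library. Everything here is illustrative: the event shape, the metric names, and the function names are assumptions, and an in-process queue stands in for a real queuing system. The only borrowed detail is StatsD's plain-text counter format, name:value|c, which a real client would send over UDP.

```python
import queue
import threading
from collections import Counter

SENTINEL = None  # marks the end of the stream in this sketch


def statsd_counter_line(name, value=1):
    """Format a counter increment in StatsD's plain-text wire format."""
    return f"{name}:{value}|c"


def queue_reader(events, totals, emitted):
    """Read events off the queue, update totals, and record one metric line each."""
    while True:
        event = events.get()
        if event is SENTINEL:
            break
        name = f"clicks.{event['country']}"
        totals[name] += 1
        emitted.append(statsd_counter_line(name))


def run_pipeline(stream):
    """Push a stream of events through a queue to a single reader thread."""
    events = queue.Queue()
    totals, emitted = Counter(), []
    reader = threading.Thread(target=queue_reader, args=(events, totals, emitted))
    reader.start()
    for event in stream:  # the stream of data feeds the queue
        events.put(event)
    events.put(SENTINEL)
    reader.join()
    return totals, emitted  # an API layer or dashboard would read from here
```

Because the reader is separate from the producer, more readers can hang off the same stream to compute different metrics without changing the code that produces events.
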
APRIL ANDERSON: That's great to have that transparency.
I would love to hear a little bit more, too.
I mean, you are doing so many things at bitly.
And you have such an impressive background.
Which of your early experiences do you think really
prepared you most for your current role?
And are there any sort of experiences that you'd
recommend for people who are really looking to build a
career in tech and leadership within the tech space?
HILARY MASON: Yeah.
That's another good question.
So I have a mix of experiences.
Some of them are theoretical and academic.
And many of them were much more--
so I really do like video games.
So at one point, I built a system to gather a lot of data
out of video games.
And then it turned out it actually had commercial value.
But it was a total accident, and I was pretty naive about
how I had developed it.
So my only advice there is that you have to actually do
things and not be afraid to do things that are not perfect.
If you do things and they're not that great, people for the
most part just ignore them.
But if you do things and they're really interesting--
and if you're interested in something, there are other
people out there who are going to be interested in it.
The second piece of advice-- and this is something I
learned probably a little bit too late-- is that you need to
build a community.
And in New York, we've been really fortunate to see huge
growth in the number of people working in tech or in
companies that have a large tech component over the last
five years.
But we've also done a lot to build a data community here.
And we've gotten people from-- when I say "we," it's mostly
me and a few other people who said, we want this to
exist in New York.
And we want New York to have this voice.
So we went out and found people who shared that
motivation and that interest and just started introducing
them to each other.
And making sure that we have a meetup every few months where
we get together with some beers, so people
are a little relaxed.
And they can talk about what they're working on or what
they're thinking about.
But that community is really important.
APRIL ANDERSON: No, that's great.
Thank you so much for sharing that, too.
I think it is important to be active and try new things and
really put yourself out there.
So it's great advice to give.
I'm going to turn it back over to Kathryn because we want to
hear a little bit more also about some of your recent
presentations and some of the things that you've talked
about in terms of prioritizing research.
HILARY MASON: Sure.
KATHRYN HURLEY: If we have enough time.
So like I said, I was watching some of your recent
presentations.
One in particular at Devs Love Bacon, which is a very awesome
title for a conference.
And at that conference, you talked about-- and sorry, I'm
reading off my notes here.
I just want to get the question right.
So you talked about several steps in the
data analysis process.
So that included finding, scrubbing, exploring,
modeling, and interpreting the data.
And so I was just curious if you had any tips for data
scientists who are just starting out in this data
analysis process.
And if you could, suggest any tools or any kind of material
that could help them get started in the field.
HILARY MASON: Sure.
So I have to say that I love that you watched that
talk because I had given a three-hour machine-learning
algorithm tutorial at a conference the year before.
And the gentleman who organized the BACON conference
attended and he liked it.
And so he said, will you come to my conference
and give the talk?
And I said, sure, I'd love to.
And then when I got there, he said, oh,
and you have 30 minutes.
And so I had this three-hour tutorial planned.
And I was like, oh, oh, how do I condense three hours of
material into something that'll be entertaining and
interesting for 30 minutes and actually not lose the value in
the presentation?
So that was an artifact of a lot of stress and jet lag, but
I think it came out OK.
The process you talked about is something--
it's from a document I co-wrote with my colleague
Chris Wiggins, who's also a co-founder of hackNY, a few
years ago now, where we realized that there really was
no canonical statement of what a data scientist does or even
how you look at data in a non-scientific way.
And so we wrote this down.
And it is, indeed, this--
you get data.
You scrub it.
You model it.
You analyze it and you interpret it, or make it
interpretable for other people.
It's pretty simple and seems obvious now.
But at the time, I hadn't seen it written down like that.
So for young data scientists who want to start to explore,
the first thing you need is data.
And there's actually a lot of data out there.
A lot of it's public data.
You can look at some of the government data.
Even bitly has a public data set of all
the 1.usa.gov clicks.
So we powered the federal government URL shortener.
It's all public data.
It's a good way to play with social data if you're
interested in that.
Or you can scrape it.
So a lot of people do that.
I just did a really fun little side project where I scraped
all the menus for non-fast-food restaurants in
New York City, and then tried to plot clusters of different
restaurant types.
So I found Chinatown.
And I found where you should go for Thai food.
It was all really to find better cheeseburgers, which--
this has mostly worked.
But the data's out there.
And so if you can think of a question you would ask if you
had some data.
Or the other way around, so find some data and then look
at it and try and see what stands out to you.
Then you can start looking at different tools.
You can even start in Excel.
So it's sort of a dirty secret, but a lot of people do
great work in Excel.
And when you move up from that, you can go either to R
if you're a statistician by training or Python if you're
more a computer scientist.
Load that data up.
And especially if you're in an environment like IPython with
the interactive notebook, you can start graphing it and
tweaking things and seeing what you can find there.
And then, my next recommendation is get a GitHub
account and put whatever you did up there.
And then let people tell you why it's wrong.
Because you will learn a lot, and they will think it's cool.
And you'll get a lot of good feedback and start to meet
people who are also interested in what you're interested in.
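
A first pass at the get, scrub, and explore steps described above might look like the sketch below. It assumes the data arrives as one JSON object per line and that each record stores a country code under a key named "c". Both are assumptions made for illustration and should be checked against whatever data set is actually used.

```python
import json
from collections import Counter


def top_values(lines, key, n=5):
    """Return the n most common string values stored under key in JSON-lines data."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # scrubbing: skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue
        value = record.get(key)
        if isinstance(value, str) and value:
            counts[value] += 1
    return counts.most_common(n)


# Tiny hand-made sample, including one bad line and one record without the key.
sample = ['{"c": "US"}', '{"c": "GB"}', 'not json', '{"c": "US"}', '{"x": 1}']
print(top_values(sample, "c"))
```

Passing an open file object instead of the sample list works the same way, since the function only iterates over lines.
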
KATHRYN HURLEY: And you create a lot of cool visualizations
as well with the data.
So are there any libraries that you like to use to create
these visualizations?
HILARY MASON: Well, at bitly, we are huge fans of D3.js.
In fact, my colleague Mike Dewar wrote a short O'Reilly
book on getting started with D3.
So if you'd like to check that out, we use it on
our Realtime site.
And it's a really good book.
Otherwise, so data visualization in the service
of usefulness, where it really doesn't have to be pretty, I
still mostly use matplotlib in Python, which is a very simple
way to make fairly uninteresting graphs.
KATHRYN HURLEY: Cool.
All right, maybe one more question?
OK.
And then I just want to ask about the prioritizing
research blog post that you wrote recently.
So people that might not be familiar with it, I'll just
cover it real quick.
So you have five different steps for
prioritizing research.
And that's you first state a question.
Then you ask yourself, how do you know when you've won?
And correct me if I'm not getting these right, please.
Then you assume the question was answered correctly.
What are the first things that you're going to build with it?
And then, if everyone in the world uses it--
let's see--
how does that change human behavior?
And the last one, my favorite one, is, what's the most evil
thing that can be done with this?
So first, my question is, what about that last question?
Why did you include that?
HILARY MASON: I included that because it lets you turn the
work upside down and think about the implications of what
you're doing in a really creative way.
And just getting people to brainstorm about how they
could do something terrible with what they're working on
really makes them think about it in a new way.
And we've had a lot of really good ideas come out of that.
I do want to add that we're not evil.
We're generally trying to build very good things, sort
of build things that make the world more like the world we
want to live in.
But that question tends to get people to think creatively
about the research in a way that nothing
else I've found works.
APRIL ANDERSON: That's great.
KATHRYN HURLEY: Cool.
APRIL ANDERSON: I think we're wrapping.
KATHRYN HURLEY: Yes, I think we'll wrap up.
So I just want to say thank you so much, Hilary, for
joining us.
It was--
HILARY MASON: Oh, thank you, both.
This was super fun.
APRIL ANDERSON: Yeah, it was really nice
to meet you, Hilary.
Thank you so much for sharing.
I'm fascinated and very much inspired by all the work that
you've done in your career thus far.
So thank you for coming here.
HILARY MASON: Likewise.
And thank you for hosting.
APRIL ANDERSON: Absolutely.
KATHRYN HURLEY: That's great.
Learning so much great information from you.
And I guess, should I wrap up?
Sorry.
And to our viewers, thank you so much for joining us for
this final episode of the Women Techmaker series.
And all of our programming can be viewed and shared on
Google+ or Google Developers Live.
Tell us what you think about this series by hashtagging
your comments with WTM.
And keep an eye out for more Women Techmakers and Google
Developers Live events.
Thanks.
APRIL ANDERSON: Thanks, everyone.

[MUSIC PLAYING]