So Much Storage, So Little Time

Uploaded by GoogleDevelopers on 26.07.2012


CHRIS RAMSDALE: Hello, and thank you for coming aboard
this tour of application data at Google.
I'm Chris Ramsdale, product manager for Google App Engine.
I'd like to begin with some quick trivia to provide some
context for the capabilities and products I'm going to talk
about today.
Every minute, over 60 hours of video is uploaded to YouTube.
Google's "Caffeine" Search Index was over 100 million
gigabytes in June of 2010 and growing very quickly.
There are over 350 million active Gmail users.
And it takes Google approximately one fourth of a
second to respond to a search query.
So what did Google learn building these products?
At Google, the reliability and performance of the systems we
build are critical to our business.
We realized the traditional data solutions just wouldn't
do what we needed them to do.
The kind of scale we're talking about here requires
that we do things differently.
So we spent more than a decade building data systems that are
extremely reliable and available, have multiple
layers of redundancy, and can handle the loss of any given
data center, and are wicked fast.
We published the details of our early infrastructure,
MapReduce, Bigtable, and the Google File System, which
became the model that was adopted by the open source
community that sprung up around big data.
But those were just the table stakes.
Whether to deliver high-quality search results, relevant
advertising, or just to learn how to continually improve our
algorithms and systems, at Google, we believe fast is
better than slow.
And our focus on performance is very pervasive through the
systems and products we build.
At the core of our data offerings are services that
help businesses store and access their data in the
Google Cloud using the same technology that we use to run
our business.
Google Cloud Storage for unstructured content like
files and media, the App Engine Datastore that provides
a NoSQL environment for structured data, and Google
Cloud SQL for relational data.
Google's mission is to organize the world's
information and make it universally accessible and
useful, which meant that we got very good at finding out
the meaning of data and doing so very quickly.
As the business and data landscapes have evolved, some
of the problems that we set out to solve that seemed
fairly specialized to our business are now pervasive in
the industry.
Data is a core business asset, becoming ever more important
as a competitive advantage.
There are fewer and fewer low-hanging fruit.
At the same time, data is growing faster than most
organizations' ability to leverage or even capture it.
Going from a few gigabytes to several terabytes to petabytes
of data has fundamentally changed the tools
needed to work with it.
And traditional BI is struggling to keep up.
Several commercial and open source initiatives have sprung
up to solve these problems, many of which are based on
technical papers that Google wrote to share its innovations
with the world.
Hadoop is based on Google's MapReduce technical paper,
HDFS on the Google File System, and
HBase on Bigtable.
In the years since we published those technical
papers, our technology has continued to evolve.
And now, we're starting to make our latest and greatest
data tools available as cloud-based services.
In the Google Cloud, you can use App Engine to build
scalable web applications with high productivity
and low time to market.
Google BigQuery lets you unlock the insights hidden in
large data sets.
And the Google Prediction API enables you to use Google's
machine learning technology with your data.
Beyond storing and accessing your data, though, we're
really focused on making that data useful through an
integrated platform of data services, premium APIs, and
platform services that let you store, analyze, and leverage
your data in your own applications.
I'll talk a bit more about leveraging your data in the
Google Cloud in a little while.
Let's first look at the core data services.
First, Google Cloud Storage.
This product was built from the ground up to store
unstructured data with high reliability, availability, and
performance, and to allow you to seamlessly scale your data
as your needs grow.
Data is stored with multiple levels of redundancy for high
availability and performance.
One of the many benefits of using Google Cloud Storage is
that you put Google's network to work for you.
When you store your data in the cloud, you quickly realize
that cloud services live and die by the network.
That's been an area of significant investment by
Google, and we think it shows in the quality of service.
At high scale, reliable storage has many
uses in and of itself.
In order of increasing value, you can use it as a tape
replacement strategy for your data backups.
Use it for active archives, storing your data archives in
the cloud for easy access and no maintenance overheads.
You can use it for all kinds of content and media-serving
applications to quickly and easily share data with the
organization, with a small or large group of external
partners or customers, or freely over the internet.
You can use it to store data for your applications or in
conjunction with some of our other services for analysis
and computation.
When most people think about their structured data
strategy, they think about two kinds of products.
NoSQL datastores that offer a very high scale and support
for millions of queries per second, usually in exchange
for less rich data representation--
typically they're denormalized data--
and/or tradeoffs in data consistency.
SQL-based data stores that are well-suited for applications
that need relational data, but cannot deliver the kind of
global scale and performance that NoSQL stores can.
At Google, we believe that each has its own place.
The App Engine high-replication Datastore is
based on Google's Megastore technology and combines the
scalability and performance benefits of a NoSQL data store
with high availability, transactional capabilities, a
flexible query engine, and strong consistency guarantees.
The Datastore is schemaless, which means your data
structure can be flexible and evolve with your needs.
Unlike many NoSQL solutions, it doesn't sacrifice consistency.
It offers the ability to perform ACID operations
within transactions and consistent reads outside.
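As a rough illustration of what all-or-nothing behavior inside a transaction means, here is a minimal in-memory sketch. It is a toy model only: `ToyStore`, `transact`, and `transfer` are hypothetical names invented for this example and are not the App Engine Datastore API, and the version check is a simplified stand-in for optimistic concurrency control.

```python
import copy


class ConflictError(Exception):
    """Raised when a transaction cannot commit after several attempts."""


class ToyStore:
    """Hypothetical in-memory store used only to illustrate transactions."""

    def __init__(self):
        self._data = {}
        self._version = 0

    def get(self, key):
        # Reads outside a transaction see only committed data.
        return copy.deepcopy(self._data.get(key))

    def transact(self, func, retries=3):
        """Run func against a private snapshot and commit all of it or none."""
        for _ in range(retries):
            snapshot = copy.deepcopy(self._data)
            start_version = self._version
            # If func raises, the snapshot is discarded and nothing is committed.
            func(snapshot)
            # Commit only if no other writer committed in the meantime.
            if self._version == start_version:
                self._data = snapshot
                self._version += 1
                return
        raise ConflictError("too much contention, giving up")


def transfer(src, dst, amount):
    """Build a transaction body that moves amount from src to dst."""
    def body(data):
        data[src] -= amount
        if data[src] < 0:
            raise ValueError("insufficient funds")
        data[dst] += amount
    return body


store = ToyStore()
store.transact(lambda data: data.update({"alice": 100, "bob": 20}))
store.transact(transfer("alice", "bob", 30))

try:
    # Debits alice on the snapshot, then fails, so the debit is never committed.
    store.transact(transfer("alice", "bob", 500))
    failed = False
except ValueError:
    failed = True
```

After the failed transfer, both balances are unchanged from the last successful commit, which is the property the transactional guarantee is meant to provide.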
It's a very reliable service, too.
Data is written to multiple data centers, making it
capable of withstanding multiple data center failures.
The high-replication Datastore has had 100% uptime since its
launch a year ago.
But to get all these benefits, you do need to think differently.
You can't use tools that depend on
traditional SQL databases.
You need to denormalize your data.
These tradeoffs make the Datastore most suitable for
data sets that need to scale seamlessly and deliver high
performance and availability.
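To make the idea of denormalizing data concrete, here is a small sketch in plain Python. The `customers` and `orders` records and the `denormalize` helper are invented for illustration and are not tied to any Datastore API. The point is that each stored record carries the fields it needs, so a read does not require a join.

```python
# Normalized form: orders reference customers by id, so reading an order
# together with its customer's name needs a second lookup (a join).
customers = {
    1: {"name": "Ada", "city": "London"},
    2: {"name": "Linus", "city": "Helsinki"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 15.5},
]


def denormalize(orders, customers):
    """Copy the customer fields each order needs into the order record."""
    result = []
    for order in orders:
        customer = customers[order["customer_id"]]
        result.append({
            **order,
            "customer_name": customer["name"],
            "customer_city": customer["city"],
        })
    return result


denormalized_orders = denormalize(orders, customers)

# Each record can now be read and displayed on its own, with no join.
first = denormalized_orders[0]
summary = f'{first["order_id"]}: {first["customer_name"]} ({first["customer_city"]})'
```

The tradeoff is duplication: if a customer's city changes, every copied value has to be updated as well.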
For those applications that do not need the kind of scale
that the Datastore provides and that benefit from the
ability to use a traditional database, we recently launched
Google Cloud SQL as an experimental service that is
open for public use.
This new service gives you a fully managed MySQL database
in the Google Cloud.
So you can use more traditional tools and
represent your relational data effectively.
The data is replicated for availability.
Once your data is in the cloud, you can make use of
other Google technologies.
BigQuery is a fully hosted cloud data analysis service
that enables you to get critical insights from
multiterabyte data sets at near real-time speeds.
It's built on Google's infrastructure based on
production tools used by Google to build its own big
data business.
You can perform SQL-like queries via UI or API to gain
insights from your data, share and distribute them, build
your application dashboards, et cetera.
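For a sense of what a SQL-like aggregate query of this kind looks like, here is a small sketch that runs locally. It uses Python's built-in `sqlite3` module purely as a stand-in: this is not the BigQuery API, BigQuery's query dialect may differ, and the `page_views` table and its rows are made up for the example.

```python
import sqlite3

# In-memory database standing in for a hosted analysis service.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views (country TEXT, page TEXT, latency_ms INTEGER)"
)

# Hypothetical sample rows; a real data set would be far larger.
rows = [
    ("US", "/home", 120),
    ("US", "/pricing", 180),
    ("NL", "/home", 90),
    ("US", "/home", 150),
    ("NL", "/docs", 110),
]
conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)

# Aggregate question: how many views per country, and at what average latency?
query = """
    SELECT country,
           COUNT(*) AS views,
           AVG(latency_ms) AS avg_latency_ms
    FROM page_views
    GROUP BY country
    ORDER BY views DESC, country ASC
"""
results = conn.execute(query).fetchall()

for country, views, avg_latency_ms in results:
    print(country, views, avg_latency_ms)
```

The same shape of query (filter, group, aggregate, sort) is what a dashboard or report would typically issue, whether through a UI or programmatically.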
It offers some unique benefits.
You can find a needle in a haystack.
Tens of terabytes.
You can get near-instant results in unlimited scale.
No maintenance and no long-term commitments needed.
We think that BigQuery represents a discontinuity in
what you can do with data.
You can reduce time to insight from weeks to minutes,
allowing your business to move faster.
You can get straight to business insights without
building and maintaining a data center, developing
specialized technical capabilities, or licensing
expensive software.
You don't need to build indexes or do
traditional BI planning.
So you can be more flexible.
Because, let's face it, it's practically impossible to know
all the questions you'll need to ask beforehand.
For when you need to build data-centric applications, we
offer Google App Engine, which lets you build internet-scale
applications quickly, easily, and cost-effectively.
You only pay for the resources you use, which means you don't
need to do complex capacity planning or keep large amounts
of rarely used buffer capacity on hand.
The data generated by your applications can be quickly
analyzed using Google Cloud Storage and BigQuery.
App Engine is also a great way to build applications that
provide reporting, dashboarding, and other
capabilities to your business, your partners,
or even your customers.
I'd like to spend a couple of minutes talking about a few
illustrative cases where people have leveraged Google's
technology to power their data.
DNAnexus is a Bay Area-based company focused on unlocking
the potential of DNA-based medicine and biotechnology
with a collaborative and scalable data technology
platform built on the cloud.
They wanted to give researchers around the world
tools to search for and access Sequence Read
Archive data sets.
They had an ambitious goal--
to completely re-engineer how researchers interacted with
the data, mined results, and downloaded
data sets of interest.
By using Google Cloud Storage to store and serve the data,
they were able to quickly build a web-based interface to
the SRA data sets that has garnered praise from the
research community.
Crystalloids is an Amsterdam-based analytics firm
that helps businesses analyze massive amounts of data to
improve profitability.
They first used Google BigQuery and Cloud Storage to
provide a customized product for one of their customers.
Based on their success with this project, they launched
Crystalloids Innovations, a subsidiary that offers
cloud-based tools to the retail
and hospitality industry.
Daffodil Software provides a suite of web-based
applications, including a business customer relationship management--
otherwise known as CRM--
program and an enterprise resource planning
app for schools.
Daffodil needed a cloud-based hosting platform for its
applications that would save money and time
over in-house hosting.
They built their applications using Google App Engine and
Google Cloud SQL.
So in conclusion, you can now put Google's technology to
work for you.
You can store and access your structured and unstructured
data, mine your data for insights, and build new
applications and capabilities quickly and effectively.
In return, you can capture more data, gain business and
competitive advantages by unlocking insights in data
sets of any size.
And you can do so cost-effectively.
Thanks for watching.
I hope this helps you understand how you can use
Google's infrastructure to store your applications' data
and power your business.
For more information, be sure to check out