J.L. Needham Testifies on Sitemap Protocol Before Senate

Uploaded by Google on 11.12.2007


CHAIRMAN LIEBERMAN: John Lewis Needham is the Manager of
Public Sector Content Partnerships for Google, Inc.
I know I shouldn't do this, but I feel I have a
responsibility to my family and friends to ask whether you
were at the Google wedding this weekend.
CHAIRMAN LIEBERMAN: All right, good.
Thank you.
Mr. Needham, thanks for being here.
We're all great admirers and users of Google.
We look forward to your testimony now.
MR. NEEDHAM: Chairman Lieberman, Ranking Member Collins, and members of
the committee, it's a great pleasure to be with you this
morning to discuss Google's role in making government more
accessible to citizens.
My name is John Lewis Needham.
I am the Manager of Public Sector Content
Partnerships at Google.
In this capacity, I lead the company's efforts to build
public-private partnerships with government agencies in
the US and internationally.
Google's mission is to organize the world's
information and make it universally
accessible and useful.
Making government information more accessible doesn't just
help citizens find the content they need; it also enables the
government to provide services more efficiently to taxpayers,
and makes our democracy more transparent,
accountable, and relevant.
This committee has a long tradition of promoting these
values, which Google shares.
For example, Google Maps and Google Earth rely in
part on government-provided geospatial data, and they can in turn be used by
the government to better serve its citizens.
To offer two illustrations, the US Geological Survey
recently used Google Maps to show real-time data on
earthquakes all around the world.
And the National Park Service is using Google Earth to
inform citizens about recreation opportunities
across the country.
This morning, I'll focus my testimony on how people
throughout our nation are using the power of web search
engines to find and interact with their government.
First, I'll share some trends in how Americans connect with
government online.
Second, I'll identify the challenges that citizens face
in trying to find government information and
services on the internet.
Third, I'll explain a technology known as the Sitemap
Protocol, which enables government agencies to make
their content more accessible to search engine users.
And finally, I'll highlight a few of our successful
partnerships with government, and outline steps that
agencies can take to make their websites more accessible.
Let me start by describing how citizens today are connecting
with government information online.
According to recent research by the Pew Internet and
American Life Project, at least 77% of US internet users
go online to find some form of government information.
We also see that internet users are choosing search
engines like Google as their preferred way to connect with
the government.
To explain, search engines work by sending a software
program, often called a crawler, to visit the pages on public websites
and add what it finds to our index.
As a result, when a Google user types a query into the
search box, we very quickly access that index to return
relevant search results.
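To make the crawl-and-index process concrete, here is a deliberately simplified sketch in Python. It is an illustration under stated assumptions, not a description of how Google's systems actually work: the tiny "web" is a hypothetical in-memory dictionary (the example.gov URLs are placeholders), the crawler follows links breadth-first instead of fetching pages over HTTP, and the index is a basic inverted index that maps each word to the pages containing it.

```python
from collections import defaultdict, deque

# Hypothetical toy "web": each URL maps to (page text, outgoing links).
# A real crawler would fetch pages over HTTP; this sketch skips that step.
WEB = {
    "https://example.gov/": ("Welcome to the agency home page", ["https://example.gov/studies"]),
    "https://example.gov/studies": ("Current studies list including avian flu",
                                    ["https://example.gov/studies/avian-flu"]),
    "https://example.gov/studies/avian-flu": ("Status of the avian flu study: recruiting", []),
    # Reachable only through a search form, so no page links to it.
    "https://example.gov/records/12345": ("Enforcement record for a company", []),
}


def crawl(seed):
    """Breadth-first crawl: discover pages only by following links from the seed."""
    seen, queue, order = {seed}, deque([seed]), []
    while queue:
        url = queue.popleft()
        order.append(url)
        _, links = WEB.get(url, ("", []))
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


def build_index(urls):
    """Inverted index: lowercase word -> set of crawled URLs containing it."""
    index = defaultdict(set)
    for url in urls:
        text, _ = WEB.get(url, ("", []))
        for word in text.lower().split():
            index[word.strip(".,:;")].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = [w.lower() for w in query.split()]
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


index = build_index(crawl("https://example.gov/"))
print(search(index, "avian flu"))    # ['https://example.gov/studies', 'https://example.gov/studies/avian-flu']
print(search(index, "enforcement"))  # [] because the record page was never crawled
```

The hypothetical records page never appears in results: no crawled page links to it, so it never enters the index. This mirrors the limitation the testimony describes for database content that sits behind a search form.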
Here's an example.
The National Institutes of Health's website, nih.gov,
offers a rich collection of public health and medical
information from the 27 institutes and centers that
comprise NIH.
Let's say that you're trying to find out the status of a
study on avian flu.
You may not be aware of one relevant NIH service, which is
located at clinicaltrials.gov, or how to get directly to the
page that lists all current avian flu related studies.
So you start your search on google.com.
This is a likely scenario, given that very few internet
users go directly to the nih.gov website.
In fact, according to our analysis of internet traffic
to NIH websites during July 2006, only 4% of visitors
arrived at nih.gov web pages by typing the address
nih.gov directly into their browser.
This example is consistent with research by Google and
others on the flow of internet traffic, which indicates that
as many as four out of five internet users in the US reach
government websites by using Google and
similar search engines.
But if the information on a particular government website
is not part of a search engine's index, citizens are
bound to miss out on that information.
Search engines have made connecting to online
government resources easier, but challenges remain.
Specifically, we have found that many government agencies
structure their websites in ways that prevent search
engines from including their information in search results,
often inadvertently.
The most common barrier is a search form for a database
that asks users to input several fields of information
to find what they're looking for.
Our crawlers find pages by following links, and they cannot
effectively fill out such a form to reach the records behind it.
Let me offer an illustration.
A citizen may be interested in locating the Environmental
Protection Agency's enforcement actions regarding
a particular company, so that user conducts a search on
google.com with the company's name and the keywords EPA
enforcement.
The results of this search for EPA enforcement and a company
name would include relevant information, obviously, but
would not include information from the EPA's Enforcement and
Compliance History Online database, which offers a list
of enforcement reports for specific companies.
This is because the information in this database
(again, the EPA's Enforcement and Compliance
History Online database)
cannot be included in a search engine's index.
Now, epa.gov is certainly not the only government website
that search engines have difficulty indexing.
In fact, we estimate that information on all or part of
2,000 federal government websites is not included in
search engine results.
Now, with all that said, the good news is that there's a
simple technical solution to address this problem.
In 2005, Google introduced a standard called the Sitemap
Protocol that helps ensure the accessibility of information
on a website.
It allows a website owner to produce a list, or map, of all
web pages on the site and systematically communicate
this information to search engines.
When a federal agency places a Sitemap on its website, search
engines can readily identify the location of all pages on
the site, including database records lying
behind a search form.
Using this Sitemap, search engines are more likely to
index and make visible to citizens the information on
the agency's website.
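As a rough sketch of what a Sitemap for database records might look like, the Python below builds a Sitemap XML document with one url entry per record. It assumes, hypothetically, that each record already has a stable URL of its own (the example.gov pattern is a placeholder) and a known last-modified date. The urlset, url, loc, and lastmod elements and the namespace come from the sitemaps.org protocol; build_sitemap is an illustrative name, not part of any Google tool.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(records, url_template):
    """Return Sitemap XML with one <url> entry per database record.

    records: iterable of (record_id, last_modified) pairs, last_modified a date.
    url_template: hypothetical URL pattern containing an {id} placeholder.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for record_id, last_modified in records:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes "&" and similar characters, as the protocol requires.
        ET.SubElement(url, "loc").text = url_template.format(id=record_id)
        ET.SubElement(url, "lastmod").text = last_modified.isoformat()
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Usage with two hypothetical records from a database reachable only via a search form.
records = [("110000", date(2007, 6, 1)), ("110001", date(2007, 6, 3))]
print(build_sitemap(records, "https://example.gov/records?id={id}&view=full"))
```

The key design point is that every record gets its own linkable URL; once those URLs are listed, a crawler no longer needs to fill out the search form to reach them.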
The Sitemap Protocol has been widely embraced by the search
engine industry, including Google, Microsoft, Yahoo,
Ask.com, and others. What this means is that by implementing
Sitemaps, a government agency can ensure that it is serving
the American people no matter which search
engine they are using.
Implementing the Sitemap Protocol is free and easy.
It does not require site redesign, the purchase of new
technology, or more than a few hours or days of
a webmaster's time.
Implementation involves creating a list of web pages
in an acceptable format and adding a file that contains
this list to a website.
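The implementation step described above, producing the list and publishing it as a file, can be sketched as follows for a large collection. This assumes the sitemaps.org limit of 50,000 URLs per Sitemap file: the URLs are split across several Sitemap files, a Sitemap index file lists them, and a "Sitemap:" line in robots.txt points crawlers at the index. plan_sitemap_files and the example.gov base URL are hypothetical; the sketch returns file contents rather than writing them, so an agency's own publishing process would still need to place the files on its web server.

```python
from xml.sax.saxutils import escape

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file cap set by the sitemaps.org protocol


def _xml(root_tag, child_tag, locs):
    """Serialize a <urlset> or <sitemapindex> with one <loc> per entry."""
    body = "".join(f"  <{child_tag}><loc>{escape(loc)}</loc></{child_tag}>\n" for loc in locs)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            f'<{root_tag} xmlns="{NS}">\n{body}</{root_tag}>\n')


def plan_sitemap_files(urls, base_url, max_urls=MAX_URLS_PER_SITEMAP):
    """Split URLs into Sitemap files plus an index; return (files, robots_line).

    files maps each file name to its XML content. base_url is the hypothetical
    location where the files will be published and must end with '/'.
    """
    files = {}
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        files[f"sitemap-{n}.xml"] = _xml("urlset", "url", urls[start:start + max_urls])
    # The index lists every chunk file created above.
    index_locs = [base_url + name for name in files]
    files["sitemap-index.xml"] = _xml("sitemapindex", "sitemap", index_locs)
    # A "Sitemap:" line in robots.txt lets crawlers discover the index on their own.
    robots_line = f"Sitemap: {base_url}sitemap-index.xml"
    return files, robots_line


urls = [f"https://example.gov/records?id={i}" for i in range(120_000)]
files, robots_line = plan_sitemap_files(urls, "https://example.gov/")
print(sorted(files))  # ['sitemap-1.xml', 'sitemap-2.xml', 'sitemap-3.xml', 'sitemap-index.xml']
print(robots_line)    # Sitemap: https://example.gov/sitemap-index.xml
```

At this cap, a collection of 2.3 million records, like the OSTI example in this testimony, would need at least 46 Sitemap files (2,300,000 / 50,000 = 46).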
Google provides a variety of tools to accomplish this task
and we present them to public sector web managers at
It's important to note today that I'm only talking about
information that is already public.
Content that is maintained on internal websites, including
personally identifiable and classified information, should
not be made accessible through any search engine and is not
the type of information we're working to index.
We believe it will be technically simple for federal
government agencies to produce a Sitemap for the information
on their websites, and that doing so would bring
significant benefits.
And we know that implementing the protocol is easy to do
because we've worked with many government partners at all
levels to take this step.
For example, the Department of Energy's Office of Scientific
and Technical Information operates a large database that
makes research and development findings
available to the public.
OSTI developed a Sitemap for its Energy Citations and
Information Bridge services in just 12 hours, opening 2.3
million bibliographic records and full-text
documents to crawling by search engines.
After its implementation of Sitemaps, OSTI saw a dramatic
increase in traffic to its services as more citizens
discovered these resources.
Other federal agencies that have recently embraced
Sitemaps include the Government Accountability
Office, which used the standard to make a database of
30 years of GAO reports visible to
search engine users;
the Library of Congress, which has made its American Memory
collections easier to find;
the National Archives and Records Administration, which
is now in the process of Sitemapping the federal
government's largest public database;
govbenefits.gov, referenced by Administrator
Evans, which now makes its profiles of over 1,000 benefit
programs just one search away;
and OPM, referenced by you, Mr. Chairman, which is
also taking the step of making its job postings more
accessible to search engine users.
At the state and local level, we've launched partnerships
with the states of Arizona, California, Florida, Michigan,
Utah, Virginia, and with the District of Columbia.
These partnerships are making it easier for residents to
uncover job postings, reports on school performance, and the
professional license records of contractors.
The private sector long ago recognized the increasing
importance of web search.
But unfortunately, the federal government lags behind.
Last month, this committee took an important step in
elevating the profile of these efforts by voting in favor of
the E-Government Reauthorization Act of 2007.
The act directs OMB to create guidance and best practices
for federal agencies to make their websites more accessible
to external search engine crawlers.
It also requires federal agencies to ensure their
compliance, and directs OMB to report annually to Congress on
agencies' progress.
We commend Chairman Lieberman, Ranking Member Collins, and
the committee members for their
leadership on this issue.
And we look forward to working with you to enact this
important legislation.
Mr. Chairman, while my remarks today have focused on websites
and search engines, it's clear that in the years ahead,
government agencies will need to make information in other
formats more accessible.
In the Web 2.0 world, where more and more citizens are
using blogs, wikis, online mapping, video-sharing
services, and social networking sites to
communicate and collaborate with each other, there will be
even more demand for government to bring
information to citizens through these new platforms.
We at Google are excited by the promise of this trend, and
we're committed to continuing to better connect government
to citizens.
Thank you, and I look forward to