Google I/O 2009 - Text-To-Speech & Eyes-Free Project:Android


Uploaded by GoogleDevelopers on 02.06.2009

Transcript:
Raman: Hello, everyone.
I am T.V. Raman and this
is my officemate Charles Chen.
And we hack together, code together,
so we also talk together.
Hopefully not at the same time
and hopefully the phone won't talk
while the two of us are talking too.
This talk is about building eyes-free interfaces on Android.
We are calling it, "looking beyond the screen."
That doesn't mean you hold your phone like this
and look at the world over it.
It's basically building user interfaces
that go over and beyond the screen.
This talk is structured into two halves
where the first half I'd like to walk you through
some of the thought processes we went through
in terms of coming up with some
of the UI paradigms we are trying out.
And in the second half of the talk,
Charles will sort of walk you through
some of the libraries we are open-sourcing
that actually allow you to bring
some of this UI code into your own applications.
So let's start working through
the main sections of this talk.
The first section of this talk,
I'd like to focus on what it means
to sort of be someone who does UI research
and UI implementations in the mobile world,
because I believe this is distinctly different
from the last 35 years of human-computer interaction
that has gone before.
And you'll sort of see why I think that
in the next few slides.
But we are all geeks here. We are developers here.
Let's sort of celebrate our geekdom for a minute
by sort of looking at that phone that we all have
and understanding why that's an engineer's dream.
Obviously, that phone is more powerful
than the desktop workstation I had 20 years ago,
15 years ago.
Obviously, this thing can do a lot more
because it's connected to the network.
Those are enablers.
Now the third enabler that you have, obviously,
is building cool user interfaces
that help surface all this coolness to the end users
so that they can actually use it.
So in that sense,
what does it mean to build innovative user interfaces
for the mobile world?
Let's ask the question,
"what is a user interface?" first.
I didn't include that particular slide in this talk,
but many years ago, I used to carry around a slide
that is a very simple slide that sort of explained
what I thought of as a user interface.
If you pull up very specific implementations
of a user interface, such as a GUI,
such as the next Windows desktop,
a Windows desktop, a Mac UI,
there is something fundamentally simple about the UI.
It's about human-computer interaction
as Kai will tell you.
But then that again has become a buzzword.
So there's the human and there's the computer.
What is this interaction we are talking about?
The human-computer interaction that makes up
the user interface is really about two things.
It's you, the human,
communicating your intent to the machine,
the machine computing on that intent,
and coming up with some answer and grabbing your attention.
So I used to draw a little diamond
with a human on one vertex,
the computer on another vertex
and sort of draw arrows saying,
"intention, attention, input, and output."
So from you goes input to the machine
to convey your intentions.
Comes back as output and attention.
And so if you take that view of user interaction,
UIs that you build have to fit
into the user's mode of working,
not the other way around.
So this is about bending technology to our will.
And an open platform,
especially in the mobile world,
is extremely conducive to doing that,
because now the only thing that blocks, limits
how much you can hack is your imagination.
So you have all of the peripherals you want.
You have way more peripherals than what we've had in the past.
And that's one of the most important things
about doing mobile user interfaces.
Let's talk-- go to the next slide
about eyes-free interaction.
So clearly there are many, many cool things you can do
as you innovate along the UI axis.
The specific thing that Charles and I have been focusing on
is using these phones without looking at them.
No, this is not just about the blind user.
This is more to say, "how do you use these devices
"if you are not capable of, willing to,
or not in a position to look at the screen?"
And that completely changes your perspective
on how you build such things.
Those situations are obviously many.
You might be driving.
You might be walking along the corridor
talking to your friends.
There are many situations where you do not want
to hold your phone like this and squint at it.
A couple of weeks ago, I was walking down
one of the corridors at Google with my guide dog
and my dog suddenly swerved, and I asked Charles
what happened and he said, "That was a lucky save.
Somebody almost walked into you."
I said, "That was an interesting response," right?
It wasn't, "I walked into someone."
It was somebody almost walked into me.
Well, the reason somebody almost walked into me
was that person was holding their cell phone
like this doing email.
This is not safe, right?
I mean, cell phones aren't just a hazard when you're driving.
I mean, if you start using cell phones like this,
you need to get tickets when you're walking too.
So how do you avoid that? [laughter]
So I think eyes-free interaction
is really about fitting into the user's life.
Technology has to sort of come to that point
where we are able to do that.
So let's sort of go to the next slide
and ask what is so cool about these new phones
that let you do all these things.
One reason I believe that sort of the GUI
hasn't really evolved beyond what Xerox PARC built
in the late '70s is that the peripherals available
to the GUI designer have not evolved since then.
Doug Engelbart invented the mouse,
PARC invented the bitmap screen,
and the rest is history.
And sadly, history is all you've had
in that the GUI has not changed.
Yes, people have worked on how to layout menus better.
People have learned how to cram more things
into that limited real estate.
But the fundamental peripherals with which you interact
with the graphical user interface
have not changed.
Look at that phone and count out the peripherals
versus what your laptop has.
Your laptop has a display, a monitor, a mouse,
a microphone and a speaker, and perhaps a webcam.
Those latter three, the microphone,
the speaker, and the webcam,
are actually quite awkward to use on a laptop.
Especially the webcam.
Look at your camera. Look at your phone.
It's a camera. It's a keyboard.
It's all of these things.
But now think of it not as all those devices,
but think of it as your computer in your pocket
with the ability to sense the world
and see how many more ways it can sense the world
as opposed to your laptop.
Simple example.
There is a, you know,
mobile scavenger hunt being announced today.
There is a piece of paper stuck on the wall
with a QR code.
You can just walk past it, point your phone at it,
get a URL, and start browsing.
Isn't that cool?
That's because your phone has many eyes
and many ears by which it can sense the world.
And the more it can sense the world,
the better positioned your phone is
with respect to getting a handle on your intent,
on your intention as a user.
And the cool thing about building
mobile user interfaces--
and this may be a little contradictory
to someone who does user interfaces
as their be-all, end-all goal--
is you actually minimize user interaction.
So in the desktop world it was always cool
to make the user interact and then build
newer, more powerful widgets and gadgets
with which the user interacted.
With your mobile devices, I believe we need
to go one step further, which is,
"user interaction is cool, but wouldn't it
"be even nicer if your device could do
"what you wanted and show you what you wanted
before you went and interacted with it?"
Because you are not taking your phone out
of your pocket in order to interact with the phone.
You're taking the phone out of your pocket
to interact with the world.
You're taking the phone out of your pocket
to find out where to go have dinner.
You're taking the phone out of your pocket
to go talk to your buddy, not to interact with the phone.
So this is really what is cool
about building innovative user interaction
on mobile phones.
You are really focusing on,
"how shall I minimize clicks?
"How shall I minimize screen switches?
"How shall I minimize context switches?
And how shall I have the user get his work done?"
So let's go to the next section of the slide.
So as a case study, I'd like to take you through
two things that we've built.
You can actually see YouTube demos of this
on the eyes-free Android channel.
What I'd like to work--
We will do demos during the session,
but what these slides are really about
are sort of walking you through the thought process
that went towards coming up with these solutions.
Because I, personally, when I see someone
show me a solution they've come up with,
I often find the process of "how did you get that"
more interesting than the eventual artifact.
And hopefully that applies in this case too.
So the problem we were solving--
The specific problem that we solved last September
was I wanted to start using the Android phone
as my primary phone because I believed
that that was the only way I would build
the right user interaction environment
that matched my needs and the needs
of eyes-free interaction.
So the Android phone is a nice, smart computer.
It's all of these things.
But the reason you first put it in your pocket
was because it's a phone.
And many of those platforms I had had before it
did a lot of things, but they were so complicated
to use that I stopped making phone calls with them.
And to me, that was a problem.
So the first thing we worked on was,
"How can I use the phone eyes-free?"
So this is an interesting question.
You could have sort of punted on the interesting problem here
and said, "Well, it has a keyboard.
Pull out the keyboard and dial it."
But that's inconvenient.
And as I said, if I had done that
I would have joined the club of people walking like this
and I promise you I would definitely walk into things
if I did that.
So the question was, "How do you actually use
that touch display and do things with it?"
Common wisdom said, if you cannot see,
you cannot use a touchscreen.
And so the way you sort of debunk myths like that
for yourself is to go ask the question, "Why?"
You know, two-year-olds ask the question "Why?"
And succinctly, I think we as developers,
geeks, and scientists need to ask
that question the whole time.
So why is it that most people believe
that you must be able to see in order to use a touchscreen
to use an on-screen keyboard?
It's very simple.
How do you use a touchscreen?
There are two atomic acts involved
in activating an on-screen control by touch.
You need to look at the screen
in order to locate the control.
Then you need to go push it
and get some feedback for having pushed it.
Clearly, if you're not looking at the screen,
you need feedback at both levels.
And the showstopper that most people stumble on is,
"Well, you can't see it at all in order to see
"where the button is if you're not looking at the screen.
Therefore, you cannot use it."
So let's ask the "why" question again
like a two-year-old, right?
Why do you need to see the button
to know where it is?
Well, because the button is positioned in a fixed place.
You need to know where, dummy.
Well, why is it positioned in a fixed place?
And the answer immediately emerges.
Well, it needn't be in a fixed place.
Somebody chose to put it in a fixed place.
It was in a fixed place, right?
So the other way to think about it is
rather than you looking at the screen
in order to find the keyboard,
what if the keyboard came and found your finger
when you put your finger down?
So the opposite of absolute positioning
is relative positioning, and that's what we've built.
So this is what we call a stroke dialer.
We came up with a very simple idea
saying wherever you touch is the center of the phone keypad.
Okay?
Put your finger there, it's five.
There, it's five. That's five.
Ah, but now you know where five is.
Well, if you know where five is,
you know where two is, you know where one is,
you know where nine is.
So this very simple insight of asking the why--
irritating "why" question many times
gave us a very simple answer.
So the next time your two-year-old asks why,
don't yell at him.
Answer the question and he'll probably find out something.
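[Note: a minimal sketch, in Java, of the relative-positioning idea just described. The method name, the keypad table, and the movement threshold are illustrative; this is not the project's actual code.]

    // Wherever the finger lands is treated as "5"; a short stroke in one of
    // the eight compass directions selects the surrounding digit.
    private static final int[][] KEYPAD = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };

    private int digitForStroke(float downX, float downY, float upX, float upY) {
        final float threshold = 30f;       // small movements count as a tap on "5"
        float dx = upX - downX;
        float dy = upY - downY;
        int col = 1, row = 1;              // default: center of the keypad, i.e. "5"
        if (Math.abs(dx) > threshold) col = (dx > 0) ? 2 : 0;
        if (Math.abs(dy) > threshold) row = (dy > 0) ? 2 : 0;  // screen y grows downward
        return KEYPAD[row][col];
    }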
So I'll let Charles do a quick demo
of the stroke dialer.
What is interesting about the stroke dialer
with respect to its feedback is that it actually
gives you auditory feedback in terms of sound cues,
it speaks the number, and it also vibrates.
And all of those are synchronized
and that's a big, big win with respect
to doing the user interface correctly
because it gives you a whole sense of realism.
So this is an interesting thing about the real world, right?
If I take a coffee cup and put it on a table,
it goes "click" and it gives you force feedback on your hand.
The table resists.
If, for instance, the table didn't resist,
you would drop the cup.
If you didn't hear the "clink" when you put the cup down,
you would feel something was wrong.
And the same applies to building
a touchscreen interface where you're doing
auditory input and touch.
male computer voice: Nine. One. One.
Raman: So obviously you heard the auditory output.
You even heard a little bit of vibration
because it's on a wooden table.
So this is sort of an interesting exercise
with respect to doing a stroke dialer.
And later on in the talk, Charles will actually show you
how you can actually use this as an overlay
in your own applications.
So the thought process is nice.
Showing you the thing working is nice.
But being able to plug it in to your own applications
is really the icing on the cake,
or, you know, the icing on the Swiss chocolate
or whatever you want to call it.
Let's flip to the next section of the slide.
At this point, you're probably saying,
"Yeah, yeah, big deal.
But nobody dials phone numbers anymore."
It is true.
You don't dial phone numbers anymore.
You use your contact manager.
So how do you do a contact manager
for an eyes-free model?
So this time you're going to do letter input
instead of numbers and obviously
you have to sort of maintain contacts
to all kinds of things.
Some of the problems we actually danced around here
because of the way Android works with a cloud
and it's actually quite cool.
Some hard problems are best solved
by getting rid of the problem.
That's what you will see with respect
to editing contacts.
But let's talk about modifying
the stroke input idea in order to input letters.
Now there are many ways of using a touchscreen
and using your finger to input letters, right?
There's Graffiti.
There's many, many versions of Graffiti.
All of these systems you can actually think of as two steps.
I'm going back to the intention idea--
you have an intent you're communicating
to the computer.
You want to come up with an encoding
that is easy for the computer to process.
This is why Graffiti was invented
as opposed to human handwriting.
So you want a thing that is easy,
unambiguous to recognize by everybody.
And you want to come up with a mapping
that is easy for the human to remember, right?
So I could come up with a set of strokes
that are very easy to recognize for the computer,
but require extra learning.
And, you know, early days of the Apple Newton
showed us that people just give up very quickly.
They don't learn a new system.
So we don't claim that what Charles and I
are showing here is the world's best encoding system for--
from strokes to letters.
But it is a system that's worked for us.
And it actually works particularly well
on the Android screen.
So I'd like to show you that.
So this time, instead of thinking of the phone keypad,
think compass directions, magnetic compass directions.
So you have north and south, east and west,
northeast, southwest, southeast, northwest.
So I intentionally said those in pairs.
There are eight of them.
Let's think of them as four pairs of two each.
Now, you know, you sort of think about it,
there are about 26 letters.
You know, eight, four.
Do the math, you know?
Let's say we put eight letters on each
of these--for each of these pairs.
And let's sort of think circles, okay?
So let's think circles and since this is a Google IO talk
at a Google auditorium,
let's think of it as four colored circles.
Like assign the four Google colors to them.
Let's put letters "A" through "H" on the first circle.
Let's call it blue.
Let's put the next eight letters on the second circle.
The next eight on the third
and the next eight on the fourth.
Now we normally have eight compass directions.
There are four sets, right?
Let's use the north and south pair to enter a circle.
Let's use the east and west pair to enter a different circle.
Let's use the diagonal strokes to enter the other two.
And the way the circle dialer that we've done works--
The circle keyboard, whatever you want to call it--
The way this guy works is that you stroke
in any of the eight compass directions
and that gets you into one of the four circles.
Then you trace along the circle till you get to the letter.
Now since we can enter each circle in two places--
so the north or the south or the east or the west--
think of it as entering either at the top of the circle
or the bottom of the circle.
This will all get obvious as Charles
starts showing it to you.
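[Note: a sketch of the stroke-to-entry-letter mapping just described. Only the "A," "E," "I," and "M" entries come from the talk; which upward diagonal maps to "A" and how the remaining directions are assigned to the other two circles are assumptions, not the project's actual code.]

    enum Direction { N, NE, E, SE, S, SW, W, NW }

    private char entryLetterFor(Direction d) {
        switch (d) {
            case NE: return 'A';   // an upward diagonal: A-H circle, top entry
            case SE: return 'E';   // a downward diagonal: A-H circle, bottom entry
            case N:  return 'I';   // up or down enters the I-P circle; which end
            case S:  return 'M';   //   each stroke lands on is assumed here
            default: return '\0';  // the remaining directions enter the other two circles
        }
    }
    // Once inside a circle, tracing along it steps forward or backward
    // through that circle's eight characters until the finger lifts.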
So, Charles, let's show them how we do an "A."
So remember the first circle has "A" through "H."
So "A" is just an upward diagonal stroke.
male computer voice: Phone book.
female computer voice: "A."
Raman: Now notice that it said it in a woman's voice.
It finalizes it when he picks up his finger.
male computer voice: "A."
Aarthi Raman.
Raman: It's also in his contact manager,
so it's actually jumping to contacts
that start with an "A," but we'll talk about that soon.
So "A" was very easy to do.
Now "A" through "H" are on the same circle,
so to do a "B," he would do an "A"...
female computer voice: "A."
Raman: trace along...
female computer voice: "B."
Raman: and pick up his finger.
male computer voice: No contacts found.
Raman: See, he doesn't have any contacts found at that letter.
Now--So "A, B, C, D, E, F, G, H."
Now once you get used to this input system,
to do an "H," you obviously don't go around
in a circle like a dork.
You actually go the other way around.
So you go from "A" to "H" in one stroke.
female computer voice: "A, H."
male computer voice: No contact.
Raman: Now supposing you wanted to do an "E,"
you could start at "A" and go all the way around
from "B" to "C" to "D" to "E," but we don't do that,
because remember we have two points
at which we can enter the circle.
So we enter directly at "E."
So the downward diagonal.
female computer voice: "E."
male computer voice: "E." No contacts found.
Raman: So now you see the "A" through "H" circle working.
Similarly, going up or down gets you
into the second circle that has "I" through "P."
Just scroll down.
female computer voice: "I... J, K, L, M."
male computer voice: "M."
Mom cell.
[laughter]
Raman: So now this is what we are using
to filter our contact list.
So we did the simple thing of you do a letter
that jumps you to the first contact with that letter.
You do one more letter.
We sort of take that as additional input
and continue filtering.
And so the thing you realize with this
is that you get to your contacts
in about one or two gestures.
Each letter of the alphabet is no more
than three steps away, right?
So if you wanted to do a "C," you go "A, B, C."
So that's about the longest.
Because if you want to do a "D,"
you go to "E" and go back to "D."
So you get very good at this very quickly.
And the color coding sort of helps you to learn it,
but once you learned it, you do it
without looking at all.
And the contact filter thing is very, very nice,
because you can really filter through your contacts
very rapidly with this.
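[Note: a sketch of the contact-filtering behavior just described: each recognized letter extends a prefix, and the list is narrowed to contacts whose names start with it. The method name is illustrative, and java.util.List and ArrayList are assumed to be imported.]

    private List<String> filterContacts(List<String> contacts, String prefix) {
        List<String> matches = new ArrayList<String>();
        String p = prefix.toLowerCase();
        for (String name : contacts) {
            if (name.toLowerCase().startsWith(p)) {
                matches.add(name);
            }
        }
        return matches;  // e.g. prefix "m" might leave just "Mom cell"
    }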
Flip to the next slide.
So finally, the part about the best way
of solving some hard problems is to get rid of them.
How do you edit contacts?
It's best not to edit contacts on the phone.
Nobody does that, right?
What do you do particularly if you want
your friend's phone number?
You say, "Could you call me so I get it in the call log
and I can add you there?"
That's right.
That's why we all do it.
Well, because Android talks to the cloud,
there's an even better way of doing it,
which is just to use your Gmail contact list.
So I actually edit all my contacts online
and it shows up on the phone.
That's great.
And if I meet people at events like this,
then I do the age-old trick of,
"Oh, please give me a call and I'll add you to the call log."
So that's how we do contacts.
[indistinct]
Chen: Oh, sure.
And actually it's my turn now.
Raman: Exactly. That's why I wanted a word.
So from here on, we'll sort of go through
all the technologies that we use to implement all this.
I'll let Charles talk about the TTS library,
the gesture library, and all kinds of good stuff.
Chen: So, hello, everyone.
So I just have a quick question before I start.
How many of you saw the keynote yesterday
and the TTS demo?
Awesome.
Now how many of you here, you know,
are interested in writing TTS apps
or have written a TTS app or, you know,
currently working on one?
Wow, this is fantastic.
So I just want to say thank you.
You know, developer interest is very important
and the interest that has been shown to the TTS library so far
has been one of the factors in getting TTS
into the base platform.
So give yourselves a round of applause.
[applause]
And so with that being said, you know, we don't--
It's going to come in Donut, but for right now,
if you want to get a head start and start working
on talking applications, the APIs will be very similar.
We're actually involved in working
with the Android team in porting it.
And what you're gonna get is you're gonna get
a head start and you can start developing your apps
and start playing with this by using
our currently released text-to-speech library.
And that's what we'll be talking about
for the rest of this session.
Raman: So on these demos now,
what you hear is eSpeak.
But as developers, go ahead and write your code.
When Donut comes out, you will just
get a better voice and all your apps
will just sound much better.
Chen: Okay. So let's get started, here.
So the TTS library.
So this is something that enables developers
to create Text-To-Speech-enabled apps that talk to users.
The way it works is that as a developer,
you go into Eclipse and you compile
against the library stub jar
that we've included on our developer's site.
And Text-To-Speech is an Android service.
And we have a TTS class
that acts as a wrapper to take care
of all the messy IBinder stuff.
So you just create it like a regular Java object,
you just do a new TTS, and you can start using it
by doing tts.speak and some text.
And part of the beauty of having this as a library
that gets reused is that the TTS library can be updated
without you necessarily having to update your app,
and it also enables multiple apps
to share the same TTS service.
So the user doesn't have to install it multiple times.
And once it goes in a platform, then everyone just gets it.
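[Note: a minimal sketch of creating and using the wrapper just described; the exact constructor and speak() signatures are assumptions based on the talk, not verified against the library.]

    // Create the TTS like a regular Java object; the service binding and
    // IBinder plumbing happen behind this wrapper.
    TTS tts = new TTS(this, ttsInitListener, true /* prompt to install if missing */);

    // Then just hand it text to speak.
    tts.speak("Hello, Google I/O", 0 /* queue mode: flush */, null);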
So let's look at some of the--
you know, some of the main features that we have.
There's a very simple, easy-to-use speak function.
So it takes the string of text that you want spoken,
a queuing mode, whether you want to speak immediately
and flush any text that's currently waiting to be spoken,
or if you want to queue it up and then some param--
some additional parameters that you could have.
There's a stop call,
because otherwise your application
may talk and never stop,
which would be kind of annoying.
And you can also check the current status--
whether or not it's currently speaking
so if you're trying to synchronize
some on-screen display you can synchronize it
with where you are in the speech.
And we also have methods that let you synthesize to a file
so you can get an audio file that, you know,
you can set as your ringtone or something.
And, you know, you can specify a language
so you can do translation apps,
talking translators, all that good stuff.
And the current behavior of the Text-To-Speech engine
is that it will automatically
prompt the users to install the TTS service.
And if it's absent you can set it to just fail silently
so it won't bother the user,
or you can just have it automatically redirect the user
to Market, where they can download
the current TTS service.
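[Note: sketches of the other calls just listed; the method names and parameter types are assumptions based on the description above, not the library's exact API.]

    tts.speak("Queued message", 1 /* add to the queue instead of flushing */, null);
    if (tts.isSpeaking()) {
        // e.g. keep an on-screen highlight in sync with the speech
    }
    tts.stop();                     // otherwise an app could talk and never stop
    tts.setLanguage("es-ES");       // e.g. for a talking translator
    tts.synthesizeToFile("Hello", null, "/sdcard/hello.wav");  // save audio, e.g. for a ringtone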
For more information, and, you know,
to check out our source code,
this is all open source, free, so please take it
and we look forward to seeing what you can build with this.
And you'll see the URL there: eyes-free.googlecode.com.
And that's where you can get all the source,
all the jars, everything.
And so with that, let's...
I'm going to dive into an example of this
later in the talk.
I'm going to talk about the gesture library right now,
and give a brief overview before I start on the code--
on the coding tutorial.
So the gesture library--
that's what Raman had discussed earlier.
You know, he had shown the stroke dialer
and the contacts input method.
And so both of those use this gesture library
that recognizes very simple strokes.
Now my Mac is not behaving.
There we go. Okay.
So this is an overlay that watches for touch events,
so it will tell you when a user has touched down on the screen,
when they're moving around on the screen,
and when they lift their finger up.
So you know which position they finished at.
And it will tell you the identified gestures.
So you can actually see the gesture
as the user is making it, or you can just get
the final result of, well,
the user stroked the diagonal up.
And this exposes the same UI
as the stroke dialer to the user.
And we implemented this as a custom View,
and this is a custom transparent View
that you can layer on top of your applications.
And so this means that, you know,
you can do your UI however you want to,
and you can just overlay this on top,
and it won't have any effect on how your view renders.
To use this what you have to do
is you have to implement a GestureListener.
So you create a TouchGestureControlOverlay,
and you start with a FrameLayout as your base View.
So what you would normally have as your View,
you set that as the first child of your FrameLayout.
And I have an example of this later on in the talk.
But if you do that, then what you'll get
is you'll be able to layer this TouchGestureControlOverlay
as a child on top of that,
and then you can enable or disable the gestures
by adding or removing that View from your FrameLayout.
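[Note: a sketch of the setup just described; the TouchGestureControlOverlay constructor arguments and variable names are assumptions.]

    FrameLayout frame = new FrameLayout(this);
    frame.addView(myList);  // whatever your normal content View is goes in first
    TouchGestureControlOverlay overlay =
            new TouchGestureControlOverlay(this, gestureListener);
    // Later: frame.addView(overlay) turns gestures on,
    //        frame.removeView(overlay) turns them off.
    setContentView(frame);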
And so with that,
let's jump into the heart of this presentation
and look at a tutorial with some real code.
And so I'm going to demonstrate a simple music file browser.
How many of you here have gone to Anddev.org?
It's a website with developers and code examples. Cool.
So they actually posted a very nice FileBrowser tutorial
when Android first came out.
You know, how you can explore the SDCard system
and look at what's there.
So we took that, and we show how in just a very few lines of code
you can add both Text-To-Speech and gesture controls to it.
So this is a very simple music file browser.
It lets you browse directories on your SDCard
and play MP3 files that it finds there.
And you play the music by just clicking on that file.
And the directory path is the first entry.
If you click on the directory path,
then it just cycles through
all the MP3s you have in that subdirectory.
And as you scroll through the list,
you get some tactile feedback,
so you actually feel like you're moving through a list.
And so now I'm going to explain how to add spoken feedback.
So you first start off
by creating a TTS object in the onCreate method.
And this will cause it to run the Text-To-Speech
as part of its initialization.
And then you can add some application-specific logic there
so that your application comes up talking.
So let me just switch over to Eclipse and show you the code.
Okay, so this is the music file browser--
the base music file browser before we added anything to it.
And let me actually demonstrate that first
before I go any further in this talk.
So it's the first music file browser here.
And you see the contents of my SDCard.
I can scroll through it.
And, you know, nothing's talking,
but, you know, it works.
I can go to MP3s.
And, you know, I can play a couple of popular songs.
Like this.
[Rick Astley's Never Gonna Give You Up]
Okay, enough of that.
Raman: That's popular?
[Chen stops music]
Chen: I-I think it is popular on the Internet.
At least on YouTube.
[laughter]
So--Okay, so let's see--
Now let's see if we can make this a little bit better.
A little bit more interesting.
And, you know, let's see if we can make this talk.
So...
Okay, I need to switch back to my Mac, here.
And--Okay, so what you just saw here
is the basic music file browser.
And this is just the tutorial that was on Anddev.org
that we modified a little bit to handle the playing.
So where we made the modification is down here.
And we added a toggle playing function.
So--So we did that, and the other thing is we added
some tactile feedback to when you scroll through the list.
So if you look at here we have this--
I'll track by that.
Okay, how's the font size for everyone?
Can you guys see it okay?
No? Okay, let me zoom that a little bit.
Okay, so now we're better on text size.
Okay.
So here is where we added our vibration control to it.
Aside from that, it's the same basic file browser.
So let's see, how can we add speech to this?
So here is the version where we've added speech.
And I've helpfully set break points
so I can find all the places where I added some code.
There's not that much code,
so the break points are pretty useful.
So we added a line to-- here to create a TTS object.
Up here.
And then we have a TTS initialization method.
This will get called whenever the TTS is created.
And so you see when we start it we helpfully announce
that, hey, music file browser started.
And then in the onCreate method of your app,
we now have this TTS--new TTS.
And we set the initialization listener to "ttsInitListener."
So this--this is what causes
our Init function up here to get called.
And then--then it's pretty simple, right?
Now, you know, earlier I'd mentioned
we had this vibration feedback
for when you scroll past certain items in a list.
Well, so we can actually latch
the Text-To-Speech functionality on top of that.
So instead of just vibrating, it vibrates
and also speaks the directory or the music file name.
Well, that's what we ended up doing.
So if you look down here...
Ta-dah.
So there we have "tts.speak(filename.substring,"
and we just speak it there.
It flushes right away.
And you can--
As a result, you can hear what you're scrolling to.
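[Note: a rough reconstruction of the handful of lines just described; the listener interface, callback signature, and field names are assumptions, not the sample's exact code.]

    private TTS tts;

    private TTS.InitListener ttsInitListener = new TTS.InitListener() {
        public void onInit(int version) {
            // Application-specific startup logic: come up talking.
            tts.speak("Music file browser started.", 0, null);
        }
    };

    // In onCreate():
    tts = new TTS(this, ttsInitListener, true);

    // In the existing list-selection handler, right next to the vibration call:
    tts.speak(filename, 0 /* flush whatever was being spoken */, null);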
So--so let me demo that version.
computerized male voice: Music file browser started.
Chen: So there.
So now you'll always know when it actually started up.
And now I can scroll through.
computer voice: Up one level.
Raman: On your own phone it won't talk so loudly.
Chen: Yeah, it only talks this loudly
if you put it next to a microphone.
computer voice: D-sim, download.
D-sim, dow-- espeak data, MP3s.
Chen: Okay.
Raman: The voice you hear-- that is eSpeak,
and as I said, this will-- this--you know, over time you'll
have multiple voices available to you that you can choose.
computer voice: Up one level.
Portal, Still Alive.mp3.
[Portal's Still Alive]
[laughter]
Chen: Okay, so cool. Huge success, right?
So--Okay, so let's kind of see what that actually meant.
So--so you saw all the code that I added.
Those couple of lines-- That was--that was it.
Really, to get TTS working.
And see, if you--
if you actually go back and count--
So all--so these slides will be available,
and our code is already available
on our Eyes-Free Google Code project.
So I challenge you all to go back and download this code--
check it out, and actually do a diff
and count the number of lines.
Because the diff-- You're gonna find
that it's really just 13 lines of difference.
So adding--And this includes, you know, import statements,
just, you know, generic, boilerplate things, right?
Like adding a closing brace; that's a line.
So it add--We only needed to add 13 lines of code,
and we added full Text-To-Speech functionality
to this music file browser.
So--so this is-- this is nice.
You know, now you have a talking music file browser,
but what if you wanted to use this
while you were jogging or something, right?
It's not going to be very convenient
for you to jog and try to use the track ball.
That's going to be a little bit difficult.
It's probably much easier
if you could just do some gestures on the screen;
scroll through what you wanted.
So--
Raman: You don't want to jog with the track ball
because the track ball will keep jogging
and nothing will ever be stable.
You really don't-- Need something else.
Chen: Exactly.
So--so let's add the gesture input method
to the music file browser.
So as I mentioned before,
the gesture overlay is a transparent overlay,
and you can just over--
you can just put it on top of your existing content views.
And this won't interfere with your visual appearance at all
because, hey, the whole thing's transparent, right?
So you add this to handle user input,
and we're going to add two very simple controls.
So we're going to say when you tap down on the screen
that's play or pause,
and if you want to gesture towards the right,
that means you want to just cycle
to the next track in that directory.
So--so this is the type of code that you have to write.
It's fairly simple.
You start off with a GestureListener.
man: [indistinct]
Chen: Ooh, whoa.
Thank you. Sorry about that.
Okay, so yeah.
Anyway, you didn't miss much.
It was basically just-- just code here.
Thank you.
Can we get--give this guy a piece of chocolate too?
[laughter]
Raman: I got it.
We could be like dog kennels, handing out gifts for bugs.
Chen: Absolutely, absolutely. Thank you.
Okay, so--So yes. GestureListener.
So we're going to do--
We're actually doing something very simple because
we're just going to do tap to play or pause.
And we're going to do a stroke to the right
to cycle through the next thing.
In both cases, we actually don't care so much
what the user is doing as they're making the changes,
so this "onGestureChange," we can leave it empty.
We don't really have to implement it.
And then--All we care about is "onGestureFinish."
So if they finish their gesture in the center,
then that's a tap.
So we do the "play/pause music" code here.
And if we see that the gesture is a gesture to the right,
then, hey, it's a stroke to the right.
That means we should go ahead
and play the next track in that directory.
And we don't really care about when they start the gesture.
'Cause, you know,
we're not doing anything that's timing-related here.
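[Note: a sketch of the listener just described; the Gesture constants and callback signatures are assumptions based on the talk, and togglePlaying()/playNextTrack() stand in for the browser's own playback code.]

    GestureListener gestureListener = new GestureListener() {
        public void onGestureStart(Gesture g) { }   // nothing timing-related, so ignored

        public void onGestureChange(Gesture g) { }  // don't care while the finger is moving

        public void onGestureFinish(Gesture g) {
            if (g == Gesture.CENTER) {
                togglePlaying();     // a tap: play or pause the current file
            } else if (g == Gesture.RIGHT) {
                playNextTrack();     // a stroke to the right: next track in the directory
            }
        }
    };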
Now, to put this into your app,
what you have to do is you should
take your main content View, put that inside a FrameLayout,
and then make your FrameLayout the main View.
So this way you can switch your overlay on and off
by just adding or removing it.
And then to toggle it, then, it's pretty simple
because all you have to do to--
if the overlay is active and you want to disable it,
you just do a simple "removeView," and it's gone.
And then you can touch the screen
and interact with it as if you didn't have gestures.
And then if you want to turn it back on,
you just do "addView," and suddenly now
you get this transparent overlay on top,
touching it becomes gesture inputs,
and you won't trigger anything in the View underneath it.
So let's look at what this-- what this code looks like, here.
So here is our own gesture code.
This is what I had just shown in the slide earlier.
And...
As you see, we didn't need to worry
about gesture changes or gesture starts,
we just look at how the user ended their gesture;
determine what they did.
And the main change that we had to do was over here.
So over here, notice how now I've created a myFrame object.
That's a new FrameLayout.
I'm adding the "myList," which is the content view,
to that FrameLayout,
and then I'm making this gesture overlay--
which I'm currently not putting in yet,
because I didn't want to start off in that mode.
And then I set my content view for my app
as the FrameLayout.
So--And so then the FrameLayout manages everything else.
And finally I had to add an "onKeyDown" event
because, you know,
sometimes you do want to switch between the two modes.
You might not always want to be in gesture mode.
For example, it might be nice to be able to just click down
on a directory to drill down into it.
In that case you don't really
want that to get treated as a tap,
so to switch back and forth, I'm using the "Menu" key.
And using a Menu key you swap it on, swap it off.
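[Note: a sketch of the Menu-key toggle just described; myFrame, gestureOverlay, and the gesturesOn flag are illustrative names.]

    private boolean gesturesOn = false;

    @Override
    public boolean onKeyDown(int keyCode, KeyEvent event) {
        if (keyCode == KeyEvent.KEYCODE_MENU) {
            if (gesturesOn) {
                myFrame.removeView(gestureOverlay);  // back to normal touch behavior
            } else {
                myFrame.addView(gestureOverlay);     // touches now become gesture input
                tts.speak("Gestures activated.", 0, null);
            }
            gesturesOn = !gesturesOn;
            return true;
        }
        return super.onKeyDown(keyCode, event);
    }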
So--so let's look at this final version, here.
And this time I'll remember to switch.
Yay.
All right.
Go back to the home screen.
computer voice: Music file browser started.
Chen: Okay, so now we have the music file browser.
So it started off without gestures,
so I can actually just click on the screen.
I click this.
So that worked.
Now let's say I want to just start playing using gestures.
And I don't want to care where I'm tapping.
computer voice: Gestures activated.
Chen: Gestures activated.
Okay, so let's stroke.
[Portal Still Alive]
Raman: It worked.
[Rick Astley Never Gonna Give You Up]
[applause]
Chen: Thank you, thank you.
So now, if you look at what-- what that actually amounted to,
again I challenge you to go back and do a diff and verify this,
but when we did this, we added about 40 lines of code.
And that was it.
So adding 40 lines of code, you get this gesture thing,
and it just works for you.
And so with that,
I will hand it back to Raman for the conclusion.
Raman: Okay, thanks, Charles.
So as Charles said earlier, all these libraries
are open source as part of the Eyes-Free project.
Feel free to use it.
Even better, feel free to contribute patches.
Contributions, innovations.
In conclusion, I believe user interaction research
in the mobile space--
especially for devices that can see, hear, and sense the world--
is a very, very exciting area of research that's opening up.
The mistake we shouldn't make is to try and take
the 30-year-old GUI from the PC
and push it into the mobile device.
I think that would be a disservice to all of us.
I think these devices can be--
do much better at sensing our intent
based on what they sense of the world,
what they sense of our actions,
what they sense of our history of actions.
And they also have many, many ways of grabbing our attention.
From the type of work I do,
speech output being my primary--
sort of the biggest thing that I am interested in and work on.
Voice on Android, as you heard yesterday during the keynote,
voice output is going to get a lot, lot better
thanks to our friends from Zurich.
So in conclusion I think there's a lot more stuff to build here.
All of you even have the phones to build them on,
so come hack with us, and let's have a great time.
And at the end of the day,
let's build technologies and user interfaces
that bend those devices that you have in your hand to your will,
as opposed to you having changed--
having to change how you work and play to match those devices.
Let's flip to my final slide,
which is my usual Q and A slide that's my dog flying a 767.
So if that's possible, anything's possible.
[laughter]
Chen: So yeah.
So now we're gonna take questions.
So please don't be shy. We don't bite.
Raman: We only bite chocolate.
Chen: Yes.
man: I have two questions.
Chen: Sure.
man: Is it easy for a handset vendor
to change the voice?
I think the voice is pretty...
Chen: So--so that's going to be a Donut question
because that's going to ship with the Donut platform.
And what I would like to point out there is for right now,
the Text-To-Speech library--
it's just a library that you can get off Market.
So you can get any TTS you want off there.
And we do want to make it a pluggable TTS architecture,
so you could use our default TTS voice,
or you can use some other TTS voice that you prefer.
But our default will sound pretty darn good.
man: Okay, and the second question is
is there any global settings menu
that application vendor can check the settings value
and ultimately enable these kind of features?
Chen: So I think you're a little bit confused, sir,
about enabling this feature.
Because the way it works is this is an API.
So this is going to be just like, you know,
using the accelerometer on the device or something like that.
The user doesn't have to explicitly
turn on the accelerometer, it's just there.
You just code to it, you write a function call and it works.
Yes.
man: How would you inform the user
of this functionality,
especially the gesture functionality as being there,
in a consistent way?
Raman: That's-- that's a good question.
You're asking the question of discoverability,
and that question is, in general,
one of the hardest in the mobile platform.
So all of you holding those Dream G1 V2 devices,
do you know that holding down the "Home" key--
a long press on the "Home" key brings up a list of six--
the last six used applications?
I discovered that today,
nine months after having a phone.
[laughter]
So you ask a good question. Do I have an answer to that? No.
I believe, though, that over time we need to--
we will come up with gestures that are sufficiently intuitive
that people use it.
There will be some learning involved,
there will be some word of mouth involved.
You know.
And if it is really useful,
people will over time discover it and learn it.
That's the best answer I have.
I don't believe the PC desktop GUI answer to this is--
which was, you know,
"everything shall be made discoverable
by cluttering the screen"
is going to work, unfortunately, in the mobile space.
So if you're--
That's a good research question to answer.
We don't have an answer to that.
Chen: Yes, Clayden.
Clayden: Where do things stand with routing synthesized speech
out over the-- a phone call?
Is that now possible?
Raman: Routing synthesized speech out over a phone call.
I do not believe--
I do not believe I know the answer to that yet.
I could check on that for you.
Clayden: Okay, thanks.
Chen: Yes.
woman: Hi. Are you also working on Speech-To-Text?
Like, is it possible to input speech?
Chen: So currently in Cupcake
there is a reco API that you could use.
Raman: There's a reco API.
You can--Over time you'll also be able to use an IME
that is speech-input.
That--that work is being done by many people at Google
including the people who do Voice Search.
So that is being worked on.
It's not typically us.
woman: So that's not part of Donut?
Raman: Sorry?
woman: Would that be part of the--
Raman: The Voice Search part is part of Cupcake.
woman: Okay. Chen: It's already out there.
woman: Okay, cool. Thank you.
Chen: Anything else?
Okay, well, thank you all for coming out here.
It was great having you.
Raman: And please send us your feedback.
[applause]
Have a good day.