Gesture and Tactile Interfaces: Applications in Mobile Computing and American Sign Language

Uploaded by GoogleTechTalks on 15.10.2010

It's a great pleasure, a really great pleasure, to introduce Thad Starner from Georgia Tech,
visiting us here in the wonderful Bay Area. [INDISTINCT] while there's sunlight today,
it's really bright and nice and warm. Thad got his PhD from MIT, and then moved to
Georgia Tech and is possibly the person that defined the use of wearable computers in a
productive life. There's a number of people who started running around with beta displays,
but his work is much deeper, and much more interesting than most of the other people's.
He's been working on various interfaces since: input devices, wearable display devices, and
so on. And it's a real pleasure to have him talk about his most recent research. So,
with that--he's also a friend of Sergey Brin, I take it. But Sergey's...
>> STARNER: We knew each other from--we ran into each other at conferences a while back.
>> But he's out of town today. Unfortunately, he can't make it here. Well, looking forward
to your presentation. >> STARNER: Thank you. Okay, let me first
apologize because I literally got off the plane recently this morning and drove down
here. The weather in Atlanta, we're having high storms, we had a--it looked like a tornado
coming in but it didn't show up. But it meant that all the planes got messed up, and I spent
the night in the airport. So if I seem a little out of it, that's why. We're also going to
be switching back and forth between different devices here. So bear with me and the video
folks as I go between--between systems. Okay. So, the last time I was here, I talked about
how to improve mini QWERTY keyboards, how to do a lot of stuff in mobile computers,
a lot of mobile [INDISTINCT] stuff. Today, if you saw that talk, you won't be bored because
today is almost completely different. Let me start out with something that shouldn't
work. This is something called the Mobile Music Touch. Let me tell you what it does.
With this device, you can actually learn piano melodies without paying attention to it. In
other words, you'll be wearing this glove, right here, and be learning how to play, I
don't know, Star Spangled Banner. Now, how this works is that you have a mobile phone.
In this case, it's the [INDISTINCT] you see on the screen there. And this, you upload
your songs to, and the MIDI player in the phone plays the songs in--sorry, that's actually--coming
online. It plays the songs over and over again in your Bluetooth headset or your earphones,
whatever you have on. But for each note, it actually taps the finger responsible for that
note using this glove. Now, this is a Bluetooth glove, the--there's vibrators in each finger,
which you can see there, and they're on the knuckles, they're tuned to 160 hertz, which
is about the frequency your Pacinian corpuscles are most sensitive to. The fingers--the whole
finger vibrates and so you'll get an idea of which finger goes with which note. Now,
I'm going to pass this around as I talk about this so people can actually play with it.
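[Editor's illustration, not part of the talk.] A minimal sketch of the note-to-finger idea described above: each note in a one-handed melody is paired with the finger whose vibrator should fire while that note plays. The mapping table, the function name `tap_sequence`, and the choice of a five-finger phrase (the opening of "Ode to Joy" in C major) are assumptions for illustration; only the 160 Hz figure comes from the talk.

```python
# Hypothetical mapping from MIDI note numbers to glove fingers
# (0 = thumb ... 4 = pinky), assuming the hand stays in one position.
FINGER_FOR_NOTE = {60: 0, 62: 1, 64: 2, 65: 3, 67: 4}  # C4, D4, E4, F4, G4

VIBRATION_HZ = 160  # vibrator frequency quoted in the talk


def tap_sequence(notes):
    """Pair each note with the finger to vibrate while that note sounds."""
    return [(FINGER_FOR_NOTE[note], note) for note in notes]


# Opening phrase of "Ode to Joy" in C major: E E F G G F E D
ode_to_joy = [64, 64, 65, 67, 67, 65, 64, 62]

for finger, note in tap_sequence(ode_to_joy):
    print(f"play MIDI note {note}, vibrate finger {finger} at {VIBRATION_HZ} Hz")
```

A real system would also have to handle timing and hand shifts, which this sketch ignores.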
So, there is a toggle switch in the back, toggle it from off to on and you'll feel it
startup. I think, it's right now, doing the sequence of "Dashing through the Snow." So,
feel free to--feel free to play with this and then, I'll tell you why this particular
glove is so interesting in just a little bit. >> [INDISTINCT]
>> STARNER: No, you don't want me to sing. This is not karaoke night. The--inside the
box there, it was made--many different versions of this glove. But it's pretty simple, it's
just a Bluetooth receiver, you can see in the center there, attached to a glove. [INDISTINCT]
a lot about sewing wires into gloves. Sebastian, you got it working?
>> SEBASTIAN: Yes. >> STARNER: Yes, cool.
>> SEBASTIAN: It sounds great. >> STARNER: It sounds great. Well, you really
have to have the music with it right. But, you might say--you might wonder why this works,
right? And in particular, we'll talk about the hands moving left and right on the piano
a little bit. Let me describe to you the simple study we did first, which is we did two newly
composed 10-note passages. Now we did newly composed because the first time we did, we
used Amazing Grace, and "The Dashing through the Snow," part of Jingle Bells. And some
of our subjects were from Muslim countries and had heard neither of them. And some of
our subjects, of course, they were very, very familiar. None of our subjects knew how to
play the piano or had a musical background. But we want to have something that we [INDISTINCT]
as clean as a [INDISTINCT] as we could get, so we actually showed--we gathered 16 subjects,
none of them have musical experience. We showed them the passage once on a keyboard where
the keys light up. And then, they had to try to repeat it. And that was the base case.
Then for the next 30 minutes, they did a reading comprehension exam. That reading comprehension
exam was what you'd find on a normal [INDISTINCT]. As a matter of fact, I think I have it here.
I don't know if you can see that. But it's something where they have to read the paragraph
and answer questions. And we'll come back to that in a second. Then after 30 minutes
of doing this, they have the glove. So, they're playing the passage in their earpiece, in
their headphones, as well as tapping their fingers. That's the experimental condition.
The control condition was just playing the audio in their headphones over and over again.
After 30 minutes, each subject tries to play the song again. And this is a within-subject
study, a 2x2 design, and this was presented at CHI this year, so if people are interested
in the details, you can look at it there. Again, here is the distractor task. We actually
tested people on the distractor task. Their scores did not improve, or did--or did not
change in the experimental condition versus the control condition. And this is the total
number of errors after 30 minutes. Now, the green bars or kind of--kind of fluorescent
green on the--on this screen here, show the number of errors made by people in the experimental
condition. The red shows the number of errors they made when they just had the audio playing.
And as you can see here, they don't learn anything with just the audio playing. But
most of them--half of them played the sequence correctly with no mistakes after the 30 minutes
of passive practice. Now, this is really kind of bizarre. I mean, how many people would
have thought that that would've worked? Right? I certainly didn't. So, this is the type of
thing that as a cognitive science major, I go, "How is this working?" So, we've done
the study again and again and again in two different continents with three different
researchers. And it seems to hold true. And it seems to work no matter if the distractor
task is a reading comprehension task, if you're reading your email, if you're watching
a movie, if you're doing a scavenger hunt, if you're playing a memory game, or even,
at CHI, I gave a talk, and had the system teach me Beethoven's Ode to Joy as I was giving
the talk. Which, let me tell you, talk about performance pressure, I just said about how
this thing works and I had to walk up to the keyboard and try it. I didn't even know where
to put my hand down at first. But indeed, I can now play Beethoven's Ode to Joy; I used
the first two passages--the first two phrases pretty flawless--flawlessly.
>> What was the talk? >> STARNER: I don't know. I didn't watch the
video of myself giving the talk. I will tell you that having the audio at--the thing's
volume was too loud. So, it was really distracting. But you can imagine if you're doing things
like email or something where it's more quiet, it might not be so distracting. One of the
things we want to do right now is, we're really curious to see if the audio is necessary;
it may be that just the tapping of the fingers is necessary to give you this sort of muscle
memory. Now, some of you who actually play the piano, might say, "Hey, how about the
left and right movements of the hands?" Well, actually, this technique is better for things
like clarinet or saxophones or flute, something where you're not moving the hands around a
lot. But what we found is that when we did, you know, real piano pieces, and this is still
just one-handed, where you're moving around, you have somebody work on a song until they
can play through it once. And then, you would turn on the glove and let them spend the next--the
rest of the day feeling it on their hand, and they actually continued to learn instead
of forget, so you don't--so it's sort of passive haptic rehearsal at that point. And one of
the reviewers of one of our papers actually said they really liked it because for musicians
with repetitive stress injuries they could actually practice without practicing, which
is kind of cool. The other thing we're kind of interested in is, will this work
for other manual learning tasks? Things like typing or sign language or prosthetics or
complicated manual controls. We don't know yet. This whole idea of passive haptic learning,
as we call it, is new. And we're very excited about it, but we don't know how far it's going
to go. One of the things we do have data on, though, is passive haptic rehabilitation.
We worked with the Shepherd Spinal Cord Center in Atlanta; it's one of the nation's premier
centers for dealing with traumatic spinal cord injury, and in particular working with
the Murderball Team. Who--you can see a picture of them here. And what's--we have a pilot
study where we showed that wearing this glove and actually having this vibration seems to
improve these folks' ability to grasp objects and manipulate them, able to--ability to feel
objects on their fingers. And most importantly, the ability to do things for themselves like
buttoning their own shirt. And we only did--ran this with two subjects so far, we're gearing
up for a full-scale study. But there's stuff in the literature that seems to indicate that
having this passive tapping on your fingers actually activates the motor region as well
as the [INDISTINCT] sensory region, and this passive practice may actually help neurons
reconfigure and re-hookup. And I can give people references for that if they're interested.
But we're trying to get this actually hooked up this summer and fall with a larger-scale
study and show--and see if there really is an effect here. That's one of the reasons
why we're interested in whether or not the audio is necessary. If the audio is not necessary,
it could be very, very useful. Okay, any questions on that before I move on? I'll try to make
this interactive. Okay. I have a lot of stuff here, way too much stuff, so, you know, we're
not going to get through it all, ask away. So, as some of you know, I've been wearing
computers for 17 years now. I have a heads-up display on, I use a keyboard called Twiddler,
and in fact right now I'm looking at my notes for this talk in my eyepiece. And we've learned
a lot about mobile devices since then. One of the biggest things is access time--is a
killer. Access time is the amount of time it takes you to physically get the phone out
of your pocket, get it on, get to the right place in the interface--I'm actually doing
this right now. So I'm going to my calendar. And you can see that that took me, you know--and
I'm practiced at this--that took me about 15 seconds. On average, it's about 20 seconds
to get to an application on your phone. It... >> [INDISTINCT]
>> STARNER: Yes, [INDISTINCT], yes. Yes, Windows--Windows is going to be a serious problem on a phone.
What we found out is that anytime it takes more than two seconds to get access
to your interface, your use of it tends to drop off exponentially. So, if you can make
it a quick interaction, people would do it all the time. If it takes more than two seconds,
the usage of it goes down linearly or exponentially depending on the type of interface. Now, the
other thing we've discovered is that people don't really multitask, they multiplex for
most things. I just talked to you about a real, as far as I know, a real multitasking
application, but most of the time when people are driving and texting, they're either driving
or texting, they are just switching back and forth fast or not as the case may be. By the
way, we're now doing some [INDISTINCT] on driving--texting while driving. It's really,
really bad. There is one obstacle on a course. The course is 28 feet away from it. It's a
telephone pole. And this--the drivers still scare us to death driving this car. We're
actually using Georgia Tech's autonomous car for this.
>> [INDISTINCT] >> STARNER: Little you know it's because of
my subjects. No. >> [INDISTINCT]
>> STARNER: Yes. But one of the things we've discovered is that when people are actually
using an interface like, say, walking down the street, if you're walking down the street
and, you know, out late nights in Palo Alto, you're going to get some ice cream or something,
you'll spend on average about four seconds on your phone interface before looking up
to see where you're going and looking back down at your display. And that, kind of, leads
to the 4-second rule. If you can actually make your interface happen in four seconds,
it will be much more useful to people. In other words, if you can get a little bit of
useful work done before you have to look back up again, if you can checkpoint, it's actually
much more useful than if you can't. And that's how we're--and that's why we're making this
distinction on microinteractions. Microinteractions are fast to access and allow fine checkpointing.
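[Editor's illustration, not part of the talk.] The two thresholds mentioned so far (about two seconds to reach the interface, about four seconds of attention before the user looks up) can be written down as a simple check. The function name and the idea of treating them as hard cutoffs are assumptions; the talk presents them as rules of thumb.

```python
ACCESS_LIMIT_S = 2.0       # time to get to the interface, per the talk
INTERACTION_LIMIT_S = 4.0  # time before the user looks back up, per the talk


def is_microinteraction(access_s, interaction_s):
    """True if both the access time and the interaction time fit the rules of thumb."""
    return access_s <= ACCESS_LIMIT_S and interaction_s <= INTERACTION_LIMIT_S


# A phone app that takes about 20 s to reach fails, however short the task is.
print(is_microinteraction(access_s=20.0, interaction_s=3.0))  # False
# A glanceable control reached in about a second passes.
print(is_microinteraction(access_s=1.0, interaction_s=1.0))   # True
```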
Now, if you're on something like a bus or a subway, you might actually go spend more
time on your interface. But I'm talking about--the four seconds seems to be a nice rule for making
something that's universally applicable. Now, I might say--I might say--let me give you
an example. Checking your time on your watch, if you actually wear a wristwatch, is a relatively
fast interaction. It takes less than two seconds to do the entire interaction. It's very valuable.
It's fast to access and it gives you feedback. Now, you notice that a lot of people now are
putting their time on--on their cellphone, right? Oops, there we go. Not quite so fast
to access but the cellphone is a useful enough device that you're willing to take that hit.
Wristwatches actually came into existence during, or into popular use, in World War
I, when you had to time your trench warfare, right? When people had to go over the trench
line all at the same time. And you can't be [INDISTINCT] that fiddling with your pocket
watch when you're about ready to go, you know, run against the Germans. So, that was one
thing that made all the GIs, back then, wear wristwatches. But also aviators; you can't
be flying your plane and be fooling around with your pocket watch. You need something
you can look at quickly and get back to what you're doing. And back then, World War I,
you actually had to fly by your clock. Now--so pocket watches went the way of the dodo. Now,
they've come back, right? They're cellphones. But I think what we're going to see is a lot
more use of very fast access interfaces. Now, one of the things we're doing for that is
something called textile interfaces. Now, we're trying to create interfaces that can
be woven into your clothing. And I really mean woven or knit--embroidered, in this case,
into your clothing. And we're using embroidery because it's a raised thread. You can actually
feel it. So, if I was going to control my iPod with something that was on my sleeve,
I can feel the controls here, I can grope for them. We'll call it grope--good gropability.
And actually interact with it without looking. So, there's no visual distraction. Now, there's
been a lot of work that was done by this--by some friends of mine, Maggie Orth and Rehmi
Post back in the mid-'90s, but what we've decided to do is start taking a look at this
from a more complete interaction as far as trying to actually reproduce the GUI toolkit
from scratch on--using these devices now. We actually have demonstrations of different
circuits that people can use in the fashion industry. Now, this is a book, and I have
a live version--this machine is hooked up to do--I'll show you this afterwards, to show
these different types of interfaces. This is what's called a knife-edge pleat. It's
got three lines in it. One on each side of the pleat and one in the base, depending on
which way the person strokes the pleat. It moves the slider one way or the other. So
you can imagine that, if you have this embroidered on your pants leg, for example, you could
use this to control a webpage in your heads-up display or, you know, just slide up or down,
or, you can imagine, controlling volume of your MP3 player. Here is a menu widget rendered
in embroidery. So, you can see we have three menus--three categories like, you know, file,
edit, select, like you might have on a Mac, and then you have five options. And so, I'm
actually controlling the graphics here on the right-hand side based on which line I
touch. Now again, imagine it's not on the book but on a piece of clothing, like on your
armband or on your arm, so that you can actually, you know, select different menus on your iPhone
or your--on your GPhone, or wherever else you want to think about. This [INDISTINCT]
called the rocker switch. This is a multi-touch system, not just like the last one though.
So--do you remember the old types of rocker switches where you can rotate--you could pivot
to that point and one that goes--turns a volume down, one turns the volume up? Well, this
has three different configure points. You have three different sliders you can access
and then you just--once you select one, you just pivot about it, hit the two bigger circles
and that adjusts the level on each slider. Now, for those of you who are electrical engineering
types in the crowd, I can give you a quick lesson of how the circuitry is done. It's--this
is not the normal capacitive sensor you might know--think it is, because the problem with
fabric is that as it crinkles and wrinkles, it gets out of calibration real quickly. This
is actually recalibrating itself, you know, every time it senses, which is really kind
of cool. Here's a zipper. This is--this has been done before. We're doing it, I think,
a slightly different way. It can sense its position...
>> [INDISTINCT] >> STARNER: Excuse me?
>> [INDISTINCT] >> STARNER: Nothing. It's all conductive embroi--it's
silver thread, so it washes just fine. The [INDISTINCT] lines are conductive thread.
The only thing you got to do is take out the circuitry where it combines in. Yes?
>> The zipper [INDISTINCT] >> STARNER: It will--it will sense falsely
in that case. Yes. Yes. [INDISTINCT] to do it, where you do--basically a [INDISTINCT]
bridge and you can do a little bit better than what we're doing here. This is a proximity
sensor. This is--one of the first things we did, the Brother embroidery machine we have,
one of its default settings is to embroider Hello Kitty, so we have the Hello Kitty proximity
sensor here. And you can see depending on how close you get, it has different sensitivity
light ranges. It's--it's the brightness of the rectangle indicates how close you are
to the system. This is a really complicated one. With this,
by stroking the--by hitting the top pad in one of the three but--middle buttons, you
select one of the three sliders on the top. By hitting the bottom pad in one of the middle
three slider--three buttons--you hit the three sliders on the bottom. And then you can increase
and decrease it by doing gestures on top of it. Now, unfortunately, this one is not tuned
very well with the video, but you get the idea. Okay. So, if we can switch back to the--having
both screens be the presentation, I'd appreciate it. In that way, I can cheat by keeping on
looking at my notes. Okay. So, this conductive embroidery really got us thinking about all
sorts of things we could do with conductive embroidery. Let me do this. So it's not quite
so distracting. We not only have to do input, we need to do output as well. Remember, what
we're trying to do is make something here where you can interact with an object. You
can get access to the interface in two seconds or less and you do the whole interaction in
four seconds or less. So, we got some output--sorry, some input. How about some output? Well, these
are conductive threads and they have a high impedance, relatively speaking, compared to
a normal wire, but at high voltage, it doesn't matter. The human body senses a voltage current
tuned to the exact right level as vibration. So, what we started looking at is can we make
a wristwatch watchband that shocks you in different patterns. It feels like vibration
to you, but we're trying to figure out how many different patterns we can indicate. So,
you can imagine that you have an SMS or a call coming in, you can have a different not
ringtones but shock-tones, vibrations, you know, good sensations, I don't know, coming
into this wristband. And I have a copy of that up here somewhere, I can show--show you
all. And so, we--this is [INDISTINCT] by Seungyon Lee who just got her--just defended her PhD.
It turned out that this is much higher resolution than the human wrist can feel. Believe it
or not, if you take two points and put them close together on your wrist, you really can't
determine it's two points. Most of the time, you just think it's one. And believe it or
not, you have to get out to like a centimeter before you start distinguishing that there
are two points. On your fingertips it's like two millimeters. But on your wrist, it's a
centimeter, sometimes more. So really ridiculous. >> [INDISTINCT]
>> STARNER: Excuse me? >> [INDISTINCT]
>> STARNER: Well, if you--so, if you start using time delay, you can do a completely
different pattern. We're looking at spatial stuff here but you can do all sorts of stuff
at a time. And, as a matter of fact that's what the next slide is going to be about.
You're predicting me. But what's also very interesting is that, while you can actually
sense--well, you can actually tune the system to do decent shock levels on the fingertip,
on your wrists; your wrist is often dry or wet. And so, the amount of current you need
is very different from minute to minute. So, my poor grad student ended up having a little
tattoo. No. It was a very fine threshold between pain and the vibration sensation we wanted.
So, we actually had to go away from this to a vibration pattern, however, now we are going
back to it. I said that we could sense capacitance and resistance using these threads. So, here's
an idea, let's [INDISTINCT] the water content of your skin and then dial up or down the
current depending on how much you need to get the right vibration feel. And so we have
a circuit in our lab right now that does that, it's very crude but it's getting there, and
so we're going to revisit this very soon. Other people have done this sort of thing
on the forehead or on the tongue, turns out the tongue is a very good place for it because
it's always wet, it takes very little current to get a good sensation there, and your tongue
has got a high density of receptors. Your wrist is relatively insensate, but it's a
really good place if you're thinking about wristwatches. So we wanted to keep on going
down this wristwatch form factor. And we decided to make a display that was just three vibrators. Now, these vibrators
are made so that two of them hit your wrist's bones at the top--just where--just where your
arm bone hits your wrist, there's two bones there. We're generally doing this on the bottom
side of your wrist and there's one in the middle but back, going up your arm a little
bit. And we can actually do 24 different patterns here. The patterns differed depending on which
vibrator starts the pattern, one, two, or three. And you can see that in the red, green
and blue columns. It also is the--we have different intensities of the patterns, low
and high. We have what's called "pulse intensities" so that if the vibrator is going "Zzzt zzzt
zzzt," versus "Zzzzt zzzzt," and--see here, we didn't have frequency--oh, I think we have
different frequencies as well. So, 24 patterns total and we're trying to see how well can
people actually sense these 24 different patterns on the wrist. And the answer is not bad, except
for intensity. It turns out intensity is a very, very poor thing for actually getting
it--transferring information from your wristwatch. So again our idea here
is to transfer message alerts, like who's sending you an SMS or which sort of phone call's
coming in, to this sort of wristwatch. And we'll talk more about wristwatch--if people
are interested, I can talk more about wristwatches, wristwatch interfaces afterwards. What's interesting
here is that intensity is a really horrible feature to use, we got rid of it. Direction
was pretty good, temporal pattern was pretty good, starting point was very good. So, if
you're going to actually make a wristwatch with vibrators in it, you know, here's a good
starting point. The next thing is, can we actually use these vibrators while you're
doing other things? Now remember what I said about how people don't actually multitask, they
multiplex. So what we did is we compared using one of these wearable tactile
displays to using a normal phone. So normally if somebody is SMSing you, you reach in your
pocket and you pull out a device and you look at the--and see who's calling or what the
SMS is and put it back in your pocket. So we made a system where people had to pull
out their phone and hit one of these three buttons on this keypad to complete the trial.
With the vibrator system they had to do one of three different patterns. Now, we're trying
to do something that sort of mimics the high visual intensity, the high visual distraction
of driving and for that we have this. So, they have five seconds to determine whether
or not the number 51 is in this image. Now I know all of you being nerds, you're not
going to listen to me in the next 20 seconds as you're trying to figure out whether 51
is in there. It's not. All right, give it up. But the point is, you can't help but paying
attention to it, right. So, we're doing this on Georgia Tech students, it's a very good
distractor task. So it worked very well to kind of emulate high visual distraction while
getting these different alerts. The BuzzWear system is actually doing these three different
patterns. They are the most distinctive patterns we had. And we're looking at information transfer.
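[Editor's illustration, not part of the talk.] A minimal sketch of how a 24-pattern space and its information content could be counted. It assumes the four features named in the talk combine as 3 starting points x 2 directions x 2 intensities x 2 temporal patterns; the level names are hypothetical, and the bit counts are upper bounds that assume equally likely alerts recognized without error, not the study's measured rates.

```python
from itertools import product
from math import log2

# Hypothetical feature levels; only the feature names come from the talk.
STARTS = ("vibrator_1", "vibrator_2", "vibrator_3")
DIRECTIONS = ("clockwise", "counterclockwise")
INTENSITIES = ("low", "high")
TEMPORAL = ("short_pulses", "long_pulses")

patterns = list(product(STARTS, DIRECTIONS, INTENSITIES, TEMPORAL))


def bits_per_alert(n_patterns):
    """Upper bound on bits conveyed by one alert chosen from n equally likely patterns."""
    return log2(n_patterns)


print(len(patterns))                     # size of the full pattern space
print(round(bits_per_alert(24), 3))      # all 24 patterns
print(round(bits_per_alert(3), 3))       # the three-pattern set used in the comparison

# Dropping intensity, the feature reported as unreliable, halves the space.
without_intensity = list(product(STARTS, DIRECTIONS, TEMPORAL))
print(len(without_intensity))
```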
Now notice, ignore the outlier on the right-hand side for a second, we have primary tasks
of different difficulty: one where there's only 10 numbers on the screen and you've got
to find whether 51 is in those 10 numbers, one where there's 30 numbers, and one where there's
50 numbers. And so that's the easy, moderate, and difficult. Notice that the bits we can
transfer per second or per minute in this case is actually higher with the tactile display
than it is with the phone. Interestingly also the tactile display does not interfere with
your primary task which is great. But let's look at the left-hand side here. The phone
is having a much higher bit transfer rate when you're just paying attention to the phone
than when you're just paying attention to the wearable tactile display. Why is that?
Well, it's something called the "Yerkes-Dodson Law." People get bored when you only give
them--give them one task at a time, and their mind wanders and because the tact--we think,
because the tactile display is so easy, they are off doing something else in their own
minds; they don't pay attention to the study anymore. And so that's why we think we have
this discrepancy on the left-hand side. With the phone it's still a physically active enough
thing that people are forced to pay attention to it. But what we're most interested in is
this multiplexing scenario, like when you're driving and you're getting an SMS at the same
time. Now, so we've talked a little about how we can actually do input using textile
interfaces, how we can actually do output using vibration and electro-stimulation, but
can we do something more complex? One of the things that we specialize in is gesture recognition,
and you can imagine that if you eventually have an mp3 player, that's basically, you
know, it looks like a hearing aid, you know, it can fit in your ear. The only problem is
you don't have any buttons, you know. It's standard--to be walking down the street doing
this, right, is kind of socially inappropriate. So can we actually make a device where you
can control an mp3 player in your ear, when it's not big enough for buttons? Well, again,
we're looking at the wristwatch, in particular we're looking at accelerometers in the wristwatch
and we're trying to figure out, can we make gestures that are distinct in real life to
control things. Now, making gestures for controlling applications is difficult. For example, suppose
I make a gesture like this for delete email, well then I'm in the middle of a conversation
and I make the same gesture and I accidentally delete all my email. That's not going to fly.
As a matter of fact, you guys are all familiar with this particular problem. That's not purely
gesture recognition but when you have your phone in your pocket, you know, how many of
you have somebody call you back and say, "Hey, your phone called me, what did you want? I
couldn't hear anything." Right? I occasionally get voice messages from other people where
it's just the background noise, their butt called me. No drunk dialing, no sitting and
dialing at the same time, you know. So, there's other places where you get these sorts of
problems and people go through a lot of pain to avoid this. For example in speech recognition
they also have a push-to-talk interface, even if they don't do that, they do something like
computer open file, something to tell the computer to listen in. On the Nintendo Wii,
when you're playing bowling, which is a relatively complex gesture, right, it's doing relatively
fine sensing. It is--excuse me--it is requesting that you actually push a button and hold it
down to do the gesture and then release, that's how it's detecting when the action's happening.
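[Editor's illustration, not part of the talk.] A minimal sketch of the push-to-gesture idea just described: the system only treats motion as a gesture while a button is held, so the button press and release mark where each gesture starts and ends. The data layout (a stream of `(button_down, accel_sample)` pairs) and the function name are assumptions, not how any particular console implements it.

```python
def segment_by_button(samples):
    """Split a stream of (button_down, accel_sample) pairs into gesture segments.

    A segment is the run of accelerometer samples recorded while the button
    is held; motion with the button up is ignored.
    """
    segments = []
    current = None
    for button_down, accel in samples:
        if button_down:
            if current is None:
                current = []          # button just pressed: start a new segment
            current.append(accel)
        elif current is not None:
            segments.append(current)  # button just released: close the segment
            current = None
    if current is not None:
        segments.append(current)      # stream ended with the button still held
    return segments


stream = [
    (False, (0, 0, 1)),
    (True, (1, 0, 1)),
    (True, (2, 0, 1)),
    (False, (0, 0, 1)),
    (True, (3, 0, 1)),
]
print(segment_by_button(stream))
```

The cost of this approach is the extra button press, which is the step the speaker goes on to say he wants to eliminate.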
On the iPhone, right, there's--you push something down and you put--do a slide across the interface
to activate the phone. And most phones including--I have a Backflip on me here, it has the same
sort of thing. It has a push button and then--oops--and then you have to hit another button for it
to actually work. And I'm going to need to pull this out anyways in just a second so
I might as well get it out. So what we're trying to do is make a system where you don't
need these push-to-activate. It'll be much cooler if I actually had a system where I
just made the gesture and did the action. If I actually had to have a button on my wristwatch
to activate the wristwatch and then do the gesture, it kind of misses the point. Why
would I do that anyways? I just should have a button, right? Anything that requires too
much attention to push a button is probably the wrong thing. Now, correspondingly you
can imagine I have an accelerometer in my mp3 player here and my gesture for change track
is that, but... >> [INDISTINCT]
>> STARNER: That. But then you start looking at, like, Night at the Roxbury as you change
tracks. It's a distinctive gesture by the way, you can do it I just don't necessarily
recommend it. But what we want to do now is actually make a device such as a toolkit,
so that people can research these gestures easily. And what normally happens is people
do some survey. Like suppose you're trying to make a gesture system for the iPod. People
would say, "So what gesture do you need for play?" Could someone give me a gesture for
play? What gesture do you want? If you have a wristwatch on, what gesture do you want
for play? This? Okay. What else? This? This? Okay, anything else? Okay, notice I didn't
get any similar ones yet, everybody has their own gesture. So usually people go off and
do a lot of surveys and try to figure out what's--yeah, there we go--figure out what
sort of gesture people want and then they try to make a gesture recognition system for
it and then they have that system in an actual device and they find out it doesn't work at
all. Right? Because it's false triggering all over the place. So that's where MAGIC
comes in, the Multiple Action Gesture Interface Creation tool. So we're using an accelerometer
on the wrist again, just to start it out with. For those of you who do machine learning and
pattern recognition in the crowd, you can think of this as simple Dynamic Time Warping
just because it's the easiest to explain. For those of you who aren't machine learning
or pattern recognition people, basically, if you have one gesture that's sort of the
one that you want to recognize and you have templates of other gestures that are the one
that indicates, you know, the play function. You compare the red to the green by drawing
lines to the closest thing, closest points on each. And then the difference between that--the
slanting of those lines--is the error. Now, for those of you who are pattern rec people, we
are actually doing this with iSAX; it allows us to search very large databases in split seconds,
and so we can actually then make a user interface that just flies. Like I said, the design process
in the past is basically, people try to create a gesture system, then they try to test them
in the real world, they find most of those gestures conflict with real world gesture
that--and they go back to the drawing board. What MAGIC allows you to do is do them both
at the same time. Test your gestures against each other and against the real world. Now
how does that work? What we do is we collect something called the Everyday Gesture Library.
Then we put the sensor you want to use for your iPod on your wrist and give it to somebody
for--to wear for a month. And we try to get, you know, representative actions and representative
people, so we might get an academic, a librarian, a construction worker, you know, a pet-sitter,
you know, just trying to expand the space of people who might use this device and we
gather lots of data from their everyday life. We also, if they'll put up with it, get video
from this cap, this fashionable cap, with a fisheye lens on it. And notice that the fisheye
lens is extreme enough that you really have to get within kissing distance to somebody
to actually recognize who they are in the video image, so it's actually privacy preserving.
And so when--we sort of have a whole huge library of people's everyday gestures and
video of what they were doing when that motion occurred. So then if you have a candidate
gesture you want to try, say, you know, this or this or this or whatever everybody was
telling me. I guess you'd try that against everybody's months of data and see which ones
work and which ones don't. And this is the interface for it. So I am--[INDISTINCT] this
is a cursor here. Yes, here we go. So let's first look at this. This, for the pattern
recognition people, is--this is--each of the classes, we have four different gestures we're
looking at here. For each of the four gestures, we're looking at the inter--intraclass
variance versus the closeness of all other classes and their variance as related to that
gesture. So this is both intra and interclass variance. Over here--hey, don't do that. Over
here [INDISTINCT]. That is our month of data and you probably can't see it back there but
there's a little yellow--pink or yellow lines for each gesture as it happened in the month
long data. And so then they can click on one of those, as we see here and it shows you
that particular example of one that happened in a person's everyday--in people's everyday
lives and what they were doing at that time, using the video. It also gives you some
idea about these different examples of gestures. We're doing a k-nearest neighbors approach
here and some other details that if you are a pattern rec person you can tune. Now we had
a lot of fun with this. We actually had people try to make 8 control gestures for the new
Upod Touchless by Parrot Computer. And people who had the EGL would generally have about
two false positives per gesture per hour. Those people without the EGL had 50 false positives.
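
The idea of testing a candidate gesture against the Everyday Gesture Library can be sketched roughly as follows. This is only an illustration under stated assumptions, not the MAGIC implementation: it uses plain brute-force Dynamic Time Warping on a single accelerometer axis (the talk says the real system uses iSAX to search quickly, and a real sensor has three axes), and the names `dtw_distance`, `false_positives_per_hour`, `threshold` and `sample_rate_hz` are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Dynamic Time Warping cost between two 1-D sequences (brute force)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]


def false_positives_per_hour(template, egl, sample_rate_hz, threshold):
    """Slide a candidate gesture over everyday data and count accidental matches.

    The everyday data contains no intentional gestures, so every window that
    matches the template (hypothetical threshold) counts as a false positive.
    """
    w = len(template)
    hits = 0
    i = 0
    while i + w <= len(egl):
        if dtw_distance(template, egl[i:i + w]) <= threshold:
            hits += 1
            i += w  # skip past the match so one motion is counted once
        else:
            i += 1
    return hits * 3600.0 * sample_rate_hz / len(egl)


if __name__ == "__main__":
    candidate = [0, 1, 2, 1, 0]
    # Toy "everyday" stream that happens to contain the same motion twice.
    everyday = [0] * 20 + [0, 1, 2, 1, 0] + [0] * 20 + [0, 1, 2, 1, 0] + [0] * 10
    print(false_positives_per_hour(candidate, everyday, 1.0, 0.5))
```

A candidate gesture with a high rate here conflicts with everyday motion and would be a poor choice; one with a rate near zero is distinctive enough to try on a device.
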
Now this--it didn't matter if they claimed to know pattern recognition or not. They all
sucked. They were all very bad at this task. So the EGL really--having this database really
had a big impact on the system. Now the other thing that was kind of cool about this is
that our subjects really did discover ways to improve their performance by doing particular
techniques to get better gesture recognition. And I will switch to a file to show you these.
So this is somebody doing an iconic gesture. In other words, they'll repeat each of these
two times so that you can see it. The first one was iconic, for stop. The second one, which was
really interesting, is impacts. In other words, when you hit your hand against the other hand,
that looks very distinct in the accelerometer space compared to your everyday actions. This
guy is prefixing every gesture he has with another gesture. So his is the way--he's basically
saying, you know, "Listen to me computer." And then he does this. Let's see, what's this
one? This one was just repeating the same gesture twice. So you get some idea of the
types of gestures you need to get uniqueness. Now the problem is a lot of these things are
not socially appropriate. Right? The--you know, the guy, who's doing "Computer, listen
to me. Okay, I really meant it now," this and this. That--if you saw me doing that walking
down the street, you'd probably think I'm an idiot. Either that or some mage who's doing
incantations. But if you saw me doing something like, you know, this, where I'm just flicking
my fingers. I can do that on my side. I can do it straight up. That is something that
you could just, you know, it's a subtle gesture, you might not even notice me doing it. And
it's very distinct in the database, so we're actually discovering the gestures you can
make that are very subtle for controlling your mobile electronics. Now, the last video
here is just for fun. This is somebody's everyday gesture library. I think this was going on
a hike somewhere. After a while, you forget you have a camera on, so, you know, I'm not
going to try to show the embarrassing EGLs, but you get the idea. And so, again, this
is a video you'd get if you found a conflict. Now, for those of you with Android phones,
how many of you got a phone on you? Can you accept--Android phones, can you accept un--if
you had the Backflip you can't have unsecured apps, but I'm going to show you an app right
now. What we have is an application you can run on your Android phone. It uses accelerometers
in the phone. You can actually--we have a database of somebody walking around with a
phone in their daily life. And now, you can actually make different gestures. You can
add them to the database, it's there. You can see how unique they are compared to the
everyday gesture library of an Android phone. And so you can start thinking about actually
having different gestures for your Android phone to tie to different activities. As a
matter of fact, you can even download the source code for a recognizer that'll recognize
the gestures you trained up. So if people who have the application, you can come up
afterwards and I'll show that to you. Okay. One of the kind of interesting things about
all this is that the people who were in our user study kind of feared the EGL, they
thought it was very hard. That was understandable and they don't really care about the video
that was there. They just cared about they had a conflict, but they've actually found
the system very useful in doing the task. Okay. Now I'm going to switch to something
a little bit different here. This is our prime example of doing gesture recognition technology.
This is CopyCat. Let me give you some background on CopyCat. Ninety-five percent of deaf children
are born to hearing parents. Many of those parents, when their child is born, of course,
do not know sign language. And since sign language--American Sign Language--is as difficult
to learn as Japanese, many of them will never learn it sufficiently to really communicate
with their children. Well, that might seem odd, but when you are working two jobs, you've
got three children, one of them is deaf and--there's a lot of people, in the literature, who
say that, you know, if you teach somebody two languages, they won't learn--if you teach
them sign language, they won't learn English. It turns out exactly the opposite way. You
should teach them sign language first; they have a much better chance of learning English.
But what we discovered in the literature, in the research, is that these children actually,
unless they learned some language in the age between zero and three, they would not form
their short-term memory normally. Let me make that clear. So each of us can actually remember
about seven things in our head. If I give you a telephone number like 244-5156, you
will be able to repeat that phone number back to me. The children I worked with often have
a short-term memory of two items. And that happens because they do not learn a language.
When you learn a language, that's when your brain is forced to form their short-term memory.
And so children need access to some language, any language, in order to actually--to form
the short-term memory. So the question is how can we make a--how can we use this gesture
recognition technology we have to actually encourage the formation of short-term memory
and acquisition of language? So what we've been creating is this system here. This is
called CopyCat. So what happens is the hero of the game, Iris the cat here. You can see
her in the bottom left-hand side here right where E is. Iris is a white cat with blue
eyes, because white cats with blue eyes are often deaf. She's the hero. She is trying
to find all the gems that have been stolen and they've been stolen by snakes and spiders
and alligators and all sorts of other monsters. And so the children have to--when they come
upon a scene like this, they have to say that the snake is under the chair. That's the three-word
phrase. Oftentimes, there'll be multiple chairs and multiple snakes. You've got to say which one
has the gem. So in that case it'd be the orange snake is under, in this case, the blue chair.
If they get it correct, Iris will magically poof the snake and get the gem and go on the
next level. Now this is a sign language verification task, not just a sign language recognition
task, and we're using gloves for computer vision. We're also using our accelerometers
again. That gives us--well, vision might not give us up and down, the accelerometers do,
so they're--think of them as glorified tilt sensors. This is the scenario we have with
the children in this little kiosk as they're signing to the game. Now, you might think
this seems like a relatively easy, you know, computer vision tracking system. But remember
we have a lot of different video going on here, a lot of different lighting conditions.
Here, the features we're using--we're using head placement, hand placement, angles' relationship
to each other. We're using--we're doing PCA on our database, so we have the top 20 hand
shapes for the left and right hand. We're doing FFTs on the accelerometers. We actually
render little eyeglasses on the children's video. So as they are interacting with the
system, the eyeglasses stick on them, so they can stay within the view of the camera. Now,
why is this hard? Well, it turns out--we're only using 19 signs in our system.
But, they can be done in many different ways. For example, this is bed and this is bed,
and this is bed and this is bed. This is cat, so is this, so is this. Most signers, if you
watch our interpreter here, have a hand dominance. And so most of her signs--I'm actually looking
at her--her hand to figure out which dominance she is. But she's doing all two-handed signs.
There we go. She's right-hand dominant. So most signers have a dominance. The children
we work with actually don't have a dominance. They'll switch dominance in the middle of
the phrase, which causes us all sorts of problems. They also have things like, you know, flowers
which can go right to left or left to right, with right or left hand. So that's a problem.
We love--so with 19 phrases--the 19 signs, we ended up with, believe it or not, 128 different
tokens we're looking at. There's that much variation going on. The other problem is we
have lots of disfluencies. In speech recognition, disfluencies--you had to actually recognize
people are coughing or "Uh," or "Ah," you know, or [MAKES SOUND], right? You have to
recognize all of those different utterances in order to make your speech recognition better.
We had the same thing. We have cough, "Excuse me," and "No, no, no, no," you know, "I didn't
mean that," or, "What am I supposed to sign next?" And we don't, and we want to be able
to recognize, you know, "The orange snake, umm, oh, under the green chair," right? So
we got to be able to handle these disfluencies, by actually recognizing them as well. My favorite
disfluency that we're recognizing is the pick your nose gesture. That's why our gloves are
washable. So we get about 84% accuracy in trying to determine if a phrase was signed
correctly or not. Interestingly enough, our sign linguist, when we're doing a Wizard of
Oz study to collect data, you know, he pretended to be the computer recognizer, he only had
90% accuracy. So we're not that far off with what the human is doing. In truth, we're very
far off it. This is a very hard problem. We got many years of work left, but for this
constrained situation, we're okay. And we actually deployed the system fully automatic
for two weeks where we had six children use it, use the system about six hours or so.
And six children who were controls. And we actually saw a significant increase in their short-term
memory, their ability to sign, to express themselves in sign, and their ability to understand
sign. So we're very, very excited about that. This--the sign verification program, this
CopyCat program is the first example, I know of, of a sign recognition system actually
being used for real application in the real world. Now, I said--now, this system we have
works for children ages six to eleven. That's generally after the critical learning period
of a language which is zero to three. We also want to try to actually get a system for children
who are zero to three as well. So what I'm going to do is show you something called SMARTSign
Alert. Okay. So what we have is a system where the parent, throughout the day, gets sign
language alerts. So it's just like an SMS but each SMS is a little video that shows
them a new sign like this. And gives them a little quiz, which one is it? And if they
don't--if they don't know, it will tell them which one it was. That was cat. So, throughout
the day--we try to optimally space the lessons so that the parents learn the most in the
least amount of time. And it turned out--it turned out, this was really fun. We did a
Spanish one for me and I had a lot of fun learning it. We're actually using the first 80 words
you use when talking with an infant. So, what's exciting about this, we compared learning
sign language on cellphone to learning it using a desk--the same desktop application.
This thing was 40% more efficient on cellphone than desktop. I was really quite surprised
about that. Now, the other thing we have is a system where I can actually--I don't know
if anybody can see this but--what I can do is actually talk into the system and ask for
a sign. For example, thank you. "Thank you." And as it pulls up the word--the things I
said, I click on it, I'll get a little video showing me the sign. [INDISTINCT]. And here
is the sign for [INDISTINCT]. So what we want to do is hit the stage where parents can say
things like, "Go to bed," and up comes a video, Go to Bed. And so the children actually learn
sign in context. Now that--we're trying to get that out as a Google app on the app store--Android
app store before the beginning of the summer, but we didn't quite--we didn't quite get there.
We're still working on it. Unfortunately, the researcher working on it is currently
at IBM, so we probably won't get it on there until fall. Okay. Now, I know that--I know I'm
out of time. Last time, I told you guys a little bit about trying to recognize sign
language directly off a motor cortex, so let me give you an update on that. For those of
you who hadn't--didn't see that talk. If you have somebody who's locked in, somebody who's
paralyzed, has ALS or Lou Gehrig's disease depending on how you know the disease, they
cannot move a muscle. They have no way to communicate. Can we actually have people communicate
through brainwaves alone? The answer is yes. We're doing forced-choice pairs, things like
hot versus cold or chair versus bed and we're actually getting relatively good accuracies
on this. So for these forced-choice pairs we're getting, you know, 90% accuracy for real signing;
it even works if you're just imagining signing. Right? If you sit there in this fMRI tube
and think about doing the sign. You can still get relatively decent results. And currently,
we're starting to work on entire phrases, instead of, "Are you hot or cold?" You know,
"Hot or cold?" "Are you in pain or are you okay? Do you want to go to your chair or to
your bed?" Now, we're trying to get full phrases like, "The bed's hot, I'm in pain." So that
currently involves an fMRI, a big machine. What I've got with me today is an fMRI sensor,
or rather an fNIR sensor. So, this is basically a set of fancy IR transmitters and receivers that you
can put on a little portion of your head. It can tell you if that little portion of
your head is activated--is active. And you can actually use this on a mobile device.
So, what I'm currently doing is I'm wiring this up to my wearable computer, to start
seeing if I can think to my computer in sign language. They get to do something for me.
Now granted this is only one bit. This is actually what I call my Kill Bill interface
because I'm putting this right about here, which is the, "Wiggle my big toe." So I'm
trying to make it so that, you know, if I'm trying to have, you know, okay or cancel,
I wiggle my big toe and it triggers this. We'll see how well it works. Okay. I am running
over. There's lots--I have lots of other demos up here if you want, including a system for
the--for deaf folks to be able to TTY into directly the 911 center system, for playing
Dance Dance Revolution on your cellphone using sensors on your feet. That's a lot of fun.
There's a system for people to learn Braille who have low vision. That's already on the
iPhone. That's by Brian Key, so I can show that afterwards. I also have lots of other
stuff which apparently--I can't--I can only go forward. I can't go back in this presentation.
There's lots of other crazy stuff we're working on. This has been our survey. So if you want
to talk to me about talking with dolphins or trying to make better mini-QWERTY keyboards,
and answer those types of stuff, please come up and talk to me afterwards. And the acknowledgment
slide just got killed off but you saw it there earlier. I have a lot of funders; NIDRR, DARPA,
NSF, ETRI. Thank you to all of them and thank you to, of course, this is mostly [INDISTINCT]
work not mine, so thanks to all of those who were on the screen just a second ago. And I'll
take any questions. Thank you very much. >> [INDISTINCT] citing the recording, so do
you think Morse code is going to make a comeback? >> STARNER: No, it's too hard. I don't even
know Morse code well enough. I can do SOS. So, you know, that ringtone that goes [MAKES
SOUND] drives me nuts because it's mostly SOS. I don't know why they chose that. It's
just really frustrating. >> Hi, you talked a little bit--[INDISTINCT]
there we go. You talked a little bit about what is socially appropriate and not socially
appropriate? >> STARNER: Yes.
>> And, of course, you personally have been testing out the stuff for a long time.
>> STARNER: Yes. >> Are there changes in that or, you know...
>> STARNER: Oh, yes. >> ...[INDISTINCT] in the high level things
you can [INDISTINCT]... >> STARNER: Camera phones. Holy cow. When
I first started this stuff, the idea of actually having an on-body camera, people really, really
hated that and now all of you have on-body cameras. People used to, you know, the idea
of actually recording audio, you know, I don't record audio; I could. But more importantly
all of you could record audio with your cellphones easily, 24 hours a day. Even worse than that,
someone could hack your phone relatively easily and turn on your microphone without you knowing
it, even make a [INDISTINCT] microphone out of all your cellphones. So, every one of you
sitting here is bugged. You know, the whole cellphone revolution. You know, back when--I
remember back when I started, cellphones were this big. Right? They were huge. They weighed
pounds. So now, that everybody has, you know, this supercomputer in their pocket, what I
do is--it looks tame. I'm just trying to control you with my brain.
>> So you have a little display in your left eye. What is it for? What are you doing with
it? >> STARNER: Normally, it is for--as he loses
it--this. So these little notes that I would normally have on my screen as I talk, but
today because I'm having equipment failures, I actually had to put my wearable and actually
used it for the videos. So for example, I forgot to mention that CopyCat is at the CVPR
workshop this Friday on Human Communicative Behavior Analysis. And if you ask real nice,
Zahoor Zafrulla, who is there, will give you a live demo of it. So normally, I have notes
on my talk as I'm giving it. Why do I have it on right now is a good question because
I forgot to take it off. It's the honest truth. I--it's not hooked up right now, so why do
I have it on? I'm just used to it. >> So I happen to notice that you're wearing
a Twiddler. >> STARNER: Yes.
>> And I know that in the past you found that that was the best wearable input device.
>> STARNER: Yes. >> Is that still true or have new things come
up and... >> STARNER: It depends on what you're doing.
It's still the best that I know of because you can get quite fast at--like a burst of
130, I sustain 70 words per minute. On a mini-QWERTY keyboard, the BlackBerry style
keyboard, you can do, believe it or not, 60 sustained; so you can do equivalent. The only
problem is you need visual attention. So you can't actually sit here like--I'm a professor;
I teach all the time. There's no one in my class typing their notes on their BlackBerry.
Why? Because they have to do this all the time and they can't look at the blackboard.
With the Twiddler, it's all touch typing and so you can look up just fine. If you try to
do--if you try to do touch typing on a BlackBerry, your error rate goes up to 15% per character;
just horrible. Your typing rate goes down to 45 words per minute instead of 60. So the
BlackBerrys, if you can give your full attention to them are just as fast as a Twiddler. But
as soon as you try to actually do anything where you're on the go, when you're actually
moving, the BlackBerry rates go to hell. But if you're interested, we actually have a thing
called Automatic Whiteout which looks at the fact you have fat thumbs and you hit multiple
keys at the same time. For those of you who are engineers, you know about key debouncing;
think about key debouncing across multiple keys, and once you have that idea you can actually
reduce the amount of errors people make on a mini-QWERTY keyboard by about 25% of all
errors. You can reduce about 50% of just [INDISTINCT] off by one error, probably a whole lot more,
but--so, we can--we can improve mini-QWERTY keyboards, so they're better. But I don't
think they'll ever make the Twiddler yet. There's nothing else out there that has come
close. The only thing I know of that might get there is something called ShapeWriter
on new Samsung phones where you actually do gesture things for entire words. Though [INDISTINCT]
at IBM never did a real true longitudinal study on it, so we really don't know where
it matches--maxes out. >> Okay. [INDISTINCT]
>> STARNER: So the Twiddler--the TeK Gear has bought out Twiddler. So you go to,
T-E-K-G-E-A-R. I sent you, TV, a special invite to talk to Scott Gilliland who's making the
new Twiddler. I happen to know they got their first run of 10 samples from the factory yesterday.
>> Okay. >> STARNER: And as a matter of fact I've been
using one for the past week except for the fact that the greater and less than sign are
where the Z and T are, and G sometimes becomes enter.
>> Does it have Bluetooth? >> STARNER: This one is just USB, but I happen
to know Scott has made it so it may become Bluetooth pretty easily. So if you want to
get on the bandwagon and suggest improvements... >> [INDISTINCT]
>> STARNER: It's--you already have one in your inbox as a matter of fact.
>> [INDISTINCT] the inbox? Okay. Okay. >> STARNER: You're just not paying attention
to my emails. I explicitly invited you to work on this. Yes. So...
>> You know, the [INDISTINCT] is that I do have like [INDISTINCT]
>> STARNER: Yes. So Scott Gilliland is the one you want to talk to. He's the one working
on it right now. And, you know, there's lots--real [INDISTINCT] games and stuff to help people
get up to speed on the Twiddler. >> Thank you, Thad.
>> STARNER: Yes. Thank you.