Intrael primer. A guide to computer vision for web developers


Uploaded by wizgrav on 04.12.2011

Transcript:
Hello, today we'll be presenting a technology that may
help people discover a facet of web development
that doesn't have so much to do with the web
Starting off, let's examine a very recent development
which allows the computer to see
You've probably seen by now this device called the Kinect
It's an accessory for the Xbox console
which is, basically, a camera
with a twist
it has an IR laser mounted on it
which scatters through a prism
and creates this pattern on the area in front of it
lots of small (invisible) dots that form a semi-random structure
this pattern is tracked by an IR filtered camera
and through processing from a specialized chip this is created
the color hue correlates to the depth measured at each pixel
the depth field is derived mathematically
so it kind of works as a cyclops, not stereoscopically
the kinect also has a normal camera, next to the IR one
which is fairly typical, nothing really special about it
now, why is all this of concern to us?
there's a field in computer science called computer vision
which is a type of processing where we can take data from a camera
and derive information that could be used as stimuli for our application
this field has certain elemental procedures
first off, it's basically plain image processing
like one could do with a photoshop kind of program
we get a video stream and process each frame
first we have to isolate interesting features in the image
and throw out the rest
the kinect is very useful here because just by filtering
based on depth we can isolate the features with ease
this is the main innovation of kinect, it allows us to isolate features
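The depth-based isolation described above can be sketched in a few lines. This is only an illustration, not Intrael's actual code: the function name `filter_by_depth`, the millimetre thresholds and the tiny sample frame are all assumptions made for the example.

```python
from typing import List


def filter_by_depth(depth: List[List[int]], near_mm: int, far_mm: int) -> List[List[int]]:
    """Return a binary mask: 1 where near_mm <= depth <= far_mm, else 0."""
    return [[1 if near_mm <= d <= far_mm else 0 for d in row] for row in depth]


# Hypothetical 4x3 depth frame in millimetres:
# a hand at roughly 800 mm in front of a wall at 2500 mm.
frame = [
    [2500, 2500, 2500, 2500],
    [2500,  810,  795, 2500],
    [2500,  800, 2500, 2500],
]

# Keep only what lies between 0.5 m and 1 m from the camera.
mask = filter_by_depth(frame, near_mm=500, far_mm=1000)
for row in mask:
    print(row)
```

Everything outside the chosen depth band becomes 0 (the "black sea"), and only the interesting feature survives as 1s.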
the second stage, which is also fairly common in computer vision
is called connected component labeling
here we assign to every pixel we have on our image
a number that's common with every other pixel it is connected with
which belongs to the same "island" in the black sea that surrounds them
this allows us to identify distinct features
so we can process them as separate objects
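Connected component labeling can be sketched with a simple flood fill. The version below is a minimal illustration that assumes 4-connectivity (pixels touch horizontally or vertically); the talk does not say which connectivity or algorithm Intrael uses, and `label_components` is a name chosen for this example.

```python
from collections import deque
from typing import List


def label_components(mask: List[List[int]]) -> List[List[int]]:
    """Give every foreground pixel the number of the island it belongs to.

    Background pixels stay 0; islands are numbered 1, 2, 3, ... in scan order.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or labels[y][x]:
                continue  # background, or already part of a labeled island
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(x, y)])
            while queue:  # breadth-first flood fill of this island
                cx, cy = queue.popleft()
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
    return labels


# Two separate islands in a small binary mask.
example_mask = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
example_labels = label_components(example_mask)
for row in example_labels:
    print(row)
```

Each distinct number in the result marks one object that can then be measured on its own.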
now we're heading off to intrael
intrael is a server
that performs the procedures we just described
it isolates features based on depth
it filters by depth
identifies objects
measures several of their geometric properties
and makes these available to html pages as a datastream
allowing a web developer using javascript and HTML
to build applications
using the user himself as a "mouse"
>> this could be used for comics
yeah and art in general
potential applications include smart security cameras
interactive art holds great promise
interactive, multitouch, surfaces in general
and whatever else one can imagine
the information that is provided by the system includes:
the nearest and furthest points from the camera's perspective
the extremes along the x and y axes
from which we can derive the bounding box
so we have 6 points so far for every object
the final point may or may not be on the object area
it's derived and corresponds to the geometric center
these are the 7 points we can use
for every one of them we get their x,y coordinates
the depth that was detected there by the camera
and the depth of the background scenery on the corresponding point
which allows us to calculate, e.g., the distance of the hand from the background
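As a rough sketch of how these seven points could be derived from the pixels of one labeled object: the `Pixel` type, the dictionary keys and the use of the mean pixel position as the "geometric center" are assumptions for illustration, not the server's documented output.

```python
from typing import Dict, List, NamedTuple


class Pixel(NamedTuple):
    x: int
    y: int
    depth_mm: int


def describe_object(pixels: List[Pixel]) -> Dict[str, object]:
    """Derive the seven reference points and the bounding box of one object."""
    count = len(pixels)
    left = min(pixels, key=lambda p: p.x)
    right = max(pixels, key=lambda p: p.x)
    top = min(pixels, key=lambda p: p.y)
    bottom = max(pixels, key=lambda p: p.y)
    return {
        "nearest": min(pixels, key=lambda p: p.depth_mm),
        "furthest": max(pixels, key=lambda p: p.depth_mm),
        "left": left,
        "right": right,
        "top": top,
        "bottom": bottom,
        # Mean position; it may fall outside the object (think of a ring shape).
        "center": (sum(p.x for p in pixels) / count, sum(p.y for p in pixels) / count),
        "bounding_box": (left.x, top.y, right.x, bottom.y),
    }


def distance_from_background(point_depth_mm: int, background_depth_mm: int) -> int:
    """How far a point floats in front of the scenery behind it."""
    return background_depth_mm - point_depth_mm


hand = [Pixel(1, 1, 810), Pixel(2, 1, 795), Pixel(1, 2, 800)]
info = describe_object(hand)
print(info["bounding_box"])
print(distance_from_background(info["nearest"].depth_mm, 2500))
```

The second helper is the hand-to-background calculation mentioned above: with the background depth known at the same point, a plain subtraction gives the distance.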
let's see it in action
in this case, we see the average depth of all pixels of the object
in millimetres
it's basically a data stream of 32 element numeric packs for every object
the developer gets a bundle containing these packs, 30 times a second
and it's his job to do whatever he likes with them
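A client could split such a bundle into per-object packs as sketched below. Only the pack size of 32 comes from the talk; that the bundle arrives as one flat list of numbers is an assumption, and the meaning of each of the 32 positions is deliberately left out because it is not described here.

```python
from typing import List, Sequence

PACK_SIZE = 32  # from the talk: 32 numbers describe one object


def split_into_packs(bundle: Sequence[float]) -> List[List[float]]:
    """Cut a flat list of numbers into one 32-element pack per object."""
    if len(bundle) % PACK_SIZE != 0:
        raise ValueError("bundle length is not a multiple of %d" % PACK_SIZE)
    return [list(bundle[i:i + PACK_SIZE]) for i in range(0, len(bundle), PACK_SIZE)]


# A made-up bundle describing two objects (the values are placeholders).
bundle = list(range(64))
packs = split_into_packs(bundle)
print(len(packs), "objects in this frame")
```

At 30 bundles a second, the application would call something like this once per bundle and then read whichever elements it cares about from each pack.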
the interface is very low level and leaves
enough freedom to the programmer to do several things
a basic example we can show right now
in this case we can see the average X coordinate of the object
this is a very simple case
it uses only one element from every object, the X coordinate
and plays with the page's elements, in sum:
this page includes in it an SVG
SVG is a format that can be exported from vector drawing programs
imagine it like this:
up to now we knew DIVs, which are rectangular pieces
SVG gives us non-rectangular pieces
which can be manipulated at will
this is a simple case
the simplest one can implement
yet it may have some value, for interactive art for instance
this is another simple case that could be used
to implement information kiosks for instance
or we could scroll maps
or anything else we can imagine
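Driving a scroll position from a single X coordinate comes down to a linear mapping. The sketch below assumes a 640 pixel wide camera frame and made-up content and viewport widths; `x_to_scroll` is a hypothetical helper, not part of Intrael.

```python
def x_to_scroll(x: float, frame_width: int, content_width: float, viewport_width: float) -> float:
    """Map an object's X coordinate in the camera frame to a scroll offset."""
    max_scroll = max(0.0, content_width - viewport_width)
    # Clamp to [0, 1] so objects at or beyond the frame edges pin the scroll.
    fraction = min(1.0, max(0.0, x / (frame_width - 1)))
    return fraction * max_scroll


# Assumed sizes: 640 px camera frame, 3000 px wide map, 1000 px wide viewport.
for x in (0, 320, 639):
    print(x, "->", x_to_scroll(x, 640, 3000, 1000))
```

In a real page the result would be assigned to the element's scroll offset on every incoming bundle, usually with some smoothing to hide sensor jitter.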
>> elkosmas.gr: what other applications could be developed with intrael?
As we said earlier, security cameras
someone could develop a fully programmable security camera
where you could say, e.g., track this specific area in space
and take a picture of anyone who stands there for more than 10 seconds
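The dwell rule just described could look roughly like this. It is a sketch under assumptions: the region is a 2D rectangle in image coordinates (a depth range could be added the same way), the object position is its center point, and `DwellTrigger` is a name invented for the example.

```python
from typing import Optional, Tuple


class DwellTrigger:
    """Fire once when an object stays inside a region longer than a time limit."""

    def __init__(self, region: Tuple[int, int, int, int], dwell_seconds: float = 10.0) -> None:
        self.region = region  # (x_min, y_min, x_max, y_max)
        self.dwell_seconds = dwell_seconds
        self._entered_at: Optional[float] = None
        self._fired = False

    def update(self, now: float, x: float, y: float) -> bool:
        """Feed one frame; return True on the frame where a picture should be taken."""
        x_min, y_min, x_max, y_max = self.region
        inside = x_min <= x <= x_max and y_min <= y <= y_max
        if not inside:
            # Leaving the region resets both the timer and the one-shot flag.
            self._entered_at = None
            self._fired = False
            return False
        if self._entered_at is None:
            self._entered_at = now
        if not self._fired and now - self._entered_at > self.dwell_seconds:
            self._fired = True
            return True
        return False


# Simulate someone standing still for ~13 seconds at 30 frames per second.
trigger = DwellTrigger(region=(100, 100, 200, 200), dwell_seconds=10.0)
fired_at = [i / 30 for i in range(400) if trigger.update(i / 30, 150, 150)]
print(fired_at)
```

The trigger fires only once per visit, so a person standing in the area for a minute yields one picture rather than one per frame.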
the depth camera doesn't need light to function
so there's definitely potential for a security camera
as far as art is concerned...
>> elkosmas.gr: the options are unlimited
yeah, there's great potential there
about interactive surfaces... let's leave it out for now
as a higher knowledge of geometry is needed
but already someone can go to the site and check some examples
all the source code, including the test console,
consists of simple HTML pages
they're online and anyone can download them
they're hosted on a site called jsfiddle.net
which makes it easy to take other users' code, modify it and experiment
all the examples we've shown are online along with descriptions
not necessarily good descriptions, but at least some material is up
for everyone who wants to get involved
several other thoughts also come to mind
it could also be used in setups that don't have to do with the web
the interface used is HTTP
you could use it with anything that also "speaks" HTTP
you could use it with python or java
or anything else, you just get a datastream
>> elkosmas.gr: so in essence you've set up a local server
>> that reads the depth data from the kinect,
processes them,
and spits them to javascript like chewed food in a datastream
it tries to provide data that gives the user room for expression
in a way that's actually feasible as javascript couldn't handle the load
it has to be done in a low level language, C in our case
nevertheless it's pretty easy to deploy it
the application runs on windows, linux and probably macosx as well
it's just not tested there yet, I haven't set up a hackintosh yet
>> elkosmas.gr: or you just haven't got your hands on a mac, yet
well, I'm not planning on getting one
macs are the devil's work :P
but since it builds on linux and windows
>> elkosmas.gr: that won't be a problem
yeah, it probably is compilable
at some point in the immediate future I'll try
to make and upload a package for macosx
>> elkosmas.gr: so far a lot has been done already
the software is quite mature
it's been adequately tested
it does what we've described so far
and doesn't have any major bugs left
it's production quality and you can use it
the interface is low level though
and that means that a developer can't just do stuff out of the box
it requires some prior knowledge
the realization that you have to deal with complex 3D input
it's a big leap from having just one pointer
with x,y coordinates plus one button
now you get multiple objects
each providing seven 3D pointers
plus some extra bits of info
like pixel counts and more
>> elkosmas.gr: ok another question
>> you only take data from the depth sensor
>> or from the other one as well?
it takes from both, I don't have an example right now
but the data from the RGB cam are typical
>> so you could use them as well, theoretically
>> is it open source?
yeah, GPL
the framerate seems choppy
that's because it's called directly
inside an image tag it works fine
the video streams we see
come in a format called Motion JPEG
it's essentially a sequence of JPEG images
that can be further processed if we like
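Because a Motion JPEG stream is just JPEG images one after another, a client can cut it into frames by looking for the JPEG start and end markers. The sketch below is a naive illustration: it assumes no embedded thumbnails (which carry their own markers) and ignores whatever the server sends between frames; `split_jpeg_frames` is a name chosen for the example.

```python
from typing import List, Tuple

SOI = b"\xff\xd8"  # JPEG "start of image" marker
EOI = b"\xff\xd9"  # JPEG "end of image" marker


def split_jpeg_frames(buffer: bytes) -> Tuple[List[bytes], bytes]:
    """Return the complete JPEG frames in buffer plus the unfinished remainder."""
    frames: List[bytes] = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            # Keep a trailing 0xFF: it may be the first half of the next SOI.
            return frames, (buffer[-1:] if buffer.endswith(b"\xff") else b"")
        end = buffer.find(EOI, start + len(SOI))
        if end == -1:
            return frames, buffer[start:]  # frame still arriving
        frames.append(buffer[start:end + len(EOI)])
        pos = end + len(EOI)


# Fake stream: one full frame, some separator bytes, then a partial frame.
stream = b"junk" + SOI + b"abc" + EOI + b"--boundary" + SOI + b"de"
frames, rest = split_jpeg_frames(stream)
print(len(frames), "complete frame(s),", len(rest), "byte(s) pending")
```

The remainder is meant to be prepended to the next chunk read from the connection, so a frame that straddles two reads is still recovered.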
the processing that derives the object measurements
has already been done in the server
the graphics we see are separate components
an img tag (MJPEG) and a canvas on top of it drawing the rectangle
you can see the distinction when moving fast
but they follow each other closely
one of the pieces of information that comes from the server
is the displacement that's necessary
from the depth camera to the RGB one
so you can crop features from the RGB cam
this is necessary because the cameras
are not placed one exactly on top of the other
so we have to apply a bit of stereoscopy
to shift the images a bit
the intrinsic differences of the cams are factory set in the kinect
so this information is derived as well
this is the only potential use for the other cam
because the objects have already been separated
you can use this image
as a mask for the RGB image
and crop what you want
adjusting it by the displacement amount we mentioned
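The masking step can be sketched as follows. It assumes the displacement is a single (dx, dy) pixel offset from depth-image coordinates to RGB-image coordinates, and that images are plain nested lists; the function name `cut_out` and the black fill colour are choices made for this example only.

```python
from typing import List, Tuple

Color = Tuple[int, int, int]


def cut_out(rgb: List[List[Color]], mask: List[List[int]], dx: int, dy: int,
            fill: Color = (0, 0, 0)) -> List[List[Color]]:
    """Copy RGB pixels where the depth mask is set, shifted by (dx, dy)."""
    height = len(rgb)
    width = len(rgb[0]) if height else 0
    result = []
    for y, mask_row in enumerate(mask):
        out_row = []
        for x, keep in enumerate(mask_row):
            sx, sy = x + dx, y + dy  # matching position in the RGB image
            if keep and 0 <= sx < width and 0 <= sy < height:
                out_row.append(rgb[sy][sx])
            else:
                out_row.append(fill)
        result.append(out_row)
    return result


# One-row example: the mask selects columns 0 and 2, RGB is offset by one pixel.
rgb_image = [[(0, 0, 0), (10, 10, 10), (20, 20, 20)]]
depth_mask = [[1, 0, 1]]
cropped = cut_out(rgb_image, depth_mask, dx=1, dy=0)
print(cropped)
```

Mask pixels whose shifted position falls outside the RGB image are filled in, since there is no colour data to copy for them.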
this is the functionality in general
>> elkosmas.gr: and where can we find the software?
at www.intrael.com where you can find links to the code hosting
intrael.com is a blog containing examples mainly
I hope yours as well
and anything development related
>> what's the name of the software itself?
intrael
you can contact me for any info you may need
the documentation may not be very well written
so I'd appreciate some feedback on that
and feedback for the application as well
>> elkosmas.gr: and testing on other browsers to check how it works there
let's see what the deal with the browsers is
the image part works sub-optimally in explorer
you can take separate images but not a stream
>> elkosmas.gr: that happens because of MJPEG?
yes, it's a technology created by netscape
it's supported by all the rest along with mozilla
>> elkosmas.gr: besides internet explorer
where it will never be supported because it was a netscape technology
it's a politics issue
the rest of the functionality works there
and you can still take images but not as video
just snapshots
the datastreaming part works in all browsers
and in internet explorer from version 8+
version 9+ is recommended anyway
as that's the point it starts to feel comfortable
the recommended browsers are chrome and firefox
and, potentially, safari as well
>> elkosmas.gr: yeah, since it's based on webkit
chrome and firefox are very similar
you'll write the same code for both
the capabilities are very similar, and the performance as well
you should also keep in mind that this program
is not intended for generic internet usage
it's meant for creating installations
so you have the freedom to choose the browser you want
as well as technologies that you couldn't use so far
stuff like 3D graphics, WebGL
and several other technologies that,
because of the issues with explorer we mentioned,
developers hesitated to touch
now you can use them and build
pretty advanced installations
HTML5 is a very powerful bundle of technologies
which are, for the most part, unexplored
with this system you can use them but
not in the sense of making the new facebook
it's intended for building interactive installations
using the browser as a content tool
>> elkosmas.gr: thanks a lot yanni
thank you, and I'm eager for your feedback
>> elkosmas.gr: sure thing