Artificial General Intelligence: Now Is the Time

Uploaded by Google on 25.07.2007


BEN GOERTZEL: Yesterday we took a bunch of data regarding
the expression of genes in mice that underwent calorie
restriction diets, threw it into the AI system to see,
well, which genes are most important for the calorie
restriction process?
Which genes are related to each other and in which ways?
And you know, we found cool stuff.
Like, we have this MRPL12, this mitochondrial ribosomal
protein gene twelve, which comes out as being central to
calorie restriction.
And the AI figured that out.
Statistics couldn't figure it out, people
couldn't figure it out.
You feed the data into the AI, it churns using some
statistical pattern mining and machine learning algorithms,
and figures out new stuff.
But then a human has to go and take that and cross-correlate
it with the existing research literature, figure
out what the hell it means, figure out what
experiments to do in mice to take it to the next level.
Now, AI programs like this one we wrote in the context of my
company, Biomind.
They're incredibly helpful tools, but they always leave it
to humans to interpret what they've done, integrate
things with a wider context, and import knowledge from other domains.
So my contention--which certainly is not proved, but
it's my conjecture, based on my own experience and
integration of available knowledge--
my contention is that general intelligence at the human
level or beyond is not going to come about by kind of
incrementally generalizing narrow AI applications.
Like making a better and better search engine, or a
better and better biology data mining engine, or a better and
better car driving engine.
My view is that to achieve general intelligence, you've
really got to set out to make a general intelligence.
And it probably won't be useful in the very beginning.
I mean, I've had three children and none of them were
at all useful for a long time.
And it's arguable whether any of them are of any use right
now actually.
They kind of cost a lot of money and
cause a lot of trouble.
So, I mean, judging from the only example we have, an AGI,
when it starts out, is a complete idiot.
I mean, if you saw a baby and didn't know it was going to
grow up-- what, it lies there, goes wah and
then makes a mess.
I think that AGI is likely to be similar in the beginning.
It's not going to know very much.
It has to learn an awful lot from scratch, and once it's
been educated sufficiently, then it can be quite
powerful in a variety of different domains.

So I do think the right approach is to create
something like an artificial baby.
I don't want to over interpret that.
It doesn't have to be like a human baby in any details, but
the point is it may not have useful
functionalities in the beginning.
It may have to be taught.
Only once it gets to a certain level are there going to be
practical applications from it.
So just to pursue this a little further--
let's say you want to make an AI baby of some sort.
How do you do it?
Where do you do it?
Now you could make it a purely chat bot thing --
just as a textual system.
I don't think that's totally unfeasible but I've come to
the conclusion, it's probably not the best approach.
Some folks take a Rodney Brooks type of an attitude.
You've got to make a physical robot.
And I think that's great.
It's just a big hassle.
I've tinkered around with robots, and you just
spend all your time dealing with actuator and sensor
problems. And maybe it's good if you have a huge budget and
a team of roboticists.
I've become attracted to a sort of intermediary option, which
is embodiment in virtual worlds--
where you still have perception, action,
cognition, and social interaction, but you don't have to
deal with all the nitty gritty of moving the robot arm here
and there, and the bump sensor broke, and there's too much glare for
the camera, and so forth.
The argument against virtual worlds is that, right now,
virtual worlds don't have the sensory and motoric richness
of the real world.
The physics simulations in virtual worlds
are not that awesome.
The amount of data in virtual worlds is not that much.
To the extent that human-level general intelligence
depends on a huge richness of perception and a huge
flexibility of affordances and movements,
virtual worlds right now aren't as good as the real
world. But I think virtual world technology is improving
at a really incredible pace, mostly due to video games
being so popular.
I'm fairly optimistic that virtual worlds are going to
get better and better so as to support robust learning for
artificial general intelligences.
This is a screenshot from a little virtual world that we
made for our own AI, called AGISim, which is just based on
an open source game engine, Crystal Space.
Our AI controls that little guy there.
And a human being can control that guy there.
In this screenshot, we had just run the AI through some
very basic learning experiments based on the
psychological theories of Piaget, where basically
the teacher takes a little bunny and hides it in a box.
Then the baby has just got to figure out that the bunny is
probably going to be in the same box that it was hidden in.
Which is really simple, of course; you could program an
AI to know that with one line of code in a
good programming language--
a few more in C++, which our system is programmed in.
The point is, a human baby actually doesn't know that
objects are permanent.
Like a human baby doesn't know that if I put my cell phone
behind my back, it's still there and is likely to
reemerge somewhere eventually.
Somewhere between six and nine months, a human baby
figures that out.
And so we've done the experiments, making our AI
system figure that out based on its embodied experiences as well.
And that's taking things down to a really
primitive level, of course.
We've also done experiments programming in knowledge like
that and teaching it more and more advanced stuff.
But you can see the embodied modality lets you take
a very basic, primitive approach to teaching an AI
system to understand itself and the world.
There are also existing commercial virtual worlds like
Second Life, which this is a screenshot from.
There are virtual pets in Second Life and avatars that
people control as well as avatars that are controlled by
simple scripts right now.
And Second Life is interesting because it's richer than the
simple simulation world that we built just for testing.
There are millions of subscribers to Second Life.
There's a huge virtual topography there with all
kinds of stuff going on.
If you put your AI brain inside this dog here or inside
this guy, and it goes around in Second Life, there are
people to interact with it.
There are things for it to do.
People will chase it around.
It can chase people around.
People will say stuff to it.
It has to figure out how to react.
And this is something that we're actively working on now--
I'll talk about it a little later--using our AI system to
control various sorts of agents in Second Life and make
use of the richness of that environment.
As I said, right now Second Life physics does exist, but
it's fairly primitive.
We've got Newton's laws in there.
We've got friction.
We don't have fluid mechanics, for example.
But I think all that is going to come.
On the PS3, you do have fluid mechanics and so forth.
It just hasn't been ported into this domain.

So far all I've done is to say some generalities.
I think we should work on artificial general
intelligence directly.
I think we should do it using virtual embodiment.
So I think that the best path forward for the AI field, in
terms of the grand all time goals of the field of really
making a thinking machine at the human level or beyond--
I think the right thing to do is just to focus on making
programs that control embodied agents in virtual worlds and
that learn to act like a little baby or
a dog and so forth.
And once they've mastered those simple behaviors, teach
them more and more stuff, until they learn
more and more through interacting with people in a
shared perception and action context.
And that's my best guess for the right general way to
approach the AGI problem.
Now, of course, that in itself doesn't tell you very much and
is not very original either.
People have been talking about that kind of stuff since well
before I was born.
And I'm 40 years old.
So there's nothing very new there.
Although I find it disturbing that so much of the AI field
has digressed so far from that to all sorts of other things.
Because I do think that focusing on this sort of stuff
is still the most likely way to get to the end goal.
What I'm going to talk about for the rest of the time I
have is my own AI architecture, which, according to my best
guess, when complete, is likely to be capable of
achieving this goal.
And certainly in 20, 25 minutes, I wouldn't be able to
convince anyone that the architecture really will work
as I think it will.
Even if someone had all relevant knowledge and was
emotionally extremely well disposed toward it.
It's just an awful lot of detail.
And we're developing the Novamente system within a
start up company--
Novamente, LLC --whose business model is focused on
controlling virtual agents in Second Life and in massively
multiplayer online games and in training simulations.
So the full description of the system is not published
at this point.
We have hundreds of pages of internal documentation.
I have published some overview papers of the architecture in
various AI conference proceedings--
at AAAI IEEE conferences and so forth.
So if you're curious to look in a little more depth at the
architecture than what I'm going to say in the next 20
minutes, the website
has a papers page.
And you can look at some of the 8 page conference papers,
which, of course, don't tell you anywhere near everything.
But they will help position the approach a little more
carefully in your mind.
So the Novamente system--how does it work?
Well, it is, as I've mentioned before, an integrated AI system.
And I started by trying to come up with a holistic,
system-theoretic understanding of how cognition works.
I didn't start with a particular algorithm or
knowledge representation.
I didn't start from computer science at all.
I started from systems theory and cognitive science.
What are the parts the mind has to have?
What are the overall high-level dynamics of the mind?
How do they interact with each other?
What are the emergent structures?
Then I took a step back and said, well how the hell could
these things possibly be achieved using tractably
implementable computer science algorithms?
Not necessarily by imitating the brain, because I don't
think we know enough about the brain to use the brain as a
detailed guide.
But using computer science algorithms integrated in an
appropriate way to give rise to the overall structures and
dynamics of the mind.
And that's the high level approach that was taken.
Right now, we're not done building the thing.
There's a detailed design.
We're maybe 40% to 50% complete with the
implementation of that design.
What we have now is enough to control agents doing some cool
things in the simulation world.
But it's nowhere near what we'd like it to be.
So we're progressively building out the system while
applying the system for agent control.
These are just some of the overview papers, which you can
see on the website.
I'm going to now briefly jump through some of
the technical stuff.
I'll talk about knowledge representation, a bit about
software architecture--I'm going to end up pretty much
glossing over the cognitive processes, which are the most
important part, but there's not that much time--
and then the high-level emergent structures that are
hoped to arise within the system
once it's complete.

First just a bit on the philosophy of mind underlying
this whole thing, which I don't think is dramatically
original but I found it useful to formulate it in a kind of
precise way.
An intelligent system is a system that recognizes
patterns in the world and in itself.
And key to this, I think, is a reflexive process of a system
recognizing patterns in itself, then improving itself
based on those patterns.
That doesn't entail, like, deep source code level self
modification necessarily although it could.
But it entails learning and introspective learning, which
humans do from very early age and keep doing throughout the
course of their lives.
And a key part of this is the development of what
psychologists and philosophers of mind call a self, or the
phenomenal self:
the image within the system of the system itself.
If an AI can't even recognize itself as a pattern in the
world, and its body's interactions with the world,
it's not going to have much grounding for any other kind
of flexible intelligence.
A lot of the key to getting AGI to emerge, I believe, is
getting a system to be good enough at pattern recognition
in its world that it can recognize what it is, in terms
of how other things react to it.
And that's a lot of what a little baby does in the first
year or so of its life.
Because when a baby is born, it doesn't know the difference
between itself, its mom, and the bed it's lying on.
And at a certain point, it gets this idea, like, hey, I'm
this thing here.
My mom is that thing there.
This pillow is this thing here.
This skin, like, is me but then when I touch something
else, I feel something else.
This basic understanding of what your self is--
it's really critical to ongoing
learning and cognition.

So as noted before, I consider intelligence as the ability to
achieve complex goals in complex environments.
And starting to edge toward the technical side of things--
one might look at the achievement of goals as a
system recognizing uncertain patterns in the form, well, if
I carry out this procedure, in this context, I'll
achieve this goal.
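To make that "procedure in context achieves goal" framing concrete, here is a minimal sketch in Python. The class and field names are my own illustrations, not Novamente's actual data structures, and the probabilities are made up:

```python
from dataclasses import dataclass

@dataclass
class CPGPattern:
    """One uncertain 'context + procedure -> goal' pattern (illustrative)."""
    context: str        # e.g. "teacher watching"
    procedure: str      # e.g. "fetch ball"
    goal: str           # e.g. "get reward"
    probability: float  # estimated P(goal achieved | context, procedure)

def best_procedure(patterns, context, goal):
    """Pick the known procedure most likely to achieve `goal` in `context`."""
    candidates = [p for p in patterns if p.context == context and p.goal == goal]
    return max(candidates, key=lambda p: p.probability, default=None)

patterns = [
    CPGPattern("teacher watching", "fetch ball", "get reward", 0.8),
    CPGPattern("teacher watching", "lie down", "get reward", 0.2),
]
assert best_procedure(patterns, "teacher watching", "get reward").procedure == "fetch ball"
```

The whole difficulty, of course, lies in learning such patterns and their probabilities from experience, which is what the rest of the architecture is about.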
And again that doesn't say very much because it's pretty
obvious that if you could solve that problem in general
you could do anything.
In terms of generally structuring what the problem
is, this is not necessarily the way people would look at
things from, say, a neural net or a crisp logical theorem
proving approach to AI.
One thing you notice is I place probability theory and
uncertainty at the core of the approach.
We'll see that as I launch into the details.
So I had a book published last year called The Hidden
Pattern, which pretty much just reviews philosophy of
mind, going through the whole gamut of issues in cognitive
science and philosophy of mind, and trying to largely explain
why they're not major problems and how they can be resolved
very simply in the context of viewing the mind as a big
system of patterns that recognizes patterns in the
world and in itself.
So getting toward the nitty gritty: how does our
Novamente system work?
Starting off with the knowledge representation, and
I almost hesitate to use the term knowledge representation
because it can be misleading.
Because I think a lot of what an AGI System has to do is
learn how to represent knowledge.
So you can almost think of this as a proto knowledge
representation and the system has to build its own
context-specific knowledge representations on top of that
for dealing with different sorts of things.
But the low-level proto knowledge representation of
Novamente is a graph data structure.
You have nodes, you have links, and to look at it really
crudely, it's kind of a synthesis of probabilistic
semantic networks with attractor neural networks.
In the sense that you have nodes and links.
And you have weights on the nodes and links that represent
probabilistic truth values.
You also have weights on the nodes and links that represent
what we call attention values, which are sort of like
activations or weights in the neural network.
So we're putting together attentional-type stuff, like
in an attractor neural network, with truth-value-type
stuff, like in a probabilistic semantic network.

It's not a neural network in that we're not trying to do
low level brain modeling.
It's also not really a pure semantic network, because
we're not just representing
high-level conceptual knowledge.
We can represent procedures to do stuff, percepts that have
come, and so forth, as well as a high-level conceptual
semantics in the same graph.
So the numbers associated with the nodes and links include
numbers called attention values.
Each node or link has two numbers, a short-term
importance and a long-term importance, attached to it.
And, roughly speaking, the short-term importance of a
node or link dictates how much attention is paid to it.
So the, kind of, short-term memory, the attentional focus,
the working memory, is the things with the highest
short-term importance.
The long-term importance dictates whether something
gets kicked out of RAM onto disk or not, basically.
Forgetting has been a big focus of our work, because any
system that is constantly perceiving a simulation world
and generating new ideas generates way too much
knowledge to keep in RAM.
So you need a fairly sophisticated system to guess
what may be relevant in the future.
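As a toy illustration of long-term-importance-driven forgetting (a sketch only; the real system's relevance-guessing is far more sophisticated than a single sort, and the field names here are assumptions):

```python
def forget(atoms, ram_capacity):
    """Keep the `ram_capacity` atoms with the highest long-term importance
    in RAM; page the rest out to disk. A toy sketch of LTI-driven forgetting."""
    ranked = sorted(atoms, key=lambda a: a["lti"], reverse=True)
    return ranked[:ram_capacity], ranked[ram_capacity:]

atoms = [{"name": "cat", "lti": 0.9},
         {"name": "glare", "lti": 0.1},
         {"name": "ball", "lti": 0.6}]
in_ram, on_disk = forget(atoms, ram_capacity=2)
assert [a["name"] for a in in_ram] == ["cat", "ball"]
assert [a["name"] for a in on_disk] == ["glare"]
```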
And we also have probabilistic truth values.
We use a particular system with a two-component
truth value.
Each piece of knowledge has a probability and also a number
we call the weight of evidence, which shows you how
much evidence was gathered to support that probability.
And there's a bunch of math underlying there that actually
connects with imprecise probability theory, interval
probabilities, if anyone's looked at that.
But, in general, pieces of knowledge in the knowledge
base are weighted with these two-component truth values and
the two-component attention values.
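One common way to map raw observation counts onto such a probability-plus-weight-of-evidence pair, and to merge two bodies of evidence, looks roughly like this. This is a hedged sketch, not necessarily Novamente's exact formulas; the constant K is an assumed "how much evidence counts as a lot" parameter:

```python
K = 10.0  # assumed personality constant: how much evidence counts as "a lot"

def truth_value(positive, total):
    """Probability plus weight-of-evidence from raw observation counts."""
    probability = positive / total
    weight = total / (total + K)   # approaches 1 as evidence accumulates
    return probability, weight

def revise(n1, p1, n2, p2):
    """Merge two independent bodies of evidence for the same statement,
    weighting each probability by how many observations back it."""
    total = n1 + n2
    probability = (n1 * p1 + n2 * p2) / total
    return probability, total / (total + K)

# Little evidence for 0.75, lots of evidence for 0.50:
p, w = revise(4, 0.75, 16, 0.50)
assert abs(p - 0.55) < 1e-9   # merged estimate leans toward the better-supported value
```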
We also have a typology of nodes and links.
And the name should be taken with a grain of salt, but we
have nodes that represent percepts coming in from the
external world.
Nodes that represent little procedures for doing stuff,
like move joint at this angle, and so forth.
And we have nodes that can represent abstract concepts.
And, basically, nodes that are just tokens, whose only
purpose is to be linked together by links, and then
the concepts can be viewed as a, kind of, subgraph, or
linkage structure, among a bunch of nodes and links.
There's a lot of science to this.
And, actually, the knowledge representation scheme is
described fairly well in some of the available publications.
But the basic idea is that we're using a weighted,
labeled hyper-graph knowledge representation, where the
weights carry semantics regarding attention, on
different time scales, and semantics regarding
probabilistic truth value, within the same network.
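A bare-bones rendering of such a weighted, labeled hypergraph might look like the following. This is illustrative only; the class and attribute names are my own, not the actual Atom Space API:

```python
class Atom:
    """A node or link in a toy Atom-Space-like hypergraph (illustrative only)."""
    def __init__(self, atom_type, name=None, outgoing=()):
        self.atom_type = atom_type      # e.g. "ConceptNode", "InheritanceLink"
        self.name = name                # most atoms have no English name
        self.outgoing = list(outgoing)  # the atoms a link points at
        self.strength = 0.5             # probabilistic truth value
        self.sti = 0.0                  # short-term importance
        self.lti = 0.0                  # long-term importance

cat = Atom("ConceptNode", "cat")
animal = Atom("ConceptNode", "animal")
link = Atom("InheritanceLink", outgoing=[cat, animal])
link.strength = 0.95                   # "a cat is an animal", with high probability
assert [a.name for a in link.outgoing] == ["cat", "animal"]
```

The point is that both semantics (the truth value) and attention (the importance values) live on the same graph elements.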
So this just graphically depicts that, in the same
network, we're going to have nodes representing stuff about
joints and actuators being on at a certain time.
Nodes with no name at all in English, which are just
meaningful in terms of their relations to other things.
This is a node representing a certain particular instance of
raising your arm.
This is another node representing the general
concept of raising your arm.
And all that stuff can be in the same network that's
managed in the same way.
So you can have links denoting a generic
association between things.

I always hate to give these examples because they're a bit
misleading, because most nodes in the system wouldn't have
any English name.
But some of them may correspond to concepts that
can be represented in English, and those happen to be the
easiest ones to make slides about.
Nodes can have a generic association between each
other, just meaning that they tend to be
useful at the same time.
You can have explicit logic represented--
say, nodes and links representing a predicate:
coffee is often in a coffee cup.
That would have to have some probabilistic truth value.
This is really only going to be useful when embedded in
some other links structure.
Because only in some contexts is that true.
If it's a coffee bean in a plantation it's not going to
be in a coffee cup.
So the really relevant node and link structures, as in
any pragmatic system using logic, are going to be big
node and link structures with a lot of uncertainty
associated with them.
People often drink coffee from a coffee cup, but again we
need a whole bunch of these, with probabilistic weightings
and contextual embeddings, to be of any use.
But, nevertheless, logical representation, with
appropriate probabilistic weighting is a big part of
what we're doing.
And it's important that that's overlaid with this, kind of,
neural net-like associational representation.
I think you need both of those operating together,
effectively, to adequately represent knowledge for a
general intelligence with a graph-type knowledge
representation.
Software architecture--it's important, but it's kind of
standard stuff, so I'll go through it pretty quickly.
We have a big container of nodes and links, which we call
the Atom Space, and a bunch of objects, called Mind Agents,
that act on it, carrying out different cognitive processes.
Then we make it a distributed system.
And we can have a whole bunch of those on different machines
and they'd all hook together with each other.
We can draw a, kind of, boxes and lines diagram,
like anyone else can.
And I think all these diagrams really, kind
of, look the same.
Because, at this level, cognitive science tells you,
you have perception, you have language, you have actuation
control, you have memory, you have goals and feelings.
This looks a lot like the diagrams you'd see from Stan
Franklin's LIDA system or Aaron Sloman's cognitive
architectures--not that different from Minsky's Society of Mind or
Emotion Machine.
If you compress a bunch of Minsky's little boxes into
bigger boxes.
And I think that it's important to understand things
on a high level like this.
But, ultimately, intelligence comes down to what are the
dynamic processes going on inside the boxes and how do
they interact with each other, rather than this kind
of high-level portrayal.
A different way of looking at things is as a basic kind of
animal-like cycle for interacting with a world.
Wherein perceptions come into memory.
They elicit feelings in the system, where feeling can be
thought of as a, kind of, internal sensor.
The system has certain goals, some of which you may have
supplied, some of which it may have formulated itself by
refining the supplied goals.
Then it figures out what to do.
Puts a bunch of procedures in some pool of active
procedures, then does something in the world, then
perceives it again.
And this basic animal interaction-type loop is a
different way of looking at the same diagram.
The crux of it is in the cognitive processes that occur
inside the system, which I'm in no way going to be able to
come close to doing justice to in the five minutes that I'm now
allotting to it.
At a high level, we can look at three categories of
cognitive processes occurring in the Novamente System.
One is what I call global processes.
These are cognitive processes that go through everything in
the knowledge base and just iterate.
An example of that is assigning a long-term
importance value.
Periodically, you just have to go through and ask, how
important is this thing?
Upgrade or downgrade the long-term importance, and then
kick out of RAM the things that are unimportant. And you've
got to cycle through and do that with everything.
We have what are called control processes, which are
kind of specialized stuff like executing actions.
There's a collection of active procedures
which are called schema.
And you've just got to go through and execute them.
We use a kind of action selection algorithm similar to
Stan Franklin's action selection approach actually,
which is related to Pattie Maes' behavior nets.
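Behavior-net-style action selection can be sketched very crudely as importance-weighted stochastic choice. This is an assumed stand-in, not the actual Novamente or Franklin algorithm; STI acts like activation, and softmax sampling keeps the choice stochastic rather than winner-take-all:

```python
import math
import random

def select_schema(schemas, temperature=1.0):
    """Pick one active schema to execute, favoring high short-term importance.

    A crude stand-in for behavior-net-style action selection (assumed, not
    the real algorithm): softmax over STI values."""
    weights = [math.exp(s["sti"] / temperature) for s in schemas]
    return random.choices(schemas, weights=weights, k=1)[0]

schemas = [{"name": "chase ball", "sti": 3.0}, {"name": "sit", "sti": 0.5}]
picked = select_schema(schemas)
assert picked["name"] in ("chase ball", "sit")
```

Raising the temperature flattens the distribution, trading exploitation of important schemas for exploration of neglected ones.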
Then we have--and this is the essence of it--what I call
focused cognitive processes.
And these are cognitive processes that get a small set
of nodes and links in the overall table and do stuff on
that small set to produce more nodes and links in the table.
And this includes logical reasoning and includes some
evolutionary learning type stuff.
That's really where the crux of the thinking is going on;
the other stuff is kind of mechanical.
So if I want to summarize in a phrase the philosophy
that we've used in crafting the set of cognitive
processes, I would put it like this, from a computer
science perspective:
I've become convinced that essentially every cognitive
algorithm useful for intelligence is exponential.
Pretty much all you're doing is making uncontrollable,
insane combinatorial explosions one way or another.
If you use evolutionary learning, your population size
that you need just blows up when you're trying to learn
hard problems. If you're doing logical inference, the process
of inference tree pruning and forward and backward chaining,
this leads you to horrible combinatorial explosions that
are hard to prune.
The whole essence of making an AGI design that can work, I
believe, is making an integrated system that
combines various purpose-specific AI algorithms in such
a way that they can cooperate and kind of quell or
ameliorate each other's exponential combinatorial
explosions, rather than making them worse and worse.
So I don't think there's any one algorithm that's critical
to intelligence.
And in fact any one algorithm is going to blow up in an
unacceptable way, which is what you see all throughout
the history of AI.
The question is whether you can hook different algorithms
together so they can kind of calm each other down, by
decreasing both the constant outside the exponential and
the exponent in the exponential time complexity.

Since I don't have that much time, I'm going to skip some
of the slides and just verbally give what I think is
the nicest example of that.
Our two most critical cognitive processes in
Novamente are, on the one hand, an algorithm for
probabilistic evolutionary learning.
And on the other hand, a probabilistic logic engine.
And these are both critical ways of taking nodes and links
from the Atom table and creating new nodes and links.
So the probabilistic logic engine is something I've
spent several years on, and I think it's the best existing
integration of theorem-proving logic--with quantifiers and
variables and all the nice stuff--with probability theory,
measuring uncertainty in a fairly sophisticated way using
imprecise probabilities--
so getting logic and probability to work together.
It's nice, and it lets you do logical theorem proving about
stuff like controlling an agent in the world and
throwing balls and playing fetch and stuff.
On the other hand, when you try to learn complex stuff with
it, what happens is that you run into the same problem
everyone else does in inference tree pruning and forward and
backward chaining inference.
It just becomes unsustainable in terms of the combinatorial
explosion. You say, how do you control these inference trees?
On the other hand, evolutionary learning.
A lot of you are probably familiar with genetic
programming. What we're using is something called MOSES,
which was developed by Novamente co-founder Moshe Looks in his
PhD thesis at Washington University in Saint Louis.
And what MOSES does, as compared to genetic
programming, is--
you learn a bunch of little programs to achieve some
fitness function.
Instead of doing crossover and mutation to generate new
programs from the pool of existing ones, you do some
probabilistic modeling.
You build the probabilistic model of which program trees
are good and which program trees aren't.
Then you do instance generation from that
probabilistic model to generate new program trees.
And that works way, way better than genetic programming in a
lot of examples.
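MOSES itself builds probabilistic models over program trees; here is a deliberately simplified sketch of the same model-then-sample idea over bitstrings, which are far easier to show in a few lines. The structure (rank, model the elite, sample children from the model instead of doing crossover) is the point, not the details:

```python
import random

def eda_step(population, fitness, n_select, n_children):
    """One generation of estimation-of-distribution learning over bitstrings
    (a stand-in for MOSES's program trees): model the good solutions, then
    sample new candidates from the model instead of doing crossover."""
    elite = sorted(population, key=fitness, reverse=True)[:n_select]
    length = len(elite[0])
    # Probabilistic model: per-position frequency of 1s among the elite.
    model = [sum(ind[i] for ind in elite) / len(elite) for i in range(length)]
    children = [[1 if random.random() < model[i] else 0 for i in range(length)]
                for _ in range(n_children)]
    return elite + children  # elitism: the best always survive

random.seed(0)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(30)]
best_before = max(map(sum, pop))
for _ in range(20):
    pop = eda_step(pop, fitness=sum, n_select=10, n_children=20)
assert max(map(sum, pop)) >= best_before  # elitism guarantees no regression
```

The real algorithm models dependencies between program-tree parts rather than independent bit frequencies, which is where its advantage over plain genetic programming comes from.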
And in Moshe's PhD thesis, he just used it for some basic
categorization and symbolic regression.
Since that point, within Novamente, we've used it for
agent control and a bunch of other stuff.
But nevertheless, when you try to scale it up to do learning
of large programs, with programmatic constructs like
loops, recursion, lists, and all this stuff, you still run into
a nasty combinatorial explosion.
The population size just gets really big when you make the
tree size too big and the operators at
the nodes too advanced.
So here we have two really nice algorithms which,
however, considered in themselves, meet the fate of
every other algorithm in the history of AI, which is: they
do well at toy problems, and when you try to scale them up
too big, you just run into unsupportable combinatorial
explosions. What we're trying to do within our architecture
is get these two algorithms to help ameliorate each other's
combinatorial explosions.
And this happens in a couple ways.
So in the MOSES probabilistic learning thing, when you have
a bunch of program trees satisfying some fitness
function, when you're probabilistically making a
model of which program trees are good and which ones are
bad, what you can do is use the probabilistic logic engine
to help with that modeling.
You can use the probabilistic logic engine to help do
reasoning based on the system's long term memory and
the context in which the system is operating, make
inferences about which program trees are good.
In that way, if you have an effective probabilistic logic
system incorporating long term memory, it can help a lot with
doing the probabilistic modeling within
evolutionary learning.
On the other hand, within the probabilistic logic engine,
when the logic engine hits a dead end, you can then say,
well, we have all these options to explore in logical
theorem proving.
We don't know which one is any good.
Let's take one of them, let's take the one that seems most
important according to the short term importance system.
And let's use evolutionary learning to see what patterns
we can mine about this concept.
If you're trying to prove something, and one of the nodes
that needs expansion in your backward or forward chaining
probabilistic inference trees is, say, a node representing
cats, you can then use probabilistic evolutionary
learning to mine the knowledge base for interesting patterns
about cats.
You're then stepping out of the domain of logic.
You're doing evolutionary pattern mining to extract
relationships, put them back into the knowledge base, then
you re-expand your inference tree, using the knowledge
gained by evolutionary pattern mining.
So you're trying to use evolutionary learning to bust
the forward and backward chaining probabilistic
inference process out of dead ends, while at the same time,
trying to use probabilistic logic inference to accelerate
evolutionary learning, to make its modeling faster.
That's just one example of two cognitive processes and how
we're trying to get them to learn from each other, to
quell each other's combinatorial explosions.
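As a cartoon of that interplay, the control flow looks something like the following. This is purely illustrative: `expand` and `mine` are hypothetical plug-ins standing in for the probabilistic logic engine's rule expansion and for MOSES-style pattern mining, and the knowledge base is reduced to a set of strings:

```python
def provable(goal, kb, expand, mine, depth=5):
    """Backward-chaining sketch: can `goal` be derived from `kb`?

    When rule expansion dead-ends, fall back to a pattern-mining step
    (`mine`) that adds fresh knowledge to the KB, then retry. `expand`
    and `mine` are hypothetical plug-ins, not real APIs."""
    if goal in kb:
        return True
    if depth == 0:
        return False
    bodies = expand(goal)            # rule premises that would yield `goal`
    if not bodies:                   # dead end: step outside pure logic
        kb |= mine(goal)             # mined patterns go back into the KB
        return goal in kb
    return any(all(provable(p, kb, expand, mine, depth - 1) for p in body)
               for body in bodies)

rules = {"mammal(cat)": [["has_fur(cat)"]]}
expand = lambda g: rules.get(g, [])
mine = lambda g: {"cute(cat)"} if g == "cute(cat)" else set()
kb = {"has_fur(cat)"}
assert provable("mammal(cat)", kb, expand, mine)   # proved by chaining
assert provable("cute(cat)", kb, expand, mine)     # rescued by pattern mining
```

The real systems, of course, pass probabilistic truth values and importance scores across that boundary rather than bare propositions.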
There are other things that have to be drawn in as well.
I haven't even gone into attention allocation.
How does a system decide which things to pay attention to,
which ones not to?
We actually use some simulated economics stuff, modeled on
some of Eric Baum's work on his Hayek system.
But I think the key point I want to get across there--
and I just skipped through all the details on the cognitive
processes, which I went through verbally--
the key point I wanted to get across is: yes, you can
integrate a whole bunch of cool learning algorithms--
evolutionary learning to learn procedures to do stuff in the
world, probabilistic logic to reason on existing knowledge
and generate new knowledge.
But unless you can plug these algorithms all into each other
to get them to improve each other's performance and stop
each other from blowing up combinatorially, you're not
going to create an AGI.
This is part of why I think you really have to be working
on AGI to build AGI.
I think that if you're just working on one application,
you're going to be able to find some trick to make some
one algorithm do what you want just by tweaking it.
And in bioinformatics with that genetics graph I showed
before, what we use there is we use MOSES, the
Probabilistic Evolutionary Learning thing.
But because it's a particular domain, we can do some funky parameter tuning and pre-processing to get MOSES to work well in that bioinformatics domain.
We don't need all this nasty
interprocess integration stuff.
My contention is that for embodied agent control in a flexible context in a simulated world, you're not going to be able to use tricks to overcome the combinatorial explosions intrinsic in one AI algorithm.
You're going to need to use integration in an appropriate
architecture to get various algorithms to ameliorate each
other's combinatorial explosions.
Where does that ultimately lead?
Getting back to the high level of things--

I'm going to skip slides and just talk more.
It's more fun.
I'm sick of PowerPoint.
The ultimate crux of this, as I said, is getting the system to recognize its own self as a pattern in the world.
I think we can get there by integrating a whole bunch of
these different cognitive algorithms within the
architecture that I've outlined.
What I think of as the fundamental dynamic of
cognition is what I call a loop of combining followed by
explaining, followed by combining, followed by
explaining, followed by combining, and so forth.
We have cognitive processes in the system that take knowledge
and then generate new information from that
knowledge using logic inference, using Probabilistic
Evolutionary Learning and other methods.
Then we have cognitive processes that try to explain
what's there.
You need an interaction between these two things in
the dynamics of your cognitive system.
And the crux of this is you're building the system to
understand its own self.
So in terms of an agent interacting in a 3-D
simulation world, what you have is a system that observes
a bunch of things in the world, it observes itself
doing things in the world, and then it draws logical
conclusions from that, and it recognizes inductive patterns
in that, and puts those patterns in its own mind.
That then directs its behavior based on the knowledge it's got.
Then it has to model itself and say what could I be so as
to act the way that I see myself acting.
Then it uses reasoning, evolutionary learning, and the
other things in its bag of tricks to make a guess of what
could I be in order to display these behaviors that I'm displaying.
That model, that extrapolation, feeds back into its own mind and is then used to generate new ideas and new bits of knowledge, which then causes it to act differently.
And then it has to explain how it acted differently all over again, and it just kind of loops around.
The question is whether you can get a sophisticated enough collection of cognitive algorithms put together to
kind of fuel that loop where the thing studies what it
does, tries to explain what it does.
That directs its actions, which it then has to
re-explain, and you keep going around and around and around.
My hypothesis is that the collection of algorithms and
the architecture we've put together in the Novamente
system are going to be enough in a simulated world context
to allow the system to observe what it's doing in the
simulation world, gather data about its own actions, store
that data about its own actions in its memory, model
what must I be in order to carry out these actions in the
simulation world, draw conclusions from that, use
those to direct its actions and keep
going around and around.
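The shape of that observe/model/act loop can be cartooned in code. Everything here is invented for illustration: the "self-model" is just empirical action frequencies, and the feedback rule is an arbitrary damping, so this is a sketch of the feedback structure, not of Novamente's cognitive processes.

```python
import random

def act(policy, rng):
    """Choose an action from the agent's current stochastic policy."""
    return "explore" if rng.random() < policy["explore"] else "exploit"

def self_model(history):
    """'What could I be, to act the way I see myself acting?' -- here, the
    simplest possible self-model: empirical frequency of each action."""
    return {"explore": history.count("explore") / len(history)}

def observe_explain_loop(cycles=5, rng=None):
    rng = rng or random.Random(0)
    policy, history = {"explore": 0.9}, []
    for _ in range(cycles):
        # 1. act in the world, and observe your own actions
        history.extend(act(policy, rng) for _ in range(100))
        # 2. explain: build a model of yourself from those observations
        model = self_model(history)
        # 3. feed the self-model back so it changes future behavior,
        #    e.g. damping exploration in proportion to observed exploration
        policy["explore"] = 0.5 * model["explore"]
    return policy, history

policy, history = observe_explain_loop()
assert set(history) <= {"explore", "exploit"}
assert policy["explore"] <= 0.5  # the self-model reshaped the behavior
```

Even in this trivial version, behavior in a later cycle is a function of the agent's model of its own behavior in earlier cycles, which is the loop being described.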
Ultimately, I spent some time trying to validate this idea.
I came up with a nice list of 17 mathematical conjectures where, if I could prove them, it would validate that this approach to AI can work.
Then I set them aside on my hard drive and haven't looked at them since, because I decided it would take me 10 years to prove all those things mathematically.
I would rather just focus on building the system and then
validating or refuting the hypothesis empirically.
This last slide is just a rundown of what
we've done so far.
We built the core system with the AtomTable, the nodes and links and so forth.
We have a logical reasoning engine using probability
theory to guide uncertain logical inference.
We have the MOSES Evolutionary Learning algorithm, which we've done a lot of simple things with and a handful of complex things.
We can enact learned procedures in a simulation
world, and we've been working with our own simulation world,
but we're going to be rolling out some AI-controlled
products in Second Life during the next year, which should be
pretty interesting and give more facility for interaction.

We've also done a bit of natural language processing
just to allow us to communicate with the system
and then control what it does.
But there's certainly a long path ahead of us to get this
thing to work.
As I said in the very beginning, at this high level
of abstraction, there's certainly no way I'm going to
convince anyone that this is a viable approach in detail,
even if I hadn't skipped half the slides
due to lack of time.
But I do think there's compelling reasons to think
that if we're going to achieve artificial general
intelligence, we're going to have to work on artificial
general intelligence.
And that embodied learning is a clearer path there than anything else.
Virtual embodiment is easier than physical robotic embodiment.
None of the existing paradigms of AI are going to be enough
on their own.
So we've either got to invent something totally wild ass and
new, or take an integrative approach where you put
together the best pieces of existing AI paradigms. And if
you are going to put together pieces of existing AI
paradigms, somehow you have to confront all these
combinatorial explosions that exist in every one of them,
and get the pieces from different
paradigms to work together.
That's the framework within which we've been operating.

If you keep an eye on us for the next
couple of years, hopefully you'll see progressively more
intelligent systems operating in Second Life in game worlds
and so forth.
We're not going to start out with anything dramatically
super intelligent.
The idea is to do just like with a human baby: it starts out as a relatively limited but flexible, autonomous, and exploratory system, then gains more and more functionality through learning and interaction.
Any questions?
AUDIENCE: Jeff Hawkins at Numenta has very similar goals to yours.
What do you think of their system?
Very big question.
BEN GOERTZEL: He said Jeff Hawkins at Numenta has very
similar goals, and what do I think of his system.
I would say that the goals of our project are not the most
unique thing about it, because these goals have been around
since at least the '50s I guess.
Hawkins' approach is more biologically based, at least in principle, although when you look at what he has, what he really has is a kind of pyramidal vision architecture with a kind of hierarchical Bayes net superposed on top of it.
And what I feel is that that's perhaps a decent qualitative model of visual cortex, but I don't feel it accounts very well for language learning, motor control, mathematical theorem proving, dreaming, and all kinds of other aspects of cognition.
In his book he talks a lot about these other things.
I think his philosophy of the memory prediction framework is
fine so far as it goes, and you could probably map it into
my philosophy of mind as pattern
recognition and so forth.
When you look at the detailed stuff he's doing, it's pretty
much hierarchical pattern recognition, which in my view
doesn't carry you that far toward
making a thinking machine.
I think that's more along the lines of one particular
algorithm, which you can tune to do one particular thing,
like recognize patterns in streams of data.
You can overcome the combinatorial explosion problem
just by domain specialization, rather than by fundamentally
coming to grips with it.
AUDIENCE: You mentioned that it's scalable, and typically that holds only if you can run things in parallel on many machines without too much inter-communication between the parts.
What do you mean by scalable, in terms of memory, processes, [INAUDIBLE]?
How big does a system need to be in order to achieve this
with today's resources?
BEN GOERTZEL: Well, the honest answer is we don't know how
many machines will be needed to achieve a human baby level
intelligence or even chipmunk level intelligence.
We've done back of the envelope calculations, which
suggest that it's not millions or
billions of current machines.
You guys at this company I'm sure have more than enough
computational resources.
We're a small company with a server farm of a few dozen
machines, which I think is not going to be enough to make a
virtual Ben Goertzel.
But really there's a lot of research to be done to figure
out exactly how many machines are going to be needed to
achieve a given level of functionality.
But in terms of the overall architecture, what I mean by
scalable is it runs in a distributed network of
multi-processor Linux boxes, and you can add more machines
onto it, and the intelligence will increase gracefully
rather than the thing getting overwhelmed and crashing.
But there's certainly a layer of complexity there.
If you have a node on this machine and a node on that
machine, and the guy on this machine links to the guy on
that machine, then it's going to be a lot slower to get a
message across from here to there than if
they're on one machine.
So you have an additional annoying level of complexity
of trying to group nodes that cluster together in terms of
their internal links on the same machines.
We have code in place that handles that kind of thing.
It hasn't been stressed and tuned extensively.
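The grouping problem just described can be sketched as a toy graph-partitioning pass. This is a greedy heuristic invented here for illustration, not the actual placement code: assign each node to the machine that already holds most of its linked neighbors, subject to a capacity limit, so that tightly linked clusters land on the same box.

```python
def place_nodes(links, machines=2):
    """Greedily assign each node to the machine holding most of its
    already-placed neighbors, subject to a per-machine capacity, so that
    heavily linked clusters stay together and slow cross-machine messages
    are minimized."""
    nodes = sorted({n for link in links for n in link})
    cap = -(-len(nodes) // machines)  # ceil(len(nodes) / machines)
    assignment = {}
    for node in nodes:
        load = [0] * machines
        scores = [0] * machines
        for m in assignment.values():
            load[m] += 1  # current occupancy of each machine
        for a, b in links:
            other = b if a == node else (a if b == node else None)
            if other in assignment:
                scores[assignment[other]] += 1  # neighbors already placed
        open_machines = [m for m in range(machines) if load[m] < cap]
        assignment[node] = max(open_machines, key=lambda m: scores[m])
    return assignment

def cross_machine_links(links, assignment):
    """Count links whose endpoints landed on different machines."""
    return sum(1 for a, b in links if assignment[a] != assignment[b])

# Two tightly linked triangles joined by a single bridge link:
links = [("a", "b"), ("b", "c"), ("a", "c"),
         ("x", "y"), ("y", "z"), ("x", "z"), ("c", "x")]
placement = place_nodes(links)
assert cross_machine_links(links, placement) == 1  # only the bridge crosses
```

Each triangle ends up on its own machine, and only the one bridge link pays the cross-machine messaging cost.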
We'd sure be a lot happier to have one massive
supercomputer, or direct processor-to-processor
interconnect fabrics or something.
I don't think hardware is really the bottleneck in
getting the AGI though.
I think the bottleneck is getting all the algorithms
tuned and interoperating correctly.
Then the absolute worst that's going to happen is once we
have the thinking machine all designed and working, then
you've got to wait five years for hardware to catch up so
that you can run it within your budget.


AUDIENCE: Are you working with him currently?
BEN GOERTZEL: I am a good friend of Hugo's, and we've
talked a lot about working together.
We haven't yet quite gotten to the point of doing anything
practical together.
What we have discussed doing is running the MOSES Probabilistic Evolutionary Learning algorithm on FPGAs, according to a design of Hugo's.
He's done a bunch of stuff with basically genetic programming with neural networks, but it didn't really have to be neural networks: genetic programming using FPGAs to massively accelerate things.
We've tossed around a bunch of ideas for doing the same thing for the Probabilistic Evolutionary Learning algorithm.
We haven't actually done the work.
This is an area where some advanced hardware design would
be useful actually.
FPGAs can do processing really fast, but they don't have that much onboard RAM.
What we really need is an FPGA with a shitload of onboard RAM.
Then you can massively speed up the kind of evolutionary
learning we use, which would make it a lot cheaper to build
a super-thinking machine server farm.
But we can save that for when we become a Google scale AI
company and can buy hardware companies and subvert them to
our purposes.

AUDIENCE: You mentioned that there are different kinds of inputs for sensing things.
How does the system perceive time?
BEN GOERTZEL: Well, actually that is a technical detail I
did not go into at all.
Each thing that comes in perceptually has a time stamp
associated with it.
There's actually a special index on each server, which is
called the time server, which allows the various processes
to look up atoms by time interval.
So we actually made a separate time management system.
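The essence of such a time index is small enough to sketch. This is a toy Python version invented for illustration; the real time server presumably deals with distribution, concurrency, and much larger scales.

```python
import bisect

class TimeServer:
    """Toy time index: perceptual atoms stored sorted by timestamp, so that
    cognitive processes can look atoms up by time interval in O(log n)
    rather than scanning the whole knowledge base."""
    def __init__(self):
        self._times, self._atoms = [], []

    def add(self, timestamp, atom):
        # insert in timestamp order so interval queries stay a binary search
        i = bisect.bisect(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._atoms.insert(i, atom)

    def lookup(self, start, end):
        """Return all atoms with start <= timestamp <= end, oldest first."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_right(self._times, end)
        return self._atoms[lo:hi]

ts = TimeServer()
ts.add(5, "saw_ball")
ts.add(1, "heard_noise")
ts.add(3, "moved_forward")
assert ts.lookup(2, 5) == ["moved_forward", "saw_ball"]
```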
There's a bunch of stuff like that which I just haven't gone into here, but an awful lot of attention has gone into various kinds of indexes into the knowledge for efficient access, which is a pain in the ass but seems to be necessary to get things to work in real time.
That's one thing I would say about virtual embodiment: needing to control an agent in real time imposes a lot of discipline on you as an AI developer.
Things just have to happen fast, and most academic AI
approaches just aren't up to real time control and feasible
computational resources.


BEN GOERTZEL: It's a good question.

The question is what's the most interesting thing we've
seen the system do so far?
The thing that interested me most, I would say, is when training the system to play fetch and to play tag, just simple games.
The mistakes it makes, and the kind of pathological ways of playing fetch and tag that it can come up with, are quite interesting.
We've not tried to get the system to do
anything really advanced.
All we've done is kind of simple moving around, picking
stuff up in the simulation world.
So the proto AGI system has way less interesting behaviors
than our specialized bioinformatics systems or
something, which have discovered new things about
Parkinson's disease and Chronic Fatigue
Syndrome and so on.
But it's interesting in that if you take a really simple reinforcement learning system and try to teach it to play fetch, it doesn't make the same kinds of mistakes that this system does when you try to teach it to play fetch.
You can vary the kind of partial reward that you give it for playing fetch or tag correctly and see how it will misinterpret the partial reward, which pathologies it will get: going to pick up the thing that you threw, and kind of coming near you and teasing you but not quite giving it to you, and so forth.
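How a partial reward can produce exactly that teasing behavior is easy to see with made-up numbers. The reward values below are hypothetical, chosen only to show the incentive structure, and have nothing to do with the actual training setup.

```python
def episode_return(deliver, steps=20, near_reward=1.0, deliver_bonus=5.0):
    """Toy fetch episode with a shaped partial reward: each step spent near
    the owner while holding the ball earns `near_reward` (the shaping
    signal); actually handing the ball over earns `deliver_bonus` once and
    ends the episode. All values are hypothetical."""
    if deliver:
        return near_reward + deliver_bonus  # one step near, then deliver
    return near_reward * steps              # tease: hover and never deliver

# Under this shaping, teasing strictly beats delivering, which is the kind
# of pathological fetch behavior a reward-maximizing agent can discover:
assert episode_return(deliver=False) > episode_return(deliver=True)
```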
So you can see the inklings of a personality emerging there.
I'm excited about that in terms of rolling out the
system for controlling agents in Second Life.
Once we do that, which should happen around the end of this
year, then we'll have a whole bunch of people interacting
with it and playing with it, and it should learn a lot from
those interactions.

All right.
Well, thanks for your time.
It's been an interesting place to give this talk.