Pimp my Genome! The Mainstreaming of Digital Genetic Engineering


Uploaded by Google on 25.07.2007

Transcript:

JONAS KARLSSON: My name is Jonas Karlsson and I'm happy
to introduce Andrew Hessel here, who's going to
talk about genomics here at Google.
And this talk is video recorded for a wider audience,
which is going to be available on Google video later on.
Andrew Hessel has been a genomic scientist, and has
facilitated development and adoption of DNA technology for
more than 15 years.
His first focus was on sequencing.
Later he worked on bioinformatics, and since 2003 on the
practical writing of DNA code.
He's going to talk more about that, which is quite
interesting subject, making it available to lay people.
He's worked with leading edge genomics groups in industry and
academia, including Amgen Inc. in Thousand Oaks,
California; the University Health Network in Toronto,
Canada; and MIT in Cambridge, Massachusetts.
ANDREW HESSEL: Thank you.

It's a pleasure to be here.
I don't expect that many of you have a background in
genomics or bioinformatics, so you won't need that to
understand this talk today.
I haven't really worked in industry for a number of
years, or academia.
What I've tended to do for the last few years is go
around and tell stories about what's possible, and bring
people together and hook them up to do
some interesting work.
The last couple of years has been with MIT.
And most recently, I'm working with a group in
Alberta, called the Alberta Ingenuity Fund, which funds
long term science projects.
So it's an ideal opportunity to hook people together.
So this talk is called Pimp My Genome.
We are trying to appeal to a younger audience.
We're trying to actually show that we are getting to the
point where we can do some modifications.
So this talk will focus on the mainstreaming of what I call
digital genetic engineering.
Biology is the study of life.
For those of you who have no background in biology,
historically it has been classifying animals and
putting them into some sort of evolutionary framework.

Human beings, of course, are considered to be at
the top of the tree.
This is my friend, Elizabeth, and her daughter, Leah.
Large animals, like elephants, smaller animals, like cats,
many, many different species of plants, and as we start
getting smaller in the animal kingdom, insects, and beetles,
etc., we see a lot more diversity--
thousands or millions, many millions of different species.
Finally moving into bacteria, this is E. coli under an
electron microscope.
And finally viruses--
viruses aren't typically considered living organisms.
They are really just information packages that tend
to hijack various cellular systems. Lots of different
morphologies, you can see here--
We don't know how much life there is on this planet, in
terms of diversity.
The numbers vary between two million and 100 million.
And I think that may be off by a factor of ten or more.
We really know very little about microorganisms. In fact,
a lot of the projects going on in science today are
doing something called metagenomics, because most of
the microorganisms that exist on this
planet we can't grow.
There's a lot of projects now that will go out and actually
collect materials from the environment, lyse them,
take the DNA out, and actually try to
understand what organisms are there, just based
on their DNA code.
Much more computational analysis, this is being
applied in a lot of different areas.
Craig Venter, one of the sequencers, formerly of Celera
Genomics, one of the groups that was doing the human
genome sequencing a number of years ago, has been focusing
on this, taking his boat, Sorcerer II, out and taking
ocean water samples at various areas, learning a lot about
microbial diversity and viral diversity in the oceans.
People are doing this even in the human gut, in the guts of
various animals, like cows, etc.
So we're learning a lot about microbial diversity.

Moving basically to subcellular components,
there's a tremendous amount of information that's coming out
these days.
On the left is a bacterial cell.
It's basically been squashed.
And that's its linear DNA that's been pressed out of it.
And on the right, those are human metaphase
chromosomes, basically a very condensed form of DNA.
There are thousands, millions, actually, of different enzymes
that operate in the cell, all in real time.
This is alcohol dehydrogenase.
I'm sure you're familiar with it, if
you've ever had a drink.
It plugs into much larger pathways inside the cell.
In the upper corner here you'll see just the known
biochemical pathways.
Alcohol dehydrogenase fits in right here, takes alcohol and
shunts it into fatty acid metabolism, which is why you
get a beer gut.
Very complex information, all of this working in real time.
Most of biology isn't digital though.
This is typically how biologists work.
They write things in notebooks, very hard to get
the information out.
Science, one of the leading publications, is, again, a
print journal.
And we share a lot of information at conferences and
just in hallways.
This is not the most effective way to do science.
Over the last 15 years, though, digital biology has
come to the forefront, largely because the human genome
sequencing effort created vast amounts of genomic data.
And following from that, we started getting other -omics,
proteomics, the study of proteins, metabolomics, the
study of metabolism in cells, etc.
And we realized we're just getting
swamped by all the data.
We have to start creating some sort of digital framework to
make it understandable.
I'm going to focus mainly on genomics, which
is my area of expertise.

Going back, the structure of DNA is a fairly
recent discovery. It was worked out in 1953--
this is the famous picture of Watson and Crick in Cambridge.
DNA, itself, is a biochemical information storage molecule.
It's digital.
It's base 4, not binary.
And it has the unusual feature that it can replicate with
perfect fidelity, copying itself by unwinding and by
enzymatic processes that duplicate each
strand, using it as a template.

I've always looked at DNA as essentially machine language
for biochemical processes going on in cellular systems.
Genomes are very special programs that actually have
enough information to encode the processor, as well as the
machinery to duplicate the program and make sure it gets
installed on the new processors that are made.
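The base-4, machine-language view of DNA he describes can be sketched in a few lines of Python. The two-bit digit assignment here is an illustrative choice of my own; only the four-letter alphabet and the A-T / G-C complement rule come from biology.

```python
# Sketch: DNA as a base-4 digital code. The digit mapping is an
# arbitrary illustrative choice; the complement rule is biological.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}          # base-4 digits
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def to_digits(strand):
    """Map a DNA strand to its base-4 digit representation."""
    return [ENCODE[base] for base in strand]

def replicate(strand):
    """Template-directed copying: each strand determines its partner."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))

strand = "ATGCTTGA"
print(to_digits(strand))    # [0, 3, 2, 1, 3, 3, 2, 0]
print(replicate(strand))    # TCAAGCAT
```

Replicating the replica gives back the original strand, which is the sense in which the molecule carries enough information to copy itself.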
Not all genomic circuits, not all DNA circuits or
constructs, have to have this much information.
If there is an existing operating system in a cell, we
can just add a small bit of DNA that can just add a new
feature of metabolism.
But a genome, typically, is a self-sustaining program.
Genomes vary widely.
Small bacterial genomes have about 1.8 million
base pairs of DNA.
Humans have about three billion.
You'll notice that the number of genes doesn't
scale quite the same.
Escherichia coli has about 4,000 genes, actually.
A human being has somewhere between 25,000 and 30,000.
So bacteria are still very complex organisms. They're
just highly compressed.
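The compression he mentions can be made concrete by computing genes per megabase from the figures in the talk. The E. coli genome size (about 4.6 Mb) and the gene count for the small bacterium are my own assumed fill-ins for illustration; the rest are the speaker's numbers.

```python
# Gene density from the talk's approximate figures. The "small
# bacterium" gene count (~1,700) is an assumption for illustration;
# E. coli's ~4.6 Mb genome size is not stated in the talk.
genomes = {
    "small bacterium": (1.8e6, 1_700),
    "E. coli":         (4.6e6, 4_000),
    "human":           (3.0e9, 27_500),  # midpoint of 25,000-30,000
}
for name, (bases, genes) in genomes.items():
    print(f"{name:16s} {genes / (bases / 1e6):7.1f} genes per Mb")
```

The bacterial genomes come out nearly a hundredfold denser in genes than the human genome, which is the point being made.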

From a computer scientist point of view, DNA can really
be looked at as information on a hard disk--
zeros and ones, with very similar ideas going
on between the two.
Most of the processes that you will have on reading and
writing information to a hard disk, we can find similar
processes happening in biochemical
systems in the cell.

We didn't create cells, though.
We had to learn how to read the code.
And this has been an ongoing problem since DNA was
discovered.
We started to make progress in the 1970s.
This fellow is Fred Sanger, who developed a method for
reading DNA, depicted with this high tech tool here.
This was a really laborious process for
trying to read DNA code.
It was toxic.
It required radio isotopes.
It was slow.
In 1980, if you could do 500 base pairs of sequence a day,
you were doing really well.

A fellow by the name of Lloyd Smith ended up developing a
way to do this in an automated fashion, which was ultimately
commercialized in the late 80s by a company called Applied
Biosystems. And throughput started to increase.
This is about the time that the human
genome project was announced.
These machines weren't fast enough to do the human genome,
but they felt that the technology was on a roll and
would continue to grow quickly.
So by 1995, the throughput had gone up to
about 144,000 bases.
And in 1998, when Celera Genomics and the public groups
started to do the human genome sequence, working at
very high throughput, they were using a machine
called the ABI 3700, which was the real workhorse of the
human genome.
Today, almost ten years later, we have throughputs that
basically are approaching about a gigabase a day, very
fast, one sequencing run, extremely cheap.
So the ability to sequence DNA has grown very rapidly.
In fact, graphed out, the rate at which we can produce
sequence keeps rising, and the cost keeps falling,
essentially to the point where--
in another ten or 15 years, we should be able to
see your entire genome sequenced as a standard
therapeutic or diagnostic procedure.
That's assuming no dramatic leaps in technology.
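The growth rate implied by the numbers in the talk (roughly 500 bases/day in 1980, up to about a gigabase/day today) can be fitted as a simple exponential. The endpoints are the speaker's figures; the fit itself is my illustration, not data from the talk.

```python
# Illustrative fit of the throughput trend described in the talk:
# ~500 bases/day (1980) to ~1 Gb/day (2007).
import math

years = 2007 - 1980
growth = (1e9 / 500) ** (1 / years)            # annual growth factor
doubling_time = math.log(2) / math.log(growth)
print(f"annual growth x{growth:.2f}, doubling every {doubling_time:.1f} years")
```

That works out to a doubling time of well under two years, which is why he can say sequencing has outpaced Moore's law.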
However, the X Prize Foundation--
and I noticed you have that wonderful SpaceShipOne hanging
in your lobby--
announced late last year that they would offer a $10
million incentive prize for technology that would
allow basically a hundred genomes to be sequenced in ten
days, at a cost of about $1,000 per genome.
So they said the first prize was to go and venture into
outer space, now they want to go into inner space.
So we're going to see some dramatic developments in the
short term.
DNA is language.
So we've got reading pretty much on track.
Comprehension has changed a lot over the last 15 years.
This was my first bioinformatics machine,
essentially, and with small little shareware programs,
like DNA Strider, which was more than sufficient for
analyzing the amount of genetic sequence
I had to play with.
My first copy of GenBank came on floppies.
Now on the bottom is a picture of Blue Gene/L, one of the
most powerful supercomputers on the planet, which is now
often tasked to various bioinformatics processes,
beyond my league.
So biology is moving digitally.
This is just a representation of an enteric genome,
basically a small gut bacterium genome.
We can lay on lots of different data now.
Most of these types of genome visualizations and analyses
are pretty much fully automated today.
We can do comparative genomics, looking at how one
genome compares to another.
Most of these processes are done very quickly.
With the newer machines, we can sequence a bacterial
genome in basically one run and produce a map of its
metabolism in an afternoon.
So we've got about 500 or 600 bacterial genomes available
publicly now.
And more genomes are being added every day.

We can also look at other areas or
windows into the genome.
This is what's called comparative genomic
hybridization.
It's basically looking at changes in copy number in
breast cancer tissues.
This is the type of thing we sometimes see, where regions
of the genome have been selectively amplified because
some gene involved in the cancer process
is being upregulated
at a genomic level.
Genomes are not fixed entities.
They are highly plastic and dynamic.
And more groups are starting to realize this now and
publish data on this.
We can also look at things like gene
expression over time.
This type of heat map is taking a look at essentially
messenger RNAs, which are like working copies of the genome
going out into the cellular metabolism and being turned
into proteins, etc.
This is a really quick way of looking at what the cell is
doing at a given point in time.
And we can analyze basically every gene in the human genome
in one single experiment, or any other life form,
basically, at this point.
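Heat maps like the one he describes are typically built from log2 ratios of each gene's mRNA level against a reference sample. The gene names and expression values below are made up purely for illustration.

```python
# Sketch of how an expression heat map is computed: log2 ratio of
# each gene's mRNA level vs. a reference. Values are invented.
import math

reference = {"geneA": 100.0, "geneB": 50.0, "geneC": 200.0}
timepoints = [
    {"geneA": 100.0, "geneB": 100.0, "geneC": 50.0},   # t1
    {"geneA": 400.0, "geneB": 25.0,  "geneC": 200.0},  # t2
]
for i, sample in enumerate(timepoints, 1):
    ratios = {g: math.log2(sample[g] / reference[g]) for g in reference}
    print(f"t{i}:", {g: round(r, 1) for g, r in ratios.items()})
```

Positive values (colored one way on the heat map) mean the gene is upregulated relative to the reference; negative values mean it is downregulated.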
AUDIENCE: What's the definition of a gene?
I mean, I understand it's [UNINTELLIGIBLE], but--
ANDREW HESSEL: You can basically look at it as a
functional biochemical unit.
The old terminology was one gene produced one
protein in the cell.
But there's a lot of regulatory genes, as well, and
other genes that may only act at the level of controlling
some other process.
And it doesn't necessarily have to
go always to a protein.
But you can just look at it as a functional unit.

Today digital biology has been growing so quickly,
pulling in a lot of different areas: metabolism,
proteins, reaching out into the literature as
well, digital literature, and protein-protein
interactions, etc.
It's a complicated web of data now that has been
growing very quickly.
In fact, we're starting to run into the problem that we keep
drilling down onto finer and finer levels into cellular
processes, and we run up against complexity.
We just don't really understand what's going on.
And it takes a tremendous amount of processing power.
The more we look, the more complexity we see.
Today just storing and accessing some of the
biological data sets is becoming a real challenge
because there's no real commercial model to support
the growth of the server systems and the data systems.
Most of it is publicly supported.
And of course, there's our limited, finite comprehension.
We simply can't wrap our brains around biology.
Most of the people working today will drill down into
very fine specializations.
So we're getting a lot of functional barriers between
different groups working.
This has led to the growth of something called systems
biology, which is really applying algorithms to doing
things like determining gene function, doing diagnostics
and being able to make predictive models, and
visualizations, etc.
This is Lee Hood.
He is one of the leaders in systems biology work.
He has an institute in Seattle called the Institute for
Systems Biology.
And he's struggling in some ways, because they can't even
agree on a definition of systems biology today.
In many ways, it's just biology in the digital age.
And it's going to become increasingly complicated.

Why is it complicated?
Well, it's because we can't always extract the information
of why things are.
This is a representation of a structure in a cell called a
nuclear pore.
It's basically a doorway between the inside of the
nucleus, where DNA is kept in a eukaryotic cell, and the
cellular metabolism.
As you can see in this cartoon, there's all sorts of
little parts and pieces in this mechanism.
I have no idea how this thing works.

If I needed a pore, or some sort of channel in a synthetic
cell I was building, I wouldn't necessarily recreate
this because I don't know what it can or cannot do.
This has been the work of hundreds of researchers to
determine this structure.
This is an evolved electronic circuit.
Most electrical engineers wouldn't be able to figure it
out either.
It was evolved in an environment that basically
selected for a certain output.
This will take the square root of an input voltage.
But we can't determine how or why it works very easily.
It's not designed for human comprehension.
It's been selected for a function, possibly the same
way that the nuclear pore has been selected for a function.
And we may never be able to figure it out completely.
So evolution--
a lot of the reasons and, basically, the documentation
behind why something is working a certain way is lost
through the evolutionary process.

So if we're going to do genetic engineering, that
means we have to be able to write code.
I put engineering in quotes because it hasn't really been
engineering up to now.
It's largely been one-off art forms. And this is where we
have to start changing the way we think as biologists.
If we can't build it, we really don't understand it.

Genetic engineering dates back to the early 1970s.
The first manipulations of the DNA molecule were essentially
done in 1972 and '73.
They were pretty simple, but they sure set off a chain of a
lot of speculation as to what would be possible.
The main thing that was done by doing this work was that
the species barrier was dropped.
We could take DNA from a plant, or a marine organism,
and splice it in with humans or any other animal.
This simply wasn't possible before with breeding.
Even then it took a number of years, past 1972, for this to
really start reaching mainstream consciousness.
This is 1977 before it really started making the
front page of Time.
And then the biotech boom, starting in 1981, the first
biotech boom, the wave of companies, many of which still
exist today, capitalizing on some of these new techniques--
DNA and electronics have, in some ways,
a very shared history.
DNA was discovered about five years after the transistor,
and it has really the same potential for large
influences on society, and in some ways, almost
very similar industrial growth curves.
The companies that grew out of the 1970s and 1980s shared
very similar economic curves in their growth, but have
recently flattened out.
Even looking at labs, a level four containment laboratory,
where you don't want people getting infected by the
materials they're working with, looks pretty similar to a
chip lab, where you don't want the materials that are being
manipulated infected by humans.

The first generation DNA manipulations were largely
done using what's called splicing, gene splicing.
This is using molecular scissors to cut DNA at a
certain point, as depicted in this wonderful little cartoon,
and put sticky ends together.
It's basically like doing ransom note type work.
I'll show a little graphic on that.
This is a typical molecular biology lab today--
liquid handling devices, all sorts of chemicals and
reagents, and a lot of them homemade, a more sophisticated
unit with robotic liquid handling, being able to do
some high throughput work, and people literally working at
lab benches.
It's a lot like witchcraft, because they simply cannot see
what's going on.
They have to trust that the reactions are proceeding in a
certain way and that things are happening the way they
think they should.
It's all very indirect.
After about 10 years working in biotech, I thought there
has to be a better way.
I took a year off and went to this beach in Thailand and
just started thinking, processing the last few years
of my life, and going, where can I take this to a new
level, where we can actually start speeding things up?
The biotech industry just seemed to be
getting slower and slower--
ten years to make a drug, over a billion dollars to get
something out.
It just seemed like it wasn't going to be able to respond to
the needs of humanity over time.
So I started to think a lot about genomic programming.
Cutting and splicing, applied to text, would give you
something like this.
I sat down with last month's Wired magazine and tried to
write this simple sentence.
It took me an hour and 19 minutes.
It took me about 45 seconds to type it on my word processor,
have it spell checked, printed, and uploaded to my
blog for the world.
So the technology that we're using today for most of our
genetic engineering is obsolete.
And this technology is in use today worldwide.
It's still considered the gold standard.
At very best, it's like typesetting.
The words have to exist. Yes, if you put it all together,
you can print a book.
But it's not very dynamic or flexible.
So we've had this tremendous advance in sequencing going
from physical DNA to digital DNA over the last decade.

The reverse process seemed obvious.
We need synthesis.
We need to be able to go from some sort of digital sequence
that we may devise and go back to physical DNA so we can
upload it into the cell and test it.
When I started talking with people about this, they just
couldn't seem to get it.
No one wanted to work building synthesizers.
No one wanted to put the effort in.
And I couldn't really understand why, so I just had
to let it drop for a while.
We've had the technology for doing DNA
synthesis for over 20 years.
It's a very simple process.
It's cyclical.
Basically DNA is built on a solid glass matrix, a new base
is added, any reaction that doesn't go to completion is
capped, the next base is added.
And it goes on in a cyclical process.
The problem is errors accumulate, so we can't make
very long chains of DNA.
After about 100 bases, with a one percent
error rate per base, you end up having an error
somewhere in that sequence.
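The arithmetic behind that claim is worth spelling out: with a per-base error rate p, the chance that an n-base strand contains at least one error is 1 - (1 - p)^n.

```python
# Probability that a synthesized strand of n bases contains at least
# one error, given a per-base error rate p (the talk's ~1%).
def p_any_error(n, p=0.01):
    return 1 - (1 - p) ** n

print(f"{p_any_error(100):.0%}")   # ~63% of 100-mers carry an error
```

So at 100 bases, a strand is already more likely than not to be defective, which is why long constructs must be assembled from short, error-checked fragments.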
These are different types of DNA synthesizers.

Are you going to be able to get that up?

So these are different DNA synthesizers that are
available from various manufacturers.
In the upper left hand corner is a Beckman 8-channel device.
Up in the upper right are machines called MerMades.
They go up to 384 channels.
These were used to do a lot of the sequencing work in the
public human genome project.
Newer technologies use the same chemistries, but just on
smaller scales, so the reagent costs are a lot cheaper.
This is a microreactor, basically working on a lab on
a chip system.
This is a digital light processor chip from Texas
Instruments, which has about 780,000 elements, meaning you
can make 780,000 different strands of DNA on a solid
glass matrix in one single experiment.
And newer technologies have basically no moving parts, are
just ultraviolet LEDs with capillary tubes.

I pulled this up on eBay last night.
You can go and buy this equipment basically for next
to nothing. $89, pick up your own DNA synthesizer, take it
home, play around in your basement.

This slide just shows how they put these
smaller fragments together.
This is where all the work is.
We can make DNA for essentially nothing today.
But the assembly process, taking these smaller
pieces and putting them together into larger
constructs so that we can actually make something
useful, is where the technical challenge remains today.
There's been some very large steps forward in this in the
last few years.
But here, they're assembling a gene that glows green if it
assembles properly.
It's only 714 base pairs.
But trying to get to something that's
large enough to encode--
[INAUDIBLE].

You can buy DNA synthesizers for nothing now.
They're basically just big door stops.
As you can see, the prices vary.
We're getting down to around $0.69 a base pair.
That's for basically any size DNA you want.
Small viral genomes will have about 5,000 base pairs, large
viral genomes about 200,000, and a minimal genome,
somewhere on the order of 300,000.
So we're getting to the point where we can do some pretty
interesting genomic programming,
having mail order compiling.
You send in your DNA sequence, and they send the
physical DNA back to you.
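At the quoted price of about $0.69 per base pair, the cost of mail-order synthesis for the genome sizes mentioned in the talk is straightforward to tabulate.

```python
# Mail-order synthesis cost at the quoted ~$0.69 per base pair,
# applied to the genome sizes mentioned in the talk.
PRICE_PER_BP = 0.69
targets = {
    "small viral genome": 5_000,
    "large viral genome": 200_000,
    "minimal genome":     300_000,
}
for name, bp in targets.items():
    print(f"{name:20s} ${bp * PRICE_PER_BP:>10,.0f}")
```

A small viral genome comes out around $3,450, and even a minimal bacterial genome is only on the order of $200,000, which is what makes genome-scale programming plausible.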

Graphed out, Moore's law is the red line, DNA sequencing
is the blue line, synthesis is the green line.
Our capacity for synthesizing is growing rapidly and will
probably continue to do so for some time.
There are groups around the world that do
DNA synthesis now.
And essentially, if you have a credit card and can type, you
can get DNA.
So while I was sitting on that beach in Thailand, I put
together another idea, which was well, if you can
synthesize DNA and you can essentially program it, like
you were programming a computer, you can decide how
you're going to program it.
You can do it proprietary and keep all the code for yourself
and try and figure out all the biology.
Or you might want to try doing open source biology.
And so I started writing about this.

And then that introduced me to people that were also thinking
along these lines.
I came across these two guys.
On the left is Tom Knight, at MIT, an electrical engineer.
And on the right is Drew Endy, one of his colleagues,
originally trained, I believe, in civil engineering.
They both focus now on engineering biology.
And they really supported open source biological programming.
In fact, they were pretty far along when I found them.
So the engineering process is basically have some success,
refine it, continue to refine it, so you have more success,
and so on and so on.
And the complexity tends to increase.
And when you build something, you want it to do what you've
designed it to do.
It's stopping the evolutionary process, in some ways, at
certain points.
It's been applied to electronics, of course software,
aeronautics, structures, materials, automotive, but not
to biological engineering.
So these guys started engineering biology.
They took DNA code and essentially put it in a
wrapper, gave it a part number, gave it a functional
name, this one being a terminator.
It stops RNA Polymerase at a certain point so that it says
basically OK, end here.

Then they went out and measured as many parameters of
this part as they could, and started to build
documentation.
This allowed the part to be used with other parts in
assemblies to build systems, for example a sender device
that actually puts a molecule out
across a bacterial membrane.
Put all this together, and we can start having some sort of
a framework for engineering biology, with parts, devices,
and ultimately systems that, fully assembled, could even go
on to make full cells.

Mediated between the DNA code and parts is really synthesis.
Everything else is just assembly processes of the
parts and standardized data that can be machine readable,
shared between the different levels.
You want to design systems, you don't need to know the DNA
code, you just have to be able to pull the
parts off the shelf.
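The parts, devices, and systems abstraction he describes can be sketched as simple data structures: standardized, documented parts that compose into devices without the designer touching raw DNA. The part numbers and sequences below are invented for illustration, not real registry entries.

```python
# Sketch of the parts -> devices -> systems abstraction. Part IDs
# and sequences are made up; this is not the real parts registry.
from dataclasses import dataclass, field

@dataclass
class Part:
    part_id: str          # catalog number, like a registry entry
    function: str         # e.g. "promoter", "terminator"
    sequence: str         # the underlying DNA code
    measurements: dict = field(default_factory=dict)  # documentation

@dataclass
class Device:
    name: str
    parts: list

    def compile(self):
        """Assembly: concatenate the parts' DNA in order."""
        return "".join(p.sequence for p in self.parts)

promoter   = Part("P0001", "promoter",   "TTGACA")
gene       = Part("G0001", "coding",     "ATGAAA")
terminator = Part("T0001", "terminator", "TAATAA")
sender = Device("sender", [promoter, gene, terminator])
print(sender.compile())   # TTGACAATGAAATAATAA
```

The point of the abstraction is exactly what the talk says: the system designer works at the level of part IDs and documented functions, and only the synthesis step ever needs the concatenated DNA.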

So they built the first catalog.
They had some funds for DNA synthesis.
They put together a catalog of various parts, show that it's
working, made them available, open source, and started
putting them out to the world, documenting as many functions
as they could measure.
This is their best part.
It has the most measurements.
Most of the parts don't have that many measurements.
Test and measurement in biology
remains a big challenge.
They also created something called iGEM, the international
Genetically Engineered Machines competition.
It's an open source program that essentially tracks people
that want to play with these parts and use them and see
what they can build with them.
It's like a hobby kit for electronics.
So it shares everything, parts, codes, protocols,
experience, publications--
only one rule, share back.
It's grown quickly.
I've worked with iGEM since about 2005.
I'm always impressed by the types of kids that we get out
to it, and how much effort they'll put in.
In the first year, 2003, it was just in-house at MIT,
as an Independent Activities Period project.
They opened it up in 2004, with some
five friendly schools.
In 2005, there were 13 schools.
Last year, we had 37 schools from 15 countries.
And it just keeps growing around the world.
We just closed registration for this year--
57 teams, 20 countries.
It's really international now.
I was really surprised by the additions of Russia, and a
number of teams from China.
So we're growing a community of people that can actually do
this work, engineering biology, and grow the number
of parts very quickly.
And it's getting a lot of attention internationally,
which continues to reinforce the growth of this program.
Some of the projects that have come out of this in the last
few years:
light sensitive bacteria, essentially making bacteria
so that you shine a light on them and they end up
producing a color pigment.
It makes a bacterial film.
So here are just a few of the images they made:
Hello World, of course; a tip of the hat to Darwin; and one
of their supervisors, Andy Ellington, from the
University of Texas.
AUDIENCE: So, in that example, did they splice it, or did
they splice in something [INAUDIBLE]?
ANDREW HESSEL: Yeah, they built a circuit that basically
was light sensing and pigment producing, and put it into a
bacterial chassis background.
AUDIENCE: [INAUDIBLE]?
ANDREW HESSEL: Not yet.
Not yet.
AUDIENCE: Can you repeat that question for the video.
ANDREW HESSEL: Oh certainly.
The question was whether this was a complete genome, or
basically just a couple of functions added
to a bacterial chassis.
And indeed it is not a complete
genome that's been assembled.
It's just a new circuit added to the bacteria.
This was a group from Scotland last year that created a
circuit that is essentially a biosensor for the detection
of arsenic in groundwater, which is a big problem in
Bangladesh.
They're actually trying to commercialize this unit.
It's extremely sensitive, better than the chemical tests
that are available.
It costs about $0.50 a test. If there's arsenic in the
water, it ends up producing an acid, and a color indicator
turns red, which means you shouldn't drink the water.
A group at MIT decided that their lab is
a little too smelly.
Bacteria have an odd odor.
So they knocked out that odor, first of all.
And then they made it so that the bacteria would smell like
wintergreen or bananas, depending on its growth,
whether it was in log phase, and actively growing, or
whether it was in stationary phase and had
used up all its sugar.
And here, they're demonstrating the smells.
This actually has some really practical applications,
because a lot of reporter systems require
instrumentation to measure, but our nose is extremely
sensitive at parts per million.
This is what everyone competes for in iGEM, the iGEM cup.
Really that's about it-- recognition, having the chance
to share their stories at MIT, and a big block of aluminum.
Schools around the world are taking notice.
Almost every school that gets an iGEM team
starts creating a program.
Berkeley and MIT were some of the first. Now there's schools
all over the world that are building undergraduate
programs and considering departments.
This is really interesting.
Biology is very slow to move, but some of these investments
have been fantastic.
This fellow, Jay Keasling, received about $42 million
from the Bill and Melinda Gates Foundation for
doing some work on an antimalarial drug.
Berkeley also got a large biofuels project recently.
As people move beyond the iGEM program
and these parts, they can graduate into something called
OpenWetWare, which is modeled along the lines of MIT's
OpenCourseWare.

Looking forward--
well, according to economists, this technology is just going
to continue to outpace and replace existing recombinant
DNA techniques and really accelerate the process.
I don't know if we're ready for it.
There's a lot of--
AUDIENCE: Can you explain the difference?
ANDREW HESSEL: The recombinant DNA process is you're actually
working with physical DNA, doing the editing.
It's like the ransom note model.
With synthetic DNA processes, you're basically just typing a
sequence in an electronic text file, and sending it directly
for synthesis--
much faster.
So there's a lot of forces driving this along.
Low cost DNA synthesis and global dissemination of these
technologies is amongst them.
But there's a lot of risks as well.

The ability to synthesize DNA really opens up the ability to
create things like advanced bioweapons, being able to
generate viruses that would normally be under strict
government control, because most of the genome sequences of
things like Ebola, smallpox, anthrax, etc., are already in
the public domain.
So there are some concerns moving forward with this,
rightly so.

How the government responds to it and how people respond to
these technologies over the next few years will really
change the way they're adopted.

If they move into a military domain and people are highly
regulated, it could go underground.
It could be that we just never solve some of the problems and
the rate of development just ends up moving very slowly.
This has been the case in some of the engineering that has
gone on in the past. Or it could just open up wide and we
can have modular life, people working in their garages,
designing cures for cancer using software.
I'm hoping that this is the way we move, because this is
the way computers have moved over the last 25 years.
And I think that the changes we've seen in the electronic
world are fantastic.
But we do know there's a risk.
A fellow by the name of Wimmer synthesized the first virus back in 2002, and it was a poliovirus.
And that concerned a lot of people.
Today we're moving towards creating a minimal bacterium.
So this is getting to your question.
No one has yet booted up a synthetic,
fully independent cell.
But we expect it very soon.
We're starting to get to the point of synthesizing genomes
large enough to support a minimal bacterium.
That's probably a Nobel Prize.
That's closing a cycle of evolution that's billions of
years old, from primordial oceans to really being able to
build a bacterium.
This could very well lead to a next generation biotech
industry that is much more responsive
because they can take--
for example, cancer is always a failure of your genome at
some point in some cells to respond properly.
And so the shortest route to correcting it is to work at
the level of the genome, of DNA.
This type of technology would allow us to take diagnostic
data, put it through a therapeutic engine, and
potentially craft a customized solution just for you, using
algorithms. We're already seeing next generation biotech
companies appearing, focusing on things like engineering
organisms for biofuels and therapeutic molecules.
There's probably a dozen of these companies globally now.
And I see more and more business plans for companies
like it, because you don't need the overhead.
You don't need the labs.
Drew Endy has been going around and talking with people about building what he calls BIOFABs, essentially facilities for producing high quality parts, along the same lines as the iGEM program, but very well designed and well documented, so they can be used for much more serious research or possibly therapeutics.
And there's a number of different groups around the
world that are looking at putting together BIOFABs,
sharing information and sharing parts.
There will be proprietary ones as well.
We're also seeing very new ways of going out and raising
support for this type of work, which is considered very
leading edge.
This fellow Aubrey de Grey created his own foundation and has raised millions of dollars towards doing some cellular repair type engineering. A very interesting man.
But no one would give him grants, so he just built,
essentially, a granting agency.
This is a simulation of a very simple virus, T7
bacteriophage.
It's a virus that only infects bacteria.
Drew Endy gave me this simulation.
It was done by a couple of people in his lab,
[UNINTELLIGIBLE]
and Jason Kelly.
It's really fast. It's dynamic.
It's basically showing the amount of messenger RNAs,
which correlates to proteins in this virus when it infects
a bacterial cell.
But there's a problem.
It really doesn't correlate to the real world data.
This is one of the simplest organisms
that we can work with.
And we still can't really tie the models into
real world data yet.
We're still learning how to do things.
This is a graphic of just a cellular membrane.
It's an agent-based model done by this group in Alberta.
And they keep adding to the membrane and it starts doing these interesting--

getting all these features on the surface, but
they don't know why.
Even though this is a very simple model of a cell, we still don't understand the processes that are going on.
Eventually the model tears into the cell and takes a look
at various channels that had been made internally.
But this is about as simple as it gets.
We still don't understand it.

Quite possibly it will take thousands, hundreds of
thousands of people contributing their expertise
into some sort of global model for us to really start getting
our heads wrapped around this.
I love the idea of Google Earth and people being able to use SketchUp and build their favorite buildings, etc.
Maybe one day we need something like that with
various molecules operating in biochemical systems. At the
level of the bacteria, it probably wouldn't be too
burdensome.
Overall, the area that I look for, and what keeps me very
interested in this--
this is from 25 years ago today, May 3rd, 1982.
It's the kids that are going to really take this technology
and keep pushing it forward.
They're attracted to genomic programming as they were to
programming PCs 25 years ago.
And I just really hope that we can provide this technology
for them as early as possible in the educational system, so
that they can become acquainted with it
and play with it.
25 years ago you could get these comics from Radio Shack that basically put some of the basics of computing in an easy to read form.
The group at MIT has done the same thing.
This comic describes a lot of the process they're doing.
It's been translated into a dozen different languages now.
They are working on issue two.
But this is what really, I think, exemplifies where we
need to be going.
This is a kit I bought at a science center last year.
It's a very powerful DNA extraction kit.
It comes with all these little exercises, freeze dried bacteria, plasmid DNA for making what's called green fluorescent protein, and all the instructions for getting that DNA into the cell.
And it even comes with a little LED key-chain, so you
can check to see if your cells have taken up the DNA and are
glowing under UV light.
Wonderful little kit, $24.95, ages eight and up.
I think that's where we need to be taking these
technologies and moving them forward for the next 25 years.
And that's basically it.
I'd like to thank Drew Endy, for a lot of the slides and
graphics and for his work with the iGEM program; the iGEM
program and all the people that I work with; Bio Era, for some of the economic data that I used; CyberCell, for some of the models; Jonas, for the invitation, thank you; and
Alberta Ingenuity, for supporting my going around and
talking to people about this work.
Thank you.

JONAS KARLSSON: Thank you very much, Andrew.
And I just want to mention that Aubrey is going to be visiting here at the end of the month, kudos to you, who managed to connect us.
So he will be here giving another interesting talk.
And I leave now the floor open to questions and discussions.
Please repeat the questions now.

AUDIENCE: You spent a lot of time in your talk talking
about creating a virus or creating a cell.
It seems to me that there is a step here that you missed, from the digital part, which is, in your analogy, the program, to the cell, which is the computer that operates on the program to generate a cell. You can't do this completely digitally, because you're going to end up with just a program and nothing to execute it.
So there's some closure, some transitive closure, that's missing here.
ANDREW HESSEL: Right, so the question is basically how do
we go from a digital program to a fully
operating cellular system.
That's an open question at this point.
Groups are taking a variety of approaches.
One is we can take bacterial cells and essentially destroy
the endogenous DNA in that cell, creating kind of an egg.
So it's a bag of chemicals that is non-metabolizing, but if we add a genome to it, it can start to run that program.
So that's one route.
Other groups are taking the approach, well, that's good,
but not good enough.
And they are trying to create what are called protocells,
essentially fully defined, artificial cells that they
could put a defined synthetic genome
in and have it operate--
or not always a genome, sometimes just enough genetic
material to run a metabolic reaction over a period of
time, so almost like a disposable cell.
So those two approaches are under way by a number of
different groups.
No one has successfully made a protocell at this point or
booted up a bacterium.
But we expect it very soon.

AUDIENCE: So, [INAUDIBLE]

So basically, right now, what you want is something to be done, like [UNINTELLIGIBLE] biology.
And so how can the computer industry, or computers, help you in that phase?
What is required by you?
ANDREW HESSEL: One of the things that
we're missing today--
I showed you that first catalog of parts.
That is a database that isn't very open.
It's not easy to share.
It's hard to put data in and extract it out.
We don't have a simple editor yet, where we can just take
apart what someone's defined and literally do drag and drop
editing of different modules to create a synthetic
construct or circuit.
I don't think that's a major programming effort, but a lot
of biology works in very small groups,
sometimes just a few people.
And trying to coordinate, trying to get someone tasked
full time to even a small project, can be very difficult
for a lot of labs.
Most research labs operate on a few hundred thousand dollars
per year, total.

I would really like, if anyone is interested in doing some
sort of software development for synthetic biology-- not
going out and trying to aggregate and pull together
and make sense out of the masses of biological data in
the world, the systems biology effort, which, again, I think
in some ways, gets bogged down after a while.
But if someone wants to start doing some sort of programming
to facilitate the design of genomes and the collection of
parts, I'd certainly like to put them together with some
folks at MIT that have a lot of experience with this.
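As a sketch of what the core of such a drag-and-drop editor might look like, the construct below is just an ordered list of named parts that can be rearranged in software, never touching physical DNA. The class, part names, and sequences are all hypothetical, invented for illustration.

```python
# A minimal sketch of the drag-and-drop construct editor described
# above: a construct is an ordered list of named parts that can be
# rearranged in software. Class and part names are hypothetical.

class Construct:
    def __init__(self):
        self.parts = []                    # ordered (name, sequence) pairs

    def add(self, name, seq, at=None):
        """Drop a part at the end, or at a given position."""
        if at is None:
            self.parts.append((name, seq))
        else:
            self.parts.insert(at, (name, seq))

    def remove(self, name):
        """Drag a part out of the construct."""
        self.parts = [p for p in self.parts if p[0] != name]

    def sequence(self):
        """Flatten the design into the DNA string to synthesize."""
        return "".join(seq for _, seq in self.parts)

c = Construct()
c.add("promoter", "TTGACA")
c.add("terminator", "TTATT")
c.add("gene", "ATGAAA", at=1)   # drop the gene between the other two
```

The point of the sketch is that swapping modules is a list operation, which is exactly the kind of small, well-scoped programming effort being asked for here.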
AUDIENCE: I had another question for you.
Now we've been successful using recombinant DNA
technology for cloning different forms of--
it could be lamb, sheep, and so on-- so what is so
complicated about the bacterium that we are not able
to generate, or evolve, a complete bacterium right from scratch?
What constitutes the level of difficulty?
ANDREW HESSEL: Well, the cloning experiments are
actually a little different.
Cloning really doesn't do any re-engineering of the genome.
It's just taking DNA from one cell, and getting it to work
in an egg, to regrow the whole organism.
So you're not actually doing genetic engineering on the
genomic material itself.
But, for example, some companies manufacture
therapeutic proteins by taking the gene for that protein and
putting it into a different cell, so that they can grow it
in a bioreactor.
That type of work is more traditional genetic
engineering.
It's extremely slow and difficult, because, again,
they're working with the physical DNA molecule, and
having to use these molecular scissors to cut and paste it together.
And it's a very slow process.
Today, we can actually do that just by using a computer
program, output the DNA, and do this far faster, test a
much more diverse space in the code, to make this work.
And different groups are doing this now.
It is actually cheaper to synthesize most genes than it is to clone them now.
AUDIENCE: So using the synthesis of genes, as against cloning, have you been able to generate something similar?
As in, let's say, in the past, we've seen that recombinant DNA, even though it's a very slow technology, has been able to generate results.
So what form of results have you been able to generate using the newer form?
As in, do you have libraries of different forms of organisms?
ANDREW HESSEL: Yes.
Even just in the genetic parts that I mentioned, we have over
1,000 genetic parts in the collection now defined--
so things like promoters for turning genes on, various
coding regions for various proteins, terminators to turn
genes off, various switches and control devices.
We're actually building up a very large library of parts
that are very modular and can interchange.
All of the work that's done with recombinant DNA
technology, we can replicate very quickly.
But more than that, if we know that, for example, that we
have this three dimensional structure of an enzyme, and we
know where the active site of an enzyme is, where the actual
chemical reactions are occurring, we can program the synthesizers to change the nucleotides in certain regions, ultimately leading to diversity in the amino acids in those regions.
And then we can take all of those variants and put them
through high throughput screens, and ultimately,
select for variants that have different activities.
So we're getting very good at single protein engineering,
using synthetic technologies.
Now the question is whether we can do something more with that: build more complex systems with fail-safes, checksums, balances, etc., and ultimately synthetic organisms.
By the way, viruses are a little bit different than the
synthetic bacteria because you don't actually need the egg,
so to speak.
You just literally take the DNA or the nucleic material
and infect a cell with it.
And it will produce virus particles.
All you have to do is lyse the cell and filter them out.
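The single-protein engineering described a moment ago can be sketched as a small enumeration: try alternative codons at the positions covering an enzyme's active site, producing a pool of gene variants for high-throughput screening. The gene, positions, and codon menu below are toy values, not real data.

```python
# Sketch of the variant-library idea: enumerate alternative codons at
# the positions covering an enzyme's active site. The gene, positions,
# and codon menu are invented for illustration.

from itertools import product

gene = "ATGGCTGAATCTAAA"                      # toy 5-codon gene
active_positions = [2, 3]                     # 0-based codons to vary
codon_menu = ["GCT", "TCT", "AAA", "GAA"]     # allowed replacements

codons = [gene[i:i + 3] for i in range(0, len(gene), 3)]

variants = []
for combo in product(codon_menu, repeat=len(active_positions)):
    v = codons[:]                             # copy the wild-type codons
    for pos, codon in zip(active_positions, combo):
        v[pos] = codon                        # swap in the alternative
    variants.append("".join(v))

# 4 choices at 2 positions gives 4**2 = 16 variants to screen.
```

In practice the screen, not the enumeration, is the hard part: each variant has to be synthesized, expressed, and assayed for the activity you are selecting for.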
AUDIENCE: So in computer languages, we have control statements, which will just [UNINTELLIGIBLE].
The control flow will go to this functional unit, or that functional unit.
So I just wonder whether, in this synthetic world, there are [UNINTELLIGIBLE] such kinds of counterparts, and how--
So, it looks to me that you cannot just assemble all the
functional units together and expect the cell to do exactly
the same thing, because you need to also have some logic
and to direct how the virus functions are going forward,
and how they are coordinating with each other.
So what's that kind of logic in the synthetic world?
ANDREW HESSEL: We're going to have to build it.
This is one of the challenges now.
The circuits that we're building would be toys if you were building them with electronics.
And yet this is leading edge genetic engineering.
Trying to put together complex circuits and measure how all
of those circuits are working is one of the challenges that
we all face.
We don't know how to do it.
So we're just starting at the ground up, building the
simplest circuits, and trying to run them, and measure them,
and test them, and have them work reliably.
And hopefully we can build complexity step by step over
time and keep control.

AUDIENCE: So when you say synthesis of DNA, if you are given a code for DNA, how accurate can that synthesis be made?
If there is even a single point, 1% off, [UNINTELLIGIBLE], let's say some modification, and then it ends up in a real living being, it could be some sort of disaster, also.
ANDREW HESSEL: The gene synthesis companies that work
today actually make multiple versions of the DNA.
And they end up having to sequence, to verify that the
assemblies have worked properly.
So it's not as accurate as a compiler.
All the biological processes are fuzzy.
There's been some techniques that were developed over the
last few years where they actually use the DNA repair
systems that are present, even in very simple cells, that
look for mismatches of DNA to remove some of the
inappropriately assembled constructs.

So we're getting better at assembly, direct from a text
file to the final output.
But today, the gold standard is we still have to sequence
the result to make sure that it's
accurate to the base pair.
Of course once you actually load that into the cell and
have it operating, there is no guarantee that it's going to
stay stable, because the code can still mutate and change.
So these are problems.
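The gold-standard verification step described above can be sketched in miniature: compare a sequenced clone against the designed sequence and report where they differ. Real pipelines align reads rather than assuming equal length; the sequences here are invented for illustration.

```python
# Toy version of design-vs-sequence verification. Real pipelines
# align reads; this assumes equal lengths and only finds point errors.

def mismatches(designed, sequenced):
    """Return 0-based positions where the clone differs from the design."""
    if len(designed) != len(sequenced):
        raise ValueError("length differs: an indel, not a point error")
    return [i for i, (d, s) in enumerate(zip(designed, sequenced)) if d != s]

designed = "ATGGCTGAATCTAAA"
clone_a  = "ATGGCTGAATCTAAA"   # perfect assembly
clone_b  = "ATGGCTGATTCTAAA"   # one point error

print(mismatches(designed, clone_a))  # -> []
print(mismatches(designed, clone_b))  # -> [8]
```

A clone with an empty mismatch list is accurate to the base pair; anything else goes back for reassembly or repair.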
JONAS KARLSSON: I guess what was interesting--
the programming language analogy is actually quite
interesting, I think, because I think a little bit more of
data flow when it comes to programming, more functional.
We have filter functions and a flow of data, where all the
processes continuously exist. This is closer to how electrical circuits are designed, and so on.
But there is no circuit which is an if; it's more a matter of something being on and off.
And I think that kind of is an interesting analogy, which
might work very well with this, in my humble opinion.
Are there any more questions here?

AUDIENCE: So the actual process is still that you make a bunch of material, put it together with a bunch of stuff, and then you look for a percentage, or somehow cause it to reproduce or something, to look at the results.
So it's not like you take one of something and inject it into one of something else--
ANDREW HESSEL: It would be great to have
that type of control.
We don't yet.
When we do the chemical synthesis of DNA, we're working in very small volumes today, picomoles and picoliters.
But we're still making thousands of strands of
DNA at once, so--

Ultimately, if this process is done right, only one genome
goes into the cell at a time, or one circuit into
the cell at a time.
And then we screen the cells that have taken up the DNA to
make sure that it has the function that we expect.
So we use a functional selection at the end of it.
And DNA does tend to be fairly stable for a number of
generations, once it's in the cell.
So as long as the cell isn't being too taxed by whatever
circuit is put in there, it should stay
stable for some time.
But how do we prevent it from evolving after that?
Difficult, because the DNA seems to want to
change under pressure.
It's very adaptable material.

JONAS KARLSSON: OK, this concludes our session.
If there's any more questions, please feel free to meet up
afterwards.
Thank you very much.
ANDREW HESSEL: Thank you.