Chemistry on the Web: How Can we Crowdsource Chemistry to Solve Important Problems?

Uploaded by GoogleTechTalks on 19.04.2010

CAROLE: I am here to introduce Matthew Todd, who's here from the University of Sydney to
talk about Crowdsourcing Chemistry and the Tropical Disease Initiative. So please welcome,
Matthew Todd. >> TODD: Okay. So, first, I should say...
>> CAROLE: You don't need to use this one. >> TODD: Oh, is it one of these two? Okay,
I can stand here on this one, all right. So, first, I should say, thanks, Carole, for introducing
me and hosting the visit. Thanks also to Christy Burner, who also ultimately set up
the visit. And, yeah, I'm an organic chemist. So hopefully I'm going to do a bit of Chem
101 to make everyone familiar with what I'm talking about. But also, I'm going to be talking
a little bit about Open Science. And the purpose of my visit today really is to preach that
we need tools and applications. So, one of the reasons for coming to Google is that your
apps are very intuitive and so on, and we need things for conducting science in the
open which we currently don't have, and one of the reasons we don't have them is because
people haven't designed very intuitive user interfaces for things to be used by scientists
to record their work in the lab. The other real reason for coming to Google today is
because I recently bought a Nexus One phone, and I lost the little black sleeve that comes
with it and I'm hoping to get a new one, so, if you have one. Okay. So, there's going to
be a little bit of chemistry and a little bit of stuff about science and how we do it
and maybe how we should do it. So it's quite a wide range of things and obviously,
if you have any questions, please just stick out your hand and we'll go. Okay, so a little
bit of--I think I still got animation in here, a little bit of chemistry at the start.
Okay. So I'm an organic chemist. I teach and research organic chemistry at the University
of Sydney in Australia. And what does that mean? Well, we make molecules. I have graduate
students and post-docs and undergraduates who make molecules. So we put atoms together in
specific ways. And one of the interesting things about this is that as you do chemistry
for a long time and you learn a lot about organic chemistry, you learn about how to
do this and you become proficient at doing this both in your mind and also with your
hands. So you become good at putting things together, bringing atoms together in specific
ways to make complicated molecules and this can be done in several different ways. What
we do is we buy things from commercial catalogues and then we use chemistry in a rational way
to put things together. And we make important molecules that maybe are useful for pharmaceuticals
or agrichemicals or fragrances and so on. And to do that, sometimes you want to make
a really complex molecule which has certain properties, and you have to know how to do
that, how to put an atom here, an atom there in specific ways. And it can be quite complicated.
In some cases, as shown at the top here, the molecule you want, on the top-left,
might be obviously related to the thing that you can buy. So on the right-hand side there,
you've got things in the box, which you can easily buy and put these things together in a kind
of Lego manner and build up a complex molecule, which maybe has some nice property.
In other cases, there may be something available from nature, like the molecule in the middle
on the right, which you can easily transform into something that you might want. So you
can buy that in large quantities from some natural source and you can convert it into
something that you might want to use. So these are kind of ways of using nature initially
to make things which are kind of complicated. But in many cases, the molecule that you might
want, for example, that thing on the bottom-right here is very--it's structurally unrelated
to anything that you can buy or find. And in order to make that you might have to think
in a very lateral way about how you can buy things and combine those things to make a
complicated molecule. And frequently, we find this: some molecule that has some
potent biological property is not simply made by a logical combination of starting
materials. And this is quite a creative process, so a lot of people say that there's
a lot of art in organic synthesis. To make a complex molecule, you have to have a deep
appreciation for the subject and think about it a lot and perceive hidden patterns. Now,
the reason why this is interesting for me was because this struck a chord, as parallels
between this and a chess game have been made before. A chess game also has certain rules that you
follow. And in order to get to some final point, some complicated position or some winning
position, you have to follow certain paths, and the number of paths diverges from the starting
point combinatorially. So the number of possible games of chess, obviously, is a huge number.
The number of different ways of combining small molecules to get larger molecules is
also colossal. So the question that came to me was, well, "Can we analyze how to make
a big molecule with a computer?" And, well, the answer is obviously yes, but no one's
done it. Well, people have tried, but the progress is quite slow. The contrast that
struck me is that Deep Blue, obviously a computer program, beat Garry Kasparov at chess. This
was a defining moment for A.I., I guess, for computer power and also software development.
Something as complicated as chess could be mastered by a computer, which beat the
reigning world champion. Well, this hasn't happened in organic chemistry so far. For
some reason, people have not designed software yet (they have made inroads, certainly,
but haven't designed software yet) that can really take on the masters, the big professors at
various universities around the world, who put molecules together. So this--I wrote this
article about this, appealing for maybe some progress and the application of modern computational
techniques to making organic molecules. And so far this hasn't happened. Now the thing
I'm talking about today is related to this, why haven't we--why haven't people developed
tools that help scientists to do science online and in the open? That is the--that is I guess
the message, why can't we do that yet? Okay, so away from chemistry for a second, I have several
projects in my lab and one of them is working in the area of Neglected Tropical Diseases. And
now, there are various diseases in the world; cancer, AIDS, big diseases; and malaria, too.
There are some which are neglected, which means purely that the amount of money being
spent on them and the amount of time being spent on them is relatively small compared
to their impact socially. And there are several examples, here's a graph from a website that
lists several that usually have rather complicated names to say, but the one that I'm interested
in is this thing called Schistosomiasis, which used to be called Bilharzia. It's a parasitic
disease carried primarily in the regions shown on this map. So it's mainly a sub-Saharan
problem. And it's a particularly nasty disease, it's a parasitic disease and a parasite infects
you and lays eggs in you and these are excreted into fresh water and then this can be taken
up--the parasite matures and is taken up by a snail in fresh water and then the snail--the
parasite matures again in the snail and that excretes it into freshwater and then you pick
it up again. There's a cycle rather like malaria with a mosquito as the intermediate host,
but instead, now, you have a snail and freshwater. And it's unpleasant for several reasons, one
is that the egg burden in you and your major internal organs can become very bad and you
begin to get very sick. You don't necessarily die from this disease but it affects you by
morbidity, so it makes you very sick, and it means that children for example who get
the disease are not--can't develop properly. They tend to have stunted growth and they're
very tired, they're not going to go to school and this kind of thing. So neglected tropical
diseases like these are often measured by something called a DALY, a Disability-Adjusted Life
Year, which doesn't just take into account the number of people who die, but tries to
quantify the impact of a disease on a society. And by that measure, schistosomiasis is actually
a big problem. More than 200 million people have the disease and another 200 million
people are at risk, pretty colossal numbers actually. Now this is unpleasant but thankfully
there is a good drug for it. As with many tropical diseases, actually, there are drugs
available to treat these things. They're not tremendously good drugs necessarily but they're
inexpensive, small, easy to make, and a few people around the world have been suggesting
that we really need to focus on this, that maybe the drugs aren't fantastic but at least
they're there and we can use them and that we can distribute them for a low price. So
for example in schistosomiasis, the Gates Foundation way back in 2002, I think, but maybe
I'm going to be corrected on that, funded something called the Schistosomiasis Control
Initiative which operates out of Imperial College in London. The guy who heads it up
is Professor Alan Fenwick. Now, his idea was that we have one drug available to treat schistosomiasis,
and I'll come to that in a minute. And what we really need to do is distribute this drug
enormously widely to reduce the morbidity of the infected populations. So we take the
drug and we just distribute it to whole populations of countries, and this is now happening in
select countries in Africa, and other countries in Africa have also begun their own national
control programs. So there's an article here in the Public Library of Science's Neglected Tropical Diseases,
a whole journal devoted to neglected diseases: "Africa's 32 Cents Solution for HIV/AIDS." It turns
out also that this drug used for schistosomiasis can also be used to try and slow the transmission
of HIV/AIDS in Africa so there's renewed interest in the drug also from the position of HIV.
So in general, the idea is that we try and use things that we already have. So here is
the drug that's used for schistosomiasis. Now, this is a very small molecule. These--for
those of you who dropped out of chemistry, organic chemistry--the lines are bonds, right?
So when a line changes direction, it's a carbon atom. And where there seem to be double
lines, that means there are two bonds between a couple of the carbons; the oxygen's obviously
the O, and the nitrogen is N. There are double and single bonds, there are rings there, but
this is a small molecule, this is a very small molecule. And this drug was found through
a screen. It's not a naturally occurring compound. It was found through a screen of similar compounds.
Initially, actually, for a similar disease in cattle, and I don't know the story of how
it was worked out but it helps people and I don't necessarily want to know that story,
but it was found out and the drug was developed. Now, this is a very small molecule and can be made
cheaply. So this drug, through market forces actually, is now made by a chemical company, Shin
Poong, out of South Korea, who supply the Schistosomiasis Control Initiative with Praziquantel,
this drug. And it's quite striking that this drug is now available for around 12 or
11 Euro cents per gram, which is absolutely remarkable. If you look up most starting materials
you might want to buy to try and make this molecule, they will be available from all
the net. So really, it has been optimized and optimized; it's off patent, obviously. The
drug has been optimized and optimized and is now available for a very low price. So
this is great news. Sadly, the news maybe is actually too good. When a drug is this
cheap and is being distributed to this many people and killing this many parasites, you
have a problem. If you--evolution tells us that if you try and kill something, it's going
to do something to try and stop being killed, right? So the parasite presumably is going
to become resistant to this or develop tolerance. And this is a big issue for schistosomiasis
because there are no other drugs available to treat this disease. So we're in a very
dangerous situation, we're using a drug to treat literally millions of people. Sometimes
whole countries and villages and cities are being treated with this drug and there are
no backups for when this drug fails. Now, there are obviously some people who will say
that resistance will not appear and others who say it will appear, and there's this debate
going on. My take on this is that it's better to be safe than sorry, so a lot of the research
that we're doing in my lab to do with schistosomiasis is to, for example, develop new analogs of
this drug before they're needed. So drugs with a slight modification. It turns out in medicinal
chemistry if you have a drug like this and suddenly it becomes ineffective through the
development of resistance, you can change a little bit of it. You can introduce a little
group on the left, a little group on the right and you might be able to regain potency. So
we were trying to look at--thinking about simple modifications to the structure which
is, I guess, that's what we do, we make molecules. Another thing we might want to do is try to
find how the drug works. And still after more than 30 years of use, the mechanism of action
of this compound isn't known, which is quite extraordinary; apparently, quite a common
situation in parasite medicine. However, there's one thing that we can do right now.
And we were in touch with the World Health Organization to try and discuss how we can keep this drug
good for as long as possible. And the World Health Organization's perspective is that
we need to try and use this drug maximally while it's still good. And there's a simple thing
we can do to try and postpone the development of resistance. And what you do is you increase
the dose of the drug, right? So you give more of the drug to try and kill off partially
resistant parasites. One of my students was giving a talk about this and accidentally,
he said that what you should do is give more of the drug to kill off the partially resistant
people. You really get the wrong message. The point is the parasites become resistant,
and you don't want that to happen, so you want--if there are some parasites which are
partially resistant, you want to try and kill those off by increasing the dose by 15-20%.
Unfortunately, the amount of drug you have to take is large. This is a 600 milligram
pill. And the field workers who've come to conferences I've been to tell me that compliance
is a real issue. If you're trying to give literally tens of millions of people a drug,
sometimes in very remote areas, if the drug is too big or tastes bad, it won't be taken.
People may be naturally suspicious of artificial compounds, right? We call them pills. It also
turns out that Praziquantel tastes terrible. So there's another compliance issue. So
you can't necessarily make the pill bigger but you want to increase the dose, so how
do you do that? Well, we as chemists, we see a way of doing this. Okay. This--just a little
bit more Chem 101 something--this is an issue in organic chemistry. Let's take a few minutes
out. This is an issue of organic chemistry that I wish the public understood. It's one
of the most profound and beautiful aspects of the universe. And this is not known generally
by the public, a fundamental feature of organic molecules. Okay, so organic molecules are--they're
three-dimensional, unlike some of the drawings which I put up of two-dimensional things,
they're three-dimensional things, they're real things with depth structure. And the
larger the molecule is like proteins and DNA, this structure becomes very large, they're
three-dimensional objects. It turns out that three-dimensional objects can have this property
called Chirality. And this is all about a mirror image. So if you imagine some object,
let's take a symmetrical object like a ball. And if you take a mirror image of that object,
the ball you generate in the mirror looks just like the thing you started with.
So the two mirror images are superimposable, they're the same thing basically. Other objects, familiar
objects, don't have this property. The mirror image is not superimposable back on itself.
Actually, the majority of things you see in nature are asymmetric like this. Your hands
are a good example. Your hands ostensibly look the same but if you try and put one
back on the other, you can't; they're not superimposable, which is why your right hand
doesn't fit in your left hand quite frankly. This is very important in nature because it
turns out the molecules in nature, almost all molecules in nature apart from water and
ions and things have this property, that they have a certain three-dimensional orientation
in space. And almost all the molecules in nature have one orientation, not the other
one. So if you imagine just for a second that you're walking along the street and you meet
someone you know, and you want to shake their hand. That works one way around, so if I give
my right hand to someone and they use their right hand, we have a good handshake, right?
If one is using their left hand then the handshake's all wrong. And this is an important thing.
When two molecules like these meet, you can have different kinds of interactions. And
it's very important that we get it right, so for all of these drugs that we now take which
are now approved by the FDA, you have to define exactly the three-dimensional arrangement
of atoms in space and it can't be a mixture of the two. This was made tragically obvious
by the Thalidomide story. Thalidomide is shown in those two structures
on the right of the screen. And the only real difference in the structure is that in
one case, see where the nitrogen is, you have a kind of hashed line; that implies that that
atom is behind the rest of the molecule. And on the right-hand molecule, the nitrogen has
a kind of thick wedge going to it; that implies that half of the molecule is in front of the rest
of the molecule. These two things are mirror images, they're not superimposable. And if
you take one, it has a very different effect from the other. So it turns out that one acts
as a sedative, they were using it for morning sickness, I think, but the other molecule
inhibits the formation of limbs on fetuses. And so this was tragically realized when these
babies were born deformed through this drug. And since then, it's been definitely required
that we have to specify exactly what we give people when they take drugs. It turns out
actually this molecule interconverts between the two forms when it's in the body so it's not that
easy. But the principle remains the same, that one of them is very bad for you and one
of them isn't. This has rather nicer implications in the case of the molecules on the bottom-left
here, this is Limonene. And the reason I've got this picture of my wife up there holding
lemons and oranges is, one is a picture of her and one's a mirror image and there's lemons
and oranges in both hands. One of these molecules is responsible for the smell of lemons and
one is responsible for oranges, so they're two mirror image molecules. If you smell one,
you get lemons, if you smell the mirror image molecule, you get oranges. So it's very important
that we have molecules that specify exactly the three-dimensional arrangement in space.
Now, why am I telling you this? Well, probably because I think it's the most beautiful thing,
one of the most beautiful things I've ever seen. That it is a very profound thing about
the way the world is. And I'd love to explain this more widely but of course I'm in a job.
However, this is relevant to Praziquantel, the drug that I'm talking about. So the structure
I gave you before didn't have that little wiggly H on the top. Now that little--that
carbon in the middle there, where the H is attached, is very special. It's got four different
things attached to it. And that makes this small molecule chiral. It means that
in one case if you imagine the hydrogen that's on the left there, it has a kind of wedge
and it's coming towards you. And then on the right side, you've got the hydrogen going
away from you; these two molecules are mirror images. You don't really see that because
I've turned it around but they're mirror images. The one on the left is the drug that works.
And the one on the right, doesn't do anything. In fact, it has mild side effects. And in
fact, the one on the right that doesn't do anything is the one that tastes bad. So we
just want the one on the left, that's it. How easy is that to do? Why don't we just--instead
of making both and giving both to people, so it tastes bad and there's too much drug,
why don't we just give the one on the left which has the right orientation? Well, that's
actually quite difficult. So, these molecules are difficult to make when you have to specify
exactly the orientation in three-dimensions of where all the atoms are. And to give you
a sense of that, I guess if you think about the ball here on the bottom-left, it's a symmetrical
structure. It's very easy to make something like this because you kind of like have a
spinning wheel and it's simple to make something that's symmetrical and round, right? In the
middle plugging plus one, I wanted--on the board a plus one, disclaimer, this is a more
difficult thing to make. This is--it's not--sorry, it's still symmetric but it's less symmetric
than the ball. Suddenly you've got to make the round part, which is quite easy, but putting
the handle on, you're going to have to do that by hand. This is a more complicated structure.
When you get to asymmetric structures like this Rodin sculpture, that's actually very
difficult to make, right? Because you've got to, with your hands, put things in various
different places. So the less symmetric something is the more difficult it is to make, I guess.
And a lot of molecules in nature are extremely asymmetric. They have little bits here and
there that you have to install, well, kind of by hand, except the molecules
are very small, so how do we do this? Well, of course, we have to design other molecules
that act as our hands and install things in certain places. It's very demanding. It's
an area called asymmetric synthesis in organic chemistry, and people devote
their whole careers, like me, to this area. How do we make molecules in three dimensions?
So it's very demanding. And so, unfortunately, Praziquantel has this feature. And if you
want to make one rather than the other, it's difficult. Making both at the same time, it's
very straightforward, easy. But making one rather than the other is difficult. So this is
where we were a few years ago, we thought, well, the World Health Organization wants
just the one active form of the drug, it's called an enantiomer, the active enantiomer of
the drug, they don't want an inactive one. They want that because then they can reduce
the pill size by half. And the pill doesn't taste bad. It's smaller and you can increase
the dose even a little bit. So it's a much more effective pill. The problem is that as
soon as you're trying to make one enantiomer rather than the other, it becomes an expensive
thing. People like me have to get involved and I have to think about ways of doing that.
How do you install that hydrogen on one face of the molecule rather than the other when a molecule
is so small you can't put it there with your hands? So we were thinking about this, and
unfortunately, the drug is already very cheap, so how can I justify assigning academic resources
to reducing the price of something? I would be fired if I did that, right? That's not academic
work, to try and grossly reduce the price of something, interest will be out in several
years. This is not a problem that is solvable by academia by any means. We're,
obviously, in the business of trying to get out high-impact research in new frontiers.
We're not in the business of reducing the cost of anything. On the other hand, if we
turn to industry, they would not be interested at all in this problem because there's no
money to be made in tropical diseases. There's no money to be made in taking an already cheap
drug and trying to modify it a little bit. So, there's no real market value for that. This
is already a very, very cheap process to make the combination of the two enantiomers. So,
we were left with this problem a few years ago. I was thinking well, this is an important
public health problem that the World Health Organization would like to solve. We can't
solve it with academia and we can't solve it with industry. And this seems like an impasse,
right? What do we do? If I try to--this graph is meant to indicate that if I try to assign
people to increase the ee, that is the enantiomeric excess, which is a measure of
how much of one of those molecules you have rather than the other. And you try and increase it
by piling dollars into the program, it doesn't really--you get to this point where you can't
assign any more resources to this problem. You reach a breaking point where you say,
"I simply can't justify this anymore." And simply, as you assign more people to this problem,
you think, "Well, is this really worth our while?" How do we do this in traditional models?
We can't do that. So in the case of this--we felt, well, I was actually on my honeymoon
at the time and I was thinking, "How do we solve this problem?" What I need to do is
try and collaborate with as many people as I can. So this was new to me a few years ago.
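As a minimal illustrative sketch (not something shown in the talk), the enantiomeric excess mentioned above is conventionally defined as the difference between the amounts of the two mirror-image forms divided by their total, expressed as a percentage. The function names and the milligram figures below are hypothetical and chosen only to mirror the 600 milligram pill discussed earlier; a 50:50 mixture of the two forms has an ee of 0%, and a sample of only one form has an ee of 100%.

```python
def enantiomeric_excess(major_mg: float, minor_mg: float) -> float:
    """Return the enantiomeric excess (ee) as a percentage.

    ee = 100 * |major - minor| / (major + minor)
    """
    total = major_mg + minor_mg
    if total <= 0:
        raise ValueError("total amount must be positive")
    return 100.0 * abs(major_mg - minor_mg) / total


def major_enantiomer_mass(sample_mg: float, ee_percent: float) -> float:
    """Return the mass of the major enantiomer in a sample of given ee."""
    if not 0.0 <= ee_percent <= 100.0:
        raise ValueError("ee must be between 0 and 100 percent")
    return sample_mg * (1.0 + ee_percent / 100.0) / 2.0


if __name__ == "__main__":
    # A 50:50 mixture: equal amounts of both forms, so ee is 0%.
    print(enantiomeric_excess(300.0, 300.0))   # 0.0
    # A hypothetical 600 mg pill of a 50:50 mixture holds 300 mg of each form.
    print(major_enantiomer_mass(600.0, 0.0))   # 300.0
    # A hypothetical 95:5 mixture has an ee of 90%.
    print(enantiomeric_excess(95.0, 5.0))      # 90.0
```

Under these assumptions, a pill containing only the one form that works would need half the mass of a 50:50 pill to deliver the same amount of that form, which is the halving of pill size described above.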
And it turns out that of course, well you guys know a little about this. You guys know
all about Open Source things. The idea of Open Source has been illustrated by
this comparison between a cathedral and a bazaar. So in academic research in chemistry,
by and large we operate on the cathedral model. I'm the Professor in charge of students and
I have my resources and my grants. And we work usually pretty much in isolation using
the supposed intelligence of me and my students to try and solve problems. We're this autonomous
unit. We do collaborate occasionally, we select people, but we are a closed unit. And we were
thinking about how to try to solve this problem and we couldn't. The contrast to the cathedral
is the bazaar, where anyone can contribute and everyone's opinion is valid and you listen
to as many people as possible to try to solve the problem. Now, you guys will know a lot more
about this than I do. So this is new to me and I haven't read the requisite literature about
this but this was the essential contrast that we were seeing then. Basically, we don't operate
in science on the bazaar model, we use the cathedral. So we don't tend to discuss problems
openly with strangers and a large number of people we don't know and try and get a solution
through the community, this doesn't really happen. You tend to publish papers in academic
journals and people might respond to that. But it's a slow process, there's not direct
discussion between people unless you actively collaborate. These are very different models.
And as I understand it things have gone very well for the bazaar model and Open Source.
So, from my naïve non-computer science perspective, it seems as if projects like Firefox, Chrome,
Wikipedia, these things have gone extremely well, extremely powerful programs, they develop
really quality products. I think a lot of people outside the movement
tend to associate Open Source with endeavors which are purely done by volunteers
and which are not funded. And of course this is wrong. From what I hear, a lot of projects
that are Open Source have involved a funded kernel of activity to which people then respond
and help out. The example that maybe is relevant here is, as I understand it, the Chrome
browser was developed by guys here. But it's Open Source, so that people beyond these walls
can help change, modify and update it. That's my understanding and I hope to be proved right
on that. And so, schematically, I put these together. These are two, again rather naïve,
ideas of how you do things in two different ways. I shouldn't have pointed that out. I
should have put some women rather than men icons, sorry, I just, I was in a bit of a
rush. So on the left here, you have the traditional way of doing science which involves people
working in labs, submitting articles to peer-reviewed journals, waiting a few months. And then the
reviewers of that article, who are anonymous usually, maybe one or two people say, "yes"
or "no." The article gets published and then people read that in the literature and then
people design their own response to it, do their own research. Publish an article again.
The time scale is quite lengthy here, it involves months of waiting around while the
review process happens. And in some cases, peer review is of course flawed, I don't want
to get in that discussion right now but peer review is an excellent system but it does
have these big flaws that sometimes you can get referees who are maybe not impartial,
maybe they have vested interest, maybe sometimes there's only like one referee on an article
and then that appears in print and it's then seen as sanctioned as being valid. Typically,
articles published in the academic literature don't have feedback on them. If you want to criticize
an article you have to publish a substantial paper that refers back to the original. It's
not a very interactive process. It's also fairly slow, so the calendar icon indicates
that it's a slow process and costs a lot of money. We apply for a lot of money to do this.
And in some cases, we're competing with people who are doing similar research to us. So in
many cases, we may be duplicating effort. On the right, the idea here is that instead
of doing that, why don't we post the problem to the community, have as many people as we
can reach helping us out with the scientific problem and publish our results in real-time
which are then peer-reviewed after they appear, so after publication. This is not
a model that operates in science at the moment. This is not how we do things at all. Science
operates pretty much on the basis of peer-review, before publication not after publication.
So Wikipedia is anathema to a lot of scientists because corrections happen after something
is made public. Notice that with Open Science, as drawn on the right there, data would
then be published in real-time as it's acquired and people would be able to respond as they
see fit and collaborate with you in real-time, even though you may not know who these people
are. There's a very important qualification: this is not the same thing as Open Access
in journals. Open Access is where of course you can read things for free, but the research
may have been done in a very traditional way. Open Access is a very worthwhile pursuit.
Journals--it's very important that journals are Open Access but it's not the same thing
as Open Source or Open Science, where community participants can actually have an input into
the project itself. So if, for example, I design an organic synthesis of a molecule,
and I post that as an open science project, anybody in the world could then come
along and say, "No, you don't need to do that. You should try this and in fact, I'm
going to try it on my lab and I'll get back to you with what's going to--with the results
of that synthesis." So anyone can change the project and repost it for further community
input. That's a very different way of doing things. Okay. So I should mention that a lot
of people are doing this already. Some people, they know who they are. Up in the top-left
there, for example, is Jean-Claude Bradley, who's at Drexel University in this country,
who has a project, the UsefulChem Project, whose aim is to do Open Science,
where he's trying to make molecules that will eventually be used to treat malaria. Jean-Claude
actually practices, and is a proponent of, something called Open Notebook Science,
the extreme form, I guess, of Open Science, where your lab work is on the web completely for
every--everybody to see. So every single datum is published on the web. There's a picture
of Steve Koch, who has a biophysics lab in the University of Arizona, who also has--everything
he's doing is on the web, and a bunch of other people. Even Billy Clark, Cameron Neylon and
Daniel Mietchen, who are among several zealots of the Open Science movement and they're very
frequent commentators on how we should do this. So there's a very passionate community,
lots of people I haven't mentioned, passionate community about this. But it's still incredibly
small and we tend to have meetings where we all get together and talk about this thing.
And the outside world, the larger scientific area doesn't tend to pick up on what we're
doing, unfortunately. However, there are examples on the web of lots of projects which have
used open methods. This is a small grab bag of these things and they vary a lot.
So, for example, on the top-right, the GenBank Initiative; that's not really Open Science,
it's a depository of information where people can deposit genetic information that can be
retrieved free. This isn't really Open Science because you're not really changing things
and collaborating on the site. But it's open data, a very important massive open data resource.
The Foldit program is an interesting one. If you haven't seen it, it's a game which
the public can play to try and help people to work out how proteins fold. So in a very
ingenious bit of software development, somebody made a computer program to allow the public
to get involved with that. It turns out the public are very intuitive about this and have
really good ideas about how proteins should fold, amazingly. The Open Dinosaur Project
is something where the public are being involved in measuring bones of dinosaurs from the literature
and collecting data. Galaxy Zoo is something similar in astrophysics where people are classifying
galaxies. These are what--these are projects where public input is required and needs to
be used effectively. On the other hand, there's something called the Tropical Disease Initiative
which was started by several people including a guy who gave a talk here called Marc Marti-Renom,
which is a sister site to the site that I'm involved with. There's also a big movement
for trying to find drugs, for example, for TB called the Open Source Drug Discovery Project
which is an Indian project, which was started fairly recently. And then on the bottom-left
here is something called ChemSpider, which is a community-centered resource for chemistry,
very unusual. Chemistry has an unusual history. We have a lot of very powerful, very wealthy
organizations who've become involved in collecting and curating chemical data. And these things
are usually--these applications are usually quite expensive and universities tend to buy
subscriptions to these things to gain access to information. If you're not in a university
which has that kind of resource--that has these resources, then it can be quite difficult
to get your hands on that kind of information. ChemSpider is something which is on the
web and anyone can upload chemical data to ChemSpider. So it became a community-centered
resource and was recently purchased by the Royal Society of Chemistry, who are keen to
promote this. So this is a selection of different things where open projects are involved or
open data are involved. And some of them like the Galaxy Zoo and Open Dinosaur Project are
actually where there's community input, so public input to the project. OpenWetWare, the last
thing I want to mention, is a site where anyone can have a lab book online, free, and post
data of any kind that they wish. It's a very impressive initiative; it operates on the basis
of a wiki. So in order to have a page on this, you do have to be a little bit savvy
with how to write Wiki pages. But it is completely free and open and anyone can have a lab and
a lab book on the site. So it's very impressive. We started out, a few years ago, with this
website called the Synaptic Leap. So if you would like to see more about it, please just
Google us and have a look at it. The idea here is it's a basic--a blog functionality.
And we posted our problems with schistosomiasis on this website. The intention of the site
is actually to be an Open Science site for anyone to conduct any projects they want in
anything to do with biomedical research. We started with the schistosomiasis project because
there's a philanthropic angle, I guess. The idea of spending some of your spare time helping
us out when this project is to do with a neglected tropical disease, gives participants, you
know, good karma, right? So people feel good about contributing to this. But really, it's--the
aim of it is to have something which is wider, so you can do any kind of collaborative biomedical
research on the site. So up on the top there, there's the Schisto Research Community and
on the bottom there's an example of a post I recently did, which is a chemical
reaction which we did in the lab. And beneath that diagram is all the data about how that
worked and what happened, including the raw spectral data of what we did. Now I just want
to mention something about this. This is a limited functionality. It still operates as
a blog. It's built on Drupal. And I don't know about you, but whenever I see a blog
post which is interesting and for which there are several comments, maybe 10 or so comments,
by the time the comments get to about 10 or so, I shut down [INDISTINCT] and I'm not
reading them. Blog functionality is really quite limiting. It's--you can't do a huge
amount with this. And in terms of community input, it's actually quite limited in what
people can do. What we really want is to have this, but much more intuitive and functional.
At the moment, it's very linear and very flat. We're doing our best with this, but I think
we need something that's much more intuitive. And that's really what I'm trying to get at
here. Just before I get on to the nitty-gritty of that, just a couple of other advantages
of Open Science; more generally, the idea of doing this on the web. So the project we're
posting here, of course, is asking the community to help us devise a synthesis for this molecule
for a very, very low price. The other advantages of doing science like this, on the left, is
transparency. You might have heard about the controversy surrounding the climate change
emails in the U.K., at the University of East Anglia. This idea that the public thought there
was science that was being hidden from public view, where perhaps some research allegedly
was being suppressed because it didn't agree with the idea of climate change. This doesn't
do science any good to have this kind of public perception of what we're doing. And it--there's
a real advantage in doing Open Science, which is that everything is transparent and you can
see what's going on and nothing is hidden. And I think in terms of public engagement
with science that's going to be very important. In the middle of this picture is a starfish,
again, this is probably an IT analogy, the starfish and the spider. Open projects don't
have leaders. I mean, I am the leader of the project that we're doing at the moment, but
I don't have to be. The project is the important thing and if I decide to do something else
or if I'm called away to do something else, then somebody else will take over. It's a
leaderless organization, which is a big advantage for open projects. Somebody can take over;
anybody can take over and anyone can take on side projects and lead things. The third
picture is meant to imply speed. To me, one of the big advantages of Open Science is speed
of progress. My theory, the one thing that we're trying to test out in the next
few years, is that an Open Science project where anybody can contribute and experts identify
themselves will operate faster than a closed lab project. That's the hypothesis. And we
are hoping to try and test that in the near future. But we're starting with our Synaptic
Leap project for this drug. The picture of the plankton here is meant to remind me that
the one important lesson we've learned with Open Science over the brief period that we've
been doing it is something which, I think, is very well known to you, IT professionals,
who have been working in Open Source, which is that it's not enough to post a problem
and have the community input. You have to post data first; you have to post results,
a kernel of activity to which people respond. And this was very clear to us with the Synaptic
Leap which, for the first couple of years of its existence, was very quiet because we
had no funds or resources to put people on the project. We then went the long route
and asked the Australian government for funding for this project with the World Health Organization
as our sponsoring partner. And we secured a grant for this project in May 2008, which
took a while to get signed off, but is now active. Now, we have somebody working in the
lab who's posting real research data as it's going on. And that means that people have a
lot more to get their teeth stuck into. We've just started, but it means that people can
now respond to us. It's crucial I think in any Open Science endeavor, any Open Source
endeavor to have sometimes a funded kernel of activity which people can respond to. That's
our important lesson. Okay, so I want to say something about experimental science and then
I want to do my appeal for applications. So this is the real nitty-gritty. Being an experimental
chemist, being an organic chemist is very like being a chef. You have things, your resources,
you buy things in, you combine them with various apparatus, which you have in the lab. A lot
of the things in the chemistry lab have a correlate in the kitchen; it's amazing.
You have, you know, a gas flame in the kitchen; we have a Bunsen burner. You tend to boil things
off, maybe you want to reduce something and take water off something; we have something
for that in the lab. Lots of things that we use in the lab are very similar to the
kitchen--kitchen kind of chemistry. There's a picture of one of my students, (inaudible), who--his
bench is right to the left there, and he has all the stuff laid out there, and his fume cupboard,
which is the thing he uses for toxic stuff, is behind him. A bench chemist will come in
early in the morning, 8 o'clock in the morning, to think about what they're going to do. They'll
design an experiment, they'll get their chemicals together and use glass and glassware and metal
things and a bunch of different things to do their chemistry. They'll run a reaction.
They'll effectively taste it, as you do when you're a chef by sticking a spoon in and then
licking it. In the chemistry lab, you never do that. But you--there are ways of testing
what's happening in your reaction. And then you isolate the thing you've tried to make.
You analyze it with some instruments and then you write up what you've done. This is what
a chemist does in our life. The analysis is kind of complicated. As a chef, you taste
things because your tongue is a very sophisticated thing. In the lab, you have to take the molecule
that you think you've made and put it into some very large expensive instrument, which
then analyzes if that's the right thing. There's a lot of data here. For example, the instrument
we use the most, we take our molecule and we put it into this large superconducting
magnet. And we spin this molecule very quickly and we blast it with electromagnetic radiation.
And rather like when you hit a bell, you listen to what comes off the molecule after you've
done that. So, when you strike a bell you're going to listen to the tone. And, of course,
what you get off the bell is all sorts of vibrational data which is very complicated,
and your ear transforms that into one note. Similarly, with something called NMR spectroscopy
in the lab, we take molecules and we blast them with this radiation; we get this very complicated
signature that comes off. We use a mathematical Fourier transform and get these lines
on a piece of paper. And the signature on those lines and how they appear allows me
to say "Yes, that's the molecule we thought it was" or "No, it isn't." Lots of big instruments
generating lots of data. But in a typical day, a student will use all these different
things. If that student is meant to tell another student how they did something, how do they
do that? Well, they can write up something in a traditional paper. That often hides
little things that you might have done which are special, in the same way that a recipe book--often
recipes don't work; you don't quite follow the instructions right or something was just
missed out or the decimal point is wrong somewhere. If you want to capture the research process,
you need ways of doing that that are really quite data-rich. So you want to be able to
capture things with audio and video. You'd like to be able to post raw data to a website
rather than the interpreted lines that we tend to get, so maximizing the amount of data
that you publish. And really, you want something which is quite intuitive and rich because
you want somebody to follow what you've done. For example, also in the lab, that's me talking
to one of my students, Althea. We have these fume cupboards with kind of Perspex covers
and often you write on them; you write on a whiteboard at the back here. If you're going to collaborate
with someone, you want to be able to easily collaborate with them as if you were sitting
next to them with a coffee, talking about science. And really, at the moment, we can't
do that outside my lab. We can't collaborate with people who are outside and sitting outside
my lab. If we've got a problem, we go down the corridor and talk to a colleague. But
if we want to throw this project open to the world, we need a really intuitive way of collaborating
as we would in a normal lab. How do we do that? And how do we maximize the input that
people are going to give us? Well, something which, I think maybe some of you will know
as IT professionals, is something called Stack Overflow, which maybe some of you have heard about,
which is a site where you can post code and ask people to help you out solving certain
problems. With code, that works really well because you can cut and paste code and stick
it on a webpage and people can rapidly respond. It's also a very nice idea because you can
have medals awarded to you for valor of service, right, and your reputation increases. So it's
a good way of trying to develop a reputation for yourself as someone who helps people solve
a problem. Of course that's good for text, but for experimental science, this just doesn't
really exist. Something has been started, Chempedia Lab, by a guy in San Diego, Richard
Apodaca, who has taken the basic functionality and tried to use it for chemistry. At the
moment, it's quite text-heavy, but it's a very good idea to try and use the same idea
in experimental science because what we need is something again that's still very much
more intuitive and allows data-rich things to be posted, allows links to online pages
where all your data are posted, allows links to online Lab Books for science. So basically,
the structure we need is something which is an intuitive Lab Book online, where all of
the data are linked with your experiment and which can easily be analyzed by somebody else
as part of a collaboration, and collaborators can post to your webpage.
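The bell analogy for the spectrometer earlier in this part of the talk (a complicated decaying signal that is transformed into a few clean lines) can be illustrated with a small sketch. This is only a toy under stated assumptions: it assumes the transform in question is a discrete Fourier transform, and the function names, frequencies, and decay time are hypothetical values chosen for illustration, not real instrument data or any lab's actual processing code.

```python
import cmath
import math


def synth_signal(freqs_hz, decay_s, sample_rate_hz, n_samples):
    """Toy 'ringing' signal: a sum of cosines that decays exponentially.

    Stands in for the complicated signature that comes off the sample.
    """
    signal = []
    for k in range(n_samples):
        t = k / sample_rate_hz
        value = sum(math.cos(2 * math.pi * f * t) for f in freqs_hz)
        signal.append(value * math.exp(-t / decay_s))
    return signal


def dft_magnitudes(signal):
    """Naive discrete Fourier transform; magnitudes of the first n/2 bins."""
    n = len(signal)
    mags = []
    for b in range(n // 2):
        acc = sum(signal[k] * cmath.exp(-2j * math.pi * b * k / n)
                  for k in range(n))
        mags.append(abs(acc))
    return mags


def peak_frequencies(signal, sample_rate_hz, count):
    """Return the frequencies (Hz) of the `count` strongest spectral peaks."""
    mags = dft_magnitudes(signal)
    n = len(signal)
    # A peak is a bin that is larger than both of its neighbours.
    peaks = [b for b in range(1, len(mags) - 1)
             if mags[b] > mags[b - 1] and mags[b] > mags[b + 1]]
    peaks.sort(key=lambda b: mags[b], reverse=True)
    return sorted(b * sample_rate_hz / n for b in peaks[:count])


if __name__ == "__main__":
    # Hypothetical two-tone "bell": the time-domain list is hard to read,
    # but the transform picks out the two underlying frequencies.
    ringing = synth_signal([20.0, 50.0], decay_s=0.5,
                           sample_rate_hz=256, n_samples=256)
    print(peak_frequencies(ringing, sample_rate_hz=256, count=2))
```

The point of the sketch is the contrast between the raw list of samples, which is what would be posted as raw data, and the short list of peak positions, which is the interpreted form that normally ends up on paper.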
>> [INDISTINCT] >> TODD: ...type text and that's fine. And
then you can post something like that, but there is no chemical content in this; the
words are not understood by a machine to be referring to molecules or chemicals.
So the text is quite dead and if anything was going to search this, you wouldn't really
have a lot of input from the computer about what is chemical information, what isn't.
To be able to take text and convert that to something which is chemically rich, so where
a word is associated with a molecule and can be searched and analyzed and indexed, would
be tremendous for HTML or XML. And a guy called [INDISTINCT] wrote the language called CML,
which is a chemically rich markup language, and has worked with Microsoft
to develop a Chem4Word add-in, where a Word document
can be searched by a machine and the chemical information can be annotated and extracted
automatically. So when you hover over a word, you get a structure and you can change the
structure and that changes the word. So in a PhD thesis, for example, this becomes a very
rich document where all the chemicals are part of the actual fabric of the text and
are not simply words. Now an example that was just recently brought to my attention is
something called chemicalize.org; this takes any given webpage and extracts chemical
information. And you can see, what's happened here is that in the usual Latin text you get
on pages, it spotted beta carotene, which is a molecule and if you hover your mouse
over that, you get the structure of the molecule. If you click on that, you get taken to a page
where there's a bunch of chemical information about the molecule. This is very useful and
it makes, for example, HTML pages rich for chemists; that means they can be searched
very effectively. We could do with something like this actually for Drupal, so given that
Drupal is Open Source. What we really need is an extra button on this menu up here which
says, "Okay, take the text that I've just entered here, scan it for chemical information
and please annotate these individual words so that, when the page is published, it's clear
that these molecules are in there." So if I write the word "benzene" and I click this
little button, it's [INDISTINCT] molecule and then when the HTML page is published, that
becomes an active word that can be searched effectively and can be annotated in this way.
That would be a very nice project that we could do which would enhance Drupal a great deal
for chemists and make the resulting web pages that we make much more functional. Okay, so
the last--just, just summary--the summary of where we are--this is my son, Harvey; he's
playing with his first molecule, which is great; start him early you know? But this
is--this is how I feel with this, with Open Science. I have absolutely no idea where we
are going to be definitely going and how we are going to get there, but it feels right.
It feels--so, science that is done in the open, where anybody can help us out and nothing
is kept secret is the real spirit of science. It's fast and it's transparent and it's generous
of spirit. This project that we're doing where we are trying to get the price down of this
drug with the World Health Organization, we've got another two and three-quarter years
to solve that problem. It needs to work; we have to be able to show that by massively
distributing a collaboration like this, where everything is in the open. We need to show
that we can do that and we need input from chemists all around the world, particularly
process chemists who work in industry, to help us with this problem. The price constraint
is extremely severe and it's a real challenge for organic chemistry. Of course, what we
would like to do eventually with Open Science is to move beyond philanthropy. We are doing
a project which is organic chemistry, so--it's, it's hard physical science,
but it does have a philanthropic element where people might contribute because they feel
good. What would be nice is to try and move into an area which is academically hot, where there's
a lot of activity at the moment and people are competing with each other to show that
an Open Science project could actually generate papers and results faster than traditional
closed collaborations. That would be quite exciting, but it's not something that we're
doing right now. Generally, the dollar sign there indicates that, contrary to what may be
thought by many people about Open Source, Open Science also may well need funded kernels
of activity where projects are--small projects are funded and lots of the scientific community
can respond to us. Now that to me, would be extremely attractive if I was a funding agency
who wanted to fund scientific projects--if I was a government agency and I wanted to fund
scientific projects. What I would do is try to fund a kernel of activity and then have,
have a wider group of people help me out, so I leverage more activity for my funding
dollars. There is an advantage there, of course, that I've covered before is that once the
funding runs out, the project doesn't have to stop; it's this leaderless organization,
the project can continue. And I guess the last thing is a more general point that open
data are very important. The idea of Open Science, of course, is to share data with as many people
as possible, and this is always going to be a good thing; if we have data in labs
which the public have funded through taxes, those data should be available for anybody
to see. And open science obviously necessitates open data as part of its reason for being.
And so, my main appeal, of course, though, so is--just to close--is that at the moment,
we do not have really good intuitive tools for scientists to collaborate over large distances
effectively. We have tools that are--that require tutorials to use. My students are
very busy; they're very busy making molecules in the lab. And they--I know what's going
to happen if I ask them to try and learn how to write a Wiki page or sit through a tutorial
about how to use something. They are going to say they're just too busy. Those are my
students; and they're some of the most receptive to this that I know of. Trying to ask an experimental
science student to learn something before they can post their data online, to me, is
like asking Gordon Ramsay to learn Arabic, right? This is silly. He wants to do his cooking
and generate a product. He doesn't have to learn a language to be able to do that. And
with chemists and experimental science students, we need applications--applications which do
not have tutorials on them so that scientists can rapidly engage and comment on each other's
work and share data effectively. So, given that I've never needed to look at a tutorial
for a Google app, I came here, right? So, I use Gmail and Google Docs and Picasa without
reading anything. I just started using it because it was intuitive. If we could develop
that for a Lab Book, for an open shared electronic Lab Book, that would be fantastic. We--my
lab is collaborating with a guy called Jeremy Frey of the University of Southampton to
try and link online electronic Lab Books to machines in a chemistry school. So, data are
meshed with an experimental technique. If we could expand this to have a front-end that
was incredibly intuitive to use, I think we'd go places. And I think a lot of scientists
would love to do--to share what they are doing if they had an effective intuitive tool to
do that. And that's really my appeal, is for something like that to happen: a dialogue between
IT guys and experimental guys, PhD students in the lab who know what they need, to try
and develop a real killer app for an online shared electronic notebook. Okay, so with
that, I won't take any more of your time. I just want to thank Carole again, and Chris,
and thank you, guys for coming along and listening to this idea. Thanks.
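The "extra button" idea described in the talk (scan the entered text for chemical names and annotate them so the published page carries machine-readable chemistry) can be sketched roughly as follows. This is a minimal illustration, not the chemicalize.org method or an actual Drupal module: the lookup table, the function name `annotate_chemicals`, and the `chem` / `data-smiles` markup are all hypothetical choices, and a real tool would query a chemical database rather than a hard-coded dictionary.

```python
import html
import re

# Hypothetical lookup table mapping lowercase names to SMILES strings.
# A real tool would query a chemical database instead.
KNOWN_MOLECULES = {
    "benzene": "c1ccccc1",
    "ethanol": "CCO",
}


def annotate_chemicals(text, molecules=KNOWN_MOLECULES):
    """Return HTML in which known chemical names are wrapped in a <span>.

    Each span carries the structure as a SMILES string in a data attribute,
    so a search engine or hover widget could pick it up. Keys in `molecules`
    are assumed to be lowercase; matching is case-insensitive.
    """
    if not molecules:
        return html.escape(text)
    # Longest names first, so a longer name wins over a shorter prefix.
    names = sorted(molecules, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches, then emit the annotated word.
        pieces.append(html.escape(text[last:match.start()]))
        smiles = molecules[match.group(1).lower()]
        pieces.append(
            '<span class="chem" data-smiles="%s">%s</span>'
            % (html.escape(smiles, quote=True), html.escape(match.group(1)))
        )
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(annotate_chemicals("We ran the reaction in benzene and quenched with ethanol."))
```

The design choice here is that the visible text stays exactly what the student typed, while the structure rides along in an attribute; that keeps the typing step tutorial-free and leaves indexing to the machine.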
>> [INDISTINCT] >> Scientists still have to get tenure and
get paid and all that kind of stuff, and how does that work with conventions or spread outs or [INDISTINCT].
>> TODD: Yeah. Okay, so the idea that if you open everything up to the world, you're going
to commit professional suicide, yes. I mean, I--that's--at the moment, I think that that's
certainly the perception. There are some brave souls who are doing it anyway, a couple of
people I just mentioned. It's interesting. The--there is this sense, there is this--I'm
sorry, there's this prejudice that if you publish data online, you can't then put it
into an academic journal. And actually, for some of the high impact chemistry journals,
that's true. So I can't publish in certain big journals if I've already released data.
In many cases, that is not true. So I--if I have published everything on the web and
I've decided to summarize it on a paper and then submit to Nature, they'll take that paper.
So there are also journals which are happy to do that, but this is a test case. How many
journals will take those kinds of papers and how effective are they? I think that if you
can do that, if you can still publish your work in journals, which are traditional journals,
then I think you'll be okay. The question about whether you'll get scooped is another
thing. So if you have a great idea and you then put it on the web, what are you going
to do? Of course, if something is commercially sensitive and you want to patent it, then
of course you can't do that; that's off limits. But if it's something that's not--if it's something
which you don't foresee having commercial interest, my theory is that by opening it
up and by recruiting more collaborators, your science will actually go faster; that's my
theory. But I--no one has done that yet; I don't know if it's going to work. There was
a nice example last year, I think, a guy called Sean Cutler, who is a biologist; he's at
Riverside, I think, who had a nice result that he found. He's a plant biologist--had
a nice result that he found and rather than publishing his nice result, he went to actively
recruit his competitors. And then together, they published a much larger piece of work
in the journal Science, which is an incredibly high-impact paper and amazingly well cited.
The idea is that you can go and actively get people to help you out. It's anathema to
a lot of scientists who maybe don't usually want to trust their competitors with what
they've done. But I think this is--this is the next exciting frontier to me is whether
it accelerates your science. If it does, then I think [INDISTINCT] committees would be rather
excited about it. All right, thanks--thanks guys.