G-DOC -- Enabling Systems Medicine through Innovations in Informatics: Dr. Subha Madhavan

Uploaded by NCIevents on 16.07.2012

>> Thank you very much, Tony.
I appreciate the introduction.
Hello everybody, my name is Subha Madhavan.
I am responsible for Biomedical Informatics at Georgetown University.
I'd like to thank the NCI for inviting me to present at the CCC
and for giving me this opportunity to talk about this project, the Georgetown Database
of Cancer, also known as G-DOC.
So, moving right along, what I'm going to do is initially set the stage
with some definitions and motivation for this project at Georgetown,
and talk through some of the features and functionalities that are part
of this database. You'll see as I present
that it's more than just a database:
it has a number of tools and analytic methods associated with it. Then I'll dive right
into a scientific use case on colorectal cancer,
a stage 2 CRC study that we're working on with a number of clinicians at Georgetown,
and I'd like to show you how we have leveraged this G-DOC platform
to identify molecular signatures of relapse in colorectal cancer.
That is the scientific example that I will use
to demonstrate the functionalities of G-DOC.
So here is a quick definition slide.
My title is "enabling systems medicine utilizing G-DOC,"
so I wanted to define the buzzword up front,
which has been quite commonly used these days.
I don't know if you can see these highlighted words here,
but rather than reading the entire definition, I'd
like to draw your attention just to these highlighted words.
Systems medicine is really the application
of systems biology approach to biomedical problems.
The other term to keep in mind is effective and individualized diagnoses,
which I'm sure many of us on the call are working on.
And the other big concept to keep in mind as we work
through different projects in systems medicine is terabytes of data.
I don't think this is new to anybody here on the call.
We're dealing with enormous amounts of data coming
out of the multiple instruments that are being leveraged for biomedical research,
which usually calls for the development of new types of tools.
And we are taking G-DOC in the direction of being able to deal
with these terabytes of data.
So, what are the driving factors for G-DOC?
Early on we realized that we need to have an information continuum.
We work very closely with clinicians at the Lombardi Cancer Center
who provide care to a number of cancer patients
in the Washington, D.C. metropolitan area.
They are also involved, as physician scientists, in various cancer research projects,
and of course they'd like to take what's learned
through research projects back
into the clinic, and vice versa:
what we learn from the clinic we'd like to bring back
into the research field to help develop hypotheses.
We would like to incorporate the full evidence base
in clinical research and care settings.
We'd like to collect data once and use it multiple times,
so we use electronic health records
and hospital information systems to collect information.
We'd like to collect it once, very carefully, and we want that data to be
of high quality so that we can leverage it for secondary use
of clinical care data, which is research.
And we'd like to connect the research platforms.
Many of you may be coming from cancer centers and large academic medical centers
where you are very familiar with the core facility type of setup,
and sometimes core facilities are so siloed that it's hard
to pull data together and integrate it from the various genomics, proteomics
and imaging core facilities.
So, one of the driving factors
for G-DOC was: how do we connect these research platforms
across these different core facilities
to accelerate the discovery and validation process?
And we also wanted to effectively utilize molecular and clinical information
to ultimately transform patient care.
So, with these as the driving factors,
the vision for G-DOC is shown on this slide, and I'd just like to point
out that the base for this entire program is the researchers
at the Lombardi Cancer Center and, more broadly, at the Medical Center
at Georgetown, plus a growing number of external collaborators
who work very closely with us on a number of research projects.
These are all building blocks.
There's clinical data.
There is behavioral data and epidemiological data coming
out of our population sciences group.
We have a histopathology and tissue banking core facility
which collects tumor specimens, and normal specimens as well, from patients
who are treated at the Lombardi Cancer Center
and who are enrolled on IRB-approved protocols.
A number of datasets are collected from these patients.
I name a few here, and we would like
to integrate all these datasets within G-DOC.
And as you can imagine, a project
like this needs close collaboration between biomathematicians,
biostatisticians and the IT team, and here
at Georgetown all of them are very heavily involved.
And ultimately we'd like to achieve the goals of cancer prevention,
effective prognosis and prediction,
and biomarker and drug development, thereby leading
to systems medicine-based clinical practice.
So, diving right in, we have been developing this project
for the past three years now,
and one of the initial decisions we made was not to develop anything
from scratch where a solution already exists.
So, what we built was an integration-ready platform,
and what I show you here on this slide is a number of third-party tools
that we have integrated within this platform.
You will recognize many of these,
and not all of them are open source and free products.
Some of them are commercial solutions.
So where appropriate commercial solutions are available,
we have integrated that particular third-party software.
So, I'll just name a few here.
JBrowse is a JavaScript-based genome browser,
which many of you may be familiar with.
We use it for genome data visualization.
Java-based heatmap viewers have been integrated.
We use REDCap for clinical research data collection
for non-NCI-funded studies;
only NCI-funded studies use C3D, which is Oracle Clinical.
We use the Ingenuity Variant Analysis platform for variants, as well
as Ingenuity for pathway analysis of the genomic and proteomic data.
We use the open source, free Cytoscape tools for network visualization and analysis,
and another commercial product called Pathway Studio
that provides features complementary to the Ingenuity tool:
it allows for literature mining,
and it has its own internal pathway and knowledge databases.
We leverage Jmol for visualizing 3D structures.
And as we identify new methods and new tools out there
that are useful to our investigators, we continue to integrate them within G-DOC.
So, here is a quick image; if you go to our Neoplasia publication
from October 2011, you will see a detailed description of this.
We have multiple data sources that we preprocess.
These include publicly available data, such as from GEO or EBI,
the COSMIC mutation database, or PharmGKB.
Not all of them are listed here.
We also have internal collaborators who provide data to us
for various research consortium projects.
We have a number of bioinformatics pipelines, either developed internally
or adopted and adapted from existing pipelines, where we process a number
of the data types that are listed here.
There's a lot of data standardization and mapping to standards that goes on.
We use lightweight standards.
I won't say that we spend a lot of time and energy on the standards themselves,
but we do have a couple of research associates who sit down at the table,
look at the data and do the mapping to standards
where appropriate and possible,
prior to loading it into the G-DOC IT infrastructure.
The back end is an Oracle database.
We have a separate analysis cluster with four compute nodes
which run the various types of analysis shown here.
I'll show some of these features and we'll go through them.
And we also link out to the external annotation databases listed here.
So, as you can see in the modular architecture, the goal is not
to replicate what's being done really well elsewhere.
We simply link out to those resources where possible.
So, just very quickly, let's look at some of the features
and functionalities before I show you what we are able to do with this platform.
If you go to gdoc.georgetown.edu, which is an https site,
this is the splash page that will come up.
If you click on "register now," you will be asked
to select whether you are a public user or a Georgetown user.
In simply one click, all we ask for is your email ID,
and it will immediately provide you with a registered user name and password.
You'll be able to log into the system to browse the various
public studies that are present there.
If you are closely associated with us and are working on various projects
with us, then we will probably be able to give you access to some
of the private studies that are present in G-DOC as well.
So, we have multiple levels of security
that we have implemented within G-DOC.
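As an illustration only (not G-DOC's actual implementation), the public/private access rule described above can be sketched as a simple check, assuming each study carries an `is_public` flag and each user carries a set of private study IDs they have been granted. The class and function names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    email: str
    # Hypothetical: IDs of private studies this user has been granted access to.
    granted_studies: set = field(default_factory=set)


@dataclass
class StudyAccess:
    study_id: str
    is_public: bool


def can_view_data(user: User, study: StudyAccess) -> bool:
    """Public studies are open to any registered user; private ones need a grant."""
    return study.is_public or study.study_id in user.granted_studies


user = User("someone@example.org")
print(can_view_data(user, StudyAccess("public_study", True)))    # True
print(can_view_data(user, StudyAccess("private_study", False)))  # False
```

A real deployment would add more levels (for example, metadata visible to every registered user, data only to granted users), but the core decision is this kind of membership test.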
So, we have about 5,500 patients within the database; that number is a little
bit outdated, and I think we now have slightly more than that.
And these indicate the different types of data for each one of those studies.
So, for example, in this stomach cancer study you have clinical data
and microarray data.
In this colorectal cancer study, which I will show you in a little more
detail, we have copy number datasets.
We have clinical data.
We have metabolomic information.
We also have some cell line data.
So, this gives you a quick glimpse, for each type of cancer,
of the number of patients, the number of studies and the biospecimen count,
as well as the other types of data that are available
for each of those cancer types.
We spent a lot of time thinking about how to present this data,
because we serve a number of users, and users range
from really sophisticated computational biologists and
biomathematicians all the way through to physician scientists who want
to do some targeted data mining and casual browsing.
So, we had to spend a lot of time
making something useful for each type of user.
There are a number of tutorials that we've prepared
to help orient people to these functionalities.
There's a quick start which has some really popular features.
In just a couple of clicks you can find out what's in the database
for your disease of interest or your biomarkers, and you can jump right
in without having to go through the more complex analytical tools.
We have other analytical tools
that we have integrated that'll get you to first base.
This is not a system that's going to help you avoid your biostatistician,
but it does help with hypothesis generation.
Of course you need to sit down with your biomathematician or biostatistician
to really understand your data and the analysis
and to help you write your grants.
We have a study page. Just to take a step back, the unit of data organization
within G-DOC is a study.
A study can contain 450 patients or it can contain 20 patients,
but that's the unit,
and we can have multiple data types associated with a particular study.
So, that's how the data is organized.
Each one of these is a study.
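To make "a study is the unit of organization" concrete, here is a minimal in-memory sketch, assuming a study holds its patients plus a mapping from data type to per-patient measurements. The `Study` class and its fields are hypothetical and are not G-DOC's Oracle schema.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    name: str
    disease: str
    point_of_contact: str
    patient_ids: list = field(default_factory=list)
    # Data type name -> {patient_id: measurements}; type names are illustrative.
    data: dict = field(default_factory=dict)

    def add_data(self, data_type, patient_id, values):
        # Each study can carry any number of data types.
        self.data.setdefault(data_type, {})[patient_id] = values

    def data_types(self):
        return sorted(self.data)

    def patients_with(self, data_type):
        return sorted(self.data.get(data_type, {}))


crc = Study("Stage 2 CRC", "colorectal", "contact@example.org", ["P1", "P2"])
crc.add_data("clinical", "P1", {"relapse": True})
crc.add_data("clinical", "P2", {"relapse": False})
crc.add_data("copy_number", "P1", {"4q": -0.7})
print(crc.data_types())  # ['clinical', 'copy_number']
```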
And there's a brief description,
so if you're a registered G-DOC user you will have access to this view.
Even for private studies, you'll see
all the metadata about the study, and when you click
on that link it will provide you with the point of contact for that study,
and you can contact them to obtain access to that particular study.
We, being the statisticians and the IT team, really don't want to take
on the ownership of providing access to people,
so we simply direct them to the point of contact.
So, Rebecca Riggins is the point of contact for this breast cancer study
on tamoxifen, and you can contact her to obtain access to that particular study.
We have a really fantastic drug discovery group here at Georgetown,
which has collected over 50,000 small molecules in a small molecule library
that they utilize for drug discovery.
We have not imported all of that data,
but we have some very targeted breast cancer and colorectal cancer targets.
These are geared towards some existing funded projects,
so small molecules associated with those targets have been imported
into the G-DOC system, and you can plug in a target of interest.
You can say, for EGFR, show me all small molecules that are available
within the database, and it shows you the small molecules that target EGFR
along with their physicochemical properties.
We are also integrated with Marvin Draw,
which is a really nice structure drawing tool, so you can draw a structure
and ask: are there small molecules that are similar to the structure
that I've drawn, and are there any cancer targets
that are associated with them?
This was primarily driven by the Sony program and the cheminformaticians
who are part of the drug discovery group.
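Structure similarity searches like the one described above are commonly scored with the Tanimoto coefficient on molecular fingerprints. The sketch below assumes fingerprints are already computed and represented as sets of "on" bit indices; it does not use Marvin or any cheminformatics library, and the 0.7 default threshold is an arbitrary assumption.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_molecules(query_fp, library, threshold=0.7):
    """Return (name, score) pairs at or above threshold, best match first.

    library: hypothetical mapping of molecule name -> fingerprint bit set.
    """
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in hits if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


library = {"mol_A": {1, 2, 3}, "mol_B": {7, 8}}
print(similar_molecules({1, 2, 3, 4}, library, threshold=0.5))  # [('mol_A', 0.75)]
```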
This is the quick start view that I talked
about: you can select the disease area of interest,
and it will show you the number of studies that are present
for that particular disease area of interest
and the endpoint that you're looking for.
So, if you're looking for breast cancer studies with patient data
that have relapse and no-relapse information associated with them,
these are the studies that are already present in the database.
It shows you the number of patients that show relapse;
the histograms indicate the clinical phenotype here.
The green is no relapse and the red indicates relapse.
And right from this menu you can right-click,
and it will give you a number of data analysis options.
You can do things like outcome analysis with a Kaplan-Meier survival plot.
You can do basic t-tests,
linear modeling, and group comparison analysis.
You can run classification experiments.
You can draw a heatmap, for example.
So, basic analysis can be done right from our quick start menu.
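For readers less familiar with the outcome analysis mentioned above, here is a minimal Kaplan-Meier estimator, assuming follow-up times with an event flag (1 = relapse observed, 0 = censored). It is a textbook sketch, not the code behind G-DOC's quick start.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times: follow-up times; events: 1 if relapse observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects sharing this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```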
Here are some quick screenshots.
In the interest of time, I'll just skip through.
Here is JBrowse, and we have modified JBrowse
to fit the needs of our various projects.
These light blue tracks are the various annotations that are available.
Those of you that are familiar with using JBrowse
will find these very familiar.
We've also added patient tracks;
our physician scientists were very interested in looking
at the copy number profile.
Red indicates amplification and blue indicates deletion
in the genomes of these particular patients.
And they were interested in seeing these tracks for each
of these individual patients, but also associated with the clinical data.
So, if you mouse over the patient button,
it immediately shows you their stage and grade
and the outcome data that's available for them.
This is not the entire clinical dataset,
but it's a minimal clinical dataset
that our physician scientists were interested in viewing
in conjunction with the omic information.
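The red/blue patient tracks amount to labeling copy number segments by their log2 ratio. A minimal sketch, assuming segments arrive as `(chrom, start, end, log2_ratio)` tuples and using hypothetical cutoffs of +/-0.3 (G-DOC's actual thresholds are not stated in the talk):

```python
def classify_segments(segments, gain=0.3, loss=-0.3):
    """Label each (chrom, start, end, log2_ratio) segment for a browser track.

    gain/loss are assumed log2-ratio cutoffs, not values from G-DOC.
    """
    tracks = []
    for chrom, start, end, log2_ratio in segments:
        if log2_ratio >= gain:
            state, color = "amplification", "red"
        elif log2_ratio <= loss:
            state, color = "deletion", "blue"
        else:
            state, color = "neutral", None  # not drawn
        tracks.append({"chrom": chrom, "start": start, "end": end,
                       "state": state, "color": color})
    return tracks


for seg in classify_segments([("chr4", 0, 1000, -0.8), ("chr8", 0, 500, 0.6)]):
    print(seg["chrom"], seg["state"], seg["color"])
```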
You can also drag and drop the other tracks,
so if you look at this region of amplification, it shows you that
a number of microRNAs line up with that abnormality.
So, you can start integrating the copy number data
with microRNA information to better understand that particular genomic region
and its impact on the cancer outcome that you're studying.
I think this is really just scratching the surface.
I would encourage people to register, log in and browse for yourselves
the various functionalities and the public datasets
that are available for you to mine.
So, I want to switch gears a little bit and give you a very specific example
of how we are leveraging the data integration platform
and data analysis platform to support colorectal cancer studies.
This is part of Lombardi's Ruesch Center for GI cancers; working very closely
with Dr. Lou Weiner and Dr. John Marshall,
who are both colorectal cancer oncologists,
we designed this project.
The project has been ongoing for about two years.
It has been fairly challenging, as you can imagine,
to get really high quality biospecimen data and clinical annotations associated
with them to study stage 2 colorectal cancer.
As many of you might know, CRC is the third most commonly diagnosed cancer
in the United States.
About 140,000 new cases are expected to be diagnosed in 2012,
and about 50,000 deaths from colorectal cancer
are expected to occur in 2012.
And there have been great efforts to identify molecular signatures
in colorectal cancer to serve as prognostic markers of recurrence,
because many of these CRCs recur
after surgery has been conducted on these patients.
And so we have to identify the subgroup of patients
who benefit from chemotherapy.
So, we focused on a very specific group of CRC patients.
We focused on stage 2 colorectal cancer patients
because they are a very unique set.
About 75 percent of stage 2 CRC patients are cured after surgery,
but about 20 to 25 percent of them experience relapse
and ultimately die of metastatic disease.
So, there are a number of efforts ongoing,
nationally and internationally, to use baseline data:
if you were able to tell on the day of surgery whether
or not a patient is going to have a recurrence of colorectal cancer,
then you could obviously develop a much better treatment plan
for that particular patient.
So, with that as the goal, we started
with a training set of 20 relapse-free patients
and 20 relapse patients.
And we collected a number of data types on these patients.
We extracted DNA and RNA, collected biofluids, and analyzed them
through copy number analysis, gene expression profiling,
microRNA analysis and metabolomic analysis.
All of these datasets, along with the clinical data, which were mapped to ICD-9
and SNOMED CT codes, were loaded into the G-DOC system.
So, let me start with the bottom line before I show you the kinds
of analysis that we did.
When we compared the relapse
and nonrelapse cases, we found 37 cytobands,
using the SNP 6.0 copy number array, that were
differentially amplified or deleted in the relapsed cases
as compared to the nonrelapsed cases.
About 720 reporters from the Affymetrix U133 Plus 2.0 gene expression arrays
were differentially expressed between the relapsed and nonrelapsed groups.
About 34 microRNAs from the tissue RNA
and about 8 microRNAs from serum were differentially expressed
between the relapse and nonrelapse groups.
And we also found about 77 and 47 differentially expressed peaks, respectively,
in serum and urine metabolites.
Serum and urine were collected on the day of surgery
from these patients and run through mass spec experiments
to analyze the metabolomic fingerprints.
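Counts like "720 reporters differentially expressed" typically come from a per-feature two-group test followed by multiple-testing control. The talk does not specify the exact procedure, so the sketch below shows one common combination as an assumption: Welch's t statistic per feature and Benjamini-Hochberg false discovery rate control on the resulting p-values (computing p-values from t requires a t-distribution CDF, which is omitted here).

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return sorted indices of hypotheses rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Largest rank k with p_(k) <= k/m * alpha; reject ranks 1..k.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    return sorted(order[:cutoff])


relapse, nonrelapse = [4.1, 5.0, 4.6], [2.0, 2.4, 1.9]  # made-up expression values
print(round(welch_t(relapse, nonrelapse), 2))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))  # [0]
```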
And I'll show you some of the somatic mutation analysis.
This analysis isn't complete at the moment,
but we have some early results we can share with you.
This is gene expression data, and you see a heatmap here
with really nice clustering; the red shows relapse and the green shows nonrelapse.
And using the differentially expressed markers, this is the principal component
analysis, which shows a really nice separation
of the nonrelapsed cases from the relapsed cases.
Each single dot is a patient;
the green dots indicate nonrelapse and the red dots indicate relapsed cases.
When we analyzed further, we took the 720 reporters and mapped them
to various biological pathways
and gene ontology terms, and this is what we found.
Inflammatory response and infectious diseases came out as one
of the top-ranking biological processes, which seem
to be highly implicated in cancer relapse.
A whole body of evidence is now being generated
that indicates the involvement of inflammatory response
and immune response pathways in various types of cancer.
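Mapping a gene list onto pathways or GO terms usually comes down to an over-representation test. As an assumption (the talk used commercial tools whose exact statistics are not described here), this sketch computes the one-sided hypergeometric p-value, equivalent to a right-tailed Fisher's exact test, for seeing at least `hits` pathway genes in a list drawn from a gene universe.

```python
from math import comb


def enrichment_pvalue(hits, list_size, pathway_size, universe_size):
    """P(X >= hits) when list_size genes are drawn from universe_size,
    of which pathway_size belong to the pathway (hypergeometric tail)."""
    upper = min(list_size, pathway_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Illustrative numbers only: 720 genes in the list, a 200-gene pathway,
# 20,000 genes in the universe, 25 list genes falling in the pathway.
print(enrichment_pvalue(25, 720, 200, 20000))
```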
So, I just have some pictures here to show you.
This is gap junction regulation.
This is the STAT signaling pathway, all of which are associated
with inflammatory and immune responses.
These are genes in our sets that were either downregulated or upregulated.
The upregulated ones are shown in red.
These are complexes,
so it shows you not just the genes
but also the protein complexes and the small molecules.
So, this is the inflammatory response pathway.
MAPK, PD5 and EGFR were highly overexpressed, as shown by
the intensity of the colors.
I don't know if you can see this very well.
The intensity indicates the level of expression, and the blue indicates
underexpression of genes; interleukin 18, for example, is underexpressed
in the relapse cases compared to the nonrelapse cases.
Same thing with the immune response; as I mentioned, there's a growing body
of evidence implicating inflammation-related molecules
and their association with colon adenocarcinoma.
So that was reassuring as we went through this analysis.
So, what I'm showing you now is individual, data type
by data type analysis, and then I'll show you how we
put all of that together.
This is microRNA data.
I'm just showing you the serum microRNA information.
We had 8 microRNAs that were differentially expressed.
MicroRNAs, for those of you who are not familiar, are small noncoding RNAs
about 22 nucleotides in length; they bind to mRNA and act
as post-transcriptional regulators that impact translation.
Close to 1,000
human microRNAs have been documented,
and they are thought to regulate about 60 to 80 percent of genes.
There is a lot of disease-associated research on microRNAs,
especially in the area of cancer, so we mapped these microRNAs
to a human microRNA disease database, were able
to identify what those microRNAs were, and used them again
for principal component analysis, which shows a really nice separation.
So, we have talked about two data types, gene expression data and
microRNA data, and now this is copy number information.
Recall that there were about 37 cytobands that were different
between the relapse and nonrelapse groups.
We used a method called the CIN index,
a chromosomal instability index; part
of that method's development was funded by the NCI through the ISRCE program.
What you're looking at here are the results of the CIN index.
This diagram shows you overall copy number:
overall deletion and amplification.
There is a green line here which separates the relapse cases
from the nonrelapse cases.
Let's just focus on the losses here,
because there are definitely a lot of chromosomal losses going
on in the relapse cases compared to the relapse-free cases.
If you zoom in on the loss diagram, this is what you get.
We zoomed in on chromosome 4, and in chromosome 4,
in the relapse cases, which are all these bottom samples,
the signature shows quite a bit
of loss compared to the relapse-free cases.
We used that same information to draw a heatmap of the losses
and amplifications that we saw.
Using the CIN index analysis, there's a lot going on in chromosome 1
and chromosome 4, both of which are known in the literature.
I'm just giving you some examples here: chromosome 4q losses,
and deletion in chromosome 4q predicts outcome in stage 2 colorectal cancer.
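The published CIN index has its own definition, which is not reproduced here. As a deliberately simplified stand-in (a hypothetical score, not the CIN index), the sketch below summarizes per-sample instability as the average excess absolute log2 ratio beyond a threshold, and measures the fraction of cytobands on an arm such as 4q that show loss.

```python
def instability_score(log2_ratios, threshold=0.3):
    """Simplified per-sample score (NOT the published CIN index):
    mean excess |log2 ratio| beyond threshold across all cytobands.

    log2_ratios: mapping cytoband name -> log2 copy-number ratio.
    """
    if not log2_ratios:
        return 0.0
    excess = [abs(r) - threshold for r in log2_ratios.values() if abs(r) > threshold]
    return sum(excess) / len(log2_ratios)


def loss_fraction(log2_ratios, arm_prefix, threshold=0.3):
    """Fraction of cytobands starting with arm_prefix (e.g. '4q') showing loss."""
    bands = [r for band, r in log2_ratios.items() if band.startswith(arm_prefix)]
    if not bands:
        return 0.0
    return sum(1 for r in bands if r < -threshold) / len(bands)


sample = {"4q21": -0.8, "4q28": -0.5, "4p15": 0.0, "1p36": 0.2}  # made-up values
print(loss_fraction(sample, "4q"))  # 1.0
```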
That was reassuring. Moving on to the metabolomics information:
about 2,500 human metabolites have been characterized.
We used mass spec to identify the chemical fingerprints
of the small molecules that cellular processes leave behind.
We did this in both urine and serum.
This is the pipeline; metabolomics informatics is
such an evolving field that this is something we had to develop in-house
by pulling together a number of R and Bioconductor modules.
After the mass spec and data processing, what we get are m/z values,
so we get a data matrix that looks like this.
We apply linear modeling and moderated t-statistics to it,
and we utilize about five different databases because, of course,
we are not even close to having all the human metabolites mapped.
So we wanted not to rely on just one
human metabolome database,
and we developed an evidence code based on what those m/z values map to:
which different metabolites do they map to, so
that we can use them for validating whether
or not this metabolomic signature is, in fact,
true in the relapsed cases versus the nonrelapsed cases.
We then took the metabolites
and conducted pathway analysis and network analysis.
This is the list of putative metabolites that came out of
the pipeline I just showed you.
We also looked at how many hits we get across these multiple databases
for the m/z values that we get from the mass spec analysis.
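The evidence-code idea (how many databases support a putative identification for an m/z value) can be sketched as matching observed m/z values against reference masses within a parts-per-million tolerance and counting supporting databases. The database contents, the 10 ppm tolerance, and the function names are assumptions for illustration; the Georgetown pipeline itself was built from R and Bioconductor modules.

```python
def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return abs(observed - theoretical) / theoretical * 1e6


def annotate_mz(mz_values, databases, tol_ppm=10.0):
    """databases: {db_name: {metabolite: theoretical m/z}} (hypothetical format).

    Returns {mz: {metabolite: number of databases supporting the match}}.
    """
    annotations = {}
    for mz in mz_values:
        support = {}
        for db in databases.values():
            for name, ref_mz in db.items():
                if ppm_error(mz, ref_mz) <= tol_ppm:
                    support[name] = support.get(name, 0) + 1
        annotations[mz] = support
    return annotations


dbs = {  # illustrative reference values only
    "db1": {"nicotinamide": 123.0553},
    "db2": {"nicotinamide": 123.0553, "other": 200.0},
}
print(annotate_mz([123.0560, 300.0], dbs))
```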
After the analysis, one of the interesting features that came
out was the nicotinate and nicotinamide metabolism pathway, and we wondered
if this had anything to do with smoking, because we had a lot
of smoking and beer consumption information on the patients
that we collected these samples from.
I'm showing you here the nicotinate and nicotinamide metabolism pathway,
and these metabolites were highly differentially expressed
in relapsed cases compared to the nonrelapsed cases.
When we went back to the literature, we found that SNPs
had been identified in genes that are part of the nicotinate
and nicotinamide metabolism pathway, and they have been associated
with colorectal cancer etiology.
What we're looking at is much further downstream, at the
metabolomic level, compared to the genomic variant data that's published here.
But it was reassuring again to see
that our results were consistent with the literature.
So, now that we had analyzed each one of these data types, our goal was:
can we identify which ones have the best predictive power?
Which data type gives us the best classifiers?
We used a support vector machine here,
and I'm showing you the area under the curve.
The AUC values are presented here.
I have a little table that compares the AUC values
across these different data types, and, as we expected
because the PCA was really strong, metabolomics really stood out.
The urine metabolomic information gives us the highest predictive power
for relapse.
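The AUC values compared above can be computed directly from classifier scores: the AUC equals the probability that a randomly chosen relapse case scores higher than a randomly chosen nonrelapse case (the Mann-Whitney formulation, with ties counted as one half). This sketch assumes scores from any classifier (such as an SVM decision value) and labels coded 1 for relapse, 0 for nonrelapse.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison of relapse (1) vs nonrelapse (0) scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


print(roc_auc([0.9, 0.2, 0.5, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is O(n^2), which is fine for 40 samples; larger studies would usually use a rank-based formula instead.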
Now keep in mind that this is a small dataset of only 40 samples,
so we had to go back and request additional samples.
We have done that now.
We have an additional set of 40 samples that we're going to use as a test set
to validate this information.
So, how are we now integrating all of these data types?
We're working closely with the group at the Institute for Systems Biology,
Ilya Shmulevich's group.
We work with them on a project called Regulome Explorer,
which uses random forest analysis in the back end,
and they have modified the random forest method to allow
for multiple data type integration.
You can do an all-versus-all analysis:
if you're interested
in studying colorectal relapse, you can incorporate the features
from gene expression, metabolomics and copy number data into one viewer
to see where the hubs are
and which of these datasets are giving us the strongest signal.
This is the Regulome Explorer workflow.
In the interest of time, I won't go through it;
these slides will be available for you to review at your leisure.
These methods are described only briefly here, but if you search
for RF-ACE, I think there's already a publication out there.
So, this is how the data looks after it comes
out of random forest analysis; we have loaded the colorectal cancer data.
What you're looking at are the top features,
the top 40 features when you put together microRNA,
metabolomic, gene expression
and copy number information. Again, we are looking for hubs here,
and these different colors indicate different regulators.
So, there's something going on in chromosome 3
which is regulating another element here in chromosome 8.
So it immediately shows you, in the Circos-type plot viewer,
where the activity is and who is regulating whom.
And these different colors indicate different data types.
The brown, I believe, is metabolomic.
The blue indicates gene expression.
The clinical features are shown here on top of the Circos plot,
and these are all the different chromosomes:
you're looking at chromosomes 1 through 22 and the sex chromosomes.
What I'm not showing you here is that you can actually click on each one of these
and visualize what the association is.
They also provide a network view.
Rather than just the top 40 features, here we took 100 features,
and it helps you put together the copy number variations,
the metabolomic data and the microRNA data.
Based on the literature, it tries to connect them
into this network view of all the potentially associated features
of colorectal cancer relapse, which are all connected.
So, this is one way to view the data if you're looking for
what is the first hop, what is the second hop, and how they are impacting
and regulating the genes that you can see in this first hop.
So, it lets you prioritize the various features.
We tried to look at this data in Regulome Explorer categorized
as those features most immediately affecting relapse versus those
that are somewhat removed, and what impact they might have.
So, we then employed multiple machine learning approaches:
random forest analysis,
and we also used a Bayesian network model, and compared the features that came
out of both of these methods just to convince ourselves
that we're on the right track.
Out of about 40 features, 26 overlapped,
and many of them are metabolomic.
As you can see here, many of the metabolomic ones seem
to have highly significant p-values in both the random forest
and the Bayesian network analyses.
So, our last step after we identified these top features was to look
at the mutation information:
is there anything in the somatic mutation data that can tell us more
about CRC relapse?
This was meant to be a multi-omic view of colorectal cancer
relapse, and we are now just beginning to look at mutation data.
What you're looking at here is the Ingenuity Variant Analysis tool
with the 40 samples; this is exome sequencing data.
We used EdBio as the vendor to provide this sequencing for us,
and we can also summarize the data.
So, you can ask the question: where are those variants present?
We want to look at just those variants that are present in the relapse cases
and not in the nonrelapse cases, and summarize that data by gene.
And of course, as expected, we find APC,
which is a highly implicated gene in colorectal cancer.
There are other novel ones that we are hitting as well.
These are all pending analysis.
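The filter described above (keep variants seen in relapse cases but never in nonrelapse cases, then summarize by gene) can be sketched with plain sets. The input format of `(sample_id, gene, variant_id)` tuples is an assumption for illustration; Ingenuity Variant Analysis performs this filtering through its own interface.

```python
from collections import defaultdict


def relapse_specific_genes(variants, relapse_ids, nonrelapse_ids):
    """variants: iterable of (sample_id, gene, variant_id) tuples (assumed format).

    Keeps variants observed in relapse samples and never in nonrelapse samples,
    then reports per gene the number of such variants and relapse samples hit.
    """
    relapse_ids, nonrelapse_ids = set(relapse_ids), set(nonrelapse_ids)
    seen_in = defaultdict(set)
    gene_of = {}
    for sample, gene, var in variants:
        seen_in[var].add(sample)
        gene_of[var] = gene
    per_gene = defaultdict(lambda: {"variants": set(), "samples": set()})
    for var, samples in seen_in.items():
        if samples & nonrelapse_ids:
            continue  # present in a nonrelapse case: not relapse-specific
        hits = samples & relapse_ids
        if hits:
            entry = per_gene[gene_of[var]]
            entry["variants"].add(var)
            entry["samples"] |= hits
    return {gene: {"n_variants": len(v["variants"]), "n_samples": len(v["samples"])}
            for gene, v in per_gene.items()}


calls = [("R1", "APC", "v1"), ("R2", "APC", "v2"), ("N1", "KRAS", "v3"), ("R1", "KRAS", "v3")]
print(relapse_specific_genes(calls, ["R1", "R2"], ["N1"]))
```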
We then take the gene-level summaries of the exome data
and ask: what are the top-ranked protein complexes
that may be impacting relapse?
We found adhesion molecules
and growth factor receptor networks. For this we used the top 18 genes,
which had about 38 variants; these were present in 80 percent
of the cases and not present in the controls.
So this is a very nice view of the variant data,
where you can look at different topics of interest.
Right now, the metastatic signaling pathway view asks the question:
where are my variants present?
It color-codes all the genes that have variants,
and it also tells you what kind of variant it is:
is it heterozygous,
is it homozygous, and so on; the different color coding tells you what
kind of variant it is.
This is all still pending further analysis, but initial analysis
of the somatic mutation data again indicates inflammatory response as one
of the top biological functions implicated in CRC relapse.
So, we are in the process of writing up this manuscript
for PLoS Computational Biology.
Hopefully it will undergo a quick review.
So, I think I'm at the last stage of my presentation here.
Switching gears to tell you a little bit about the direction
in which we are headed: we are working closely with the Amazon cloud services group,
and we have adopted the Amazon Machine Image model
to work with our collaborators,
as collaborators are interested in a platform like this
for their own disease portals. We're working on pediatric metabolic disorders,
dementia, TBI and other neurological disorders, and preterm birth, for example.
So we have created G-CODE, which is the framework that they use
to create these disease-specific portals,
and we have now deployed G-CODE on the Amazon cloud.
We are also able to take advantage of S3 storage, where there are terabytes
of information in next-generation sequencing datasets.
G-CODE sits right next to it, so we can pull information
from the next-generation sequencing data into G-CODE for various analyses.
I don't have to talk about the positive aspects of cloud computing here,
but these were some of the reasons why we are heavily investing in the cloud.