Export Datasets from InSilicoDB to GenePattern

Uploaded by broadcancerdev on 29.03.2012

In this tutorial we will demonstrate how to export data from InSilicoDB to GenePattern
and create a heatmap with that data. InSilicoDB is an open platform for managing
and analyzing genomic datasets. It contains thousands of public samples and provides connections
to analysis tools such as GenePattern. More information about InSilicoDB can be found
at their website: insilico.ulb.ac.be. We’ll start by browsing the public datasets.
For this tutorial we’ll use the GEO dataset GSE4635. Click “Edit/Show clinical annotations”.
Next, we need to select a curation, or create a new one. For this tutorial we’ll select
the existing GEO annotation. Once this has been selected, we can close the window.
We can now export this dataset to GenePattern. To do so, click on the black arrow next to
the GenePattern icon and the word “Export”; this will display the export options.
We will leave the options at their default values and click Export.
If you are not already logged into InSilicoDB, you will need to do so now. Once logged in,
you will be prompted for your GenePattern public server login. The data will then be
sent to an import module in GenePattern, and you can click “Open GenePattern” to see your
job as it completes on the Job Results page. (Note that you will need to refresh the browser
to see progress.) Now that the job has completed, all of the
clinical annotations from InSilicoDB are available to use in GenePattern. Let’s first
save the phenotype of interest as a cls file, smokers.cls. We can then open ComparativeMarkerSelection
on our expression data by clicking the arrow button next to the .gct file and selecting
ComparativeMarkerSelection from the list that appears.
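
If you want to inspect the exported .gct file outside GenePattern, a small reader can be sketched in Python. This is only an illustration, assuming the standard GCT 1.2 layout (a "#1.2" version line, a line with the row and column counts, a header row beginning with Name and Description, then one tab-separated row per feature). The function name parse_gct and the tiny sample data are hypothetical and are not part of InSilicoDB or GenePattern.

```python
import io


def parse_gct(handle):
    """Read a GCT 1.2 file into (sample_names, {feature_name: [values]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version line: {version!r}")
    # Second line holds the number of features (rows) and samples (columns).
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    header = handle.readline().rstrip("\n").split("\t")
    samples = header[2:]  # the first two columns are Name and Description
    rows = {}
    for line in handle:
        if not line.strip():
            continue  # tolerate trailing blank lines
        fields = line.rstrip("\n").split("\t")
        rows[fields[0]] = [float(v) for v in fields[2:]]
    if len(samples) != n_cols or len(rows) != n_rows:
        raise ValueError("data dimensions do not match the GCT header")
    return samples, rows


# Hypothetical two-feature, three-sample file, for illustration only.
example_gct = (
    "#1.2\n"
    "2\t3\n"
    "Name\tDescription\tS1\tS2\tS3\n"
    "geneA\tna\t7.1\t7.3\t9.0\n"
    "geneB\tna\t5.0\t5.2\t4.9\n"
)
samples, rows = parse_gct(io.StringIO(example_gct))
print(samples)
print(rows["geneA"])
```
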
ComparativeMarkerSelection is now displayed with the gct file already filled in. We now
need to specify our cls file. To be sure we get the cls file we want, click specify URL
and drag the smokers.cls file from the recent jobs panel on the right.
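
To check that a cls file such as smokers.cls holds the phenotype you expect, it can help to read it by hand. The sketch below assumes the categorical CLS layout (first line: number of samples, number of classes, and the number 1; second line: "#" followed by the class names; third line: one numeric class label per sample). The function name parse_cls, the class names, and the labels are hypothetical and are not taken from the real dataset.

```python
import io


def parse_cls(handle):
    """Read a categorical CLS file into (class_names, labels)."""
    lines = [line.strip() for line in handle if line.strip()]
    if len(lines) < 3:
        raise ValueError("a CLS file needs three non-empty lines")
    n_samples, n_classes = (int(x) for x in lines[0].split()[:2])
    if not lines[1].startswith("#"):
        raise ValueError("second line must start with '#'")
    class_names = lines[1].lstrip("#").split()
    # Assumption: labels are the integers 0..n_classes-1, one per sample.
    labels = [int(x) for x in lines[2].split()]
    if len(class_names) != n_classes or len(labels) != n_samples:
        raise ValueError("counts on the first line do not match the content")
    return class_names, labels


# Hypothetical five-sample phenotype, for illustration only.
example_cls = "5 2 1\n# non-smoker current-smoker\n0 0 1 1 0\n"
class_names, labels = parse_cls(io.StringIO(example_cls))
print(class_names)
print(labels)
```
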
We’ll also need to choose “yes” from the drop-down for the log transformed data parameter,
because all fRMA re-normalized gene expression datasets in InSilicoDB are log-transformed
and that was the normalization we chose when we exported from InSilicoDB.
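
One reason an analysis tool may need to know whether the data are already logged is that quantities such as fold change are computed differently on each scale. The Python sketch below illustrates that general point only; it does not reproduce what ComparativeMarkerSelection does internally, and the function names and numbers are hypothetical.

```python
import math


def log2_fold_change_from_raw(a, b):
    """On raw intensities, the log2 fold change is the log of the ratio."""
    return math.log2(a / b)


def log2_fold_change_from_logged(log_a, log_b):
    """On log2 values, the same quantity is a simple difference."""
    return log_a - log_b


# Hypothetical intensities for one gene in two groups.
raw_a, raw_b = 1024.0, 256.0
logged_a, logged_b = math.log2(raw_a), math.log2(raw_b)

fc_raw = log2_fold_change_from_raw(raw_a, raw_b)
fc_logged = log2_fold_change_from_logged(logged_a, logged_b)
print(fc_raw, fc_logged)
```

Both calls describe the same fourfold difference. Treating logged values as if they were raw (taking their ratio and logging again) would give a different, misleading number.
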
We can now run ComparativeMarkerSelection. Once that is finished, we can send the
resulting odf file to ExtractComparativeMarkerResults. Before that can be run, we must specify the
gct file from InSilicoDB as our dataset file. We’ll also set the field to filter features
on and the minimum value for that filter. Once that is done, we run the module.
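
The filtering idea in this step (choose a field, choose a minimum, keep only the features that pass, then subset the expression data) can be sketched in Python as follows. This is an illustration under stated assumptions and not the module's actual code: the field name "Score", the helper names, and the numbers are all hypothetical.

```python
def filter_features(stats, field, minimum):
    """Return the names of features whose value for `field` is >= minimum."""
    return [name for name, row in stats.items() if row[field] >= minimum]


def subset_rows(rows, keep):
    """Keep only the expression rows for the selected features."""
    return {name: rows[name] for name in keep if name in rows}


# Hypothetical per-feature statistics and expression values.
stats = {
    "geneA": {"Score": 4.2},
    "geneB": {"Score": 0.3},
    "geneC": {"Score": 2.5},
}
rows = {
    "geneA": [7.1, 7.3, 9.0],
    "geneB": [5.0, 5.2, 4.9],
    "geneC": [6.4, 6.1, 8.2],
}

kept = filter_features(stats, field="Score", minimum=2.0)
filtered = subset_rows(rows, kept)
print(kept)
print(filtered)
```
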
When the job completes, we send the gct file to HeatMapViewer.
The HeatMapViewer will now launch. We can then change the size of the heatmap by going
to View > view options and moving the sliders to adjust the size.
We’ll now label our samples, using the phenotype file we saved from InSilicoDB. To do so, select
“Label Samples…” from the File menu and browse to smokers.cls. The yellow samples
are non-smokers and the red samples are current smokers. You can click the colored bars to read labels
and change their colors. Let’s make one last adjustment and sort
our features. From the View menu, select Sort Features, choosing the smokers.cls file, Signal-To-Noise
and Sort Ascending. We now have a heatmap of over-expressed genes,
which came from a public GEO dataset exported from InSilicoDB and analyzed in GenePattern.
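
The Signal-To-Noise sort used above can be approximated by hand. The Python sketch below uses the common definition, the difference of the two class means divided by the sum of the two class standard deviations, and then sorts features in ascending order. GenePattern's own implementation may adjust the standard deviations or use a different sign convention, so treat this as an assumption. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(values, labels):
    """(mean of class 0 - mean of class 1) / (stdev of class 0 + stdev of class 1)."""
    class0 = [v for v, lab in zip(values, labels) if lab == 0]
    class1 = [v for v, lab in zip(values, labels) if lab == 1]
    noise = stdev(class0) + stdev(class1)  # needs >= 2 samples per class
    if noise == 0:
        raise ValueError("both classes have zero variance")
    return (mean(class0) - mean(class1)) / noise


# Hypothetical expression values: two samples per class.
labels = [0, 0, 1, 1]
rows = {
    "geneB": [6.0, 8.0, 2.0, 4.0],  # higher in class 0
    "geneA": [1.0, 3.0, 5.0, 7.0],  # higher in class 1
    "geneC": [4.0, 6.0, 4.0, 6.0],  # no difference
}

scores = {name: signal_to_noise(vals, labels) for name, vals in rows.items()}
ascending = sorted(scores, key=scores.get)
print(ascending)
```

With this sign convention, an ascending sort puts features that are higher in class 1 first and features that are higher in class 0 last.
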
For more information about InSilicoDB, please visit insilico.ulb.ac.be. For more information
about GenePattern, please visit genepattern.org. You can also find us on Twitter.