GEO2R: Analyze GEO Data

Uploaded by NCBINLM on 23.04.2012

This video demonstrates how to use GEO2R to analyze microarray data and to retrieve
a list of differentially expressed genes. For this example, I’m going to analyze GEO
Series GSE18388. Reading the summary, we learn that this study
is designed to investigate how gene expression changes in the thymus of mice that have been
subjected to 13 days of spaceflight. To start analyzing this study, scroll down
the page and find the ‘Analyze with GEO2R’ link.
Click on this link and you will be taken to the GEO2R analysis tool.
Before we get started with the analysis, note the ‘Full instructions’ link at the top
of the page. This link provides details about the edit
options and features that are not covered in this video and references for the R Bioconductor
statistical tests that form the basis of this tool. It also provides an important section
detailing GEO2R ‘Limitations and caveats’ that users need to be aware of.
Going back to GEO2R, the table lists the samples in this study, and their descriptions as extracted
from the original records. We can see that this study examines 8 samples and that the
variable is spaceflight: 4 of the samples are derived from the mice that were flown in space and
the other 4 are from the ground control mice. The first thing we need to do is to define
the groups of samples that we want to compare. Click ‘Define groups’ and type a group
name into the box. I can name the groups anything I want, but in this case I’ll type in ‘space-flown’,
press Enter, and ‘control’, press Enter. Next, we need to assign samples to each group.
To do that, I click a row and simply drag the cursor over the relevant samples. Then
I click the group name that they belong to. Repeat these steps for each group. In this
study, I’ve created only 2 groups, but GEO2R supports creation of up to 10 groups for more
complicated studies. After defining your groups, scroll down to
the tabbed section. You can perform the comparison straight away with default settings by clicking
the ‘Top 250’ button. However, it’s usually good practice to first
check the distribution of the expression values that we are about to compare.
To do that, use the ‘Value distribution’ tab and click ‘View’. This tool calculates
the distribution of the expression values for each selected sample and displays the
data as a box plot. In this case, the distributions look good: the box plots are median-centered,
indicating that they are comparable, and so we are happy to proceed with the analysis.
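
As a rough illustration of what a box plot summarizes, the sketch below computes the five numbers behind each box (minimum, quartiles, median, maximum) and checks whether the sample medians sit close together. This is not how GEO2R itself draws the plot; the sample names, the values, and the `tolerance` cut-off are all invented for the example.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for one sample's expression values."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

def medians_are_comparable(samples, tolerance=0.5):
    """True when every sample median lies within `tolerance` of the mean of the medians.

    `tolerance` is an arbitrary, hypothetical cut-off chosen for this sketch.
    """
    medians = [five_number_summary(v)[2] for v in samples.values()]
    centre = sum(medians) / len(medians)
    return all(abs(m - centre) <= tolerance for m in medians)

# Made-up log2 expression values for two hypothetical samples.
samples = {
    "sample_A": [6.1, 7.0, 7.4, 8.2, 9.5],
    "sample_B": [6.0, 6.9, 7.5, 8.1, 9.9],
}
for name, values in samples.items():
    print(name, five_number_summary(values))
print(medians_are_comparable(samples))
```

If the medians were far apart, that would suggest the samples are not directly comparable without further normalization.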
So we go back to the GEO2R tab and then perform the analysis by clicking the ‘Top 250’ button.
After a few moments, a table of the top 250 differentially expressed genes is presented.
The table is sorted by significance, based on P-values.
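
The ordering described here, most significant first and cut off at 250 rows, can be sketched in a few lines. The row structure, the `p_value` field name, and the gene names are hypothetical; GEO2R produces its table with R, not with this code.

```python
def top_hits(results, limit=250):
    """Return up to `limit` rows, smallest P-value (most significant) first."""
    return sorted(results, key=lambda row: row["p_value"])[:limit]

# Invented rows standing in for per-gene test results.
results = [
    {"gene": "GeneA", "p_value": 0.04},
    {"gene": "GeneB", "p_value": 0.0002},
    {"gene": "GeneC", "p_value": 0.01},
]
for row in top_hits(results, limit=2):
    print(row["gene"], row["p_value"])
```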
I can click on any row to view the gene expression profile of that gene. For example, if I click
on the top hit, which is gene Rbm3, I reveal a chart where the red bars represent the expression
level of Rbm3 in each sample. The group names we created earlier are listed along the bottom
of the chart. So in this case we see that Rbm3 is more highly expressed in the thymus
of the space-flown mice than in the controls. I can edit the content of this table using
the ‘Select columns’ feature. This opens a dialog box that lists all the available
statistics and annotation columns. For example, I can choose to hide the t-statistic and the
B-value, but expose the ‘Gene ontology Function’ annotation. Click ‘Set’ and the table
content is modified accordingly. To save the full set of results and/or retrieve
data beyond the top 250 genes, I can click on the ‘Save all results’ link which exports
the full set of results as a tab-delimited table. If I want to change any of the test
options, I can go to the ‘Options’ tab, make the edit, go back to GEO2R and click
‘Re-calculate’ to perform the test with the new settings.
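
Once the full results have been saved as a tab-delimited table, they can be read back with any tool that understands that format. The sketch below uses an in-memory stand-in for the saved file. The column names (`ID`, `P.Value`, `Gene.symbol`) and the rows are assumptions made for the example, so check the header of your own export before relying on them.

```python
import csv
import io

def read_results(handle):
    """Parse a tab-delimited table into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Stand-in for a saved results file; column names and values are invented.
example = io.StringIO(
    "ID\tP.Value\tGene.symbol\n"
    "probe_1\t0.0002\tGeneB\n"
    "probe_2\t0.3\tGeneA\n"
)
rows = read_results(example)

# Keep rows below a conventional 0.05 threshold (an arbitrary choice here).
significant = [r for r in rows if float(r["P.Value"]) < 0.05]
print(len(rows), [r["ID"] for r in significant])
```

For a real file, pass `open(path, newline="")` to `read_results` in place of the `io.StringIO` object.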
Note that the R script that was used to perform this analysis is provided in the ‘R script’
tab. You can save this as a reference for how the analysis was performed.
Finally, if you’re only interested in seeing the behavior of a specific gene within the
study, rather than retrieving the top 250 differentially expressed genes, use the ‘Profile
graph’ tab. You first need to search for your gene of interest within the Platform
record, and copy its identifier from the ID column. A link to the relevant platform record
is provided here. For example, I’ve determined that my favorite gene, H2-M2, has ID 10450880.
When I type that ID into the box and click ‘Set’, I retrieve the profile graph for that
gene. That concludes this demonstration of GEO2R.
If you have any questions, please email the GEO team using the link at the top of the
page. Thank you.