Insights from Genome-Wide Association Studies and the Steps Beyond (highlights)

Uploaded by NCIresearchfunding on 12.08.2011

Bonjour. Good day from Paris.
I would like to thank the organizers for the opportunity to be able to speak to you today.
In particular, I would like to thank the organizers from China who were kind enough
to invite me and certainly, those from Nature Genetics and I apologize for my not being able
to be there due to personal reasons, but nonetheless, I am happy to be a participant
in this very important program that's addressing the heritable components
of cancer and other complex diseases.
The topic today will be the insights from Genome-Wide Association Studies
and the steps beyond, which you will have heard some about this morning,
and continue on for the next two days.
So, let's really jump into what is the focus of today, and that is germline genomics.
So here is a depiction of what we understand in the European population
for breast cancer, what's been discovered.
We know that we begin in the upper left hand side there, in 1990 with the discovery of BRCA-1
in families that were enriched for breast cancer and particularly women
from particular distinct populations -- with particular population genetic histories.
And then a set of other variants have been identified, or rather mutations,
that have very strong effects and the size of those effects have gotten smaller and smaller
as we've become more and more comprehensive in looking at candidate regions,
or conducting linkage, or doing some targeted sequencing.
At the same time, as recently as 2007,
we had our first cancer Genome-Wide Association Studies identify the first regions
that are important for breast cancer.
Now, we have some twenty regions that have been identified.
We also know that there is this very large space, as we can see,
between the very low-penetrance alleles that we see in breast cancers
which have small estimated effects out of the discovery series, and much on that later.
In this sort of blue box is where we really are expecting the exome and whole genome sequencing
to discover another part of the heritable component of cancer.
So as you can see, that kind
of genomic architecture is really beginning to take shape in front of us.
There are variants of different frequencies with different effect sizes that are to be discovered
to fill in the heritable component of breast cancer.
So I'd like to talk about the status of Genome-Wide Association Studies and sort
of what we really expected to be promised from this, okay?
We clearly see, number one: that the discovery of regions in the genome associated with diseases
and traits give us the new candidate genes.
This has been very successful.
This is really striking to see the relationship between having sets of epidemiologists
and geneticists working together to actually discover biology and discover regions
that then now need to be very aggressively prosecuted and interrogated
to understand the mechanistic insight underlying those common variants so that we really can get
at the etiology of how and why these complex diseases have so many regions that contribute
to a disease, like a particular cancer, prostate cancer, breast cancer, or pancreatic cancer.
We certainly are aware of the gene-environment lifestyle interactions
but these have been very difficult to identify, partly because of the epidemiological rigor
that had to be relaxed in order to achieve the statistical power to do these first scans.
And certainly the application towards outcomes and pharmacogenomics has lagged behind,
but there are some very interesting things that are beginning to show up on the horizon.
The challenge of genetic markers for risk prediction for individual
or public health decisions is far more complex, and I will discuss that at the end,
and I will give probably a little bit more of a pessimistic view than others
who are going to speak in the short-term.
I think the common variants represent a fraction of the genetic contribution to risk and to think
that they can fully explain or that they can adequately be introduced into the clinic
at this time in 2011 is, in my mind, a daunting -- and I would actually change
that to a dangerous -- conclusion, and we'll talk more about that in just a little bit.
So let me talk for a few minutes about the discovery of new regions,
because I think that this is very important.
Here is a map as of May, the beginning of May of 2011,
where we now have 155 different disease loci marked by SNPs.
Only one of these is a copy number variation on chromosome 1 that was found in neuroblastoma,
and it sort of came in the back door, because there was something known
about that in terms of somatic alterations.
There are 24 different cancers that we are looking at that now have at least one specific variant,
and there are six or seven regions that are notable because more than one cancer has markers
in the exact same region and the markers are in strong linkage disequilibrium.
So in other words, there's something about those regions that are sort of nexus regions
for cancer predisposition and some of them have as many as six
or seven different cancers localizing to the same region and they are cancers
that we would not necessarily have put together per se.
Now, the majority of GWAS signals, we know, map to non-coding regions,
and these non-protein coding regions require extensive bioinformatic analysis where we look
at unannotated transcripts and regulatory elements, look at functional elements
for novel transcripts and ask questions about the regulatory elements with respect
to alteration of gene levels and epigenetics and effects on genes elsewhere at a distance.
And then the different experimental strategies really get us into a realm that's very different
from the Genome-Wide Association Study.
Each one of the regions that we identify really has to be interrogated according
to what we learn and what we see and recognize specifically in those particular regions per se.
And regions are going to be quite different, it's not going to be possible
to do large genome-wide scans per se or use simplified technologies
to be able to answer these things.
Each one requires, I think, a separate kind of analysis.
As we go forward with the status of Genome-Wide Association Studies, you know,
I think we have to think of the challenge of genetic markers
for risk prediction and the individual or public health decisions,
both representing the fraction of the genetic contribution of the risk and again,
the emphasis is on the fraction so far.
And then how do we integrate this with lifestyle and environmental factors
that we've been very slow to be able to really put together.
So very early on, different groups immediately put out studies looking
at prostate cancer with five SNPs plus family history, with, unfortunately,
relatively low sensitivity and specificity, but it was very important to move these kinds
of studies and to begin to examine these questions.
In our study last year, where we took the performance of breast cancer SNPs,
we were able to see that the effect was relatively small in moving the curves, the
so-called receiver operating characteristic curves, and what we were able to do with ten SNPs
that had been identified in Genome-Wide Association Studies within 2-3 years,
gave equivalent results for what we had seen in 30 years of epidemiologic analyses in looking
at non-genetic risk factors in the so-called Gail model.
But again, neither of these are really sufficient to be able to sit in front
of a single patient and say let's do these SNPs and be able
to make very important decisions at this time.
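To make the receiver operating characteristic comparison concrete, here is a minimal Python sketch of how an area under the curve (AUC) can be computed for a SNP-based risk score. It is not the analysis from the study described above. The function names (`auc`, `simulate_cohort`) and every parameter value (ten SNPs, a per-allele odds ratio of 1.2, a risk-allele frequency of 0.3, the baseline risk) are assumptions chosen only for illustration, and the data are simulated.

```python
import math
import random


def auc(control_scores, case_scores):
    """Area under the ROC curve: the chance that a randomly chosen case
    has a higher score than a randomly chosen control (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def simulate_cohort(n_people, n_snps, odds_ratio, allele_freq, baseline_logit, seed):
    """Simulate risk-allele counts and disease status under a simple
    logistic model. All parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    log_or = math.log(odds_ratio)
    controls, cases = [], []
    for _ in range(n_people):
        # Two alleles per SNP, each a risk allele with probability allele_freq.
        risk_alleles = sum(
            (rng.random() < allele_freq) + (rng.random() < allele_freq)
            for _ in range(n_snps)
        )
        # Risk score: risk-allele count weighted by the per-allele log odds ratio.
        score = risk_alleles * log_or
        p_disease = 1.0 / (1.0 + math.exp(-(baseline_logit + score)))
        (cases if rng.random() < p_disease else controls).append(score)
    return controls, cases


controls, cases = simulate_cohort(
    n_people=3000, n_snps=10, odds_ratio=1.2,
    allele_freq=0.3, baseline_logit=-3.0, seed=1,
)
print(f"{len(cases)} cases, {len(controls)} controls, AUC = {auc(controls, cases):.3f}")
```

An AUC of 0.5 means the score separates cases from controls no better than chance, and 1.0 means perfect separation. A handful of small-effect variants should land only modestly above 0.5 in a simulation like this, which is the general pattern being described for the ROC curves.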
When we look at prostate cancer, for instance, in the Breast and Prostate Cancer Cohort Consortium,
we now can see that we have 30-some SNPs, and when we look at the shifts in the area
under the curve, we can see that there's an age-specific difference
that makes it all the more attractive in the younger individuals and as we look
at the clinical histories, that certainly is of greater interest, but again the shifts
in the ROC curves are getting us to, at best, 0.68 at this time.
Maybe to 0.7 with the new SNPs that are being added.
And if we look at what we see when we look at the PSA screening alone in individuals under 65,
it's still not at that point, and the question is, how do we put these together?
And this is going to require new studies or re-examination of some of the older studies
or the existing data that is yet to be really effectively
or comprehensively genotyped in this particular setting.
So at this time, I think we don't really have in hand a direct translation of GWAS findings.
We really haven't found the markers for prognosis,
so we can't really put them in the clinic yet.
We have adequate data to test SNPs in the clinical testing
in a very small subset of individuals.
But we really haven't figured out how to convey this risk and unfortunately,
we're in a world where this information is coming so quickly
with direct-to-consumer studies, companies, and also now with the availability
of genome sequencing to a small percentage of the population.
We have to ask the question, "How stable is the genome?
Can we trust what we call the genome?"
And we think we have to be very careful and look very closely at what we have in our comparisons
because as we move from next-generation sequencing of the germline,
from candidate exomes and regional GWAS to whole genomes and exomes, we're going to have
to have functional information to be able to make these direct associations.
We know that we're going to have to use multiplex family designs, particularly
for exome sequencing at this time, because the exome data is just so overwhelming.
If we look at the exome data from 30 different individuals,
we see thousands of exonic variants, many of which we have never seen,
and then the question is, "How do we test them all if we don't have the pedigrees in line?"
I think the association testing is going to be very difficult
to find these highly penetrant mutations.
We, so to speak, have to leave this behind and look primarily, you know,
where we see strong effects of the Mendelian-type disorders.
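One way to picture why pedigrees help is a simple segregation filter, sketched below in Python: keep only the variants carried by every affected relative and by no unaffected relative. This is an illustrative assumption about how such a filter might look, not a description of any pipeline used in the work discussed here. The function name `segregating_variants`, the family members, and the variant labels are all hypothetical.

```python
def segregating_variants(carriers_by_variant, affected, unaffected):
    """Keep variants carried by every affected relative and by no
    unaffected relative. This strict rule assumes complete penetrance
    and no phenocopies, which real families often violate."""
    affected, unaffected = set(affected), set(unaffected)
    kept = []
    for variant, carriers in carriers_by_variant.items():
        carriers = set(carriers)
        if affected <= carriers and not (carriers & unaffected):
            kept.append(variant)
    return sorted(kept)


# Hypothetical pedigree and variant labels, made up for illustration.
family_variants = {
    "var_A": {"mother", "daughter_1", "daughter_2"},
    "var_B": {"mother", "daughter_1", "son"},
    "var_C": {"daughter_1"},
    "var_D": {"mother", "daughter_1", "daughter_2", "son"},
}
candidates = segregating_variants(
    family_variants,
    affected=["mother", "daughter_1", "daughter_2"],
    unaffected=["son", "father"],
)
print(candidates)  # ['var_A']
```

In this toy family, four candidate variants shrink to one. Without the pedigree there is no comparable way to narrow thousands of novel variants, which is the difficulty being raised.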
So, finishing genomes: the case is that a portion of the genetic contribution
to these traits resides in the unmapped regions.
And for cancer, this is going to be particularly challenging.
There are regions, about 10-12%, that are particularly difficult to get at.
And we know from the exome data that some 15% of cancer genes are not well covered
by the exome regions, due to GC content, paralogy, and the like.
And we have to really go very, very carefully at this.
So lastly, the challenge of next-generation sequencing is going to be not only the cost,
but the population-private variants.
These are going to be very difficult for us to interpret in the clinical setting.
The accuracy of the base calling and the plethora of novel variants is going
to make this very difficult and we're going to have insufficient capacity
to prioritize thousands of promising variants.
So I think we have to be careful in moving this too quickly into the clinical venue
because we do have limits to the agnostic statistical analysis.
So I'd like to just end by saying
that Genome-Wide Association Studies are really just the start.
It's not the end, it's not even the beginning of the end.
But it's perhaps the end of the beginning.
And I think we have new paradigms that we have established.
We go forward and we have to go forward with our eyes wide open,
because it is a very exciting time and it is a time to work together and not work separately
to really be able to bring these to some kind of clinical fruition.
So let me end here and acknowledge the wonderful people in the NCI and collaborators
at Harvard School of Public Health and many of the consortiums who've contributed data,
some of which I've talked about today.
But again, it really is out of collaboration that we are able to get where we are right now.
Thank you very much.