Forensic Anthropology 2011 : 10 : Population Genetics pt.1

Uploaded by TheNFSTC on 03.10.2012

[ Music ]
>>My training was in human population genetics.
And I started working with DNA polymorphisms
in the early 1980s to study humans for a variety of reasons.
And I became involved as an expert witness in the late '80s
and early '90s, when DNA started being introduced
into court.
And I testified in many different places around the US
and in Canada and found it quite interesting.
It soon became unnecessary to have expert witnesses.
So I basically dropped out of forensics.
It was never my main area of research.
And then at the time of the World Trade Center attacks,
an expert committee was set up, and I was asked to be on it.
It became obvious that the standard forensic markers
for individual identification did not work well
in a sizable number of cases
because of the extreme degradation of the DNA
and because they could say nothing about ancestry
of the sample, which would in itself be a help
in identifying the person.
At that point I realized that we had all the expertise
and the samples necessary to do a lot of research.
So I got back involved.
And let me then start the lecture and talk about some
of what we're doing and why DNA can be very important both
in ancestry and phenotype.
And I could say that DNA is going
to make everything you've learned so far unnecessary,
but I won't go that far.
>>DNA is not going to solve all problems
but it already can be extremely helpful when available
and it's going to be much more helpful in a few years
because of projects that are ongoing now.
Caveat. What we know now, as I hope you will learn,
is often overinterpreted -- there is the implicit assumption
that if a conclusion is based on DNA it's immutable,
it's true, it's precise.
Ain't so. So let's talk about the DNA in the human genome.
We've got mitochondrial DNA, which is a very small part
of the genome -- less than 17,000 base pairs as a circle --
compared to the nuclear DNA which is
over 3.3 billion base pairs of DNA.
The autosomes are 22 chromosomal pairs of varying sizes.
The sex chromosomes, one pair unmatched, females have two X's.
Males have an X and a Y. The Y chromosome can be subdivided
into a small part that recombines with the X chromosome
at the tip of the chromosome.
That is important in segregation during meiosis forming gametes,
and then the non-recombining part which is inherited
without any recombination.
So two parts -- they're both interesting
and have different implications for how they're studied.
All of these segments of DNA have polymorphisms.
So what's a polymorphism?
It literally translates to a part of the DNA
that occurs in many forms.
Depending upon how big the segment you look at,
one base pair will generally occur only in two forms,
sometimes three and four.
But in general the least common form must have a frequency
of at least one percent, or equivalently the most common form
a frequency of no more than 99 percent.
So we try to make a distinction between a polymorphism --
which in general must mean normal even
if it's got functional differences, because millions
of people around the world will have that form of DNA --
and the rare variants that cause disease.
So the other part of this is
the idea that the site is the polymorphism, which occurs
in different forms, each of which is an allele.
There's a great tendency to call an allele a polymorphism.
It's the site that's the polymorphism, and also SNP,
single nucleotide polymorphism, that you'll hear about more.
So the types of polymorphisms are a combination
of how one detects it and the nature of the variation.
So the restriction fragment length polymorphisms was one
of the first technologies.
Those can be almost any of the other types in terms of the DNA.
It's basically a way of detecting variation.
Pretty much obsolete.
The short tandem repeat polymorphisms,
I always put the P on.
In forensics you generally hear of it as STRs.
But it's the polymorphism part that's important.
They're short tandem repeats.
The VNTRs are generally longer segments that occur
in tandem repeats, but similar in concept.
Insertions, deletions -- a bit of DNA from megabases,
huge deletions in some areas that seem to be compatible
with normality down to one base pair more or less.
With the single nucleotide polymorphisms, instead
of an adenine or an adenosine, you've got guanosine,
et cetera, just in the string of DNA.
And then the copy number variation
where it's not a tandem repeat, but a segment of DNA may occur
in two copies or three copies -- sometimes tandem,
sometimes a whole segment is missing --
analogous to a deletion.
So let's -- mitochondrial DNA.
It's this small loop of DNA.
It occurs in the mitochondrion.
It's the remnant of an early parasite
that invaded an early cell.
And it's now dependent upon the nuclear genome
and we're dependent upon the genes here
as the major energy producing genes and apparatus of the cell.
It has its own slightly different DNA code.
And it's got its own transfer RNA for making proteins
and its own machinery for making its own proteins.
But most of the proteins are now made in the nucleus of the cell
and imported into the mitochondrion.
The relevant point here, of course, is
that there are many polymorphisms.
It's almost all coding.
So there are great restrictions on what variation can occur
because it has to be compatible with function.
So some variants are recurring.
They have arisen independently many times
because both alleles are compatible with functioning.
The other thing is that the control region is highly
variable because it only has limited function.
As long as most of it's the same, the replication
of the mitochondrion occurs normally.
It doesn't code for a protein.
So you'll see a lot of studies of hypervariable regions
which are highly polymorphic and then of the single nucleotide
or other variants around the rest of the mitochondrion.
An advantage is that for every cell, there are one, two,
maybe more thousands of copies of mitochondrial DNA.
So it is much more prevalent in a sample than is nuclear DNA.
And hence it has been studied
because it doesn't need quite the sophistication
and characterization as nuclear DNA.
So a lot of human ancestry information
and forensic identification has been based on mitochondrial DNA.
Some of it is very powerful, but it's not
as powerful as nuclear DNA.
So basically this summarizes what I've just been saying
about the variation in both segments.
But at any one site there's not such a huge number of variants.
Now the relevant thing for ancestry
and even individual identification is
that the mitochondrial DNA is inherited only
through the mother.
Because the little sperm is only a bundle of nuclear DNA
that gets into the egg.
But the egg comes already with all of the mother's mitochondria
and hence her mitochondrial DNA.
So even males have mitochondrial DNA.
They just don't transmit it to their children.
It's entirely based on what they inherited from their mother.
So going back five generations,
how many of your ancestors have your mitochondrial DNA?
[ Pause ]
>>One out of 32, assuming there's no inbreeding along
the line.
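
The count in that answer can be checked with a little arithmetic. This is a minimal sketch added for illustration, not something shown in the lecture; the function names are made up, and it assumes no inbreeding, so every ancestor in a generation is a distinct person.

```python
def ancestors_at_generation(g: int) -> int:
    """Number of distinct ancestors g generations back (no inbreeding assumed)."""
    return 2 ** g


def fraction_sharing_mtdna(g: int) -> float:
    """Only the one ancestor on the unbroken maternal line passed on her mtDNA."""
    return 1 / ancestors_at_generation(g)


if __name__ == "__main__":
    for g in range(1, 6):
        print(g, ancestors_at_generation(g), fraction_sharing_mtdna(g))
```

The same count applies to the non-recombining part of the Y chromosome, with the unbroken paternal line in place of the maternal one.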
>>My father was inbred because his mother and father were
at least five generations removed from the two brothers
who originally settled in the colonies.
And they met just because the young soldier coming
through town was asked if he'd met the Widow Kidd
and her three beautiful young daughters.
So young soldier, of course, wanted to meet them
and married one of them.
So within five generations it is not that uncommon.
So you're not learning much about your ancestry
from this type of DNA.
What about Y chromosome,
the non-recombining part of your Y chromosome.
It shows exactly the same pattern except it's the
paternal lineage.
So again, you're not learning much about your ancestry --
your overall ancestry -- from just this.
And I'll add that some of the companies
that do ancestry testing will use only Y-chromosome
or only mitochondrial DNA.
And they're not telling you a lot about your ancestry.
So I mentioned the mitochondrial variation and the Y chromosome.
There are relatively few genes.
So there's a fair amount of single nucleotide polymorphism.
And there are also many STRPs in the Y chromosome
that have a higher mutation rate.
>>Now let's look at autosomal.
Here is a sample pedigree
where I've colored the alleles coming down.
So we have ampersand and at sign alleles in a sister
and we have number sign and star allele in a brother.
They got the opposite alleles.
Now on this monitor, the green and the --
the green and the white don't show up very well.
But what can be said about the ancestry
who five generations ago contributed each
of these alleles?
Well, here we have the tracing back.
And notice that this blue ampersand allele,
as we go back we have a homozygote
and another homozygote.
And so we cannot really identify
where this particular allele came from.
But it has to be one of those five ancestors.
Similarly the at sign -- here going back through the white
because here is white --
it could only have come
from here even though both have the light green.
That means the light green must have come from here.
So from here, we can go back.
Again a homozygote, it could have been from either
of those or this ancestor.
So we know roughly where that came from.
We can go back with the hash mark, the number sign,
and the star is the only one where we know
that the father's father's mother's father's father was the
origin of that particular allele.
But clearly overall we're beginning
to get a profile of the ancestry.
And if we look at many individual loci,
each locus is going to tell us something
about some of those ancestors.
And with many loci, because they're independent,
we get a picture of all of them.
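
One way to see why many loci give a picture of all the ancestors is sketched below. It is an illustration under stated assumptions, not the lecturer's calculation: the loci are taken to be unlinked (independent), there is no inbreeding, and at each locus the maternal allele traces back to one of the 16 fifth-generation ancestors on the mother's side with equal chance, and likewise for the paternal allele. The function name is hypothetical.

```python
def expected_ancestors_represented(n_loci: int, generations: int = 5) -> float:
    """Expected number of ancestors, `generations` back, who are the source
    of at least one allele among n_loci independent autosomal loci."""
    per_side = 2 ** (generations - 1)        # 16 ancestors on each parent's side
    p_missed = (1 - 1 / per_side) ** n_loci  # a given ancestor is missed at every locus
    return 2 * per_side * (1 - p_missed)


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(n, round(expected_ancestors_represented(n), 2))
```

With one locus exactly two ancestors are represented, one per side; with on the order of a hundred independent loci nearly all 32 are expected to be.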
So the next point is measuring variation.
We know that these polymorphisms exist in frequencies --
the individual alleles -- in frequencies in a population.
And a standard measure we use is called FST --
F is the letter we use for the inbreeding coefficient,
and S and T stand for subpopulation to total. So in theory it's related
to random genetic drift.
Has anybody heard of random genetic drift?
A couple. You know that if you have two children,
there's a 50 percent chance that you give to both
of them the same allele that you have.
And that chance over many people means that the gene frequency
in a population of children will not be exactly the same
as the gene frequency in the parents.
And the smaller the number of parents and children,
the greater the possible fluctuation would be.
So among different populations, over time,
random genetic drift causes some changes.
And FST is a way, theoretically, of explaining that.
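
The two statements about drift can be checked with a short calculation. This is a sketch under an assumption the lecture does not spell out: each generation is treated as a binomial sample of 2N gene copies from the parents (the usual Wright-Fisher idealization). The function names are made up for illustration.

```python
from itertools import product
from math import sqrt


def prob_same_allele_to_both_children() -> float:
    """A heterozygous parent (alleles A and a) passes one allele to each of two children."""
    outcomes = list(product("Aa", repeat=2))       # (child 1, child 2)
    same = [o for o in outcomes if o[0] == o[1]]   # AA or aa
    return len(same) / len(outcomes)


def drift_sd(p: float, n_individuals: int) -> float:
    """Standard deviation of the allele frequency after one generation,
    assuming binomial sampling of 2N gene copies (Wright-Fisher assumption)."""
    return sqrt(p * (1 - p) / (2 * n_individuals))


if __name__ == "__main__":
    print(prob_same_allele_to_both_children())
    for n in (50, 500, 5000):
        print(n, drift_sd(0.5, n))
```

The standard deviation shrinks with the square root of the population size, which is the sense in which smaller populations fluctuate more.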
So here's an example.
I'm going to show many slides like this.
And I always arrange African populations on the left,
Middle East, European populations.
Here's Western Siberia, East Asia,
a couple of Pacific Island, Eastern Siberia,
North America, South America.
So here you see in black one of the two alleles --
the two have a frequency summing to one.
So the other allele is one minus this or coming
from the top instead of the bottom.
And you see two different polymorphic loci show different
patterns of variation around the world.
The expectation for any polymorphism you know nothing
about in advance is that there's a lot
of gene frequency variation around the world.
We are all alike in ethical ways.
But we are all different genetically.
Even if we're from the same ethnic group, we're different.
And so those differences become important.
>>Here's another example,
but these have low variation around the world.
Same populations, same order.
But they're not identical.
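
To make the contrast between high and low variation concrete, here is a minimal sketch of one common way to compute FST for a two-allele locus, as (H_T - H_S) / H_T, where H_S is the average expected heterozygosity within populations and H_T is the expected heterozygosity at the pooled frequency. The estimator choice, the equal weighting of populations, and the allele frequencies are all assumptions made for illustration; they are not the values on the slides.

```python
def fst(freqs: list[float]) -> float:
    """FST for one biallelic locus from per-population allele frequencies,
    computed as (H_T - H_S) / H_T with equal weight per population."""
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                            # pooled heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)   # mean within-population
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


if __name__ == "__main__":
    high_variation = [0.05, 0.10, 0.50, 0.90, 0.95]  # hypothetical frequencies
    low_variation = [0.45, 0.50, 0.55, 0.50, 0.48]   # hypothetical frequencies
    print(round(fst(high_variation), 3), round(fst(low_variation), 3))
```

A locus whose frequency swings widely across populations gives a large FST; one that sits near the same frequency everywhere gives a value close to zero, but not exactly zero.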
One of the ways, just as an example, to look at genotypes,
here are a bunch of individuals.
Each dot is an individual.
And this is using a TaqMan assay which measures fluorescence
as a function of the genotype.
And so across the bottom you've got the intensity
of fluorescence for one fluor
and on the Y axis you've got the intensity
of fluorescence for the other fluor.
We typed 384 individuals at a time.
That's what each dot is.
And so you can see the blue here represent individuals
who have only the allele fluorescing in blue.
They are homozygous. Those only fluorescing
in red are hence homozygous for the other allele,
and a bunch of heterozygotes who fluoresce both colors.
And the controls down here as black squares and those samples
that did not give an interpretable result.
And here are some that were not interpreted.
Here's one for whatever reason low fluorescence, low intensity,
whatever -- not being interpreted.
Here's one of the controls that's not where we'd like it to be.
This is clearly real data.
But it's certainly not up here.
So it's not really affecting the interpretation.
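
The reading of such a plot can be sketched as a simple rule on the two intensities. This is only an illustration: real genotyping software typically clusters the points rather than using fixed cutoffs, and the thresholds and function name below are hypothetical, not taken from any assay. Intensities are assumed to be non-negative.

```python
from math import atan2, degrees, hypot


def call_genotype(fluor_x: float, fluor_y: float,
                  min_intensity: float = 1.0) -> str:
    """Classify one sample from its two fluorescence intensities.
    All thresholds are illustrative assumptions."""
    if hypot(fluor_x, fluor_y) < min_intensity:
        return "no call"                       # too dim to interpret
    angle = degrees(atan2(fluor_y, fluor_x))   # 0 = X dye only, 90 = Y dye only
    if angle < 30:
        return "homozygous allele 1"
    if angle > 60:
        return "homozygous allele 2"
    return "heterozygous"


if __name__ == "__main__":
    for point in [(5.0, 0.3), (0.3, 5.0), (3.0, 3.0), (0.2, 0.2)]:
        print(point, call_genotype(*point))
```

Points near either axis are the two homozygote clusters, points along the diagonal are heterozygotes, and points near the origin are the low-intensity samples that are left uninterpreted.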
So that's sort of a little bit of background.
But now exactly how are we using some of this DNA variation
and the polymorphism in forensics?
So it can be used to identify a criminal.
That's the way it's classically being used now.
There's DNA from a crime scene.
You've got the suspect's DNA.
They match.
And that's evidence for identity.
Most of what we're interested
in here is identifying human remains or maybe
from the crime scene trying to make some inference
of the ethnicity or ancestry or the phenotype of the individual
who left that DNA at the crime scene
on a supposition that's the criminal.
But DNA is used all the time in parentage testing.
And in the court system, the best use of DNA is
to exonerate innocent people.
I was once asked in cross examination
if I were falsely accused of a crime and there was DNA,
would I allow the DNA lab
and the Royal Canadian Mounted Police Forensics Labs
to test my DNA?
And my response was, 'Of course.
It's the surest way I know to prove I'm innocent.'
At which point the judge said
to the cross examining defense attorney,
'Don't you think you should stop working for the prosecution
and excuse this witness?'
>>So identifying human remains.
You may have, based on what we've experienced
from the World Trade Center attacks, may have known DNA.
Almost all of the firemen -- the first responders --
had given samples for bone marrow donation.
And so there was a known DNA sample available to test.
Clearly a lot of relatives brought in toothbrushes,
brought in dirty underwear, brought in all sorts
of personal objects from which DNA could be obtained.
And to date -- I forgot to bring the number with me,
but it's over 1600 of the individuals have been identified
with at least one little piece of bone.
>>Determine the phenotypic characteristics.
What hair color -- natural hair color -- did the person have?
What skin color?
Can we say anything
about whether it was thick or thin hair?
Could we say anything about height?
Or determine the ancestry in terms of more indigenous --
geographically indigenous -- origins.
So the forensic question in matching
to a known person is first what are the DNA patterns?
So this is a molecular and a laboratory issue.
Has the DNA been analyzed correctly?
Have the patterns been interpreted correctly?
Then, do the two patterns match?
Is the method used specific enough
that if the results are the same, you could say
that match for that locus?
Then the statistics, what are the chances
that two unrelated people have the same pattern?
Obviously that becomes very critical assuming the molecular
is done well.
And that's where databases are needed
because it all depends upon the allele frequency.
If the frequency of an allele is 90 percent
and you've got two homozygotes for that allele, well 81 percent
of the population has both alleles the same.
That really doesn't exclude a lot of people
as not being the same.
Not very informative.
And we'll get into that later.
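
The arithmetic behind that point is just squaring the allele frequency: with random mating, the expected share of people homozygous for an allele at frequency p is p times p. The sketch below is illustrative, with made-up function names; a frequency of 0.9 gives the 81 percent figure, and 0.99 gives about 98 percent.

```python
def homozygote_fraction(allele_freq: float) -> float:
    """Expected share of the population homozygous for the allele (random mating)."""
    return allele_freq ** 2


def fraction_excluded(allele_freq: float) -> float:
    """Share of the population that a homozygous match at this locus rules out."""
    return 1 - homozygote_fraction(allele_freq)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.99):
        print(p, round(homozygote_fraction(p), 4), round(fraction_excluded(p), 4))
```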
So the CODIS markers --
the standard short tandem repeat polymorphisms --
used in cases nationwide now are a panel
of individual identification markers
that are clearly appropriate for this kind of question.
The lab methodology is pretty good.
And there are fair databases.
But individual identification is not the only type.
And remember I mentioned earlier the CODIS markers are not good
for ancestry.
They were picked because they are highly variable,
almost every place.
And so there's not a lot of difference
in allele frequencies among different populations.
So I came up with a classification
of four types a few years ago.
There are individual identification SNPs.
They have very low probabilities
of two individuals having the same multisite genotype.
So each SNP is optimized and the panel is very good.
Ancestry-informative SNPs would be sort of the opposite --
the high probability that an individual's ancestry comes
from one part of the world or maybe admixed
from two parts of the world.
Lineage SNPs are where we're trying to get
down to individual clans within a group -- extended families,
organized crime where it is a family.
And the phenotype-informative SNP --
SNPs that will, based on allelic differences that control parts
of the phenotype, will tell you something
about how a person looks.
>>So there are different requirements
for these different purposes of using SNPs.
And I'm concentrating on SNPs
because that's really the best type of DNA for any
of these applications in terms of laboratory methods,
numbers of markers available, and the detailed annotation.
So the importance here is that for the individual,
the ancestry, and the clannish or lineage markers,
they represent a small fraction of all available polymorphisms.
So one wants to search for and optimize a set
that is particularly appropriate for one of those purposes.
The phenotype informative SNPs are also uncommon, but they deal
with specific phenotypes.
And as yet, though there are good candidates,
they're poorly documented for exactly how they function
in development of the phenotype.
So there are now five or six loci
that we know are clearly involved in the amount
of melanin in the skin, but we don't know how they interact.
So while we can type them and make predictions,
the predictions are based on associations
without a clear understanding of the interactions
when you look at all of them.
>>So general criteria.
I'm reiterating myself to an extent.
Readily typable, has a unique marker, highly informative
for the stated purpose, and well documented
for such relevant characteristics
as allele frequencies,
association with phenotype, biology.
So which ones are going to be best?
So we want the maximum amount of information per SNP,
but what do we mean by information?
And we want SNPs that are not subject to typing difficulties
and what kind of typing difficulties exist.
So additional slides will amplify the first.
Let me verbally amplify the second.
Almost all of the typing methods involve using bits of DNA
that are complementary to either conduct amplification
of a fragment of DNA
or specifically probe the small region
around where the known variant is.
But if there are other variants nearby that interfere
with either of those then one may not get an accurate reading.
The test fails.
And so if you've got a heterozygote,
you only detect one of the alleles.
It's not that the polymorphism is not valid.
It's that that method does not detect it accurately.
>>There are other problems that anybody working
in a laboratory knows about: the phase of the moon,
what you ate for dinner the night before.
All of these are probably real variables,
in a humorous sense at least.
But no method is perfect.
No dataset is error free.
We have to try to minimize them.
And that's where the prior work will be best.
So in terms of amount of information, we're talking
about alleles, we're talking about allele frequencies.
But what we see in the population is individuals
who have two copies -- one from the mother, one from the father.
So fortunately back in 1908,
a geneticist asked a mathematician
about this question.
And Hardy and Weinberg came
up with this very simple relationship based
on elementary probability.
And as a function of the gene frequencies,
you can see here the genotype frequencies.
And basically P squared, 2PQ,
Q squared are the Hardy-Weinberg ratios.
It's the square of the quantity P plus Q,
where P and Q are frequencies of the two different alleles.
So it's very elementary probability.
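
As a small sketch of that relationship, the function below returns the three expected genotype frequencies for a two-allele locus. The function name is made up for illustration; the formula is the standard Hardy-Weinberg expansion of (p + q) squared.

```python
def hardy_weinberg(p: float) -> tuple[float, float, float]:
    """Expected genotype frequencies (p^2, 2pq, q^2) for allele frequency p."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        hom_1, het, hom_2 = hardy_weinberg(p)
        print(p, round(hom_1, 4), round(het, 4), round(hom_2, 4))
```

The three values always sum to one, since they are the terms of (p + q) squared with p + q = 1.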
If we want the most diversity within a population,
we clearly want the allele frequencies to be
at point five -- point five.
So for individual identification,
the lowest probability
of somebody unrelated being the same is
if the allele frequencies are equal: P
equals Q equals point five.
But remember I said they always differ among populations.
So here's where the low FST comes in. And here we're talking
about heterozygosity, the frequency, in this green line,
of an individual having two different alleles, being
a heterozygote.
In the zygote they had two different alleles.
So that's one aspect.
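
The claim that equal allele frequencies are best for individual identification can be checked numerically. This is a minimal sketch assuming Hardy-Weinberg proportions and independent SNPs; the function names and the 40-SNP panel are hypothetical choices for illustration, not the lecturer's panel.

```python
from math import prod


def heterozygosity(p: float) -> float:
    """Expected frequency of heterozygotes, 2pq."""
    return 2 * p * (1 - p)


def match_probability(p: float) -> float:
    """Chance that two unrelated people have the same genotype at one
    two-allele SNP, assuming Hardy-Weinberg proportions."""
    q = 1 - p
    return (p * p) ** 2 + (2 * p * q) ** 2 + (q * q) ** 2


def panel_match_probability(freqs: list[float]) -> float:
    """Match probability over a panel of SNPs assumed to be independent."""
    return prod(match_probability(p) for p in freqs)


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, round(heterozygosity(p), 4), round(match_probability(p), 4))
    print(panel_match_probability([0.5] * 40))  # hypothetical 40-SNP panel
```

Per SNP the match probability is lowest, and heterozygosity highest, at a frequency of 0.5, and multiplying across many independent SNPs drives the overall match probability down very quickly.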
For ancestry identification, we want the opposite.
We want one population to be like this
and the other population to be like that.
So when we test it, we've got a distinction
between the populations.
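
One way to turn that distinction into a number is a likelihood ratio: how much more probable is a person's genotype in one population than in the other? The sketch below is an illustration under assumptions not stated in the lecture: Hardy-Weinberg proportions within each population, independent SNPs, and made-up allele frequencies strictly between 0 and 1. Genotypes are coded as the number of copies (0, 1 or 2) of a reference allele.

```python
from math import log


def genotype_likelihood(count_a: int, p: float) -> float:
    """Probability of carrying count_a copies of the allele whose frequency is p."""
    q = 1 - p
    return (q * q, 2 * p * q, p * p)[count_a]


def log_likelihood_ratio(genotypes: list[int],
                         freqs_pop1: list[float],
                         freqs_pop2: list[float]) -> float:
    """Sum over SNPs of log[P(genotype | pop 1) / P(genotype | pop 2)].
    Positive values favor population 1, negative values population 2."""
    return sum(
        log(genotype_likelihood(g, p1) / genotype_likelihood(g, p2))
        for g, p1, p2 in zip(genotypes, freqs_pop1, freqs_pop2)
    )


if __name__ == "__main__":
    sample = [2, 2, 1, 2]                      # hypothetical genotypes
    print(log_likelihood_ratio(sample, [0.9] * 4, [0.1] * 4))
    print(log_likelihood_ratio(sample, [0.5] * 4, [0.5] * 4))
```

When the two populations have the same frequency at a SNP, that SNP contributes nothing to the ratio, which is why markers chosen for individual identification carry little ancestry information.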
>>So, let's review now in terms
of ancestry information a little bit about what we really do know
about modern human evolution.
This is, at its basics,
no longer controversial, absolutely accepted.
There's of course infinite argument
about the nitty picky fine details.
That will always go on.
This is science and these are humans who are looking at it.
But it's clear.
Modern humans evolved in Africa roughly 200,000 years ago.
And it's also very clear
that considerable genetic variation accumulated
in Africa and it's still there.
Where are the shortest people in the world?
>>African pygmies.
Where are, on average, the tallest people in the world?
The Nilotics in Africa.
>>And tremendous variation.
In the US, among non-scientists, there tends to be --
and even among some scientists who don't know much
about human variation --
there tends to be the assumption Africa is
genetically homogenous.
Well, no. [Pause]
>>About 100,000 years ago -- and here's where the argument is,
some say as recently as 80,000,
some have even said 60,000 years ago --
some individuals left Africa into southwest Asia.
And the single population had only a small fraction
of the genetic variation present in Africa.
And that population then expanded
to occupy the rest of the world.
>>And here is how I put it in a pointillist way.
And this has been reproduced in National Geographic.
And if you see the race exhibit that's going in museums
around the country it's currently
at the Smithsonian in Washington.
>>This is part of the triple A online also -- The Race Project.
>>Is it online?
Well it's animated in the museum.
I don't know about online.
But it's clear where just the different colors represent
generalized genetic variation
that Africa had accumulated a lot by 100,000 years ago.
But notice it's not uniformly distributed.
There's a little more red here.
There's a little more blue and yellow here.
Typical of any widespread mammalian or any other species,
the fringes of the distribution don't have all of the variation.
There are little bit.
That's gene flow and random genetic drift.
Well the last time I left Africa,
it was in a 747 from Johannesburg.
A hundred thousand years ago, the only way out of Africa was
out of Northeast Africa into Southwest Asia.
And we know that by 40,000 years ago that population had spread.
And if you look carefully, there is less variation
out here than there is here.
And yet it's dramatically less all over than it is in Africa.
So basically if you wiped everybody
of non-African origin out,
the human species would still have almost all
of the genetic variation that it has today.
Non-Africans represent a subset of genetic variation
and it's characterized by a loss of variation
as humans have spread out of Africa with a few exceptions,
but they're the exceptions.
So random genetic drift can explain most of that,
but selection also occurs.
We all believe in evolution.
That's selection.
These things evolve to get food into mouth in part.
I have no trouble using them to eat.
>>How do we detect selection?
We can argue that higher FSTs indicate selection in one part
of the world and not another.
But that's hard.
You have to be very specific.
And it can occur by chance.
So one of the methods of detecting selection is the idea
that a particular variant in one part
of the world has become common quickly
where random genetic drift
to become common would take many generations.
The result is less recombination in the DNA flanking it.
So you tend to get around a variant that's been selected
for an extended part of the DNA that is all identical.
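
The idea of an extended identical stretch can be illustrated with a toy measurement. This is a simplified stand-in for haplotype-based selection statistics, not the method used in the lecture: it takes haplotypes that all carry the variant of interest, written as equal-length strings of 0s and 1s, and counts how many contiguous sites around the variant are identical in every one of them. The function name and the haplotypes are hypothetical.

```python
def shared_tract_length(haplotypes: list[str], core: int) -> int:
    """Number of contiguous sites around index `core` (inclusive) at which
    all haplotypes carry the same allele. Strings must be of equal length."""
    def all_same(i: int) -> bool:
        return len({h[i] for h in haplotypes}) == 1

    if not haplotypes or not all_same(core):
        return 0
    left = core
    while left > 0 and all_same(left - 1):
        left -= 1                      # extend the identical stretch leftward
    right = core
    while right < len(haplotypes[0]) - 1 and all_same(right + 1):
        right += 1                     # extend the identical stretch rightward
    return right - left + 1


if __name__ == "__main__":
    swept = ["0011100110", "0011100110", "0011100111"]    # hypothetical
    neutral = ["0101100110", "0011101010", "1011100100"]  # hypothetical
    print(shared_tract_length(swept, core=4), shared_tract_length(neutral, core=4))
```

A variant that rose in frequency quickly leaves a long shared stretch, because recombination has had few generations to break it up; an old variant that drifted to the same frequency is surrounded by a much shorter one.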
>>For example, what about lactose tolerance as an adult?
Do you all know about that?
Well I've got the genes for it.
Sorry I just hit the mike.
That is essentially fixed
for one particular variant in Northern Europe.
And it shows a cline, a gradient, from low frequency
in Southern Europe to higher frequency in Northern Europe,
and very strong evidence of selection.
The plausible hypothesis is that as the Neolithic moved north,
your cow was in your hut with you during the winter
when there was little to eat outside.
And if you could use the cow's milk fresh,
you would survive the winter better.
And if you as the hunter gatherer during the winter died,
your children were going to die.
So there's very strong survival value
in being able to drink fresh milk.
In Southern Europe what happens to fresh milk?
Converted to yogurt.
So yogurt is a varied part of that diet,
but what makes yogurt?
Lactobacillus that digests the lactose.
So here we have culture
and selection operating on a genetic trait.
East Africa has adult lactose tolerance as well,
but from a different independent mutation.
There are many ways one can think of looking at selection,
but they're mostly at the moment statistical
until there's a solid biological explanation.
And the one I just gave you is a good story,
probably makes sense.
But it's not proof.
So there are others.
We know there are variants in hemoglobin.
Everybody knows about sickle cell hemoglobin.
There, there is proof.
The different susceptibility of the different genotypes
to infection from the malaria parasite is clear.
The survival of infants is clear in a malarial environment.
[ Music ]