15 Parentage Analyses
The analysis of parental and offspring data
- Parentage-type analyses.
- Analyses based upon multiple paternity
Parentage analyses are used in a broad range of studies:
- Identical vs fraternal twins
- Agricultural crop line differentiation
- Differentiate between livestock/dog/cat breeds
- Pathogenic strain identification (e.g., Hep-C strains a-e)
- Assigning parentage to individuals
- Assigning individuals to populations
- Identifying the source of unknown tissues (e.g., gettin’ perps)
Parentage analysis is a statistical approach for identifying the parent(s) of a particular individual. It requires:
- A set of genetic markers that are bi-parentally inherited
- Variation in these markers
- Some assumptions about the prior probability of the union of parents.
15.1 Paternity vs. Maternity
For single-parent parentage analysis, either paternity or maternity is being established. Here we assume that the other parent is definitely the biological parent of the individual (i.e., \(P(prior)=1\)). This can be because:
- The offspring was collected from the identified parent
- There is other evidence that points to the identified parent being the biological one.
The goal then is to determine who the unidentified parent is with some level of statistical probability.
Terms Used in Parentage
The following terms are commonly used in parentage analyses:
Extra-Pair Paternity - Fertilization resulting from copulation outside a recognized pair bond.
Multiple Paternity - Offspring produced from mating events with different sets of individuals.
Paternity/Maternity Exclusion - Excluding an individual as a parent based upon an incongruence in the observed genetic data.
15.2 Probability of Exclusion
Mother | Offspring | Excluded Dads | Probability |
---|---|---|---|
\(A_1A_1\;(p^2_1)\) | \(A_1A_1\;(p_1)\) | \(A_2A_2\;(p_2^2)\) | \(p_1^3p_2^2\) |
\(A_1A_1\;(p^2_1)\) | \(A_1A_2\;(p_2)\) | \(A_1A_1\;(p_1^2)\) | \(p_1^4p_2\) |
\(A_1A_1\;(p^2_1)\) | \(A_2A_2\;(0)\) | - | - |
\(A_1A_2\;(2p_1p_2)\) | \(A_1A_1\;(\frac{p_1}{2})\) | \(A_2A_2\;(p_2^2)\) | \(p_1^2p_2^3\) |
\(A_1A_2\;(2p_1p_2)\) | \(A_1A_2\;(\frac{1}{2})\) | - | - |
\(A_1A_2\;(2p_1p_2)\) | \(A_2A_2\;(\frac{p_2}{2})\) | \(A_1A_1\;(p_1^2)\) | \(p_1^3p_2^2\) |
\(A_2A_2\;(p^2_2)\) | \(A_1A_1\;(0)\) | - | - |
\(A_2A_2\;(p^2_2)\) | \(A_1A_2\;(p_1)\) | \(A_2A_2\;(p_2^2)\) | \(p_1p_2^4\) |
\(A_2A_2\;(p^2_2)\) | \(A_2A_2\;(p_2)\) | \(A_1A_1\;(p_1^2)\) | \(p_1^2p_2^3\) |
Probability of Exclusion
\[ P_{exc} = p_1^3p_2^2 + p_1^4p_2 + p_1^2p_2^3 + p_1^3p_2^2 + p_1p_2^4 + p_1^2p_2^3 \]
which when simplified down a bit becomes
\[ P_{exc} = p_1p_2(1-p_1p_2) \]
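The intermediate steps of that simplification are not spelled out. One way to get there, sketched here using only the fact that \(p_1 + p_2 = 1\), is to factor \(p_1p_2\) out of the six terms in the table:
\[
\begin{aligned}
P_{exc} &= p_1p_2\left(p_1^3 + 2p_1^2p_2 + 2p_1p_2^2 + p_2^3\right) \\
&= p_1p_2\left[(p_1+p_2)^3 - p_1p_2(p_1+p_2)\right] \\
&= p_1p_2(1 - p_1p_2)
\end{aligned}
\]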
Single Locus Paternity Exclusion
p <- seq(0,1,by=0.02)
q <- 1-p
Pexcl <- p*q*(1-p*q)
plot(Pexcl ~ p, xlab="Allele frequency, p", ylab="Paternity Exclusion Probability")
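The two-allele exclusion probability is symmetric in \(p\) and \(q\), so it is largest when both alleles are equally common. As a quick numerical check, the sketch below uses base R's `optimize()` to locate the peak. The helper name `pexcl` is introduced here only for illustration.

```r
# Single-locus, two-allele exclusion probability as a function of p
# (pexcl is a hypothetical helper name used only in this sketch)
pexcl <- function(p) {
  q <- 1 - p
  p * q * (1 - p * q)
}

# Search [0, 1] for the allele frequency that maximizes exclusion
best <- optimize(pexcl, interval = c(0, 1), maximum = TRUE)
best$maximum    # allele frequency at the peak
best$objective  # exclusion probability at the peak
```

The search should settle near \(p = 0.5\), where the exclusion probability is 0.1875, so even the most informative two-allele locus excludes fewer than one in five non-fathers.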
Multilocus Exclusion
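Exclusion probabilities combine multiplicatively across loci. Assuming the loci are independent, a non-father escapes detection only if he fails to be excluded at every locus, so
\[ P_{excl} = 1 - \prod_{i=1}^{\ell}(1-P_{excl,\;i}) \]
where \(\ell\) is the number of loci.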
Example
p <- rep( 0.5*0.5*(1-0.5*0.5), 5 )
p
[1] 0.1875 0.1875 0.1875 0.1875 0.1875
ptot <- 1
for( i in 1:length(p))
  ptot <- ptot * (1-p[i])
1-ptot
[1] 0.6459074
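The same calculation can be written as one vectorized expression, and the relationship can be inverted to ask how many loci are needed to reach a target exclusion probability. The sketch below assumes independent loci that all share the same single-locus exclusion probability. Both function names are illustrative and not part of any package.

```r
# Multilocus exclusion from a vector of single-locus exclusion probabilities
multilocus_exclusion <- function(p_locus) {
  1 - prod(1 - p_locus)
}

# Smallest number of equally informative, independent loci needed to
# reach a target multilocus exclusion probability
loci_needed <- function(p_locus, target) {
  ceiling(log(1 - target) / log(1 - p_locus))
}

multilocus_exclusion(rep(0.1875, 5))   # 0.6459074, as in the loop version
loci_needed(0.1875, target = 0.99)     # 23
```

With two-allele loci at their most informative frequencies, more than twenty loci are needed for 99% exclusion, which is one reason highly polymorphic markers are preferred for parentage work.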
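With more than two alleles at a locus the same logic applies: an excluded father \(A_xA_y\) carries none of the alleles the offspring could have inherited paternally.
Mother | Offspring | Excluded Father \(A_xA_y\) | Probability of Exclusion |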
---|---|---|---|
\(A_iA_i\;(p_i^2)\) | \(A_iA_i\;(p_i)\) | \(x,y\ne i\;\; (1-p_i)^2\) | \(p_i^3(1-p_i)^2\) |
\(A_iA_i\;(p_i^2)\) | \(A_iA_j\;(p_j)\) | \(x,y\ne j\;\; (1-p_j)^2\) | \(p_i^2p_j(1-p_j)^2\) |
\(A_iA_j\;(2p_ip_j)\) | \(A_iA_i\;(\frac{p_i}{2})\) | \(x,y \ne i\;\; (1-p_i)^2\) | \(p_i^2p_j(1-p_i)^2\) |
\(A_iA_j\;(2p_ip_j)\) | \(A_iA_j\;(\frac{p_i+p_j}{2})\) | \(x,y \ne i,j\;\; (1-p_i-p_j)^2\) | \(p_ip_j(p_i+p_j)(1-p_i-p_j)^2\) |
\(A_iA_j\;(2p_ip_j)\) | \(A_iA_k\;(\frac{p_k}{2})\) | \(x,y \ne k\;\; (1-p_k)^2\) | \(p_ip_jp_k(1-p_k)^2\) |
\(A_iA_j\;(2p_ip_j)\) | \(A_jA_k\;(\frac{p_k}{2})\) | \(x,y \ne k\;\; (1-p_k)^2\) | \(p_ip_jp_k(1-p_k)^2\) |
\(A_iA_j\;(2p_ip_j)\) | \(A_jA_j\;(\frac{p_j}{2})\) | \(x,y \ne j\;\; (1-p_j)^2\) | \(p_ip_j^2(1-p_j)^2\) |
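Summing the rows of this table by hand becomes tedious as the number of alleles grows. A minimal sketch of a brute-force alternative is given below. It assumes Hardy-Weinberg genotype frequencies and enumerates every mother genotype, maternal gamete, and true paternal gamete, then asks how often a random male carries none of the alleles that could be paternal. The function name `exclusion_prob` is hypothetical and is not the `gstudio` implementation.

```r
# Single-locus paternity exclusion probability for any number of alleles.
# p is a vector of allele frequencies that sums to 1.
exclusion_prob <- function(p) {
  stopifnot(all(p >= 0), abs(sum(p) - 1) < 1e-8)
  k <- length(p)
  total <- 0
  for (a in 1:k) {
    for (b in 1:k) {                    # ordered maternal genotype, P = p[a] * p[b]
      mom <- c(a, b)
      for (m in unique(mom)) {          # allele the mother transmits
        pm <- mean(mom == m)            # 1 if homozygous, 1/2 if heterozygous
        for (f in 1:k) {                # allele from the true father
          # f is always a candidate paternal allele; m is also a candidate
          # when the mother could instead have supplied f
          paternal <- if (f %in% mom) unique(c(f, m)) else f
          # a random male is excluded if he carries no candidate allele
          p_excluded <- (1 - sum(p[paternal]))^2
          total <- total + p[a] * p[b] * pm * p[f] * p_excluded
        }
      }
    }
  }
  total
}

exclusion_prob(c(0.5, 0.5))    # two equally common alleles: 0.1875
exclusion_prob(rep(0.25, 4))   # four equally common alleles: about 0.5039
```

The two-allele result agrees with \(p_1p_2(1-p_1p_2)\) at \(p_1 = 0.5\), and going from two to four equally frequent alleles raises the single-locus exclusion probability from under 0.19 to just over 0.50.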
15.3 Paternity Exclusion
Likelihood Ratios
A likelihood ratio is given by:
\[ LR = \frac{H_1}{H_2} \]
where the \(H_X\) values are the probabilities of the data under each hypothesis.
Nomenclature For Parentage
Individual | Identifier | Genotype |
---|---|---|
Female Parent | \(FP_i\) | \(\alpha_i\) |
Putative Male Parent | \(MP_j\) | \(\beta_j\) |
Offspring | \(O_k\) | \(\gamma_k\) |
Paternal Probability The posterior odds of paternity versus non-paternity given the totality of genetic information.
Likelihood Ratios | Genetic Equivalences
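The likelihood of one hypothesis, \(H_1\), relative to another, \(H_2\), given the data \(D\) is:
\[ L(H_1,H_2|D) = \frac{P(D|H_1)}{P(D|H_2)} \]
where
\[ P(D|H) = T(\gamma|\alpha,\beta)P(\alpha)P(\beta) \]
If \(H_1\) states that \(\beta\) is the real father of \(\gamma\) on \(\alpha\), and \(H_2\) states that he is just a random individual in the population (so the transition probability depends only on the mother), then:
\[ L(H_1,H_2|\alpha,\beta,\gamma) = \frac{T(\gamma|\alpha,\beta)P(\alpha)P(\beta)}{T(\gamma|\alpha)P(\alpha)P(\beta)} \]
which can be simplified to:
\[ \lambda_j = \frac{T(\gamma_k|\alpha_i,\beta_j)P(\alpha_i)P(\beta_j)}{T(\gamma_k|\alpha_i)P(\alpha_i)P(\beta_j)} = \frac{T(\gamma_k|\alpha_i,\beta_j)}{T(\gamma_k|\alpha_i)} \]
where \(T(X|Y)\) is the Mendelian transition probability of offspring \(X\) given parent(s) \(Y\).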
Assumptions in Model of Paternity Likelihood
The basic paternity exclusion model assumes:
- Completely random mating (can be modified by changing priors)
- Independent assortment of alleles
Likelihood Example
Consider the maternal individual whose genotypes are:
\[ FP = \{AA,\;Bb,\;CC,\;Dd\} \]
and whose \(i^{th}\) offspring has the genotypes:
\[ O_i = \{AA,\;BB,\;Cc,\;dd\} \]
Likelihood Example | \(T(O|FP)\)
The transition probability, \(T(O|FP)\), is then:
Individual | Locus1 | Locus2 | Locus3 | Locus4 |
---|---|---|---|---|
\(FP\) | \(AA\) | \(Bb\) | \(CC\) | \(Dd\) |
\(O_i\) | \(AA\) | \(BB\) | \(Cc\) | \(dd\) |
\(T(O|FP) = 1*0.5*1*0.5 = 0.25\)
Likelihood Example | Putative Male Parents
Individual | Locus1 | Locus2 | Locus3 | Locus4 |
---|---|---|---|---|
\(MP_1\) | \(Aa\) | \(BB\) | \(cc\) | \(Dd\) |
\(MP_2\) | \(AA\) | \(BB\) | \(Cc\) | \(dd\) |
Which one of the potential fathers is the most likely parent?
Likelihood Example | First Putative Father
Individual | Locus1 | Locus2 | Locus3 | Locus4 |
---|---|---|---|---|
\(FP\) | \(AA\) | \(Bb\) | \(CC\) | \(Dd\) |
\(MP_1\) | \(Aa\) | \(BB\) | \(cc\) | \(Dd\) |
\(O_i\) | \(AA\) | \(BB\) | \(Cc\) | \(dd\) |
\(T(O|FP,MP)\) | 0.5 | 0.5 | 1.0 | 0.25 |
\[ T(O_i|FP,MP_1) = 0.5 \times 0.5 \times 1.0 \times 0.25 = 0.0625 \]
And
\[ \lambda_1 = \frac{T(O_i|FP,MP_1)}{T(O_i|FP)} = \frac{0.0625}{0.25} = 0.25 \]
Likelihood Example | Second Putative Father
Individual | Locus1 | Locus2 | Locus3 | Locus4 |
---|---|---|---|---|
\(FP\) | \(AA\) | \(Bb\) | \(CC\) | \(Dd\) |
\(MP_2\) | \(AA\) | \(BB\) | \(Cc\) | \(dd\) |
\(O_i\) | \(AA\) | \(BB\) | \(Cc\) | \(dd\) |
\(T(O|FP,MP)\) | 1.0 | 0.5 | 0.5 | 0.5 |
\[ T(O_i|FP,MP_2) = 1.0 \times 0.5 \times 0.5 \times 0.5 = 0.125 \]
And
\[ \lambda_2 = \frac{T(O_i|FP,MP_2)}{T(O_i|FP)} = \frac{0.125}{0.25} = 0.5 \]
Likelihood Example | Interpretation of Results
Most likely parent is \(MP_2\) because \(\lambda_2 = 0.5 > \lambda_1 = 0.25\).
Does this mean that \(MP_2\) is the real parent?
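The hand calculations for both putative fathers can be reproduced in a few lines of base R. This is a minimal sketch, not the `gstudio` implementation: the helper names (`alleles`, `t_trio`, `t_mom`, `lambda`) are invented for illustration, genotypes are assumed to be two-character strings, and \(T(O|FP)\) is computed the way the worked example does it, as the probability that the mother transmits an allele found in the offspring.

```r
# Split a two-letter genotype such as "Bb" into its alleles
alleles <- function(g) strsplit(g, "")[[1]]

# T(O | FP, MP) at one locus: fraction of the four equally likely
# maternal x paternal gamete pairs that produce the offspring genotype
t_trio <- function(off, mom, dad) {
  target <- sort(alleles(off))
  hits <- 0
  for (m in alleles(mom)) {
    for (d in alleles(dad)) {
      if (identical(sort(c(m, d)), target)) hits <- hits + 1
    }
  }
  hits / 4
}

# T(O | FP) at one locus, following the worked example: the probability
# that the mother transmits an allele present in the offspring
t_mom <- function(off, mom) mean(alleles(mom) %in% alleles(off))

# Likelihood ratio across loci, assuming independent loci
lambda <- function(off, mom, dad) {
  prod(mapply(t_trio, off, mom, dad)) / prod(mapply(t_mom, off, mom))
}

FP  <- c("AA", "Bb", "CC", "Dd")
O   <- c("AA", "BB", "Cc", "dd")
MP1 <- c("Aa", "BB", "cc", "Dd")
MP2 <- c("AA", "BB", "Cc", "dd")

lambda(O, FP, MP1)  # 0.25
lambda(O, FP, MP2)  # 0.5
```

A larger ratio only ranks the candidates that were sampled. It says nothing about males who were never genotyped.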
Likelihood Example | In-Class Exercise - Who Are the Daddies?
Individual | Locus 1 | Locus 2 | Locus 3 |
---|---|---|---|
Mother | \(A_1A_1\) | \(B_1B_3\) | \(C_1C_1\) |
Offspring 1 | \(A_1A_2\) | \(B_1B_3\) | \(C_1C_2\) |
Offspring 2 | \(A_1A_1\) | \(B_3B_3\) | \(C_1C_1\) |
Offspring 3 | \(A_1A_1\) | \(B_1B_1\) | \(C_1C_1\) |
Dad 1 | \(A_1A_2\) | \(B_2B_3\) | \(C_1C_1\) |
Dad 2 | \(A_2A_2\) | \(B_1B_1\) | \(C_1C_2\) |
Dad 3 | \(A_1A_1\) | \(B_2B_3\) | \(C_1C_2\) |
Dad 4 | \(A_1A_1\) | \(B_1B_1\) | \(C_2C_2\) |
library(gstudio)
loci <- c("Locus-A","Locus-B","Locus-C","Locus-D")
freqs <- data.frame(Locus = rep(loci, each = 4),
                    Allele = rep(LETTERS[1:4], times = 4),
                    Frequency = 0.25)
freqs
Locus Allele Frequency
1 Locus-A A 0.25
2 Locus-A B 0.25
3 Locus-A C 0.25
4 Locus-A D 0.25
5 Locus-B A 0.25
6 Locus-B B 0.25
7 Locus-B C 0.25
8 Locus-B D 0.25
9 Locus-C A 0.25
10 Locus-C B 0.25
11 Locus-C C 0.25
12 Locus-C D 0.25
13 Locus-D A 0.25
14 Locus-D B 0.25
15 Locus-D C 0.25
16 Locus-D D 0.25
adults <- make_population( freqs, N=100 )
adults$OffID <- 0
adults <- adults[ , c(1,6,2:5)]
adults[1:5,]
ID OffID Locus-A Locus-B Locus-C Locus-D
1 1 0 B:D C:D C:D B:D
2 2 0 C:D A:B B:C B:D
3 3 0 A:B B:D A:D A:D
4 4 0 B:C A:C A:A A:D
5 5 0 C:D B:C B:C D:D
offs <- data.frame()
mom <- adults[1,]
for( i in 1:20){
  # the fractional index is truncated by `[`, so a father is drawn from rows 2-99
  dad_id <- runif( 1, min=2, max=100)
  dad <- adults[dad_id,]
  off <- mate( mom, dad, N=1 )
  offs <- rbind( offs, off )
}
offs$OffID <- 1:20
offs[1:5,]
ID OffID Locus-A Locus-B Locus-C Locus-D
1 1 1 D:D B:D C:C D:D
2 1 2 B:B C:D C:C B:D
3 1 3 A:D B:C C:D A:D
4 1 4 B:C A:D B:C B:C
5 1 5 A:B C:D B:C B:B
data <- rbind( adults, offs )
data <- data[ order(data$ID,data$OffID),]
rownames(data) <- 1:nrow(data)
data[1:10,]
ID OffID Locus-A Locus-B Locus-C Locus-D
1 1 0 B:D C:D C:D B:D
2 1 1 D:D B:D C:C D:D
3 1 2 B:B C:D C:C B:D
4 1 3 A:D B:C C:D A:D
5 1 4 B:C A:D B:C B:C
6 1 5 A:B C:D B:C B:B
7 1 6 C:D C:D B:C D:D
8 1 7 B:D C:D B:D A:B
9 1 8 A:B C:D A:C B:C
10 1 9 B:B B:C C:C B:B
f <- frequencies( data[ data$OffID==0,] )
excl <- exclusion_probability( f )
excl
Locus Pexcl PexclMax Fraction
1 Locus-A 0.5036015 0.5039062 0.9993953
2 Locus-B 0.5037844 0.5039062 0.9997581
3 Locus-C 0.5038453 0.5039062 0.9998791
4 Locus-D 0.5036015 0.5039062 0.9993953
p <- excl$Pexcl
excl_multilocus <- 1 - prod( 1-p )
excl_multilocus
[1] 0.9393336
family <- data[ data$ID==1, ]
minus_mom( family )
ID OffID Locus-A Locus-B Locus-C Locus-D
2 1 1 D B C D
3 1 2 B C:D C B:D
4 1 3 A B C:D A
5 1 4 C A B C
6 1 5 A C:D B B
7 1 6 C C:D B D
8 1 7 B:D C:D B A
9 1 8 A C:D A C
10 1 9 B B C B
11 1 10 A D C:D B:D
12 1 11 A A A B:D
13 1 12 A B A B:D
14 1 13 C D D D
15 1 14 A C D B:D
16 1 15 B:D A C:D D
17 1 16 B:D C:D C:D B
18 1 17 C C B A
19 1 18 C A A C
20 1 19 C D C:D B
21 1 20 A C:D B A
dads <- adults[2:100,]
mom <- adults[1,]
off <- offs[1,]
for( i in 1:nrow(dads)){
  dad <- dads[i,]
  T <- transition_probability(off,mom,dad)
  if( T > 0 )
    cat("Father",i,"may be the real father (T =",T,")\n")
}
Father 1 may be the real father (T = 0.00390625 )
Father 4 may be the real father (T = 0.0078125 )
Father 13 may be the real father (T = 0.00390625 )
Father 20 may be the real father (T = 0.00390625 )
Father 52 may be the real father (T = 0.00390625 )
Father 55 may be the real father (T = 0.0078125 )
Father 75 may be the real father (T = 0.00390625 )
Father 92 may be the real father (T = 0.0078125 )
15.4 Fractional Paternity
In cases where we have more than one putative father, we may want to get an idea of the relative strength of our inferences by comparing the likelihood ratios for all dads.
- We may use arbitrary cut-offs, or
- We may use all non-excluded dads, but weighted by their fractional contributions
Conditional Probability
Problem: Several putative fathers (\(MP_i, MP_j, MP_k, ... , MP_m\)) have been found to have non-zero likelihoods of paternity.
Question: What is the relative likelihood of paternity given these putative fathers?
Conditional probability determines the likelihood of an event (paternal likelihood) given that some other event has already happened (not excluded as a potential father).
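\[ P(MP=j^*|FP=i,O=k) = \frac{T(O=k|FP=i,MP=j^*)\,P(MP=j^*|FP=i)}{\sum_{j} T(O=k|FP=i,MP=j)\,P(MP=j|FP=i)} \]
If we can assume that \(P(MP=j|FP=i) = c\) (i.e., the frequencies of the female and male parents are constant with respect to the individual offspring being considered) then,
\[ P(MP=j^*|FP=i,O=k) = \frac{T(O=k|FP=i,MP=j^*)}{\sum_{j} T(O=k|FP=i,MP=j)} \]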
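Under the constant-prior assumption, each non-excluded father's share of paternity is just his transition probability divided by the sum over all non-excluded fathers. The sketch below illustrates this with the eight transition probabilities reported for the first offspring in the exclusion loop of Section 15.3. The vector names `T_j` and `F_j` are introduced here for illustration.

```r
# Transition probabilities of the non-excluded fathers for the first offspring,
# copied from the exclusion loop output in Section 15.3
# (element names are the "Father" numbers printed there)
T_j <- c("1" = 0.00390625, "4" = 0.0078125, "13" = 0.00390625,
         "20" = 0.00390625, "52" = 0.00390625, "55" = 0.0078125,
         "75" = 0.00390625, "92" = 0.0078125)

# With a constant prior, each father's share is his T over the total
F_j <- T_j / sum(T_j)
round(F_j, 4)
sum(F_j)   # the shares sum to 1
```

The fathers with the larger transition probability each receive 2/11 (about 0.1818) of the paternity and the others 1/11 (about 0.0909).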
Fractional Paternity
Some things to consider when using fractional analyses for paternity.
- Not usually used in human studies.
- Can be considered a prior probability of paternity.
- Can include ecological, spatial, evolutionary components such as differential attractiveness, pollen fertility, output, etc.
- Possible tautology
Every non-excluded potential father is assigned a share of paternity; the fraction of offspring \(X_{ik}\) assigned to a particular \(MP_j\) is proportional to his likelihood ratio.
frac <- paternity( offs, mom, dads )
summary(frac)
MomID OffID DadID Fij
Min. :1 Min. : 1.00 Min. : 2.0 Min. :0.02439
1st Qu.:1 1st Qu.: 5.00 1st Qu.:22.0 1st Qu.:0.04878
Median :1 Median :11.00 Median :58.5 Median :0.09091
Mean :1 Mean :10.35 Mean :52.1 Mean :0.13889
3rd Qu.:1 3rd Qu.:15.00 3rd Qu.:77.0 3rd Qu.:0.18182
Max. :1 Max. :20.00 Max. :98.0 Max. :0.66667
frac[1:10,]
MomID OffID DadID Fij
1 1 1 5 0.18181818
2 1 1 56 0.18181818
3 1 1 93 0.18181818
4 1 1 2 0.09090909
5 1 1 14 0.09090909
6 1 1 21 0.09090909
7 1 1 53 0.09090909
8 1 1 76 0.09090909
9 1 2 12 0.23529412
10 1 2 24 0.23529412
t <- table(frac$OffID)
t
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
8 6 13 8 2 9 7 3 3 12 3 10 3 10 13 16 5 3 6 4
N_pexcl <- nrow(dads)*(1-excl_multilocus)
fit <- t.test(as.numeric(t),mu = N_pexcl)
fit
One Sample t-test
data: as.numeric(t)
t = 1.2987, df = 19, p-value = 0.2096
alternative hypothesis: true mean is not equal to 6.005976
95 percent confidence interval:
5.275711 9.124289
sample estimates:
mean of x
7.2
Maternity Analysis
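The putative father is identified but the mother is unknown.
\[ \lambda_i = \frac{T(\gamma|\alpha_i,\beta)P(\alpha_i)P(\beta)}{P(\gamma)P(\alpha_i)P(\beta)} = \frac{T(\gamma|\alpha_i,\beta)}{P(\gamma)} \]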
where \(P(\gamma)\) is the frequency of the offspring genotype in the population. All other things are the same.
Cryptic Gene Flow
Consider the case where:
- You have identified a set of offspring collected from mothers.
- You have identified a set of fathers that are probabilistically the sires of the offspring.
15.5 Dispersal Kernels
Estimating the Dispersal Distribution
Once a collection of paternity estimates has been determined, you can use it to estimate a dispersal kernel, which describes the probability of paternity as a function of distance from the maternal individual.
Dispersal Kernels | Distributions
The form of the distribution is critical for estimation. It determines:
- The shape of the distribution
- The variance of the distribution
- Quantitative estimates and hypotheses you get from the data
Example Kernel Distribution Families
Normal Family
\[ p(a|x,y) = \frac{1}{\pi a^2}\exp\left[-\left(\frac{r}{a}\right)^2\right] \]
where \(r = \sqrt{ x^2 + y^2}\) and \(a = \sigma \sqrt{2}\).
This produces a thin tailed distribution.
Exponential Family
\[ p(a,b|x,y) = \frac{b}{2\pi a^2\Gamma(2/b)}\exp\left[-\left(\frac{r}{a}\right)^b\right] \]
where \(\Gamma(\cdot)\) is the gamma function and \(b\) is a ‘shape’ parameter.
- When \(b=1\), this is the exponential distribution.
- When \(b=2\), this is the normal distribution.
- When \(b<1\), this is a fat-tailed distribution.
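The effect of the shape parameter is easier to see numerically. The sketch below assumes the commonly used two-dimensional forms of these kernels, \(\frac{1}{\pi a^2}\exp[-(r/a)^2]\) for the normal family and \(\frac{b}{2\pi a^2\Gamma(2/b)}\exp[-(r/a)^b]\) for the exponential family. The function names are hypothetical, and `total_mass()` checks that a kernel integrates to one over the plane.

```r
# Normal kernel as a function of distance r, with scale a
kernel_normal <- function(r, a) {
  exp(-(r / a)^2) / (pi * a^2)
}

# Exponential-family kernel with scale a and shape b
kernel_exp_power <- function(r, a, b) {
  b / (2 * pi * a^2 * gamma(2 / b)) * exp(-(r / a)^b)
}

# Total probability over the plane, integrated in polar coordinates
total_mass <- function(kernel, ...) {
  integrate(function(r) 2 * pi * r * kernel(r, ...), lower = 0, upper = Inf)$value
}

total_mass(kernel_normal, a = 1)             # close to 1
total_mass(kernel_exp_power, a = 1, b = 1)   # close to 1

# b = 2 recovers the normal kernel
all.equal(kernel_exp_power(0.7, a = 1, b = 2), kernel_normal(0.7, a = 1))

# Density far from the source (r = 5) increases as b decreases
sapply(c(2, 1, 0.5), function(b) kernel_exp_power(5, a = 1, b = b))
```

At a distance of five scale units the normal kernel has essentially no density left, while the \(b = 0.5\) kernel still does, which is what "fat-tailed" means in practice.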
Other distributions you may run across include:
- The Geometric distribution,
- The Weibull family of distributions,
- The 2Dt family of distributions.
r1 <- abs(rnorm(10000))
r2 <- rexp(10000)
df <- data.frame( Distribution=c(rep(c("Normal","Exponential"),each=10000)), Value=c(r1,r2))
library(ggplot2)
ggplot(df,aes(x=Value,fill=Distribution)) + geom_density(alpha=0.75) + theme_bw() + ylab("Frequency")
Concerns with kernel estimation
The following are some assumptions that are inherent in the use of dispersal kernels for estimating connectivity.
- All functions are continuous,
- All functions assume isotropy in dispersal,
- All functions explicitly assume homogeneity of the dispersal matrix.
Skills
In this lecture we covered some rather simple parent/offspring relationships and how we can analyze them. Specifically, you should be comfortable with:
- Understanding the qualities of loci that make for more powerful parentage analyses.
- Estimating single and multilocus exclusion probabilities and understanding what they mean.
- Estimating likelihood ratios for paternity given the mother, offspring, and putative male parent.
- Using fractional paternity and understanding how conditional probability applies to parentage.
- Understand dispersal kernel estimation.