15 Parentage Analyses

The analysis of parental and offspring data

Parentage-type analyses.
Analyses based upon multiple paternity

Parentage analyses are used in a broad range of studies:

Identical vs fraternal twins
Agricultural crop line differentiation
Differentiate between livestock/dog/cat breeds
Pathogenic strain identification (e.g., Hep-C strains a-e)
Assigning parentage to individuals
Assigning individuals to populations
Identifying the source of unknown tissues (e.g., gettin’ perps)

A statistical approach for identifying the parent(s) of a particular individual. This requires:

A set of genetic markers that are bi-parentally inherited
Variation in these markers
Some assumptions about the prior probability of the union of parents.

15.1 Paternity vs. Maternity

For single parent parentage analysis, it is either paternity or maternity that is being established. Here we assume that the other parent is definitely the biological parent of the individual (e.g., \(P(prior)=1\)). This can be because:

The offspring was collected from the identified parent
There is other evidence that points to the identified parent being the biological one.

The goal then is to determine who the unidentified parent is with some level of statistical probability.

Terms Used in Parentage

The following terms are commonly used in parentage analyses:

Extra-Pair Paternity - Fertilization resulting from copulation outside a recognized pair bond.

Multiple Paternity - Offspring produced from mating events with different sets of individuals.

Paternity/Maternity Exclusion - Excluding an individual based upon an in-congruence in observed genetic data.

15.2 Probability of Exclusion

Mother	Offspring	Excluded Dads	Probability
\(A_1A_1\;(p^2_1)\)	\(A_1A_1\;(p_1)\)	\(A_2A_2\;(p_2^2)\)	\(p_1^3p_2^2\)
\(A_1A_1\;(p^2_1)\)	\(A_1A_2\;(p_2)\)	\(A_1A_1\;(p_1^2)\)	\(p_1^4p_2\)
\(A_1A_1\;(p^2_1)\)	\(A_2A_2\;(0)\)	-	-
\(A_1A_2\;(2p_1p_2)\)	\(A_1A_1\;(\frac{p_1}{2})\)	\(A_2A_2\;(p_2^2)\)	\(p_1^2p_2^3\)
\(A_1A_2\;(2p_1p_2)\)	\(A_1A_2\;(\frac{1}{2})\)	-	-
\(A_1A_2\;(2p_1p_2)\)	\(A_2A_2\;(\frac{p_2}{2})\)	\(A_1A_1\;(p_1^2)\)	\(p_1^3p_2^2\)
\(A_2A_2\;(p^2_2)\)	\(A_1A_1\;(0)\)	-	-
\(A_2A_2\;(p^2_2)\)	\(A_1A_2\;(p_1)\)	\(A_2A_2\;(p_2^2)\)	\(p_1p_2^4\)
\(A_2A_2\;(p^2_2)\)	\(A_2A_2\;(p_2)\)	\(A_1A_1\;(p_1^2)\)	\(p_1^2p_2^3\)

Probability of Exclusion

\[ P_{exc} = p_1^3p_2^2 + p_1^4p_2 +p_1^2p_2^3 +p_1^3p_2^2 +p_1p_2^4 +p_1^2p_2^3 \\ \]

which when simplified down a bit becomes

\[ P_{exc} = p_1p_2(1-p_1p_2) \]

Single Locus Paternity Exclusion

p <- seq(0,1,by=0.02)
q <- 1-p
Pexcl <- p*q*(1-p*q)
plot(Pexcl ~ p, xlab="Allele frequency, p", ylab="Paternity Exclusion Probability")

Multilocus Exclusion

Exclusion probabilities are multiplicative properties.

\[ P_{excl} = 1 - \prod_{i=1}^\ell(1-P_{excl,\;i}) \]

Example

p <- rep( 0.5*0.5*(1-0.5*0.5), 5 )
p

## [1] 0.1875 0.1875 0.1875 0.1875 0.1875

ptot <- 1
for( i in 1:length(p))
  ptot <- ptot * (1-p[i])
1-ptot

## [1] 0.6459074

Mother	Offspring	Excluded Father \(A_xA_y\)	Probability of Exclusion
\(A_iA_i\;(p_i^2)\)	\(A_iA_i\;(p_i)\)	\(x,y\ne i\;\; (1-p_i)^2\)	\(p_i^3(1-p_i)^2\)
\(A_iA_i\;(p_i^2)\)	\(A_iA_j\;(p_j)\)	\(x,y\ne j\;\; (1-p_j)^2\)	\(p_i^2p_j(1-p_i)^2\)
\(A_iA_j\;(2p_ip_j)\)	\(A_iA_i\;(\frac{p_i}{2})\)	\(x,y \ne i\;\; (1-p_i)^2\)	\(p_i^2p_j(1-p_i)^2\)
\(A_iA_j\;(2p_ip_j)\)	\(A_iA_j\;(\frac{p_i+p_j}{2})\)	\(x,y \ne i,j\;\; (1-p_i-p_j)^2\)	\(p_ip_j(p_i+p_j)(1-p_i-p_j)^2\)
\(A_iA_j\;(2p_ip_j)\)	\(A_iA_k\;(\frac{p_k}{2})\)	\(x,y \ne k\;\; (1-p_k)^2\)	\(p_ip_jp_k(1-p_k)^2\)
\(A_iA_j\;(2p_ip_j)\)	\(A_jA_k\;(\frac{p_k}{2})\)	\(x,y \ne k\;\; (1-p_k)^2\)	\(p_ip_jp_k(1-p_k)^2\)
\(A_iA_j\;(2p_ip_j)\)	\(A_jA_j\;(\frac{p_j}{2})\)	\(x,y \ne j\;\; (1-p_j)^2\)	\(p_ip_j^2(1-p_j)^2\)

15.3 Paternity Exclusion

Likelihood Ratios

A likelihood ratio is given by:

\[ LR = \frac{H_P}{H_D} \]

where the \(H_X\) values are the hypotheses probabilities.

Nomenclature For Parentage

Individual	Identifier	Genotype
Female Parent	\(FP_i\)	\(\alpha_i\)
Putative Male Parent	\(MP_j\)	\(\beta_j\)
Offspring	\(O_k\)	\(\gamma_k\)

\(\;\)

Paternal Probability The posterior odds of paternity versus non-paternity given the totality of genetic information.

Likelihood Ratios | Genetic Equivalences

The likelihood of one hypothesis, \(H_1\) relative to another \(H_2\) is:

\[ L(H_1,H_2|D) = \frac{P(D|H_1)}{P(D|H_2)} \]

where

\[ P(D|H) = T(\gamma | \alpha, \beta)P(\alpha)P(\beta) \]

Assuming \(H_1:\) states that \(\beta\) is the real father of \(\gamma\) on \(\alpha\) and \(H_2:\) states that he is just a random individual in the population is:

\[ L(H_1,H_2|\alpha,\beta,\gamma) = \frac{P(D|\alpha,\beta,\gamma)}{P(D|\alpha,\gamma)} \]

which can be simplified to:

\[ \lambda_j = \frac{P(\alpha_i,\beta_j,\gamma_k|\mathrm{paternity})}{P(\alpha_i,\beta_j,\gamma_k|\mathrm{non-paternity})} \\ = \frac{P(\beta_j)P(\alpha_i)T(\gamma_k|\alpha_i,\beta_j)}{P(\beta_j)P(\alpha_i)T(\gamma_k|\alpha_i)} \\ = \frac{T(\gamma_k|\alpha_i,\beta_j)}{T(\gamma_k|\alpha_i)} \]

where \(T(X|Y)\) is the Mendelian transition probability of offspring \(X\) given parent \(Y\).

Assumptions in Model of Paternity Likelihood

The basic paternity exclusion model assumes:

Completely random mating (can be modified by changin priors)
Independent assortment of alleles

Likelihood Example

Consider the maternal individual whose genotypes are: \[ FP = \{AA,\;Bb,\;CC,\;Dd\} \]

Whose \(i^{th}\) offspring has the genotypes:

\[ O_i = \{AA,\;BB,\;Cc,\;dd\} \]

Likelihood Example | \(T(O|FP)\)

The transition probability, \(T(O|FP)\), is then:

Individual	Locus1	Locus2	Locus3	Locus4
\(FP\)	\(AA\)	\(Bb\)	\(CC\)	\(Dd\)
\(O_i\)	\(AA\)	\(BB\)	\(Cc\)	\(dd\)

\(T(O|FP) = 1*0.5*1*0.5 = 0.25\)

Likelihood Example | Putative Male Parents

Individual	Locus1	Locus2	Locus3	Locus4
\(MP_1\)	\(Aa\)	\(BB\)	\(cc\)	\(Dd\)
\(MP_2\)	\(AA\)	\(BB\)	\(Cc\)	\(dd\)

\(\;\)

Which one of the potential fathers is the most likely parent?

Likelihood Example | First Putative Father

Individual	Locus1	Locus2	Locus3	Locus4
\(FP\)	\(AA\)	\(Bb\)	\(CC\)	\(Dd\)
\(MP_1\)	\(Aa\)	\(BB\)	\(cc\)	\(Dd\)
\(O_i\)	\(AA\)	\(BB\)	\(Cc\)	\(dd\)
\(T(O\|FP,MP)\)	0.5	0.5	1.0	0.25

\[ T(O_1|FP,MP_1) = 0.5 * 0.5 * 1.0 * 0.25 = 0.0625 \]

And

\[ \lambda_1 = \frac{T(O_i|FP,MP_1)}{T(O_i|FP)} = \frac{0.0625}{0.25} = 0.25 \]

Likelihood Example | Second Putative Father

Individual	Locus1	Locus2	Locus3	Locus4
\(FP\)	\(AA\)	\(Bb\)	\(CC\)	\(Dd\)
\(MP_2\)	\(AA\)	\(BB\)	\(Cc\)	\(dd\)
\(O_i\)	\(AA\)	\(BB\)	\(Cc\)	\(dd\)
\(T(O\|FP,MP)\)	1.0	0.5	0.5	0.5

\[ T(O_1|FP,MP_2) = 0.5 * 0.5 * 1.0 * 0.25 = 0.125 \]

And

\[ \lambda_2 = \frac{T(O_i|FP,MP_1)}{T(O_i|FP)} = \frac{0.125}{0.25} = 0.5 \]

Likelihood Example | Interpretation of Results {.build}

Most likely parent is \(MP_2\) because \(\lambda_2 = 0.5 > \lambda_1 = 0.25\).

\(\;\)

Does this mean that \(MP_2\) is the real parent?

Likelihood Example | In Class Exercise - Whose the daddies?

Individual	Locus 1	Locus 2	Locus 3
Mother	\(A_1A_1\)	\(B_1B_3\)	\(C_1C_1\)
Offspring 1	\(A_1A_2\)	\(B_1B_3\)	\(C_1C_2\)
Offspring 2	\(A_1A_1\)	\(B_3B_3\)	\(C_1C_1\)
Offspring 3	\(A_1A_1\)	\(B_1B_1\)	\(C_1C_1\)
Dad 1	\(A_1A_2\)	\(B_2B_3\)	\(C_1C_1\)
Dad 2	\(A_2A_2\)	\(B_1B_1\)	\(C_1C_2\)
Dad 3	\(A_1A_1\)	\(B_2B_3\)	\(C_1C_2\)
Dad 4	\(A_1A_1\)	\(B_1B_1\)	\(C_2C_2\)

library(gstudio)
loci <- c("Locus-A","Locus-B","Locus-C","Locus-D")
freqs <- data.frame(Locus = rep(loci, each = 4), 
                    Allele = rep(LETTERS[1:4], times = 4), 
                    Frequency = 0.25)
freqs

##      Locus Allele Frequency
## 1  Locus-A      A      0.25
## 2  Locus-A      B      0.25
## 3  Locus-A      C      0.25
## 4  Locus-A      D      0.25
## 5  Locus-B      A      0.25
## 6  Locus-B      B      0.25
## 7  Locus-B      C      0.25
## 8  Locus-B      D      0.25
## 9  Locus-C      A      0.25
## 10 Locus-C      B      0.25
## 11 Locus-C      C      0.25
## 12 Locus-C      D      0.25
## 13 Locus-D      A      0.25
## 14 Locus-D      B      0.25
## 15 Locus-D      C      0.25
## 16 Locus-D      D      0.25

adults <- make_population( freqs, N=100 )
adults$OffID <- 0
adults <- adults[ , c(1,6,2:5)]
adults[1:5,]

##   ID OffID Locus-A Locus-B Locus-C Locus-D
## 1  1     0     B:D     A:D     A:D     A:B
## 2  2     0     A:B     D:D     B:B     A:D
## 3  3     0     A:B     B:C     C:C     A:C
## 4  4     0     C:D     A:C     A:B     A:C
## 5  5     0     B:D     A:D     D:D     B:B

offs <- data.frame()
mom <- adults[1,]
for( i in 1:20){
  dad_id <- runif( 1, min=2, max=100)
  dad <- adults[dad_id,]
  off <- mate( mom, dad, N=1 )
  offs <- rbind( offs, off )
}
offs$OffID <- 1:20
offs[1:5,]

##   ID OffID Locus-A Locus-B Locus-C Locus-D
## 1  1     1     B:D     A:D     C:D     B:C
## 2  1     2     C:D     A:D     A:D     B:D
## 3  1     3     B:D     A:D     A:B     B:D
## 4  1     4     A:B     A:C     A:D     B:B
## 5  1     5     D:D     A:D     A:B     A:A

data <- rbind( adults, offs )
data <- data[ order(data$ID,data$OffID),]
rownames(data) <- 1:nrow(data)
data[1:10,]

##    ID OffID Locus-A Locus-B Locus-C Locus-D
## 1   1     0     B:D     A:D     A:D     A:B
## 2   1     1     B:D     A:D     C:D     B:C
## 3   1     2     C:D     A:D     A:D     B:D
## 4   1     3     B:D     A:D     A:B     B:D
## 5   1     4     A:B     A:C     A:D     B:B
## 6   1     5     D:D     A:D     A:B     A:A
## 7   1     6     B:D     B:D     A:B     A:D
## 8   1     7     A:D     A:B     C:D     B:C
## 9   1     8     A:D     A:C     A:C     A:D
## 10  1     9     A:D     A:D     A:B     A:C

f <- frequencies( data[ data$OffID==0,] )
excl <- exclusion_probability( f )
excl

##     Locus     Pexcl  PexclMax  Fraction
## 1 Locus-A 0.5038453 0.5039062 0.9998791
## 2 Locus-B 0.5037248 0.5039062 0.9996400
## 3 Locus-C 0.5036015 0.5039062 0.9993953
## 4 Locus-D 0.5038453 0.5039062 0.9998791

p <- excl$Pexcl
excl_multilocus <- 1 - prod( 1-p )
excl_multilocus

## [1] 0.9393561

family <- data[ data$ID==1, ]
minus_mom( family )

##    ID OffID Locus-A Locus-B Locus-C Locus-D
## 2   1     1     B:D     A:D       C       C
## 3   1     2       C     A:D     A:D       D
## 4   1     3     B:D     A:D       B       D
## 5   1     4       A       C     A:D       B
## 6   1     5       D     A:D       B       A
## 7   1     6     B:D       B       B       D
## 8   1     7       A       B       C       C
## 9   1     8       A       C       C       D
## 10  1     9       A     A:D       B       C
## 11  1    10     B:D       D     A:D     A:B
## 12  1    11       A       A       B       D
## 13  1    12       C       A     A:D       D
## 14  1    13       D     A:D     A:D       C
## 15  1    14       D       C       B       B
## 16  1    15       D       B       C       A
## 17  1    16       D     A:D       B       C
## 18  1    17     B:D       D       A       D
## 19  1    18       B       C       D     A:B
## 20  1    19       D       A       B       C
## 21  1    20       A       A       B       D

dads <- adults[2:100,]
mom <- adults[1,]
off <- offs[1,]
for( i in 1:nrow(dads)){
  dad <- dads[i,]
  T <- transition_probability(off,mom,dad)
  if( T > 0 )
    cat("Father",i,"may be the real father (T =",T,")\n")
}

## Father 16 may be the real father (T = 0.00390625 )
## Father 19 may be the real father (T = 0.015625 )
## Father 22 may be the real father (T = 0.0078125 )
## Father 25 may be the real father (T = 0.0078125 )
## Father 28 may be the real father (T = 0.00390625 )
## Father 33 may be the real father (T = 0.00390625 )
## Father 49 may be the real father (T = 0.0078125 )
## Father 50 may be the real father (T = 0.015625 )
## Father 73 may be the real father (T = 0.03125 )
## Father 76 may be the real father (T = 0.00390625 )
## Father 82 may be the real father (T = 0.00390625 )

15.4 Fractional Paternity

In cases where we have more than one putative father, we may want to get an idea of the relative strength of our inferences by comparing the likelihood ratios for all dads.

We may use arbitrary cut-offs, or
We may use all non-excluded dads, but weighted by their fractional contributions

Conditional Probability

Problem: We have several putative fathers (\(MP_i, MP_j, MP_k, ... , MP_m\)) have been found to have non-zero likelihoods of paternity.

\(\;\)

Question: What is the relative likelihood of paternity given these putative fathers?

Conditional Probability

Conditional probability determines the likelihood of an event (paternal likelihood) given that some other event has already happened (not excluded as a potential father).

\[ P(MP=j^*|FP=i,O=k) = \frac{P(O=k|FP=i,MP=j^*)P(MP=j^*|FP=i)}{\sum_{\forall j}P(O=k|FP=i,MP=j^*)P(MP=j^*|FP=i)} \]

If we can assume that \(P(MP=j|FP=i) = c\) (e.g., the frequencies of the female and male parents are constant with respect to the individual offspring being considered) then,

\[ P(MP=j^*|FP=i,O=k) = \frac{T(\gamma_k|\alpha_i,\beta_j^*)}{\sum_{\forall k}T(\gamma_k|\alpha_i,\beta_k^*)} \]

Fractional Paternity

Some things to consider when using fractional analyses for paternity.

Not usually used in human studies.
Can be considered a prior probability of paternity.
Can include ecological, spatial, evolutionary components such as differential attractiveness, pollen fertility, output, etc.
Possible tautology

Every potential father is assigned paternity, the fraction of \(X_{ik}\) on a particular \(MP_j\) is proportional to the likelihood ratio.

frac <- paternity( offs, mom, dads )
summary(frac)

##      MomID       OffID            DadID             Fij         
##  Min.   :1   Min.   : 1.000   Min.   :  2.00   Min.   :0.02222  
##  1st Qu.:1   1st Qu.: 4.000   1st Qu.: 21.00   1st Qu.:0.06667  
##  Median :1   Median :10.000   Median : 45.00   Median :0.09091  
##  Mean   :1   Mean   : 9.369   Mean   : 49.58   Mean   :0.12500  
##  3rd Qu.:1   3rd Qu.:14.000   3rd Qu.: 79.00   3rd Qu.:0.16667  
##  Max.   :1   Max.   :20.000   Max.   :100.00   Max.   :0.80000

frac[1:10,]

##    MomID OffID DadID        Fij
## 1      1     1    74 0.29629630
## 2      1     1    20 0.14814815
## 3      1     1    51 0.14814815
## 4      1     1    23 0.07407407
## 5      1     1    26 0.07407407
## 6      1     1    50 0.07407407
## 7      1     1    17 0.03703704
## 8      1     1    29 0.03703704
## 9      1     1    34 0.03703704
## 10     1     1    77 0.03703704

t <- table(frac$OffID)
t

## 
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
## 11 14 13 10  8  6  5  7  5 15  8  9  8  2  2  8  9  8  4  8

N_pexcl <- nrow(dads)*(1-excl_multilocus)
fit <- t.test(as.numeric(t),mu = N_pexcl)
fit

## 
##  One Sample t-test
## 
## data:  as.numeric(t)
## t = 2.5331, df = 19, p-value = 0.02027
## alternative hypothesis: true mean is not equal to 6.003747
## 95 percent confidence interval:
##  6.350553 9.649447
## sample estimates:
## mean of x 
##         8

Maternity Analysis

Putative father identified by mother unknown.

\[ \lambda_i = \frac{P(\gamma|\beta)P(\beta)}{P(\gamma)P{\beta})} \\ = \frac{P(\gamma|\beta)}{P(\gamma)} \]

where \(P(\gamma)\) is the frequency of the offspring genotype in the population. All other things are the same.

Cryptic Gene Flow

Consider the case where:

You have identified a set of offspring collected from mothers.
Identified a set of fathers that are probabilistically sires of the offspring.

15.5 Dispersal Kernels

Estimating the Disperal Distribution

Once a collection of paternity estimates have been determined, you can use them to estimate a dispersal kernel, describing the probability of paternity as a function of distance from the maternal individual.

Dispersal Kernels | Distributions

The form of the distribution is critical for estimation. It determines:

The shape of the distribution

The variance of the distribution

Quantitative estimates and hypotheses you get from the data

Example Kernel Distribution Families

Normal Family

\[ p(a|x,y) = \frac{1}{\pi a^2}exp\left[-\left( \frac{r}{a} \right)^2\right] \]

where \(r = \sqrt{ x^2 + y^2}\) and \(a = \sigma \sqrt{2}\).

\(\;\)

This produces a thin tailed distribution.

Example Kernel Distribution Families

Exponential Family

\[ p(a,b|x,y) = \frac{b}{2\pi a^2 \Gamma(2/b)}exp\left[ -\left( \frac{r}{a} \right)^b \right] \]

where \(\Gamma(a,b)\) is the gamma function and \(b\) is a ‘shape’ parameter.

When \(b=1\) This is the exponential distribution.
When \(b=2\) this is the normal function.
When \(b<1\) this is a fat-tailed distribution.

Example Kernel Distribution Families

Other distributions you may run across include:

The Geometric distribution,
The Weibull family of distributions,
The 2Dt family of distributions.

r1 <- abs(rnorm(10000))
r2 <- rexp(10000)
df <- data.frame( Distribution=c(rep(c("Normal","Exponential"),each=10000)), Value=c(r1,r2))
library(ggplot2)
ggplot(df,aes(x=Value,fill=Distribution)) + geom_density(alpha=0.75) + theme_bw() + ylab("Frequency")

Concerns with kernel estimation

The following are some assumptions that are inherent in the use of dispersal kernels for estimating connectivity.

All functions are continuous,
All functions assume isotropy in dispersal,
All functions explicitly assume homogeneity of the dispersal matrix.

Skills

In this lecture we covered some rather simple parent/offspring relationships and how we can analyze them. Specifically, you should be comfortable with:

Understanding the qualities of loci that make for more powerful parentage analyses.
Be able to estimate single and multilocus exclusion probabilities and understand what they mean.
Estimate likelihood ratios for paternity given Mother, Offspring, and Putative Male Parent.
Use fractional paternity and understand conditional probability and how it applies to parentage.
Understand dispersal kernel estimation.