5  Genetic Diversity

This chapter covers estimation of genetic diversity statistics using the genetic_diversity() wrapper and its underlying per-locus functions.

library(gstudio)
data(arapat)

5.1 The genetic_diversity() Wrapper

The genetic_diversity() function is the main entry point for diversity estimation. It dispatches to specific estimators based on the mode parameter:

Mode Statistic Description
"A" Allelic richness Number of alleles
"Ae" Effective alleles \(1 / \sum p_i^2\)
"A95" A (5% threshold) Alleles with frequency >= 0.05
"He" Expected heterozygosity \(1 - \sum p_i^2\)
"Ho" Observed heterozygosity Proportion of heterozygotes
"Hes" Corrected He Nei’s small-sample corrected He
"Hos" Corrected Ho Nei’s small-sample corrected Ho
"Fis" Inbreeding coefficient \(1 - H_o/H_e\)
"Pe" Polymorphic index Whether the locus is polymorphic

5.2 Global (Unstratified) Estimates

Without a stratum, diversity is estimated across all individuals:

genetic_diversity(arapat, mode = "A")
  Locus  A
1  LTRS  2
2   WNT  5
3    EN  5
4    EF  2
5   ZMP  2
6   AML 13
7  ATPS 10
8  MP20 19
genetic_diversity(arapat, mode = "Ae")
  Locus       Ae
1  LTRS 1.995623
2   WNT 2.880450
3    EN 1.814656
4    EF 1.773533
5   ZMP 1.513877
6   AML 5.860583
7  ATPS 3.563347
8  MP20 5.511837
genetic_diversity(arapat, mode = "He")
  Locus        He
1  LTRS 0.4989034
2   WNT 0.6528320
3    EN 0.4489313
4    EF 0.4361538
5   ZMP 0.3394444
6   AML 0.8293685
7  ATPS 0.7193649
8  MP20 0.8185723
genetic_diversity(arapat, mode = "Ho")
  Locus         Ho
1  LTRS 0.23691460
2   WNT 0.27840909
3    EN 0.20555556
4    EF 0.14404432
5   ZMP 0.15454545
6   AML 0.37058824
7  ATPS 0.08539945
8  MP20 0.44692737

5.3 Per-Stratum Estimates

Provide a stratum argument to compute diversity within each population:

genetic_diversity(arapat, stratum = "Species", mode = "He")
     Stratum Locus         He
2       Cape  LTRS 0.14720000
3       Cape   WNT 0.23374726
4       Cape    EN 0.42000000
5       Cape    EF 0.06444444
6       Cape   ZMP 0.00000000
7       Cape   AML 0.53312835
8       Cape  ATPS 0.11546667
9       Cape  MP20 0.36791111
10  Mainland  LTRS 0.38850309
11  Mainland   WNT 0.03277778
12  Mainland    EN 0.50285714
13  Mainland    EF 0.27119377
14  Mainland   ZMP 0.09500000
15  Mainland   AML 0.53854875
16  Mainland  ATPS 0.25038580
17  Mainland  MP20 0.80670340
18 Peninsula  LTRS 0.42591805
19 Peninsula   WNT 0.50613781
20 Peninsula    EN 0.17229600
21 Peninsula    EF 0.48979592
22 Peninsula   ZMP 0.41492187
23 Peninsula   AML 0.72842417
24 Peninsula  ATPS 0.52255763
25 Peninsula  MP20 0.69148800
genetic_diversity(arapat, stratum = "Species", mode = "Ae")
     Stratum Locus       Ae
2       Cape  LTRS 1.172608
3       Cape   WNT 1.305052
4       Cape    EN 1.724138
5       Cape    EF 1.068884
6       Cape   ZMP 1.000000
7       Cape   AML 2.141916
8       Cape  ATPS 1.130540
9       Cape  MP20 1.582056
10  Mainland  LTRS 1.635331
11  Mainland   WNT 1.033889
12  Mainland    EN 2.011494
13  Mainland    EF 1.372107
14  Mainland   ZMP 1.104972
15  Mainland   AML 2.167076
16  Mainland  ATPS 1.334020
17  Mainland  MP20 5.173397
18 Peninsula  LTRS 1.741912
19 Peninsula   WNT 2.024856
20 Peninsula    EN 1.208161
21 Peninsula    EF 1.960000
22 Peninsula   ZMP 1.709173
23 Peninsula   AML 3.682213
24 Peninsula  ATPS 2.094494
25 Peninsula  MP20 3.241365

5.4 Individual Estimator Functions

Each diversity metric has a standalone function that operates on a single locus vector.

5.4.1 Allelic Richness

A(arapat$LTRS)
A 
2 

5.4.2 Effective Number of Alleles

Ae(arapat$LTRS)
      Ae 
1.995623 

5.4.3 Expected Heterozygosity

He(arapat$LTRS)
       He 
0.4989034 

5.4.4 Observed Heterozygosity

Ho(arapat$LTRS)
       Ho 
0.2369146 

5.4.5 Inbreeding Coefficient

Fis(arapat$LTRS)
      Fis 
0.5251293 

5.4.6 Exclusion Probability

Pe(arapat$LTRS)
[1] 0.4989034

5.5 Corrected Heterozygosity (Nei’s)

The Hes and Hos modes require a stratum because they apply a small-sample correction per subpopulation:

genetic_diversity(arapat, stratum = "Species", mode = "Hes")
       Locus       Hes
1       LTRS 0.3233431
2        WNT 0.2606681
3         EN 0.3687470
4         EF 0.2781409
5        ZMP 0.1731159
6        AML 0.6108282
7       ATPS 0.2999070
8       MP20 0.6284672
9 Multilocus 0.3679022

5.6 Multilocus Diversity

The multilocus_diversity() function computes a summary across all loci:

multilocus_diversity(arapat)
[1] 0.8374656

5.7 Selecting Specific Loci

You can restrict the analysis to a subset of loci:

genetic_diversity(arapat, loci = c("LTRS", "WNT"), mode = "He")
  Locus        He
1  LTRS 0.4989034
2   WNT 0.6528320

5.8 Comparing Populations

A common workflow is to estimate diversity per population, then compare:

he_by_pop <- genetic_diversity(arapat, stratum = "Species", mode = "He")
he_by_pop
     Stratum Locus         He
2       Cape  LTRS 0.14720000
3       Cape   WNT 0.23374726
4       Cape    EN 0.42000000
5       Cape    EF 0.06444444
6       Cape   ZMP 0.00000000
7       Cape   AML 0.53312835
8       Cape  ATPS 0.11546667
9       Cape  MP20 0.36791111
10  Mainland  LTRS 0.38850309
11  Mainland   WNT 0.03277778
12  Mainland    EN 0.50285714
13  Mainland    EF 0.27119377
14  Mainland   ZMP 0.09500000
15  Mainland   AML 0.53854875
16  Mainland  ATPS 0.25038580
17  Mainland  MP20 0.80670340
18 Peninsula  LTRS 0.42591805
19 Peninsula   WNT 0.50613781
20 Peninsula    EN 0.17229600
21 Peninsula    EF 0.48979592
22 Peninsula   ZMP 0.41492187
23 Peninsula   AML 0.72842417
24 Peninsula  ATPS 0.52255763
25 Peninsula  MP20 0.69148800
library(ggplot2)
ggplot(he_by_pop, aes(x = Stratum, y = He, fill = Stratum)) +
  geom_boxplot() +
  theme_minimal() +
  labs(title = "Expected Heterozygosity by Species",
       y = "He") +
  theme(legend.position = "none")