library(gstudio)
data(arapat)
4 Allele Frequencies & HWE
This chapter covers estimating allele and genotype frequencies, testing for Hardy-Weinberg Equilibrium, and rarefaction analysis.
4.1 Allele Frequencies
The frequencies() function is an S3 generic that works on both individual locus vectors and entire data.frames.
4.1.1 Single Locus
frequencies(arapat$LTRS)
  Allele Frequency
1     01  0.523416
2     02  0.476584
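To make the calculation concrete, here is a minimal base-R sketch of allele counting. It is not gstudio's implementation. It assumes genotypes are written as colon-separated strings (like the 01:02 labels in the genotype output in section 4.3.2) and uses a small hypothetical vector, toy_genotypes, rather than arapat.

```r
# Hypothetical diploid genotypes, two alleles per individual separated by ":".
toy_genotypes <- c("01:01", "01:02", "02:02", "01:02", NA, "01:01")

# Drop missing genotypes, then split each genotype into its two alleles.
observed <- toy_genotypes[!is.na(toy_genotypes)]
alleles <- unlist(strsplit(observed, ":", fixed = TRUE))

# Allele frequency = allele count / total number of allele copies.
counts <- table(alleles)
toy_freqs <- data.frame(
  Allele = names(counts),
  Frequency = as.numeric(counts) / length(alleles)
)
print(toy_freqs)
```

Missing genotypes are dropped before counting, so the denominator is the number of allele copies actually observed rather than twice the sample size.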
4.1.2 Entire Data Frame
When applied to a data.frame, frequencies() returns frequencies for all locus columns:
freqs <- frequencies(arapat)
head(freqs, 12)
   Locus Allele   Frequency
1   LTRS     01 0.523415978
2   LTRS     02 0.476584022
3    WNT     01 0.357954545
4    WNT     02 0.181818182
5    WNT     03 0.430397727
6    WNT     04 0.026988636
7    WNT     05 0.002840909
8     EN     01 0.715277778
9     EN     02 0.180555556
10    EN     03 0.080555556
11    EN     04 0.018055556
12    EN     05 0.005555556
4.1.3 By Stratum
Add a stratum argument naming a grouping column (here, Species) to partition frequencies by group:
freqs_pop <- frequencies(arapat, stratum = "Species")
head(freqs_pop, 12)
   Stratum Locus Allele  Frequency
1     Cape  LTRS     01 0.08000000
2     Cape  LTRS     02 0.92000000
3     Cape   WNT     01 0.13513514
4     Cape   WNT     02 0.86486486
5     Cape    EN     01 0.30000000
6     Cape    EN     02 0.70000000
7     Cape    EF     01 0.96666667
8     Cape    EF     02 0.03333333
9     Cape   ZMP     02 1.00000000
10    Cape   AML     03 0.09154930
11    Cape   AML     04 0.60563380
12    Cape   AML     05 0.30281690
4.1.4 Selecting Specific Loci
freqs_sub <- frequencies(arapat, loci = c("LTRS", "WNT"))
freqs_sub
  Locus Allele   Frequency
1  LTRS     01 0.523415978
2  LTRS     02 0.476584022
3   WNT     01 0.357954545
4   WNT     02 0.181818182
5   WNT     03 0.430397727
6   WNT     04 0.026988636
7   WNT     05 0.002840909
4.2 Frequency Matrix
The frequency_matrix() function produces a wide-format matrix of allele frequencies:
fm <- frequency_matrix(arapat, stratum = "Species")
fm[, 1:8]
    Stratum      AML-01      AML-02    AML-03    AML-04     AML-05     AML-06    AML-07
1      Cape 0.000000000 0.000000000 0.0915493 0.6056338 0.30281690 0.00000000 0.0000000
2  Mainland 0.000000000 0.000000000 0.0000000 0.0000000 0.00000000 0.00000000 0.0000000
3 Peninsula 0.002016129 0.002016129 0.0000000 0.0000000 0.01008065 0.09879032 0.3225806
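The long-to-wide reshaping can be illustrated in base R. The sketch below is only an approximation of the idea, not how frequency_matrix() is implemented. The Cape LTRS values come from section 4.1.3, while the "Other" stratum and its frequencies are hypothetical.

```r
# Long-format frequencies: the Cape LTRS values are from section 4.1.3;
# the "Other" stratum and its values are hypothetical.
long <- data.frame(
  Stratum = c("Cape", "Cape", "Other", "Other"),
  Locus = "LTRS",
  Allele = c("01", "02", "01", "02"),
  Frequency = c(0.08, 0.92, 0.60, 0.40)
)

# One column per locus-allele combination, named like "LTRS-01".
long$Column <- paste(long$Locus, long$Allele, sep = "-")

# xtabs() sums Frequency within each Stratum x Column cell; combinations
# that never occur are filled with 0, like the zeros in the wide matrix.
wide <- xtabs(Frequency ~ Stratum + Column, data = long)
print(wide)
```

The zero fill matters for interpretation: a 0 in the wide matrix means the allele was not observed in that stratum, not that the value is missing.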
4.3 Genotype Counts and Frequencies
4.3.1 Genotype Counts per Stratum
The genotype_counts() function summarizes non-missing genotype counts per locus, optionally by stratum:
genotype_counts(arapat)
    Stratum   N LTRS WNT  EN  EF ZMP AML ATPS MP20
ALL     ALL 363  363 352 360 361 330 340  363  358
genotype_counts(arapat, stratum = "Species")
            Stratum   N LTRS WNT  EN  EF ZMP AML ATPS MP20
Cape           Cape  75   75  74  75  75  70  71   75   75
Mainland   Mainland  36   36  30  35  34  20  21   36   33
Peninsula Peninsula 252  252 248 250 252 240 248  252  250
4.3.2 Genotype Frequencies
The genotype_frequencies() function returns observed and expected genotype counts for a locus vector:
genotype_frequencies(arapat$LTRS)
  Genotype Observed  Expected
1    01:01      147  99.44904
2    01:02       86 181.10193
3    02:02      130  82.44904
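The Expected column matches the standard Hardy-Weinberg proportions. As a check, the base-R sketch below recomputes it from the observed counts alone. The variable names are illustrative, and the code does not call gstudio.

```r
# Observed genotype counts for LTRS, copied from the output above.
observed <- c("01:01" = 147, "01:02" = 86, "02:02" = 130)
n <- sum(observed)  # 363 individuals

# Frequency of allele 01: each 01:01 carries two copies, each 01:02 one.
p <- (2 * observed[["01:01"]] + observed[["01:02"]]) / (2 * n)
q <- 1 - p

# Hardy-Weinberg expected counts: N * p^2, N * 2pq, N * q^2.
expected <- n * c("01:01" = p^2, "01:02" = 2 * p * q, "02:02" = q^2)
print(round(expected, 5))
```

The recomputed values agree with the Expected column (99.44904, 181.10193, 82.44904), and p equals the LTRS allele 01 frequency reported in section 4.1.1.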
4.4 Hardy-Weinberg Equilibrium
The hwe() function tests for HWE using a chi-square approximation:
mainland <- arapat[arapat$Species == "Mainland", ]
hwe(mainland)
Warning in genotype_frequencies(x[[locus]]): Some genotype expectations are <
5, a continuity correction should be applied. See ?hwe
Warning in hwe(mainland): Under 50 samples for LTRS this may not be a good
approximation.
Warning in hwe(mainland): Fewer than 5 genotypes expected at LTRS consider
collapsing alleles.
Warning in genotype_frequencies(x[[locus]]): Some genotype expectations are <
5, a continuity correction should be applied. See ?hwe
Warning in hwe(mainland): Under 50 samples for WNT this may not be a good
approximation.
Warning in hwe(mainland): Fewer than 5 genotypes expected at WNT consider
collapsing alleles.
Warning in genotype_frequencies(x[[locus]]): Some genotype expectations are <
5, a continuity correction should be applied. See ?hwe
Warning in hwe(mainland): Under 50 samples for EN this may not be a good
approximation.
Warning in hwe(mainland): Fewer than 5 genotypes expected at EN consider
collapsing alleles.
Warning in genotype_frequencies(x[[locus]]): Some genotype expectations are <
5, a continuity correction should be applied. See ?hwe
Warning in hwe(mainland): Under 50 samples for EF this may not be a good
approximation.
Warning in hwe(mainland): Fewer than 5 genotypes expected at EF consider
collapsing alleles.
Warning in genotype_frequencies(x[[locus]]): Some genotype expectations are <
5, a continuity correction should be applied. See ?hwe
Warning in hwe(mainland): Under 50 samples for ZMP this may not be a good
approximation.
Warning in hwe(mainland): Fewer than 5 genotypes expected at ZMP consider
collapsing alleles.
Warning in genotype_frequencies(x[[locus]]): Some genotype expectations are <
5, a continuity correction should be applied. See ?hwe
Warning in hwe(mainland): Under 50 samples for AML this may not be a good
approximation.
Warning in hwe(mainland): Fewer than 5 genotypes expected at AML consider
collapsing alleles.
Warning in genotype_frequencies(x[[locus]]): Some genotype expectations are <
5, a continuity correction should be applied. See ?hwe
Warning in hwe(mainland): Under 50 samples for ATPS this may not be a good
approximation.
Warning in hwe(mainland): Fewer than 5 genotypes expected at ATPS consider
collapsing alleles.
Warning in genotype_frequencies(x[[locus]]): Some genotype expectations are <
5, a continuity correction should be applied. See ?hwe
Warning in hwe(mainland): Under 50 samples for MP20 this may not be a good
approximation.
Warning in hwe(mainland): Fewer than 5 genotypes expected at MP20 consider
collapsing alleles.
  Locus         Chi df         Prob
1  LTRS   0.9082806  1 3.405710e-01
2   WNT  29.2542373  1 6.347726e-08
3    EN  19.7817078  3 1.883729e-04
4    EF   0.3545808  1 5.515314e-01
5   ZMP   5.0927978  1 2.402540e-02
6   AML  36.8802692  6 1.858093e-06
7  ATPS  43.9698647  6 7.494505e-08
8  MP20 161.5565210 45 4.773959e-15
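The output reports the chi-square statistic (Chi), degrees of freedom (df), and p-value (Prob) for each locus. A small p-value indicates a departure from HWE expectations. The warnings above should be taken seriously here: the Mainland subset has only 36 individuals, so the chi-square approximation may be unreliable for these loci.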
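For intuition about where these columns come from, the sketch below computes a Pearson chi-square statistic by hand for LTRS over all 363 samples, using the counts from section 4.3.2. It therefore does not reproduce the Mainland row above. The degrees-of-freedom formula k(k - 1)/2 is an assumption that is consistent with the df column, and hwe() may apply corrections that this sketch omits.

```r
# Observed and expected counts for LTRS over all samples (section 4.3.2).
observed <- c(147, 86, 130)
expected <- c(99.44904, 181.10193, 82.44904)

# Pearson chi-square statistic: sum over genotype classes of (O - E)^2 / E.
chi <- sum((observed - expected)^2 / expected)

# Assumed degrees of freedom: k(k - 1)/2 for k alleles, consistent with
# the df column above (df = 1 for a two-allele locus).
k <- 2
df <- k * (k - 1) / 2

# Upper-tail probability of the chi-square distribution.
prob <- pchisq(chi, df = df, lower.tail = FALSE)
print(c(Chi = chi, df = df, Prob = prob))
```

The statistic is roughly 100 on 1 degree of freedom, driven mostly by the heterozygote class (86 observed against about 181 expected).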
4.4.1 Interpreting Results
Departures from HWE can arise from:
- Non-random mating (inbreeding, assortative mating)
- Population substructure (Wahlund effect)
- Selection at or near the marker locus
- Small population size (genetic drift)
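The Wahlund effect in particular is easy to show numerically. The sketch below uses two hypothetical subpopulations of equal size, each in HWE internally, and compares the heterozygosity present in the pooled sample with what a single random-mating population would show. All values are made up for illustration.

```r
# Two hypothetical subpopulations of equal size, each in HWE on its own.
p_sub <- c(0.1, 0.9)  # frequency of allele 01 in each subpopulation
het_within <- 2 * p_sub * (1 - p_sub)

# Heterozygosity actually present in the pooled sample.
het_observed <- mean(het_within)

# Heterozygosity expected if the pooled sample were one random-mating unit.
p_pooled <- mean(p_sub)
het_expected <- 2 * p_pooled * (1 - p_pooled)

print(c(observed = het_observed, expected = het_expected))
```

Pooling produces a heterozygote deficit (0.18 observed against 0.50 expected) even though neither subpopulation departs from HWE. The pooled LTRS counts in section 4.3.2 show a deficit in the same direction, although a deficit alone does not identify which of the causes listed above is responsible.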
4.5 Rarefaction
The rarefaction() function subsamples the data repeatedly at a smaller sample size to estimate the distribution of a diversity statistic:
rare_vals <- rarefaction(arapat$LTRS, mode = "Ae",
                         size = 20, nperm = 199)
hist(rare_vals, main = "Rarefied Ae at LTRS",
     xlab = "Effective Alleles", col = "steelblue")
This is useful for comparing diversity across populations with unequal sample sizes.
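The subsampling idea can be sketched in base R. This is a simplified illustration, not gstudio's implementation. It assumes that Ae means the effective number of alleles, 1 / sum(p_i^2), and it draws allele copies rather than whole genotypes. The helper names (effective_alleles, rarefy_ae) and the toy data are hypothetical.

```r
set.seed(42)  # for reproducible subsamples

# Hypothetical allele copies for one population (not the arapat data).
toy_alleles <- rep(c("01", "02", "03"), times = c(60, 30, 10))

# Effective number of alleles: Ae = 1 / sum(p_i^2).
effective_alleles <- function(alleles) {
  p <- table(alleles) / length(alleles)
  1 / sum(p^2)
}

# Draw `size` allele copies without replacement, `nperm` times,
# and record Ae for each subsample.
rarefy_ae <- function(alleles, size, nperm) {
  replicate(nperm, effective_alleles(sample(alleles, size)))
}

toy_rare <- rarefy_ae(toy_alleles, size = 20, nperm = 199)
summary(toy_rare)
```

Because every population is reduced to the same subsample size, the resulting distributions can be compared directly, which is the point of rarefying before comparing diversity.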
4.6 Optimal Sampling
The optimal_sampling() function estimates how many individuals are needed to capture a given proportion of allelic diversity:
optimal_sampling(arapat$LTRS, nrep = 99)