library(gstudio)
data(arapat)
6 Population Structure
This chapter covers the estimation of genetic structure statistics — measures of how genetic variation is partitioned among populations.
6.1 The genetic_structure() Wrapper
The genetic_structure() function dispatches to specific structure estimators based on the mode parameter:
| Mode | Statistic | Description |
|---|---|---|
| "Gst" | Nei’s Gst | Proportion of total diversity among populations |
| "Gst_prime" | Hedrick’s G’st | Gst corrected for marker diversity |
| "Dest" | Jost’s D | True differentiation measure |
| "Fst" | Wright’s Fst | Fixation index |
6.2 Basic Usage
Estimate Gst across species:
genetic_structure(arapat, stratum = "Species", mode = "Gst")
Locus Gst Hs Ht P
1 LTRS 0.3536942 0.3229669 0.4997122 0.3536942
2 WNT 0.6028903 0.2595040 0.6534819 0.6028903
3 EN 0.1823664 0.3678145 0.4498525 0.1823664
4 EF 0.3653913 0.2772276 0.4368481 0.3653913
5 ZMP 0.4961044 0.1712607 0.3398734 0.4961044
6 AML 0.2723689 0.6045761 0.8308826 0.2723689
7 ATPS 0.5856500 0.2983785 0.7201122 0.5856500
8 MP20 0.2358115 0.6267430 0.8201419 0.2358115
9 Multilocus 0.3835972 2.9284713 4.7509048 NA
The output provides per-locus estimates plus a multilocus estimate (last row). The multilocus estimate follows the approach of Culley et al. (2001).
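The relationship between the columns can be checked by hand. The sketch below, in base R only, copies the Hs and Ht values from the output above and applies the standard definition Gst = 1 - Hs/Ht. Taking the ratio of the summed Hs and Ht reproduces the multilocus row; that this is how gstudio computes it internally is an inference from the printed numbers, not something confirmed from the package source.

```r
# Hs and Ht copied from the per-locus rows of the output above
hs <- c(LTRS = 0.3229669, WNT = 0.2595040, EN = 0.3678145, EF = 0.2772276,
        ZMP = 0.1712607, AML = 0.6045761, ATPS = 0.2983785, MP20 = 0.6267430)
ht <- c(LTRS = 0.4997122, WNT = 0.6534819, EN = 0.4498525, EF = 0.4368481,
        ZMP = 0.3398734, AML = 0.8308826, ATPS = 0.7201122, MP20 = 0.8201419)

# Per-locus Gst: share of total diversity not found within strata
gst_locus <- 1 - hs / ht

# Multilocus Gst as a ratio of sums (assumed, based on the printed last row)
gst_multi <- 1 - sum(hs) / sum(ht)

round(gst_locus, 4)
round(gst_multi, 4)
```

Summing before dividing weights each locus by its diversity, so highly variable loci such as AML and MP20 contribute more to the multilocus value than a simple average of per-locus Gst would allow.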
6.3 Hedrick’s Correction
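For highly polymorphic markers, within-population heterozygosity (Hs) is high, which caps the maximum attainable Gst well below 1 even when populations share no alleles. Hedrick’s correction rescales Gst by that maximum:
genetic_structure(arapat, stratum = "Species", mode = "Gst_prime")
Locus Gst Hs Ht P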
1 LTRS 0.6067798 0.3229669 0.4997122 0
2 WNT 0.9198112 0.2595040 0.6534819 0
3 EN 0.3415214 0.3678145 0.4498525 0
4 EF 0.5756163 0.2772276 0.4368481 0
5 ZMP 0.6498859 0.1712607 0.3398734 0
6 AML 0.8970188 0.6045761 0.8308826 0
7 ATPS 0.9592390 0.2983785 0.7201122 0
8 MP20 0.8297450 0.6267430 0.8201419 0
9 Multilocus 0.6503121 0.3110927 0.5406400 NA
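To make the correction concrete, the following base R sketch applies Hedrick's (2005) rescaling, G'st = Gst / Gst_max with Gst_max = (k - 1)(1 - Hs) / (k - 1 + Hs), to two loci using the Gst and Hs values printed in this chapter. The number of strata, k = 3, is assumed from the three Species levels (Cape, Mainland, Peninsula). The multilocus row is not reproduced here because its Hs and Ht appear to be aggregated differently.

```r
# Number of strata (assumed: Cape, Mainland, Peninsula)
k <- 3

# Gst and Hs for two loci, copied from the printed output
gst <- c(LTRS = 0.3536942, WNT = 0.6028903)
hs  <- c(LTRS = 0.3229669, WNT = 0.2595040)

# Largest Gst attainable given the within-stratum heterozygosity
gst_max <- (k - 1) * (1 - hs) / (k - 1 + hs)

# Hedrick's standardized measure
gst_prime <- gst / gst_max

round(gst_prime, 4)
```

The per-locus results agree with the first column of the output above to the printed precision.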
6.4 Jost’s D
Jost’s D is an alternative differentiation measure that is independent of within-population diversity:
genetic_structure(arapat, stratum = "Species", mode = "Dest")
Locus Dest Hs Ht P
1 LTRS 0.17403908 0.3229669 0.4997122 0.17403908
2 WNT 0.35469734 0.2595040 0.6534819 0.35469734
3 EN 0.08651255 0.3678145 0.4498525 0.08651255
4 EF 0.14722985 0.2772276 0.4368481 0.14722985
5 ZMP 0.13563788 0.1712607 0.3398734 0.13563788
6 AML 0.38154249 0.6045761 0.8308826 0.38154249
7 ATPS 0.40072291 0.2983785 0.7201122 0.40072291
8 MP20 0.34542581 0.6267430 0.8201419 0.34542581
9 Multilocus 0.18912423 0.3660589 0.5938631 NA
6.5 Permutation Testing
Use the nperm parameter to test the null hypothesis that the structure parameter equals zero:
genetic_structure(arapat, stratum = "Species",
                  mode = "Gst", nperm = 99)
Locus Gst Hs Ht P
1 LTRS 0.3536942 0.3229669 0.4997122 0
2 WNT 0.6028903 0.2595040 0.6534819 0
3 EN 0.1823664 0.3678145 0.4498525 0
4 EF 0.3653913 0.2772276 0.4368481 0
5 ZMP 0.4961044 0.1712607 0.3398734 0
6 AML 0.2723689 0.6045761 0.8308826 0
7 ATPS 0.5856500 0.2983785 0.7201122 0
8 MP20 0.2358115 0.6267430 0.8201419 0
9 Multilocus 0.3835972 2.9284713 4.7509048 NA
The resulting P column gives the proportion of permutations, in which individuals are randomly reassigned among strata, that yield a value at least as large as the observed estimate.
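The logic of the permutation test can be sketched in base R without gstudio. The example below uses hypothetical toy data (one allele per individual, two strata) and a simplified Gst with no sample-size correction; `gst_simple()` is an illustrative helper, not a gstudio function. The P value is taken as the plain fraction of permuted values that reach the observed one, which is an assumption about the convention used; a common alternative is (count + 1) / (nperm + 1).

```r
set.seed(42)

# Hypothetical toy data: 20 individuals per stratum, one allele each
alleles <- c(rep("A", 18), rep("B", 2), rep("A", 3), rep("B", 17))
strata  <- rep(c("pop1", "pop2"), each = 20)

# Simplified Gst = 1 - Hs / Ht from raw allele frequencies
gst_simple <- function(alleles, strata) {
  he <- function(x) 1 - sum(prop.table(table(x))^2)  # expected heterozygosity
  hs <- mean(tapply(alleles, strata, he))            # mean within-stratum value
  ht <- he(alleles)                                  # value for the pooled sample
  1 - hs / ht
}

observed <- gst_simple(alleles, strata)

# Null distribution: shuffle stratum labels and recompute the statistic
nperm <- 99
permuted <- replicate(nperm, gst_simple(alleles, sample(strata)))

# Fraction of permutations at least as large as the observed value
p_value <- mean(permuted >= observed)
```

With only 99 permutations the smallest nonzero P that can be reported is 1/99, so a printed 0 means no permuted value reached the observed one.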
6.6 Pairwise Structure
Set pairwise = TRUE to compute structure between all pairs of strata:
pw <- genetic_structure(arapat, stratum = "Species",
                        mode = "Gst", pairwise = TRUE)
pw
Cape Mainland Peninsula
Cape NA 0.2955307 0.3609387
Mainland 0.2955307 NA 0.2024045
Peninsula 0.3609387 0.2024045 NA
This returns a symmetric matrix of pairwise multilocus estimates.
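A matrix of this shape plugs directly into base R tools. As a small sketch, the code below rebuilds the printed matrix by hand (so it runs without gstudio), confirms the symmetry, converts it to a `dist` object, and looks up the most differentiated pair. The object names are illustrative only.

```r
# Pairwise values copied from the printed matrix
pops <- c("Cape", "Mainland", "Peninsula")
pw <- matrix(c(NA,        0.2955307, 0.3609387,
               0.2955307, NA,        0.2024045,
               0.3609387, 0.2024045, NA),
             nrow = 3, byrow = TRUE, dimnames = list(pops, pops))

# The matrix is symmetric, so the lower triangle carries all the information
stopifnot(isSymmetric(pw))
d <- as.dist(pw)

# Row and column position of the largest off-diagonal value
idx <- which(pw == max(pw, na.rm = TRUE), arr.ind = TRUE)[1, ]
most_diff <- rownames(pw)[idx]
most_diff
```

The `dist` object can then be passed to functions such as `hclust()` or `cmdscale()` for a quick visual summary of the pairwise structure.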
6.7 Restricting to Specific Loci
Use the locus parameter to restrict the analysis:
genetic_structure(arapat, stratum = "Species",
                  mode = "Gst", locus = "LTRS")
Locus Gst Hs Ht P
1 LTRS 0.3536942 0.3229669 0.4997122 0.3536942
genetic_structure(arapat, stratum = "Species",
                  mode = "Gst", locus = c("LTRS", "WNT"))
Locus Gst Hs Ht P
1 LTRS 0.3536942 0.3229669 0.4997122 0.3536942
2 WNT 0.6028903 0.2595040 0.6534819 0.6028903
3 Multilocus 0.4949065 0.5824709 1.1531942 NA
6.8 Multilocus Assignment
The multilocus_assignment() function estimates the probability that an individual originates from each population based on multilocus genotype frequencies:
# Compute stratum-level frequencies
freqs <- frequencies(arapat, stratum = "Species")
# Assign the first individual
multilocus_assignment(arapat[1, ], freqs)
Warning in multilocus_assignment(arapat[1, ], freqs): This individual has
missing genotypes, cannot compre assignment probability to individuals who do
not have missing data.
Stratum Probability Posterior
3 Peninsula 3.205128e-07 1
The Posterior column gives the relative probability of assignment to each stratum given the individual’s multilocus genotype.
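The idea behind the two columns can be illustrated with a base R sketch on hypothetical data. It assumes Hardy-Weinberg genotype probabilities within each stratum (p^2 for homozygotes, 2pq for heterozygotes), multiplies them across loci, and normalizes across strata to get a posterior. The frequencies, locus names, and the `geno_prob()` helper are invented for illustration; the exact computation inside `multilocus_assignment()` is not confirmed here.

```r
# Hypothetical allele frequencies at two loci in two strata
freqs <- list(
  pop1 = list(L1 = c(A = 0.9, B = 0.1), L2 = c(C = 0.5, D = 0.5)),
  pop2 = list(L1 = c(A = 0.2, B = 0.8), L2 = c(C = 0.0, D = 1.0))
)

# Diploid genotype of the individual to assign
genotype <- list(L1 = c("A", "B"), L2 = c("C", "D"))

# Hardy-Weinberg probability of one genotype given allele frequencies p
geno_prob <- function(alleles, p) {
  if (alleles[1] == alleles[2]) {
    p[[alleles[1]]]^2
  } else {
    2 * p[[alleles[1]]] * p[[alleles[2]]]
  }
}

# Multilocus probability per stratum: product over loci
prob <- sapply(freqs, function(pop) {
  prod(mapply(geno_prob, genotype, pop[names(genotype)]))
})

# Posterior: each stratum's share of the total probability
posterior <- prob / sum(prob)
```

Because allele C is absent from pop2, that stratum's probability is zero and pop1 receives a posterior of 1. This is consistent with the output above, where a single stratum is listed with Posterior equal to 1 despite a very small Probability.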
6.9 Individual Estimator Functions
Each structure metric has a standalone function:
Gst(arapat, stratum = "Species")
Locus Gst Hs Ht P
1 LTRS 0.3536942 0.3229669 0.4997122 0.3536942
2 WNT 0.6028903 0.2595040 0.6534819 0.6028903
3 EN 0.1823664 0.3678145 0.4498525 0.1823664
4 EF 0.3653913 0.2772276 0.4368481 0.3653913
5 ZMP 0.4961044 0.1712607 0.3398734 0.4961044
6 AML 0.2723689 0.6045761 0.8308826 0.2723689
7 ATPS 0.5856500 0.2983785 0.7201122 0.5856500
8 MP20 0.2358115 0.6267430 0.8201419 0.2358115
9 Multilocus 0.3835972 2.9284713 4.7509048 NA
Gst_prime(arapat, stratum = "Species")
Locus Gst Hs Ht P
1 LTRS 0.6067798 0.3229669 0.4997122 0
2 WNT 0.9198112 0.2595040 0.6534819 0
3 EN 0.3415214 0.3678145 0.4498525 0
4 EF 0.5756163 0.2772276 0.4368481 0
5 ZMP 0.6498859 0.1712607 0.3398734 0
6 AML 0.8970188 0.6045761 0.8308826 0
7 ATPS 0.9592390 0.2983785 0.7201122 0
8 MP20 0.8297450 0.6267430 0.8201419 0
9 Multilocus 0.6503121 0.3110927 0.5406400 NA
Dest(arapat, stratum = "Species")
Locus Dest Hs Ht P
1 LTRS 0.17500476 0.3229669 0.4997122 0.17500476
2 WNT 0.35493302 0.2595040 0.6534819 0.35493302
3 EN 0.08807037 0.3678145 0.4498525 0.08807037
4 EF 0.14808390 0.2772276 0.4368481 0.14808390
5 ZMP 0.13611660 0.1712607 0.3398734 0.13611660
6 AML 0.38225685 0.6045761 0.8308826 0.38225685
7 ATPS 0.40086216 0.2983785 0.7201122 0.40086216
8 MP20 0.34665943 0.6267430 0.8201419 0.34665943
9 Multilocus 0.19056225 0.3660589 0.5938631 NA