6  Population Structure

This chapter covers the estimation of genetic structure statistics, which measure how genetic variation is partitioned among populations.

library(gstudio)
data(arapat)

6.1 The genetic_structure() Wrapper

The genetic_structure() function dispatches to specific structure estimators based on the mode parameter:

Mode         Statistic       Description
"Gst"        Nei’s Gst       Proportion of total diversity among populations
"Gst_prime"  Hedrick’s G’st  Gst corrected for marker diversity
"Dest"       Jost’s D        True differentiation measure
"Fst"        Wright’s Fst    Fixation index

6.2 Basic Usage

Estimate Gst across species:

genetic_structure(arapat, stratum = "Species", mode = "Gst")
       Locus       Gst        Hs        Ht         P
1       LTRS 0.3536942 0.3229669 0.4997122 0.3536942
2        WNT 0.6028903 0.2595040 0.6534819 0.6028903
3         EN 0.1823664 0.3678145 0.4498525 0.1823664
4         EF 0.3653913 0.2772276 0.4368481 0.3653913
5        ZMP 0.4961044 0.1712607 0.3398734 0.4961044
6        AML 0.2723689 0.6045761 0.8308826 0.2723689
7       ATPS 0.5856500 0.2983785 0.7201122 0.5856500
8       MP20 0.2358115 0.6267430 0.8201419 0.2358115
9 Multilocus 0.3835972 2.9284713 4.7509048        NA

The output provides per-locus estimates plus a multilocus estimate (last row). The multilocus estimate follows the approach of Culley et al. (2001).
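
The relationship between the columns can be checked directly from the printed values. The sketch below uses base R only and assumes that Gst = 1 - Hs/Ht for each locus and that the multilocus value is formed from Hs and Ht summed across loci. The vector names are illustrative, and the numbers are copied from the output above.

```r
# Hs and Ht copied from the per-locus rows of the Gst output above.
locus <- c("LTRS", "WNT", "EN", "EF", "ZMP", "AML", "ATPS", "MP20")
hs <- c(0.3229669, 0.2595040, 0.3678145, 0.2772276,
        0.1712607, 0.6045761, 0.2983785, 0.6267430)
ht <- c(0.4997122, 0.6534819, 0.4498525, 0.4368481,
        0.3398734, 0.8308826, 0.7201122, 0.8201419)

# Per-locus Gst: the share of total diversity that lies among strata.
gst_locus <- setNames(1 - hs / ht, locus)

# Multilocus Gst from Hs and Ht summed across loci
# (an assumption inferred from the last row of the output).
gst_multi <- 1 - sum(hs) / sum(ht)

print(round(gst_locus, 7))
print(round(gst_multi, 7))
```

Both calculations agree with the printed estimates to the displayed precision, which is consistent with the Multilocus row reporting Hs and Ht as sums over loci rather than averages.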

6.3 Hedrick’s Correction

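For highly polymorphic markers, Gst is constrained to small values because its maximum attainable value falls as within-population heterozygosity (Hs) rises. Hedrick’s correction rescales Gst by that maximum; note that the corrected values are still printed in a column labelled Gst: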

genetic_structure(arapat, stratum = "Species", mode = "Gst_prime")
       Locus       Gst        Hs        Ht  P
1       LTRS 0.6067798 0.3229669 0.4997122  0
2        WNT 0.9198112 0.2595040 0.6534819  0
3         EN 0.3415214 0.3678145 0.4498525  0
4         EF 0.5756163 0.2772276 0.4368481  0
5        ZMP 0.6498859 0.1712607 0.3398734  0
6        AML 0.8970188 0.6045761 0.8308826  0
7       ATPS 0.9592390 0.2983785 0.7201122  0
8       MP20 0.8297450 0.6267430 0.8201419  0
9 Multilocus 0.6503121 0.3110927 0.5406400 NA
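
As a rough check on the per-locus rows, the sketch below applies Hedrick's standardization, G’st = Gst (k - 1 + Hs) / ((k - 1)(1 - Hs)), to the Gst and Hs values printed in Section 6.2. It assumes k = 3 strata (Cape, Mainland and Peninsula, the levels shown in the pairwise output of Section 6.6). The variable names are illustrative, and this is not taken from the package source.

```r
# Gst and Hs copied from the per-locus rows of the Section 6.2 output.
locus <- c("LTRS", "WNT", "EN", "EF", "ZMP", "AML", "ATPS", "MP20")
gst <- c(0.3536942, 0.6028903, 0.1823664, 0.3653913,
         0.4961044, 0.2723689, 0.5856500, 0.2358115)
hs <- c(0.3229669, 0.2595040, 0.3678145, 0.2772276,
        0.1712607, 0.6045761, 0.2983785, 0.6267430)

k <- 3  # number of strata (assumed: Cape, Mainland, Peninsula)

# Largest Gst attainable for this Hs and k.
gst_max <- (k - 1) * (1 - hs) / (k - 1 + hs)

# Hedrick's standardized statistic: Gst as a fraction of its maximum.
gst_prime <- setNames(gst / gst_max, locus)

print(round(gst_prime, 7))
```

The per-locus results agree with the printed values to the displayed precision. The Multilocus row is not reproduced by applying the same formula to its Hs and Ht, so it appears to be computed differently and is left out of this check.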

6.4 Jost’s D

An alternative differentiation measure that is independent of within-population diversity:

genetic_structure(arapat, stratum = "Species", mode = "Dest")
       Locus       Dest        Hs        Ht          P
1       LTRS 0.17403908 0.3229669 0.4997122 0.17403908
2        WNT 0.35469734 0.2595040 0.6534819 0.35469734
3         EN 0.08651255 0.3678145 0.4498525 0.08651255
4         EF 0.14722985 0.2772276 0.4368481 0.14722985
5        ZMP 0.13563788 0.1712607 0.3398734 0.13563788
6        AML 0.38154249 0.6045761 0.8308826 0.38154249
7       ATPS 0.40072291 0.2983785 0.7201122 0.40072291
8       MP20 0.34542581 0.6267430 0.8201419 0.34542581
9 Multilocus 0.18912423 0.3660589 0.5938631         NA
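
To illustrate what independence from within-population diversity means, the toy example below compares Gst, Hedrick's G’st and Jost's D for two populations that share no alleles. It is a minimal sketch in base R using the published definitions without any sample-size correction, so it is not expected to match the package's estimators digit for digit. The function names and the allele-frequency matrices are hypothetical.

```r
# Expected heterozygosity from a vector of allele frequencies.
het <- function(p) 1 - sum(p^2)

# Gst, Hedrick's G'st and Jost's D for k equally weighted populations.
# `freqs` has one row per population and one column per allele.
structure_stats <- function(freqs) {
  k  <- nrow(freqs)
  hs <- mean(apply(freqs, 1, het))  # mean within-population heterozygosity
  ht <- het(colMeans(freqs))        # heterozygosity of the pooled frequencies
  gst <- (ht - hs) / ht
  c(Hs = hs, Ht = ht,
    Gst = gst,
    Gst_prime = gst * (k - 1 + hs) / ((k - 1) * (1 - hs)),
    D = ((ht - hs) / (1 - hs)) * (k / (k - 1)))
}

# Two populations, each with m equally frequent private alleles.
private_alleles <- function(m) {
  rbind(c(rep(1 / m, m), rep(0, m)),
        c(rep(0, m), rep(1 / m, m)))
}

low_diversity  <- structure_stats(private_alleles(2))
high_diversity <- structure_stats(private_alleles(10))
print(round(rbind(low_diversity, high_diversity), 4))
```

In both cases the populations are completely differentiated, and both D and G’st equal 1. Gst, by contrast, drops from 1/3 with two alleles per population to 1/19 with ten, purely because Hs is higher.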

6.5 Permutation Testing

Use the nperm parameter to test the null hypothesis that the structure parameter equals zero:

genetic_structure(arapat, stratum = "Species",
                  mode = "Gst", nperm = 99)
       Locus       Gst        Hs        Ht  P
1       LTRS 0.3536942 0.3229669 0.4997122  0
2        WNT 0.6028903 0.2595040 0.6534819  0
3         EN 0.1823664 0.3678145 0.4498525  0
4         EF 0.3653913 0.2772276 0.4368481  0
5        ZMP 0.4961044 0.1712607 0.3398734  0
6        AML 0.2723689 0.6045761 0.8308826  0
7       ATPS 0.5856500 0.2983785 0.7201122  0
8       MP20 0.2358115 0.6267430 0.8201419  0
9 Multilocus 0.3835972 2.9284713 4.7509048 NA

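The resulting P column gives the probability of observing a value at least as extreme as the estimate when individuals are randomly permuted among strata. With nperm = 99, a P of 0 means that no permuted value was as extreme as the observed one, not that the probability is exactly zero. In the earlier sections nperm was not set, so the P values printed there (which simply repeat the estimate in the Gst and Dest outputs) should not be read as permutation probabilities.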
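
The logic of such a test can be sketched in a few lines of base R. The example below is not the package's implementation: it uses hypothetical haploid toy data, a simplified Gst helper (`gst_stat`, an illustrative name), and one common convention for P, namely the proportion of permuted values that are at least as large as the observed one.

```r
# Gst from a vector of alleles and a vector of stratum labels.
# Assumes equally sized strata, so pooled frequencies equal the mean frequencies.
gst_stat <- function(alleles, strata) {
  het <- function(x) 1 - sum(prop.table(table(x))^2)
  hs <- mean(tapply(alleles, strata, het))  # mean within-stratum heterozygosity
  ht <- het(alleles)                        # heterozygosity of the pooled sample
  (ht - hs) / ht
}

set.seed(1)  # make the permutations reproducible

# Toy data: two strata of 20 gene copies with contrasting allele counts.
alleles <- c(rep("a", 18), rep("b", 2), rep("a", 2), rep("b", 18))
strata  <- rep(c("North", "South"), each = 20)

observed <- gst_stat(alleles, strata)

# Shuffle stratum labels and recompute the statistic nperm times.
nperm <- 99
permuted <- replicate(nperm, gst_stat(alleles, sample(strata)))

# Proportion of permuted values at least as large as the observed one.
p_value <- mean(permuted >= observed)

print(observed)
print(p_value)
```

Because shuffling the labels destroys any association between alleles and strata, the permuted values describe what the statistic looks like when there is no structure. A small P therefore indicates that the observed partitioning is unlikely to arise by chance.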

6.6 Pairwise Structure

Set pairwise = TRUE to compute structure between all pairs of strata:

pw <- genetic_structure(arapat, stratum = "Species",
                        mode = "Gst", pairwise = TRUE)
pw
               Cape  Mainland Peninsula
Cape             NA 0.2955307 0.3609387
Mainland  0.2955307        NA 0.2024045
Peninsula 0.3609387 0.2024045        NA

This returns a symmetric matrix of pairwise multilocus estimates, with NA on the diagonal.
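
Because the matrix is symmetric, each pair appears twice. For reporting, it is often easier to work with one row per pair. The sketch below rebuilds the matrix from the printed values (so it runs without the package) and flattens its lower triangle into a sorted table. The object and column names are illustrative.

```r
# Pairwise Gst values copied from the output above.
strata <- c("Cape", "Mainland", "Peninsula")
pw <- matrix(c(NA, 0.2955307, 0.3609387,
               0.2955307, NA, 0.2024045,
               0.3609387, 0.2024045, NA),
             nrow = 3, byrow = TRUE,
             dimnames = list(strata, strata))

# Keep each pair once by taking the lower triangle.
lower <- which(lower.tri(pw), arr.ind = TRUE)
pairs <- data.frame(
  stratum_1 = rownames(pw)[lower[, "row"]],
  stratum_2 = colnames(pw)[lower[, "col"]],
  Gst = pw[lower],
  stringsAsFactors = FALSE
)

# Most differentiated pair first.
pairs <- pairs[order(pairs$Gst, decreasing = TRUE), ]
print(pairs)
```

With these values, the Cape and Peninsula pair has the largest estimate and the Mainland and Peninsula pair the smallest.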

6.7 Restricting to Specific Loci

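Use the locus parameter to restrict the analysis to a single locus or to a vector of loci. A multilocus row is reported only when more than one locus is analysed: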

genetic_structure(arapat, stratum = "Species",
                  mode = "Gst", locus = "LTRS")
  Locus       Gst        Hs        Ht         P
1  LTRS 0.3536942 0.3229669 0.4997122 0.3536942
genetic_structure(arapat, stratum = "Species",
                  mode = "Gst", locus = c("LTRS", "WNT"))
       Locus       Gst        Hs        Ht         P
1       LTRS 0.3536942 0.3229669 0.4997122 0.3536942
2        WNT 0.6028903 0.2595040 0.6534819 0.6028903
3 Multilocus 0.4949065 0.5824709 1.1531942        NA

6.8 Multilocus Assignment

The multilocus_assignment() function estimates the probability that an individual originates from each population based on multilocus genotype frequencies:

# Compute stratum-level frequencies
freqs <- frequencies(arapat, stratum = "Species")

# Assign the first individual
multilocus_assignment(arapat[1, ], freqs)
Warning in multilocus_assignment(arapat[1, ], freqs): This individual has
missing genotypes, cannot compre assignment probability to individuals who do
not have missing data.
    Stratum  Probability Posterior
3 Peninsula 3.205128e-07         1

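The Posterior column gives the relative probability of assignment to each stratum given the individual’s multilocus genotype. Here only Peninsula is returned, with a posterior of 1. As the warning indicates, this individual has missing genotypes, so its Probability value is not comparable with values for individuals that have complete data.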
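
The general idea behind frequency-based assignment can be sketched as follows. This is an illustration under stated assumptions, not the package's code: it assumes Hardy-Weinberg genotype proportions within each stratum, independent loci, and equal prior weight for every stratum. The allele frequencies, stratum names and the helper `geno_prob` are all hypothetical.

```r
# Hypothetical allele frequencies for two strata at two loci.
freqs <- list(
  North = list(L1 = c(A = 0.8, B = 0.2), L2 = c(A = 0.5, B = 0.5)),
  South = list(L1 = c(A = 0.1, B = 0.9), L2 = c(A = 0.5, B = 0.5))
)

# Diploid genotype of the individual to assign: A/A at L1, A/B at L2.
genotype <- list(L1 = c("A", "A"), L2 = c("A", "B"))

# Hardy-Weinberg genotype probability: p^2 for homozygotes, 2pq otherwise.
geno_prob <- function(alleles, p) {
  if (alleles[1] == alleles[2]) {
    p[[alleles[1]]]^2
  } else {
    2 * p[[alleles[1]]] * p[[alleles[2]]]
  }
}

# Multilocus probability per stratum: product over loci.
probability <- sapply(freqs, function(stratum) {
  prod(mapply(geno_prob, genotype, stratum[names(genotype)]))
})

# Posterior with equal priors: each probability divided by their sum.
posterior <- probability / sum(probability)

print(data.frame(Stratum = names(probability),
                 Probability = probability,
                 Posterior = posterior,
                 row.names = NULL))
```

Under this scheme a stratum in which any observed allele has frequency zero receives a probability of zero, which is one way a single stratum can end up with a posterior of 1.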

6.9 Individual Estimator Functions

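Each structure metric also has a standalone function. In the output below, Gst() and Gst_prime() reproduce the genetic_structure() results shown earlier, whereas the Dest() estimates differ slightly from those in Section 6.4: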

Gst(arapat, stratum = "Species")
       Locus       Gst        Hs        Ht         P
1       LTRS 0.3536942 0.3229669 0.4997122 0.3536942
2        WNT 0.6028903 0.2595040 0.6534819 0.6028903
3         EN 0.1823664 0.3678145 0.4498525 0.1823664
4         EF 0.3653913 0.2772276 0.4368481 0.3653913
5        ZMP 0.4961044 0.1712607 0.3398734 0.4961044
6        AML 0.2723689 0.6045761 0.8308826 0.2723689
7       ATPS 0.5856500 0.2983785 0.7201122 0.5856500
8       MP20 0.2358115 0.6267430 0.8201419 0.2358115
9 Multilocus 0.3835972 2.9284713 4.7509048        NA
Gst_prime(arapat, stratum = "Species")
       Locus       Gst        Hs        Ht  P
1       LTRS 0.6067798 0.3229669 0.4997122  0
2        WNT 0.9198112 0.2595040 0.6534819  0
3         EN 0.3415214 0.3678145 0.4498525  0
4         EF 0.5756163 0.2772276 0.4368481  0
5        ZMP 0.6498859 0.1712607 0.3398734  0
6        AML 0.8970188 0.6045761 0.8308826  0
7       ATPS 0.9592390 0.2983785 0.7201122  0
8       MP20 0.8297450 0.6267430 0.8201419  0
9 Multilocus 0.6503121 0.3110927 0.5406400 NA
Dest(arapat, stratum = "Species")
       Locus       Dest        Hs        Ht          P
1       LTRS 0.17500476 0.3229669 0.4997122 0.17500476
2        WNT 0.35493302 0.2595040 0.6534819 0.35493302
3         EN 0.08807037 0.3678145 0.4498525 0.08807037
4         EF 0.14808390 0.2772276 0.4368481 0.14808390
5        ZMP 0.13611660 0.1712607 0.3398734 0.13611660
6        AML 0.38225685 0.6045761 0.8308826 0.38225685
7       ATPS 0.40086216 0.2983785 0.7201122 0.40086216
8       MP20 0.34665943 0.6267430 0.8201419 0.34665943
9 Multilocus 0.19056225 0.3660589 0.5938631         NA