7  Genetic Distance & Relatedness

This chapter covers inter-individual and inter-population distance estimation, pairwise relatedness, and spatial autocorrelation analysis.

library(gstudio)
data(arapat)

7.1 Genetic Distance

The genetic_distance() function computes distance matrices from genetic data. Some modes operate among individuals, others among populations (strata).

7.1.1 Available Distance Metrics

Mode         Level       Description
"AMOVA"      Individual  Inter-individual AMOVA distance
"Bray"       Individual  Proportion of shared alleles
"Euclidean"  Population  Euclidean frequency distance
"cGD"        Population  Conditional genetic distance
"Nei"        Population  Nei’s genetic distance (1978)
"Dps"        Population  Shared allele distance (1 - Ps)
"Jaccard"    Population  Jaccard set dissimilarity

7.1.2 Individual-Level Distances

The "AMOVA" mode returns a square matrix of pairwise genetic distances among individuals, with one row and one column per individual:

# Subset for a manageable example
sub <- arapat[1:20, ]
D_amova <- genetic_distance(sub, mode = "AMOVA")
dim(D_amova)
[1] 20 20
D_amova[1:5, 1:5]
     [,1] [,2] [,3] [,4] [,5]
[1,]    0   NA   NA   NA   NA
[2,]   NA    0    1    2    5
[3,]   NA    1    0    3    2
[4,]   NA    2    3    0    5
[5,]   NA    5    2    5    0
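
To make the integer values above easier to read, here is a minimal base-R sketch of the per-locus squared AMOVA distance in the style of Smouse & Peakall, where each diploid genotype is coded as a vector of allele copy counts. The helper amova_d2() and the toy genotypes are illustrative assumptions, not part of gstudio, and the package's internal coding may differ in detail.

```r
# Squared AMOVA distance between two diploid genotypes at one locus.
# Each genotype is a vector of allele copy counts that sums to 2.
# amova_d2() is a hypothetical helper for illustration, not a gstudio function.
amova_d2 <- function(g1, g2) {
  sum((g1 - g2)^2) / 2
}

# Toy genotypes; allele order is A, B, C, D
AA <- c(2, 0, 0, 0)
AB <- c(1, 1, 0, 0)
BB <- c(0, 2, 0, 0)
CD <- c(0, 0, 1, 1)

amova_d2(AA, AA)  # identical genotypes
amova_d2(AA, AB)  # homozygote vs. heterozygote sharing one allele
amova_d2(AB, CD)  # two heterozygotes sharing no alleles
amova_d2(AA, BB)  # two different homozygotes
```

Under this coding the four comparisons give 0, 1, 2 and 4. A multilocus distance would then be the sum of the per-locus values, which is consistent with the whole-number entries in the matrix above.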

7.1.3 Population-Level Distances

Population-level modes require a stratum argument naming the column that assigns individuals to populations:

D_nei <- genetic_distance(arapat, stratum = "Species", mode = "Nei")
D_nei
               Cape  Mainland Peninsula
Cape      0.0000000 0.7311289 0.9244791
Mainland  0.7311289 0.0000000 0.6851578
Peninsula 0.9244791 0.6851578 0.0000000
D_cgd <- genetic_distance(arapat, stratum = "Species", mode = "cGD")
Warning in popgraph(x = mv, groups = factor(as.character(x[[stratum]]))): 1
variables are collinear and being dropped from the discriminant rotation.
D_cgd
              Cape  Mainland Peninsula
Cape      0.000000  9.913083  7.136108
Mainland  9.913083  0.000000 16.490315
Peninsula 7.136108 16.490315  0.000000
D_euc <- genetic_distance(arapat, stratum = "Species",
                           mode = "Euclidean")
Multilous estimates of Euclidean distance are assumed to be additive.
D_euc
              Cape Mainland Peninsula
Cape      0.000000 2.418731  2.494205
Mainland  2.418731 0.000000  2.140258
Peninsula 2.494205 2.140258  0.000000
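
A population-level matrix like these can be handed to base-R tools once it is converted with as.dist(). The sketch below re-enters the Nei distances printed above by hand (so it runs without gstudio) and clusters them with average linkage; the object names are illustrative, and the choice of linkage method is an assumption made for the example.

```r
# Nei distances copied from the printed D_nei output above
pops <- c("Cape", "Mainland", "Peninsula")
D_nei_printed <- matrix(
  c(0.0000000, 0.7311289, 0.9244791,
    0.7311289, 0.0000000, 0.6851578,
    0.9244791, 0.6851578, 0.0000000),
  nrow = 3, byrow = TRUE, dimnames = list(pops, pops)
)

# Convert to a 'dist' object and build an average-linkage (UPGMA) tree
d <- as.dist(D_nei_printed)
tree <- hclust(d, method = "average")

# Negative entries in tree$merge index the original observations,
# so this recovers the names of the first pair to be joined
first_pair <- pops[-tree$merge[1, ]]
first_pair
```

The first merge joins Mainland and Peninsula, the pair with the smallest Nei distance in the matrix. plot(tree) would draw the corresponding dendrogram.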

7.1.4 Standalone Distance Functions

Each distance metric also has a standalone function:

D <- dist_nei(arapat, stratum = "Species")
D
               Cape  Mainland Peninsula
Cape      0.0000000 0.7311289 0.9244791
Mainland  0.7311289 0.0000000 0.6851578
Peninsula 0.9244791 0.6851578 0.0000000
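
For intuition about what such a value measures, the following base-R sketch computes Nei's standard genetic distance for a single locus from two allele frequency vectors. Note the assumptions: this is the uncorrected form, without the small-sample bias correction of the 1978 version listed in the table above, and nei_standard() is a hypothetical helper rather than the code behind dist_nei().

```r
# Nei's standard genetic distance for one locus, uncorrected for sample size.
# p and q are allele frequency vectors over the same alleles, in the same order.
# nei_standard() is a hypothetical helper, not a gstudio function.
nei_standard <- function(p, q) {
  stopifnot(length(p) == length(q))
  Jxy <- sum(p * q)   # expected identity between populations
  Jx  <- sum(p^2)     # expected identity within population x
  Jy  <- sum(q^2)     # expected identity within population y
  -log(Jxy / sqrt(Jx * Jy))
}

# Toy frequencies (made up for illustration)
p <- c(A = 0.5, B = 0.5)
q <- c(A = 1.0, B = 0.0)

nei_standard(p, p)  # identical populations give a distance of zero
nei_standard(p, q)  # equals 0.5 * log(2) for these frequencies
```

The distance grows without bound as the two populations share fewer alleles, because the normalized identity inside the logarithm approaches zero.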

7.2 Genetic Relatedness

The genetic_relatedness() function computes pairwise relatedness among individuals.

7.2.1 Available Relatedness Estimators

Mode            Description
"Nason"         Nason’s kinship coefficient (Fij)
"LynchRitland"  Lynch & Ritland estimator
"Ritland"       Ritland estimator

7.2.2 Basic Usage

The Nason estimator divides by the polymorphic index Pe(), so monomorphic loci (where Pe = 0) will produce NaN. In the arapat dataset, LTRS and ATPS can be monomorphic in small subsets. To avoid this, restrict the data to polymorphic loci:

# Use loci that are polymorphic in this subset
poly_loci <- c("WNT", "EN", "EF", "ZMP", "AML", "MP20")
sub <- arapat[1:10, c("Species", poly_loci)]
R <- genetic_relatedness(sub, mode = "Nason")
Some of your loci are missing, Fij will treat these as loci with all alleles with likelihood equal to the population allele frequency.
Some of your loci are missing, Fij will treat these as loci with all alleles with likelihood equal to the population allele frequency.
Some of your loci are missing, Fij will treat these as loci with all alleles with likelihood equal to the population allele frequency.
dim(R)
[1] 10 10
round(R[1:5, 1:5], 4)
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,] 1.0000 0.2497 0.2431 0.2534 0.2403
[2,] 0.2497 1.0000 0.2784 0.2725 0.2680
[3,] 0.2431 0.2784 1.0000 0.2659 0.2877
[4,] 0.2534 0.2725 0.2659 1.0000 0.2606
[5,] 0.2403 0.2680 0.2877 0.2606 1.0000
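
The NaN behaviour for monomorphic loci noted at the start of this section can be reproduced in base R. This sketch assumes the polymorphic index has the form 1 minus the sum of squared allele frequencies; pe_index() is a hypothetical stand-in rather than gstudio's Pe(), and all frequencies are made up for illustration.

```r
# Polymorphic index, assumed here to be 1 - sum(p^2).
# pe_index() is a hypothetical stand-in, not gstudio's Pe().
pe_index <- function(freqs) {
  1 - sum(freqs^2)
}

pe_poly <- pe_index(c(0.6, 0.3, 0.1))  # three alleles: index is positive
pe_mono <- pe_index(1)                 # single fixed allele: index is zero

# At a monomorphic locus every individual matches the population mean,
# so the numerator of the estimator is also zero and the ratio is 0 / 0
ratio_mono <- 0 / pe_mono
is.nan(ratio_mono)

# Keeping only loci with a positive index (toy frequencies, generic names)
freqs_by_locus <- list(locus1 = c(0.7, 0.3), locus2 = 1)
keep <- names(which(sapply(freqs_by_locus, pe_index) > 0))
keep
```

Filtering on a positive index in this way is one option for choosing the loci to pass to genetic_relatedness() without inspecting each locus by hand.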

7.2.3 Standalone Relatedness Functions

Per-locus relatedness can also be estimated directly. Choose a polymorphic locus:

r_nason <- rel_nason(arapat$EN[1:10])
round(r_nason[1:5, 1:5], 4)
       [,1]   [,2]   [,3]   [,4]   [,5]
[1,]     NA 0.0834 0.0834 0.1061 0.1061
[2,] 0.0834     NA 0.1260 0.1029 0.1029
[3,] 0.0834 0.1260     NA 0.1029 0.1029
[4,] 0.1061 0.1029 0.1029     NA 0.1107
[5,] 0.1061 0.1029 0.1029 0.1107     NA

7.3 Spatial Autocorrelation

The genetic_autocorrelation() function performs the spatial autocorrelation analysis of Smouse & Peakall (1999). It requires two square distance matrices (one physical, one genetic) and a vector of distance bin boundaries:

# Physical distance matrix
P <- strata_distance(arapat,
                     longitude = "Longitude",
                     latitude = "Latitude")

# Genetic distance matrix
G <- genetic_distance(arapat, mode = "AMOVA")

# Define distance bins
bins <- seq(0, max(P), length.out = 6)

# Run autocorrelation
result <- genetic_autocorrelation(P, G, bins = bins, perms = 99)
result

The output includes the spatial autocorrelation coefficient (R) for each distance class, the number of pairwise comparisons (N), and a p-value from the permutation test (P).
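
To see how the bin boundaries translate into the number of pairwise comparisons per distance class, here is a small base-R sketch on toy coordinates. It only illustrates the binning step; how genetic_autocorrelation() treats values that fall exactly on a boundary is an assumption here (cut() with its default right-closed intervals), and the toy objects are not from arapat.

```r
# Five toy sampling locations along a line (arbitrary units)
x <- c(0, 1, 2, 4, 8)
P_toy <- as.matrix(dist(x))

# Four equal-width distance classes spanning 0 to the maximum distance
bins_toy <- seq(0, max(P_toy), length.out = 5)

# Use each pair once (upper triangle), then assign it to a class
pair_d <- P_toy[upper.tri(P_toy)]
classes <- cut(pair_d, breaks = bins_toy, include.lowest = TRUE)

# Number of pairwise comparisons falling in each distance class
n_per_class <- table(classes)
n_per_class
```

Equal-width bins tend to leave few pairs in the largest distance classes, as the counts here show. When a class holds very few comparisons its coefficient is poorly estimated, so it can be worth checking the counts before settling on the bins.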

7.4 Strata Distances

The strata_coordinates() function extracts per-stratum centroid coordinates, and strata_distance() computes physical (geographic) distances between them:

coords <- strata_coordinates(arapat, stratum = "Species",
                              longitude = "Longitude",
                              latitude = "Latitude")
coords
    Stratum Longitude Latitude
3 Peninsula -111.9226 26.18956
1      Cape -110.5805 24.25405
2  Mainland -109.6759 26.97435
strata_distance(coords)
          Peninsula     Cape Mainland
Peninsula    0.0000 254.0525 239.8471
Cape       254.0525   0.0000 315.7868
Mainland   239.8471 315.7868   0.0000
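
As a cross-check on the scale of these values, the sketch below computes a great-circle distance with the haversine formula in base R, using the Peninsula and Cape centroids printed above. The helper haversine_km() and the Earth radius of 6371 km are assumptions for illustration. The text does not state which formula or radius strata_distance() uses, so the result should be read as an approximate comparison.

```r
# Great-circle distance between two points given in decimal degrees.
# haversine_km() is a hypothetical helper, not a gstudio function;
# the 6371 km mean Earth radius is an assumption.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(sqrt(a))
}

# Centroids copied from the strata_coordinates() output above
d_pen_cape <- haversine_km(-111.9226, 26.18956,   # Peninsula
                           -110.5805, 24.25405)   # Cape
d_pen_cape
```

The result comes out at roughly 254, close to the Peninsula to Cape entry in the matrix above, which suggests the tabulated distances are great-circle distances in kilometres.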