This section focuses on estimating measures of genetic distance and genetic structure. The lecture content for this topic is here.
library( gstudio )
data( arapat )
The data for this activity are included in the gstudio library and represent a set of nuclear co-dominant loci (named LTRS, WNT, EN, EF, ZMP, AML, ATPS, MP20) assayed for 363 individuals and divided into 3 partitions.
summary( arapat )
      Species       Cluster      Population         ID          Latitude
 Cape     : 75   CBP-C :150   32     : 19   101_10A:  1   Min.   :23.08
 Mainland : 36   NBP-C : 84   75     : 11   101_1A :  1   1st Qu.:24.59
 Peninsula:252   SBP-C : 18   Const  : 11   101_2A :  1   Median :26.25
                 SCBP-A: 75   12     : 10   101_3A :  1   Mean   :26.25
                 SON-B : 36   153    : 10   101_4A :  1   3rd Qu.:27.53
                              157    : 10   101_5A :  1   Max.   :29.33
                              (Other):292   (Other):357

   Longitude          LTRS           WNT            EN             EF
 Min.   :-114.3   01:01  :147   03:03  :108   01:01  :225   01:01  :219
 1st Qu.:-113.0   01:02  : 86   01:01  : 82   01:02  : 52   01:02  : 52
 Median :-111.5   02:02  :130   01:03  : 77   02:02  : 38   02:02  : 90
 Mean   :-111.7                 02:02  : 62   03:03  : 22   NA's   :  2
 3rd Qu.:-110.5                 03:04  :  8   01:03  :  7
 Max.   :-109.1                 (Other): 15   (Other): 16
                                NA's   : 11   NA's   :  3

      ZMP            AML            ATPS           MP20
 01:01  : 46   08:08  : 51   05:05  :155   05:07  : 64
 01:02  : 51   07:07  : 42   03:03  : 69   07:07  : 53
 02:02  :233   07:08  : 42   09:09  : 66   18:18  : 52
 NA's   : 33   04:04  : 41   02:02  : 30   05:05  : 48
               07:09  : 22   07:09  : 14   05:06  : 22
               (Other):142   08:08  :  9   (Other):119
               NA's   : 23   (Other): 20   NA's   :  5
Copy over the function island_frequencies() created in the slides (here). If the mainland population frequency was \(p=1.0\) and the island frequency started at \(p=0.32\), how many generations does it take for a migration rate of \(m=0.10\) with the underlying model of unidirectional migration as depicted in the \(N-Island\) diagram?
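The recursion behind the mainland-island model can be sketched in a few lines of base R. The helper names `island_trajectory()` and `generations_to_converge()` are hypothetical stand-ins, not the `island_frequencies()` function from the slides, and the stopping tolerance `tol` is an assumed criterion chosen purely for illustration.

```r
# Unidirectional mainland -> island migration:
#   p_island[t + 1] = (1 - m) * p_island[t] + m * p_mainland
# island_trajectory() is a hypothetical helper, not the slide function.
island_trajectory <- function(p_island, p_mainland, m, generations) {
  p <- numeric(generations + 1)
  p[1] <- p_island                       # element 1 is generation 0
  for (t in seq_len(generations)) {
    p[t + 1] <- (1 - m) * p[t] + m * p_mainland
  }
  p
}

# First generation at which the island frequency is within `tol`
# of the mainland frequency (tol is an assumed threshold).
generations_to_converge <- function(p_island, p_mainland, m,
                                    tol = 0.01, max_gen = 10000) {
  traj <- island_trajectory(p_island, p_mainland, m, max_gen)
  hit <- which(abs(traj - p_mainland) <= tol)
  if (length(hit) == 0) return(NA_integer_)
  hit[1] - 1L                            # convert index to generation
}

traj <- island_trajectory(0.32, 1.0, 0.10, 50)
gens <- generations_to_converge(0.32, 1.0, 0.10, tol = 0.01)
```

Plotting `traj` against `0:50` shows the island frequency approaching the mainland value geometrically, since the gap between them shrinks by a factor of \(1 - m\) every generation. The number of generations reported depends entirely on the tolerance chosen.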
Consider the simple demographic population model with three populations (Pop-X, Pop-Y, and Pop-Z) all connected by symmetric migration (as shown here). If the initial allele frequencies at a simple 2-allele locus are \(p_X = 0.25\), \(p_Y = 0.50\), and \(p_Z = 0.75\), what are the allele frequencies for this network of populations at equilibrium if migration is:
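One way to explore a network like this is to iterate a migration matrix until the frequencies stop changing. The sketch below assumes a single common rate `m = 0.05` between every pair of populations; that value, and the convergence threshold, are illustrative assumptions rather than part of the exercise.

```r
# Migration matrix: M[i, j] is the fraction of population i's gene pool
# drawn from population j each generation (each row sums to 1).
# The common rate m = 0.05 is an assumed value for illustration only.
m <- 0.05
pops <- c("Pop-X", "Pop-Y", "Pop-Z")
M <- matrix(m, nrow = 3, ncol = 3, dimnames = list(pops, pops))
diag(M) <- 1 - 2 * m                     # residents that do not migrate

p <- c("Pop-X" = 0.25, "Pop-Y" = 0.50, "Pop-Z" = 0.75)

# Iterate p[t + 1] = M %*% p[t] until frequencies stop changing.
for (generation in 1:1000) {
  p_next <- drop(M %*% p)
  if (max(abs(p_next - p)) < 1e-10) break
  p <- p_next
}
round(p, 4)
```

Because this `M` is symmetric with rows summing to one, the mean frequency across the three populations is preserved every generation. Different migration scenarios can be tried by changing the entries of `M`, as long as each row still sums to one.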
Isolation by physical distance (classically called IBD) is a relationship between the physical separation of populations and some measure of genetic difference. For the arapat data set, estimate inter-population genetic distance using Nei's metric and then plot that against inter-population physical distance. Overlay a trendline. Interpret the results: do genetic differences have a relationship with physical separation?
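The mechanics of an IBD plot can be sketched in base R on toy data. Everything below is an assumption for illustration: the allele frequencies and coordinates are invented (they are not the arapat values), and `nei_distance()` and `haversine_km()` are hypothetical helpers written here so the logic is visible. gstudio provides its own routines for both distance matrices.

```r
# Nei's standard genetic distance, D = -log(Jxy / sqrt(Jx * Jy)).
# `freqs` is a list with one matrix per locus (rows = populations,
# columns = alleles, each row summing to 1).
nei_distance <- function(freqs) {
  # J[i, j] = mean over loci of sum_k p_ik * p_jk
  J <- Reduce(`+`, lapply(freqs, function(f) f %*% t(f))) / length(freqs)
  D <- -log(J / sqrt(outer(diag(J), diag(J))))
  diag(D) <- 0
  D
}

# Great-circle distance in km via the haversine formula.
haversine_km <- function(lon, lat, radius = 6371) {
  to_rad <- pi / 180
  n <- length(lon)
  P <- matrix(0, n, n)
  for (i in seq_len(n)) for (j in seq_len(n)) {
    dlat <- (lat[j] - lat[i]) * to_rad
    dlon <- (lon[j] - lon[i]) * to_rad
    a <- sin(dlat / 2)^2 +
      cos(lat[i] * to_rad) * cos(lat[j] * to_rad) * sin(dlon / 2)^2
    P[i, j] <- 2 * radius * asin(min(1, sqrt(a)))
  }
  P
}

# Toy data, invented for illustration (NOT the arapat values).
pops <- c("A", "B", "C", "D")
locus1 <- matrix(c(0.9, 0.1,  0.7, 0.3,  0.4, 0.6,  0.2, 0.8),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(pops, c("01", "02")))
locus2 <- matrix(c(0.8, 0.2,  0.6, 0.4,  0.5, 0.5,  0.1, 0.9),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(pops, c("01", "02")))
lon <- c(-114.0, -112.5, -111.0, -109.5)
lat <- c(29.0, 27.0, 25.0, 23.5)

D <- nei_distance(list(locus1, locus2))
P <- haversine_km(lon, lat)

# Use each population pair once (lower triangle), in matching order.
ibd <- data.frame(physical = P[lower.tri(P)], genetic = D[lower.tri(D)])
fit <- lm(genetic ~ physical, data = ibd)

plot(ibd$physical, ibd$genetic,
     xlab = "Physical distance (km)", ylab = "Nei's genetic distance")
abline(fit, col = "red")                 # overlay the trendline
```

When pairing two distance matrices this way, the populations must appear in the same row and column order in both. Pairwise distances are also not independent of one another, so the p-value reported by `lm()` should be read with caution.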
Does your interpretation of the IBD relationship change if you only use individuals whose Species == Peninsula?
Is there more structure (\(\Phi_{ST}\)) among Species, Clusters, or Populations in the arapat data set? And how would you suggest analyzing these data after seeing these differences?
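To make the quantity being compared concrete, here is a base-R sketch of a single-level AMOVA estimate of \(\Phi_{ST}\) from a pairwise distance matrix and a grouping vector. The function `phi_st()` is a hypothetical helper, the toy values are invented, and the sketch assumes a matrix of distances among individuals is already in hand. It is not the course's or gstudio's implementation.

```r
# Phi_ST from pairwise distances and a grouping vector, using the
# single-level AMOVA variance decomposition.
# `d` holds distances (not squared); `strata` assigns each row to a group.
phi_st <- function(d, strata) {
  d2 <- as.matrix(d)^2
  strata <- as.factor(strata)
  N <- nrow(d2)
  n_g <- table(strata)
  K <- length(n_g)

  # Sums of squares computed from pairwise squared distances.
  ss_total <- sum(d2) / (2 * N)
  ss_within <- sum(sapply(levels(strata), function(g) {
    idx <- which(strata == g)
    sum(d2[idx, idx]) / (2 * length(idx))
  }))
  ss_among <- ss_total - ss_within

  ms_among <- ss_among / (K - 1)
  ms_within <- ss_within / (N - K)
  n0 <- (N - sum(n_g^2) / N) / (K - 1)   # effective group size

  var_among <- (ms_among - ms_within) / n0
  var_among / (var_among + ms_within)
}

# Toy one-dimensional "genotype scores", invented for illustration.
grp <- rep(c("a", "b", "c"), each = 10)
structured <- c(1:10, 101:110, 201:210)  # groups far apart
unstructured <- rep(1:10, times = 3)     # identical values in every group

phi_strong <- phi_st(dist(structured), grp)    # close to 1
phi_none <- phi_st(dist(unstructured), grp)    # slightly negative
```

Running the same function with Species, Cluster, and Population in turn as `strata` would give three values that can be compared directly. A negative estimate, as in the second toy case, is usually read as no detectable structure at that level.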