1 Getting Started
This chapter covers the installation of gstudio, loading bundled datasets, and a quick tour of the package’s core utilities.
1.1 Installation
The gstudio package is available from GitHub and can be installed using the devtools package:

    install.packages("devtools")
    devtools::install_github("dyerlab/gstudio")

Once installed, load the package:

    library(gstudio)

1.2 Bundled Datasets
The package ships with several datasets for examples and testing:
| Dataset | Description |
|---|---|
| arapat | Araptus attenuatus multilocus microsatellite data across Baja California |
| aflp_arapat | Araptus attenuatus AFLP marker data |
| cornus | Cornus florida adult genotypes |
| cornus_florida | Cornus florida mother-offspring pair data |
| lopho | Lophocereus schottii population graph (Dyer & Nason 2004) |
| upiga | Upiga virescens population graph |
| baja | Baja California sampling location metadata |
| alt | Sonoran desert altitude raster data |

Load any dataset with the data() function:

    data(arapat)

1.3 Exploring a Dataset
The arapat dataset is a data.frame with both metadata columns and locus columns:

    names(arapat)
     [1] "Species"    "Cluster"    "Population" "ID"         "Latitude"
     [6] "Longitude"  "LTRS"       "WNT"        "EN"         "EF"
    [11] "ZMP"        "AML"        "ATPS"       "MP20"

    dim(arapat)
    [1] 363  14

1.3.1 Identifying Locus Columns
Use column_class() to find all columns of a particular class:

    column_class(arapat, "locus")
    [1] "LTRS" "WNT"  "EN"   "EF"   "ZMP"  "AML"  "ATPS" "MP20"

This returns the names of all columns that contain locus objects. The function is used extensively throughout the package.
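The idea of selecting columns by class can be sketched in base R. This is only an illustration under stated assumptions, not gstudio's implementation: the helper name `columns_of_class` and the toy data frame are hypothetical, and the "locus" class is attached by hand rather than created with gstudio.

```r
# Hypothetical helper: names of the columns that inherit from a given class.
columns_of_class <- function(df, class_name) {
  is_match <- vapply(df, inherits, logical(1), what = class_name)
  names(df)[is_match]
}

# Toy data frame; values are made up for illustration.
toy <- data.frame(
  Population = c("A", "A", "B"),
  Latitude   = c(24.1, 24.3, 26.0),
  stringsAsFactors = FALSE
)

# Attach a "locus" class manually (assumption: stands in for real locus objects).
toy$Loc1 <- structure(c("1:2", "1:1", "2:2"), class = "locus")
toy$Loc2 <- structure(c("3:3", "3:4", "4:4"), class = "locus")

locus_cols <- columns_of_class(toy, "locus")
print(locus_cols)
```

Keeping the column lookup separate from the analysis code means the same script works on data frames with different numbers of loci.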
1.3.2 Splitting by Stratum
The partition() function splits a data.frame by a grouping column, returning a named list:

    pops <- partition(arapat, stratum = "Species")
    names(pops)
    [1] "Cape"      "Mainland"  "Peninsula"

    sapply(pops, nrow)
         Cape  Mainland Peninsula
           75        36       252

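Conceptually, this kind of partitioning resembles base R's split(). The sketch below uses a made-up data frame and makes no claim about how partition() is implemented internally.

```r
# Toy data frame with a grouping column (values are made up).
toy <- data.frame(
  Species = c("Cape", "Mainland", "Cape", "Peninsula", "Peninsula"),
  ID      = 1:5,
  stringsAsFactors = FALSE
)

# split() returns a named list with one data.frame per group.
groups <- split(toy, toy$Species)

print(names(groups))
print(sapply(groups, nrow))
```

Each list element is an ordinary data.frame, so per-group summaries can be computed with sapply() or lapply().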
1.3.3 Ploidy
Check the ploidy level of a locus column. The result has one value per individual; here all 363 individuals are diploid at LTRS:

    ploidy(arapat$LTRS)
      [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     [38] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [112] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [149] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [186] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [223] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [260] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [297] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    [334] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

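Because the per-individual output is long, a tabulation is often easier to read. The sketch below uses a plain numeric vector as a stand-in for the result of ploidy(arapat$LTRS), with the same length and values as the output above. The variable names are illustrative.

```r
# Stand-in for ploidy(arapat$LTRS): 363 individuals, all diploid.
ploidy_values <- rep(2, 363)

# Count how many individuals fall into each ploidy level.
ploidy_counts <- table(ploidy_values)
print(ploidy_counts)

# A single logical answer: is every individual diploid?
all_diploid <- all(ploidy_values == 2)
print(all_diploid)
```

With the package loaded, the same calls could be applied directly to the vector returned by ploidy().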
1.4 A Quick Analysis
With just a few lines, you can estimate genetic diversity, here expected heterozygosity (He), for each locus within each stratum:

    genetic_diversity(arapat, stratum = "Species", mode = "He")
         Stratum Locus         He
    2       Cape  LTRS 0.14720000
    3       Cape   WNT 0.23374726
    4       Cape    EN 0.42000000
    5       Cape    EF 0.06444444
    6       Cape   ZMP 0.00000000
    7       Cape   AML 0.53312835
    8       Cape  ATPS 0.11546667
    9       Cape  MP20 0.36791111
    10  Mainland  LTRS 0.38850309
    11  Mainland   WNT 0.03277778
    12  Mainland    EN 0.50285714
    13  Mainland    EF 0.27119377
    14  Mainland   ZMP 0.09500000
    15  Mainland   AML 0.53854875
    16  Mainland  ATPS 0.25038580
    17  Mainland  MP20 0.80670340
    18 Peninsula  LTRS 0.42591805
    19 Peninsula   WNT 0.50613781
    20 Peninsula    EN 0.17229600
    21 Peninsula    EF 0.48979592
    22 Peninsula   ZMP 0.41492187
    23 Peninsula   AML 0.72842417
    24 Peninsula  ATPS 0.52255763
    25 Peninsula  MP20 0.69148800

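Expected heterozygosity is commonly defined as one minus the sum of squared allele frequencies. The base R sketch below illustrates that textbook definition on made-up alleles. The helper name `expected_heterozygosity` is hypothetical, and whether genetic_diversity() applies any small-sample correction is not addressed here.

```r
# Hypothetical helper: expected heterozygosity from a vector of allele labels.
expected_heterozygosity <- function(alleles) {
  p <- table(alleles) / length(alleles)  # allele frequencies
  1 - sum(p^2)
}

# Made-up sample: 7 copies of allele "1" and 3 copies of allele "2".
alleles <- c(rep("1", 7), rep("2", 3))
he <- expected_heterozygosity(alleles)
print(he)

# A locus fixed for a single allele has zero expected heterozygosity.
he_fixed <- expected_heterozygosity(rep("1", 10))
print(he_fixed)
```

A value of zero, as for ZMP in the Cape stratum, indicates that only one allele was observed in that group.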
Or compute population structure:

    genetic_structure(arapat, stratum = "Species", mode = "Gst")
           Locus       Gst        Hs        Ht         P
    1       LTRS 0.3536942 0.3229669 0.4997122 0.3536942
    2        WNT 0.6028903 0.2595040 0.6534819 0.6028903
    3         EN 0.1823664 0.3678145 0.4498525 0.1823664
    4         EF 0.3653913 0.2772276 0.4368481 0.3653913
    5        ZMP 0.4961044 0.1712607 0.3398734 0.4961044
    6        AML 0.2723689 0.6045761 0.8308826 0.2723689
    7       ATPS 0.5856500 0.2983785 0.7201122 0.5856500
    8       MP20 0.2358115 0.6267430 0.8201419 0.2358115
    9 Multilocus 0.3835972 2.9284713 4.7509048        NA

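The Gst column is consistent with the usual relationship Gst = 1 - Hs/Ht, where Hs is the average within-stratum diversity and Ht is the total diversity. The sketch below recomputes it in base R from the Hs and Ht values printed above. The data frame name is illustrative, and this is a check on the reported numbers rather than a description of the package internals.

```r
# Hs and Ht copied from the genetic_structure() output.
structure_table <- data.frame(
  Locus = c("LTRS", "WNT", "EN", "EF", "ZMP", "AML", "ATPS", "MP20"),
  Hs = c(0.3229669, 0.2595040, 0.3678145, 0.2772276,
         0.1712607, 0.6045761, 0.2983785, 0.6267430),
  Ht = c(0.4997122, 0.6534819, 0.4498525, 0.4368481,
         0.3398734, 0.8308826, 0.7201122, 0.8201419)
)

# Per-locus Gst: the proportion of total diversity found among strata.
structure_table$Gst <- 1 - structure_table$Hs / structure_table$Ht

# Multilocus value computed from the summed Hs and Ht.
multilocus_gst <- 1 - sum(structure_table$Hs) / sum(structure_table$Ht)

print(round(structure_table$Gst, 4))
print(round(multilocus_gst, 4))
```

The Hs and Ht entries in the Multilocus row are the column sums across loci, which is why they exceed 1.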
The following chapters cover each of these operations in detail.