1  Getting Started

This chapter covers the installation of gstudio, loading bundled datasets, and a quick tour of the package’s core utilities.

1.1 Installation

The gstudio package is available from GitHub and can be installed using the devtools package:

install.packages("devtools")
devtools::install_github("dyerlab/gstudio")

Once installed, load the package:

library(gstudio)

1.2 Bundled Datasets

The package ships with several datasets for examples and testing:

Dataset          Description
arapat           Araptus attenuatus multilocus microsatellite data across Baja California
aflp_arapat      Araptus attenuatus AFLP marker data
cornus           Cornus florida adult genotypes
cornus_florida   Cornus florida mother-offspring pair data
lopho            Lophocereus schottii population graph (Dyer & Nason 2004)
upiga            Upiga virescens population graph
baja             Baja California sampling location metadata
alt              Sonoran Desert altitude raster data

Load any dataset with the data() function:

data(arapat)

1.3 Exploring a Dataset

The arapat dataset is a data.frame with 363 rows (one per individual) and 14 columns: six metadata columns (Species through Longitude) followed by eight locus columns:

names(arapat)
 [1] "Species"    "Cluster"    "Population" "ID"         "Latitude"  
 [6] "Longitude"  "LTRS"       "WNT"        "EN"         "EF"        
[11] "ZMP"        "AML"        "ATPS"       "MP20"      
dim(arapat)
[1] 363  14

1.3.1 Identifying Locus Columns

Use column_class() to find all columns of a particular class:

column_class(arapat, "locus")
[1] "LTRS" "WNT"  "EN"   "EF"   "ZMP"  "AML"  "ATPS" "MP20"

column_class() returns the names of every column that holds locus objects. It is used extensively throughout the package.
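To make the lookup concrete, here is a minimal base-R sketch of the same idea. It is not gstudio's implementation: `column_class_sketch()` is a hypothetical helper, and the toy data frame fakes locus columns by attaching a bare `"locus"` class attribute to character vectors, which is an assumption about how such columns are tagged.

```r
# Hypothetical helper for illustration only; this is NOT gstudio's source code.
# Returns the names of all columns of `df` whose values inherit from `class_name`.
column_class_sketch <- function(df, class_name) {
  is_match <- vapply(df, function(col) inherits(col, class_name), logical(1))
  names(df)[is_match]
}

# Toy data: two metadata columns plus two columns tagged with a "locus" class.
# Real gstudio locus objects carry more structure; this only mimics the class tag.
toy <- data.frame(ID = 1:3, Population = c("A", "A", "B"))
toy$L1 <- structure(c("1:2", "2:2", "1:1"), class = "locus")
toy$L2 <- structure(c("3:3", "3:4", "4:4"), class = "locus")

column_class_sketch(toy, "locus")
# [1] "L1" "L2"
```

Testing class membership with inherits() rather than comparing class() strings keeps the check working for objects that carry more than one class.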

1.3.2 Splitting by Stratum

The partition() function splits a data.frame by a grouping column, returning a named list:

pops <- partition(arapat, stratum = "Species")
names(pops)
[1] "Cape"      "Mainland"  "Peninsula"
sapply(pops, nrow)
     Cape  Mainland Peninsula 
       75        36       252 
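Conceptually this is the same grouping that base R's `split()` performs on a data frame. The sketch below assumes that equivalence for illustration only; `partition_sketch()` is a hypothetical name, and gstudio's own partition() may handle details such as empty levels or the stratum column differently.

```r
# Hypothetical stand-in for partition(), written with base R only.
# Splits the rows of `df` into a named list, one element per level of `stratum`.
partition_sketch <- function(df, stratum) {
  split(df, df[[stratum]], drop = TRUE)
}

# Toy data standing in for a genotype data.frame.
toy <- data.frame(
  Species = c("Cape", "Mainland", "Cape", "Peninsula", "Peninsula"),
  ID      = 1:5
)

groups <- partition_sketch(toy, "Species")
names(groups)          # "Cape" "Mainland" "Peninsula"
sapply(groups, nrow)   # Cape = 2, Mainland = 1, Peninsula = 2
```

Because the list is named by stratum, it combines naturally with sapply() or lapply() to run the same calculation on every group, as the row counts above show.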

1.3.3 Ploidy

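ploidy() reports the ploidy of each genotype in a locus column, returning one value per individual. All 363 individuals in arapat are diploid: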

ploidy(arapat$LTRS)
  [1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [38] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[112] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[149] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[186] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[223] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[260] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[297] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[334] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
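The idea behind this output can be sketched under an explicit assumption: that a genotype is stored as alleles joined by a colon (for example "1:2"), so its ploidy is simply the number of alleles. `ploidy_sketch()` is hypothetical and ignores missing data, which the real function may treat differently.

```r
# Hypothetical helper: count alleles per genotype, assuming ":"-separated alleles.
# Missing genotypes (NA or "") are not handled in this sketch.
ploidy_sketch <- function(genotypes, sep = ":") {
  allele_lists <- strsplit(genotypes, sep, fixed = TRUE)
  vapply(allele_lists, length, integer(1))
}

geno <- c("1:2", "2:2", "1:1", "1:2:3:3")
counts <- ploidy_sketch(geno)
counts
# [1] 2 2 2 4

# table() condenses a long per-individual vector into counts per ploidy level.
table(counts)
```

The same table() summary can be applied to the real output, e.g. `table(ploidy(arapat$LTRS))`, to check an entire column at a glance instead of scanning hundreds of values.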

1.4 A Quick Analysis

A single call estimates expected heterozygosity (He) for every locus within each stratum:

genetic_diversity(arapat, stratum = "Species", mode = "He")
     Stratum Locus         He
2       Cape  LTRS 0.14720000
3       Cape   WNT 0.23374726
4       Cape    EN 0.42000000
5       Cape    EF 0.06444444
6       Cape   ZMP 0.00000000
7       Cape   AML 0.53312835
8       Cape  ATPS 0.11546667
9       Cape  MP20 0.36791111
10  Mainland  LTRS 0.38850309
11  Mainland   WNT 0.03277778
12  Mainland    EN 0.50285714
13  Mainland    EF 0.27119377
14  Mainland   ZMP 0.09500000
15  Mainland   AML 0.53854875
16  Mainland  ATPS 0.25038580
17  Mainland  MP20 0.80670340
18 Peninsula  LTRS 0.42591805
19 Peninsula   WNT 0.50613781
20 Peninsula    EN 0.17229600
21 Peninsula    EF 0.48979592
22 Peninsula   ZMP 0.41492187
23 Peninsula   AML 0.72842417
24 Peninsula  ATPS 0.52255763
25 Peninsula  MP20 0.69148800
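For reference, expected heterozygosity at one locus is commonly defined as He = 1 - sum(p_i^2), where p_i is the frequency of allele i. The sketch below implements that textbook formula, assuming ":"-separated genotype strings and no small-sample correction; whether genetic_diversity() applies a correction is not stated here, and `expected_heterozygosity()` is a hypothetical name.

```r
# Hypothetical helper: He = 1 - sum(p_i^2) for one locus.
# Assumes ":"-separated allele strings; no small-sample correction is applied.
expected_heterozygosity <- function(genotypes, sep = ":") {
  alleles <- unlist(strsplit(genotypes, sep, fixed = TRUE))
  p <- as.vector(table(alleles)) / length(alleles)  # allele frequencies
  1 - sum(p^2)
}

# Alleles 1, 1, 1, 2 give p = (0.75, 0.25), so He = 1 - 0.625.
expected_heterozygosity(c("1:1", "1:2"))
# [1] 0.375

# A monomorphic locus gives He = 0 (Cape shows 0 at ZMP in the table above).
expected_heterozygosity(c("3:3", "3:3", "3:3"))
# [1] 0

# Per-stratum values, analogous to the long-format table above.
toy <- data.frame(Species = c("A", "A", "B", "B"),
                  L1      = c("1:1", "1:2", "1:2", "3:4"))
tapply(toy$L1, toy$Species, expected_heterozygosity)   # A = 0.375, B = 0.75
```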

Similarly, genetic_structure() estimates differentiation among strata, here as Gst for each locus and across all loci:

genetic_structure(arapat, stratum = "Species", mode = "Gst")
       Locus       Gst        Hs        Ht         P
1       LTRS 0.3536942 0.3229669 0.4997122 0.3536942
2        WNT 0.6028903 0.2595040 0.6534819 0.6028903
3         EN 0.1823664 0.3678145 0.4498525 0.1823664
4         EF 0.3653913 0.2772276 0.4368481 0.3653913
5        ZMP 0.4961044 0.1712607 0.3398734 0.4961044
6        AML 0.2723689 0.6045761 0.8308826 0.2723689
7       ATPS 0.5856500 0.2983785 0.7201122 0.5856500
8       MP20 0.2358115 0.6267430 0.8201419 0.2358115
9 Multilocus 0.3835972 2.9284713 4.7509048        NA
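The Gst column can be checked by hand. Assuming Nei's definition Gst = (Ht - Hs) / Ht, the sketch below reproduces the per-locus values from the Hs and Ht columns printed above. It also shows that the Multilocus Hs and Ht match the column sums, and that applying the same ratio to those sums gives the Multilocus Gst. The helper name `gst()` is hypothetical.

```r
# Nei's Gst from within-stratum (Hs) and total (Ht) expected heterozygosity.
gst <- function(hs, ht) (ht - hs) / ht

# Hs and Ht copied from the genetic_structure() output above.
hs <- c(LTRS = 0.3229669, WNT = 0.2595040, EN = 0.3678145, EF = 0.2772276,
        ZMP = 0.1712607, AML = 0.6045761, ATPS = 0.2983785, MP20 = 0.6267430)
ht <- c(LTRS = 0.4997122, WNT = 0.6534819, EN = 0.4498525, EF = 0.4368481,
        ZMP = 0.3398734, AML = 0.8308826, ATPS = 0.7201122, MP20 = 0.8201419)

per_locus <- gst(hs, ht)
round(per_locus, 4)   # LTRS 0.3537, WNT 0.6029, ..., MP20 0.2358

# The printed Multilocus Hs (2.9284713) and Ht (4.7509048) equal the column sums.
c(sum_hs = sum(hs), sum_ht = sum(ht))
multilocus <- gst(sum(hs), sum(ht))
round(multilocus, 4)  # 0.3836
```

Small differences in the last printed digit are expected, since the inputs are themselves rounded to seven decimals.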

The following chapters cover each of these operations in detail.