2 The Locus Class

The locus is the fundamental data type in gstudio. It is an S3 class that stores a genotype as a colon-separated string of alleles. This chapter covers creating, manipulating, and converting locus objects.

library(gstudio)

2.1 Creating Locus Objects

The locus() constructor takes a vector of alleles and collapses them into a single genotype:

loc <- locus(c("A", "B"))
loc
[1] "A:B"

Homozygotes and higher-ploidy genotypes work the same way:

locus(c("A", "A"))
[1] "A:A"
locus(c("A", "B", "C", "D"))
[1] "A:B:C:D"

2.1.1 Missing Data

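Missing data is stored internally as an empty string. Calling locus() with no arguments creates a missing genotype, which prints as NA and is detected by is.na():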

loc_missing <- locus()
loc_missing
[1] NA
is.na(loc_missing)
[1] TRUE
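
The idea of flagging empty strings as missing can be sketched in base R without gstudio. This is only an illustration under the assumption stated above (an empty string marks a missing genotype); the vector and variable names are hypothetical and this is not gstudio's own code.

```r
# Assumption: "" marks a missing genotype, as described in the text above.
genotypes <- c("A:B", "", "B:B")

# nzchar() is FALSE for empty strings, so its negation flags missing entries.
is_missing <- !nzchar(genotypes)
is_missing              # FALSE  TRUE FALSE

# Keep only the observed genotypes.
observed <- genotypes[!is_missing]
observed                # "A:B" "B:B"
```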

2.2 Marker Types

The type parameter controls how raw data is interpreted:

2.2.1 Codominant (default)

Alleles are sorted alphabetically and joined with a colon (:):

locus(c("B", "A"), type = "codom")
[1] "A:B"
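
To make the sort-then-join behaviour concrete, here is a minimal base-R sketch. The helper make_genotype() is hypothetical and is not part of gstudio; it only mimics the behaviour described above.

```r
# Hypothetical helper (not a gstudio function): sort the alleles,
# then collapse them into one colon-separated string.
make_genotype <- function(alleles) {
  paste(sort(alleles), collapse = ":")
}

make_genotype(c("B", "A"))            # "A:B"
make_genotype(c("D", "B", "C", "A"))  # "A:B:C:D"
```

Because the alleles are sorted first, the order in which they are supplied does not matter: c("A", "B") and c("B", "A") give the same genotype string.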

2.2.2 AFLP

Binary presence/absence data (0 or 1):

locus("1", type = "aflp")
[1] "1"
locus("0", type = "aflp")
[1] "0"

2.2.3 SNP

Encoded as 0, 1, or 2 (count of minor alleles):

locus("0", type = "snp")
[1] "A:A"
locus("1", type = "snp")
[1] "A:B"
locus("2", type = "snp")
[1] "B:B"
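
The mapping shown above amounts to a three-entry lookup table. A rough base-R equivalent follows; snp_codes and decode_snp() are hypothetical names used only for illustration, and the allele labels "A" and "B" are copied from the printed output rather than from gstudio's source.

```r
# Hypothetical lookup table reproducing the mapping printed above.
snp_codes <- c("0" = "A:A", "1" = "A:B", "2" = "B:B")

# Look up each code by name; codes outside 0/1/2 come back as NA.
decode_snp <- function(x) {
  unname(snp_codes[as.character(x)])
}

decode_snp(c(0, 1, 2, 1))  # "A:A" "A:B" "B:B" "A:B"
```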

2.2.4 Separated

Pre-formatted colon-separated strings:

locus("A:B", type = "separated")
[1] "A:B"

2.2.5 Zyme (allozyme)

Alleles encoded as concatenated integers (e.g., “12” = alleles 1 and 2):

locus("12", type = "zyme")
[1] "1:2"
locus("23", type = "zyme")
[1] "2:3"
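
The examples above can be imitated in base R by splitting each string into single characters. The sketch below assumes every allele is exactly one character wide, which holds for "12" and "23" but is an assumption about the encoding in general. split_zyme() is a hypothetical helper, not a gstudio function, and it does not sort the alleles.

```r
# Hypothetical helper: split each string into single characters
# and rejoin them with a colon. Assumes one character per allele.
split_zyme <- function(x) {
  vapply(strsplit(x, split = ""), paste, character(1), collapse = ":")
}

split_zyme("12")            # "1:2"
split_zyme(c("12", "23"))   # "1:2" "2:3"
```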

2.2.6 Column

Two-column matrix format (used internally by read_population()):

alleles_mat <- cbind(c("A", "B", "C"), c("B", "A", "C"))
locus(alleles_mat, type = "column")
[1] "A:B" "A:B" "C:C"
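
Reading the matrix row by row, each row holds the alleles of one individual. A base-R sketch that produces the same strings as the output above is shown here; it is an illustration of the idea, not the code gstudio runs.

```r
alleles_mat <- cbind(c("A", "B", "C"), c("B", "A", "C"))

# For each row (one individual), sort the two alleles and join them.
row_genotypes <- apply(alleles_mat, 1, function(row) {
  paste(sort(row), collapse = ":")
})

row_genotypes  # "A:B" "A:B" "C:C"
```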

2.3 Working with Vectors of Loci

Genotype vectors are created using c():

AA <- locus(c("A", "A"))
AB <- locus(c("A", "B"))
BB <- locus(c("B", "B"))
loci <- c(AA, AB, AB, AA, BB)
loci
[1] "A:A" "A:B" "A:B" "A:A" "B:B"

2.3.1 Indexing

loci[2]
[1] "A:B"
loci[c(1, 3, 5)]
[1] "A:A" "A:B" "B:B"

2.3.2 Replication

rep(AB, times = 3)
[1] "A:B" "A:B" "A:B"

2.4 Extracting Alleles

The alleles() function returns the component alleles:

alleles(AB)
[1] "A" "B"
alleles(loci)
     [,1] [,2]
[1,] "A"  "A" 
[2,] "A"  "B" 
[3,] "A"  "B" 
[4,] "A"  "A" 
[5,] "B"  "B" 
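
Since a genotype is a colon-separated string, extracting alleles is essentially a string split. The base-R sketch below reproduces the matrix above from plain character strings. It assumes every genotype has the same ploidy, and the variable names are hypothetical.

```r
genotypes <- c("A:A", "A:B", "A:B", "A:A", "B:B")

# Split each genotype on the literal colon, giving one character vector each.
allele_list <- strsplit(genotypes, split = ":", fixed = TRUE)

# Stack the per-individual vectors into a matrix: one row per genotype.
allele_mat <- do.call(rbind, allele_list)
allele_mat
dim(allele_mat)  # 5 2
```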

2.5 Heterozygosity Testing

is_heterozygote(AA)
[1] FALSE
is_heterozygote(AB)
[1] TRUE
is_heterozygote(loci)
[1] FALSE  TRUE  TRUE FALSE FALSE
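
Conceptually, a genotype is heterozygous when it contains more than one distinct allele. A minimal base-R version of that test is sketched here; is_het() is a hypothetical stand-in written for illustration and makes no claim about how is_heterozygote() is implemented.

```r
# Hypothetical helper: TRUE when a genotype has more than one distinct allele.
is_het <- function(genotypes) {
  vapply(
    strsplit(genotypes, split = ":", fixed = TRUE),
    function(a) length(unique(a)) > 1,
    logical(1)
  )
}

is_het(c("A:A", "A:B", "A:B", "A:A", "B:B"))
# FALSE  TRUE  TRUE FALSE FALSE
```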

2.6 Operator Overloads

2.6.1 Mating (+)

The + operator simulates mating by randomly sampling one allele from each parent:

dad <- locus(c("A", "A"))
mom <- locus(c("B", "B"))
set.seed(42)
offspring <- mom + dad
offspring
[1] "A:B"
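
The sampling step can be sketched in base R for diploid parents given as plain strings. The function mate() below is hypothetical and only illustrates "one random allele from each parent"; it is not the method behind the + operator.

```r
# Hypothetical helper: draw one allele at random from each parent,
# then sort and join the two gametes into an offspring genotype.
mate <- function(mom, dad) {
  mom_alleles <- strsplit(mom, split = ":", fixed = TRUE)[[1]]
  dad_alleles <- strsplit(dad, split = ":", fixed = TRUE)[[1]]
  gametes <- c(sample(mom_alleles, 1), sample(dad_alleles, 1))
  paste(sort(gametes), collapse = ":")
}

set.seed(42)
mate("B:B", "A:A")  # "A:B" (the only possible outcome for these parents)

# Two heterozygous parents give a mix of A:A, A:B and B:B offspring.
crosses <- replicate(1000, mate("A:B", "A:B"))
table(crosses)
```

With two homozygous parents the result is deterministic, so the seed only matters once at least one parent is heterozygous.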

2.6.2 Parental Subtraction (-)

The - operator removes the maternal contribution from an offspring genotype:

off <- locus(c("A", "B"))
mom <- locus(c("A", "A"))
paternal_gamete <- off - mom
paternal_gamete
[1] "B"

This is useful in parentage analysis for identifying the paternal allelic contribution.
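
The logic of the subtraction can be sketched in base R as "remove one copy of the allele the mother must have contributed". The helper paternal_allele() is hypothetical. Returning NA when the maternal allele cannot be identified uniquely is a choice made for this sketch, not documented gstudio behaviour.

```r
# Hypothetical helper: remove one copy of the maternally derived allele
# from a diploid offspring genotype and return what is left.
paternal_allele <- function(off, mom) {
  off_alleles <- strsplit(off, split = ":", fixed = TRUE)[[1]]
  mom_alleles <- strsplit(mom, split = ":", fixed = TRUE)[[1]]

  # Distinct alleles the offspring shares with the mother.
  maternal <- intersect(off_alleles, mom_alleles)

  # No shared allele, or more than one candidate: not resolvable here.
  if (length(maternal) != 1) return(NA_character_)

  # Drop the first copy of the maternal allele; the rest is paternal.
  remaining <- off_alleles[-match(maternal, off_alleles)]
  paste(remaining, collapse = ":")
}

paternal_allele("A:B", "A:A")  # "B"
paternal_allele("A:A", "A:A")  # "A"
paternal_allele("A:B", "A:B")  # NA: either allele could be maternal
```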

2.7 Coercion

2.7.1 To Data Frame

as.data.frame(loci)
  loci
1  A:A
2  A:B
3  A:B
4  A:A
5  B:B

2.7.2 From Other Types

as.locus(c("A", "B"))
[1] "A:B"
as.locus(list("C", "D"))
[1] "C:D"

2.7.3 Is-a Test

is.locus(AB)
[1] TRUE
is.locus("not a locus")
[1] FALSE

2.8 Multivariate Conversion

The to_mv() function converts locus data to a multivariate numeric format suitable for ordination and graph analysis:

data(arapat)
mv <- to_mv(arapat)
dim(mv)
[1] 363  58
mv[1:5, 1:6]
     01 02  01 02  03 04
[1,]  1  0 0.0  0 0.0  0
[2,]  1  0 0.5  0 0.5  0
[3,]  1  0 0.5  0 0.5  0
[4,]  1  0 1.0  0 0.0  0
[5,]  1  0 0.5  0 0.5  0

Each allele at each locus becomes a column, and each value is the frequency of that allele in the individual's genotype (0, 0.5, or 1 for diploids, as in the output above). This is the input format for popgraph().
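
The printed values (0, 0.5, and 1) are consistent with each row holding one individual's allele frequencies at a locus. A single-locus base-R sketch under that assumption is given below. The variable names are hypothetical and this is not the to_mv() implementation.

```r
genotypes <- c("A:A", "A:B", "B:B")

# Split genotypes into alleles and list every allele seen at this locus.
allele_list <- strsplit(genotypes, split = ":", fixed = TRUE)
all_alleles <- sort(unique(unlist(allele_list)))

# For each individual, count copies of each allele and divide by ploidy.
freq_rows <- lapply(allele_list, function(a) {
  as.vector(table(factor(a, levels = all_alleles))) / length(a)
})

# One row per individual, one column per allele.
freq_mat <- do.call(rbind, freq_rows)
colnames(freq_mat) <- all_alleles
freq_mat
```

Each row sums to 1, and a heterozygote contributes 0.5 to each of its two alleles. A multilocus matrix would place such blocks side by side, one block per locus.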