library(gstudio)
2 The Locus Class
The locus is the fundamental data type in gstudio. It is an S3 class that stores a genotype as a colon-separated string of alleles. This chapter covers creating, manipulating, and converting locus objects.
2.1 Creating Locus Objects
The locus() constructor takes a vector of alleles and collapses them into a single genotype:
loc <- locus(c("A", "B"))
loc
[1] "A:B"
Homozygotes and higher-ploidy genotypes work the same way:
locus(c("A", "A"))
[1] "A:A"
locus(c("A", "B", "C", "D"))
[1] "A:B:C:D"
2.1.1 Missing Data
Missing data are stored internally as an empty string and are reported as NA:
loc_missing <- locus()
loc_missing
[1] NA
is.na(loc_missing)
[1] TRUE
2.2 Marker Types
The type parameter controls how raw data is interpreted:
2.2.1 Codominant (default)
Alleles are sorted alphabetically and joined with a colon (:):
locus(c("B", "A"), type = "codom")
[1] "A:B"
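The sort-then-join behaviour described above can be sketched in base R. This is only an illustration of the idea, not gstudio's implementation; `collapse_alleles()` is a hypothetical helper name introduced here.

```r
# Hypothetical helper (not part of gstudio): sort the alleles,
# then join them with ":" to form a genotype string.
collapse_alleles <- function(alleles) {
  paste(sort(alleles), collapse = ":")
}

collapse_alleles(c("B", "A"))            # "A:B"
collapse_alleles(c("A", "B", "C", "D"))  # "A:B:C:D"
```

Sorting first means that c("A", "B") and c("B", "A") yield the same string, so the two orderings compare as equal genotypes.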
2.2.2 AFLP
Binary presence/absence data (0 or 1):
locus("1", type = "aflp")
[1] "1"
locus("0", type = "aflp")
[1] "0"
2.2.3 SNP
Encoded as 0, 1, or 2 (count of minor alleles):
locus("0", type = "snp")
[1] "A:A"
locus("1", type = "snp")
[1] "A:B"
locus("2", type = "snp")
[1] "B:B"
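The mapping from minor-allele counts to genotype strings shown above amounts to a three-entry lookup table. A minimal base R sketch, assuming B denotes the minor allele as the example output suggests; `snp_to_genotype()` is a hypothetical name and not a gstudio function:

```r
# Hypothetical helper (not part of gstudio): map a minor-allele count
# (0, 1, or 2) to the genotype string shown in the examples above.
snp_to_genotype <- function(code) {
  lookup <- c("0" = "A:A", "1" = "A:B", "2" = "B:B")
  # Codes outside 0/1/2 have no entry and come back as NA.
  unname(lookup[as.character(code)])
}

snp_to_genotype(c(0, 1, 2))  # "A:A" "A:B" "B:B"
```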
2.2.4 Separated
Pre-formatted colon-separated strings:
locus("A:B", type = "separated")
[1] "A:B"
2.2.5 Zyme (allozyme)
Alleles encoded as concatenated integers (e.g., “12” = alleles 1 and 2):
locus("12", type = "zyme")
[1] "1:2"
locus("23", type = "zyme")
[1] "2:3"
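The zyme encoding can be read as "one character per allele". The base R sketch below illustrates that reading; it is an assumption drawn from the examples above rather than gstudio's code, and `split_zyme()` is a hypothetical name.

```r
# Hypothetical helper (not part of gstudio): treat each character of the
# input as one allele and join the alleles with ":".
split_zyme <- function(x) {
  paste(strsplit(x, "")[[1]], collapse = ":")
}

split_zyme("12")  # "1:2"
split_zyme("23")  # "2:3"
```

Under this reading, each allele must be a single character; an allele numbered 10 or higher could not be told apart from two separate alleles.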
2.2.6 Column
Two-column matrix format (used internally by read_population()):
alleles_mat <- cbind(c("A", "B", "C"), c("B", "A", "C"))
locus(alleles_mat, type = "column")
[1] "A:B" "A:B" "C:C"
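Conceptually, the column format builds one genotype per matrix row. A base R sketch of that row-wise collapse, offered as an illustration rather than as what gstudio does internally:

```r
# One row per individual, one column per allele copy.
alleles_mat <- cbind(c("A", "B", "C"), c("B", "A", "C"))

# Sort the alleles within each row and join them with ":".
genotypes <- apply(alleles_mat, 1, function(row) paste(sort(row), collapse = ":"))
genotypes  # "A:B" "A:B" "C:C"
```

Rows 1 and 2 hold the same alleles in opposite order and collapse to the same genotype, matching the output above.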
2.3 Working with Vectors of Loci
Genotype vectors are created using c():
AA <- locus(c("A", "A"))
AB <- locus(c("A", "B"))
BB <- locus(c("B", "B"))
loci <- c(AA, AB, AB, AA, BB)
loci
[1] "A:A" "A:B" "A:B" "A:A" "B:B"
2.3.1 Indexing
loci[2]
[1] "A:B"
loci[c(1, 3, 5)]
[1] "A:A" "A:B" "B:B"
2.3.2 Replication
rep(AB, times = 3)
[1] "A:B" "A:B" "A:B"
2.4 Extracting Alleles
The alleles() function returns the component alleles:
alleles(AB)
[1] "A" "B"
alleles(loci)
     [,1] [,2]
[1,] "A"  "A"
[2,] "A"  "B"
[3,] "A"  "B"
[4,] "A"  "A"
[5,] "B"  "B"
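Because a genotype is stored as a colon-separated string, extracting alleles is essentially a string split. The base R sketch below reproduces the matrix layout shown above for plain character vectors. `split_alleles()` is a hypothetical helper, and it assumes every genotype has the same ploidy.

```r
# Hypothetical helper (not part of gstudio): split genotype strings on ":"
# and stack the pieces into a matrix, one row per genotype.
split_alleles <- function(genotypes) {
  do.call(rbind, strsplit(genotypes, ":", fixed = TRUE))
}

allele_mat <- split_alleles(c("A:A", "A:B", "A:B", "A:A", "B:B"))
allele_mat
```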
2.5 Heterozygosity Testing
is_heterozygote(AA)
[1] FALSE
is_heterozygote(AB)
[1] TRUE
is_heterozygote(loci)
[1] FALSE  TRUE  TRUE FALSE FALSE
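A genotype is heterozygous when it contains more than one distinct allele. The test can be sketched in base R on plain genotype strings; `is_het()` is a hypothetical stand-in used only for illustration, not gstudio's `is_heterozygote()`.

```r
# Hypothetical helper (not part of gstudio): TRUE when a genotype string
# contains more than one distinct allele.
is_het <- function(genotypes) {
  vapply(
    strsplit(genotypes, ":", fixed = TRUE),
    function(a) length(unique(a)) > 1,
    logical(1)
  )
}

is_het(c("A:A", "A:B", "A:B", "A:A", "B:B"))  # FALSE TRUE TRUE FALSE FALSE
```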
2.6 Operator Overloads
2.6.1 Mating (+)
The + operator simulates mating by randomly sampling one allele from each parent:
dad <- locus(c("A", "A"))
mom <- locus(c("B", "B"))
set.seed(42)
offspring <- mom + dad
offspring
[1] "A:B"
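The sampling step behind mating can be sketched in base R: draw one allele at random from each parent and combine the two gametes. `mate()` is a hypothetical function operating on plain strings. It illustrates the idea for diploid parents and is not gstudio's `+` method.

```r
# Hypothetical helper (not part of gstudio): sample one allele from each
# diploid parent and combine the two gametes into an offspring genotype.
mate <- function(mom, dad) {
  mom_alleles <- strsplit(mom, ":", fixed = TRUE)[[1]]
  dad_alleles <- strsplit(dad, ":", fixed = TRUE)[[1]]
  gametes <- c(sample(mom_alleles, 1), sample(dad_alleles, 1))
  paste(sort(gametes), collapse = ":")
}

set.seed(42)
mate("B:B", "A:A")  # always "A:B": each parent can contribute only one allele type
mate("A:B", "A:B")  # one of "A:A", "A:B", or "B:B", depending on the draw
```

With two homozygous parents the result is deterministic, which is why the example above yields "A:B" whatever the seed.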
2.6.2 Parental Subtraction (-)
The - operator removes the maternal contribution from an offspring genotype:
off <- locus(c("A", "B"))
mom <- locus(c("A", "A"))
paternal_gamete <- off - mom
paternal_gamete
[1] "B"
This is useful in parentage analysis for identifying the paternal allelic contribution.
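The logic of parental subtraction can be sketched in base R for diploid genotypes: remove one copy of the allele the mother must have contributed and keep what remains. `subtract_mother()` is a hypothetical helper. How it treats the ambiguous and incompatible cases is an assumption made for this sketch, and gstudio's `-` operator may behave differently there.

```r
# Hypothetical helper (not part of gstudio): infer the paternal allele of a
# diploid offspring by removing the maternal contribution.
subtract_mother <- function(off, mom) {
  off_alleles <- strsplit(off, ":", fixed = TRUE)[[1]]
  mom_alleles <- strsplit(mom, ":", fixed = TRUE)[[1]]
  shared <- intersect(off_alleles, mom_alleles)

  # No shared allele: this mother cannot have produced this offspring.
  if (length(shared) == 0) return(NA_character_)
  # Both offspring alleles could be maternal: leave the genotype unresolved.
  if (length(shared) > 1) return(off_alleles)
  # Exactly one candidate maternal allele: drop one copy of it.
  off_alleles[-match(shared, off_alleles)]
}

subtract_mother("A:B", "A:A")  # "B"
subtract_mother("A:B", "A:B")  # "A" "B" (ambiguous, returned unchanged)
```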
2.7 Coercion
2.7.1 To Data Frame
as.data.frame(loci)
  loci
1  A:A
2  A:B
3  A:B
4  A:A
5  B:B
2.7.2 From Other Types
as.locus(c("A", "B"))
[1] "A:B"
as.locus(list("C", "D"))
[1] "C:D"
2.7.3 Is-a Test
is.locus(AB)
[1] TRUE
is.locus("not a locus")
[1] FALSE
2.8 Multivariate Conversion
The to_mv() function converts locus data to a multivariate numeric format suitable for ordination and graph analysis:
data(arapat)
mv <- to_mv(arapat)
dim(mv)
[1] 363  58
mv[1:5, 1:6]
     01 02  01 02  03 04
[1,]  1  0 0.0  0 0.0  0
[2,]  1  0 0.5  0 0.5  0
[3,]  1  0 0.5  0 0.5  0
[4,]  1  0 1.0  0 0.0  0
[5,]  1  0 0.5  0 0.5  0
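Each allele at each locus becomes a column. As the output above shows, each value is the proportion of the individual's allele copies at that locus that are of that allele (0, 0.5, or 1 for diploids), not a raw count. This is the input format for popgraph().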
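The encoding can be illustrated for a single locus in base R. The example output contains values such as 0.5, so this sketch scores each allele as its share of the individual's allele copies. `genotypes_to_freq()` is a hypothetical helper, not the to_mv() implementation.

```r
# Hypothetical helper (not part of gstudio): one column per allele, one row
# per individual, values = share of that individual's allele copies.
genotypes_to_freq <- function(genotypes) {
  allele_list <- strsplit(genotypes, ":", fixed = TRUE)
  allele_names <- sort(unique(unlist(allele_list)))
  freq <- do.call(rbind, lapply(allele_list, function(a) {
    as.numeric(table(factor(a, levels = allele_names))) / length(a)
  }))
  colnames(freq) <- allele_names
  freq
}

freq <- genotypes_to_freq(c("A:A", "A:B", "B:B"))
freq
```

Each row sums to 1, with heterozygotes scoring 0.5 in two columns. A multilocus matrix would place such blocks side by side, one block per locus.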