13 Statistical Inferences

All models are wrong, but some are useful. (George E.P. Box)

So far we have been able to take our data, visualize it, and manipulate it. Now it is time to apply a model to the data so we can attach some degree of confidence to the statements we are going to make.

13.1 An Applied Example

Consider the case where we are trying to determine the relative frequency of blue catfish (Ictalurus furcatus) in the James River.

The blue catfish (Ictalurus furcatus), an invasive species in the James River watershed. Photo by Matt Rath (CC BY-NC 2.0).

This species is not native to this region; it is endemic to midwestern drainages in the US such as the Mississippi, Missouri, and Ohio rivers. In the 1970s, individuals were introduced into both the James and Rappahannock rivers, and in 1985 stock was released in the Mattaponi River, which eventually populated the Pamunkey. In the interim, this species has had a tremendous impact on native fishes. Before we can come up with management plans and/or recommendations, we must establish some baseline census data on the frequency of this fish in the basin.

As a simple example, consider the case where you went out and caught 50 fish, 37 of which were blue cats.

From this, we can estimate the relative frequency of the invasive species as:

\[ p_{catfish} = \frac{N_{catfish}}{N_{catfish} + N_{other}} = \frac{37}{50} = 0.74 \]

So, empirically, we found 74% of the captures were the invasive species and 26% were other species.
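
We can let R do that arithmetic for us:

N_catfish <- 37                        # blue catfish captured
N_other   <- 13                        # all other species captured
N_catfish / ( N_catfish + N_other )    # relative frequency of the invasive species
## [1] 0.74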

Our underlying biological question here focuses on how common this invasive species is. In the data above, we had a single sample and found the relative frequency to be \(p=0.74\), though this is just one sample. Is it really 74%? How can we get any statistical confidence in this finding? Let's take a short detour into how we develop statistical inferences, using the binomial probability function as an example, and then show that there are two different schools of thought on how to analyze these data.

13.2 The Binomial

If you were to flip a coin, it is either HEADS or TAILS, and if it is a fair coin, then the frequency at which you find a HEADS can be determined by flipping it a large number of times and counting how many come up HEADS. Actually, this probability is drawn from a well-known distribution, the binomial distribution, which is constructed as follows.

Let the probability of getting HEADS be denoted as \(P(HEADS) = p\) and the probability of not getting heads (or TAILS, if you assume it never lands perfectly balanced on its side, which would be silly) as \(1 - P(HEADS) = P(TAILS) = 1 - p = q\). Since either HEADS or TAILS must happen, \(p + q = 1\). Depending upon the number of times you flip the coin, you can come up with several different arrangements of HEADS (H for brevity) and TAILS (T similarly).

Here are some scenarios:
- One flip: H or T, occurring at expected rates of \(p\) and \(q\), respectively.
- Two flips: HH (rate is expected at \(p^2\)) or HT (\(pq\)) or TH (\(qp\)) or TT (\(q^2\)).
- Three flips: HHH (which can be written as \(p^3q^0\) since \(q^0=1\)) or HHT (\(p^2q^1\)) or HTH (\(pqp = p^2q^1\)) or THH (\(q^1p^2\)) or HTT (\(p^1q^2\)) or THT (\(pq^2\)) or TTH (\(pq^2\)) or TTT (\(p^0q^3\)).
- etc.

As you can see, there are some patterns emerging for any set of \(N\) coin flips.
1. The probability of getting \(K\) H is \(p^K\) and getting \(N-K\) T is \(q^{N-K}\). So if we are interested in the simple question of 2 H and 1 T results, we have \(p^2q^1\).
2. However, if you look at the iteration through the three flip case, there are many different ways you can get 2 H and 1 T. In fact, there are \(\frac{N!}{K!(N-K)!}\) different ways.1

Putting these two pieces together, we can define the binomial probability distribution as:

\[ P(K|N,p) = \frac{N!}{K!(N-K)!}p^K(1-p)^{N-K} \]
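
As a quick sanity check, we can evaluate this formula by hand for a small case and compare it to R's built-in binomial density, dbinom(). For 2 heads in 3 flips of a fair coin:

# by hand: 3 arrangements, each occurring with probability 0.5^2 * 0.5^1
factorial(3) / ( factorial(2) * factorial(1) ) * 0.5^2 * 0.5^1
## [1] 0.375
# the same value from R's built-in binomial density
dbinom( 2, size = 3, prob = 0.5 )
## [1] 0.375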

In the coin example we assumed \(p=q\), but that does not necessarily have to be true. We could, for example, set \(p = 0.74\) as in our fish example. It may be the case that the true frequency of blue catfish is \(p\) as specified. However, it is also possible that you could take \(N\) samples and get 0 catfish, in the same way that it is possible to get \(N\) H flips, right? Let's look at the distribution for a sampling haul that caught 10 fishes (or fish if on the extreme ends). First, set up some parameters,

p <- 0.74             # frequency of blue catfish
q <- 1 - p            # frequency of everything else
N <- 10               # number of fish in the haul
Catfish <- seq(0, N)  # possible numbers of catfish in a haul

Then estimate the number of ways to get \(0\), \(1\), \(\ldots\) \(N\) catfish.

ways_to_get <- choose(N,Catfish)
ways_to_get
##  [1]   1  10  45 120 210 252 210 120  45  10   1

This shows that while there is only one way to have all catfish (the last entry) or no catfish at all (the first), there are 252 different ways to have 5 catfish! We then have to weight these potential outcomes by the probability of observing \(0\), \(1\), \(\ldots\) \(N\) catfish given the frequency at which they occur in the river.

prob_of_each <- p^Catfish * q^(N-Catfish)
prob_of_each
##  [1] 1.411671e-06 4.017833e-06 1.143537e-05 3.254682e-05 9.263326e-05
##  [6] 2.636485e-04 7.503843e-04 2.135709e-03 6.078556e-03 1.730051e-02
## [11] 4.923990e-02

And then overall, we put these together to get the total probabilities.

data <- data.frame( Catfish, Frequency = ways_to_get * prob_of_each )
ggplot( data, aes(Catfish,Frequency)) + geom_col() + scale_x_continuous(breaks=Catfish)
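
As an aside, the frequencies we just assembled by hand are exactly what R's built-in binomial density, dbinom(), returns, so in practice we can skip the bookkeeping:

all.equal( data$Frequency, dbinom( Catfish, size = N, prob = p ) )
## [1] TRUE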

So while the true underlying fraction of catfish is 74%, it is possible to get \(0\), \(1\), \(\ldots\) \(N\) catfish in a sample. So how do we gain meaningful inferences? Here are two of the most common ways.

13.3 A Frequentist Approach

This is by and large the most common approach to extracting statistical inferences from data. Let's assume that the true underlying frequency in the James River really is \(p=0.74\). A frequentist, accepting the knowledge in the graph above, would reason that getting a good understanding of the real frequency of catfish in the James requires us to go out and sample a large number of times. After all, any one sampling session could yield a varying number of catfish even if we are drawing from a fixed population. However, if we sampled many times, on average, we could get a good approximation of what the true underlying frequency really is.

For simplicity, let's say we go out and capture 50 fish at each of 10 different locations. The samples we collected had the following numbers of catfish.

df <- data.frame( Site = 1:10, Catfish=c(37,32,35,39,31,38,38,35,41,38) )
df
##    Site Catfish
## 1     1      37
## 2     2      32
## 3     3      35
## 4     4      39
## 5     5      31
## 6     6      38
## 7     7      38
## 8     8      35
## 9     9      41
## 10   10      38

This results in an average frequency of \(\bar{p}=\) 0.728. Pretty close to the target of 74%, right? Theoretically, if we went out and sampled more times, the average would tend towards the true frequency (this is the Law of Large Numbers at work). Understanding how repeated sampling tends to reveal underlying parameters (such as a mean value) is the basis of the frequentist approach.
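
We can see this tendency directly with a quick simulation; the number of sessions and the seed below are arbitrary choices for illustration. Each simulated session is a 50-fish haul drawn from a river where the true frequency is 0.74:

set.seed(42)                                         # arbitrary seed, purely for reproducibility
sessions <- rbinom( 1000, size = 50, prob = 0.74 )   # catfish counts from 1000 hypothetical 50-fish hauls
mean( sessions / 50 )                                # average observed frequency; lands very close to 0.74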

With these ten field samples, we could easily test the NULL hypothesis, \(H_O: p = 0.74\), using a univariate Student's \(t-\)test. This test was developed in 1908 by William Sealy Gosset, a chemist working for the Guinness brewery in Dublin, Ireland (‘Student’ was apparently his pen name). While we will come back to this specific test later and get into more details about its use, for our purposes we can use it directly in R. Most simple statistical tests in R are implemented as a function with the name of the test and the suffix .test() attached to it. The function returns a result as an object that has our answer in it.

model <- t.test(df$Catfish, mu = 37)
model
## 
##  One Sample t-test
## 
## data:  df$Catfish
## t = -0.60541, df = 9, p-value = 0.5599
## alternative hypothesis: true mean is not equal to 37
## 95 percent confidence interval:
##  34.15804 38.64196
## sample estimates:
## mean of x 
##      36.4

Here we see the average number of catfish sampled across all runs was 36.4. The \(t-\)statistic, as we will see, has a very specific distribution and, given that we have 9 degrees of freedom (more on what this means later), the probability associated with this set of observations is \(P =\) 0.559864. If we think of the distribution of all the times we could go out and sample 50 fish from the James River when the true underlying frequency was actually 74%, then the collection of samples we had would be roughly in the \(56^{th}\) percentile, right in the middle. As such, I fail to reject the null hypothesis (\(H_O: p = 0.74\)) for these data2. It may actually be 74%.
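
Everything quoted in that paragraph lives inside the returned object, which is just a named list, so the individual pieces can be pulled out directly rather than read off the printout:

model$statistic    # the t-statistic reported above
model$p.value      # probability of these data under the null hypothesis
model$conf.int     # the 95% confidence interval for the mean
model$estimate     # the sample mean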

However, consider the following code. Here we examine the sequence of potential values for the relative frequency, ranging from no catfish to only catfish. After all, I kinda just made up the 74% number because that was what we happened to find in the first sampling session. As each sampling session is independent and many found differing (though close) numbers of catfish, who is to say that 74% is actually the true value?

Let's look at how the data observed the first time can be examined across all potential values for the true mean. In the figure below, I perform the \(t-\)test using our original data and the hypothesis that the true mean is 0, 1, 2, \(\ldots\), and 50 catfish. For each, I record the probability associated with testing the null hypothesis.

potential <- seq(0, 50)   # candidate values for the true number of catfish per 50 fish
# test our data against each candidate mean and keep the p-value
other_p.vals <- unlist( lapply( potential, FUN = function(x) { t.test(df$Catfish, mu = x)$p.value } ) )
data <- data.frame( potential, other_p.vals )
data$Reject <- TRUE                                 # flag which candidates we would reject
data$Reject[ data$other_p.vals >= 0.05 ] <- FALSE   # ... at the alpha = 0.05 level
data$Reject <- factor( data$Reject, ordered = TRUE, levels = c(TRUE, FALSE) )
ggplot(data,aes(x=potential,y=other_p.vals)) + 
  geom_abline(slope=0,intercept = 0.05, color="red") + 
  geom_line() + geom_point(aes(shape=Reject)) +  
  xlab("True Mean") + ylab("Probability") +
  scale_shape_discrete("Reject Ho")

We see that 35, 36, 37, and 38 are all values that could not be rejected using a \(t-\)test. This means that frequencies of \(\frac{35}{50} = 0.7\), \(\frac{36}{50} = 0.72\), \(\frac{37}{50} = 0.74\), and \(\frac{38}{50} = 0.76\) are all potential frequencies for which the observations we made could not be rejected! So how do we know?
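
Those values can be pulled straight from the data frame built above, and they are exactly the whole numbers that fall inside the 95% confidence interval t.test() reported earlier:

data$potential[ data$other_p.vals >= 0.05 ]   # candidate means we fail to reject
## [1] 35 36 37 38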

One way to get a better idea is to go sample more. “Always increase your sample size,” says the advisor to the student. If we captured all the fish in the James River, we would know exactly their relative frequencies–just by counting. However, we cannot sample them all (though Dr. Garman would really like to have all the catfish out of the James River).

The main idea here is that, if we were to sample enough, our estimate would asymptotically converge on the true value of the underlying parameter.
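
A quick simulation sketches what that convergence looks like; the 5,000-fish total and the seed below are arbitrary choices, not recommendations. The running estimate of the catfish frequency wanders early on but settles onto the red line at the true value of 0.74 as more fish are examined:

set.seed(1234)                                  # arbitrary seed, for reproducibility
fish <- rbinom( 5000, size = 1, prob = 0.74 )   # 1 = blue catfish, 0 = any other species
running <- data.frame( Examined = seq_along( fish ),
                       Estimate = cumsum( fish ) / seq_along( fish ) )
ggplot( running, aes( Examined, Estimate ) ) + 
  geom_hline( yintercept = 0.74, color = "red" ) + 
  geom_line() + 
  xlab( "Total Fish Examined" ) + ylab( "Estimated Catfish Frequency" )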

13.4 A Bayesian Approach

A Bayesian approach uses probability to represent uncertainty in all parts of a statistical model. Adopting a Bayesian approach requires three things:

  1. Data consisting of several observations.
  2. A generative model underlying the creation of the data.
  3. Prior probabilities for the model parameters.

In essence, we want to start with some data and try to figure out what the parameters are given some specified model.

At the core of this approach is Bayes' Theorem, which relates the conditional probabilities of two events, \(A\) and \(B\):

\[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} \]

Consider a simple example. For a set of \(N\) observations, we have \(k\) observations of species A and \(N-k\) observations of species B. As environmental scientists, we are interested in the frequency of each of these species at our sampling location. For simplicity, let's call the frequency of species A \(p\) and the frequency of species B \(q = 1-p\).

A frequentist would start with a hypothesis, such as \(H_O:\) the species are equally frequent. In this case, it would amount to asking how probable the observed counts are under the binomial model, evaluated at \(p = 0.5\):

\[ P(k|N,p) = \frac{N!}{k!(N-k)!}p^{k}(1-p)^{N-k} \]
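
A Bayesian, by contrast, treats \(p\) itself as uncertain and asks which values of \(p\) are plausible given the observed counts. Plugging our quantities into Bayes' Theorem gives:

\[ P(p|k,N) = \frac{P(k|N,p)P(p)}{P(k|N)} \]

where \(P(k|N,p)\) is the binomial generative model above, \(P(p)\) is the prior, and the denominator is just a normalizing constant.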

To make this concrete, let's simulate a small data set in which the true frequency of species A is 42%:

observed <- rbinom( 25, 1, 0.42 )   # 25 observations: 1 = species A, 0 = species B
observed
##  [1] 0 1 0 1 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0 0 1 1 1 0
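
To give a flavor of how the three ingredients listed above come together, here is a minimal sketch of a grid approximation to the posterior distribution of \(p\); the grid resolution and the flat prior are arbitrary choices for illustration, not a prescription:

p_grid <- seq( 0, 1, by = 0.01 )                    # candidate values of p to evaluate
prior  <- rep( 1, length( p_grid ) )                # (3) a flat prior across those candidates
k <- sum( observed )                                # (1) the data: count of species A
N <- length( observed )
likelihood <- dbinom( k, size = N, prob = p_grid )  # (2) the binomial generative model
posterior  <- likelihood * prior / sum( likelihood * prior )
p_grid[ which.max( posterior ) ]                    # most probable value of p given these data

Swapping in a different prior only requires changing the prior vector; the rest of the calculation stays the same.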

  1. The \(X!\) notation is defined as: \(X! = X * (X-1) * (X-2) * (X-3) * \dots * 3 * 2 * 1\). This number gets big QUICK so be careful.

  2. Notice here that I did not say ‘accept’ the null, we never accept null hypotheses, only fail to reject them.