class: left, middle, inverse
background-image: url("https://live.staticflickr.com/65535/50559539697_1c35d0a56a_o_d.png")
background-size: cover

# .black[Ordination Techniques]

### .yellow[.fancy[Viewing High-Dimensional Data In Low-Dimensional Spaces]]

---
class: inverse, middle
background-image: url("background.png")
background-position: right
background-size: auto

# .orange[Spatial <br> Autocorrelation]

## .fancy[Self-similarity in space.]

---
class: middle, center

![](https://live.staticflickr.com/65535/51206415778_cbcfde8417_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206209856_ff6cc1e21b_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51205518387_9a282276e8_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206228671_cde5f8ff6a_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206434673_a227c2dd48_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206209876_fac5d21f18_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51207277255_f6c7737ecd_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206994379_1117dc4e90_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51205499692_8930c96040_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51207277285_e4bb978743_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51207277345_391091e168_c_d.jpg)

---

# An Example from *Cornus*

.pull-left[

```r
library( gstudio )
data(cornus)
summary( cornus )
```

```
##    Population       SampleID      X.Coordinate   Y.Coordinate       Cf.G8    
##  Min.   :2.000   Min.   :203.0   Min.   : 346   Min.   : 254   155:165 : 18  
##  1st Qu.:3.000   1st Qu.:315.5   1st Qu.:1482   1st Qu.:2231   165:165 : 15  
##  Median :4.000   Median :428.0   Median :1656   Median :2928   167:167 : 13  
##  Mean   :3.809   Mean   :428.0   Mean   :1747   Mean   :2588   155:159 : 12  
##  3rd Qu.:5.000   3rd Qu.:540.5   3rd Qu.:1914   3rd Qu.:3082   157:157 : 12  
##  Max.   :6.000   Max.   :653.0   Max.   :3778   Max.   :6148   (Other) :372  
##                                                                NA's    :  9  
##      Cf.H18          Cf.N5          Cf.N10          Cf.O5     
##  105:119 : 23   170:170 :251   189:193 : 25   182:196 : 43  
##  105:105 : 18   162:170 : 34   189:201 : 20   182:182 : 41  
##  107:119 : 16   172:172 : 34   189:197 : 18   178:196 : 28  
##  121:121 : 16   164:170 : 27   189:189 : 17   178:182 : 25  
##  105:113 : 15   166:170 : 19   193:193 : 17   180:180 : 24  
##  (Other) :362   (Other) : 50   (Other) :341   (Other) :282  
##  NA's    :  1   NA's    : 36   NA's    : 13   NA's    :  8  
```

]

.pull-right[
<img src="slides_files/figure-html/unnamed-chunk-2-1.png" width="504" style="display: block; margin: auto;" />
]

---

# Spatial Autocorrelation

![](https://live.staticflickr.com/65535/51207391795_a656b2839e_c_d.jpg)

---

# Estimating Spatial Bins

.pull-left[

To construct a spatial autocorrelogram, we need to group pairs of individuals by how far apart they are. All pairs separated by a similar distance then fall into the same `lag` or `bin` (i.e., a *distance class*). Separation can be measured in several ways:

- Ecological
- Euclidean
- Other

]

.pull-right[

```r
# Coordinates for each individual (one stratum per SampleID)
coords <- strata_coordinates(cornus,
                             stratum = "SampleID",
                             longitude = "X.Coordinate",
                             latitude = "Y.Coordinate")

# Pairwise physical and genetic distance matrices
P <- strata_distance(coords, mode = "Euclidean")
G <- genetic_distance(cornus, mode = "AMOVA")
```

]

---

```r
library( ggplot2 )

# One row per pair of individuals: physical vs. genetic distance
df <- data.frame( Physical = P[lower.tri(P)],
                  Genetic  = G[lower.tri(G)] )

ggplot( df, aes(Physical, Genetic) ) +
  geom_point() +
  stat_smooth(method = "gam")
```

<img src="slides_files/figure-html/unnamed-chunk-4-1.png" width="504" style="display: block; margin: auto;" />

---

# Overall Correlation - The Mantel

```r
cor.test(df$Physical, df$Genetic)
```

```
## 
##     Pearson's product-moment correlation
## 
## data:  df$Physical and df$Genetic
## t = -0.015364, df = 101473, p-value = 0.9877
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.006200982  0.006104525
## sample estimates:
##           cor 
## -4.823045e-05
```

The correlation between the two sets of pairwise distances is the Mantel statistic. However, the 101,475 pairwise values are not independent of one another, so the parametric *p*-value from `cor.test()` is not a valid test. A Mantel test instead assesses significance by permuting the rows and columns of one of the matrices.

---

```r
library( dplyr )

# Keep only pairs separated by less than 25 distance units
df %>% filter( Physical < 25 ) -> df1

ggplot( df1, aes(Physical, Genetic) ) +
  geom_point() +
  stat_smooth(method = "loess") +
  geom_jitter()
```

<img src="slides_files/figure-html/unnamed-chunk-6-1.png" width="504" style="display: block; margin: auto;" />

---

# Correlation At Bins - The Spatially Restricted Mantel

```r
cor.test(df1$Physical, df1$Genetic)
```

```
## 
##     Pearson's product-moment correlation
## 
## data:  df1$Physical and df1$Genetic
## t = 5.4428, df = 874, p-value = 6.816e-08
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1162186 0.2443693
## sample estimates:
##       cor 
## 0.1810624
```

---

# Estimating the Correlogram

.pull-left[

Each entry in the correlogram is based upon the interclass correlation statistic:

`\begin{equation}
r^h = \frac{\sum_{i\ne j}^K x_{ij}^hc_{ij}^h}{\sum_{i=1}^Kx_{ii}^hc_{ii}^h}
\end{equation}`

```r
# Autocorrelation in 100-unit distance classes from 0 to 1000,
# with significance assessed from 999 permutations
df <- genetic_autocorrelation(P, G, bins = seq(0, 1000, by = 100), perms = 999)
df$Significant <- df$P <= 0.05

ggplot( df, aes(x = To, y = R) ) +
  geom_line() +
  geom_point( aes(color = Significant), size = 4 ) +
  geom_abline(slope = 0, intercept = 0, linetype = 2) +
  xlab("Physical Separation") +
  ylab("Genetic Correlation")
```

]

.pull-right[
<img src="slides_files/figure-html/autocorr-1.png" width="504" style="display: block; margin: auto;" />
]

---
class: middle, center

![](https://live.staticflickr.com/65535/51206415968_fd33982962_c_d.jpg)

---
class: inverse, middle
background-image: url("background.png")
background-position: right
background-size: auto

# .orange[Eigen Structure]

## .fancy[Please define an eigenvalue... I dare you.]
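---

# Eigenvalues, Numerically

Here is a minimal base-R check of the definition: an eigenvector v of a square matrix A satisfies A v = λ v, so multiplying by A only rescales v by its eigenvalue λ. The 2 × 2 matrix `A` is hypothetical. The second half uses R's built-in `USArrests` data (an illustrative assumption, not the *Araptus* data). It shows that the component variances from `princomp( ..., cor = TRUE )` are the eigenvalues of the correlation matrix.

```r
# A hypothetical symmetric 2 x 2 matrix
A <- matrix(c(2, 1,
              1, 2), nrow = 2)

e  <- eigen(A)
e$values                 # eigenvalues, largest first: 3 1
v1 <- e$vectors[, 1]     # eigenvector paired with the largest eigenvalue

# Definition check: A %*% v1 is just v1 rescaled by its eigenvalue
all.equal(as.vector(A %*% v1), e$values[1] * v1)

# Link to PCA: component variances are eigenvalues of the correlation matrix
pc <- princomp(USArrests, cor = TRUE)
all.equal(unname(pc$sdev^2), eigen(cor(USArrests))$values)
```

The same relationship holds for `fit.pca$sdev^2` in the *Araptus* example that follows.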
---
class: middle, center

![](https://live.staticflickr.com/65535/51206994389_bdd3824fda_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51207296305_4f7428786d_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206415998_3527374bc2_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206210046_4cc91ca0ea_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206975949_e08e018ffe_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206975979_22ceef0be6_c_d.jpg)

---
class: middle, center

![](https://live.staticflickr.com/65535/51206994494_bcd4fb98c3_c_d.jpg)

---

```r
data( arapat )

# Recode genotypes as a numeric multivariate matrix (one column per allele)
mv_genos <- to_mv( arapat )

# PCA on the correlation matrix of the allele columns
fit.pca <- princomp(mv_genos, cor = TRUE)
names( fit.pca )
```

```
## [1] "sdev"     "loadings" "center"   "scale"    "n.obs"    "scores"   "call"
```

---

```r
summary( fit.pca )
```

```
## Importance of components:
##                           Comp.1     Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.8113021 2.19949725 1.98692071 1.76188725 1.35153653
## Proportion of Variance 0.1362659 0.08341014 0.06806645 0.05352149 0.03149398
## Cumulative Proportion  0.1362659 0.21967599 0.28774244 0.34126393 0.37275792
##                           Comp.6     Comp.7     Comp.8     Comp.9    Comp.10
## Standard deviation     1.3052912 1.24832072 1.23585320 1.20816941 1.16573700
## Proportion of Variance 0.0293756 0.02686732 0.02633333 0.02516678 0.02343005
## Cumulative Proportion  0.4021335 0.42900084 0.45533417 0.48050095 0.50393100
##                          Comp.11    Comp.12    Comp.13    Comp.14    Comp.15
## Standard deviation     1.1479296 1.12805147 1.11077635 1.09681605 1.07210090
## Proportion of Variance 0.0227197 0.02193966 0.02127283 0.02074147 0.01981725
## Cumulative Proportion  0.5266507 0.54859035 0.56986318 0.59060466 0.61042190
##                           Comp.16    Comp.17    Comp.18    Comp.19   Comp.20
## Standard deviation     1.06907461 1.06258841 1.05051763 1.03671883 1.0197660
## Proportion of Variance 0.01970553 0.01946714 0.01902737 0.01853079 0.0179297
## Cumulative Proportion  0.63012743 0.64959457 0.66862194 0.68715273 0.7050824
##                           Comp.21   Comp.22   Comp.23    Comp.24    Comp.25
## Standard deviation     1.00291893 0.9913325 0.9779643 0.96869623 0.95675236
## Proportion of Variance 0.01734218 0.0169438 0.0164899 0.01617883 0.01578233
## Cumulative Proportion  0.72242461 0.7393684 0.7558583 0.77203714 0.78781947
##                           Comp.26    Comp.27    Comp.28    Comp.29    Comp.30
## Standard deviation     0.94329924 0.93880305 0.91823416 0.89529206 0.87049453
## Proportion of Variance 0.01534161 0.01519571 0.01453714 0.01381979 0.01306484
## Cumulative Proportion  0.80316108 0.81835679 0.83289393 0.84671372 0.85977856
```

---

```r
# Scree plot: variance of each component
plot( fit.pca )
```

<img src="slides_files/figure-html/unnamed-chunk-12-1.png" width="504" style="display: block; margin: auto;" />

---

.pull-left[

# Visualization

```r
# Project individuals onto the components and attach species labels
predict( fit.pca ) %>%
  data.frame() %>%
  mutate( Species = arapat$Species ) -> pred.pca

ggplot( pred.pca, aes(Comp.1, Comp.2, color = Species) ) +
  geom_point() +
  theme( legend.position = "none" )
```

]

.pull-right[
<p> </p>
<img src="slides_files/figure-html/unnamed-chunk-14-1.png" width="504" style="display: block; margin: auto;" />
]

---

# Principal Components Analysis on Frequencies

Just like working on raw data, but first coalescing the individuals in each population into a single row of allele frequencies (a frequency matrix).
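The coalescing step can be sketched in base R for one locus in one population. This assumes genotypes coded as `allele:allele` strings, as in the `cornus` summary earlier, and the genotypes themselves are hypothetical. It shows the arithmetic only, not how `frequency_matrix()` is implemented.

```r
# Hypothetical genotypes at one locus for one population ("allele:allele")
genos <- c("155:165", "165:165", "155:159", "155:155")

# Split each diploid genotype into its two alleles and pool them
alleles <- unlist(strsplit(genos, ":", fixed = TRUE))

# Allele frequency = allele count / total number of allele copies
freq <- table(alleles) / length(alleles)
freq   # 155: 0.500, 159: 0.125, 165: 0.375
```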
```r
# One row per stratum (population), one column per locus-allele
freqs <- frequency_matrix(arapat)
head( freqs[,1:19] )
```

```
##   Stratum AML-01 AML-02 AML-03    AML-04    AML-05 AML-06 AML-07 AML-08 AML-09
## 1     101   0.00      0      0 0.0000000 0.0000000   0.00   0.00   0.50   0.00
## 2     102   0.00      0      0 0.0000000 0.0000000   0.00   0.00   0.00   0.00
## 3      12   0.05      0      0 0.0000000 0.0000000   0.05   0.35   0.50   0.00
## 4     153   0.00      0      0 0.0000000 0.0000000   0.00   0.60   0.35   0.05
## 5     156   0.00      0      0 0.6666667 0.3333333   0.00   0.00   0.00   0.00
## 6     157   0.00      0      0 0.7000000 0.1000000   0.20   0.00   0.00   0.00
##   AML-10 AML-11 AML-12 AML-13 ATPS-01   ATPS-02   ATPS-03   ATPS-04 ATPS-05
## 1   0.00    0.5      0      0       0 0.6666667 0.0000000 0.1111111    0.00
## 2   0.00    1.0      0      0       0 0.9375000 0.0000000 0.0000000    0.00
## 3   0.05    0.0      0      0       0 0.0000000 0.0000000 0.0000000    1.00
## 4   0.00    0.0      0      0       0 0.0000000 0.0000000 0.0000000    1.00
## 5   0.00    0.0      0      0       0 0.0000000 0.9166667 0.0000000    0.00
## 6   0.00    0.0      0      0       0 0.0000000 0.7000000 0.0000000    0.15
```

---

# Principal Components Analysis on Frequencies

Just like working on raw data, but first coalescing the individuals in each population into a single row of allele frequencies (a frequency matrix).
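This is a sketch of what `prcomp( ..., center = TRUE )` computes. It subtracts each column mean and then takes a singular value decomposition (SVD) of the centered matrix. The component standard deviations are the singular values divided by √(n − 1). The 6 × 3 matrix `X` below is hypothetical random data, not the `arapat` frequencies.

```r
set.seed(1)
# Hypothetical data: 6 rows (e.g., populations) by 3 columns (e.g., alleles)
X <- matrix(runif(18), nrow = 6)

fit <- prcomp(X, center = TRUE)

# Manual version: center each column, then take the SVD
Xc <- scale(X, center = TRUE, scale = FALSE)
s  <- svd(Xc)
sd_manual <- s$d / sqrt(nrow(X) - 1)

all.equal(fit$sdev, sd_manual)                           # TRUE
# Scores match up to an arbitrary sign on each component
all.equal(abs(unname(fit$x)), abs(s$u %*% diag(s$d)))    # TRUE
```

Centering also explains why PC39 in `summary( fit.pca_freq )` has a standard deviation of essentially zero (7.634e-17). After centering, 39 population rows span at most 38 dimensions.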
```r
# Drop the Stratum column and label rows by population
F <- as.matrix( freqs[,2:59] )
rownames( F ) <- freqs$Stratum

# PCA on the centered (unscaled) allele frequencies
fit.pca_freq <- prcomp( F, center = TRUE )
```

---

```r
summary( fit.pca_freq )
```

```
## Importance of components:
##                           PC1    PC2    PC3     PC4    PC5     PC6     PC7
## Standard deviation     0.9961 0.7719 0.5274 0.39514 0.2898 0.25543 0.24065
## Proportion of Variance 0.4135 0.2483 0.1159 0.06508 0.0350 0.02719 0.02414
## Cumulative Proportion  0.4135 0.6618 0.7777 0.84282 0.8778 0.90501 0.92914
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.19880 0.15834 0.14287 0.13783 0.12481 0.10191 0.09150
## Proportion of Variance 0.01647 0.01045 0.00851 0.00792 0.00649 0.00433 0.00349
## Cumulative Proportion  0.94562 0.95606 0.96457 0.97249 0.97898 0.98331 0.98680
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.08413 0.07641 0.07166 0.05890 0.05077 0.03845 0.03744
## Proportion of Variance 0.00295 0.00243 0.00214 0.00145 0.00107 0.00062 0.00058
## Cumulative Proportion  0.98975 0.99218 0.99432 0.99577 0.99685 0.99746 0.99805
##                           PC22    PC23    PC24    PC25    PC26    PC27    PC28
## Standard deviation     0.03216 0.02974 0.02461 0.02256 0.01880 0.01789 0.01682
## Proportion of Variance 0.00043 0.00037 0.00025 0.00021 0.00015 0.00013 0.00012
## Cumulative Proportion  0.99848 0.99884 0.99910 0.99931 0.99946 0.99959 0.99971
##                           PC29    PC30    PC31     PC32     PC33     PC34
## Standard deviation     0.01469 0.01358 0.01061 0.007838 0.006974 0.005382
## Proportion of Variance 0.00009 0.00008 0.00005 0.000030 0.000020 0.000010
## Cumulative Proportion  0.99980 0.99987 0.99992 0.999950 0.999970 0.999980
##                            PC35     PC36     PC37     PC38      PC39
## Standard deviation     0.004322 0.003937 0.003217 0.002008 7.634e-17
## Proportion of Variance 0.000010 0.000010 0.000000 0.000000 0.000e+00
## Cumulative Proportion  0.999990 0.999990 1.000000 1.000000 1.000e+00
```

---

.pull-left[

# PCA on Frequencies

Just like working on raw data, but first coalescing the individuals in each population into a single row of allele frequencies (a frequency matrix).
```r
# Population scores on the first two components, labeled by population
predict( fit.pca_freq ) %>%
  data.frame() %>%
  mutate( Population = freqs$Stratum ) %>%
  ggplot( aes(PC1, PC2) ) +
  geom_text( aes(label = Population) )
```

]

.pull-right[
<p> </p>
<img src="slides_files/figure-html/unnamed-chunk-19-1.png" width="504" style="display: block; margin: auto;" />
]

---

.pull-left[

# Detailed Visualizations

```r
library( factoextra )

# Biplot: populations (points) and allele loadings (arrows) together
fviz_pca_biplot( fit.pca_freq )
```

]

.pull-right[
<p> </p>
<img src="slides_files/figure-html/unnamed-chunk-21-1.png" width="504" style="display: block; margin: auto;" />
]

---

# Principal Coordinate Analysis

Like PCA, but starting from a distance matrix instead of raw data.

```r
D.Euc <- genetic_distance(arapat, mode = "Euclidean")
dim(D.Euc)
```

```
## [1] 39 39
```

```r
fit.gendist <- prcomp( D.Euc, center = TRUE )
```

Note that `prcomp()` here treats each column of `D.Euc` as a variable. Classical PCoA (metric multidimensional scaling, as in `cmdscale()`) instead double-centers the squared distances before the eigen decomposition.

---

```r
summary( fit.gendist )
```

```
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5    PC6    PC7
## Standard deviation     3.4963 2.3244 1.38995 0.76870 0.62286 0.5129 0.4473
## Proportion of Variance 0.5622 0.2485 0.08884 0.02717 0.01784 0.0121 0.0092
## Cumulative Proportion  0.5622 0.8106 0.89946 0.92664 0.94448 0.9566 0.9658
##                            PC8     PC9    PC10    PC11    PC12   PC13    PC14
## Standard deviation     0.39332 0.31379 0.26270 0.23524 0.20290 0.1976 0.18482
## Proportion of Variance 0.00711 0.00453 0.00317 0.00254 0.00189 0.0018 0.00157
## Cumulative Proportion  0.97289 0.97742 0.98059 0.98314 0.98503 0.9868 0.98839
##                           PC15    PC16    PC17    PC18    PC19    PC20    PC21
## Standard deviation     0.18292 0.16247 0.14794 0.14137 0.13605 0.12182 0.11651
## Proportion of Variance 0.00154 0.00121 0.00101 0.00092 0.00085 0.00068 0.00062
## Cumulative Proportion  0.98993 0.99115 0.99215 0.99307 0.99392 0.99461 0.99523
##                           PC22   PC23    PC24    PC25    PC26    PC27    PC28
## Standard deviation     0.11066 0.1039 0.10234 0.09489 0.08724 0.08436 0.07748
## Proportion of Variance 0.00056 0.0005 0.00048 0.00041 0.00035 0.00033 0.00028
## Cumulative Proportion  0.99579 0.9963 0.99677 0.99719 0.99754 0.99786 0.99814
##                           PC29    PC30    PC31    PC32    PC33    PC34    PC35
## Standard deviation     0.07707 0.07387 0.06873 0.06740 0.06523 0.06105 0.05838
## Proportion of Variance 0.00027 0.00025 0.00022 0.00021 0.00020 0.00017 0.00016
## Cumulative Proportion  0.99841 0.99866 0.99888 0.99909 0.99929 0.99946 0.99961
##                           PC36    PC37    PC38      PC39
## Standard deviation     0.05684 0.05523 0.04602 2.627e-16
## Proportion of Variance 0.00015 0.00014 0.00010 0.000e+00
## Cumulative Proportion  0.99976 0.99990 1.00000 1.000e+00
```

---

<img src="slides_files/figure-html/unnamed-chunk-24-1.png" width="504" style="display: block; margin: auto;" />

---
class: inverse, middle
background-image: url("background.png")
background-position: right
background-size: auto

# .orange[Hierarchical Clustering]

### .fancy[Stand next to your buddy.]

---

# Clustering

.pull-left[

A technique to build a representation of similarity between objects.

- Supervised
- Unsupervised
- Individual or Group Based

]

.pull-right[
![From www.nature.com/articles/s41467-020-20507-3](https://live.staticflickr.com/65535/51734955871_9c76562bd3_w_d.jpg)
]

---

![Help File for hclust](https://live.staticflickr.com/65535/51735192018_1444d8d533_o_d.png)

---

# Visualizing From Distance Views

`hclust()` requires that `matrix` objects be turned into `dist` objects, which store only the lower triangle of a symmetric distance matrix. Note that `dist()` computes Euclidean distances between the *rows* of its input, whereas `as.dist()` uses the matrix entries themselves as the distances.
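The example below is a toy comparison of the two conversions. It uses a hypothetical 3 × 3 matrix `M` of distances among populations A, B, and C. `as.dist()` keeps the entries of `M` as the distances. `dist()` treats each row as a point and measures Euclidean distances between rows, which is what `dist( D.Euc )` does with the genetic distance matrix.

```r
# Hypothetical symmetric distance matrix among three populations
M <- matrix(c(0, 2, 5,
              2, 0, 4,
              5, 4, 0),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Entries of M used directly: A-B = 2, A-C = 5, B-C = 4
d_direct <- as.dist(M)

# Euclidean distances between rows (distance profiles):
# A-B = 3, A-C = sqrt(54), B-C = sqrt(41)
d_profile <- dist(M)

d_direct
d_profile
```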
```r
dist( D.Euc[1:7,1:7] )
```

```
##          101      102       12      153      156      157
## 102 2.048994                                             
## 12  3.972442 4.342952                                    
## 153 4.099369 4.364062 1.860651                           
## 156 4.727214 4.754565 4.901142 4.871141                  
## 157 4.541334 4.629884 4.510097 4.532973 1.073274         
## 159 3.733735 4.047019 2.537027 3.302282 4.434527 4.121070
```

---

# Visualizing From Distance Views

.pull-left[

```r
d <- dist( D.Euc )
h <- hclust( d )
h
```

```
## 
## Call:
## hclust(d = d)
## 
## Cluster method   : complete 
## Distance         : euclidean 
## Number of objects: 39
```

]

--

.pull-right[

```r
plot(h)
```

<img src="slides_files/figure-html/unnamed-chunk-27-1.png" width="504" style="display: block; margin: auto;" />

]

---

# Interactive Plots

```r
library( networkD3 )

# Color leaf labels by group after cutting the tree into 4 clusters
dendroNetwork( h, height = 400, zoom = TRUE,
               textColour = c("red", "green", "orange", "blue")[cutree(h, 4)] )
```
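---

# Linkage Methods and Cutting the Tree

The `hclust()` call used the default `complete` linkage, and `cutree()` produced the four groups used to color the dendrogram. The sketch below uses a small hypothetical distance matrix among five populations A to E (not the *Araptus* data). It compares complete linkage with average linkage (UPGMA) and then cuts each tree into two groups.

```r
# Hypothetical distances among five populations A-E
M <- matrix(c(0, 1, 4, 5, 9,
              1, 0, 3, 6, 8,
              4, 3, 0, 7, 7,
              5, 6, 7, 0, 2,
              9, 8, 7, 2, 0),
            nrow = 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))
d <- as.dist(M)   # use the entries of M directly as distances

h_complete <- hclust(d, method = "complete")  # the hclust() default
h_average  <- hclust(d, method = "average")   # UPGMA

# Merge heights depend on the linkage rule...
h_complete$height   # 1 2 4 9
h_average$height    # 1.0 2.0 3.5 7.0

# ...but here both trees split into the same two groups: {A, B, C} and {D, E}
cutree(h_complete, k = 2)
cutree(h_average, k = 2)
```

With real data, different linkage rules can also change the groups, not just the heights.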
---
class: middle
background-position: right
background-size: auto

.center[

# Questions?

![Peter Sellers](https://live.staticflickr.com/65535/50382906427_2845eb1861_o_d.gif+)

]

<p> </p>

.bottom[
If you have any questions about the content presented herein, please feel free to [submit them to me](mailto:rjdyer@vcu.edu) and I'll get back to you as soon as possible.
]