-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathgrazing-gr-SKH.Rmd
2871 lines (2436 loc) · 131 KB
/
grazing-gr-SKH.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Gorda Ridge data analysis"
author: "Sarah K. Hu"
date: "01/02/2021"
output:
html_document:
number_sections: true
theme: cosmo
highlight: kate
collapsed: false
toc: true
toc_depth: 4
toc_float: true
pdf_document:
toc: true
toc_depth: '4'
knit: (function(input_file, encoding) {
out_dir <- 'docs';
rmarkdown::render(input_file,
encoding=encoding,
output_file=file.path(dirname(input_file), out_dir, 'index.html'))})
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE, tidy = TRUE)
```
**Protistan grazing impacts microbial communities and carbon cycling at deep-sea hydrothermal vents**
# Overview
Hu, S.K., Herrera, E., Smith, A., Pachiadaki, M.G., Edgcomb, V.P., Sylva, S.P., Chan, E.W., Seewald, J.S., German, C.R., & Huber, J.A. (In press) Protistan grazing impacts microbial communities and carbon cycling at deep-sea hydrothermal vents. *PNAS*
## Contents
Code for all data analysis and figure generation, including grazing experiment analysis and sequence data processing.
- Import raw counts from FLP disappearance experiments
- Perform calculations to estimate grazing pressure
- Generate figures to visualize grazing pressure
- Import and quality control 18S and 16S tag-sequencing data
- Taxonomy curation
- Statistical analyses
- Figure generation
## Summary of work
*Abstract*: Microbial eukaryotes (or protists) in marine ecosystems are a link between microbial primary producers and all higher trophic levels. The rate at which heterotrophic protistan grazers consume microbial prey and recycle organic matter is an important component of marine microbial food webs and carbon cycling. At deep-sea hydrothermal vents, chemosynthetic bacteria and archaea form the basis of a food web in the absence of sunlight, but the role of protistan grazers in these highly productive ecosystems is largely unexplored. We investigated protistan activity, in the context of hydrothermal vent food web dynamics, in low-temperature venting fluids from Gorda Ridge in the North East (NE) Pacific Ocean. In this study we:
- Conducted grazing incubations with hydrothermal vent fluid to quantify protistan predation pressure\
- Collected samples from *in situ* and grazing experiments for genetic analysis to characterize the composition of vent-associated protists.
# Set up working R environment
The following analysis was performed in **R version 3.6.1**. All input files are available on the [Gorda Ridge GitHub repo](https://github.com/shu251/protist-gordaridge-2021). Data used in analysis is available in `/data-input/`. ASV tables generated using QIIME2 (with DADA2); raw sequences are publicly available under [BioProject PRJNA637089](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA637089).
```{r Load libraries, message=FALSE}
library(tidyverse);library(reshape2);library(cowplot)
library(phyloseq);library(decontam)
```
# Calculations for grazing experiments
Import metadata for each site sampled, cell count data, and details for each grazing experiment. Count data includes concentration of bacteria and archaea (cells per ml) and FLP per time point (cells per ml) for replicates in the grazing experiments.
```{r Import grazing data and metadata}
# Metadata for each grazing experiment
# Including dive ID, vent/site name, incubation parameters
exp_list <- read.table("data-input/Table1_grazingexp_list.txt",header=T, fill=T, sep="\t")
# Import all cell count information from FLP disappearance experiments
counts <- read.csv("data-input/GordaRidge-cell-count-results.csv")
counts_df <- counts %>%
separate(Site, c("SampleOrigin", "SampleNumber", "Stain"), "-", remove = FALSE) %>%
separate(ID, c("TimePoint", "Bottle", "Replicate"), "-", remove = FALSE) %>%
add_column(excess = "NA108") %>%
unite(Sample.ID, excess, SampleNumber, sep = "-") %>%
data.frame
# Import prok counts
prok<-read.table("data-input/prok_counts.txt",header=T, fill=T, sep="\t")
```
*Correction for sample name*
> Sample Plume003, is more appropriately considered near vent bottom water. Modify sample name and entry below.
```{r Update sample name schematic - grazing}
exp_IDs_mod <- exp_list %>%
type.convert(as.is = TRUE) %>%
mutate(Vent.name = case_when(
Sample.ID == "NA108-001" ~ "Near vent BW",
TRUE ~ `Vent.name`),
Sample.Location = case_when(
Sample.ID == "NA108-001" ~ "BW",
TRUE ~ `Sample.Location`)) %>%
mutate(Vent.name = case_when(
Sample.Location == "Plume" ~ "Plume",
TRUE ~ Vent.name))
counts_df_mod <- counts_df %>%
mutate(SampleOrigin = case_when(
Sample.ID == "NA108-001" ~ "BW",
TRUE ~ SampleOrigin)) %>%
select(-Site)
prok_mod <- prok %>%
type.convert(as.is = TRUE) %>%
mutate(Vent.name = case_when(
Specific_Site == "Plume001" ~ "Near vent BW",
TRUE ~ `Vent.name`),
Sample.location = case_when(
Specific_Site == "Plume001" ~ "BW",
TRUE ~ `Sample.location`)) %>%
mutate(Specific_Site = case_when(
Sample.location == "Plume" ~ "Plume",
TRUE ~ Specific_Site),
Vent.name = case_when(
Sample.location == "Plume" ~ "Plume",
TRUE ~ `Vent.name`))
```
```{r Join count data with experiment information}
# Join count data with experiment IDs:
counts_df_ids <- counts_df_mod %>%
left_join(exp_IDs_mod, by = c("SampleOrigin" = "Sample.Location", "Sample.ID" = "Sample.ID")) %>%
unite(Sample, TimePoint, Bottle, sep = "_", remove = FALSE) %>%
data.frame
```
## Calculate error rate
Counts from Vent sample 110 T0 control were repeated three times (3 separate slides). Results are used below as technical replicates to estimate the percentage error rate. Calculating error rate is necessary for determining when FLP cells per ml were significantly different from one another.
> By determining error rate from microscopy counting we can be more confident in evaluating true differences in values.
```{r Determine error rate, message=FALSE}
# This is the % max and min that we will consider to be a margin of error
tech_check <- counts_df_ids %>%
filter(Sample.ID %in% "NA108-110" &
TimePoint %in% "T0" &
Bottle %in% "Ctrl" &
!(Replicate %in% "R2")) %>%
group_by(SampleOrigin, Sample.ID) %>%
summarise(MEAN = mean(Cellsperml), STDEV = sd(Cellsperml), ERR_PER = (100*(STDEV/MEAN))) %>%
data.frame
PERCENT_ERR <- tech_check[["ERR_PER"]]; PERCENT_ERR # Change in FLP time point to time point must exceed 16%
```
## Estimate average cells/ml
Get average FLP concentration across experiment replicates and average cells/ml from prokaryotic counts.
```{r Average across replicates, message=FALSE}
calc_FLP_avg <- counts_df_ids %>%
group_by(SampleOrigin, Sample.ID, T, Bottle, Vent.name, Sample, Stain, T1, T2) %>%
summarise(Avg_cellmL = mean(Cellsperml), # Average cells per ml across replicates
sem=sd(Cellsperml)/sqrt(length(Cellsperml)), # Standard mean error
SD=sd(Cellsperml), #standard deviation
var=sqrt(SD), # variance
Num = n()) %>% #Total number of
data.frame
```
Separate T0 from other time points to calculate % differences in DTAF counts from T0 to T1 and T0 to T2
```{r Extract t0, message=FALSE}
t0 <- filter(calc_FLP_avg, (T == "T0" & Stain == "DTAF")) %>%
select(-T1, -T2, -Stain, -Num, -T, -Sample, -SD, -var, Avg_cellmL_T0 = Avg_cellmL, sem_T0 = sem) %>%
data.frame
# Isolate non-T0 time points
t_ex <- filter(calc_FLP_avg, (!(T == "T0") & Stain == "DTAF")) %>%
select(-Stain, -Num, -Sample, -SD, -var) %>%
pivot_wider(names_from = T, values_from = c(Avg_cellmL, sem)) %>%
data.frame
bac_exp <- calc_FLP_avg %>%
filter(Stain %in% "DAPI") %>%
select(-Bottle, -Stain, -T1, -T2, -SD, -var, -Num, bac_cellmL = Avg_cellmL, bac_sem = sem) %>%
unite(SAMPLE, SampleOrigin, Vent.name, sep = "-", remove = FALSE) %>%
data.frame
dapi<-as.character(unique(bac_exp$SAMPLE))
prok_avg <- prok_mod %>%
group_by(Sample.location, Vent.name) %>%
summarise(prok_avg = mean(Prok_count)) %>%
unite(SAMPLE, Sample.location, Vent.name, sep = "-", remove = FALSE) %>%
data.frame
colnames(t0)
```
Calculate percent difference between T0 and T1, and T1 and T2. These will be compared to the percent error rate.
```{r Percent difference from T0 and timepoints, message=FALSE}
flp_exp_summary <- t0 %>%
left_join(t_ex) %>%
unite(SAMPLE, SampleOrigin, Vent.name, sep = "-", remove = FALSE) %>%
left_join(prok_avg) %>%
mutate(T0_T1_PercDiff = 100*(abs(Avg_cellmL_T1-Avg_cellmL_T0)/Avg_cellmL_T0),
T0_T2_PercDiff = 100*(abs(Avg_cellmL_T2-Avg_cellmL_T0)/Avg_cellmL_T0)) %>%
data.frame
```
### Find significant differences
Above data frame created lists the T0 FLP concentration and the T1 and T2 separately. The difference between T0 and T1 or T0 and T2 must exceed the percent error rate to be considered a reliable difference.
```{r Report percent error}
# Compare to those that exceed error rate
PERCENT_ERR
```
Generate data frame with timepoints to consider for downstream calculations.
```{r Extract significant time points, message=FALSE}
cells_long <- flp_exp_summary %>%
select(SAMPLE, Bottle, Vent.name, Avg_cellmL_T0, Avg_cellmL_T1, Avg_cellmL_T2, T1, T2) %>%
pivot_longer(cols = starts_with("Avg_cellmL"), names_to = "CountID", values_to = "cellmL") %>%
separate(CountID, c("avg", "excess", "Tx"), sep = "_", remove = FALSE) %>%
select(-avg, -excess) %>%
data.frame
sem_long <- flp_exp_summary %>%
select(SAMPLE, Bottle, Vent.name, sem_T0, sem_T1, sem_T2) %>%
pivot_longer(cols = starts_with("sem"), names_to = "semID", values_to = "sem") %>%
separate(semID, c("excess", "Tx"), sep = "_", remove = FALSE) %>%
select(-excess) %>%
data.frame
# Combine and fix Timepoint
flp_long_toplot <- cells_long %>%
left_join(sem_long) %>%
select(-semID) %>%
add_column(Hrs = 0) %>%
mutate(Hrs = case_when(
Tx == "T1" ~ T1,
Tx == "T2" ~ T2,
TRUE ~ (as.integer(.$Hrs)))) %>%
select(-T1, -T2) %>%
data.frame
# head(flp_long_toplot)
```
## Plot FLP loss: all time points
Supplementary plot showing loss in FLP over all time points. Shaded area reports percent error rate from T0.
```{r Plot average cells/ml with error range}
# Factor for plotting
sample_order <- c("Near vent BW","Mt Edwards","Venti latte","Candelabra","SirVentsalot")
sample_label <- c("Near vent BW","Mt. Edwards","Venti latte","Candelabra","Sir Ventsalot")
sample_color <-c("#6f88af","#61ac86","#711518","#dfa837","#ce536b")
flp_long_toplot$SAMPLE_ORDER <- factor(flp_long_toplot$Vent.name, levels = (sample_order), labels = sample_label)
names(sample_color) <- sample_label
bottle_order <- c("Ctrl", "Exp")
flp_long_toplot$BOTTLE <- factor(flp_long_toplot$Bottle, levels = bottle_order, labels = c("Control", "Experimental"))
```
Generate plot with all time points.
```{r Generate plot, fig.width=7, fig.height=6}
# svg("figs/Supplementary-FLP-CTRL-PercError-plot.svg", w = 7, h = 6)
ggplot(flp_long_toplot, aes(x = Hrs, y = cellmL, fill = SAMPLE_ORDER)) +
geom_rect(data = (subset(flp_long_toplot, Tx %in% "T0")), aes(xmin=0, xmax=40,
ymin=(cellmL - ((PERCENT_ERR/100)*cellmL)),
ymax=(cellmL + ((PERCENT_ERR/100)*cellmL))), color=NA,alpha=0.3) +
geom_line(stat = "identity", linetype = 1, aes(group = SAMPLE)) +
geom_errorbar(aes(ymin = (cellmL - sem), ymax = (cellmL + sem)), width = 0.1) +
geom_point(stat = "identity", size = 3, color = "black", aes(fill = SAMPLE_ORDER, shape = SAMPLE_ORDER)) +
scale_y_log10() +
scale_fill_manual(values = sample_color) +
scale_shape_manual(values = c(23,21,21,21,21)) +
labs(y = bquote("FLP cells "~mL^-1), x = "Incubation hours") +
facet_grid(SAMPLE_ORDER~BOTTLE, scales = "free") +
theme_bw() +
theme(panel.grid.minor = element_blank(),
legend.title = element_blank(),
strip.text.x = element_text(face = "bold", color = "black", hjust = 0, size = 10),
strip.text.y = element_text(size = 10),
strip.background = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
axis.line = element_line(colour = "black"),
axis.text = element_text(color = "black", size = 9))
# dev.off()
```
## Plot FLP loss: significant only
Subset FLP results to select time points with significant loss in FLP/
```{r Subset significant time points}
flp_sig <- flp_exp_summary %>%
filter(Bottle %in% "Exp") %>%
select(-T1, -T2) %>%
mutate(T1_sig = case_when(
T0_T1_PercDiff > PERCENT_ERR ~ "exceeds"),
T2_sig = case_when(T0_T2_PercDiff > PERCENT_ERR ~ "exceeds")
) %>%
data.frame
# Select experiments that T1 exceeds percent difference
T1_tmp <- flp_sig %>%
filter(T1_sig == "exceeds") %>%
select(SAMPLE) %>%
data.frame
T1_tmp$Tx = "T1"
T1_tmp$Keep = "yes"
# Select experiments that T1 was NA, but T2 was significant
T2_tmp <- flp_sig %>%
filter(is.na(T1_sig) & T2_sig == "exceeds") %>%
select(SAMPLE) %>%
data.frame
T2_tmp$Tx = "T2"
T2_tmp$Keep = "yes"
keep_status <- rbind(T1_tmp, T2_tmp); #keep_status
# # KEPT:
# # near vent point T2, Candelabra T2
# # Mt Edwards time point T1, Sirventsalot T1, & venti latte T1
```
```{r Join significant experiments, echo=FALSE, message=FALSE}
# Join experiments that are significant and filter data to keep.
t <- c("T1", "T2")
flp_trend_sig <- flp_long_toplot %>%
filter(Bottle == "Exp") %>%
left_join(keep_status) %>%
filter(Tx == "T0" | (Tx %in% t & Keep == "yes")) %>%
separate(SAMPLE, c("SampleOrigin", "excess"), sep = "-", remove = FALSE) %>%
select(-Keep, -excess) %>%
data.frame
# flp_trend_sig
# Also generate a control table for this
flp_trend_sig_ctrls <- flp_long_toplot%>%
left_join(keep_status) %>%
filter(Tx == "T0" | (Tx %in% t & Keep == "yes")) %>%
separate(SAMPLE, c("SampleOrigin", "excess"), sep = "-", remove = FALSE) %>%
select(-Keep, -excess) %>%
filter(Bottle != "Exp") %>%
data.frame
# View(flp_trend_sig_ctrls)
```
> These values are used for all downstream grazing rate calculations, as the loss in FLP was found to exceed the microscopy count error percentage.
There was one exception to the above, where the Candelabra experiment T2 exceeded the error rate, and the control FLP also exceeded the error rate. Due to the change between T0 T1 and T2 in the control Candelabra experiment, this was likely an imprecise collection of T0 (poor mixing of control treatment bottle). Below, the average T0 FLP concentration in the control treatments (excluding Candelabra) was determined, so the T0 value for Candelabra could be corrected.
```{r Examine % loss in controls overall}
flp_ctrl_trend <- flp_trend_sig_ctrls %>%
mutate(TimePoint = case_when(
Tx == "T0" ~ "T0",
Tx == "T2" ~ "TF",
Tx == "T1" ~ "TF"
)) %>%
select(-Tx, -Bottle, -SampleOrigin, -CountID, -BOTTLE, -SAMPLE, -SAMPLE_ORDER) %>%
pivot_wider(names_from = TimePoint, names_sep = "_", values_from = c(cellmL, sem, Hrs)) %>%
mutate(Perc_loss = 100*((cellmL_T0-cellmL_TF)/cellmL_T0))
flp_ctrl_trend_vent <- flp_ctrl_trend %>%
filter(Vent.name != "Near vent BW") %>%
filter(Vent.name != "Candelabra")
# Extract average set of controls
flp_controls_t0_avg <- mean((flp_ctrl_trend_vent$cellmL_T0))
# flp_controls_t0_avg
```
Create Figure S3 with corrected T0 FLP concentration.
```{r generate Figure S3 corrected, fig.width=7, fig.height=6}
# head(flp_long_toplot %>% filter(Vent.name == "Candelabra"))
tmp <- flp_long_toplot %>%
filter(Vent.name == "Candelabra" & Bottle == "Ctrl") %>%
mutate(cellmL = case_when(
(Vent.name == "Candelabra" & Bottle == "Ctrl" & Tx == "T0") ~ flp_controls_t0_avg,
TRUE ~ cellmL
)) %>% add_column(corr = "corrected")
flp_long_to_plot_corr <- flp_long_toplot %>%
add_column(corr = "uncorrected") %>%
rbind(flp_long_toplot %>%
filter(Vent.name == "Candelabra" & Bottle == "Ctrl") %>%
mutate(cellmL = case_when(
(Vent.name == "Candelabra" & Bottle == "Ctrl" & Tx == "T0") ~ flp_controls_t0_avg,
TRUE ~ cellmL
)) %>% add_column(corr = "corrected"))
# svg("corrected_flp_FigS3.svg", w=7, h = 6)
ggplot(flp_long_to_plot_corr, aes(x = Hrs, y = cellmL, fill = SAMPLE_ORDER)) +
geom_rect(data = (subset(flp_long_to_plot_corr, Tx %in% "T0")), aes(xmin=0, xmax=40,
ymin=(cellmL - ((PERCENT_ERR/100)*cellmL)),
ymax=(cellmL + ((PERCENT_ERR/100)*cellmL))), color=NA,alpha=0.3) +
geom_line(stat = "identity", aes(group = SAMPLE, linetype = corr)) +
geom_errorbar(aes(ymin = (cellmL - sem), ymax = (cellmL + sem)), width = 0.1) +
geom_point(stat = "identity", size = 3, color = "black", aes(fill = SAMPLE_ORDER, shape = SAMPLE_ORDER)) +
scale_y_log10() +
scale_fill_manual(values = sample_color) +
scale_shape_manual(values = c(23,21,21,21,21)) +
scale_linetype_manual(values = c(1,1)) +
labs(y = bquote("FLP cells "~mL^-1), x = "Incubation hours") +
facet_grid(SAMPLE_ORDER~BOTTLE, scales = "free") +
theme_bw() +
theme(panel.grid.minor = element_blank(),
legend.title = element_blank(),
strip.text.x = element_text(face = "bold", color = "black", hjust = 0, size = 10),
strip.text.y = element_text(size = 10),
strip.background = element_blank(),
panel.background = element_blank(),
panel.border = element_blank(),
axis.line = element_line(colour = "black"),
axis.text = element_text(color = "black", size = 9))
# dev.off()
```
Generate plot that reports the FLP loss in the significant experimental treatments.
```{r Plot significant FLP loss, fig.width=5, fig.height=3}
# Factor for plotting
## use characterise lists from above
flp_trend_sig$SAMPLE_ORDER <- factor(flp_trend_sig$Vent.name, levels = (sample_order), labels = sample_label)
plot_graze_trends <- ggplot(flp_trend_sig,
aes(x = Hrs, y = cellmL, fill = SAMPLE_ORDER, shape = SampleOrigin)) +
geom_line(stat="identity", aes(group = SAMPLE_ORDER, linetype = SampleOrigin)) +
geom_errorbar(aes(ymin = (cellmL-sem), ymax = (cellmL+sem)), size = 0.5, width = 0.1)+
geom_point(stat="identity", size=3, color="black") +
scale_linetype_manual(values = c(1, 1))+
scale_fill_manual(values = sample_color)+
scale_shape_manual(values = c(23,21))+
scale_y_log10(limits = c(5e3,1e5))+
labs(y = bquote("FLP cells "~mL^-1), x = "Incubation hours")+
theme_minimal()+
theme(panel.grid.major = element_line(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
axis.text=element_text(color="black"),
legend.title = element_blank()) +
guides(fill = guide_legend(override.aes = list(shape = c(23,21,21,21,21))),
shape = guide_legend(override.aes = list(fill = "black"))) +
annotation_logticks(sides = "l")
# plot_graze_trends
```
## Calculate mortality & grazing rate
**See reference**: Salat, J. and Marrasé, C. (1994) Exponential and linear estimations of grazing on bacteria: effects of changes in the proportion of marked cells. Mar Ecol Prog Ser 104: 205--209.
```{r Change time point label structure, as each experiments lists T0 or TF, message=FALSE}
processed_data <- flp_trend_sig %>%
type.convert(as.is = TRUE) %>%
mutate(TimePoint = case_when(Tx == "T0" ~ "T0",
Tx != "T0" ~ "Tf")) %>%
select(-Tx, -CountID) %>%
pivot_wider(names_from = TimePoint, values_from = c(cellmL, sem, Hrs)) %>%
select(-Hrs_T0) %>%
left_join(prok_avg) %>%
data.frame
```
Perform all calculations
```{r Grazing rate calculation}
# cellmL = prokaryote average cells per ml
graze_rate <- processed_data %>%
# type.convert(as.is = TRUE) %>%
group_by(SAMPLE, SampleOrigin, Vent.name, Hrs_Tf, SAMPLE_ORDER) %>%
mutate(
# Calculate mortality factor (m)
MORTALITY = (log(cellmL_Tf/cellmL_T0))*(-1/(Hrs_Tf/24)),
MORTALITY_min = (log((cellmL_Tf-sem_Tf)/(cellmL_T0-sem_T0)))*(-1/(Hrs_Tf/24)),
MORTALITY_max = (log((cellmL_Tf+sem_Tf)/(cellmL_T0+sem_T0)))*(-1/(Hrs_Tf/24)),
# Calculate model I G - Rate over given amount of time
G = ((cellmL_T0 - cellmL_Tf) * (prok_avg / cellmL_T0)),
G_min = (((cellmL_T0-sem_T0) - (cellmL_Tf-sem_Tf)) * (prok_avg / (cellmL_T0-sem_T0))),
G_max = (((cellmL_T0+sem_T0) - (cellmL_Tf+sem_Tf)) * (prok_avg / (cellmL_T0+sem_T0))),
# Calculate Grazing per hour
GrazingRate_hr = (G/Hrs_Tf),
GrazingRate_hr_min = (G_min/Hrs_Tf),
GrazingRate_hr_max = (G_max/Hrs_Tf),
# Estimate prokaryote turnover % per day
Prok_turnover = (100*(G / prok_avg)), #Convert to per day (*24)
Prok_turnover_min = (100*(G_min / prok_avg)),
Prok_turnover_max = (100*(G_max / prok_avg)),
# Prok_turnover = (100*((rate * cellmL)/cellmL)), #ARCHIVE
# Prok_turnover_min = (100*((rate_min * cellmL)/cellmL)), #ARCHIVE
# Prok_turnover_max = (100*((rate_max * cellmL)/cellmL)) #ARCHIVE
# Model II
N_avg = ((prok_avg + prok_avg)/2),
F_avg = ((cellmL_T0 + cellmL_Tf)/2),
q = ((cellmL_T0 - cellmL_Tf)/F_avg),
# G_II a and b should be equivalent
G_II_a = q * (N_avg),
G_II_b = ((cellmL_T0 - cellmL_Tf) * ((prok_avg+prok_avg)/(cellmL_T0+cellmL_Tf))),
GrazingRate_hr_II = (G_II_a/Hrs_Tf)
) %>%
data.frame
# graze_rate
```
Plot grazing rate for each site.
```{r Plot grazing rate, fig.width=4, fig.height=3}
# Factor for plotting
sample_order <- c("Near vent BW","Mt Edwards","Venti latte","Candelabra","SirVentsalot")
sample_label <- c("Near vent BW","Mt. Edwards","Venti latte","Candelabra","Sir Ventsalot")
sample_color <-c("#6f88af","#61ac86","#711518","#dfa837","#ce536b")
graze_rate$SAMPLE_ORDER <- factor(graze_rate$Vent.name, levels = (sample_order), labels = (sample_label))
mortality <- ggplot(graze_rate, aes(y = SAMPLE_ORDER, x = GrazingRate_hr, fill = SAMPLE_ORDER, shape = SampleOrigin)) +
geom_errorbar(aes(xmin = GrazingRate_hr_min, xmax = GrazingRate_hr_max), size = 0.5, width = 0.1) +
geom_point(stat = "identity", size = 3, color = "black", aes(shape = SampleOrigin)) +
scale_fill_manual(values = (sample_color)) +
scale_shape_manual(values = c(23,21)) +
coord_flip() +
labs(y = "", x = bquote("Cells "~mL^-1 ~hr^-1)) +
theme_minimal() +
theme(panel.grid.major = element_line(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
axis.text.x = element_text(color="black", angle = 45, hjust = 1, vjust = 1),
axis.ticks = element_line(),
axis.text.y = element_text(color="black"),
legend.position = "none", strip.text =element_blank())
# mortality
```
Plot daily prokaryote turnover percentage.
```{r Plot percentage turnover, fig.width=4, fig.height=3}
bar_plot <- ggplot(graze_rate, aes(x = SAMPLE_ORDER, y = Prok_turnover)) +
geom_bar(stat = "identity", position = "stack", width = 0.6, aes(fill = SAMPLE_ORDER)) +
geom_errorbar(aes(ymin = Prok_turnover_min, ymax = Prok_turnover_max), size = 0.5, width = 0.1) +
scale_fill_manual(values = (sample_color)) +
scale_y_continuous(expand = c(0,0), limits = c(0,100)) +
labs(x = "", y = bquote("Prokaryote turnover %"~d^-1)) +
# coord_flip() +
theme_minimal() +
theme(panel.grid.major = element_line(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
axis.text.y=element_text(color="black"), axis.ticks = element_line(),
axis.text.x = element_text(color="black", angle = 45, hjust = 1, vjust = 1),
legend.position = "none", strip.text =element_blank())
bar_plot
```
## Compile figures for grazing results
Generate Figure 1.
```{r Generate panel plot, fig.width=3, fig.height=9}
# svg("figs/Grazing-results-panel-VERT-27-06-2021.svg", h = 9, w = 3)
plot_grid(plot_graze_trends + theme(legend.position = "none"),
# mortality,
# bar_plot,
mortality + theme(axis.text.x = element_blank()),
bar_plot,
axis = c("lrtb"), align = c("hv"), labels = c("A", "B", "B"), nrow = 3, ncol = 1)
# dev.off()
```
## Estimate carbon calculations
Continue calculations to place into context with McNichol et al. work. Consider Morono et al. 2011 value for fg C per prokaryote cell.
```{r Calculate carbon biomass}
# G = number of cells grazed during experiment duration
graze_rate_wCarbon <- graze_rate %>%
add_column(fgC_cell_morono = 86) %>% # Add in Morono et al. 2011 value
add_column(fgC_cell_mcnic = 173) %>%
mutate(
cells_consumed_perday = (G / (Hrs_Tf /24)), # Rate of cells consumed * in situ prok, per day (day = hours of incubation reported in days)
fgC_ml_perday_morono = (cells_consumed_perday * fgC_cell_morono),
fgC_ml_perday_mcnic = (cells_consumed_perday * fgC_cell_mcnic),# Convert cell amount to fg C
ugC_L_perday_morono = (fgC_ml_perday_morono * (1e-09) * 1000), # Convert to ug C per L
ugC_L_perday_mcnic = (fgC_ml_perday_mcnic * (1e-09) * 1000),
lower_mcnichol_morono = 100*(ugC_L_perday_morono / 17.3),
upper_mcnichol_morono = 100*(ugC_L_perday_morono / 321.4),
lower_mcnichol_mcnic = 100*(ugC_L_perday_mcnic / 17.3),
upper_mcnichol_mcnic = 100*(ugC_L_perday_mcnic / 321.4)
) %>%
data.frame
# write_delim(graze_rate_wCarbon, path = "Grazing-calc-wCarbon-results.txt", delim = "\t")
```
# Process 18S rRNA gene amplicons
Set up working R environment and import 18S ASV table. Modify input tables and import as phyloseq objects in order to perform quality control removal of contaminant ASVs (*decontam*).
```{r Import and set up env, message=FALSE, warning=FALSE}
load("data-input/GR-ASVtables-updatedTax.RData", verbose = TRUE)
```
## Clean ASV table with 'decontam'
Import ASV table as phyloseq object, note control samples. Control samples derived from lab and shipboard milliQ water samples.
```{r extract taxonomic info, warning=FALSE}
taxmat <- GR_tagseq_wideformat %>%
select(Feature.ID, Taxon_updated) %>%
separate(Taxon_updated, c("Kingdom","Supergroup","Division","Class","Order","Family","Genus","Species"), sep = ";", remove = FALSE) %>%
column_to_rownames(var = "Feature.ID") %>%
as.matrix
# class(taxmat)
# head(taxmat)
```
Note that Axial ID originates from a laboratory blank sample that was extracted at the same time.
```{r extract ASV count information}
asvmat <- GR_tagseq_wideformat %>%
select(Feature.ID, starts_with(c("Gorda", "Axial"))) %>%
column_to_rownames(var = "Feature.ID") %>%
as.matrix
```
Import metadata below and combine with phyloseq object.
```{r Import as phyloseq object, message=FALSE, echo=FALSE, warning=FALSE}
row.names(taxmat)<-row.names(asvmat)
# class(asvmat);class(taxmat)
ASV = otu_table(asvmat, taxa_are_rows = TRUE) #phyloseq command
TAX = tax_table(taxmat)
physeq <- phyloseq(ASV, TAX)
# physeq #Phyloseq object
# Include additional sample names
samplenames <- as.data.frame(colnames(asvmat))
# samplenames; head(asvtab)
colnames(samplenames)[1]<-"SAMPLE"
# Import metadata
ventnames <- read.delim("data-input/ventnames-gordaridge.txt")
# names(ventnames);head(ventnames)
# View(ventnames)
colnames(ventnames)[1]<-"SAMPLE"
head(ventnames)
#
samplenames_1 <- left_join(samplenames, ventnames)
row.names(samplenames_1)<-sample_names(physeq)
samplenames_1 <- samplenames_1 %>% unite(LocationName_Sampletype, LocationName, Sampletype, sep = " ", remove = FALSE)
# Convert to phyloseq object
sampledata <- sample_data(samplenames_1)
# Merge with other data
physeq_names = merge_phyloseq(physeq, sampledata)
# physeq_names
# sample_data(physeq_names)
```
### Identify contaminant ASVs
*Decontam* will identify putative contaminate ASVs based on the difference in prevalence between control blank and environmental samples. First review the library size or number of sequences within each sample to see how varied the control samples are to the experimental samples.
```{r check library size, fig.height=3, fig.width=5}
# Decontam:
physeq_names
# Check out library size of my data
df <- as.data.frame(sample_data(physeq_names))
df$LibrarySize <- sample_sums(physeq_names)
df <- df[order(df$LibrarySize),]
df$Index <- seq(nrow(df))
#
ggplot(data=df, aes(x=Index, y=LibrarySize, fill=Sample_or_Control, shape=LOCATION)) +
geom_point(color="black", size=3, aes(shape=LOCATION)) +
scale_shape_manual(values = c(21,22,23)) +
theme_bw()
```
> Shows that out of the 3 ship blanks I have, one of the sames has a pretty large library size, otherwise, control samples have very small library sizes.
```{r ID contaminants}
# Assign negative control designation
sample_data(physeq_names)$is.neg <- sample_data(physeq_names)$Sample_or_Control == "Control Sample"
# ID contaminants using Prevalence information
contamdf.prev <- isContaminant(physeq_names, method="prevalence", neg="is.neg", threshold = 0.5, normalize = TRUE)
table(contamdf.prev$contaminant) # Report number of ASVs IDed as contamintants
```
> 0.5 - this threshold will ID contaminants in all samples that are more prevalent in negative controls than in positive samples. In this study, control samples included 1 lab-based blank and 3 ship-board blanks taken at the time of field study. Results showed 34 ASVs to be considered "contaminants"
```{r incorporate contaminant info into phyloseq}
# Make phyloseq object of presence-absence in negative controls and true samples
## change to presence absence
gr.pa <- transform_sample_counts(physeq_names, function(abund) 1*(abund>0))
# isolate PA of positive and negative samples
gr.pa.neg <- prune_samples(sample_data(gr.pa)$Sample_or_Control == "Control Sample", gr.pa)
gr.pa.pos <- prune_samples(sample_data(gr.pa)$Sample_or_Control == "True Sample", gr.pa)
```
### Remove contaminant ASVs from data
```{r rm contaminants from ASV, message=FALSE, warning=FALSE}
# Subset TRUE contaminants
contams <- subset(contamdf.prev, contaminant == "TRUE")
contams$Feature.ID <- row.names(contams)
# head(contams);dim(contams)
list_of_contams <- as.character(contams$Feature.ID)
#
# Explore taxa IDed as contaminants
taxa_list <- as.data.frame(taxmat)
taxa_list$Feature.ID <- row.names(taxa_list)
taxa_contams <- left_join(contams, taxa_list)
# write_delim(taxa_contams, path = "List-of-contaminant-ASVs.txt", delim = "\t")
# Plot total sequences and which are contaminants
# Remove contaminant and count sequence sums per sample to see which samples had the highest number of contamiant sequences removed.
# After remove contaminants, what % of sequences is removed?
# head(GR_tagseq_counts[1:2,])
GR_tagseq_longformat$CONTAM <- "Pass"
# head(contams[1:2,])
# str(list_of_contams)
GR_tagseq_longformat$CONTAM[GR_tagseq_longformat$Feature.ID %in% list_of_contams] = "Fail"
# head(GR_tagseq_counts[1:2,])
# Make character list of all feature.ids to KEEP:
keep1<- subset(GR_tagseq_longformat, CONTAM %in% "Pass")
# length(unique(keep1$Feature.ID))
keep_asvs <- as.character(unique(keep1$Feature.ID)) #see below
#
passfail <- GR_tagseq_longformat %>%
group_by(SAMPLE, CONTAM) %>%
summarise(SUM_CONTAM = sum(COUNT)) %>%
data.frame
```
Report sequence stats
```{r decontam, message=FALSE, warning=FALSE}
passfail_wID <- left_join(passfail, ventnames, by = "SAMPLE")
```
```{r total sequences originally}
# Original number of reads
sum(GR_tagseq_longformat$COUNT)
# Original number of ASVs
length(unique(GR_tagseq_longformat$Feature.ID))
```
```{r Remove all the control samples}
# unique(GR_tagseq_counts$SAMPLEID)
GR_tagseq_counts_noCTRL <- subset(GR_tagseq_longformat, !(SAMPLEID %in% "CTRL"))
# New total number of sequences
sum(GR_tagseq_counts_noCTRL$COUNT)
```
```{r Remove contaminants}
counts_decont_tmp <- GR_tagseq_longformat %>%
filter(!(Feature.ID %in% list_of_contams))
# Check
length(unique(counts_decont_tmp$Feature.ID)) - length(unique(GR_tagseq_longformat$Feature.ID)) # Confirm 34 lines removed
```
```{r Percent of sequences removed after decontam}
# % of sequences was removed following decontam; this is counting the ship blank samples themselves
100*(1-(sum(counts_decont_tmp$COUNT)/sum(GR_tagseq_counts_noCTRL$COUNT)))
```
```{r Report removal of sequences by sample}
# Breakdown by samples:
passfail_wide <- dcast(passfail, SAMPLE ~ CONTAM)
passfail_wide$PercLossSeq <- paste(100*(passfail_wide$Fail/(passfail_wide$Fail+passfail_wide$Pass)))
```
Also remove sample with too few sequences and the control samples for an R data file.
```{r Remove sample with too few sequences}
# Remove controls
counts_decont <- counts_decont_tmp %>%
filter(!(SAMPLE == "GordaRidge_BSW020_sterivex_2019_REPa")) %>%
filter(!(SAMPLEID %in% "CTRL")) %>%
data.frame
```
### Save output post-decontam
```{r Save output}
# Save as R Data
save(counts_decont, file="data-input/GR-ASV-table-clean.RData")
```
# Characterize Gorda Ridge protistan diversity
Import cleaned ASV data, curate taxonomic assignments specific to protists, create bar plot to demonstrate protistan diversity at Gorda Ridge.
```{r import and format dataframe, message=FALSE, warning=FALSE}
load("data-input/GR-ASV-table-clean.RData", verbose=TRUE) # after decontam clenaing
gr_counts <- counts_decont %>%
filter(COUNT > 0) %>%
separate(Taxon_updated, c("Kingdom","Supergroup","Division","Class","Order","Family","Genus","Species"), sep = ";", remove = FALSE) %>%
data.frame
tax_only_tmp <- gr_counts %>%
select(Taxon_updated, Kingdom,Supergroup,Division,Class,Order,Family,Genus,Species) %>%
distinct() %>%
data.frame
```
Import metadata for all vent sites.
```{r Import metadata for sample names & Join, message=FALSE, warning=FALSE}
ventnames <- read.delim("data-input/ventnames-gordaridge.txt")
colnames(ventnames)[1]<-"SAMPLE"
# Join with dataframe
gr_counts_name <- gr_counts %>%
left_join(select(ventnames, SAMPLE, LOCATION_SPECIFIC, Sampletype, LocationName)) %>%
data.frame
gr_counts_name$LocationName[gr_counts_name$LOCATION == "Shipblank"]="Shipblank"
```
## Taxonomy curation - PR2
Function below **pr2_curate()** is the custom manual curation of the taxonomic assignments from the PR2 database. The function creates new columns with taxonomic information that summarizes the core groups in the dataset.
```{r Revise and explore taxonomy assignment}
pr2_curate <- function(df){
# Add a column
df$Taxa <-"Unassigned-Eukaryote"
df$Taxa[df$Supergroup == "Alveolata"]="Alveolata-Other"
df$Taxa[df$Division == "Ciliophora"]="Alveolata-Ciliates"
df$Taxa[df$Division == "Dinoflagellata"]="Alveolata-Dinoflagellates"
df$Taxa[df$Class == "Syndiniales"] = "Alveolata-Syndiniales"
df$Taxa[df$Class == "Apicomplexa"]="Alveolata-Apicomplexa"
df$Taxa[df$Supergroup == "Hacrobia"]="Hacrobia-Other"
df$Taxa[df$Division == "Cryptophyta"]="Hacrobia-Cryptophyta"
df$Taxa[df$Division == "Haptophyta"]="Hacrobia-Haptophyta"
df$Taxa[df$Supergroup == "Opisthokonta"]="Opisthokonta-Other"
df$Taxa[df$Division == "Fungi"]="Opisthokonta-Fungi"
df$Taxa[df$Division == "Metazoa"]="Opisthokonta-Metazoa"
df$Taxa[df$Supergroup == "Stramenopiles"]="Stramenopiles-Other"
df$Taxa[df$Class == "Bicoecea"]="Stramenopiles-Bicoecea"
df$Taxa[df$Division == "Ochrophyta"]="Stramenopiles-Ochrophyta"
mast <- unique(filter(df, grepl("MAST", Class)) %>% select(Class))
mast_list <- as.character(mast$Class)
df$Taxa[df$Class %in% mast_list]="Stramenopiles-MAST"
df$Taxa[df$Supergroup == "Archaeplastida"]="Archaeplastida-Other"
df$Taxa[df$Division == "Chlorophyta"]="Archaeplastida-Chlorophyta"
df$Taxa[df$Supergroup == "Excavata"]="Excavata"
df$Taxa[df$Supergroup == "Apusozoa"]="Apusozoa"
df$Taxa[df$Supergroup == "Amoebozoa"]="Amoebozoa"
df$Taxa[df$Supergroup == "Rhizaria"]="Rhizaria-Other"
df$Taxa[df$Division == "Cercozoa"]="Rhizaria-Cercozoa"
df$Taxa[df$Division == "Radiolaria"]="Rhizaria-Radiolaria"
return(df)
}
```
Apply PR2 curation to 18S data.
```{r Run taxonomy curation fxn}
gr_counts_wtax <- pr2_curate(gr_counts_name)
```
> Output is the full ASV table with added columns for curated taxonomy. Above also provides a list of the unique taxonomic names assigned.
```{r remove control and blank samples}
gr_counts_wtax_samplesonly <- subset(gr_counts_wtax, !(Sampletype == "control"))
## To average across replicates, modify SUPR sample names
gr_counts_filter <- gr_counts_wtax_samplesonly
gr_counts_filter$SAMPLEID<- sub("SUPRS9", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID<- sub("SUPRS11", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID<- sub("SUPRS10", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID<- sub("SUPRS2", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID<- sub("SUPRS3", "SUPR", gr_counts_filter$SAMPLEID)
gr_counts_filter$SAMPLEID<- sub("SUPRS1", "SUPR", gr_counts_filter$SAMPLEID)
```
### Report sequence stats after curation
```{r Sum of all sequences}
# Sum of all sequences
a <- sum(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% select(COUNT)); a
```
```{r Total ASVs}
# Total ASVs
dim(unique(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% select(Feature.ID)))[1]
```
Percentage of all sequences Unassigned Eukaryote
```{r Percent of all sequences Unassigned}
x <- sum(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% filter(Taxon_updated == "Eukaryota") %>% select(COUNT))
100*(x/a)
```
Total ASVs left "Unassigned-Eukaryote"
```{r ASVs left Unassigned}
dim(unique(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% filter(Taxon_updated == "Eukaryota") %>% select(Feature.ID)))[1]
```
Percentage of all sequences assigned Opisthokonts
```{r Percent of all sequences assigned Opisthokonts}
x <- sum(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% filter(Supergroup == "Opisthokonta") %>% select(COUNT))
100*(x/a)
dim(unique(gr_counts_filter %>% filter(!(SAMPLEID == "CTRL")) %>% filter(Supergroup == "Opisthokonta") %>% select(Feature.ID)))[1]
```
## Prepare dataframe to for bar plot
Average ASV sequence counts across replicate samples, `COUNT_AVG` column will now equal the ASV sequence count value across replicates
```{r average ASV across replicates, message=FALSE, warning=FALSE}
gr_counts_avg_wtax <- gr_counts_filter %>%
mutate(LocationName = case_when(
LOCATION_SPECIFIC == "Plume036" ~ "Candelabra Plume",
LOCATION_SPECIFIC == "Plume096" ~ "Mt Edwards Plume",
TRUE ~ as.character(LocationName)
)) %>%
group_by(Feature.ID, SAMPLEID, Sampletype, LOCATION_SPECIFIC, LocationName, Taxon_updated, Kingdom, Supergroup, Division, Class, Order, Family, Genus, Species, Taxa) %>%
summarise(COUNT_AVG = mean(COUNT)) %>%
as.data.frame
# dim(gr_counts_filter);dim(gr_counts_avg_wtax)
# tmp <- gr_counts_avg_wtax %>% select(Taxa, Taxon_updated, Kingdom, Supergroup, Division, Class, Order, Family, Genus, Species) %>% distinct() %>% data.frame
# write_delim(tmp, path = "tax-tmp-2.txt", delim = "\t")
# unique(gr_counts_avg_wtax$Taxa)
# unique(gr_counts_avg_wtax$LocationName)
```
Save output file
```{r Save 18S table with taxonomic curation and averages}
# save(gr_counts_filter,gr_counts_wtax, gr_counts_avg_wtax, file="data-input/GordaRidge-ASVtable-avg.RData")
```
Sum ASV sequence counts to taxonomic level
```{r Import curated 18S table}
# See above
# load(file="data-input/GordaRidge-ASVtable-avg.RData", verbose = T)
```
Now sum ASV counts by curated taxonomic level. Below generates both summed sequences from samples averages across replicates and for samples with replicates.
```{r Sum to curated taxa level, message=FALSE, warning=FALSE}
# Sum averaged counts at curated taxa level
gr_counts_avg_TAXA <- gr_counts_avg_wtax %>%
# Remove control samples & bsw with too few sequences
filter(!(Sampletype == "Control")) %>%
filter(!(LOCATION_SPECIFIC == "BSW020")) %>%
# sum by like taxa
group_by(SAMPLEID, Sampletype, LocationName, Taxa) %>%
summarise(SUM = sum(COUNT_AVG)) %>%
unite(SAMPLE, LocationName, Sampletype, SAMPLEID, sep = " ", remove = FALSE) %>%
data.frame
# Sum each replicate separately to curated taxa level
gr_counts_wreps_TAXA <- gr_counts_filter %>%
mutate(LocationName = case_when(
LOCATION_SPECIFIC == "Plume036" ~ "Candelabra Plume",
LOCATION_SPECIFIC == "Plume096" ~ "Mt Edwards Plume",
TRUE ~ as.character(LocationName)
)) %>%
# Remove control samples & bsw with too few sequences
filter(!(Sampletype == "Control")) %>%
filter(!(LOCATION_SPECIFIC == "BSW020")) %>%
# sum by like taxa
group_by(SAMPLEID, Sampletype, LocationName, LOCATION_SPECIFIC, Taxa) %>%
summarise(SUM = sum(COUNT)) %>%
mutate(locationspecific_mod = case_when(
LOCATION_SPECIFIC == "Plume001" ~ "NearVent001",
TRUE ~ as.character(LOCATION_SPECIFIC)
)) %>%
unite(SAMPLE, LocationName, Sampletype, SAMPLEID, sep = " ", remove = FALSE) %>%
unite(SAMPLE_REPS, LocationName, Sampletype, SAMPLEID, locationspecific_mod, sep = " ", remove = FALSE) %>%
data.frame
```
Make supplementary tables to summarize protist results.
```{r Make supplementary tables, message=FALSE, warning=FALSE}
sample_order_all<-c("Shallow seawater in situ sterivex","Deep seawater in situ sterivex", "Near vent BW in situ sterivex","Near vent BW Grazing T0","Near vent BW Grazing T24","Near vent BW Grazing T36","Mt Edwards Plume in situ sterivex","Mt Edwards Vent in situ SUPR","Mt Edwards Vent Grazing T0","Mt Edwards Vent Grazing T36","Venti Latte Vent in situ SUPR","Venti Latte Vent Grazing T0","Venti Latte Vent Grazing T36","Candelabra Plume in situ sterivex","Candelabra Vent in situ SUPR","Candelabra Vent Grazing T24","SirVentsAlot Vent in situ SUPR","SirVentsAlot Vent Grazing T24")
supp_table_seq <- gr_counts_avg_TAXA %>%
select(SAMPLE, Taxa, SUM) %>%
pivot_wider(names_from = SAMPLE, values_from = SUM, values_fill = 0) %>%
arrange(Taxa) %>%
select(Taxa, sample_order_all)
# write_delim(supp_table_seq, path = "Suppl-18s-seq-total.txt", delim = "\t")
#
# head(gr_counts_avg_wtax)
supp_table_ASV <- gr_counts_avg_wtax %>%
# Remove control samples