-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathscGate.CITE-seq.demo.Rmd
271 lines (198 loc) · 11.1 KB
/
scGate.CITE-seq.demo.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
---
title: "Isolating target cell types from CITE-seq data using scGate"
author: "M. Andreatta, A. Berenstein and S. Carmona"
date: "25/01/2022"
output:
rmdformats::readthedown:
self-contained: true
highlight: haddock
thumbnails: false
css: styles.css
knit: (function(input_file, encoding) {
out_dir <- 'docs';
rmarkdown::render(input_file, encoding=encoding,
output_file=file.path(dirname(input_file), out_dir, 'scGate.CITE-seq.html'))})
---
# Introduction
The CITE-seq dataset by [Hao.et al. 2021](https://doi.org/10.1016/j.cell.2021.04.048) characterizes human PBMCs with scRNA-seq paired with antibody-derived tags (ADT) for a panel of >200 antibodies. This demo illustrates how the [scGate package](https://github.com/carmonalab/scGate) can be applied to isolate target cell type populations from the different modalities in CITE-seq data.
```{r, message=F, warning=F,results=F}
renv::restore()
library(ggplot2)
library(dplyr)
library(Seurat)
library(UCell)
library(scGate)
library(viridis)
library(patchwork)
```
# Load CITE-seq dataset
For simplicity, in this demo we use a reduced version of the dataset by [Hao.et al. 2021](https://doi.org/10.1016/j.cell.2021.04.048), downsampled to 2k cells. The same approach can be applied to the complete dataset.
```{r,warning=F}
testing.datasets <- scGate::get_testing_data(version = "hsa.latest")
hao <- testing.datasets$Satija
palette <- c(list("Impure" = "gray", "Pure" = "green"))
```
Let's have a look at the cell type annotations provided by the authors
```{r, warning=F,collapse =T, fig.width=6,fig.height=5}
DefaultAssay(hao) <- "ADT"
DimPlot(hao, group.by = "celltype.l1",label = T,repel = T,reduction = "umap") + theme(aspect.ratio=1)
```
Are ADT signals in the same scale, how are they distributed within each cell?
```{r,echo=FALSE,warning=F}
par(mfrow=c(1,2))
boxplot(t(hao@assays$ADT@data), outline=F, xaxt='n', main="ADT signal by protein")
boxplot(hao@assays$ADT@data[,sample(seq_len(ncol(hao@assays$ADT@data)),100)],outline=F, xaxt='n', main="ADT signal by cell")
par(mfrow=c(1,1))
```
ADT signals appear to be fairly comparable between cells and antibody tags. However, unlike scRNA-seq these data are not sparse (many zeros), so we will have to adapt the parameters of scGate to this different distribution of signals.
# scGate on CITE-seq data
Because CITE-seq contains two modalities (scRNA-seq and ADT counts), we can apply scGate on one modality and verify gating purity by measuring key markers in the second modality.
## Purify B cells
First, we build a simple gating model for Bcells using the gene MS4A1 (encoding CD20)
```{r,warning=F}
bcell.sct <- scGate::gating_model(name = "Bcell.sct", signature = c("MS4A1"))
```
And we can apply it on the SCT assay (which stores normalized RNA expression counts) of the Seurat object
```{r,collapse=T,warning=F, message=FALSE}
hao <- scGate(data = hao, model = bcell.sct, assay = "SCT",output.col.name = "Bcell.SCT")
DimPlot(hao, group.by = "Bcell.SCT", cols = palette) +
theme(aspect.ratio = 1) + ggtitle("scGate Bcells (SCT assay)")
```
The surface protein measurements for key B cell markers CD19 and CD20 (stored in the ADT assay) confirm that scGate could isolate pure B cells from trascriptomics data
```{r fig.height=4, fig.width=6}
VlnPlot(hao,features = c("CD20","CD19"), assay = "ADT",
cols = palette, pt.size = 0)
```
We can now perform the complementary experiment: detecting B cells from ADT readouts (and confirming gating purity by RNA-seq expression)
Set up a simple gating model based on marker CD20
```{r,collapse=T,warning=F}
cd20.adt <- gating_model(name = "Bcell.adt", signature = c("CD20"))
```
Then apply this scGate model on the ADT assay. Note that, because ADT data are not sparse, and the dimensionality is much lower than scRNA-seq (up to a few hundred tags, compared to tens of thousands of genes in scRNA-seq), we will need to adjust the `maxRank` parameter. This parameter tells [UCell](https://github.com/carmonalab/scGate) how many variables to consider for ranking. For ADT data, the total number of tags divided by two is normally a reasonable choice.
```{r message=F}
DefaultAssay(hao) <- "ADT"
maxrank <- nrow(hao@assays$ADT)/2
hao <- scGate(data = hao, model = cd20.adt, assay = "ADT",output.col.name = "Bcell.ADT",maxRank = maxrank)
b1 <- DimPlot(hao, group.by = "Bcell.ADT", cols = palette) +
theme(aspect.ratio = 1) + ggtitle("scGate Bcells (ADT assay)")
#Verify RNA expression of MS4A1 (encoding CD20) in SCT assay
b2 <- VlnPlot(hao,features = "MS4A1", assay = "SCT",cols = palette, pt.size = 0)
b1 + b2
```
We can always examine the strength of the signatures in scGate models by plotting the UCell scores returned as part of the output. For example, for the Bcell.sct and Bcell.adt signatures, scores for all cells are available in the metadata slots `Bcell.sct_UCell` and `Bcell.adt_UCell`, respectively.
```{r warning=F, message=F, collapse=T}
DefaultAssay(hao) <- "SCT"
a <- FeaturePlot(hao, features=c("Bcell.sct_UCell"), order = T) +
scale_color_viridis(option="D") + theme(aspect.ratio=1)
DefaultAssay(hao) <- "ADT"
b <- FeaturePlot(hao, features=c("Bcell.adt_UCell"), order = T) +
scale_color_viridis(option="D") + theme(aspect.ratio=1)
a | b
```
## Purify T cells
As described in [another demo](https://carmonalab.github.io/scGate.demo/), scGate allows us defining hierarchical models, mimicking a flow cytometry gating process.
```{r warning=F, collapse=TRUE, message=F}
# First level - filter on lymphoid cells
t.sct <- gating_model(name = "lymphoid", signature = c("LCK"))
# Second level - filter on T cells markers (integrin alpha 2b expressed in endothelial cells, platelets, etc. but not T cells)
t.sct <- gating_model(model = t.sct, name = "t.sct", signature = c("CD3E","CD3D","ITGA2B-"),level = 2)
# Run scGate model on SCT assay
hao <- scGate(data = hao, model = t.sct, assay = "SCT")
a <- DimPlot(hao, cols = palette) +
theme(aspect.ratio = 1) + ggtitle("scGate Tcells (SCT assay)")
# and check surface protein markers in ADT assay
b <- VlnPlot(hao,features = c(grep("CD3-[1-2]",row.names(hao@assays$ADT),value = T),"CD41"),
assay = "ADT",cols = palette, pt.size = 0, ncol=2)
a | b
```
And conversely, generate a model to filter T cells based on ADT readouts.
```{r warning=F, collapse=TRUE, message=F}
#model definition
t.adt <- gating_model(name = "t.adt", signature = c("CD3-1","CD3-2","CD41-"))
# filtering
hao <- scGate(data = hao, model = t.adt, assay = "ADT",maxRank = maxrank)
# UMAP plot
DimPlot(hao, cols = palette) +
ggtitle("scGate Tcells (ADT assay)") + theme(aspect.ratio=1)
```
Almost all cells were filtered out by the method. When using ADT data, one needs to check if default positive/negative threshold (0.2) are appropriate:
```{r warning=F,collapse =T, fig.height=3, fig.width=5}
plot_UCell_scores(obj = hao, model = t.adt)
```
The distribution is bimodal, but the default threshold pos.thr=0.2 is too strict. Run again with pos.thr=0.05.
```{r warning=F, collapse=TRUE, message=F}
# model definition
t.adt <- gating_model(name = "t.adt", signature = c("CD3-1","CD3-2","CD41-"))
# filtering
hao <- scGate(data = hao, model = t.adt, assay = "ADT", maxRank = maxrank, pos.thr = 0.05)
a <- DimPlot(hao, cols = palette) +
theme(aspect.ratio = 1) + ggtitle("scGate Tcells (ADT assay)")
# Control markers on SCT assay
b <- VlnPlot(hao,features = c("CD3E","CD3D","ITGA2B"), assay = "SCT", ncol=2,
cols = palette, pt.size = 0)
a | b
```
## Purify NK cells
Build a model for scRNA-seq
```{r}
nk.sct <- gating_model(name = "NK.sct", signature = c("NCAM1","KLRD1","CD3D-"))
```
Filter on SCT data
```{r warning=F, collapse=TRUE, message=F}
hao <- scGate(data = hao, model = nk.sct, assay = "SCT",output.col.name = "NK.sct")
a <- DimPlot(hao, cols = palette) + theme(aspect.ratio = 1) +
ggtitle("scGate NKs (SCT assay)")
b <- VlnPlot(hao,features = c("CD56-1", "CD56-2","CD3-1","CD3-2"), assay = "ADT",
cols = palette, pt.size = 0,ncol = 2)
a | b
```
Build and apply NK model for ADT assay
```{r warning=F, collapse=TRUE, message=F}
# ADT: build model
# since KLRD1/CD94 is not available in Hao data, we will use the CD56 marker instead
nk.adt <- gating_model(name = "NK.adt", signature = c("CD56-1", "CD56-2"))
nk.adt <- gating_model(model = nk.adt, name = "tcell.adt",signature = c("CD3-1","CD3-2"), negative = T)
hao <- scGate(data = hao, model = nk.adt, assay = "ADT",maxRank = maxrank,
pos.thr = 0.4, neg.thr = 0.7, output.col.name = "NK.adt")
plot_UCell_scores(obj = hao, model = nk.adt, pos.thr = 0.4, neg.thr = 0.7)
a <- DimPlot(hao, cols = palette, group.by = "NK.adt") +
theme(aspect.ratio = 1) + ggtitle("scGate NKs (ADT assay)")
b <- VlnPlot(hao,features = c("NCAM1","KLRD1","CD3D","CD3G"), assay = "SCT",
cols = palette, pt.size = 0,ncol = 2)
a | b
```
## Purify Platelets
We start by filtering platelets on the SCT assay
```{r,collapse =T, warning=F, collapse=TRUE, message=F}
# model definition
platelet.sct <- gating_model(name = "platelet.sct", signature = c("ITGA2B"))
#filtering
hao <- scGate(data = hao, model = platelet.sct, assay = "SCT", output.col.name = "platelet.sct")
# UMAP plot
a <- DimPlot(hao, cols = palette) + theme(aspect.ratio = 1) + ggtitle("scGate Platelets (SCT assay)")
# Control markers in ADT assay
b <- VlnPlot(hao,features = c("CD41"), assay = "ADT",
cols = palette, pt.size = 0) + theme(aspect.ratio = 1)
(a + theme(aspect.ratio=1)) | (b + theme(aspect.ratio=1))
```
Finally, we try to detect Platelets using ADT counts.
Since CD41 could be expressed in monocytes and dendritic cells, we need to add a negative signature.
Positive and negative signatures could present different UCell distributions when dealing with ADT data; again we need to adjust the thresholds for positive and negative UCell scores (the distribution plots can help you set a threshold).
```{r,warning=F, collapse=TRUE, message=F, fig.height=3, fig.width=5}
platelet.adt <- gating_model(name = "platelet", signature = c("CD41"))
platelet.adt <- gating_model(model = platelet.adt,name = "myeloid", signature = c("HLA-DR"),negative = T)
platelet.adt <- gating_model(model = platelet.adt,name = "monocyte", signature = c("CD14","CD11c","CD11b-1","CD11b-2"),negative = T)
hao <- scGate(data = hao, model = platelet.adt, assay = "ADT",maxRank = maxrank,pos.thr = 0.95,neg.thr = 0.6)
```
```{r}
a <- DimPlot(hao, cols = palette) + ggtitle("scGate Platelets (ADT assay)") + theme(aspect.ratio=1)
b <- VlnPlot(hao,features = c("ITGA2B"), assay = "SCT",
cols = palette, pt.size = 0) +theme(aspect.ratio=1)
a | b
```
# Further reading
The scGate package and installation instructions are available at: [scGate package](https://github.com/carmonalab/scGate)
The code for this demo (and additional demos) can be found on [GitHub](https://github.com/carmonalab/scGate.demo)
# References
* Hao, Y., Hao, S., Andersen-Nissen, E., Mauck III, W. M., Zheng, S., Butler, A., ... & Satija, R. (2021). *Integrated analysis of multimodal single-cell data*. Cell
* Andreatta, M., Carmona, S. J. (2021) *UCell: Robust and scalable single-cell gene signature scoring* Computational and Structural Biotechnology Journal