This repository has been archived by the owner on Jan 10, 2021. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathdifferentially-expressed-genes.Rmd
356 lines (248 loc) · 12.1 KB
/
differentially-expressed-genes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
---
title: "Differentially Expressed Genes Network Analysis"
author: "Kozo Nishida, Kristina Hanspers and Alex Pico"
date: "`r Sys.Date()`"
output:
BiocStyle::html_document:
toc: true
toc_depth: 4
df_print: paged
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
eval=FALSE
)
```
*The R markdown is available from the pulldown menu for* Code *at the upper-right, choose "Download Rmd", or [download the Rmd from GitHub](https://raw.githubusercontent.com/nrnb/gsod2019_kozo_nishida/master/differentially-expressed-genes.Rmd).*
<hr />
This protocol describes a network analysis workflow in Cytoscape for a set of differentially expressed genes. Points covered:
- Retrieving relevant networks from public databases
- Network functional enrichment analysis
- Integration and visualization of experimental data
- Exporting network visualizations
<center>
![](https://cytoscape.org/cytoscape-tutorials/protocols/differentially-expressed-genes/string-ovarian-drivers.png)
</center>
<hr />
# Installation
```{r, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
install.packages("BiocManager")
if(!"RCy3" %in% installed.packages())
BiocManager::install("RCy3")
library(RCy3)
```
# Getting started
First, launch Cytoscape and keep it running whenever using RCy3. Confirm that you have everything installed and running:
```{r}
cytoscapePing()
cytoscapeVersionInfo()
```
# Prerequisites
If you haven't already, install the [STRINGapp](http://apps.cytoscape.org/apps/stringapp)
```{r}
installApp('stringApp')
installApp('yFiles Layout Algorithms')
```
# Background
Ovarian serous cystadenocarcinoma is a type of epithelial ovarian cancer which accounts for ~90% of all ovarian cancers.
The data used in this protocol are from [The Cancer Genome Atlas](https://cancergenome.nih.gov/), in which multiple subtypes of serous cystadenocarcinoma were identified and characterized by mRNA expression.
We will focus on the differential gene expression between two subtypes, **Mesenchymal** and **Immunoreactive**.
For convenience, the data have already been analyzed and pre-filtered, using log fold change value and adjusted p-value.
# Network Retrieval
Many public databases and multiple Cytoscape apps allow you to retrieve a network or pathway relevant to your data.
For this workflow, we will use the STRING app. Some other options include:
- [WikiPathways](https://nrnb.org/gsod2019_kozo_nishida/html_documents/Rmd/wikipathways-app.html)
- [NDEx](http://www.ndexbio.org/)
- [GeneMANIA](https://genemania.org/)
# Retrieve Networks from STRING
To identify a relevant network, we will query the STRING database in two different ways:
- Query **STRING protein** with the list of differentially expressed genes.
- Query **STRING disease** for ovarian cancer.
## STRING Protein Query: Up-regulated genes
- Load the file containing the list of up-regulated genes, TCGA-Ovarian-MesenvsImmuno_UP.txt:
```{r}
df <- read.table("https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_UP.txt")
```
- We will use the identifiers in the first column of this datafile to run a **STRING protein query**, with confidence (score) cutoff of 0.4:
```{r}
string.cmd = paste('string protein query query="', paste(df$V1, collapse = '\n'), '" cutoff=0.4 species="Homo sapiens"', sep = "")
commandsRun(string.cmd)
```
The resulting network will load automatically.
![](images/string-up.png)
The resulting network contains up-regulated genes recognized by STRING, and interactions between them with an evidence score of 0.4 or greater.
```{r}
getTableColumnNames('edge')
```
```{r}
evidence.score <- getTableColumns('edge', "stringdb::score")
min(evidence.score)
```
# Enrichment Analysis Options
Next, we are going to perform enrichment anlaysis uing the STRING app.
Note that there are several other options, including:
- [clusterProfiler](http://bioconductor.org/packages/release/bioc/vignettes/clusterProfiler/inst/doc/clusterProfiler.html): An R package for ORA and GSEA
- [g-Profiler](https://biit.cs.ut.ee/gprofiler/): An enrichment analysis website
- [EnrichR](http://amp.pharm.mssm.edu/Enrichr/): A website that performs enrichment against dozens of ontologies and pathway resources
- [ClueGO](http://apps.cytoscape.org/apps/cluego): Creates and visualizes a functionally grouped network of terms/pathways
- [BiNGO](http://apps.cytoscape.org/apps/bingo): GO overrepresentation analysis on networks
- [EnrichmentMap](https://cytoscape.org/cytoscape-tutorials/protocols/enrichmentmap-pipeline/): Visualize the results of gene-set enrichment as a network
## STRING Enrichment
The STRING app has built-in enrichment analysis functionality, which includes enrichment for GO Process, GO Component, GO Function, InterPro, KEGG Pathways, and PFAM.
- First, we will run the enrichment on the whole network, against the genome:
```{r}
string.cmd = 'string retrieve enrichment allNetSpecies="Homo sapiens", background=genome selectedNodesOnly="false"'
commandsRun(string.cmd)
string.cmd = 'string show enrichment'
commandsRun(string.cmd)
```
- When the enrichment analysis is complete, a new tab titled **STRING Enrichment** will open in the **Table Panel**.
![](images/string-enrichment.png)
The STRING app includes several options for filtering and displaying the enrichment results.
The features are all available at the top of the **STRING Enrichment** tab.
- We are going to filter the table to only show GO Process.
```{r}
string.cmd = 'string filter enrichment categories="GO Process", overlapCutoff = "0.5", removeOverlapping = "true"'
commandsRun(string.cmd)
```
- Next, we will add a split donut chart to the nodes representing the top terms by clicking on ![](images/ChartIcon.png).
```{r}
string.cmd = 'string show charts'
commandsRun(string.cmd)
```
![](images/string-enrichment-chart.png)
## STRING Protein Query: Down-regulated genes
- Repeat the network search, enrichment analysis and visualization for the set of down-regulated genes by using the first column of [TCGA-Ovarian-MesenvsImmuno_DOWN.txt](https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_DOWN.txt):
```{r}
df <- read.table("https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_DOWN.txt")
string.cmd = paste('string protein query query="', paste(df$V1, collapse = '\n'), '" cutoff=0.4 species="Homo sapiens"', sep = "")
commandsRun(string.cmd)
```
Now we can perform STRING Enrichment analysis on the resulting network.
```{r}
string.cmd = 'string retrieve enrichment allNetSpecies="Homo sapiens", background=genome selectedNodesOnly="false"'
commandsRun(string.cmd)
string.cmd = 'string show enrichment'
commandsRun(string.cmd)
```
Filter the analysis results for non-redundant GO Process terms only, add split donut charts. To distinguish between the visualizations of up- and down-regulated results, pick a different color palette under Change Color Palette under Settings.
```{r}
string.cmd = 'string filter enrichment categories="GO Process", overlapCutoff = "0.5", removeOverlapping = "true"'
commandsRun(string.cmd)
```
```{r}
string.cmd = 'string show charts'
commandsRun(string.cmd)
```
## STRING Disease Query
Now, we will query the **STRING disease** database to retrieve a network of ovarian cancer associated genes, completely independent of our dataset.
```{r}
string.cmd = 'string disease query disease="ovarian cancer"'
commandsRun(string.cmd)
```
This will bring in the top 100 ovarian cancer associated genes connected with a confidence score greater than 0.4.
(We did not give them as parameters[100 and 0.4]. This is because the default values are those.)
# Data integration
Next we will import log fold changes and p-values from our TCGA dataset and use them to create a visualization.
Since the network and data use different identifiers, we first have to do some quick identifier mapping.
In this case, we will use the gene symbol in the **display name** column to retrieve **Entrez Gene** identifiers.
```{r}
getTableColumnNames('node')
```
```{r}
mapped.cols <- mapTableColumn("display name",'Human','HGNC','Entrez Gene')
```
Here we set **Human** as species, **HGNC** as **Map from**, and **Entrez Gene** as **To**.
```{r}
head(mapped.cols)
```
`mapped.cols` displays a report of how many identifiers were mapped. Make note of this information as it impacts all down-stream analysis; If the mapping was unsuccessful, downstream analysis will be as well. You will notice the new column `Entrez Gene` in the Node Table.
```{r}
tail(getTableColumnNames('node'))
```
Next, we can load the data (a local copy of [TCGA-Ovarian-MesenvsImmuno_data.csv](https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_data.csv)):
```{r}
df <- read.csv("https://cytoscape.org/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_data.csv")
```
```{r}
head(df)
```
We are now ready to integrate the data with the network (node) table in Cytoscape.
For importing the data we will use the following mapping:
- **Key Column for Network** should be **Entrez Gene**.
- **Gene** should be the key of the data(`df`).
```{r}
loadTableData(df, data.key.column = "Gene", table = "node", table.key.column = "Entrez Gene")
```
You will notice two new columns (logFC and FDR.adjusted.Pvalue) in the Node Table.
```{r}
tail(getTableColumnNames('node'))
```
# Visualization
We can now use the integrated data to create a network visualization.
- For node **Fill Color**, create a continuous mapping for **logFC**, with the following blue-red gradient.
- First we have to remove the current mapping for node fill color:
```{r}
getVisualStyleNames()
```
```{r}
deleteStyleMapping("STRING style v1.5 - ovarian cancer", "NODE_FILL_COLOR")
```
- Next, we need the min and max of the **logFC** column:
```{r}
logFC.table <- getTableColumns('node', 'logFC')
```
```{r}
logFC.min <- min(logFC.table, na.rm = T)
logFC.max <- max(logFC.table, na.rm = T)
logFC.center <- logFC.min + (logFC.max - logFC.min)/2
```
- Let's create the mapping:
```{r}
setNodeFontSizeDefault(4)
setNodeColorMapping('logFC', c(logFC.min, logFC.center, logFC.max), c('#0000FF', '#FFFFFF', '#FF0000'), style.name ="STRING style v1.5 - ovarian cancer")
```
- Change the default node **Fill Color** to light grey.
```{r}
setNodeColorDefault('#D3D3D3', style.name ="STRING style v1.5 - ovarian cancer")
```
_Pro-tip: If you apply the **yFiles Organic** layout, you can end up with network views like this..._
([**yFiles** Layout Algorithms App](http://apps.cytoscape.org/apps/yfileslayoutalgorithms) does not support any automation. Please select it in Cytoscape Desktop menubar.)
![](images/string-ovarian-styled.png)
The TCGA found several genes that were commonly mutated in ovarian cancer, so called "cancer drivers".
We can add information about these genes to the network visualization, by changing the visual style of these nodes.
Three of the most important drivers are **TP53**, **BRCA1** and **BRCA2**.
We will add a thicker, clored border for these genes in the network.
- Select all three driver genes by:
```{r}
selectNodes(c("TP53", "BRCA1", "BRCA2"), by.col = "display name")
```
- Add a style bypass for node **Border Width** (5) and node **Border Paint** (bright pink):
```{r}
setNodeBorderWidthBypass(getSelectedNodes(), 5)
setNodeBorderColorBypass(getSelectedNodes(), '#FF007F')
```
The network will now look like this:
![](images/string-ovarian-drivers.png)
# Other Analysis Options
- Exploring networks: finding paths, hubs and modules (clusterMaker, MCODE, jActiveModules, NetworkAnalyzer, PathFinder)
- Extending networks with Transcription Factors, miRNAs, etc using CyTargetLinker
# Exporting Networks
Cytoscape provides a number of ways to export results and visualizations:
- As an image:
```{r}
exportImage('./differentially-expressed-genes', 'PDF')
exportImage('./differentially-expressed-genes', 'PNG')
exportImage('./differentially-expressed-genes', 'JPEG')
exportImage('./differentially-expressed-genes', 'SVG')
exportImage('./differentially-expressed-genes', 'PS')
```
- To a public repository:
```
exportNetworkToNDEx("user", "password", TRUE)
```
- As a Cytoscape JSON file:
```{r}
exportNetwork('./differentially-expressed-genes', 'cyjs')
```