Add expression data, sample metadata, and recount2 #2
Merged
Showing changes from 7 of 8 commits. Commits (all by jaclyn-taroni):
- bf69541 Add .gitignore
- f29b2ae Add shell script for data download
- 5a2e754 Ignore recount2 data from figshare
- 8c763ad Add git LFS tracking pcl
- bd1bb44 Add microarray PCL (lfs)
- e2de1a7 Add GSE18885 series matrix
- f9125ea Add sample/phenotype data
- ef60a5b Ignore microarray expression and sample metadata
.gitattributes:
```gitattributes
*.pcl filter=lfs diff=lfs merge=lfs -text
```
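The single `.gitattributes` rule sends every `*.pcl` file, at any directory depth, through the Git LFS filter, which is why those files appear as "Git LFS file not shown" in this diff. To confirm which paths the rule covers, `git check-attr` can be used without Git LFS itself being installed. A minimal sketch; the helper name `is_lfs_tracked` is hypothetical, and it assumes it is run inside the repository working tree.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if PATH would be stored via
# Git LFS, i.e. its "filter" attribute resolves to "lfs".
# Must be run inside the repository working tree.
is_lfs_tracked() {
  # git check-attr prints "<path>: filter: <value>", where <value> is
  # "lfs" for matching paths and "unspecified" when no rule applies
  case "$(git check-attr filter -- "$1")" in
    *": filter: lfs") return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage from the repository root:
#   is_lfs_tracked data/expression_data/NARES_SCANfast_ComBat.pcl && echo "stored in LFS"
```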
.gitignore:
```gitignore
# Ignore hidden metadata files
._*

# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# Output files from R CMD check
/*.Rcheck/

# RStudio files
.Rproj.user/

# knitr and R markdown default cache directories
/*_cache/
/cache/

# Temporary files created by R markdown
*.utf8.md
*.knit.md

# Rplot default output
Rplots.pdf

# recount2 data from figshare
data/recount2_PLIER_data
```
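The last rule keeps the recount2 data from figshare (`data/recount2_PLIER_data`) out of version control. Whether a given path is excluded can be checked with `git check-ignore`, which with `-q` prints nothing and reports the answer only through its exit status. A minimal sketch, assuming it runs inside the repository; `is_ignored` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if PATH is excluded by the
# repository's ignore rules, fail (exit 1) otherwise.
# `git check-ignore -q` accepts a single path and only sets the exit status.
is_ignored() {
  git check-ignore -q -- "$1"
}

# Example usage from the repository root, after running the download script:
#   is_ignored data/recount2_PLIER_data && echo "recount2 data will not be committed"
```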
```shell
#!/bin/bash
# stop at the first failed command so files never land in the wrong directory
set -euo pipefail

# set up directories (mkdir -p does not fail if a directory already exists)
mkdir -p data plots results util

# data directory subdirectories
cd data && mkdir -p expression_data

# get recount2 data & model from figshare, source code in
# greenelab/rheum-plier-data
wget https://ndownloader.figshare.com/files/10881866 \
  -O recount2.zip
unzip recount2.zip && rm recount2.zip

## microarray data from greenelab/rheum-plier-data
cd expression_data

# sle-wb data
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/sle-wb/processed/aggregated_data/SLE_WB_all_microarray_QN_zto_before.pcl

# NARES
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/NARES/processed/NARES_SCANfast_ComBat.pcl

# GPA blood dataset (GSE18885)
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/gpa-blood/GSE18885_series_matrix.txt

# isolated blood cell populations from autoimmune conditions
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/isolated-cell-pop/processed/E-MTAB-2452_hugene11st_SCANfast.pcl

# get sample (e.g., phenotype) data
cd .. && mkdir -p sample_info && cd sample_info
# sle-wb sample to dataset of origin data
wget https://github.com/jaclyn-taroni/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/sle-wb/processed/sle-wb_sample_dataset_mapping.tsv
# other/single dataset sample information
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/sle-wb/arrayexpress/E-GEOD-65391/E-GEOD-65391.sdrf.txt
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/isolated-cell-pop/E-MTAB-2452.sdrf.txt
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/sle-wb/arrayexpress/E-GEOD-39088/E-GEOD-39088.sdrf.txt
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/sle-wb/arrayexpress/E-GEOD-78193/E-GEOD-78193.sdrf.txt
wget https://github.com/greenelab/rheum-plier-data/raw/4be547553f24fecac9e2f5c2b469a17f9df253f0/NARES/NARES_demographic_data.tsv
```
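As written, the script fetches every file again on each run, and the review discussion further down weighs reading from URLs against keeping local copies. If repeat runs matter, a guard that skips files already on disk is one option. A minimal sketch, assuming `wget` is available; `download_once` is a hypothetical helper, not something the script defines.

```shell
#!/bin/bash
# Hypothetical helper: download URL to DEST only when DEST is missing or
# empty. The file is written to DEST.part first and renamed afterwards, so
# an interrupted download does not leave a truncated DEST that a later run
# would mistake for a finished one.
download_once() {
  local url="$1" dest="$2"
  if [ -s "$dest" ]; then
    echo "skipping $dest (already present)"
    return 0
  fi
  wget -q -O "$dest.part" "$url" || return 1
  mv "$dest.part" "$dest"
}

# Example, reusing the figshare archive from the script:
#   download_once https://ndownloader.figshare.com/files/10881866 recount2.zip
```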
Git LFS file not shown.
data/expression_data/GSE18885_series_matrix.txt (22,264 additions, 0 deletions): large diff not rendered.
Git LFS file not shown.
Git LFS file not shown.
Large diffs not rendered (four files).
```text
Sample Disease Disease_Activity Batch Classification Severity ANCA Age Race Gender Ethnicity Disease_Duration (yrs) Flares PGA BVAS VDI Smoking_pkyrs Smoking_Status Steroids_daily_pred_mg Steroids_Cat Immune_Meds GC_or_Immune Nasal_Steroids Immune_or_Nasal Any_Immune
N1004 GPA Active 1 V3 Severe MPO 62 White F non-Hisp 6 8 4 7 10 5 Former 30 1 None 1 Y 1 1
N1007 GPA Inactive 1 V2 Severe PR3 43 White M non-Hisp 5 0 0 0 1 19 Former 0 0 None 0 N 0 0
N1017 GPA Never 1 V1 Severe PR3 64 White F non-Hisp 0.75 0 0 0 0 20 Former 5 1 Azathioprine 1 N 1 1
N1025 Control Control 1 C1 N/A Neg 67 White F non-Hisp . . . . . 25 Former 0 0 None 0 N 0 0
N1030 Control Control 1 C2 N/A Neg 63 White F non-Hisp 2 0 . . . 40 Former 15 1 None 1 Y 1 1
N1033 Control Control 1 C2 N/A Neg 67 White M non-Hisp 1 2 . . . 30 Former 15 1 None 1 N 0 1
N1006 Control Control 1 C3 N/A Neg 52 White M non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1009 Control Control 1 C1 N/A Neg 61 White F non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1011 GPA Inactive 1 V2 Severe PR3 54 White M non-Hisp 3 1 0 0 0 0 Never 0 0 None 0 N 0 0
N1012 Control Control 1 C3 N/A Neg 52 White F non-Hisp . . . . . 0 Never 15 1 Rituximab 1 N 1 1
N1014 Control Control 1 C1 N/A Neg 59 White F non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1015 GPA Active 1 V3 Limited PR3 19 White F non-Hisp 5 6 3 3 2 0 Never 7 1 Rituximab 1 N 1 1
N1018 GPA Inactive 1 V2 Limited MPO 72 White F non-Hisp 10 3 0 0 2 0 Never 0 0 Azathioprine 1 N 1 1
N1021 GPA Never 1 V1 Severe MPO 54 White M non-Hisp 4 2 0 0 3 15 Former 40 1 CYC 1 N 1 1
N1026 GPA Never 1 V1 Severe PR3 78 White F non-Hisp 6 0 0 0 2 0 Never 0 0 None 0 N 0 0
N1029 Control Control 1 C2 N/A Neg 51 White M non-Hisp 2 0 . . . 0 Never 0 0 MTX 1 N 1 1
N1001 GPA Inactive 2 V2 Limited PR3 31 White F non-Hisp 3 0 0 0 0 0 Never 2.5 1 MTX 1 N 1 1
N1002 Control Control 2 C1 N/A Neg 27 White M Hispanic . . . . . 0 Never 0 0 None 0 N 0 0
N1013 GPA Never 2 V1 Limited PR3 61 White M non-Hisp 17 0 0 0 0 94 Current 0 0 MTX 1 N 1 1
N1016 GPA Inactive 2 V2 Limited PR3 51 White M non-Hisp 7 0 0 0 1 0 Never 0 0 None 0 N 0 0
N1019 Control Control 2 C1 N/A Neg 76 White M non-Hisp . . . . . 40 Former 0 0 None 0 N 0 0
N1024 GPA Inactive 2 V2 Severe PR3 67 White M non-Hisp 4 0 0 0 2 20 Former 0 0 MTX 1 N 1 1
N1027 Control Control 2 C2 N/A Neg 69 White M non-Hisp 6 2 . . . 15 Former 0 0 None 0 N 0 0
N1028 Control Control 2 C2 N/A Neg 40 White M non-Hisp 1 0 . . . 0 Never 10 1 None 1 N 0 1
N1031 GPA Never 2 V1 Severe PR3 44 White F non-Hisp 2 0 0 0 2 5 Former 0 0 MTX 1 N 1 1
N1032 Control Control 2 C1 N/A Neg 47 White M non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1034 GPA Inactive 2 V2 Severe PR3 48 White M non-Hisp 2 0 0 0 0 30 Current 0 0 None 0 N 0 0
N1036 Control Control 2 C3 N/A Neg 39 Asian M non-Hisp . . . . . 0 Never 0 0 None 0 Y 1 1
N1037 Control Control 2 C2 N/A Neg 44 White F non-Hisp 8 . . . . 0 Never 0 0 CYC 1 N 1 1
N1038 Control Control 2 C3 N/A Neg 18 Black F Hispanic . . . . . 0 Never 0 0 None 0 Y 1 1
N1040 GPA Active 2 V3 Limited PR3 32 White F non-Hisp 3 1 2 2 0 0 Never 0 0 MTX 1 N 0 0
N1041 GPA Active 2 V3 Severe MPO 56 White F non-Hisp 1.5 1 2 3 3 0 Never 7.5 1 Rituximab 1 Y 1 1
N1093 Control Control 3 C2 N/A Neg 28 White M non-Hisp 1.5 0 . . 0 2 Former 0 0 None 0 N 0 0
N1003 GPA Inactive 3 V2 Limited MPO 63 White F non-Hisp 5 0 0 0 5 100 Former 0 0 None 0 Y 1 1
N1042A GPA Active 3 V3 Limited PR3 42 White M non-Hisp 0.3 0 3 4 0 0 Never 50 1 None 1 N 0 1
N1043B Control Control 3 C1 N/A Neg 38 White F non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1044B GPA Active 3 V3 Limited PR3 40 White M Hispanic 0.8 0 4 5 2 0 Never 15 1 Azathioprine 1 N 1 1
N1048B GPA Active 3 V3 Severe PR3 63 White M non-Hisp 8 1 5 10 5 50 Former 30 1 Azathioprine 1 N 1 1
N1050A GPA Active 3 V3 Limited PR3 44 Arabic F non-Hisp 0.8 0 1 1 0 0 Never 7.5 1 MTX 1 N 1 1
N1052A GPA Never 3 V1 Severe PR3 69 White F non-Hisp 13 1 0 0 3 7 Former 5 1 MMF 1 N 1 1
N1053B GPA Never 3 V1 Severe PR3 48 White M non-Hisp 5 2 0 0 0 7 Former 0 0 Rituximab 1 N 1 1
N1055B GPA Inactive 3 V2 Limited PR3 47 White F non-Hisp 9 2 0 0 2 0 Never 0 0 None 0 N 0 0
N1057A GPA Never 3 V1 Severe MPO 75 White F non-Hisp 6 3 0 0 3 45 Former 0 0 None 0 N 0 0
N1058 GPA Inactive 3 V2 Severe PR3 29 White M non-Hisp 3 2 1 1 0 0 Never 0 0 MTX 1 N 1 1
N1060 GPA Never 3 V1 Severe PR3 58 White M non-Hisp 5 1 0 0 0 25 Former 10 1 RTX 1 N 1 1
N1061 GPA Active 3 V3 Severe PR3 55 White F non-Hisp 0.5 0 2 9 1 30 Former 30 1 None 1 N 0 1
N1064 Control Control 3 C1 N/A Neg 57 White M non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1066 Control Control 3 C1 N/A Neg 71 White M non-Hisp . . . . . 30 Former 0 0 None 0 N 0 0
N1068 Control Control 3 C2 N/A Neg 51 White M non-Hisp . . . . . 0 Never 0 0 None 0 Y 1 1
N1069 Control Control 3 C2 N/A Neg 51 Black F non-Hisp . . . . . 0 Never 0 0 MTXINFLIX 1 N 1 1
N1070 Control Control 3 C2 N/A Neg 43 White M non-Hisp . . . . . 10 Former 5 1 None 1 N 0 1
N1073 EGPA "NA" 3 C4 N/A MPO 56 White F non-Hisp 2 0 0 . 0 0 Never 5 1 MTX 1 N 1 1
N1074 GPA Inactive 3 V2 Limited PR3 71 White M non-Hisp 11 2 0 0 3 0 Never 0 0 MTX 1 N 1 1
N1081 Control Control 3 C1 N/A Neg 62 White F non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1082 Control Control 3 C2 N/A Neg 45 White F non-Hisp 1 0 . . . 0 Never 0 0 None 0 Y 1 1
N1086 Control Control 3 C3 N/A Neg 41 White M non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1087 Control Control 3 C2 N/A Neg 51 White M non-Hisp 0.5 0 . . . 0 Never 20 1 MTX 1 N 1 1
N1088 GPA Inactive 3 V2 Severe PR3 23 White M non-Hisp 5 1 0 0 2 0 Never 5 1 None 1 N 0 1
N1091 GPA Inactive 3 V2 Severe PR3 64 White M non-Hisp 10 7 0 0 5 0 Never 7 1 MMF 1 N 1 1
N1092 Control Control 3 C1 N/A Neg 59 White F non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1094 EGPA "NA" 3 C4 N/A Neg 63 White F non-Hisp 14 4 0 0 5 0 Never 0 0 Azathioprine 1 N 1 1
N1095 EGPA "NA" 3 C4 N/A Neg 52 White F non-Hisp NA 2 0 0 4 0 Never 10 1 Azathioprine 1 N 1 1
N1096 EGPA "NA" 3 C4 N/A MPO 64 White F non-Hisp . 6 0 1 4 0 Never 2 1 MTX 1 N 1 1
N1097 Control Control 3 C1 N/A Neg 66 White M non-Hisp . . . . . 0 Never 0 0 None 0 N 0 0
N1098 Control Control 3 C2 N/A Neg 59 White F non-Hisp 8 . . . . 0 Never 0 0 MTX 1 N 1 1
N1099 Control Control 3 C2 N/A Neg 53 White M non-Hisp 3 . . . . 18 Former 20 1 None 1 N 0 1
N1100 GPA Active 3 V3 Severe PR3 43 White F non-Hisp 0.1 0 6 8 0 0 Never 60 1 None 1 N 0 1
N1101 EGPA "NA" 3 C4 N/A MPO 68 White M non-Hisp 6 4 2 3 1 0 Never 15 1 None 1 N 0 1
N1102 EGPA "NA" 3 C4 N/A MPO 68 White M non-Hisp 3 1 1 1 3 30 Former 0 0 MTX 1 N 1 1
N1103 EGPA "NA" 3 C4 N/A Neg 42 White M non-Hisp 10 3 0 0 5 0 Never 5 1 MTX 1 N 1 1
N1104 EGPA "NA" 3 C4 Severe MPO 50 White F non-Hisp 7 4 4 NA 0 0 Never 50 1 None 1 N 0 1
N1105 EGPA "NA" 3 C4 N/A Neg 28 White F non-Hisp 10 9 5 5 5 0 Never 20 1 RTX 1 N 1 1
N1106 EGPA "NA" 3 C4 N/A Neg 73 White F non-Hisp 2 1 0 0 5 40 Former 10 1 Azathioprine 1 N 1 1
N1107 EGPA "NA" 3 C4 N/A Neg 70 White F non-Hisp 4 1 0 0 1 0 Never 2 1 MMF 1 N 1 1
N1108 EGPA "NA" 3 C4 N/A Neg 56 White F non-Hisp 25 4 0 0 4 0 Never 5 1 Azathioprine 1 N 1 1
N1110 EGPA "NA" 3 C4 N/A MPO 38 Asian M non-Hisp 5 3 1 1 4 0 Never 8 1 None 1 N 0 1
```
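The sample table encodes missing or not-applicable values in several ways: `.`, `NA`, a quoted `"NA"` (Disease_Activity for the EGPA samples) and `N/A` (Severity for most non-GPA samples). Collapsing them to a single token before loading the file into R or Python may avoid surprises such as numeric columns being read as text. A minimal `awk` sketch, assuming the underlying file is tab-separated and that all four codes should be treated alike downstream; `normalize_missing` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: read a tab-separated table on stdin and rewrite the
# placeholder codes ".", "NA", "\"NA\"" and "N/A" as a single NA token.
# The header row (line 1) is passed through unchanged.
normalize_missing() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { print; next }
       {
         for (i = 1; i <= NF; i++) {
           if ($i == "." || $i == "NA" || $i == "\"NA\"" || $i == "N/A") $i = "NA"
         }
         print
       }'
}

# Example (file names are hypothetical):
#   normalize_missing < sample_info.tsv > sample_info_clean.tsv
```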
I am wondering why you're also storing these files in this repo. They can be read directly from the URL (in both R and Python).
In the case that I have to load this into R in multiple scripts/notebooks/what have you, I think it would be preferable not to have to read this from the URL (somewhere between 400 and 500 MB in this particular case) each time. I can add these files to `.gitignore`, though, if you think that's better.
Is it faster to read from the file than from the URL? Are there other concerns too?
It should be a few orders of magnitude faster unless this does some serious caching.
Yeah, in Python (did not test in R).
Fast internet connection there 👍
I wonder if this use case is suitable for a git submodule, then 🤔. Up to you, @jaclyn-taroni; I will approve this PR. (Just wondering about the pros and cons of alternative solutions.)
So I think I will keep downloading the data this way rather than using a submodule or reading directly from the URL, but I will ignore the data from that GitHub repo (the same as what I'm doing with the recount2 data from figshare). Will update in the next commit.