## Rows: 4978 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## chr (7): sample, sex, biosample, population, population_name, superpopulatio...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(samples)
sample   sex     biosample    population  population_name  superpopulation  superpopulation_name
HG00271  male    SAME123417   FIN         Finnish          EUR              European Ancestry
HG00276  female  SAME123424   FIN         Finnish          EUR              European Ancestry
HG00288  female  SAME1839246  FIN         Finnish          EUR              European Ancestry
HG00290  male    SAME1839057  FIN         Finnish          EUR              European Ancestry
HG00303  male    SAME1840115  FIN         Finnish          EUR              European Ancestry
HG00308  male    SAME124161   FIN         Finnish          EUR              European Ancestry
Eigenvalues
# Load the eigenvalues
eigenval = read_tsv("working/1kGP_pca.eigenval", col_names = c("value"))
## Rows: 10 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: "\t"
## dbl (1): value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(eigenval)
value
346.99400
151.24600
42.58400
31.44490
5.55535
5.18856
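Raw eigenvalues are easier to interpret as the share of variance each PC carries. A minimal sketch follows; `variance_share` is a hypothetical helper, and because plink only reports the PCs it was asked to compute (10 here, per the `Rows: 10` message), the shares are relative to those PCs rather than to the total genetic variance. The demo uses only the six values printed above.

```r
# Hypothetical helper: fraction of the *computed* PCs' variance carried by each PC.
# The denominator is the sum of the eigenvalues in the file, not total variance.
variance_share = function(eigenvalues) {
  eigenvalues / sum(eigenvalues)
}

# The six eigenvalues printed by head(eigenval) above (the file holds 10)
ev_head = c(346.99400, 151.24600, 42.58400, 31.44490, 5.55535, 5.18856)

share = variance_share(ev_head)
round(share, 3)          # per-PC share
round(cumsum(share), 3)  # cumulative share
```

With the full file loaded, `variance_share(eigenval$value)` gives the shares for all ten PCs.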
Eigenvectors
# Load the eigenvectors
eigenvec = read.table("working/1kGP_pca.eigenvec", sep = "")
dim(eigenvec)
## [1] 3202 12
# Display the first few rows of the PCA data to inspect the structure
head(eigenvec)
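The t-SNE and UMAP calls below read PCs from an object `pcs` whose row names are sample IDs, but its construction is not shown here. A minimal sketch, assuming the 12 columns are family ID (`V1`), sample ID (`V2`) and PC1 to PC10 (`V3` to `V12`), which fits the 10 eigenvalues above; `eigenvec_to_pcs` is a hypothetical helper and the toy input values are made up.

```r
# Hypothetical helper: turn a plink .eigenvec table read with read.table()
# (columns V1, V2, ...) into a numeric PC data frame keyed by sample ID.
# ASSUMED layout: V1 = family ID, V2 = sample ID, V3 onward = PC1, PC2, ...
eigenvec_to_pcs = function(eigenvec) {
  pcs = eigenvec[, -(1:2), drop = FALSE]          # keep only the PC columns
  colnames(pcs) = paste0("PC", seq_len(ncol(pcs)))
  rownames(pcs) = as.character(eigenvec[[2]])     # sample IDs as row names
  pcs
}

# Toy stand-in for eigenvec: 3 samples, 2 PCs (values are made up)
toy = data.frame(V1 = c("HG00096", "HG00097", "HG00099"),
                 V2 = c("HG00096", "HG00097", "HG00099"),
                 V3 = c(0.010, 0.012, 0.009),
                 V4 = c(-0.020, -0.018, -0.021))
pcs_toy = eigenvec_to_pcs(toy)
pcs_toy
```

Applied to the real table, `pcs = eigenvec_to_pcs(eigenvec)` would make `pcs[, 1:3]` the first three PCs and `rownames(pcs)` the sample IDs used in the joins below.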
# Perform t-SNE
set.seed(123) # Set a seed for reproducibility
tsne_result = Rtsne(as.matrix(pcs[, 1:3]), dims = 2, perplexity = 30, theta = 0.5, verbose = TRUE)
## Performing PCA
## Read the 3202 x 3 data matrix successfully!
## Using no_dims = 2, perplexity = 30.000000, and theta = 0.500000
## Computing input similarities...
## Building tree...
## Done in 0.09 seconds (sparsity = 0.035335)!
## Learning embedding...
## Iteration 50: error is 79.204083 (50 iterations in 0.16 seconds)
## Iteration 100: error is 64.772105 (50 iterations in 0.14 seconds)
## Iteration 150: error is 62.385642 (50 iterations in 0.16 seconds)
## Iteration 200: error is 61.401533 (50 iterations in 0.16 seconds)
## Iteration 250: error is 60.846001 (50 iterations in 0.16 seconds)
## Iteration 300: error is 1.500679 (50 iterations in 0.14 seconds)
## Iteration 350: error is 1.140660 (50 iterations in 0.15 seconds)
## Iteration 400: error is 0.968229 (50 iterations in 0.14 seconds)
## Iteration 450: error is 0.873545 (50 iterations in 0.12 seconds)
## Iteration 500: error is 0.827076 (50 iterations in 0.12 seconds)
## Iteration 550: error is 0.806732 (50 iterations in 0.13 seconds)
## Iteration 600: error is 0.790258 (50 iterations in 0.13 seconds)
## Iteration 650: error is 0.777408 (50 iterations in 0.13 seconds)
## Iteration 700: error is 0.768881 (50 iterations in 0.12 seconds)
## Iteration 750: error is 0.760994 (50 iterations in 0.13 seconds)
## Iteration 800: error is 0.753550 (50 iterations in 0.12 seconds)
## Iteration 850: error is 0.746623 (50 iterations in 0.12 seconds)
## Iteration 900: error is 0.739908 (50 iterations in 0.12 seconds)
## Iteration 950: error is 0.734250 (50 iterations in 0.13 seconds)
## Iteration 1000: error is 0.729056 (50 iterations in 0.13 seconds)
## Fitting performed in 2.73 seconds.
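The `perplexity = 30` argument sets, for each sample, the width of the Gaussian kernel used to turn distances into neighbour probabilities: the width is tuned until the entropy of those probabilities corresponds to roughly 30 effective neighbours. The sketch below illustrates that calibration for a single point by binary search on the kernel precision. It follows the textbook t-SNE procedure and is not Rtsne's own code; `calibrate_point` is a hypothetical helper and the 200-point input is random toy data, not the 1kGP PCs. (`theta = 0.5` is a separate knob: it sets the Barnes-Hut speed/accuracy trade-off, with 0 meaning exact t-SNE.)

```r
# Find beta = 1 / (2 * sigma^2) such that the conditional distribution
# p_{j|i} = exp(-beta * d_ij^2) / sum_k exp(-beta * d_ik^2)
# has perplexity exp(H) equal to the target (H = Shannon entropy in nats).
calibrate_point = function(d2, target_perplexity, tol = 1e-5, max_iter = 100) {
  # d2: squared distances from point i to every *other* point
  log_target = log(target_perplexity)
  beta = 1
  beta_lo = 0
  beta_hi = Inf
  for (iter in seq_len(max_iter)) {
    w = exp(-(d2 - min(d2)) * beta)   # shifting by min(d2) avoids underflow
    p = w / sum(w)
    H = -sum(p[p > 0] * log(p[p > 0]))
    if (abs(H - log_target) < tol) break
    if (H > log_target) {
      # too many effective neighbours: narrow the kernel (raise beta)
      beta_lo = beta
      beta = if (is.infinite(beta_hi)) beta * 2 else (beta + beta_hi) / 2
    } else {
      # too few effective neighbours: widen the kernel (lower beta)
      beta_hi = beta
      beta = (beta + beta_lo) / 2
    }
  }
  list(p = p, beta = beta, perplexity = exp(H))
}

set.seed(123)
x = matrix(rnorm(200 * 3), ncol = 3)     # toy stand-in for pcs[, 1:3]
d2 = colSums((t(x[-1, ]) - x[1, ])^2)    # squared distances from point 1
res = calibrate_point(d2, target_perplexity = 30)
res$perplexity
```

Rtsne repeats this calibration for every sample before optimising the embedding.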
# The results are stored in tsne_result$Y
head(tsne_result$Y)
         [,1]      [,2]
[1,] 13.45648 -6.278075
[2,] 20.47906 -9.302024
[3,] 11.36764  1.970537
[4,] 34.88028 -7.055738
[5,] 13.59423  1.797826
[6,] 20.18693 -0.624645
Visualizing t-SNE Clustering
# Create a data frame for plotting
tsne_df = data.frame(TSNE1 = tsne_result$Y[, 1],
                     TSNE2 = tsne_result$Y[, 2],
                     sample = rownames(pcs)) %>% inner_join(samples)
## Joining with `by = join_by(sample)`
## inner_join: added 6 columns (sex, biosample, population, population_name,
## superpopulation, …)
## > rows only in x ( 0)
## > rows only in samples (1,776)
## > matched rows 3,202
## > =======
## > rows total 3,202
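The joined `tsne_df` carries the embedding coordinates plus the population labels, so a scatter plot coloured by `superpopulation` is one natural way to inspect the clustering. A minimal sketch with base graphics, to avoid extra dependencies; `plot_embedding` is a hypothetical helper, not necessarily how the author's figures were drawn, and the four-point data frame is toy data.

```r
# Hypothetical helper: scatter plot of a 2-D embedding coloured by a group column.
# Returns the group -> palette-index mapping invisibly.
plot_embedding = function(df, x, y, group, main = "") {
  labels = as.character(df[[group]])
  groups = sort(unique(labels))
  cols = setNames(seq_along(groups) + 1, groups)   # palette indices 2, 3, ...
  plot(df[[x]], df[[y]], col = cols[labels], pch = 20,
       xlab = x, ylab = y, main = main)
  legend("topright", legend = groups, col = cols, pch = 20, bty = "n")
  invisible(cols)
}

pdf(NULL)  # discard drawing in this demo; drop this line to plot on screen
toy = data.frame(TSNE1 = c(1, 2, 8, 9), TSNE2 = c(1, 2, 8, 9),
                 superpopulation = c("EUR", "EUR", "AFR", "AFR"))
cols = plot_embedding(toy, "TSNE1", "TSNE2", "superpopulation", main = "toy")
invisible(dev.off())
```

On the real data this would be `plot_embedding(tsne_df, "TSNE1", "TSNE2", "superpopulation", main = "t-SNE")`, and the same call works for `umap_df` (built below) with `"UMAP1"` and `"UMAP2"`.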
# Set parameters for UMAP
umap_config = umap::umap.defaults
umap_config$n_neighbors = 20
umap_config$min_dist = 0.9
umap_config$metric = "euclidean"
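`n_neighbors = 20` sets how many nearest neighbours UMAP considers around each sample when it builds its neighbourhood graph; larger values favour global structure, smaller values local detail. `min_dist = 0.9`, by contrast, controls how tightly points may be packed in the final layout. The sketch below shows only the first step, a brute-force k-nearest-neighbour search with the Euclidean metric configured above, on toy data. `knn_indices` is a hypothetical helper; whether a point counts as one of its own neighbours varies between implementations (it is excluded here), and the umap package may use an approximate search instead.

```r
# Hypothetical helper: indices of the k nearest (Euclidean) neighbours of
# every row of x, excluding the point itself. Returns an n x k matrix.
knn_indices = function(x, k) {
  d = as.matrix(dist(x, method = "euclidean"))
  diag(d) = Inf                                  # never pick the point itself
  idx = apply(d, 1, function(row) order(row)[seq_len(k)])
  matrix(idx, nrow = nrow(d), ncol = k, byrow = TRUE)
}

set.seed(123)
x = matrix(rnorm(50 * 3), ncol = 3)   # toy stand-in for pcs[, 1:3]
nn = knn_indices(x, k = 20)
dim(nn)
```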
Perform UMAP
# Perform UMAP
set.seed(123) # Set a seed for reproducibility
umap_result = umap(as.matrix(pcs[, 1:3]), config = umap_config)
# The results are stored in umap_result$layout
head(umap_result$layout)
              [,1]       [,2]
HG00096  1.2041862 -2.8331560
HG00097  3.3251365 -2.3108645
HG00099 -0.9658144 -0.7895916
HG00100  6.8363274  0.7601992
HG00101 -0.0701956 -0.3116174
HG00102  1.6608518  0.2403880
Visualizing UMAP Clustering
# Create a data frame for plotting
umap_df = data.frame(UMAP1 = umap_result$layout[, 1],
                     UMAP2 = umap_result$layout[, 2],
                     sample = rownames(pcs)) %>% inner_join(samples)
## Joining with `by = join_by(sample)`
## inner_join: added 6 columns (sex, biosample, population, population_name,
## superpopulation, …)
## > rows only in x ( 0)
## > rows only in samples (1,776)
## > matched rows 3,202
## > =======
## > rows total 3,202