% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/ui.R
\name{RunHarmony.default}
\alias{RunHarmony.default}
\title{Run Harmony on a cell embeddings matrix}
\usage{
\method{RunHarmony}{default}(
data_mat,
meta_data,
vars_use,
theta = NULL,
sigma = 0.1,
lambda = 1,
nclust = NULL,
max_iter = 10,
early_stop = TRUE,
ncores = 1,
plot_convergence = FALSE,
return_object = FALSE,
verbose = TRUE,
.options = harmony_options(),
...
)
}
\arguments{
\item{data_mat}{Matrix of cell embeddings. Cells can be stored as
rows or columns; the orientation is inferred by matching the
number of rows of meta_data.}
\item{meta_data}{Either (1) a data frame with the variables to
integrate or (2) a vector of labels, one per cell.}
\item{vars_use}{If meta_data is a data frame, a character vector
naming the variable(s) whose effects should be removed.}
\item{theta}{Diversity clustering penalty parameter. Specify one
value for each variable in vars_use. Default theta=2. theta=0
does not encourage any diversity. Larger values of theta result
in more diverse clusters.}
\item{sigma}{Width of soft k-means clusters. Default
sigma=0.1. Sigma scales the distance from a cell to cluster
centroids. Larger values of sigma result in cells being assigned
to more clusters. Smaller values of sigma make soft k-means
approach hard clustering.}
\item{lambda}{Ridge regression penalty. Default lambda=1. Larger
values protect against overcorrection. If several covariates
are specified, lambda can also be a vector with one entry per
variable in vars_use; the levels of each covariate are then
penalized by the corresponding value. If set to NULL, Harmony
enters lambda estimation mode, choosing lambdas automatically
to minimize overcorrection (use with caution: this mode is
still in beta testing).}
\item{nclust}{Number of clusters in the model. nclust=1 is
equivalent to simple linear regression.}
\item{max_iter}{Maximum number of rounds to run Harmony. One round
of Harmony involves one clustering and one correction step.}
\item{early_stop}{Enable early stopping for Harmony. The
harmonization process stops when the change in the objective
function between corrections drops below 1e-4.}
\item{ncores}{Number of processors used for math operations when
an optimized BLAS is available. If the BLAS does not support
multithreading, this option has no effect. By default,
ncores=1, which runs Harmony as a single-threaded process.
Although Harmony supports multiple cores, it is not optimized
for multithreading; increase this number for large datasets
only if single-core performance is inadequate.}
\item{plot_convergence}{Whether to print the convergence plot of
the clustering objective function. TRUE to plot, FALSE to
suppress. This can be useful for debugging.}
\item{return_object}{(Advanced Usage) Whether to return the Harmony
object or only the corrected PCA embeddings.}
\item{verbose}{Whether to print progress messages. TRUE to print,
FALSE to suppress.}
\item{.options}{Advanced parameters of RunHarmony. This must be the
result of a call to \code{\link{harmony_options}}; see
\code{?harmony_options} for more details.}
\item{...}{Other parameters that are not part of the API.}
}
\value{
By default, a matrix of corrected PCA embeddings. If
return_object is TRUE, the full Harmony object (R6 reference
class type) is returned.
}
\description{
Use this method with a cell embeddings matrix, a metadata table
and a categorical covariate to run the Harmony algorithm directly
on the cell embeddings matrix.
}
\examples{
## By default, Harmony takes a cell embeddings matrix as input
\dontrun{
harmony_embeddings <- RunHarmony(cell_embeddings, meta_data, 'dataset')
}
## If PCA is the input, the PCs need to be scaled
data(cell_lines_small)
pca_matrix <- cell_lines_small$scaled_pcs
meta_data <- cell_lines_small$meta_data
harmony_embeddings <- RunHarmony(pca_matrix, meta_data, 'dataset')
## Output is a matrix of corrected PC embeddings
dim(harmony_embeddings)
harmony_embeddings[seq_len(5), seq_len(5)]
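## theta, sigma and nclust tune the soft k-means clustering step:
## theta = 0 applies no diversity penalty, a larger sigma spreads each
## cell over more clusters, and nclust fixes the number of clusters.
## The values below are illustrative assumptions, not recommendations.
data(cell_lines_small)
harmony_tuned <- RunHarmony(cell_lines_small$scaled_pcs,
                            cell_lines_small$meta_data, 'dataset',
                            theta = 0, sigma = 0.2, nclust = 10,
                            verbose = FALSE)
## The corrected matrix is expected to keep the shape of the input
dim(harmony_tuned)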
## Finally, we can return an object with all the underlying data structures
harmony_object <- RunHarmony(pca_matrix, meta_data, 'dataset', return_object=TRUE)
dim(harmony_object$Y) ## cluster centroids
dim(harmony_object$R) ## soft cluster assignment
dim(harmony_object$Z_corr) ## corrected PCA embeddings
head(harmony_object$O) ## batch by cluster co-occurrence matrix
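## lambda, max_iter, early_stop and .options control the correction
## and the outer loop. In this sketch a stronger ridge penalty guards
## against overcorrection, early stopping is disabled so that all
## max_iter rounds run, and harmony_options() raises the number of
## clustering iterations per round (assuming its max.iter.cluster
## argument; see ?harmony_options). Values are illustrative only.
data(cell_lines_small)
harmony_long <- RunHarmony(cell_lines_small$scaled_pcs,
                           cell_lines_small$meta_data, 'dataset',
                           lambda = 2, max_iter = 20, early_stop = FALSE,
                           .options = harmony_options(max.iter.cluster = 30),
                           verbose = FALSE)
dim(harmony_long)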
}
\seealso{
Other RunHarmony:
\code{\link{RunHarmony.Seurat}()},
\code{\link{RunHarmony.SingleCellExperiment}()},
\code{\link{RunHarmony}()}
}
\concept{RunHarmony}