Machine Learning Cheatsheet

The tables below list useful functions and commands that will help you work through this module.

Each table covers a different library or tool and its commands.

These tables are not intended to tell you everything you need to know about each command.

The hyperlink on each piece of code will take you to its documentation for further information on usage.

AnnotationDbi

Read the AnnotationDbi package vignette here.

| Library/Package | Piece of Code | What it's called | What it does |
|---|---|---|---|
| AnnotationDbi | `keytypes()` | Keytypes | Returns a character vector of the column names/keytypes (e.g., types of gene identifiers) available in an AnnotationDbi package |
| AnnotationDbi | `mapIds()` | Mapped IDs | Extracts the mapped IDs for a set of keys (e.g., gene identifiers) of a specific keytype |
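
The following is a minimal sketch of both functions. It assumes the Bioconductor annotation package `org.Hs.eg.db` (human) is installed; that package is our choice and is not named above. Note that the AnnotationDbi function is spelled `mapIds()`. The two Ensembl IDs correspond to TP53 and BRCA1.

```r
# Assumes the Bioconductor annotation package org.Hs.eg.db is installed:
# BiocManager::install("org.Hs.eg.db")
library(org.Hs.eg.db)

# Identifier types this annotation package supports, e.g. "ENSEMBL", "SYMBOL"
available_keytypes <- AnnotationDbi::keytypes(org.Hs.eg.db)
print(available_keytypes)

# Map Ensembl gene IDs (TP53, BRCA1) to gene symbols
ensembl_ids <- c("ENSG00000141510", "ENSG00000012048")
gene_symbols <- AnnotationDbi::mapIds(
  org.Hs.eg.db,
  keys = ensembl_ids,
  keytype = "ENSEMBL",  # the type of identifier supplied in `keys`
  column = "SYMBOL",    # the type of identifier we want back
  multiVals = "first"   # if a key maps to several values, keep the first
)
print(gene_symbols)     # named character vector; names are the Ensembl IDs
```

`multiVals = "first"` keeps the result the same length as `keys`. This is convenient when adding a column to an existing table, but it silently drops any additional mappings.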

Base R

Read the Base R package documentation here.

| Library/Package | Piece of Code | What it's called | What it does |
|---|---|---|---|
| Base R | `round()` | Round | Rounds the values in the object given as the first argument to the number of decimal places specified in the second argument |
| Base R | `identical()` | Identical | Checks whether two objects are exactly equal |
| Base R | `prcomp()` | Principal Components Analysis | Performs a principal components analysis on the specified matrix or data frame |
| Base R | `rowSums()` | Row Sums | Returns the sum of each row of a numeric array, matrix, or data frame |
| Base R | `rowMeans()` | Row Means | Returns the mean of each row of a numeric array, matrix, or data frame |
| Base R | `quantile()` | Sample Quantiles | Returns the sample quantiles of a numeric vector of data for a numeric vector of probabilities |
| Base R | `cor()` | Correlation | Computes the correlations between columns using the specified method (e.g., Pearson or Spearman) and returns a correlation matrix |
| Base R | `as.dist()` | Distance Matrix Computation | Converts a matrix of dissimilarities into an object of class `dist`, the distance format used by `hclust()` |
| Base R | `hclust()` | Hierarchical Clustering | Performs hierarchical cluster analysis on a set of dissimilarities using the specified agglomeration method |
| Base R | `table()` | Create Table | Creates a contingency table of counts for each combination of factor levels |
| Base R | `duplicated()` | Duplicated | Returns a logical vector in which `TRUE` marks elements that duplicate an earlier element |
| Base R | `any()` | Any | Given a logical vector, checks whether at least one element is `TRUE` |
| Base R | `cbind()` | Column Bind | Combines vectors, matrices, or data frames by columns |
| Base R | `pairwise.wilcox.test()` | Pairwise Wilcoxon Rank Sum Tests | Performs Wilcoxon rank sum tests between every pair of group levels, with correction for multiple testing |
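
The sketch below strings several of these functions together on simulated data: variable-gene filtering, PCA, and correlation-based clustering of samples. The matrix, the 75th-percentile variance cutoff, and the sample groups are hypothetical choices for illustration. `cutree()` is not in the table; it is used here to turn the tree into cluster labels.

```r
set.seed(123)

# Simulated expression matrix: 50 genes (rows) x 12 samples (columns)
expr_mat <- matrix(rnorm(50 * 12, mean = 5), nrow = 50,
                   dimnames = list(paste0("gene", 1:50), paste0("sample", 1:12)))

# Row summaries: one value per gene
gene_means <- rowMeans(expr_mat)
gene_totals <- rowSums(expr_mat)

# Keep genes whose variance is at or above the 75th percentile
gene_vars <- apply(expr_mat, 1, var)
var_cutoff <- quantile(gene_vars, probs = 0.75)
top_genes <- expr_mat[gene_vars >= var_cutoff, ]

# TRUE would indicate at least one repeated gene name
has_dup_genes <- any(duplicated(rownames(expr_mat)))

# PCA of samples: prcomp() expects observations in rows, so transpose
pca_results <- prcomp(t(top_genes))
pc_df <- cbind(as.data.frame(pca_results$x[, 1:2]),
               sample = colnames(top_genes))
# Proportion of variance explained by PC1 and PC2, rounded to 3 decimals
print(round(summary(pca_results)$importance[2, 1:2], 3))

# Hierarchical clustering of samples, using 1 - Pearson correlation as distance
sample_cor <- cor(top_genes, method = "pearson")  # samples x samples
same_order <- identical(rownames(sample_cor), colnames(top_genes))
sample_dist <- as.dist(1 - sample_cor)
sample_tree <- hclust(sample_dist, method = "average")
clusters <- cutree(sample_tree, k = 3)            # cluster label per sample
print(table(clusters))

# Pairwise Wilcoxon tests of PC1 scores between hypothetical sample groups
sample_groups <- rep(c("A", "B", "C"), each = 4)
wilcox_res <- pairwise.wilcox.test(pc_df$PC1, sample_groups,
                                   p.adjust.method = "BH")
print(wilcox_res)
```

Using `1 - cor()` as the distance clusters samples by the shape of their expression profiles rather than their absolute levels. `method = "average"` is one of several linkage options `hclust()` accepts.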

PLIER

Read the PLIER package documentation here.
A PLIER package vignette can be found here and can also serve as documentation for the commands in the table below.

| Library/Package | Piece of Code | What it's called | What it does |
|---|---|---|---|
| PLIER | `combinePaths()` | Combine Pathways | Combines multiple pathway (prior knowledge) matrices, such as those bundled with PLIER, into a single gene-by-pathway matrix |
| PLIER | `commonRows()` | Common Rows | Determines the rows (genes) common to the specified data matrices and returns them as a character vector |
| PLIER | `rowNorm()` | Row Normalize | Normalizes each row (gene) by z-scoring its expression values |
| PLIER | `num.pc()` | Number of Principal Components | Estimates the number of significant principal components |
| PLIER | `PLIER()` | Main PLIER Function | Main function of the Pathway-Level Information ExtractoR; decomposes an expression matrix into latent variables, some of which align with the supplied pathways |
| PLIER | `plotU()` | Plot U Matrix | Plots the U matrix from the `PLIER()` results, giving insight into the pathways or cell types captured by the latent variables |
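
The following is a minimal sketch of the PLIER data-preparation helpers. It assumes PLIER is installed from GitHub (for example with `remotes::install_github("wgmao/PLIER")`; that repository path is our assumption). The pathway matrices and expression values are small hypothetical examples, so `PLIER()` itself is not run here.

```r
# Assumes PLIER is installed, e.g. remotes::install_github("wgmao/PLIER")
library(PLIER)
set.seed(2024)

# Hypothetical binary pathway matrices (genes x pathways):
# 1 = gene is in the pathway, 0 = it is not
paths_a <- matrix(c(1, 1, 0,    # pathA: GENE1, GENE2
                    0, 1, 1),   # pathB: GENE2, GENE3
                  ncol = 2,
                  dimnames = list(c("GENE1", "GENE2", "GENE3"),
                                  c("pathA", "pathB")))
paths_b <- matrix(c(1, 0, 1), ncol = 1,   # pathC: GENE3, GENE5
                  dimnames = list(c("GENE3", "GENE4", "GENE5"), "pathC"))

# Merge into one gene x pathway matrix covering all genes and pathways
all_paths <- combinePaths(paths_a, paths_b)

# Simulated expression matrix; GENE6 has no pathway annotation
expr_mat <- matrix(rnorm(6 * 8), nrow = 6,
                   dimnames = list(paste0("GENE", 1:6), paste0("sample", 1:8)))

# Genes present in both the pathway matrix and the expression data
common_genes <- commonRows(all_paths, expr_mat)

# z-score each gene (row) across samples
norm_mat <- rowNorm(expr_mat)
print(common_genes)
```

With real data, these outputs would be restricted to `common_genes` and passed to `PLIER()` through its `data` and `priorMat` arguments. `num.pc()` can suggest a value for `k`.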

ComplexHeatmap

Read the ComplexHeatmap package documentation here.

| Library/Package | Piece of Code | What it's called | What it does |
|---|---|---|---|
| ComplexHeatmap | `Heatmap()` | Complex Heatmap | Constructs a heatmap whose graphics and features (colors, clustering, annotations, etc.) can be customized |
| ComplexHeatmap | `HeatmapAnnotation()` | Heatmap Annotation Constructor | Creates an annotation object to be used with a `Heatmap()`, for example as its `top_annotation` |
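
Below is a minimal sketch of a heatmap with a column annotation. It assumes the Bioconductor package ComplexHeatmap is installed. The matrix, sample groups, and colors are hypothetical.

```r
# Assumes ComplexHeatmap (Bioconductor) is installed
library(ComplexHeatmap)
set.seed(7)

# Simulated, already-scaled expression data: 20 genes x 10 samples
expr_mat <- matrix(rnorm(20 * 10), nrow = 20,
                   dimnames = list(paste0("gene", 1:20), paste0("sample", 1:10)))

# Hypothetical sample groups, drawn as a colored bar above the columns
sample_groups <- rep(c("group_A", "group_B"), each = 5)
column_annot <- HeatmapAnnotation(
  group = sample_groups,
  col = list(group = c(group_A = "#1b9e77", group_B = "#d95f02"))
)

# Build the heatmap; rows and columns are clustered by default
expr_heatmap <- Heatmap(
  expr_mat,
  name = "z-score",               # legend title
  top_annotation = column_annot,
  show_row_names = FALSE
)

# Render on the current graphics device
draw(expr_heatmap)
```

`Heatmap()` only builds the object; `draw()` (or printing it at the console) produces the figure.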

ggplot2

Read the ggplot2 package documentation here.
A vignette on the usage of the ggplot2 package can be found here.

| Library/Package | Piece of Code | What it's called | What it does |
|---|---|---|---|
| ggplot2 | `geom_jitter()` | Jittered Points | Adds a small amount of random variation to the location of each point on a plot |
| ggplot2 | `labs()` | Labels | Sets the axis, legend, and plot labels that are specified |
| ggplot2 | `theme()` | Theme | Sets the specified non-data elements of a plot (e.g., plot title, legend spacing, text size) |
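
A minimal sketch combining the three functions on simulated data follows. The group names, values, and styling choices are hypothetical.

```r
library(ggplot2)
set.seed(42)

# Simulated values for three hypothetical groups
plot_df <- data.frame(
  group = rep(c("A", "B", "C"), each = 20),
  value = c(rnorm(20, mean = 0), rnorm(20, mean = 1), rnorm(20, mean = 2))
)

jitter_plot <- ggplot(plot_df, aes(x = group, y = value)) +
  # Spread points horizontally only; height = 0 leaves y values unchanged
  geom_jitter(width = 0.2, height = 0) +
  labs(title = "Simulated values by group", x = "Group", y = "Value") +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.text = element_text(size = 11)
  )

print(jitter_plot)
```

Setting `height = 0` matters when the y values are the measurement of interest. The default jitter would also shift points vertically.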

tidyr

Read the tidyr package documentation here.
A vignette on the usage of the tidyr package can be found here.

| Library/Package | Piece of Code | What it's called | What it does |
|---|---|---|---|
| tidyr | `separate()` | Separate | Separates a character column into multiple columns using a regular expression or numeric positions |
| tidyr | `pivot_longer()` | Pivot Longer | Pivots data in a data frame from wide to long format |
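
The following is a minimal sketch of both functions on a small hypothetical table. Sample IDs and tissue types are packed into one column, and each gene has its own column.

```r
library(tidyr)

# Hypothetical wide data: one row per sample, one column per gene
wide_df <- data.frame(
  sample_info = c("SRR001_tumor", "SRR002_normal"),
  GENE1 = c(5.2, 3.1),
  GENE2 = c(0.4, 1.8)
)

# Split "SRR001_tumor" into sample_id = "SRR001" and tissue = "tumor"
split_df <- separate(wide_df, sample_info,
                     into = c("sample_id", "tissue"), sep = "_")

# Reshape to long format: one row per sample-gene pair
long_df <- pivot_longer(split_df,
                        cols = starts_with("GENE"),
                        names_to = "gene",
                        values_to = "expression")
print(long_df)
```

In tidyr 1.3.0 and later, `separate()` is superseded by `separate_wider_delim()` and related functions. It still works, however.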

Other packages and functions

Documentation for each of these packages can be accessed by clicking the package name in the table below.

| Library/Package | Piece of Code | What it's called | What it does |
|---|---|---|---|
| data.table | `fread()` | Fast Read | Reads in delimited files, typically much faster than base R functions such as `read.csv()` |
| purrr | `discard()` | Discard | Removes the elements of a vector or list for which a predicate function returns `TRUE` |
| dplyr | `pull()` | Pull | Pulls a single variable out of a table of data as a vector |
| matrixStats | `rowSds()` | Row Standard Deviations | Returns the standard deviation estimate for each row in a matrix |
| matrixStats | `rowVars()` | Row Variances | Returns the variance estimate for each row in a matrix |
| umap | `umap()` | Uniform Manifold Approximation and Projection (UMAP) | Computes a low-dimensional UMAP embedding of a given matrix or data frame |
| ConsensusClusterPlus | `ConsensusClusterPlus()` | Consensus Clustering | Runs a clustering algorithm repeatedly on subsamples of the data and summarizes the consensus across runs |
| plotly | `plot_ly()` | Plotly Visualization | Initiates a plotly visualization from the given R objects |
| ggsignif | `geom_signif()` | Create Significance Layer | Adds significance annotations to a plot. It can run statistical tests and display their results; in the notebook, we use it differently, in a way that gives us more control. |
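
The sketch below covers the table-handling and row-statistics functions from this table. It uses a small hypothetical tab-separated file written to a temporary path. The plotting and clustering packages are not covered here.

```r
library(data.table)
library(dplyr)
library(purrr)
library(matrixStats)

# Write a small hypothetical tab-separated file to read back in
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("gene\tS1\tS2\tS3",
             "G1\t1\t2\t3",
             "G2\t5\t5\t5",
             "G3\t0\t4\t8"), tsv_path)

# fread() detects the separator; data.table = FALSE returns a data.frame
expr_df <- fread(tsv_path, data.table = FALSE)

# pull() extracts a single column as a vector
gene_ids <- pull(expr_df, gene)

# Numeric matrix with genes as rows
expr_mat <- as.matrix(expr_df[, -1])
rownames(expr_mat) <- gene_ids

# Per-gene spread; setNames() attaches gene names regardless of
# whether this matrixStats version keeps row names by default
gene_sds <- setNames(rowSds(expr_mat), gene_ids)
gene_vars <- setNames(rowVars(expr_mat), gene_ids)

# discard() drops elements for which the predicate returns TRUE,
# here genes with no variation across samples
variable_sds <- discard(gene_sds, function(s) s == 0)
print(variable_sds)
```

Loading dplyr and purrr after data.table prints messages about masked functions. Those messages are expected, and they do not affect the calls above.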