A Julia implementation of the t-SNE algorithm. The algorithm's main author (Laurens van der Maaten) has information on t-SNE (including papers and links to other implementations) here.
To install, run the command:
Pkg.clone("https://github.com/Wedg/TSNE.jl.git")
A detailed description of t-SNE, covering each component of the algorithm, can be found:
- here as an html file, or
- here as an ipynb file that you can run in Jupyter.
The main function is tsne, which is called as follows:
tsne(X, d, [perplexity = 30.0, perplexity_tol = 1e-5, perplexity_max_iter = 50,
pca_init = true, pca_dims = 30, exag = 12.0, stop_exag = 250,
μ_init = 0.5, μ_final = 0.8, μ_switch = 250,
η = 100.0, min_gain = 0.01, num_iter = 1000])
where X contains the data, with features in rows and observations in columns, and d is the number of dimensions to reduce to (typically 2 or 3 for visualisations).
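Many data sources store one observation per row, which is the opposite of the layout described above. The sketch below shows one way to rearrange such data before calling tsne. The `data` matrix is made up for illustration, and `permutedims` is used so that the result is a plain `Matrix` rather than a lazy transpose wrapper (an assumption about what the `X::Matrix{F}` argument accepts on newer Julia versions).

```julia
# Hypothetical data: 5 observations (rows), each with 3 features (columns).
data = [ 1.0  2.0  3.0;
         4.0  5.0  6.0;
         7.0  8.0  9.0;
        10.0 11.0 12.0;
        13.0 14.0 15.0]

# Swap the two dimensions so that each column holds one observation.
# permutedims allocates a new Matrix{Float64}.
X = permutedims(data, (2, 1))

size(X)    # (3, 5): 3 features in rows, 5 observations in columns
X[:, 1]    # the first observation: [1.0, 2.0, 3.0]
```
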
All the options are described in more detail in the help text, which you can display by running:
?tsne
which shows:
F::AbstractFloat
T::Integer
X::Matrix{F} : Data matrix - where each row is a feature and each column is an observation / point
d::T : The number of dimensions to reduce X to (e.g. 2 or 3)
perplexity::F = 30.0 : User specified perplexity for each conditional distribution
perplexity_tol::F = 1e-5 : Tolerance for binary search of bandwidth
perplexity_max_iter::T = 50 : Maximum number of iterations used in binary search for bandwidth
pca_init::Bool = true : Choose whether to perform PCA before running t-SNE
pca_dims::T = 30 : Number of dimensions to reduce to using PCA before applying t-SNE
exag::T = 12 : Early exaggeration - multiply all pᵢⱼ's by this constant
stop_exag::T = 250 : Stop the early exaggeration after this many iterations
μ_init::F = 0.5 : Initial momentum parameter
μ_final::F = 0.8 : Final momentum parameter
μ_switch::T = 250 : Switch from initial to final momentum parameter at this iteration
η::F = 100.0 : Learning rate
min_gain::F = 0.01 : Minimum gain for adaptive learning
num_iter::T = 1000 : Number of iterations
show_every::T = 100 : Display progress at intervals of this number of iterations
The plot below was produced by running:
using MNIST
n = 5000
X = Array{Float64}(784, n)
labels = Array{Int64}(n)
for i = 1:n
X[:, i] = trainfeatures(i) ./ 255
labels[i] = trainlabel(i)
end
to load the data,
using TSNE
Y = tsne(X, 2, perplexity = 40.0)
to produce the low-dimensional map Y, and
using Gadfly, Colors
set_default_plot_size(24cm, 18cm)
palette = distinguishable_colors(10)
p = plot(x = Y[1, :], y = Y[2, :], color=labels, Geom.point,
Guide.xlabel("y₁"), Guide.ylabel("y₂"),
Scale.color_discrete_manual(palette..., levels=collect(0:9)), Theme(colorkey_swatch_shape=:circle))
to produce the image.