C implementation of Luke Vilnis and Andrew McCallum's "Word Representations via Gaussian Embedding" (ICLR 2015), in which each word is represented as a multivariate Gaussian distribution.
GCC is required for the installation. The code is compiled by running 'make'.
Embeddings can be learned by executing './learn -train FILE [OPTIONS]', where FILE is the training corpus.
Example: './learn -train data.txt -output vec.txt -size 200 -window 5 -sample 1e-4 -binary 0 -iter 3'
The 40 closest embeddings to a query word can be displayed with './distance FILE', where FILE contains word projections in the binary format. The same tool also lists the top 100 nearest words, sorted by descending variance.
Word embeddings in binary format can be converted to a readable (text) format with './binary2text FILE', where FILE contains word projections in the binary format. It is also possible to drop the header and/or the covariance matrix, and to write the means and the covariance matrices to separate files.
An example of how to visualize the embeddings is shown in visualize.m. It requires the embeddings generated by binary2text with the option '-sep-mat 1', and the LS-SVMlab toolbox must be imported.