embedland

Theoretically this is a universe of code for playing with embeddings. In reality it contains one file. More to come, I hope.

bench.py

This file benchmarks various embeddings using the Enron email corpus. Once you install the various libraries it needs, you can run it with python bench.py. It will:

Download the Enron email dataset.
Unzip it.
Attempt to run embeddings on it. OpenAI's embedder is the default; you can switch to T5 or some other engine at the end of the file.
Cluster the embeddings.
Label the clusters by sampling the subject lines from the clusters and sending them to GPT-3.
Show you a pretty chart, like the one you see above.
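
To make the labeling step more concrete, here is a minimal sketch of sampling subject lines per cluster and turning them into a prompt. It is not the code in bench.py: the function names `sample_subjects` and `build_label_prompt`, the `per_cluster` and `seed` parameters, and the prompt wording are all assumptions made for illustration, and the call to GPT-3 itself is left out.

```python
import random
from collections import defaultdict


def sample_subjects(subjects, labels, per_cluster=5, seed=0):
    """Group subject lines by cluster label and sample a few from each.

    Hypothetical helper, not taken from bench.py.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_cluster = defaultdict(list)
    for subject, label in zip(subjects, labels):
        by_cluster[label].append(subject)
    # Small clusters contribute everything they have.
    return {
        label: rng.sample(items, min(per_cluster, len(items)))
        for label, items in sorted(by_cluster.items())
    }


def build_label_prompt(sampled_subjects):
    """Build a prompt asking a language model for a short cluster label.

    The wording is an assumption, not the prompt bench.py sends.
    """
    lines = "\n".join(f"- {s}" for s in sampled_subjects)
    return (
        "These email subject lines come from one cluster:\n"
        f"{lines}\n"
        "Give a short label (a few words) describing the cluster."
    )
```

With `subjects` as the list of subject lines and `labels` as the cluster id assigned to each one, `build_label_prompt(sample_subjects(subjects, labels)[0])` would give the text to send for cluster 0. Sampling keeps the prompt short even when a cluster holds thousands of emails.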

viz.py

Visualization helper. This file helps you go from "a list of embeddings" to "something pretty to look at".

TODO:

Make longer inputs work by chunking them and averaging the resulting embeddings.
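
The chunk-and-average idea in this TODO could look roughly like the sketch below. Nothing here exists in the repo yet: `chunk_text`, `embed_long_text`, and the `embed_fn` callback (any function that maps a string to a list of floats) are hypothetical names, and splitting on whitespace is a stand-in for a real token limit, which depends on the embedder.

```python
def chunk_text(text, max_words=200):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def embed_long_text(text, embed_fn, max_words=200):
    """Embed each chunk with embed_fn and return the weighted mean vector.

    embed_fn is an assumed callback: str -> list of floats.
    Chunks are weighted by word count so a short final chunk
    does not count as much as a full one.
    """
    chunks = chunk_text(text, max_words)
    if not chunks:
        raise ValueError("cannot embed empty text")
    vectors = [embed_fn(chunk) for chunk in chunks]
    weights = [len(chunk.split()) for chunk in chunks]
    total = sum(weights)
    dim = len(vectors[0])
    return [
        sum(w * v[i] for w, v in zip(weights, vectors)) / total
        for i in range(dim)
    ]
```

Weighting by chunk length is one design choice among several; a plain unweighted mean is simpler and may work just as well when most chunks are full length.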
