BOW

Bags of identifiers generated from the 140,000 most-starred projects on GitHub as of October 2016 (about 112,000 repositories after deduplication).

Example:

from sourced.ml.models import BOW

# Load the model by its ID.
bow = BOW().load("1e3da42a-28b6-4b33-94a2-a5671f4102f4")
print("Number of documents:", len(bow))
print("Number of tokens:", len(bow.tokens))
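
For readers unfamiliar with the term, the following is a minimal sketch of what a bag of identifiers looks like for a single document: identifiers are split into lowercase subtokens and counted. The splitting rule (underscores and camelCase boundaries) and the helper names `split_identifier` and `bag_of_identifiers` are assumptions made for illustration; they are not taken from sourced.ml and may differ from how this model was actually built.

```python
import re
from collections import Counter


def split_identifier(identifier):
    """Split an identifier into lowercase subtokens.

    Hypothetical rule: break on non-letter characters and on
    camelCase boundaries, keeping acronyms such as "HTTP" whole.
    """
    parts = []
    for chunk in re.split(r"[^A-Za-z]+", identifier):
        # Either an acronym not followed by a lowercase letter,
        # or an optionally capitalized lowercase word.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", chunk))
    return [part.lower() for part in parts]


def bag_of_identifiers(identifiers):
    """Count subtoken occurrences across all identifiers of one document."""
    return Counter(
        token for identifier in identifiers for token in split_identifier(identifier)
    )


bag = bag_of_identifiers(["readFile", "read_line", "HTTPServer"])
print(bag.most_common())
```

In this toy document, the subtoken `read` is counted twice because it occurs in two identifiers; the order of subtokens is discarded, which is what makes the representation a bag.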

References

ID: 1e3da42a-28b6-4b33-94a2-a5671f4102f4
Uploaded: 2017-06-19 09:16:08.942880
Version: 1.0.0
File: https://storage.googleapis.com/models.cdn.sourced.tech/models%2Fbow%2F1e3da42a-28b6-4b33-94a2-a5671f4102f4.asdf
Size: 380.8 MB
Data collection date: October 2016
Number of (sub)tokens: 999,424
Number of repositories: 112,273
License

Dependencies