Dockerfiles

This is the Dockerfiles Dinosaur Dataset.

Generation

The set of words in search-terms.json is read and parsed for prefixes. Each prefix is used to search for Docker containers (using the docker command line tool with a limit of 100 results) to find unique images that match. We repeat this search, saving the results to pickle files, until all terms are used. This happens in the script 0.find-container.py.
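The search step above could be sketched roughly as follows. This is an illustration, not the contents of the actual script: the prefix length of 3, the assumption that search-terms.json holds a flat list of strings, the helper names, and the one-pickle-per-prefix file naming are all hypothetical. The only external call is `docker search` with its `--limit` and `--format` options.

```python
import json
import pickle
import subprocess
from pathlib import Path


def prefixes(words, length=3):
    """Return the sorted unique lowercase prefixes of the words.

    The prefix length is an assumption; the source does not state it.
    """
    return sorted({word[:length].lower() for word in words if word})


def search_images(term, limit=100):
    """Run `docker search` for one term and return the image names it prints."""
    result = subprocess.run(
        ["docker", "search", "--limit", str(limit), "--format", "{{.Name}}", term],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def find_containers(terms_file="search-terms.json", out_dir="results"):
    """Search every prefix and save each result list to its own pickle file."""
    with open(terms_file) as handle:
        words = json.load(handle)  # assumed: a flat JSON list of strings

    out_path = Path(out_dir)  # hypothetical output folder
    out_path.mkdir(exist_ok=True)

    seen = set()
    for term in prefixes(words):
        # Keep only images not already found by an earlier prefix.
        new_images = [name for name in search_images(term) if name not in seen]
        seen.update(new_images)
        with open(out_path / f"{term}.pkl", "wb") as handle:
            pickle.dump(new_images, handle)
    return seen
```

Calling `find_containers()` requires the docker command line tool to be installed. Tracking `seen` across prefixes is one way to keep the saved image lists unique.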
Once we have many pickle files, each holding a list of containers, 1.download_dockerfiles.py issues web requests to download the Dockerfiles. The files are stored under data in lettered folders named for the first letter of the username, with nested subfolders for the username and then the reponame.
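The folder layout described above can be illustrated with a small path helper. This is a sketch under assumptions: the function name, the lowercasing of the first letter, and the saved file name `Dockerfile` are hypothetical, and image names are assumed to have the form `username/reponame`.

```python
from pathlib import Path


def dockerfile_path(image_name, root="data"):
    """Map 'username/reponame' to <root>/<letter>/<username>/<reponame>/Dockerfile."""
    # Assumption: every image name contains a username before the slash.
    username, reponame = image_name.split("/", 1)
    letter = username[0].lower()  # lettered folder from the username's first letter
    return Path(root) / letter / username / reponame / "Dockerfile"


print(dockerfile_path("alice/webapp").as_posix())
# data/a/alice/webapp/Dockerfile
```

Grouping by first letter keeps any single folder from holding every username at once, which makes the tree easier to browse.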
