This is the Dockerfiles Dinosaur Dataset.
- The words in `search-terms.json` are read and parsed into prefixes. Each prefix is used to search for Docker containers with the `docker` command-line tool (`docker search`, limited to 100 results per term) to find unique matching images. The results are saved to pickle files, and searching continues until every term has been used. This step is implemented in the script `0.find_containers.py`.
- Once the pickle files (each holding a list of containers) have been collected, `1.download_dockerfiles.py` issues web requests to download the Dockerfiles. The data is stored under `data/` in lettered folders named by the first letter of the username, each containing subfolders for the username and then the repository name (`data/<letter>/<username>/<reponame>`).
- v1.0.0 includes 129,519 Dockerfiles, also detailed on Dinosaur Datasets.
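The search step in `0.find_containers.py` can be sketched roughly as follows. This is an illustration only, not the script itself. It assumes that `search-terms.json` is a JSON list of words and that "prefix" means the first few letters of each word. `prefix_length`, the `containers-<prefix>.pkl` file names and the skip-if-already-saved behaviour are all hypothetical choices. The `docker search --limit 100 --format "{{.Name}}"` call is a real CLI invocation, and the search function can be swapped out so the logic runs without Docker.

```python
import json
import os
import pickle
import subprocess
from typing import Callable, Iterable, List, Set


def load_prefixes(path: str, prefix_length: int = 3) -> List[str]:
    """Read search terms (assumed: a JSON list of words) and return unique,
    sorted prefixes. prefix_length is a hypothetical parameter."""
    with open(path) as fh:
        words = json.load(fh)
    return sorted({w.lower()[:prefix_length] for w in words if w})


def docker_search(term: str, limit: int = 100) -> List[str]:
    """Return image names reported by `docker search`, capped at `limit`."""
    result = subprocess.run(
        ["docker", "search", "--limit", str(limit), "--format", "{{.Name}}", term],
        capture_output=True, text=True, check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def find_containers(prefixes: Iterable[str], outdir: str,
                    search: Callable[[str], List[str]] = docker_search) -> Set[str]:
    """Search every prefix, pickle the images not seen before, and return
    the set of all unique images found."""
    os.makedirs(outdir, exist_ok=True)
    seen: Set[str] = set()
    for prefix in prefixes:
        out_file = os.path.join(outdir, f"containers-{prefix}.pkl")  # hypothetical name
        if os.path.exists(out_file):
            # Assumed resume behaviour: reuse results already saved for this term.
            with open(out_file, "rb") as fh:
                seen.update(pickle.load(fh))
            continue
        images = [name for name in search(prefix) if name not in seen]
        seen.update(images)
        with open(out_file, "wb") as fh:
            pickle.dump(images, fh)
    return seen
```

Each pickle stores only the images that earlier terms did not already return. The union of all pickles is therefore the set of unique images.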
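The download step in `1.download_dockerfiles.py` and its `data/<letter>/<username>/<reponame>` layout might look something like the following sketch. The README does not say which URLs are requested. `RAW_URL`, which points at a `Dockerfile` on the `master` branch of a same-named GitHub repository, is purely a hypothetical assumption, as is saving each file as `Dockerfile`. The fetch function can be passed in so the layout logic can be exercised offline.

```python
import os
import pickle
import urllib.request
from typing import Callable, Iterable, Optional

# Hypothetical URL template; the real script's request target is not documented.
RAW_URL = "https://raw.githubusercontent.com/{user}/{repo}/master/Dockerfile"


def dockerfile_dir(data_root: str, image: str) -> str:
    """Map "user/repo" to <data_root>/<first letter of user>/<user>/<repo>."""
    user, repo = image.split("/", 1)
    return os.path.join(data_root, user[0], user, repo)


def fetch(url: str) -> Optional[str]:
    """Return the response body as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # URLError, HTTPError and timeouts all subclass OSError
        return None


def download_dockerfiles(pickle_files: Iterable[str], data_root: str,
                         fetch_fn: Callable[[str], Optional[str]] = fetch) -> int:
    """Download a Dockerfile for every image listed in the pickle files and
    return how many were saved."""
    saved = 0
    for pkl in pickle_files:
        with open(pkl, "rb") as fh:
            images = pickle.load(fh)
        for image in images:
            if "/" not in image:  # official images such as "nginx" have no username
                continue
            user, repo = image.split("/", 1)
            text = fetch_fn(RAW_URL.format(user=user, repo=repo))
            if text is None:
                continue  # skip images whose Dockerfile could not be fetched
            outdir = dockerfile_dir(data_root, image)
            os.makedirs(outdir, exist_ok=True)
            with open(os.path.join(outdir, "Dockerfile"), "w") as fh:
                fh.write(text)
            saved += 1
    return saved
```

Under these assumptions, an image such as `vsoch/salad` would land in `data/v/vsoch/salad/Dockerfile`. Grouping by first letter keeps any single directory from holding every username.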