Skip to content
This repository has been archived by the owner on Aug 27, 2019. It is now read-only.
/ calc-analysis Public archive

Data science experiments and analysis regarding the CALC project.

License

Notifications You must be signed in to change notification settings

18F/calc-analysis

Repository files navigation

This repository contains analysis and science experiments regarding CALC data, mostly in the form of IPython/Jupyter notebooks.

Reading the notebooks

If you only want to read the notebooks, you can actually just use GitHub--look for the files ending in .ipynb. Here's a few to get you started:

  • Anomaly detection can help us find invalid/incorrect data in CALC.

  • Fun with CALC and pre-trained GloVe word embeddings can help us perform semantic search in CALC, e.g. searching for "tutor" and getting results like "Principal Instruction Technologist".

  • Log analysis contains some open-ended exploration of CALC's API logs, to learn how people are using CALC (note that CALC's API is used by CALC's front page).

  • Contract analysis contains open-ended exploration of CALC's contract data.

  • SIN analysis attempts to parse and make sense of CALC's unstructured Special Item Number (SIN) data.

Interacting with the notebooks

If you want to interactively explore the notebooks or create new ones, you'll need to clone this repository locally.

This project's git repository uses Git Large File Storage to store some of our large data assets. Please install it before cloning. (If you've already cloned the repository, you can obtain the large files after installing Git LFS by running git lfs pull.)

If you're familiar with virtualenvs and pip, you can use them to install all dependencies:

virtualenv venv

# On Windows, replace the following line with 'venv\Scripts\activate'.
source venv/bin/activate

pip install -r requirements.txt

On the other hand, if you're not familiar with virtualenvs, you can probably just install Anaconda, as it ships with most/all the dependencies you'll need.

Once you've done either of those, just run jupyter notebook in the root directory of the repository and use its web interface to explore any of the notebooks.

About

Data science experiments and analysis regarding the CALC project.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages