GitHub - bws0013/retrosheet-to-sql: A python 3 tool to add retrosheet data into a mysql database for later analysis

Last Big Edit Date

11/04/20

Description

Read in retrosheet data and add it to a database for easier access.

Roadmap

Current goals include

Skipping converting data to json before adding it to db.
Improving how games are determined to have already been read in through the database
Clean up code base to remove unneeded files.
Automate more of the run instructions

My Run Environment

$ python --version
Python 3.8.6

$ pip --version
pip 20.2.3 from /usr/local/lib/python3.8/site-packages/pip (python 3.8)

$ mysql --version
mysql  Ver 8.0.21 for osx10.13 on x86_64 (Homebrew)

$ bash --version
GNU bash, version 3.2.57(1)-release (x86_64-apple-darwin17)
Copyright (C) 2007 Free Software Foundation, Inc.

Run Instructions

When running on a linux box I needed used root the entire time
Make sure you have a similar Run Environment to what I have above
- Python3, Pip3, Mysql, Bash
- Note that when installing and setting up mysql you should set the root password as password unless you want to change it in db.py
Download dependencies per requirements.txt.
- pip install -r requirements.txt
Create a mysql database called baseballdata
- CREATE DATABASE baseballdata;
Run the setup.sh script to create the required directories
- bash setup.sh
- Optionally these can be created manually
Change directories in the code directory cd code
Change the year in main.py or keep it where it is (2016)
Run main.py
- python main.py
Access the database and use this data
- Login to the mysql database sudo mysql -uroot -p
- Select the baseballdata database with use baseballdata;
Make queries against the data in mysql

Run Notes

The only files that can be read right now are the "Regular Season Event Files" which represent a large (though not complete) set of the retrosheet data. This limitation is due to how data files are downloaded from the retrosheet site and is a limitation I am looking to solve for.

The check for if one or more games has been read in currently works by taking all game ids from the database before any others are added. If the new game ids being read in are a subset of the existing game ids the file is not read in. This has some problems in the event a file was only partially read in and is something I will look to improve.

Notes on files

The sql_tables.sql file contains the tables that are present in the database.

The structure.json file contains an example of what the json object looks like before its added to the database.

The parse_guide.md file is an explanation of data in each field. This was a modified guide from the better ones found on retrosheets.

As far as code goes the code/main.py file is the primary file to run. It will take a take in a year, download the zip file for the games of that year, extract the files, convert them into json objects, then send the json data to mysql.

There is a config file code/config.ini that I plan to utilize more later on. I would not recommend changing its values unless you know what you are doing.

Required Notice Related To Data Usage

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Requirement found here.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Last Big Edit Date

Description

Roadmap

My Run Environment

Run Instructions

Run Notes

Notes on files

Required Notice Related To Data Usage

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
code		code
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
parse_guide.md		parse_guide.md
requirements.txt		requirements.txt
setup.sh		setup.sh
sql_tables.sql		sql_tables.sql
structure.json		structure.json

License

bws0013/retrosheet-to-sql

Folders and files

Latest commit

History

Repository files navigation

Last Big Edit Date

Description

Roadmap

My Run Environment

Run Instructions

Run Notes

Notes on files

Required Notice Related To Data Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages