degiz/crate_test_repo

Test assignment for CrateDB

Thanks a lot for the testing task. I had a lot of fun working on it!

Installation

I used Python 3 with a virtual environment. The only external dependency is the crate library.

python3 -m venv ./venv
source ./venv/bin/activate
pip install crate

Running the code

The database connection is configured in config.ini.

I'm assuming that CrateDB is running in Docker on port 4200 and that the default user "crate" can be used. The dataset is also included in the Git repository.
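As a rough illustration of how such a configuration could be read, here is a minimal sketch. The section name `cratedb` and the keys `host`, `port` and `username` are assumptions made for this example and may not match the actual config.ini in the repository. The `connect` helper uses `crate.client.connect` from the crate library and is only defined here, not called.

```python
import configparser

# Hypothetical config.ini contents; the real file's section and key names may differ.
SAMPLE_CONFIG = """
[cratedb]
host = localhost
port = 4200
username = crate
"""


def load_settings(text):
    """Parse INI text and return the connection URL and user name."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["cratedb"]
    host = section.get("host", fallback="localhost")
    port = section.getint("port", fallback=4200)
    return {
        "url": f"http://{host}:{port}",
        "username": section.get("username", fallback="crate"),
    }


def connect(settings):
    """Open a connection to CrateDB using the parsed settings."""
    # Imported lazily so the parsing code works without the crate package installed.
    from crate import client
    return client.connect(settings["url"], username=settings["username"])


settings = load_settings(SAMPLE_CONFIG)
print(settings["url"])
```

With a real file, `parser.read("config.ini")` would replace `read_string`.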

To run the script, use the following command:

python3 main.py -c config.ini -i ./GTFS

Some of the dataset files have over 4M lines to insert, which is quite a task for a relational DB.

On my machine with a 3.5 GHz dual-core Intel Core i7, I used a thread pool with 4 workers and got the following results:

python3 main.py -c config.ini -i ./GTFS  58.78s user 1.79s system 23% cpu 4:12.42 total

To be able to add the dataset to GitHub, I had to truncate shapes.txt and stop_times.txt. If you want to test the original dataset, feel free to pass its path with the "-i" option.

If I were to continue improving the performance, I'd focus on splitting huge files (over 1M lines) into smaller portions and processing them in parallel.
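The chunking idea could look roughly like the sketch below. It is not the code in main.py: `iter_chunks`, `insert_chunk` and `load_in_parallel` are hypothetical names, and `insert_chunk` only counts rows where a real version would run a bulk insert (for example `cursor.executemany`). The sample data is made up for the demonstration.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def iter_chunks(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def insert_chunk(chunk):
    """Stand-in for a bulk insert; it only reports how many rows it received."""
    return len(chunk)


def load_in_parallel(lines, chunk_size=2, workers=4):
    """Read CSV lines, split the rows into chunks and hand them to a thread pool."""
    reader = csv.reader(lines)
    header = next(reader)  # the first line holds the column names
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(pool.map(insert_chunk, iter_chunks(reader, chunk_size)))
    return header, sum(counts)


# Tiny made-up sample standing in for a large GTFS file.
sample = io.StringIO("stop_id,stop_name\n1,A\n2,B\n3,C\n4,D\n5,E\n")
header, inserted = load_in_parallel(sample)
print(header, inserted)
```

One caveat: `pool.map` submits every chunk up front, so for files with millions of lines a bounded queue or batch-wise submission would keep memory use in check.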
