Version 1.0.0: Tagr Data Science Experimentation Library

Latest

ericlee0112 released this 21 Mar 21:23

· 70 commits to master since this release

4341648

What is Tagr?

A cloud-agnostic data science productivity tool that will:

help streamline the data science experimentation process
allow data scientists to manage models and experiment data
integrate with different cloud storage providers. As of v1.0.0, Amazon S3 is supported

Instructions

Import tagr

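from tagr.tagging.artifacts import Tags
from tagr.config import EXP_OBJECTS, OBJECTS

# Create the tagger used as `tag` in the steps below (assumes Tags() takes no arguments)
tag = Tags()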

After building your model and performing exploratory data analysis of your dataset, tag your training/testing/prediction datasets and model

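from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression

# mock_df1, mock_df2 and mock_df3 stand in for your own datasets
x = tag.save(mock_df1, "X_train", "int")
y = tag.save(mock_df2, "y_train")
model = tag.save(RandomForestClassifier(max_depth=30), "model")
lin_model = tag.save(LinearRegression(), "linmodel", "model")
y_pred = tag.save(mock_df3, "y_pred")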

View what artifacts you have tagged so far

tag.inspect()
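
To make the idea of tagging concrete, here is a minimal sketch of an in-memory tag registry. It is not Tagr's implementation: `MiniTags`, the `Artifact` record, and the summary format returned by `inspect` are hypothetical stand-ins chosen for illustration. The sketch also assumes that `save` hands back the object it was given, which the assignments above suggest but do not confirm.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Artifact:
    """One tagged object plus an optional user-supplied type label."""
    obj: Any
    dtype: Optional[str] = None


class MiniTags:
    """Hypothetical stand-in for a tag registry (not Tagr's real class)."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, Artifact] = {}

    def save(self, obj: Any, name: str, dtype: Optional[str] = None) -> Any:
        # Record the object under its name and return it unchanged,
        # so the call can wrap an assignment: x = tags.save(data, "X_train")
        self._artifacts[name] = Artifact(obj, dtype)
        return obj

    def inspect(self) -> Dict[str, str]:
        # Summarise each artifact by its label, falling back to the Python type name.
        return {
            name: art.dtype or type(art.obj).__name__
            for name, art in self._artifacts.items()
        }


tags = MiniTags()
x = tags.save([[1, 2], [3, 4]], "X_train", "dataframe")
y = tags.save([0, 1], "y_train")
summary = tags.inspect()
print(summary)
```

Keeping every artifact under a single name means that saving again with the same name overwrites the earlier entry, which is one simple way to keep an experiment's registry free of stale objects.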

Push all your tagged artifacts to the storage of your choice, either a cloud provider or your local machine

# s3
tag.flush('waterflow-tagr', 'dev/eric', 'aws', 'demo')

# local
tag.flush('waterflow-tagr', 'eric', 'demo')
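
As a rough illustration of what a local flush could do, the sketch below pickles each tagged object into a `project/experiment/` directory. The `flush_local` helper, the directory layout, and the `.pkl` file naming are assumptions made for this example; they are not taken from Tagr's source and may differ from what `tag.flush` actually writes.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def flush_local(
    artifacts: Dict[str, Any], root: Path, project: str, experiment: str
) -> List[Path]:
    """Pickle every artifact to root/project/experiment/<name>.pkl (hypothetical layout)."""
    target = root / project / experiment
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, obj in artifacts.items():
        path = target / f"{name}.pkl"
        with path.open("wb") as fh:
            pickle.dump(obj, fh)
        written.append(path)
    return written


# Plain Python objects stand in for tagged datasets.
artifacts = {"X_train": [[1, 2], [3, 4]], "y_train": [0, 1]}

with tempfile.TemporaryDirectory() as tmp:
    paths = flush_local(artifacts, Path(tmp), "eric", "demo")
    names = sorted(p.name for p in paths)
    # Read the first artifact back to confirm the round trip.
    with paths[0].open("rb") as fh:
        restored = pickle.load(fh)
```

Grouping files by project and experiment keeps separate runs from overwriting each other, and the same key structure would map naturally onto object-store prefixes.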
