Library for ML feature type inference: https://github.com/pvn25/MLDataPrepZoo/tree/master/MLFeatureTypeInference
Due to git-lfs limits, the resource files have been moved to: https://drive.google.com/drive/folders/1eC8F5pO2hSoQf4RQM7zww49y2ZbLIvqG
By default, these resources are downloaded automatically the first time you run the program. If that does not work for some reason, you can download them manually from the link above.
- Install the package using pip:
git clone https://github.com/pvn25/SortingHatLib.git
pip install SortingHatLib/
- Import the library:
import sortinghat.pylib as pl
- Read in the CSV file using pandas:
import pandas as pd
dataDownstream = pd.read_csv('adult.csv')
- Perform base featurization of the raw CSV file:
dataFeaturized = pl.FeaturizeFile(dataDownstream)
- Extract bigram features for the Random Forest model:
dataFeaturized1 = pl.FeatureExtraction(dataFeaturized)
- Finally, load the Random Forest model and run the prediction:
y_RF = pl.Load_RF(dataFeaturized1)
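
The bigram extraction step above is not described in detail here, so the following is only a rough illustration of the general idea, not the library's implementation. It assumes character-level bigrams computed over short strings such as column names. The helper names `char_bigrams` and `bigram_counts` and the sample column names are hypothetical and are not part of `sortinghat.pylib`.

```python
from collections import Counter


def char_bigrams(text):
    """Return the overlapping two-character substrings of `text`, in order."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_counts(names):
    """Map each name to a Counter of its lowercase character bigrams."""
    return {name: Counter(char_bigrams(name.lower())) for name in names}


# Hypothetical column names, used only for illustration.
columns = ["age", "zipcode"]
counts = bigram_counts(columns)

print(char_bigrams("age"))       # ['ag', 'ge']
print(sorted(counts["zipcode"]))  # the distinct bigrams found in "zipcode"
```

Counts like these can be turned into a fixed-length numeric vector per column, which is the kind of input a Random Forest classifier expects. How the library actually tokenizes and vectorizes its inputs may differ from this sketch.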