forked from rapidsai/cuml
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request rapidsai#230 from tfeher/fea-ext-svm-notebook
Add SVM demo notebook
- Loading branch information
Showing
2 changed files
with
205 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,203 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Support Vector Machine\n", | ||
"Support vector machines are supervised machine learning methods that can be used for classification and regression. Currently cuML supports binary classification.\n", | ||
"\n", | ||
"The SVC classifier can take array-like objects, either in host as NumPy arrays or in device (as Numba or cuda_array_interface-compliant), as well as cuDF DataFrames/Series as the input.\n", | ||
"\n", | ||
"For information on converting your dataset to cuDF documentation: https://docs.rapids.ai/api/cudf/stable/\n", | ||
"\n", | ||
"For more information about cuML's Support Vector Classifier: https://docs.rapids.ai/api/cuml/stable/" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import numpy as np\n", | ||
"import cuml.svm \n", | ||
"import sklearn.svm\n", | ||
"\n", | ||
"from sklearn.datasets.samples_generator import make_gaussian_quantiles\n", | ||
"from sklearn.model_selection import train_test_split" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Generate data" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"n_samples = 20000\n", | ||
"n_features = 200\n", | ||
"\n", | ||
"X, y = make_gaussian_quantiles(n_samples=n_samples, n_features=n_features, n_classes=2)\n", | ||
"\n", | ||
"#X, y = make_classification(\n", | ||
"# n_rows, n_cols, n_informative=2, n_redundant=0,\n", | ||
"# n_classes=n_classes, n_clusters_per_class=2, shuffle=False)\n", | ||
"\n", | ||
"X_train, X_test, y_train, y_test = train_test_split(X, y)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Define parameters" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"C = 1\n", | ||
"tol = 1e-3\n", | ||
"kernel = 'rbf'\n", | ||
"gamma = 'scale'" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## cuML Model" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"cumlSVC = cuml.svm.SVC(kernel=kernel, C=C, tol=tol, gamma=gamma)\n", | ||
"cumlSVC.fit(X_train, y_train)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Scikit-learn Model" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"sklSVC = sklearn.svm.SVC(kernel=kernel, C=C, tol=tol, gamma=gamma)\n", | ||
"sklSVC.fit(X_train, y_train)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Prediction" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"cuml_pred = cumlSVC.predict(X_test)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"skl_pred = sklSVC.predict(X_test)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Compare Accuracy" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cuml_accuracy = np.sum(cuml_pred.to_array()==y_test) / y_test.shape[0] * 100\n", | ||
"skl_accuracy = np.sum(skl_pred==y_test) / y_test.shape[0] * 100\n", | ||
"print(\"Accuracy: cumlSVC {}%, sklSVC {}%\".format(cuml_accuracy, skl_accuracy))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Notes\n", | ||
"- The time measurements will be inaccurate for the first run. You can re-run the cells to get a better estimate of the execution time.\n", | ||
"\n", | ||
"- Currently the output of the prediction is a cuDF Series object. You can use the `to_array()` method to create a numpy array.\n", | ||
"\n", | ||
"- The training algorithm uses a cache in GPU memory to accelerate training. You can specify the size (in MiB) using the cache_size argument. This is more relevant for training with larger input size.\n", | ||
"\n", | ||
"- Similar to other cuML algorithms, cuML SVC is optimized both for single and double precision input data. If your problem allows it, then using single precision input can improve the execution time" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"%%time\n", | ||
"cumlSVC = cuml.svm.SVC(kernel=kernel, C=C, tol=tol, gamma='scale', cache_size=2000)\n", | ||
"cumlSVC.fit(X_train.astype(np.float32), y_train)" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |