
Add SGD #118

Merged

Conversation

nineil
Contributor

@nineil nineil commented Jun 9, 2014

SGDClassifier and SGDRegressor added.

The default parameters are:

'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01], 'penalty': ['l1', 'l2', 'elasticnet']
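This parameter grid is the kind of search space typically handed to scikit-learn's grid search. As a minimal sketch (the toy data and variable names here are illustrative, not from the PR), the grid from this PR could be exercised like this:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# The default grid proposed in this PR: 5 alpha values x 3 penalties.
param_grid = {
    'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
    'penalty': ['l1', 'l2', 'elasticnet'],
}

# Toy linearly separable data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

search = GridSearchCV(SGDClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
best_alpha = search.best_params_['alpha']
best_penalty = search.best_params_['penalty']
```

Each of the 15 combinations is cross-validated and the best-scoring one is refit on the full training set.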

@dan-blanchard dan-blanchard self-assigned this Jun 9, 2014
@dan-blanchard dan-blanchard added this to the 1.0 milestone Jun 9, 2014
@dan-blanchard
Contributor

I should probably have mentioned this before, but we strictly follow the PEP8 style guidelines for our code. I know you're using PyCharm, and there's supposed to be a way to enable PEP8 checking while you're editing.

@@ -35,6 +35,7 @@
from sklearn.preprocessing import StandardScaler
from sklearn.svm.base import BaseLibLinear
from sklearn.utils import shuffle as sk_shuffle

Contributor

No need for this extra blank line here.

@dan-blanchard
Contributor

To ensure that there are unit tests that use SGDClassifier and SGDRegressor, please add SGDRegressor and RescaledSGDRegressor to tests/configs/test_regression1.template.cfg, and SGDClassifier to tests/configs/test_sparse.template.cfg

I also just realized you haven't added a rescaled version of SGDRegressor to learner.py (because I forgot to tell you to 😄). You'll just want to add the following three lines around line 340 or so:

@rescaled
class RescaledSGDRegressor(SGDRegressor):
    pass

All that's different with the rescaled versions of the regressors is that the predictions are rescaled and constrained to better match the training set.
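To make the rescaling idea concrete, here is a minimal, self-contained sketch of one way such a wrapper could work; the class name `SimpleRescaledSGDRegressor` and the exact rescaling logic are assumptions for illustration, not SKLL's actual `@rescaled` implementation:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

class SimpleRescaledSGDRegressor(SGDRegressor):
    """Illustrative sketch (not SKLL's @rescaled decorator): shift/scale
    predictions to match the training targets' distribution, and clip
    them to the training targets' observed range."""

    def fit(self, X, y, **kwargs):
        # Remember the training-target distribution and range.
        self._y_mean, self._y_std = np.mean(y), np.std(y)
        self._y_min, self._y_max = np.min(y), np.max(y)
        super().fit(X, y, **kwargs)
        # Remember the distribution of the model's own training predictions.
        train_preds = super().predict(X)
        self._p_mean, self._p_std = np.mean(train_preds), np.std(train_preds)
        return self

    def predict(self, X):
        preds = super().predict(X)
        # Standardize against training-set predictions, then rescale to
        # the training-target distribution and constrain to its range.
        rescaled = (preds - self._p_mean) / self._p_std * self._y_std + self._y_mean
        return np.clip(rescaled, self._y_min, self._y_max)

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.randn(200) * 0.1
model = SimpleRescaledSGDRegressor(random_state=0).fit(X, y)
preds = model.predict(X)
```

The clipping step is what "constrained to better match the training set" refers to: predictions can never fall outside the range of targets seen during training.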

@dan-blanchard dan-blanchard mentioned this pull request Jun 9, 2014
@dan-blanchard
Contributor

Not that it has to be done for this PR, but one thing we also talked about as necessary for really being able to use SGDClassifier is the need for kernel approximation support. I imagine we will implement this by adding another field to the config files that specifies what type of Sampler you would like to use, and then just making that transformation get applied after self.feat_selector.fit_transform() is called in Learner.train().
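The kernel-approximation idea above can be sketched with scikit-learn's existing samplers; this is a hypothetical illustration of the proposed flow (approximate kernel map, then linear SGD on the transformed features), not code from this PR:

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

# A problem that is not linearly separable in the original space:
# points inside vs. outside the unit circle.
rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)

# Approximate an RBF kernel feature map, then train a linear SGD model
# on the mapped features, mirroring "apply the Sampler transformation
# after feat_selector.fit_transform()" described above.
sampler = RBFSampler(gamma=1.0, n_components=100, random_state=0)
X_mapped = sampler.fit_transform(X)
clf = SGDClassifier(random_state=0).fit(X_mapped, y)
accuracy = clf.score(X_mapped, y)
```

Because the mapped features make the problem approximately linear, the cheap linear SGD model can stand in for a full kernel SVM on large datasets.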

@@ -6,7 +6,7 @@ task=evaluate
train_location=
test_location=
featuresets=[["test_sparse"]]
learners=['LogisticRegression', 'LinearSVC', 'SVC', 'MultinomialNB', 'DecisionTreeClassifier', 'RandomForestClassifier', 'GradientBoostingClassifier']
learners=['LogisticRegression', 'LinearSVC', 'SVC', 'MultinomialNB', 'DecisionTreeClassifier', 'RandomForestClassifier', 'GradientBoostingClassifier', 'SGDClassifier'']
Contributor

You've got an extra ' at the end of this line before the ] that is causing this unit test to fail. Take that out and everything should be fine, I think.

Contributor Author

I have corrected that error.

Hope it works now.

@@ -427,13 +444,13 @@ def __init__(self, model_type, probability=False, feature_scaling='none',
elif self._model_type == 'SVR':
self._model_kwargs['cache_size'] = 1000
self._model_kwargs['kernel'] = 'linear'

Contributor

It looks like somewhere around here you're going to want to set self._model_kwargs['loss'] to either 'log' or 'modified_huber' for SGDClassifier, because getting probabilities is only supported with those loss types, and we typically want the prediction probabilities. It's currently failing unit tests for this reason, as you can see here.

Contributor Author

OK.

Added. Our default loss function will now be 'log'.
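For context on why the loss matters: predict_proba is only defined for the probabilistic losses. A minimal standalone sketch (toy data; 'modified_huber' is used here since it is one of the two losses the review names that support probabilities):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy binary-classification data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.randn(120, 4)
y = (X[:, 0] > 0).astype(int)

# With a probabilistic loss ('log'/'modified_huber'), predict_proba works;
# with the default hinge loss it raises instead.
clf = SGDClassifier(loss='modified_huber', random_state=0).fit(X, y)
probs = clf.predict_proba(X)
```

Each row of `probs` holds the class probabilities for one sample and sums to 1.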

dan-blanchard added a commit that referenced this pull request Jun 12, 2014
@dan-blanchard dan-blanchard merged commit 3f25150 into EducationalTestingService:develop Jun 12, 2014