
Add SGD #118

Merged

Conversation

nineil
Contributor

@nineil nineil commented Jun 9, 2014

SGDClassifier and SGDRegressor added.

The default parameters are:

'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01], 'penalty': ['l1', 'l2', 'elasticnet']
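This parameter grid is the kind of search space typically handed to scikit-learn's grid search. As a minimal sketch (the toy data and variable names here are illustrative, not from the PR), the grid from this PR could be exercised like this:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# The default grid proposed in this PR: 5 alpha values x 3 penalties.
param_grid = {
    'alpha': [0.000001, 0.00001, 0.0001, 0.001, 0.01],
    'penalty': ['l1', 'l2', 'elasticnet'],
}

# Toy linearly separable data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.randn(100, 5)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

search = GridSearchCV(SGDClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
best_alpha = search.best_params_['alpha']
best_penalty = search.best_params_['penalty']
```

Each of the 15 combinations is cross-validated and the best-scoring one is refit on the full training set.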

@dan-blanchard dan-blanchard self-assigned this Jun 9, 2014
@dan-blanchard dan-blanchard added this to the 1.0 milestone Jun 9, 2014
@dan-blanchard
Contributor

I should probably have mentioned this before, but we strictly follow the PEP8 style guidelines for our code. I know you're using PyCharm, and there's supposed to be a way to enable PEP8 checking while you're editing.

@@ -35,6 +35,7 @@
from sklearn.preprocessing import StandardScaler
from sklearn.svm.base import BaseLibLinear
from sklearn.utils import shuffle as sk_shuffle

Contributor

No need for this extra blank line here.

@dan-blanchard
Contributor

To ensure that there are unit tests that use SGDClassifier and SGDRegressor, please add SGDRegressor and RescaledSGDRegressor to tests/configs/test_regression1.template.cfg, and SGDClassifier to tests/configs/test_sparse.template.cfg

I also just realized you haven't added a rescaled version of SGDRegressor to learner.py (because I forgot to tell you to 😄). You'll just want to add the following three lines around line 340 or so:

@rescaled
class RescaledSGDRegressor(SGDRegressor):
    pass

All that's different with the rescaled versions of the regressors is that the predictions are rescaled and constrained to better match the training set.
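To make the rescaling idea concrete, here is a minimal, self-contained sketch of one way such a wrapper could work; the class name `SimpleRescaledSGDRegressor` and the exact rescaling logic are assumptions for illustration, not SKLL's actual `@rescaled` implementation:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

class SimpleRescaledSGDRegressor(SGDRegressor):
    """Illustrative sketch (not SKLL's @rescaled decorator): shift/scale
    predictions to match the training targets' distribution, and clip
    them to the training targets' observed range."""

    def fit(self, X, y, **kwargs):
        # Remember the training-target distribution and range.
        self._y_mean, self._y_std = np.mean(y), np.std(y)
        self._y_min, self._y_max = np.min(y), np.max(y)
        super().fit(X, y, **kwargs)
        # Remember the distribution of the model's own training predictions.
        train_preds = super().predict(X)
        self._p_mean, self._p_std = np.mean(train_preds), np.std(train_preds)
        return self

    def predict(self, X):
        preds = super().predict(X)
        # Standardize against training-set predictions, then rescale to
        # the training-target distribution and constrain to its range.
        rescaled = (preds - self._p_mean) / self._p_std * self._y_std + self._y_mean
        return np.clip(rescaled, self._y_min, self._y_max)

rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.randn(200) * 0.1
model = SimpleRescaledSGDRegressor(random_state=0).fit(X, y)
preds = model.predict(X)
```

The clipping step is what "constrained to better match the training set" refers to: predictions can never fall outside the range of targets seen during training.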

@dan-blanchard dan-blanchard mentioned this pull request Jun 9, 2014
@dan-blanchard
Contributor

Not that it has to be done for this PR, but one thing we also talked about as necessary for really being able to use SGDClassifier is the need for kernel approximation support. I imagine we will implement this by adding another field to the config files that specifies what type of Sampler you would like to use, and then just making that transformation get applied after self.feat_selector.fit_transform() is called in Learner.train().
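The kernel-approximation idea above can be sketched with scikit-learn's existing samplers; this is a hypothetical illustration of the proposed flow (approximate kernel map, then linear SGD on the transformed features), not code from this PR:

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

# A problem that is not linearly separable in the original space:
# points inside vs. outside the unit circle.
rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)

# Approximate an RBF kernel feature map, then train a linear SGD model
# on the mapped features, mirroring "apply the Sampler transformation
# after feat_selector.fit_transform()" described above.
sampler = RBFSampler(gamma=1.0, n_components=100, random_state=0)
X_mapped = sampler.fit_transform(X)
clf = SGDClassifier(random_state=0).fit(X_mapped, y)
accuracy = clf.score(X_mapped, y)
```

Because the mapped features make the problem approximately linear, the cheap linear SGD model can stand in for a full kernel SVM on large datasets.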

@@ -6,7 +6,7 @@ task=evaluate
train_location=
test_location=
featuresets=[["test_sparse"]]
learners=['LogisticRegression', 'LinearSVC', 'SVC', 'MultinomialNB', 'DecisionTreeClassifier', 'RandomForestClassifier', 'GradientBoostingClassifier']
learners=['LogisticRegression', 'LinearSVC', 'SVC', 'MultinomialNB', 'DecisionTreeClassifier', 'RandomForestClassifier', 'GradientBoostingClassifier', 'SGDClassifier'']
Contributor

You've got an extra ' at the end of this line before the ] that is causing this unit test to fail. Take that out and everything should be fine, I think.

Contributor Author

I have corrected that error.

Hope it works now.

@@ -427,13 +444,13 @@ def __init__(self, model_type, probability=False, feature_scaling='none',
elif self._model_type == 'SVR':
self._model_kwargs['cache_size'] = 1000
self._model_kwargs['kernel'] = 'linear'

Contributor

It looks like somewhere around here you're going to want to set self._model_kwargs['loss'] to either 'log' or 'modified_huber' for SGDClassifier, because getting probabilities is only supported with those loss types, and we typically want the prediction probabilities. It's currently failing unit tests for this reason, as you can see here.

Contributor Author

OK.

Added. Our default loss function will now be 'log'.
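For context on why the loss matters: predict_proba is only defined for the probabilistic losses. A minimal standalone sketch (toy data; 'modified_huber' is used here since it is one of the two losses the review names that support probabilities):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Toy binary-classification data, purely for illustration.
rng = np.random.RandomState(0)
X = rng.randn(120, 4)
y = (X[:, 0] > 0).astype(int)

# With a probabilistic loss ('log'/'modified_huber'), predict_proba works;
# with the default hinge loss it raises instead.
clf = SGDClassifier(loss='modified_huber', random_state=0).fit(X, y)
probs = clf.predict_proba(X)
```

Each row of `probs` holds the class probabilities for one sample and sums to 1.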

dan-blanchard added a commit that referenced this pull request Jun 12, 2014
@dan-blanchard dan-blanchard merged commit 3f25150 into EducationalTestingService:develop Jun 12, 2014