Error loading a saved LabelModel and using it to predict #1460

Closed
cdeepakroy opened this issue Sep 12, 2019 · 3 comments · Fixed by #1463

cdeepakroy commented Sep 12, 2019

Issue description

I wanted to save a label model trained in a Jupyter notebook and use it in standalone Python scripts elsewhere.

I used the snorkel.labeling.LabelModel.save() method to save the model. Then I loaded it into a new LabelModel instance with the snorkel.labeling.LabelModel.load() method. Loading succeeds, but calling predict() on the loaded model throws the following error:

AttributeError: 'LabelModel' object has no attribute 'c_tree'

Code example/repro steps

import numpy as np
import snorkel.labeling

# Random label matrix: 10**6 data points x 10 LFs, votes in {-1 (abstain), 0, 1}
L_train = np.random.randint(-1, 2, size=(10**6, 10), dtype=np.int8)

lm = snorkel.labeling.LabelModel()
lm.fit(L_train)

lm.save('label_model.pt')

lm2 = snorkel.labeling.LabelModel()  # fresh instance, never fit
lm2.load('label_model.pt')
lm2.predict(L_train)  # throws AttributeError: 'LabelModel' object has no attribute 'c_tree'

Error stack trace

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-57-39ae7bf111ea> in <module>
      1 lm2 = snorkel.labeling.LabelModel()
      2 lm2.load('label_model.pt')
----> 3 lm2.predict(L_train)

~\AppData\Local\Continuum\anaconda3\envs\snorkel\lib\site-packages\snorkel\labeling\model\label_model.py in predict(self, L, return_probs, tie_break_policy)
    429         array([0, 1, 0])
    430         """
--> 431         Y_probs = self.predict_proba(L)
    432         Y_p = probs_to_preds(Y_probs, tie_break_policy)
    433         if return_probs:

~\AppData\Local\Continuum\anaconda3\envs\snorkel\lib\site-packages\snorkel\labeling\model\label_model.py in predict_proba(self, L)
    377         L_shift = L + 1  # convert to {0, 1, ..., k}
    378         self._set_constants(L_shift)
--> 379         L_aug = self._get_augmented_label_matrix(L_shift)
    380         mu = np.clip(self.mu.detach().clone().numpy(), 0.01, 0.99)
    381         jtm = np.ones(L_aug.shape[1])

~\AppData\Local\Continuum\anaconda3\envs\snorkel\lib\site-packages\snorkel\labeling\model\label_model.py in _get_augmented_label_matrix(self, L, higher_order)
    178                     [
    179                         j
--> 180                         for j in self.c_tree.nodes()
    181                         if i in self.c_tree.node[j]["members"]
    182                     ]

~\AppData\Local\Continuum\anaconda3\envs\snorkel\lib\site-packages\torch\nn\modules\module.py in __getattr__(self, name)
    537                 return modules[name]
    538         raise AttributeError("'{}' object has no attribute '{}'".format(
--> 539             type(self).__name__, name))
    540 
    541     def __setattr__(self, name, value):

AttributeError: 'LabelModel' object has no attribute 'c_tree'

System info

  • How you installed Snorkel (conda, pip, source): conda
  • OS: Windows 10
  • Python version: 3.7.4
  • Snorkel version: 0.9.0

cdeepakroy changed the title from "Error saving trained LabelModel using save() method" to "Error loading a saved LabelModel and using it to predict" on Sep 12, 2019
paroma self-assigned this on Sep 13, 2019
paroma added the bug label on Sep 13, 2019

paroma (Contributor) commented Sep 13, 2019

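Thank you for posting details of this error; we were able to reproduce it. The c_tree attribute is created in the fit() method and is not restored by load(), so a LabelModel instance that is loaded without ever being fit has no c_tree, which is why predict() throws this error. For now, fitting the second LabelModel instance on a dummy L matrix (with the same number of labeling functions) before loading the model should work.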

We will fix this bug in the upcoming release.
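The failure mode described above (an attribute built in fit() but not included in the saved state) can be reproduced without Snorkel or PyTorch. The sketch below uses a hypothetical ToyModel whose save/load round-trips only its learned parameters, while `structure` plays the role of c_tree. FixedToyModel shows one possible kind of fix, rebuilding derived attributes after loading; it is an illustration only, not necessarily what #1463 does.

```python
class ToyModel:
    """Hypothetical toy model: save/load round-trips only `params`."""

    def __init__(self):
        self.params = {}

    def fit(self, n_lfs):
        self.params["mu"] = [0.5] * n_lfs    # learned, so it is saved
        self.structure = list(range(n_lfs))  # derived in fit(), like c_tree

    def state_dict(self):
        return dict(self.params)

    def load_state_dict(self, state):
        self.params = dict(state)            # `structure` is NOT restored

    def predict(self):
        return len(self.structure)           # needs the fit()-only attribute


class FixedToyModel(ToyModel):
    def load_state_dict(self, state):
        super().load_state_dict(state)
        # One possible fix: rebuild derived attributes from the loaded params.
        self.structure = list(range(len(self.params["mu"])))


trained = ToyModel()
trained.fit(10)

fresh = ToyModel()
fresh.load_state_dict(trained.state_dict())
try:
    fresh.predict()
except AttributeError as err:
    print("AttributeError:", err)  # same shape of error as the c_tree one

fixed = FixedToyModel()
fixed.load_state_dict(trained.state_dict())
print(fixed.predict())  # 10
```

Fitting a fresh instance on dummy data before loading, as in the workaround, amounts to creating `structure` by another route before the saved parameters overwrite the learned ones.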

Here's a modification to your example that will get it working:

import numpy as np
import snorkel.labeling

L_train = np.random.randint(-1, 2, size=(10**6, 10), dtype=np.int8)
lm = snorkel.labeling.LabelModel()
lm.fit(L_train)
lm.save('label_model.pt') 

# Workaround: call .fit() on a dummy L matrix first so that c_tree exists.
# The dummy matrix must have the same number of columns (LFs) as L_train.
L_train_dummy = np.random.randint(-1, 2, size=(10**6, 10), dtype=np.int8)
lm2 = snorkel.labeling.LabelModel()
lm2.fit(L_train_dummy)
lm2.load('label_model.pt')  # replaces the parameters learned from the dummy matrix
lm2.predict(L_train)

# Check that the loaded model's predictions match the original model's
original_preds = lm.predict(L_train)
loaded_preds = lm2.predict(L_train)
np.sum(original_preds != loaded_preds)  # should return 0

cdeepakroy (Author) commented Sep 13, 2019

@paroma Thanks a lot for looking into this and suggesting a workaround.

Could you explain what the c_tree attribute is? It seems to be of type networkx.classes.graph.Graph. In my case the graph has 10 nodes (equal to the number of labeling functions) and 0 edges. Is this the graph encoding the graphical model of the relationships between the random variables corresponding to the labeling functions and the predicted label Y? If so, I am concerned that fitting on the dummy matrix will learn the wrong relationships between the random variables.
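The observation of 10 nodes and 0 edges is consistent with each labeling function being its own clique when no dependencies between LFs are declared. The pure-Python sketch below (no networkx) models that structure under this assumption; `independent_clique_tree` and `lf_block` are hypothetical helpers, and the column layout in `lf_block` is an assumption inferred from mu having 20 = 10 x 2 rows. If the assumption holds, the structure depends only on the number of LFs (and any declared dependencies), not on the label values, so fitting a dummy matrix of the same width yields the same tree, and load() then replaces the dummy-fit mu (which the matching predictions in the workaround are consistent with).

```python
def independent_clique_tree(n_lfs):
    """Hypothetical stand-in for c_tree when no LF dependencies are declared.

    Each LF i becomes its own clique whose "members" are {i}; there are no
    edges. Only n_lfs matters here: the label values never enter.
    """
    nodes = {i: {"members": {i}} for i in range(n_lfs)}
    edges = []
    return nodes, edges


def lf_block(i, cardinality):
    """ASSUMED rows of mu (and columns of the augmented matrix) for LF i."""
    return range(i * cardinality, (i + 1) * cardinality)


nodes, edges = independent_clique_tree(10)
print(len(nodes), len(edges))  # 10 0
print(list(lf_block(3, 2)))    # [6, 7]
# Same width => same structure, whatever data a dummy fit saw.
print(independent_clique_tree(10) == (nodes, edges))  # True
```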

I ran print(lm.state_dict()) and found that it contains a tensor called mu:


OrderedDict([('mu', tensor([[0.3690, 0.2977],
        [0.2977, 0.3690],
        [0.3693, 0.2975],
        [0.2975, 0.3691],
        [0.3691, 0.2977],
        [0.2976, 0.3690],
        [0.3690, 0.2977],
        [0.2976, 0.3691],
        [0.3689, 0.2976],
        [0.2977, 0.3690],
        [0.3691, 0.2975],
        [0.2976, 0.3691],
        [0.3690, 0.2975],
        [0.2976, 0.3692],
        [0.3690, 0.2977],
        [0.2976, 0.3690],
        [0.3692, 0.2974],
        [0.2975, 0.3692],
        [0.3690, 0.2975],
        [0.2975, 0.3692]]))])

Could you explain what the tensor mu is used for, or point me to a paper where the mu notation is used? In my case the shape of mu is torch.Size([20, 2]) for 10 labeling functions. If all the information needed for predict() is present in the state_dict, then I could save the state_dict and try to write a standalone function that computes the predictions.
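A minimal NumPy sketch of such a standalone function, assuming independent LFs (no dependencies) and a uniform class prior. The clipping of mu to [0.01, 0.99] and the use of an augmented label matrix come from the predict_proba frame in the stack trace above; the one-hot column layout and the normalized exp(L_aug @ log(mu) + log(p)) scoring are assumptions about the 0.9.0 internals, so compare against lm.predict_proba on the same L before relying on it. In the printed mu, each LF's pair of rows sums to about 0.667 in every column, which fits reading mu[i*k + c, y] as P(LF i votes c | Y = y), with the remaining 1/3 going to abstains for uniformly random votes in {-1, 0, 1}. `standalone_predict_proba` is a hypothetical name.

```python
import numpy as np


def standalone_predict_proba(L, mu, class_balance=None):
    """Hypothetical sketch of predict_proba for independent LFs.

    L:  (n, m) int array, -1 = abstain, classes 0..k-1.
    mu: (m * k, k) array, e.g. lm.state_dict()["mu"].numpy().
    class_balance: prior P(Y); ASSUMED uniform when None.
    """
    n, m = L.shape
    k = mu.shape[1]
    assert mu.shape[0] == m * k, "mu must have m * k rows"
    # ASSUMED layout: column i*k + c is 1 when LF i votes class c;
    # an abstain (-1) leaves all k columns of that LF at zero.
    L_aug = np.zeros((n, m * k))
    for c in range(k):
        L_aug[:, c::k] = L == c
    if class_balance is None:
        p = np.full(k, 1.0 / k)
    else:
        p = np.asarray(class_balance, dtype=float)
    mu = np.clip(mu, 0.01, 0.99)  # same clipping as in the trace above
    X = np.exp(L_aug @ np.log(mu) + np.log(p))  # unnormalized class scores
    return X / X.sum(axis=1, keepdims=True)


# Toy example: 2 LFs, 2 classes (made-up mu, not from a trained model)
mu_toy = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]])
L_toy = np.array([[0, 0], [1, 1], [-1, -1], [0, 1]])
probs = standalone_predict_proba(L_toy, mu_toy)
print(probs)  # all-abstain row falls back to the prior [0.5, 0.5]
```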

paroma (Contributor) commented Sep 20, 2019

mu and c_tree (the junction tree) are defined in the AAAI'19 paper. Please try out the new save/load methods for LabelModel (fixed by #1463), and thanks for pointing this out!
