First try at saving/loading records from pickle #504

Merged
lgvaz merged 5 commits into airctic:master from pickle_records_fra on Oct 29, 2020

Conversation

FraPochetti
Contributor

So, I added get_root_dir to data_dir.py but tests are failing because they cannot find the function.
Am I screwing something up with the imports?

FAILED parsers/test_coco_parser.py::test_bbox_parser - NameError: name 'get_root_dir' is not defined
FAILED parsers/test_coco_parser.py::test_mask_parser - NameError: name 'get_root_dir' is not defined
FAILED parsers/test_parser.py::test_parser - NameError: name 'get_root_dir' is not defined
FAILED parsers/test_voc_parsers.py::test_voc_annotation_parser - NameError: name 'get_root_dir' is not defined
FAILED parsers/test_voc_parsers.py::test_voc_mask_parser - NameError: name 'get_root_dir' is not defined
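The thread never states the cause of this NameError. One common way to hit it, assuming the package re-exports helpers with star imports and the module declares `__all__` (an assumption, not something confirmed here), is adding a function without listing it in `__all__`. A minimal sketch with a throwaway in-memory module (`fake_data_dir` and `get_data_dir` are hypothetical names):

```python
import sys
import types

# Build a throwaway module in memory (a stand-in for a real data_dir.py).
mod = types.ModuleType("fake_data_dir")
exec(
    "__all__ = ['get_data_dir']\n"      # the new function is NOT listed here
    "def get_data_dir():\n"
    "    return 'data'\n"
    "def get_root_dir():\n"
    "    return 'root'\n",
    mod.__dict__,
)
sys.modules["fake_data_dir"] = mod


def star_import_names(module_name):
    """Return the public names that `from module import *` would bring in."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    return {name for name in namespace if not name.startswith("__")}


names_before = star_import_names("fake_data_dir")   # get_root_dir is missing
mod.__all__.append("get_root_dir")                   # the one-line fix
names_after = star_import_names("fake_data_dir")    # now it is exported
```

If this is what happened, any module relying on the star import would raise `NameError: name 'get_root_dir' is not defined` until the name is added to `__all__` or imported explicitly.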

@FraPochetti FraPochetti mentioned this pull request Oct 23, 2020
@lgvaz lgvaz linked an issue Oct 24, 2020 that may be closed by this pull request
Collaborator

@lgvaz lgvaz left a comment

The implementation looks great! Exactly what we discussed in the issue!

One test is failing because a parser is being used twice and the cache is being loaded while the data changed. All we have to do is pass use_cached=False to it. You can fix that if you want, or I can fix it before merging.
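To illustrate the failure mode described above, here is a minimal sketch of a pickle cache that is reused after the input changed. The `parse` function below is a hypothetical stand-in, not the library's actual `parser.parse`; only the `use_cached` flag name comes from this thread.

```python
import pickle
import tempfile
from pathlib import Path


def parse(data, cache_filepath, use_cached=True):
    """Hypothetical stand-in for a parse method backed by a pickle cache."""
    if use_cached and cache_filepath.exists():
        # Cache hit: the input data is ignored entirely.
        return pickle.loads(cache_filepath.read_bytes())
    records = [{"imageid": i, "value": v} for i, v in enumerate(data)]
    cache_filepath.write_bytes(pickle.dumps(records))
    return records


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "records.pkl"
    first = parse(["a", "b", "c", "d"], cache)           # writes the cache
    stale = parse(["x", "y"], cache)                     # reloads the 4 old records
    fresh = parse(["x", "y"], cache, use_cached=False)   # re-parses and overwrites
```

Here `stale` has four records even though only two inputs were passed, while `fresh` has two. Passing `use_cached=False` in the second test sidesteps the problem; giving each parser its own cache file would be another option.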

Just one more request before we merge this, can you write a simple test checking the pickle file was saved correctly? I understand creating tests can be hard at first but I'll help with everything I can.

Go to tests/parsers/test_parser.py and you will find the function test_parser. At the end of the asserts, add a new check to verify the pickle file exists. Additionally, you can load the pickle file and assert it has the same contents as records.
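The two checks requested above could look roughly like the sketch below. `parse_and_cache` is a hypothetical stand-in for the real parser call, and plain dicts stand in for records; with real record objects the equality check assumes they implement `__eq__`.

```python
import pickle
import tempfile
from pathlib import Path


def parse_and_cache(data, cache_filepath):
    """Hypothetical stand-in for a parse call that also writes a pickle cache."""
    records = [{"imageid": i, "label": label} for i, label in enumerate(data)]
    with open(cache_filepath, "wb") as f:
        pickle.dump(records, f)
    return records


def check_cache(cache_filepath):
    records = parse_and_cache(["cat", "dog"], cache_filepath)
    # 1. The pickle file was written.
    assert cache_filepath.exists()
    # 2. Loading it back gives the same contents as the in-memory records.
    with open(cache_filepath, "rb") as f:
        loaded = pickle.load(f)
    assert loaded == records
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    loaded_records = check_cache(Path(tmp) / "simple_parser.pkl")
```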

@FraPochetti
Contributor Author

One test is failing because a parser is being used twice and the cache is being loaded while the data changed

I wanted to check which one you were referring to, as all tests were passing for me locally (I mean pytest parsers), and something super awkward happened.
I re-ran pytest parsers and got the following:

FAILED parsers/test_coco_parser.py::test_bbox_parser - AttributeError: Can't get attribute 'RecordBaseRecordImageidRecordMixinFilepathRecordMixinSizeRecordMixinLabelsRecordMixinAreasRecordMixinIsCrowdsRecordMixinBBoxesRecordMixinRecordMixinMutableMappingMappingCollectionSizedIterableContainerobject' on <module '...
FAILED parsers/test_coco_parser.py::test_mask_parser - AttributeError: Can't get attribute 'RecordBaseRecordImageidRecordMixinFilepathRecordMixinSizeRecordMixinLabelsRecordMixinAreasRecordMixinIsCrowdsRecordMixinBBoxesRecordMixinMasksRecordMixinRecordMixinMutableMappingMappingCollectionSizedIterableContainerobje...
FAILED parsers/test_parser.py::test_parser - AttributeError: Can't get attribute 'RecordBaseRecordImageidRecordMixinSizeRecordMixinFilepathRecordMixinLabelsRecordMixinBBoxesRecordMixinRecordMixinMutableMappingMappingCollectionSizedIterableContainerobject' on <module '__main__' from '/home/fra/miniconda3/envs/ice...
FAILED parsers/test_voc_parsers.py::test_voc_annotation_parser - AttributeError: Can't get attribute 'RecordBaseRecordImageidRecordMixinFilepathRecordMixinSizeRecordMixinLabelsRecordMixinBBoxesRecordMixinRecordMixinMutableMappingMappingCollectionSizedIterableContainerobject' on <module '__main__' from '/home/fra...
FAILED parsers/test_voc_parsers.py::test_voc_mask_parser - AttributeError: Can't get attribute 'RecordBaseRecordImageidRecordMixinFilepathRecordMixinSizeRecordMixinLabelsRecordMixinBBoxesRecordMixinMasksRecordMixinRecordMixinMutableMappingMappingCollectionSizedIterableContainerobject' on <module '__main__' from ...

Is this what you are referring to?

@lgvaz
Collaborator

lgvaz commented Oct 24, 2020

Is this what you are referring to?

Hahaha, that was not the one. For some reason that is not happening on the CI; you can check the full run here

__________________________ test_voc_annotation_parser __________________________

samples_source = Path('/home/runner/work/icevision/icevision/samples')
voc_class_map = <ClassMap: {'background': 0, 'aeroplane': 1, 'bicycle': 2, 'bird': 3, 'boat': 4, 'bottle': 5, 'bus': 6, 'car': 7, 'cat... 'horse': 13, 'motorbike': 14, 'person': 15, 'pottedplant': 16, 'sheep': 17, 'sofa': 18, 'train': 19, 'tvmonitor': 20}>

    def test_voc_annotation_parser(samples_source, voc_class_map):
        annotation_parser = parsers.voc(
            annotations_dir=samples_source / "voc/Annotations",
            images_dir=samples_source / "voc/JPEGImages",
            class_map=voc_class_map,
        )
        records = annotation_parser.parse(data_splitter=SingleSplitSplitter())[0]
    
>       assert len(records) == 2
E       assert 4 == 2
E        +  where 4 = len([Record:\n	- Image ID: 0\n	- Filepath: /home/runner/work/icevision/icevision/samples/fridge/odFridgeObjects/images/10.jp...s: [2, 4]\n	- BBoxes: [<BBox (xmin:56, ymin:148, xmax:209, ymax:514)>, <BBox (xmin:328, ymin:180, xmax:449, ymax:502)>]])

tests/parsers/test_voc_parsers.py:12: AssertionError

@FraPochetti
Contributor Author

Interesting! :D let me look into this. I might have declared victory too soon.

@FraPochetti
Contributor Author

Ok, so the awkward errors I refer to occur when I run pytest parsers the second time, not the first (which is why I did not catch them in the first place). So, clearly, given use_cached=True, the pickle it is loading back is not what it expects.

@FraPochetti
Contributor Author

FraPochetti commented Oct 24, 2020

I am still trying to figure out what the heck this means RecordBaseRecordImageidRecordMixinSizeRecordMixinFilepathRecordMixinLabelsRecordMixinBBoxesRecordMixinRecordMixinMutableMappingMappingCollectionSizedIterableContainerobject

In the meantime, getting back to what you commented

One test is failing because a parser is being used twice and the cache is being loaded while the data changed

Are you referring to test_voc_annotation_parser?
parsers.voc is indeed invoked twice, the first time within test_voc_annotation_parser, the second within test_voc_mask_parser.
It seems the first is failing, not the second, so it has nothing to do with the cache.

Also, when I run pytest parsers on my local machine everything looks good (the second time I get the above awkward error).
Are CI tests somehow different?
image

@FraPochetti
Contributor Author

FraPochetti commented Oct 24, 2020

Btw, the screenshot above is already after adding the 2 asserts you requested to test_parsers.py 😉

@FraPochetti
Contributor Author

So, I think I found the culprit but I am still having a hard time figuring this out.
First, let me explain how we got here and how this issue fell through the cracks:

  1. First mistake: I ran my experiments in Colab. Everything worked there, but to get it working I had to hack some source files, and along the way I lost some critical interdependencies across modules. I thought all was good, but only because everything was defined in the notebook (one single file).
  2. Second mistake: Once done with Colab, I copy-pasted the new parser.parse function into VSCode, in my local env. I then ran pytest parsers and all tests passed, but I only ran them once. In this first pass, all pickle files got created as expected. I should have run the tests a second time to check whether loading back worked too (given use_cached=True). I did not. If I had, I would have stumbled upon the following (this is one example; all tests fail in the same way):
AttributeError: Can't get attribute 'RecordBaseRecordImageidRecordMixinSizeRecordMixinFilepathRecordMixinLabelsRecordMixinBBoxesRecordMixinRecordMixinMutableMappingMappingCollectionSizedIterableContainerobject' on <module '__main__' from '/home/fra/miniconda3/envs/icevision/bin/pytest'>

This monster is driven by this annoying pickle issue, which, tbh, I am not even sure I completely get.
I tried to implement a couple of proposed solutions, with no luck.
Keep you posted.

@FraPochetti
Contributor Author

I have played around with this quite a bit without much luck.
The issue is described here and here and it basically boils down to loading a pickled file in a module that is different from the module where I pickled the file.
Do you have any suggestions?
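For context on the error above: pickle stores a class by reference (module name plus class name), not by value, so loading fails when that name cannot be looked up in the target module. A minimal sketch reproducing the same kind of AttributeError follows; `demo_records` and `DynamicRecord` are hypothetical names, and deleting the attribute simulates a fresh process in which the class was never re-created.

```python
import pickle
import sys
import types

# Throwaway module standing in for wherever the record class was created.
module = types.ModuleType("demo_records")
sys.modules["demo_records"] = module

# Build a class at runtime and make it reachable as demo_records.DynamicRecord.
DynamicRecord = type("DynamicRecord", (object,), {"__module__": "demo_records"})
module.DynamicRecord = DynamicRecord

# The payload holds only the names "demo_records" and "DynamicRecord",
# not the class definition itself.
payload = pickle.dumps(DynamicRecord())

# Simulate a new process where the dynamic class does not exist yet.
del module.DynamicRecord

try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = str(exc)  # "Can't get attribute ... on <module ...>"
```

In the failing test runs the module in the message is `__main__` (the pytest entry point), which suggests the class was registered there during the first run and is not present when a second run tries to load the cache.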

@FraPochetti
Contributor Author

Maybe we can bring this to the attention of the community in Discord?

@lgvaz
Collaborator

lgvaz commented Oct 25, 2020

I am still trying to figure out what the heck this means RecordBaseRecordImageidRecordMixinSizeRecordMixinFilepathRecordMixinLabelsRecordMixinBBoxesRecordMixinRecordMixinMutableMappingMappingCollectionSizedIterableContainerobject

Oh, I can help you with that! This was a new addition and still has to be documented; it's a hack for making a local object picklable. It would be easier to explain on a call. Are you available at any time next week?

This monster is driven by this annoying pickle issue, which, tbh, I am not even sure I completely get.

Hah! Pickle it is then!!! Okay okay, I have some ideas. The easiest and least stressful solution would be for us to jump on a call and take a look at this. As I said in the previous comment, I can explain why we are doing this and what the best way of handling it is.

I think this is the only place in the library where some Python black magic is happening, and it was just added last week (with the autofix), so it's still not documented. I was not expecting this issue to stumble into that, sry =x
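The actual explanation happened on a call and is not recorded in this thread. As a rough guess at the kind of hack being described: the long class name in the error appears to be the class names along the record's MRO joined together, which is consistent with composing a class at runtime and registering it under a deterministic name so pickle can find it. The sketch below shows that general technique only; `demo_registry`, `make_record_class`, and the classes are hypothetical and not the library's code.

```python
import pickle
import sys
import types

# Throwaway importable module; a real library would use one of its own modules.
registry = types.ModuleType("demo_registry")
sys.modules["demo_registry"] = registry


class BaseRecord:
    def __init__(self):
        self.data = {}


class ImageidMixin:
    pass


class BBoxesMixin:
    pass


def make_record_class(*mixins):
    """Compose a record class at runtime and register it so pickle can find it."""
    bases = (*mixins, BaseRecord)
    # Deterministic name derived from the bases: same mixins, same name, every run.
    name = "".join(base.__name__ for base in bases)
    cls = type(name, bases, {"__module__": registry.__name__})
    setattr(registry, name, cls)
    return cls


Record = make_record_class(ImageidMixin, BBoxesMixin)
record = Record()
record.data["imageid"] = 7

# Round trip works because demo_registry.<name> resolves to the same class.
restored = pickle.loads(pickle.dumps(record))
```

The catch with this approach is that the loading process must also run `make_record_class` (or import something that does) before unpickling, and the class must be registered in a module that exists under the same name in both processes.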

@FraPochetti
Contributor Author

Sure, let's do this!
Super eager to learn something new :D
I am available any day next week, almost anytime.
We are 4 hours apart (myself in CET, yourself in GMT-3).
Just pick a slot and we can arrange something.

@FraPochetti
Contributor Author

And no need to be sorry man!

@lgvaz
Collaborator

lgvaz commented Oct 25, 2020

I am available any day next week, almost anytime.

Nice! I'll message you on Discord tomorrow and we can find a common time =)

@FraPochetti
Contributor Author

image

@FraPochetti
Contributor Author

FraPochetti commented Oct 28, 2020

Ok... this does not happen to me locally
image

Locally, I am running tests twice to force pickle reloading
image

@FraPochetti
Contributor Author

I think I found the issue.
Keep you posted.

@FraPochetti
Contributor Author

image

@codecov

codecov bot commented Oct 29, 2020

Codecov Report

Merging #504 into master will increase coverage by 0.20%.
The diff coverage is 87.50%.

@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
+ Coverage   86.42%   86.63%   +0.20%     
==========================================
  Files          91       91              
  Lines        2181     2184       +3     
==========================================
+ Hits         1885     1892       +7     
+ Misses        296      292       -4     
Flag Coverage Δ
#unittests 86.63% <87.50%> (+0.20%) ⬆️

Impacted Files Coverage Δ
icevision/core/record.py 92.30% <ø> (+1.00%) ⬆️
icevision/utils/data_dir.py 80.00% <66.66%> (-7.50%) ⬇️
icevision/data/convert_records_to_coco_style.py 97.80% <75.00%> (+7.41%) ⬆️
icevision/parsers/parser.py 86.11% <84.61%> (-3.55%) ⬇️
icevision/core/mask.py 87.12% <89.47%> (-1.17%) ⬇️
icevision/core/record_mixins.py 90.00% <100.00%> (+1.33%) ⬆️
icevision/parsers/mixins/parser_mixins.py 81.19% <100.00%> (ø)
icevision/tfms/albumentations/tfms.py 68.88% <100.00%> (ø)

Powered by Codecov. Last update cc1c45c...6f230e9. Read the comment docs.

Collaborator

@lgvaz lgvaz left a comment

Alright! This issue was way harder than I originally expected but after a lot of figuring it out I think we reached a very good solution!

The final requested change is just to use a temporary path to save the pickle file so it automatically gets deleted after the testing session is over. You can find more info about it here.
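A sketch of the idea, runnable without pytest: the test body receives a directory and writes the pickle there, so nothing is left in the working tree. Under pytest the directory would come from the built-in `tmpdir` fixture (or the `pathlib`-based `tmp_path`); here `tempfile` plays that role, and `run_cache_test` and the record dicts are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def run_cache_test(tmp_dir):
    """Body of a test that would receive tmp_dir from a fixture."""
    cache_filepath = Path(tmp_dir) / "simple_parser.pkl"
    records = [{"imageid": 0}, {"imageid": 1}]  # hypothetical records
    cache_filepath.write_bytes(pickle.dumps(records))
    assert pickle.loads(cache_filepath.read_bytes()) == records
    return cache_filepath


# Outside pytest, a TemporaryDirectory stands in for the fixture.
with tempfile.TemporaryDirectory() as tmp:
    used_path = run_cache_test(tmp)
    existed_during_test = used_path.exists()

# The directory, and the pickle inside it, are gone once the block exits.
exists_after_test = used_path.exists()
```

Besides keeping the repository clean, a per-test directory also means a leftover cache from one run cannot be picked up by the next.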


Also, be sure to add this to the Added section of CHANGELOG.md: say parser.parse has a new cache_filepath argument and point to this PR =)

@@ -43,8 +43,9 @@ def bboxes(self, o) -> List[BBox]:

def test_parser(data):

Collaborator

Suggested change
- def test_parser(data):
+ def test_parser(data, tmpdir):

@@ -43,8 +43,9 @@ def bboxes(self, o) -> List[BBox]:

def test_parser(data):
    parser = SimpleParser(data)

    cache_filepath = Path("simple_parser.pkl")

Collaborator

Suggested change
-     cache_filepath = Path("simple_parser.pkl")
+     cache_filepath = Path(tmpdir / "simple_parser.pkl")

@lgvaz
Collaborator

lgvaz commented Oct 29, 2020

Alright, perfect!! Here we go Francesco!

Thanks a lot again ^^

@lgvaz lgvaz merged commit be2c4ba into airctic:master Oct 29, 2020
@FraPochetti FraPochetti deleted the pickle_records_fra branch October 29, 2020 17:04
Successfully merging this pull request may close these issues.

Pickle records
2 participants