Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to Sygil-Dev/whoosh-reloaded #1645

Open
UlrichB22 opened this issue Mar 12, 2024 · 15 comments
Open

Switch to Sygil-Dev/whoosh-reloaded #1645

UlrichB22 opened this issue Mar 12, 2024 · 15 comments
Milestone

Comments

@UlrichB22
Copy link
Collaborator

We use whoosh for indexing. The currently used package https://github.com/mchaput/whoosh is no longer maintained.
There is a successor at https://github.com/Sygil-Dev/whoosh-reloaded.

@RogerHaase RogerHaase added this to the 2.0.0b1 milestone Apr 3, 2024
@UlrichB22 UlrichB22 self-assigned this Apr 3, 2024
@ThomasWaldmann
Copy link
Member

Yes, whoosh-reloaded definitely looks more alive.

Maybe that also fixes the warnings seen about whoosh using "is" instead of "==".

@UlrichB22
Copy link
Collaborator Author

UlrichB22 commented Apr 5, 2024

Some tests are failing with whoosh-reloaded 2.7.5. Following command gives an error:

pytest -k test_index_update src/moin/storage/middleware/_tests/test_indexing.py

======================================================================================== ERRORS =========================================================================================
_____________________________________________________________ ERROR at teardown of TestIndexingMiddleware.test_index_update _____________________________________________________________

cfg = <class 'moin._tests.wikiconfig.Config'>

    @pytest.fixture
    def app_ctx(cfg):
        namespace_mapping, backend_mapping, acl_mapping = create_simple_mapping(
            "stores:memory:",
            cfg.default_acl
        )
        [...] 
        before_wiki()
    
        yield app, ctx
    
        teardown_wiki('')
        ctx.pop()
        try:
            # simulate ERROR PermissionError:
            # [WinError 32] The process cannot access the file because it is being used by another process
>           assert [] == get_open_wiki_files()
E           AssertionError: assert [] == [popenfile(pa...flags=557056)]
E             
E             Right contains 5 more items, first extra item: popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/all_revs_xww72entzpa6xbrs.seg', fd=18, position=14490, mode='r', flags=557056)
E             Use -v to get more diff

src/moin/conftest.py:76: AssertionError
---------------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------------
DEBUG    passlib.registry:registry.py:296 registered 'sha512_crypt' handler: <class 'passlib.handlers.sha2_crypt.sha512_crypt'>
------------------------------------------------------------------------------- Captured stdout teardown --------------------------------------------------------------------------------
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/all_revs_xww72entzpa6xbrs.seg', fd=18, position=14490, mode='r', flags=557056)
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/all_revs_5kyi50kvnnqlsouv.seg', fd=19, position=11728, mode='r', flags=557056)
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/latest_revs_c93258ojfx0q51ng.seg', fd=20, position=37513, mode='r', flags=557056)
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/latest_revs_c93258ojfx0q51ng.seg', fd=21, position=37513, mode='r', flags=557056)
open wiki popenfile(path='/home/moindev/moin/src/moin/_tests/wiki/index/latest_revs_wrjovyaf9q1yy3n4.seg', fd=22, position=22249, mode='r', flags=557056)
================================================================================ short test summary info ================================================================================
ERROR src/moin/storage/middleware/_tests/test_indexing.py::TestIndexingMiddleware::test_index_update - AssertionError: assert [] == [popenfile(pa...flags=557056)]
======================================================================= 1 passed, 22 deselected, 1 error in 0.78s =======================================================================

The index is not closed as before with whoosh 2.7.4. Is this important or can we just remove the failing assert statement?
Maybe it is related to this old PR mchaput/whoosh#15 that has been merged into whoosh-reloaded: Sygil-Dev/whoosh-reloaded#8 .

@ThomasWaldmann
Copy link
Member

Check if the amount of open files stays at some level or is always increasing.

@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Apr 5, 2024

Did some experiments:

@@ -73,9 +73,10 @@ def app_ctx(cfg):
     create_app_ext()  # calls: init_backends(), opens indexes
     ...
     try:
         # simulate ERROR PermissionError:
         # [WinError 32] The process cannot access the file because it is being used by another process
-        assert [] == get_open_wiki_files()  # FAILS!
+        assert len(get_open_wiki_files()) < 6  # doesn't fail
     finally:
         destroy_app(app)  # calls: deinit_backends(), which closes the indexes
+        assert len(get_open_wiki_files()) < 1  # doesn't fail 

@RogerHaase Can you remember why you added this?

It looks like the original first assert is expecting too much (the backend is not deinitialised yet).

The second assert (which I added to check) succeeds, so everything is closed correctly there.

@RogerHaase
Copy link
Member

No, don't remember. In general, Windows has similar non-repeatable issues with files left open.

I am thinking that def get_open_wiki_files(): or similar may help with #1626 which occurs every 6 months or so.

@RogerHaase
Copy link
Member

@bylsmad do you remember why the above was added?

@ThomasWaldmann
Copy link
Member

I think the check for open files might be useful, but it should be done at that 2nd place, when we can really expect index files to be closed.

@UlrichB22
Copy link
Collaborator Author

Now struggling with index files left open by whoosh, but I have the failure reproducible on Linux now so hoping to have a clean test run soon

ERROR src/moin/_tests/test_user.py::TestUser::test_recovery_token - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\moin\\moin\\src\\moin\\_tests\\wiki\\index\\all_revs_8zhbn7j04fuf7rlb.seg'

latest test fail: https://github.com/bylsmad/moin/actions/runs/5102359184/jobs/9171871358

Originally posted by @bylsmad in #1455 (comment)


The old logs are not accessible any more. For me it looks like the destroy_app(app) call failed with WinError 32 and the assert has been added to show the filenames for debugging. I have no Win environment to test.

@bylsmad
Copy link
Contributor

bylsmad commented Apr 7, 2024

the check for the open files is there because the test tear down deletes all the files in the test wiki

the tear down was intermittently failing on windows because of open file handles, windows does not allow deletion of the index files when there are open file handles

the assertion was added to allow me to test this issue on Linux by raising an AssertionError which mirrors the windows PermissionError for attempt to delete the index when there are open file handles

bottom line is the test will fail on windows if there remain open file handles when you reach teardown, I found that adding a gc.collect in store_revision was the actual fix to prevent whoosh from holding the index files handles, this worked because the call to close on the file handles was inside some whoosh object's __del__ method

@ThomasWaldmann
Copy link
Member

Guess the "tear down" should not delete the whoosh index files as long as they could be open. Maybe something there is in wrong order, they are closed in destroy_app.

@UlrichB22
Copy link
Collaborator Author

I have found that removing gc.collect() in indexing.py fixes the issue.

gc.collect() affects performance and IMO should not be used in every store_revision() call. I have moved it to conftest.py in case of a PermissionError.

Please test above PR in a Win environment.

@RogerHaase
Copy link
Member

Created new wiki with help files, updated an item, viewed indexes, history, all ok, but 2 failing tests. See attached files:
m-tox.txt

@UlrichB22
Copy link
Collaborator Author

Thanks for testing. Can you do two more tests with pytest src/moin/storage/middleware/_tests/test_indexing.py?

  1. with whoosh-reloaded and storage/middleware/indexing.py from master branch including gc.collect()
  2. with whoosh 2.7.4

When it runs successful on whoosh 2.7.4 we can raise an issue for whoosh.reloaded. Maybe we can ask for an option for close() to be a true close without caching the connection.

The cli move-index function is executed in a separate process, so that the situation differs from that of the test module, where it is executed in the same process as the rebuild and update.

@RogerHaase
Copy link
Member

Master branch tests, no errors on 274, 1 failure on whoosh-reloaded:

whoosh-274.txt
whoosh-reloaded.txt

@UlrichB22 UlrichB22 removed their assignment Aug 1, 2024
@UlrichB22
Copy link
Collaborator Author

There has been no human activity in the whoosh-reloaded repo in the last 6 months. A pull request was automatically closed after 60 days of no activity. I've spent a lot of time testing and always run into the locking/timeout issue with locust. I'm giving up at the moment. We should remove the issue from the 2.0.0b1 milestone.

@RogerHaase RogerHaase modified the milestones: 2.0.0b1, 2.0.0c1 Aug 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants