
repairing parentid for destroy rev, fixes #1448 #1455

Conversation

@bylsmad (Contributor) commented May 21, 2023

repairing parentid for destroy rev, fixes #1448

also updating maint-validate-metadata, final fix for #1402

  • updating index when fixing items
  • not replacing revision numbers for gaps in sequence

@@ -113,6 +115,14 @@ def test_destroy_revision(self):
         with pytest.raises(AccessDenied):
             item.destroy_revision(revid_protected)
 
+    def test_destroy_middle_revision(self):
+        item, item_name, revid0, revid1, revid2 = self._store_three_revs('joe:read,write,destroy')
@bylsmad (Contributor, Author):

Destroying a middle revision now requires write permission as well as destroy, in order to allow updating the parentid of the following revision.
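For illustration, the repair amounts to re-linking the destroyed revision's child to the destroyed revision's own parent. This is a hypothetical standalone sketch, not moin's actual storage code:

```python
# Hypothetical sketch of repairing the PARENTID chain when a middle
# revision is destroyed: the child of the destroyed revision is
# re-pointed at the destroyed revision's own parent.

def repair_parentid(revisions, destroyed_revid):
    """revisions: list of dicts with 'revid' and 'parentid' keys,
    ordered oldest to newest. Returns the repaired list without the
    destroyed revision."""
    parent_of_destroyed = None
    repaired = []
    for rev in revisions:
        if rev["revid"] == destroyed_revid:
            parent_of_destroyed = rev.get("parentid")
            continue  # drop the destroyed revision
        if rev.get("parentid") == destroyed_revid:
            rev = dict(rev, parentid=parent_of_destroyed)  # re-link the child
        repaired.append(rev)
    return repaired
```

Updating the child revision's metadata is what makes destroy-middle-revision require write permission in addition to destroy.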

@RogerHaase (Member):

If I create an item with 3 revisions and delete revision 1, then create a new revision, the revision number of the new revision is 3; it should be 4. Item History shows revisions 2, 3, 3.

moin maint-validate-metadata shows no errors.

@bylsmad (Contributor, Author) commented May 22, 2023

I will reinstate some of the revision_number fixing in maint-validate-metadata. It sounds like the rule is that revision_numbers are allowed to have gaps but not repeats?

@RogerHaase (Member):

Right, revision numbers may have gaps, but not repeats.
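That rule can be checked with a single pass over the stored revision numbers; a minimal sketch (function name and input shape are illustrative, not moin's actual API):

```python
# Sketch of the rule agreed above: revision numbers may have gaps
# (e.g. 1, 2, 4) but never repeats (e.g. 2, 3, 3).

def find_repeated_rev_numbers(rev_numbers):
    """Return the set of revision numbers that occur more than once."""
    seen, repeats = set(), set()
    for n in rev_numbers:
        if n in seen:
            repeats.add(n)
        seen.add(n)
    return repeats
```

maint-validate-metadata would then flag only the repeats, leaving gaps in the sequence alone.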

also correcting index update to search in ALL_REVS
  existing_item only searches LATEST_REVS
@bylsmad (Contributor, Author) commented May 23, 2023

The latest commit adds validation for repeated revision_numbers. It also fixes an issue with the way I was querying the index: the previous version only got hits from LATEST_REVS; this commit finds revs in ALL_REVS.

@RogerHaase (Member):

Oops, I updated the issue when I meant to update here:

Testing with an item with 4 revisions...

If rev-number 1 (or lowest rev-number) is destroyed, rev-number 2 is updated, and becomes the current revision with updated timestamp. Item history shows rev-numbers 2, 4, 3.

Testing with another item with 4 revisions...

If rev-number 2 is destroyed, then the timestamp for rev-number 3 is updated and becomes current. Item history shows rev-numbers 3, 4, 1.

I think the metadata timestamp (or MTIME) should refer to the update time of the revision data. Updating the metadata should not change the metadata MTIME. Is there a way to update the parentid without updating the MTIME?
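One way to get that behavior, sketched with a plain dict and a hypothetical helper (not moin's actual API), is to rewrite the metadata while explicitly carrying the stored MTIME through:

```python
# Hedged sketch of the behavior asked for above: when metadata is
# rewritten only to re-link PARENTID, copy the stored MTIME through
# untouched instead of stamping the current time. The key names echo
# moin's meta keys; relink_parent itself is hypothetical.

MTIME = "mtime"
PARENTID = "parentid"

def relink_parent(meta, new_parentid):
    """Return a copy of meta with PARENTID replaced and MTIME preserved."""
    updated = dict(meta)
    updated[PARENTID] = new_parentid
    # deliberately do NOT touch updated[MTIME]
    return updated
```

The essential point is that the store path for this repair must not re-stamp MTIME the way a normal revision save does.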

@RogerHaase (Member):

Hang in there, I know this is a hard one...

2023-05-28 09:41:16,639 ERROR werkzeug:224 Error on request:
Traceback (most recent call last):
  File "C:\git-bylsmad\moin-venv-python\lib\site-packages\werkzeug\serving.py", line 333, in run_wsgi
    execute(self.server.app)
  File "C:\git-bylsmad\moin-venv-python\lib\site-packages\werkzeug\serving.py", line 320, in execute
    application_iter = app(environ, start_response)
  File "C:\git-bylsmad\moin-venv-python\lib\site-packages\flask\app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
  File "C:\git-bylsmad\moin-venv-python\lib\site-packages\flask\app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "C:\git-bylsmad\moin-venv-python\lib\site-packages\flask\app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\git-bylsmad\moin-venv-python\lib\site-packages\flask\app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\git-bylsmad\moin-venv-python\lib\site-packages\flask\app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\git-bylsmad\moin-venv-python\lib\site-packages\flask\app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "c:\git-bylsmad\moin\src\moin\apps\frontend\views.py", line 1197, in destroy_item
    item.destroy(comment=comment, destroy_item=destroy_item, subitem_names=subitem_names)
  File "c:\git-bylsmad\moin\src\moin\items\__init__.py", line 771, in destroy
    self.rev.item.destroy_revision(self.rev.revid)
  File "c:\git-bylsmad\moin\src\moin\storage\middleware\protecting.py", line 432, in destroy_revision
    self.item.destroy_revision(revid)
  File "c:\git-bylsmad\moin\src\moin\storage\middleware\indexing.py", line 1291, in destroy_revision
    self.store_revision(child_meta, child_rev.data, overwrite=True, trusted=True)
  File "c:\git-bylsmad\moin\src\moin\storage\middleware\indexing.py", line 1225, in store_revision
    raise ValueError(_('Error: metadata validation failed, invalid field value(s) = {0}'.format(
ValueError: Error: metadata validation failed, invalid field value(s) =

@bylsmad (Contributor, Author) commented May 28, 2023

I've fixed the metadata-validation failure in my fork by allowing userid to be Unset.

Now I'm struggling with index files left open by whoosh, but I can now reproduce the failure on Linux, so I'm hoping to have a clean test run soon.

ERROR src/moin/_tests/test_user.py::TestUser::test_recovery_token - PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'D:\\a\\moin\\moin\\src\\moin\\_tests\\wiki\\index\\all_revs_8zhbn7j04fuf7rlb.seg'

latest test fail: https://github.com/bylsmad/moin/actions/runs/5102359184/jobs/9171871358

previous code: when destroying a middle revision,
the update to PARENTID was making the next revision
into the latest

applying same fix ValidateMetadata
@bylsmad bylsmad force-pushed the 1448-destroying-a-revision-breaks-parentid-chain branch from f2d368c to ee698f2 Compare May 29, 2023 18:16
@bylsmad (Contributor, Author) commented May 29, 2023

All tests passing, ready for review.

@@ -1240,7 +1254,8 @@ def store_revision(self, meta, data, overwrite=False,
         data.seek(0)  # rewind file
         backend_name, revid = backend.store(meta, data)
         meta[REVID] = revid
-        self.indexer.index_revision(meta, content, backend_name)
+        self.indexer.index_revision(meta, content, backend_name, force_latest=not overwrite)
+        gc.collect()  # triggers close of index files from is_latest search
@bylsmad (Contributor, Author):

Let me know what you think of adding gc.collect() here; it solves the test failure seen in https://github.com/bylsmad/moin/actions/runs/5102359184/jobs/9171871358

It is also possible to solve the test failure by putting the gc.collect() just before destroy_app() in src.moin.conftest, but I was worried this would mask potential issues where unclosed files could lead to a "too many open files" exception.
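For context, a whoosh-free illustration of the tradeoff: an object trapped in a reference cycle is only released when the garbage collector runs, while a context manager releases it deterministically. FakeSearcher is a stand-in, not whoosh's API:

```python
# Illustration of why an explicit gc.collect() can be needed: a
# resource caught in a reference cycle is not freed by refcounting
# alone, so its "files" stay open until the collector runs. A context
# manager closes deterministically instead.
import gc

class FakeSearcher:
    open_count = 0  # number of "index files" currently held open

    def __init__(self):
        FakeSearcher.open_count += 1
        self.closed = False

    def close(self):
        if not self.closed:
            self.closed = True
            FakeSearcher.open_count -= 1

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

    def __del__(self):
        self.close()

# searcher trapped in a reference cycle: refcounting cannot free it
s = FakeSearcher()
s.cycle = s
del s
assert FakeSearcher.open_count == 1  # still "open"
gc.collect()                         # explicit collect, as in the patch
assert FakeSearcher.open_count == 0

# context manager: closed as soon as the block exits, no gc needed
with FakeSearcher():
    assert FakeSearcher.open_count == 1
assert FakeSearcher.open_count == 0
```

This is also why the Windows failures show up as PermissionError: Windows refuses to delete a file that another handle still has open.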

@RogerHaase (Member):

I will defer to you as to where to put gc.collect().

The code seems to work well, but tests on Windows 10 came up with one error:

================================== FAILURES ===================================
________________________ TestSiteCrawl.test_home_page _________________________

self = <moin.cli._tests.test_scrapy_crawl.TestSiteCrawl object at 0x000001F70A1768F0>
crawl_results = []

    def test_home_page(self, crawl_results):
>       assert len(crawl_results) > 0
E       assert 0 > 0
E        +  where 0 = len([])

C:\git-david\moin\src\moin\cli\_tests\test_scrapy_crawl.py:54: AssertionError
----------------------------- Captured log setup ------------------------------

@bylsmad (Contributor, Author):

Could you attach these files from the failed run:

_test_artifacts/crawl.log
_test_artifacts/server-crawl.log
_test_artifacts/crawl.csv

@RogerHaase (Member):

@bylsmad (Contributor, Author) commented May 29, 2023

Try the latest commit; the previous one had a typo in the logging setup for crawl.log, and I've added logging to make timeouts easier to see. Hopefully you will get the same error and we can see what happened.

@RogerHaase (Member):

================================== FAILURES ===================================
________________________ TestSiteCrawl.test_home_page _________________________

self = <moin.cli._tests.test_scrapy_crawl.TestSiteCrawl object at 0x000001FE42C423E0>
crawl_results = []

    def test_home_page(self, crawl_results):
>       assert len(crawl_results) > 0
E       assert 0 > 0
E        +  where 0 = len([])

C:\git-david\moin\src\moin\cli\_tests\test_scrapy_crawl.py:54: AssertionError
----------------------------- Captured log setup ------------------------------
INFO     moin.cli._tests:__init__.py:33 running ['moin', 'index-build']
INFO     moin.cli._tests:__init__.py:33 running ['moin', 'run', '-p', '9080']
INFO     moin.cli._tests.conftest:conftest.py:162 starting crawl
INFO     moin.cli._tests:__init__.py:33 running ['scrapy', 'crawl', '-a', 'url=http://127.0.0.1:9080/help-en', 'ref_checker']
============================== warnings summary ===============================

crawl.log
server-crawl.log
crawl.csv

@bylsmad (Contributor, Author) commented May 31, 2023

Still no clues to the test failure in the logs; something is stopping the spider before it completes.

In both failures, the spider stopped right around the 2-minute point. Do you have any custom pytest.ini with a timeout setting? Could you attach the full output of pytest (i.e. m-tox.txt)?

@RogerHaase (Member):

No pytest.ini, but there is a tox.ini under version control. GitHub does not support .ini attachments, so here is a txt version.

tox.ini.txt
m-tox.txt

Here are fresh copies of *crawl*

crawl.log
server-crawl.log
crawl.csv

@bylsmad (Contributor, Author) commented May 31, 2023

I'm still stumped... What could be different between your workstation setup and the Windows box that runs the test in the GitHub action?

Does the test fail when invoked directly from the command line?

pytest cli/_tests/test_scrapy_crawl.py

@bylsmad (Contributor, Author) commented Jun 1, 2023

@RogerHaase, here is another troubleshooting method: manually run the commands from test_scrapy.

NOTE: my commands are from a Linux terminal; you will need to adjust for Windows.

In one window:

# from the root of your git clone, 
. ./activate
cd _test_artifacts/
rm server-crawl.log crawl.csv crawl.log
mkdir cli
cd cli
moin create-instance
moin index-create
moin welcome
moin load-help -n common
moin load-help -n en
moin run -p 9080

NOTE: the test currently does an index-build after the load-help. I just realized this is not needed and have filed a new ticket, #1458, to eliminate the redundant index-build commands.

validate that your server is running by going to http://localhost:9080/Home in a browser

In a second window:

. ./activate
cd src/moin/cli/_tests/scrapy/
scrapy crawl -a url=http://127.0.0.1:9080/help-en ref_checker

Once the scrapy command finishes, check for the existence of _test_artifacts/crawl.csv; there should be something like 421 lines in the file.

If this does not succeed, please attach the console output of the scrapy command.
If it does succeed, it may be interesting to see whether it fails when you do the redundant index-build before the moin run command.

@RogerHaase (Member):

I am confused and must rerun your tests. I was about to claim user error, because I created a fresh clone and ran m tests without doing m extras (which installs scrapy in the venv).

After correcting the problem by doing m extras and rerunning the tests, I get the same error. The same error whether scrapy is installed or not?

I ran your suggested tests above and they seem to have worked successfully.

I am distracted by other tasks so I may need a day or two to rerun everything.

fixing a couple more bad file paths
@bylsmad (Contributor, Author) commented Jun 2, 2023

Pull down the latest from this branch before you resume testing; I've added more exception traps which should hopefully leave us some clues in crawl.log.

One thing I encountered as I was testing error conditions: I had scrapy installed in my venv, in my user dir, and in the system dir, and I got strange errors when scrapy was coming from my user dir or the system dir.

I took a closer look at your logs, and it appears that the entire crawl is completing, but something is going wrong with the output of crawl.csv in spider_closed. Hopefully the increased logging will show what is happening.

bylsmad added 2 commits June 2, 2023 05:45
error was caused by AsyncWriter which is not used
when overwrite is True
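A minimal, whoosh-free illustration of this kind of race: a commit that lands on a background thread (as whoosh's AsyncWriter does) may not be visible to an immediate read, while a synchronous commit always is. FakeIndex is a stand-in, not whoosh's API:

```python
# Illustration of the race the commit message describes: an "async"
# commit finishes on a background thread, so a read issued immediately
# afterwards can miss the document; a synchronous commit is visible
# before the next statement runs.
import threading
import time

class FakeIndex:
    def __init__(self):
        self.docs = set()

    def commit_sync(self, doc):
        self.docs.add(doc)  # visible before this call returns

    def commit_async(self, doc, delay=0.05):
        def work():
            time.sleep(delay)    # simulate background commit latency
            self.docs.add(doc)
        t = threading.Thread(target=work)
        t.start()
        return t                 # caller must join to be sure it landed

ix = FakeIndex()
ix.commit_sync("rev1")
print("rev1" in ix.docs)         # True: synchronous commit already visible

t = ix.commit_async("rev2")
print("rev2" in ix.docs)         # may be False: commit still in flight
t.join()
print("rev2" in ix.docs)         # True once the writer thread finishes
```

Using the synchronous writer on the overwrite path makes the re-stored revision visible before the next index query, which is what the fix relies on.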
@bylsmad (Contributor, Author) commented Jun 2, 2023

Sorry for all the commits, but the final test failure was a race condition which only happened on the GitHub host.

I can squash them down once we have the scrapy test passing on your workstation.

@RogerHaase (Member):

Here are the latest logs:

m-tox.txt
crawl.csv
crawl.log
server-crawl.log

@RogerHaase (Member):

Here is server log from manual run:

manual.txt

Need anything else?

@bylsmad (Contributor, Author) commented Jun 2, 2023

OK, it should be fixed now.

The difference is that you are running the tests via tox via ./m tests. When I run via tox, I see that crawl.csv is written to one place and read from another (logging from the latest commit):

INFO     moin.cli._tests.conftest:conftest.py:199 reading /home/bylsmad/PycharmProjects/moin/_test_artifacts/crawl.csv
INFO: writing 420 to /home/bylsmad/PycharmProjects/moin/.tox/py39/lib/python3.9/_test_artifacts/crawl.csv

BTW, I'm heading out on a bicycle tour for a week. This should be ready for merge now, assuming the fix for ./m tests works in your environment. See you on the other side.
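The fix idea can be sketched as resolving the artifacts directory from the test file's own location rather than the current working directory, so tox runs and direct pytest runs agree on where crawl.csv lives; the directory layout below is an assumption for illustration:

```python
# Sketch: anchor the artifacts directory to the repository root derived
# from the test module's path, instead of trusting os.getcwd(), which
# differs between ./m tests (tox) and a direct pytest invocation.
from pathlib import Path

def artifacts_dir(conftest_path):
    """conftest_path: path to src/moin/cli/_tests/conftest.py (assumed
    layout). Climb to the repo root and use one shared _test_artifacts."""
    repo_root = Path(conftest_path).resolve().parents[4]
    return repo_root / "_test_artifacts"
```

With a scheme like this, the writer and the reader compute the same absolute path regardless of which directory the test runner happens to start in.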

@bylsmad bylsmad force-pushed the 1448-destroying-a-revision-breaks-parentid-chain branch from c4fa6d0 to 7fe23f9 Compare June 2, 2023 22:00
@RogerHaase (Member):

Hooray, all tests pass! Have a fun trip.

@RogerHaase RogerHaase merged commit 0dcde9b into moinwiki:master Jun 3, 2023
Successfully merging this pull request may close these issues.

destroying a revision breaks parentid chain