No module named html5lib.tokenizer #385
The old html5lib==0.9999999 works fine.
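Some context for why the pin helps: html5lib releases after 0.9999999 appear to have renamed the tokenizer to a private `html5lib._tokenizer` module, so wpull's `import html5lib.tokenizer` fails against a newer install. The following is a minimal sketch, assuming that rename, that reports which tokenizer module the installed html5lib actually provides. The helper name `find_html5lib_tokenizer` is hypothetical and not part of wpull or html5lib.

```python
import importlib.util
from typing import Optional


def find_html5lib_tokenizer() -> Optional[str]:
    """Hypothetical helper: report which tokenizer module html5lib exposes.

    Assumption: html5lib 0.9999999 and earlier ship a public
    ``html5lib.tokenizer``, while later releases renamed it to the
    private ``html5lib._tokenizer``. Returns None if html5lib is not
    installed, fails to import, or has neither module.
    """
    # For a top-level name, find_spec returns None if the package is missing.
    if importlib.util.find_spec("html5lib") is None:
        return None
    for name in ("html5lib.tokenizer", "html5lib._tokenizer"):
        try:
            # For a dotted name, find_spec imports the parent package and
            # returns None if the submodule does not exist.
            if importlib.util.find_spec(name) is not None:
                return name
        except ImportError:
            # The html5lib package itself failed to import.
            return None
    return None


if __name__ == "__main__":
    found = find_html5lib_tokenizer()
    if found == "html5lib.tokenizer":
        print("Public tokenizer module present; wpull's import should succeed.")
    elif found == "html5lib._tokenizer":
        print("Only the private module exists; pinning html5lib==0.9999999 is the workaround.")
    else:
        print("html5lib is not installed or has no tokenizer module.")
```

Run it with the same interpreter that runs wpull (Python 3.6 in the traceback below), since each interpreter has its own set of installed packages.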
I mean, that isn't really solving the problem, so I'm a bit unclear as to why all those issues are closed. Maybe it's time to mark this project as unmaintained?
The project is not maintained very well at the moment because ArchiveTeam is mainly focusing on actually archiving everything. Most web crawler development is at https://github.com/ArchiveTeam/wget-lua/ at the moment, as that's what they use for the Warrior. That's a fork of Wget with Lua scripting, zstd WARC compression, deduplication, and other stuff.
chfoo linked issue #332, which is still open and is probably the canonical (non-duplicate) issue.
Not much maintenance has happened in recent years, yeah. I recently tried to start working on it again, but that's currently blocked by technical issues with our CI.
@JustAnotherArchivist How would you feel about GitHub Actions as the CI? I know it's proprietary and kind of crappy, but there's an open-source, more or less compatible implementation at https://github.com/nektos/act, and I hear that Gitea is supporting it as well. Might give that a shot.
My thoughts about GitHub Actions: ew, no. Nice to see that there's a replacement, though. That will help once GitHub Actions disappears or changes in stupid ways.
Ahh, fair enough. I've used Drone CI in the past, and it is a lot nicer than GitHub Actions. Hopefully that gets up and running soon. I've been trying to get things running (I want a python-playwright subprocessor to replace the PhantomJS one), but I've had a really hard time getting an environment I can actually develop in: if I get the Python version right, I can't compile lxml, or I run into other issues.
ERROR Fatal exception.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/wpull/application/app.py", line 152, in run
    yield from pipeline.process()
  File "/usr/local/lib/python3.6/dist-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File "/usr/local/lib/python3.6/dist-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File "/usr/local/lib/python3.6/dist-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File "/usr/local/lib/python3.6/dist-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File "/usr/lib/python3.6/asyncio/coroutines.py", line 212, in coro
    res = func(*args, **kw)
  File "/usr/local/lib/python3.6/dist-packages/wpull/application/tasks/download.py", line 31, in process
    self.build_html_parser(session)
  File "/usr/local/lib/python3.6/dist-packages/wpull/application/tasks/download.py", line 37, in build_html_parser
    from wpull.document.htmlparse.html5lib import HTMLParser
  File "/usr/local/lib/python3.6/dist-packages/wpull/document/htmlparse/html5lib.py", line 3, in <module>
    import html5lib.tokenizer
ModuleNotFoundError: No module named 'html5lib.tokenizer'
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
INFO Exiting with status 1.
Any help?