ERROR Fatal exception #368

Closed
differentieel opened this issue Jul 6, 2017 · 1 comment

Comments

@differentieel

What I wanted: To use WPULL to archive my hosted website, one year at a time.

What I expect: WPULL makes a copy of my own website and puts it in an index.html (it did this before, when I ran a very simple WPULL command with only the website address).

What happened: Command structure:
MyMachine:~ MeUser$ wpull https://MyWebsite.net/2003/ \
--warc-file MyWebsite-2003 \
--secure-protocol auto \
--no-check-certificate \
--no-cookies \
--no-robots --user-agent "InconspiuousWebBrowser/1.0" \
--wait 0.5 --random-wait --waitretry 600 \
--page-requisites --recursive --level inf \
--span-hosts-allow linked-pages,page-requisites \
--escaped-fragment --strip-session-id \
--no-check-certificate \
--sitemaps \
--convert-links \
--html-parser html5lib \
--content-on-error \
--reject-regex "/login.php" \
--no-warc-compression \
--output-file ProgramMessages.txt \
--tries 3 --retry-connrefused --retry-dns-error \
--timeout 60 --session-timeout 21600 \
--directory-prefix /Users/MeUser/Sites/MyWebsite/

The command or website causes the problem: https://dailym.net

Operating system: macOS Sierra 10.12.5 (16F73)

Python version: Python 2.7.10 and Python 3.6.1

Wpull version: 2.0.1

Log/Output:
Error Report:
ERROR Fatal exception.
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/wpull/application/app.py", line 152, in run
yield from pipeline.process()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
yield from self._process_one_worker()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
task.result()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
item = yield from self.process_one(_worker_id=worker_id)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
yield from task.process(item)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/asyncio/coroutines.py", line 210, in coro
res = func(*args, **kw)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/wpull/application/tasks/download.py", line 31, in process
self.build_html_parser(session)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/wpull/application/tasks/download.py", line 37, in build_html_parser
from wpull.document.htmlparse.html5lib_ import HTMLParser
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/wpull/document/htmlparse/html5lib_.py", line 3, in <module>
import html5lib.tokenizer

Copy a snippet of the log or what Wpull outputs:

ModuleNotFoundError: No module named 'html5lib.tokenizer'
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
INFO Exiting with status 1.

No file Samples - no result

@chfoo
Member

chfoo commented Jul 8, 2017

This appears to be the same issue as #332. Try installing version 0.9999999 for now: pip3 install html5lib==0.9999999
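
To confirm whether the installed html5lib is the problem before or after pinning the version, a quick check along these lines may help. This is only a sketch: `module_available` is a hypothetical helper name, not part of Wpull or html5lib, and it assumes the crash comes from the installed html5lib release not providing the `html5lib.tokenizer` module that the traceback tries to import. Run it with the same Python 3 interpreter that Wpull is installed under.

```python
import importlib.util


def module_available(dotted_name):
    """Return True if the module `dotted_name` can be located.

    Hypothetical helper for illustration; not part of Wpull or html5lib.
    """
    try:
        # find_spec imports parent packages but not the target module itself.
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example html5lib itself) is missing.
        return False


if __name__ == "__main__":
    # This is the import that fails in the traceback above.
    if module_available("html5lib.tokenizer"):
        print("html5lib.tokenizer found")
    else:
        print("html5lib.tokenizer missing; try: pip3 install html5lib==0.9999999")
```

If the check reports the module as missing, installing the pinned version and running the check again should show whether the import now resolves.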
