Killed before writing to markdown #150

Closed
captn3m0 opened this issue Jan 20, 2018 · 2 comments

Comments

@captn3m0

I'm running pshtt against a file with ~13k domains. These are the final lines of the debug output:

Pinging https://www.zunheboto.nic.in...

Starting new HTTPS connection (1): www.zunheboto.nic.in
HTTPSConnectionPool(host='www.zunheboto.nic.in', port=443): Max retries exceeded with url: / (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f1c9bc4e940>, 'Connection to www.zunheboto.nic.in timed out. (connect timeout=10)'))


-------------------------

Printing Markdown...

Killed

The invocation was:

pshtt domains.csv --markdown --debug --cache --output output.md

The output.md file is empty. I've tried running it on shorter domain lists and it works fine (no "Killed" in the output, and the file is generated correctly).

@captn3m0
Author

captn3m0 commented Feb 6, 2018

You can download the raw CSV file from https://git.captnemo.in/nemo/pulse/ (domains.csv)

@konklone
Collaborator

@captn3m0 This isn't quite enough information to debug the problem. The word "Killed" is never output by any of our code. One possibility is that your system may be running out of memory while processing the entire list.
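A bare "Killed" line is typically printed by the shell, not by the program itself, when a process is terminated with SIGKILL, and on Linux the out-of-memory (OOM) killer uses exactly that signal (the kernel log, e.g. `dmesg`, usually records such kills). A minimal sketch, assuming a POSIX system, of how a wrapper script could tell a SIGKILL apart from an ordinary failure: Python's `subprocess` reports death-by-signal as a negative return code, whereas shells report 128 + 9 = 137. The child command below is a stand-in that kills itself; in practice it would be the pshtt invocation from the report. The helper name `was_sigkilled` is hypothetical.

```python
import signal
import subprocess
import sys


def was_sigkilled(returncode: int) -> bool:
    """Hypothetical helper: True if a subprocess return code means death by SIGKILL."""
    # On POSIX, subprocess sets returncode to -N when the child is killed by signal N.
    return returncode == -signal.SIGKILL


# Stand-in child that sends SIGKILL to itself, imitating what the OOM killer does.
child = [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
result = subprocess.run(child)

if was_sigkilled(result.returncode):
    print("child was killed with SIGKILL (possibly by the OOM killer)")
else:
    print(f"child exited with code {result.returncode}")
```

This only shows *how* the process died; confirming that memory was the cause still means checking the kernel log or watching memory usage during the run.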

You could try out https://github.com/18F/domain-scan as an alternative parallelizer for pshtt, which uses Python 3 concurrency primitives to manage a rolling pool of threads over a given CSV. If you don't use the --sort flag, I believe it never holds the entire dataset in memory, and should be time- and memory-efficient.
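The idea of a rolling pool over a CSV can be sketched as follows. This is not domain-scan's actual code: `scan_domain` is a hypothetical placeholder for a per-domain check (not pshtt's API), and `rolling_scan`, `workers`, and `max_pending` are illustrative names. Domains are read lazily, at most `max_pending` tasks are in flight at once, and each result is written out as soon as it finishes, so memory use stays roughly constant regardless of list size. Results come out in completion order; producing sorted output would require buffering every result first, which is the memory cost a sort step reintroduces.

```python
import csv
import io
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Iterable, Iterator, TextIO


def scan_domain(domain: str) -> dict:
    """Hypothetical placeholder for a per-domain check (not pshtt's API)."""
    return {"domain": domain, "ok": domain.endswith(".in")}


def read_domains(handle: TextIO) -> Iterator[str]:
    """Yield one domain per CSV row without loading the whole file."""
    for row in csv.reader(handle):
        if row and row[0].strip():
            yield row[0].strip()


def write_results(futures: Iterable[Future], out: TextIO) -> int:
    """Write finished results as Markdown table rows; return how many were written."""
    count = 0
    for fut in futures:
        result = fut.result()
        out.write(f"| {result['domain']} | {result['ok']} |\n")
        count += 1
    return count


def rolling_scan(handle: TextIO, out: TextIO, workers: int = 4, max_pending: int = 16) -> int:
    """Scan domains with a bounded number of in-flight tasks, streaming results to `out`."""
    out.write("| Domain | OK |\n|---|---|\n")
    written = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = set()
        for domain in read_domains(handle):
            pending.add(pool.submit(scan_domain, domain))
            # Once the window is full, block until at least one task finishes.
            if len(pending) >= max_pending:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                written += write_results(done, out)
        # Drain whatever is still running after the input is exhausted.
        done, _ = wait(pending)
        written += write_results(done, out)
    return written


if __name__ == "__main__":
    src = io.StringIO("www.zunheboto.nic.in\nexample.com\n")
    dst = io.StringIO()
    print(rolling_scan(src, dst))
    print(dst.getvalue())
```

With a real file, `src` would be an open handle on `domains.csv` and `dst` an open handle on the output file, so rows reach disk incrementally instead of all at once at the end.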

If you see the same problems, then I would also look at running the tool on a vanilla EC2 server or something like that, to eliminate the possibility of novel interference from something in your local environment.

cisagovbot pushed a commit that referenced this issue Jul 30, 2024
…abel-sync-workflow

Add a diagnostics job for the label syncing workflow