CrawlImmobilienscout crashing #199

Closed
timsegger opened this issue Aug 20, 2022 · 9 comments

Comments

@timsegger

When running flathunter.service, I get this error:

Aug 20 17:03:12 tsegger systemd[1]: Started Flathunter Python Script.
Aug 20 17:03:16 tsegger flathunter[19722]: [2022/08/20 17:03:16|config.py |INFO ]: Using config /opt/flathunter/config.yaml
Aug 20 17:03:16 tsegger flathunter[19722]: [2022/08/20 17:03:16|abstract_crawler.py |INFO ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...
Aug 20 17:03:17 tsegger flathunter[19722]: Traceback (most recent call last):
Aug 20 17:03:17 tsegger flathunter[19722]: File "flathunt.py", line 105, in <module>
Aug 20 17:03:17 tsegger flathunter[19722]: main()
Aug 20 17:03:17 tsegger flathunter[19722]: File "flathunt.py", line 76, in main
Aug 20 17:03:17 tsegger flathunter[19722]: config.init_searchers()
Aug 20 17:03:17 tsegger flathunter[19722]: File "/opt/flathunter/flathunter/config.py", line 44, in init_searchers
Aug 20 17:03:17 tsegger flathunter[19722]: CrawlImmobilienscout(self),
Aug 20 17:03:17 tsegger flathunter[19722]: File "/opt/flathunter/flathunter/crawl_immobilienscout.py", line 38, in __init__
Aug 20 17:03:17 tsegger flathunter[19722]: self.driver = self.configure_driver(driver_arguments)
Aug 20 17:03:17 tsegger flathunter[19722]: File "/opt/flathunter/flathunter/abstract_crawler.py", line 62, in configure_driver
Aug 20 17:03:17 tsegger flathunter[19722]: options=chrome_options
Aug 20 17:03:17 tsegger flathunter[19722]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 72, in __init__
Aug 20 17:03:17 tsegger flathunter[19722]: service_log_path, service, keep_alive)
Aug 20 17:03:17 tsegger flathunter[19722]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chromium/webdriver.py", line 97, in __init__
Aug 20 17:03:17 tsegger flathunter[19722]: options=options)
Aug 20 17:03:17 tsegger flathunter[19722]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
Aug 20 17:03:17 tsegger flathunter[19722]: self.start_session(capabilities, browser_profile)
Aug 20 17:03:17 tsegger flathunter[19722]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
Aug 20 17:03:17 tsegger flathunter[19722]: response = self.execute(Command.NEW_SESSION, parameters)
Aug 20 17:03:17 tsegger flathunter[19722]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
Aug 20 17:03:17 tsegger flathunter[19722]: self.error_handler.check_response(response)
Aug 20 17:03:17 tsegger flathunter[19722]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
Aug 20 17:03:17 tsegger flathunter[19722]: raise exception_class(message, screen, stacktrace)
Aug 20 17:03:17 tsegger flathunter[19722]: selenium.common.exceptions.WebDriverException: Message: unknown error: Chrome failed to start: exited abnormally.
Aug 20 17:03:17 tsegger flathunter[19722]: (unknown error: DevToolsActivePort file doesn't exist)
Aug 20 17:03:17 tsegger flathunter[19722]: (The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
Aug 20 17:03:17 tsegger flathunter[19722]: Stacktrace:
Aug 20 17:03:17 tsegger flathunter[19722]: #0 0x55d6d37a92d3
Aug 20 17:03:17 tsegger flathunter[19722]: #1 0x55d6d35b33fa
Aug 20 17:03:17 tsegger flathunter[19722]: #2 0x55d6d35d87da
Aug 20 17:03:17 tsegger flathunter[19722]: #3 0x55d6d35d3ae4
Aug 20 17:03:17 tsegger flathunter[19722]: #4 0x55d6d360f1f3
Aug 20 17:03:17 tsegger flathunter[19722]: #5 0x55d6d3608fe3
Aug 20 17:03:17 tsegger flathunter[19722]: #6 0x55d6d35dee33
Aug 20 17:03:17 tsegger flathunter[19722]: #7 0x55d6d35e0015
Aug 20 17:03:17 tsegger flathunter[19722]: #8 0x55d6d37f53fd
Aug 20 17:03:17 tsegger flathunter[19722]: #9 0x55d6d37f899c
Aug 20 17:03:17 tsegger flathunter[19722]: #10 0x55d6d37dc39e
Aug 20 17:03:17 tsegger flathunter[19722]: #11 0x55d6d37f95d3
Aug 20 17:03:17 tsegger flathunter[19722]: #12 0x55d6d37d028f
Aug 20 17:03:17 tsegger flathunter[19722]: #13 0x55d6d3817728
Aug 20 17:03:17 tsegger flathunter[19722]: #14 0x55d6d38178d2
Aug 20 17:03:17 tsegger flathunter[19722]: #15 0x55d6d383199f
Aug 20 17:03:17 tsegger flathunter[19722]: #16 0x7fb6fc844fa3
Aug 20 17:03:17 tsegger systemd[1]: flathunter.service: Main process exited, code=exited, status=1/FAILURE
Aug 20 17:03:17 tsegger systemd[1]: flathunter.service: Failed with result 'exit-code'.

When I remove CrawlImmobilienscout(self) from config.py, everything works perfectly.

@alexanderroidl

Hi @TimS-Official, can you try adding the arguments --no-sandbox and/or --remote-debugging-port=9222 to your main configuration file config.yaml under captcha/driver_arguments?
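A minimal sketch of what that suggestion amounts to, assuming config.yaml has already been loaded into a Python dict with a captcha/driver_arguments list. merge_driver_arguments is a hypothetical helper, not flathunter code: it appends the suggested flags only if they are not already configured, preserving the user's order.

```python
from typing import Dict, Iterable, List


def merge_driver_arguments(config: Dict, extra: Iterable[str]) -> List[str]:
    """Return captcha/driver_arguments from a parsed config plus any extra flags not yet present.

    Hypothetical helper: assumes config.yaml was loaded into a plain dict shaped like
    {"captcha": {"driver_arguments": [...]}}.
    """
    captcha = config.get("captcha") or {}
    merged = list(captcha.get("driver_arguments") or [])
    for flag in extra:
        # Skip flags that are already configured so Chrome never gets a switch twice.
        if flag not in merged:
            merged.append(flag)
    return merged


config = {"captcha": {"driver_arguments": ["--headless"]}}
print(merge_driver_arguments(config, ["--no-sandbox", "--remote-debugging-port=9222"]))
# ['--headless', '--no-sandbox', '--remote-debugging-port=9222']
```

Each resulting string is an ordinary Chrome command-line switch; with Selenium, such switches are typically passed one at a time through ChromeOptions.add_argument.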

@timsegger
Author

When adding only --no-sandbox, the error is the same.
When adding both, "DevToolsActivePort file doesn't exist" is replaced by "chrome not reachable".

@timsegger
Author

timsegger commented Aug 21, 2022

I tried following the installation guide again and set verbose: true.
I then saw that CrawlImmobilienscout uses my preinstalled google-chrome driver instead of the new way.
So I simply removed my preinstalled google-chrome.
Now the error is different:

Aug 21 17:42:05 tsegger systemd[1]: Started Flathunter Python Script.
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|config.py |INFO ]: Using config /opt/flathunter/config.yaml
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|flathunt.py |DEBUG ]: Settings from config: <flathunter.config.Config object at 0x7f0a9100b0b8>
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|abstract_crawler.py |INFO ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|<WebDriverManager> |DEBUG ]: ====== WebDriver manager ======
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|<WebDriverManager> |DEBUG ]: Get LATEST chromedriver version for google-chrome None
Aug 21 17:42:09 tsegger flathunter[15266]: [2022/08/21 17:42:09|<WebDriverManager> |DEBUG ]: Driver [/home/flathunter/.wdm/drivers/chromedriver/linux64/104.0.5112/chromedriver] found in cache
Aug 21 17:42:09 tsegger flathunter[15266]: Traceback (most recent call last):
Aug 21 17:42:09 tsegger flathunter[15266]: File "flathunt.py", line 105, in <module>
Aug 21 17:42:09 tsegger flathunter[15266]: main()
Aug 21 17:42:09 tsegger flathunter[15266]: File "flathunt.py", line 76, in main
Aug 21 17:42:09 tsegger flathunter[15266]: config.init_searchers()
Aug 21 17:42:09 tsegger flathunter[15266]: File "/opt/flathunter/flathunter/config.py", line 44, in init_searchers
Aug 21 17:42:09 tsegger flathunter[15266]: CrawlImmobilienscout(self),
Aug 21 17:42:09 tsegger flathunter[15266]: File "/opt/flathunter/flathunter/crawl_immobilienscout.py", line 38, in __init__
Aug 21 17:42:09 tsegger flathunter[15266]: self.driver = self.configure_driver(driver_arguments)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/opt/flathunter/flathunter/abstract_crawler.py", line 62, in configure_driver
Aug 21 17:42:09 tsegger flathunter[15266]: options=chrome_options
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chrome/webdriver.py", line 72, in __init__
Aug 21 17:42:09 tsegger flathunter[15266]: service_log_path, service, keep_alive)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/chromium/webdriver.py", line 97, in __init__
Aug 21 17:42:09 tsegger flathunter[15266]: options=options)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 277, in __init__
Aug 21 17:42:09 tsegger flathunter[15266]: self.start_session(capabilities, browser_profile)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 370, in start_session
Aug 21 17:42:09 tsegger flathunter[15266]: response = self.execute(Command.NEW_SESSION, parameters)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
Aug 21 17:42:09 tsegger flathunter[15266]: self.error_handler.check_response(response)
Aug 21 17:42:09 tsegger flathunter[15266]: File "/home/flathunter/.local/share/virtualenvs/flathunter--s35lxKo/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
Aug 21 17:42:09 tsegger flathunter[15266]: raise exception_class(message, screen, stacktrace)
Aug 21 17:42:09 tsegger flathunter[15266]: selenium.common.exceptions.WebDriverException: Message: unknown error: cannot find Chrome binary
Aug 21 17:42:09 tsegger flathunter[15266]: Stacktrace:
Aug 21 17:42:09 tsegger flathunter[15266]: #0 0x55b24afd7403
Aug 21 17:42:09 tsegger flathunter[15266]: #1 0x55b24addd778
Aug 21 17:42:09 tsegger flathunter[15266]: #2 0x55b24adff916
Aug 21 17:42:09 tsegger flathunter[15266]: #3 0x55b24adfd12b
Aug 21 17:42:09 tsegger flathunter[15266]: #4 0x55b24ae3883a
Aug 21 17:42:09 tsegger flathunter[15266]: #5 0x55b24ae328f3
Aug 21 17:42:09 tsegger flathunter[15266]: #6 0x55b24ae080d8
Aug 21 17:42:09 tsegger flathunter[15266]: #7 0x55b24ae09205
Aug 21 17:42:09 tsegger flathunter[15266]: #8 0x55b24b01ee3d
Aug 21 17:42:09 tsegger flathunter[15266]: #9 0x55b24b021db6
Aug 21 17:42:09 tsegger flathunter[15266]: #10 0x55b24b00813e
Aug 21 17:42:09 tsegger flathunter[15266]: #11 0x55b24b0229b5
Aug 21 17:42:09 tsegger flathunter[15266]: #12 0x55b24affc970
Aug 21 17:42:09 tsegger flathunter[15266]: #13 0x55b24b03f228
Aug 21 17:42:09 tsegger flathunter[15266]: #14 0x55b24b03f3bf
Aug 21 17:42:09 tsegger flathunter[15266]: #15 0x55b24b059abe
Aug 21 17:42:09 tsegger flathunter[15266]: #16 0x7f9c77812fa3
Aug 21 17:42:09 tsegger systemd[1]: flathunter.service: Main process exited, code=exited, status=1/FAILURE
Aug 21 17:42:09 tsegger systemd[1]: flathunter.service: Failed with result 'exit-code'.

This error apparently occurs regardless of driver_arguments.
But again, when I remove CrawlImmobilienscout(self) from config.py, everything works.

@codders

codders commented Aug 22, 2022

Do your webdriver version and your Chrome version match exactly? I see the error "cannot find Chrome binary". Where is Chrome located on the target system?
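One way to answer the location question, sketched with the standard library only: look for a Chrome executable on PATH and ask it for its version. The candidate names are an assumption (common Linux package names), and find_chrome and chrome_version are hypothetical helpers. If nothing is found, that is consistent with the "cannot find Chrome binary" error, although ChromeDriver may also check fixed install locations of its own.

```python
import shutil
import subprocess
from typing import Iterable, Optional

# Assumed candidate names; Linux packages commonly install one of these on PATH.
DEFAULT_CANDIDATES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser")


def find_chrome(candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Optional[str]:
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def chrome_version(binary: str) -> str:
    """Run '<binary> --version' and return its trimmed stdout."""
    result = subprocess.run([binary, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    binary = find_chrome()
    if binary is None:
        print("No Chrome binary on PATH; ChromeDriver will probably not find one either")
    else:
        print(binary, "->", chrome_version(binary))
```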

@timsegger
Author

timsegger commented Aug 22, 2022

I followed the first part of this tutorial: https://yizeng.me/2014/04/20/install-chromedriver-and-phantomjs-on-linux-mint/

So I installed the newest version, which was something like 105.x.
Which version is the webdriver, then?
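A sketch for comparing the two versions, assuming you have the output of google-chrome --version and chromedriver --version as strings. For Chrome releases of this period, a ChromeDriver build supports the Chrome versions sharing its major.minor.build prefix; that is also the key webdriver-manager caches drivers under (104.0.5112 in the logs in this thread). The helper names are hypothetical, and the 105.0.0.0 string is an illustrative placeholder, not a real release.

```python
import re
from typing import Tuple

# Matches a dotted four-part version such as 104.0.5112.101.
VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first four-part version from '--version' output."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("no version found in %r" % text)
    return tuple(int(part) for part in match.groups())


def builds_match(chrome_output: str, driver_output: str) -> bool:
    """True when Chrome and ChromeDriver share the same major.minor.build prefix."""
    return parse_version(chrome_output)[:3] == parse_version(driver_output)[:3]


# Versions taken from this thread's logs: headless Chrome session vs. downloaded driver.
print(builds_match("Google Chrome 104.0.5112.101", "ChromeDriver 104.0.5112.79"))  # True
# Illustrative placeholder for "something like 105.x" against the cached 104 driver.
print(builds_match("Google Chrome 105.0.0.0", "ChromeDriver 104.0.5112.79"))  # False
```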

@squilaSC

Running in Docker, I needed to add these two arguments to get it to run:

- "--headless"
- "--no-sandbox"
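A sketch of applying those two flags only when running in a container. It relies on the common heuristic that Docker creates a /.dockerenv file in the container's root filesystem; that is an assumption (other runtimes such as Podman may behave differently), and both helpers are hypothetical.

```python
import os
from typing import List

# Flags reported in this thread as necessary when Chrome runs inside Docker.
DOCKER_FLAGS = ["--headless", "--no-sandbox"]


def running_in_docker(marker: str = "/.dockerenv") -> bool:
    """Heuristic only: Docker normally creates /.dockerenv inside containers."""
    return os.path.exists(marker)


def chrome_arguments(base: List[str], in_docker: bool) -> List[str]:
    """Return base plus the Docker flags (without duplicates) when in a container."""
    args = list(base)
    if in_docker:
        args += [flag for flag in DOCKER_FLAGS if flag not in args]
    return args


print(chrome_arguments([], running_in_docker()))
```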

@abuchmueller

abuchmueller commented Aug 23, 2022

Running in Docker, I needed to add these two arguments to get it to run:

- "--headless"
- "--no-sandbox"

@squilaSC thanks, this makes the Docker image run, but I experienced a crash.
After rebooting, it seems to work now, though. Maybe solving the captcha failed?

Edit: I've now run the container overnight, and it crashed again after a couple of hours.

[2022/08/23 22:12:55|config.py               |INFO    ]: Using config /config.yaml
[2022/08/23 22:12:55|flathunt.py             |DEBUG   ]: Settings from config: <flathunter.config.Config object at 0x7fada60c5bd0>
[2022/08/23 22:12:55|abstract_crawler.py     |INFO    ]: Initializing Chrome WebDriver for crawler "CrawlImmobilienscout"...

[2022/08/23 22:12:55|<WebDriverManager>      |DEBUG   ]: ====== WebDriver manager ======
[2022/08/23 22:12:55|<WebDriverManager>      |DEBUG   ]: Get LATEST chromedriver version for google-chrome 104.0.5112
[2022/08/23 22:12:55|<WebDriverManager>      |DEBUG   ]: There is no [linux64] chromedriver for browser 104.0.5112 in cache
[2022/08/23 22:12:55|<WebDriverManager>      |DEBUG   ]: About to download new driver from https://chromedriver.storage.googleapis.com/104.0.5112.79/chromedriver_linux64.zip
[2022/08/23 22:12:56|<WebDriverManager>      |DEBUG   ]: Driver has been saved in cache [/root/.wdm/drivers/chromedriver/linux64/104.0.5112]
[2022/08/23 22:12:57|crawl_immobilienscout.py|DEBUG   ]: Got search URL https://www.immobilienscout24.de/Suche/de/XXXX
[2022/08/23 22:13:00|twocaptcha_solver.py    |INFO    ]: Trying to solve geetest.
[2022/08/23 22:13:00|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/in: OK|71318916888
[2022/08/23 22:13:00|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:00|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:05|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:05|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:10|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:10|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:15|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:15|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:20|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:20|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:25|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:25|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:30|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:30|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:35|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:35|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:41|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: CAPCHA_NOT_READY
[2022/08/23 22:13:41|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
[2022/08/23 22:13:46|twocaptcha_solver.py    |DEBUG   ]: Got response from 2captcha/res: OK|{"geetest_challenge":"013d7f12095c46075d2bed1d1f2e1736","geetest_validate":"xxx","geetest_seccode":"xxx|jordan"}
Traceback (most recent call last):
  File "flathunt.py", line 105, in <module>
    main()
  File "flathunt.py", line 101, in main
    launch_flat_hunt(config, heartbeat)
  File "flathunt.py", line 31, in launch_flat_hunt
    hunter.hunt_flats()
  File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
    for url in self.config.get('urls', [])])
  File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
    return searcher.crawl(url, max_pages)
  File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
    return self.get_results(url, max_pages)
  File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 55, in get_results
    soup = self.get_page(search_url, self.driver, page_no)
  File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 128, in get_page
    afterlogin_string=self.afterlogin_string
  File "/usr/src/app/flathunter/abstract_crawler.py", line 93, in get_soup_from_url
    return BeautifulSoup(driver.page_source, 'html.parser')
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 541, in page_source
    return self.execute(Command.GET_PAGE_SOURCE)['value']
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=104.0.5112.101)
Stacktrace:
#0 0x55f5fd451403 <unknown>
#1 0x55f5fd25764b <unknown>
#2 0x55f5fd24475a <unknown>
#3 0x55f5fd24365b <unknown>
#4 0x55f5fd243c1c <unknown>
#5 0x55f5fd24fc3f <unknown>
#6 0x55f5fd2507a2 <unknown>
#7 0x55f5fd25edad <unknown>
#8 0x55f5fd262c6a <unknown>
#9 0x55f5fd244046 <unknown>
#10 0x55f5fd25e951 <unknown>
#11 0x55f5fd2bfb53 <unknown>
#12 0x55f5fd2ac8f3 <unknown>
#13 0x55f5fd2820d8 <unknown>
#14 0x55f5fd283205 <unknown>
#15 0x55f5fd498e3d <unknown>
#16 0x55f5fd49bdb6 <unknown>
#17 0x55f5fd48213e <unknown>
#18 0x55f5fd49c9b5 <unknown>
#19 0x55f5fd476970 <unknown>
#20 0x55f5fd4b9228 <unknown>
#21 0x55f5fd4b93bf <unknown>
#22 0x55f5fd4d3abe <unknown>
#23 0x7f19cccf6ea7 <unknown>

Log 2

[2022/08/24 07:40:02|hunter.py               |INFO    ]: New offer: beliebte 2 Zimmer Wohnung
[2022/08/24 07:40:02|idmaintainer.py         |DEBUG   ]: is_processed(7917747028695013)
[2022/08/24 07:40:02|idmaintainer.py         |DEBUG   ]: is_processed(7044444671130838)
[2022/08/24 07:40:02|idmaintainer.py         |DEBUG   ]: is_processed(8824293547546021)
[2022/08/24 07:50:02|crawl_immobilienscout.py|DEBUG   ]: Got search URL https://www.immobilienscout24.de/Suche/de/hamburg/hamburg/wohnung-mieten?numberofrooms=1.5-&price=-800.0&livingspace=30.0-&pricetype=rentpermonth&geocodes=0200000006057,0200000006058,0200000006059,0200000007070,0200000006084,0200000006073,0200000007076,0200000006075,0200000005054,0200000006055&sorting=2&pagenumber={0}
Traceback (most recent call last):
  File "flathunt.py", line 105, in <module>
    main()
  File "flathunt.py", line 101, in main
    launch_flat_hunt(config, heartbeat)
  File "flathunt.py", line 38, in launch_flat_hunt
    hunter.hunt_flats()
  File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
    for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
  File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
    for searcher in self.config.searchers()
  File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
    for url in self.config.get('urls', [])])
  File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
    return searcher.crawl(url, max_pages)
  File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
    return self.get_results(url, max_pages)
  File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 55, in get_results
    soup = self.get_page(search_url, self.driver, page_no)
  File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 128, in get_page
    afterlogin_string=self.afterlogin_string
  File "/usr/src/app/flathunter/abstract_crawler.py", line 88, in get_soup_from_url
    driver.get(url)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 447, in get
    self.execute(Command.GET, {'url': url})
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
from tab crashed
  (Session info: headless chrome=104.0.5112.101)
Stacktrace:
#0 0x560b2c5ff403 <unknown>
#1 0x560b2c40564b <unknown>
#2 0x560b2c3f3b2d <unknown>
#3 0x560b2c3f3545 <unknown>
#4 0x560b2c3f2995 <unknown>
#5 0x560b2c3f20d4 <unknown>
#6 0x560b2c40c951 <unknown>
#7 0x560b2c46e078 <unknown>
#8 0x560b2c45a8f3 <unknown>
#9 0x560b2c4300d8 <unknown>
#10 0x560b2c431205 <unknown>
#11 0x560b2c646e3d <unknown>
#12 0x560b2c649db6 <unknown>
#13 0x560b2c63013e <unknown>
#14 0x560b2c64a9b5 <unknown>
#15 0x560b2c624970 <unknown>
#16 0x560b2c667228 <unknown>
#17 0x560b2c6673bf <unknown>
#18 0x560b2c681abe <unknown>
#19 0x7f2b269b1ea7 <unknown>
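The twocaptcha_solver.py lines in the first log follow a simple polling pattern: after submitting the task, the result is queried about every 5 seconds, returning CAPCHA_NOT_READY until the reply is OK| followed by the solution (a JSON object for geetest). A minimal sketch of that loop, assuming fetch is a hypothetical callable that performs one request to the result endpoint and max_attempts is an arbitrary cap; this is not flathunter's actual implementation.

```python
import json
import time
from typing import Callable, Dict


def poll_captcha_result(fetch: Callable[[], str], interval: float = 5.0,
                        max_attempts: int = 24,
                        sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll until the reply is 'OK|<json>', waiting between CAPCHA_NOT_READY replies."""
    for _ in range(max_attempts):
        body = fetch()
        if body.startswith("OK|"):
            # For geetest, the payload after 'OK|' is a JSON object (see the log above).
            return json.loads(body[len("OK|"):])
        if body != "CAPCHA_NOT_READY":
            raise RuntimeError("unexpected 2captcha response: %s" % body)
        sleep(interval)
    raise TimeoutError("captcha not solved after %d attempts" % max_attempts)


replies = iter(["CAPCHA_NOT_READY", 'OK|{"geetest_challenge": "abc"}'])
print(poll_captcha_result(lambda: next(replies), sleep=lambda seconds: None))
# {'geetest_challenge': 'abc'}
```

Injecting sleep keeps the loop testable without real waiting.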

@timsegger
Author

timsegger commented Aug 24, 2022

I can confirm @abuchmueller's experience.
I switched to Docker using @squilaSC's driver arguments and get the same errors very irregularly (sometimes after 30 minutes, sometimes after 3 hours).

Thanks to Docker's "unless-stopped" restart policy, it just restarts the process and I can actually use it.
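The restart policy recovers by restarting the whole container. A sketch of an in-process alternative, under the assumption that a crashed session can simply be replaced: wrap the driver, and when a task raises a crash-type exception, quit the old browser, build a new one, and retry once. RestartingDriver and its factory argument are hypothetical; with Selenium, recoverable would typically be selenium.common.exceptions.WebDriverException.

```python
from typing import Any, Callable, Tuple, Type, TypeVar, Union

T = TypeVar("T")
ErrorTypes = Union[Type[BaseException], Tuple[Type[BaseException], ...]]


class RestartingDriver:
    """Hypothetical wrapper that replaces the browser once when a task hits a crash error."""

    def __init__(self, factory: Callable[[], Any], recoverable: ErrorTypes) -> None:
        self._factory = factory          # builds a fresh, fully configured driver
        self._recoverable = recoverable  # exception type(s) treated as a browser crash
        self.driver = factory()

    def run(self, task: Callable[[Any], T]) -> T:
        """Run task(driver); after a recoverable error, restart the browser and retry once."""
        try:
            return task(self.driver)
        except self._recoverable:
            self._restart()
            return task(self.driver)

    def _restart(self) -> None:
        try:
            self.driver.quit()
        except Exception:
            pass  # quitting an already-crashed browser may fail; a new one is started anyway
        self.driver = self._factory()
```

For example, RestartingDriver(make_chrome, WebDriverException).run(lambda driver: driver.page_source), where make_chrome is a hypothetical function that builds the Chrome driver with the configured arguments. A second consecutive crash still propagates, so the container restart policy remains a useful backstop.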

After rebooting, it seems to work now, though. Maybe solving the captcha failed?

Looking at the logs (docker logs -t <name>), I can say that these crashes are not caused by the captcha itself; they happen several minutes or hours later. So I guess they happen before the next captcha needs to be solved, or something similar?

Edit:

2022-08-24T09:29:15.712743300Z [2022/08/24 09:29:15|twocaptcha_solver.py    |INFO    ]: Trying to solve geetest.
2022-08-24T09:29:15.879073118Z [2022/08/24 09:29:15|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:20.963026882Z [2022/08/24 09:29:20|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:26.045437341Z [2022/08/24 09:29:26|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:31.120585833Z [2022/08/24 09:29:31|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:36.218154553Z [2022/08/24 09:29:36|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:41.302687462Z [2022/08/24 09:29:41|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:46.389405839Z [2022/08/24 09:29:46|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:51.462905713Z [2022/08/24 09:29:51|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:29:56.536812746Z [2022/08/24 09:29:56|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:30:01.609837660Z [2022/08/24 09:30:01|twocaptcha_solver.py    |INFO    ]: Captcha is not ready yet, waiting...
2022-08-24T09:50:20.670315785Z [2022/08/24 09:50:20|hunter.py               |INFO    ]: New offer: redacted
2022-08-24T09:50:20.929963807Z [2022/08/24 09:50:20|hunter.py               |INFO    ]: New offer: redacted
2022-08-24T09:50:21.163025163Z [2022/08/24 09:50:21|hunter.py               |INFO    ]: New offer: redacted
2022-08-24T10:00:23.587249546Z Traceback (most recent call last):
2022-08-24T10:00:23.587383206Z   File "flathunt.py", line 105, in <module>
2022-08-24T10:00:23.587791048Z     main()
2022-08-24T10:00:23.587881667Z   File "flathunt.py", line 101, in main
2022-08-24T10:00:23.588307985Z     launch_flat_hunt(config, heartbeat)
2022-08-24T10:00:23.588382834Z   File "flathunt.py", line 38, in launch_flat_hunt
2022-08-24T10:00:23.588717841Z     hunter.hunt_flats()
2022-08-24T10:00:23.588813640Z   File "/usr/src/app/flathunter/hunter.py", line 54, in hunt_flats
2022-08-24T10:00:23.589107038Z     for expose in processor_chain.process(self.crawl_for_exposes(max_pages)):
2022-08-24T10:00:23.589193119Z   File "/usr/src/app/flathunter/hunter.py", line 34, in crawl_for_exposes
2022-08-24T10:00:23.589481068Z     for searcher in self.config.searchers()
2022-08-24T10:00:23.589555297Z   File "/usr/src/app/flathunter/hunter.py", line 35, in <listcomp>
2022-08-24T10:00:23.589852712Z     for url in self.config.get('urls', [])])
2022-08-24T10:00:23.589924226Z   File "/usr/src/app/flathunter/hunter.py", line 25, in try_crawl
2022-08-24T10:00:23.590209891Z     return searcher.crawl(url, max_pages)
2022-08-24T10:00:23.590293677Z   File "/usr/src/app/flathunter/abstract_crawler.py", line 158, in crawl
2022-08-24T10:00:23.590632441Z     return self.get_results(url, max_pages)
2022-08-24T10:00:23.590717780Z   File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 59, in get_results
2022-08-24T10:00:23.591014675Z     return self.get_entries_from_javascript()
2022-08-24T10:00:23.591089234Z   File "/usr/src/app/flathunter/crawl_immobilienscout.py", line 90, in get_entries_from_javascript
2022-08-24T10:00:23.591414473Z     result_json = self.driver.execute_script('return window.IS24.resultList;')
2022-08-24T10:00:23.591544866Z   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 495, in execute_script
2022-08-24T10:00:23.591856990Z     'args': converted_args})['value']
2022-08-24T10:00:23.591868120Z   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/webdriver.py", line 435, in execute
2022-08-24T10:00:23.592033049Z     self.error_handler.check_response(response)
2022-08-24T10:00:23.592067974Z   File "/usr/local/lib/python3.7/site-packages/selenium/webdriver/remote/errorhandler.py", line 247, in check_response
2022-08-24T10:00:23.592252650Z     raise exception_class(message, screen, stacktrace)
2022-08-24T10:00:23.592322380Z selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
2022-08-24T10:00:23.592326828Z from unknown error: cannot determine loading status
2022-08-24T10:00:23.592329052Z from tab crashed
2022-08-24T10:00:23.592331166Z   (Session info: headless chrome=104.0.5112.101)
2022-08-24T10:00:23.592333320Z Stacktrace:
2022-08-24T10:00:23.592335504Z #0 0x561c6c683403 <unknown>
2022-08-24T10:00:23.592337709Z #1 0x561c6c48964b <unknown>
2022-08-24T10:00:23.592339812Z #2 0x561c6c47675a <unknown>
2022-08-24T10:00:23.592341846Z #3 0x561c6c47565b <unknown>
2022-08-24T10:00:23.592349471Z #4 0x561c6c475c1c <unknown>
2022-08-24T10:00:23.592351635Z #5 0x561c6c481c3f <unknown>
2022-08-24T10:00:23.592353668Z #6 0x561c6c4827a2 <unknown>
2022-08-24T10:00:23.592355712Z #7 0x561c6c491076 <unknown>
2022-08-24T10:00:23.592357806Z #8 0x561c6c4f1ee1 <unknown>
2022-08-24T10:00:23.592359830Z #9 0x561c6c4de8f3 <unknown>
2022-08-24T10:00:23.592361894Z #10 0x561c6c4b40d8 <unknown>
2022-08-24T10:00:23.592363938Z #11 0x561c6c4b5205 <unknown>
2022-08-24T10:00:23.592365972Z #12 0x561c6c6cae3d <unknown>
2022-08-24T10:00:23.592367955Z #13 0x561c6c6cddb6 <unknown>
2022-08-24T10:00:23.592369909Z #14 0x561c6c6b413e <unknown>
2022-08-24T10:00:23.592371903Z #15 0x561c6c6ce9b5 <unknown>
2022-08-24T10:00:23.592374007Z #16 0x561c6c6a8970 <unknown>
2022-08-24T10:00:23.592376662Z #17 0x561c6c6eb228 <unknown>
2022-08-24T10:00:23.592379567Z #18 0x561c6c6eb3bf <unknown>
2022-08-24T10:00:23.592381962Z #19 0x561c6c705abe <unknown>
2022-08-24T10:00:23.592383955Z #20 0x7f49ec9ccea7 <unknown>

After taking a second look at the timestamps, I saw that the crashes occur almost exactly 10 minutes after the last successful search. So the crash seems to occur during crawling, and in this particular occurrence no captcha was even required.
I did not get these crashes when I removed the Immoscout24 crawler, so I assume that crawler is the cause.
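The 10-minute observation can be checked directly from the docker logs -t timestamps. A sketch using the last "New offer" line and the first traceback line of the log above; Docker prints nanoseconds while datetime only stores microseconds, so the fraction is truncated first. parse_docker_timestamp is a hypothetical helper.

```python
from datetime import datetime


def parse_docker_timestamp(stamp: str) -> datetime:
    """Parse a 'docker logs -t' stamp such as 2022-08-24T10:00:23.587249546Z."""
    whole, fraction = stamp.rstrip("Z").split(".")
    # datetime holds microseconds at most, so keep only six of Docker's nine digits.
    return datetime.strptime(whole + "." + fraction[:6], "%Y-%m-%dT%H:%M:%S.%f")


last_offer = parse_docker_timestamp("2022-08-24T09:50:21.163025163Z")
crash = parse_docker_timestamp("2022-08-24T10:00:23.587249546Z")
print((crash - last_offer).total_seconds())  # 602.424224, i.e. about 10 minutes
```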

Edit2:

According to Stack Overflow (https://stackoverflow.com/questions/53902507/unknown-error-session-deleted-because-of-page-crash-from-unknown-error-cannot), adding the parameter --disable-dev-shm-usage or increasing Docker's shm-size should solve this issue. I will try this later today.

docker/cli#1278 describes a way to persistently increase Docker's ShmSize. I will try this approach today.
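A sketch for checking whether the container's /dev/shm is small enough that --disable-dev-shm-usage (which makes Chrome write its shared-memory files to /tmp instead) is worth adding. The 64 MiB figure is Docker's default /dev/shm size; the 512 MiB threshold is an arbitrary assumption for illustration. Raising the size with docker run --shm-size is the alternative fix from the Stack Overflow answer.

```python
import os
from typing import List

# Docker gives a container a 64 MiB /dev/shm unless --shm-size says otherwise.
DOCKER_DEFAULT_SHM = 64 * 1024 * 1024
# Assumed threshold for this sketch; not a value from Chrome or Docker documentation.
MIN_SHM_FOR_CHROME = 512 * 1024 * 1024


def shm_size_bytes(path: str = "/dev/shm") -> int:
    """Total size of the filesystem mounted at path (Unix only, via os.statvfs)."""
    stats = os.statvfs(path)
    return stats.f_frsize * stats.f_blocks


def dev_shm_flags(shm_bytes: int, minimum: int = MIN_SHM_FOR_CHROME) -> List[str]:
    """Suggest --disable-dev-shm-usage when /dev/shm is below the assumed minimum."""
    return ["--disable-dev-shm-usage"] if shm_bytes < minimum else []


if hasattr(os, "statvfs") and os.path.isdir("/dev/shm"):
    print(dev_shm_flags(shm_size_bytes()))
```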

Edit3:

Runs for 24+ hours without crashing now. Seems to work :)

@alexanderroidl

Edit3:
Runs for 24+ hours without crashing now. Seems to work :)

 🎉

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants