[Bug]: OCR error "Failed to fetch" #2796

Open
1 task done
bjinthahouse opened this issue Jan 27, 2025 · 3 comments
Labels
Back End Issues related to back-end development needs investigation Issues that require further investigation

Comments

@bjinthahouse

Installation Method

Docker

The Problem

When I run OCR on a scanned PDF document, it processes for a while and then fails with the error "Failed to fetch". Unfortunately, the stack trace is empty.

Smaller documents seem to run fine. The document I'm currently trying to scan is 1.7 MB and has 6 pages.

Version of Stirling-PDF

0.39.0

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

/ocr-pdf

Docker Configuration

services:
  stirling-pdf:
    container_name: c_stirlingpdf
    image: stirlingtools/stirling-pdf:latest
    ports:
      - 7890:8080
    volumes:
      #- v_stirlingpdf_trainingdata:/usr/share/tesseract-ocr/5/tessdata
      - v_stirlingpdf_trainingdata:/usr/share/tessdata
      - v_stirlingpdf_config:/configs
      - v_stirlingpdf_files:/customFiles/
      - v_stirlingpdf_logs:/logs/
      - v_stirlingpdf_pipeline:/pipeline/
    environment:
      SYSTEM_DEFAULTLOCALE: de-DE
      SYSTEM_GOOGLEVISIBILITY: false
      UI_APPNAME: "Stirling PDF"
      UI_HOMEDESCRIPTION: "Stirling PDF"
      UI_APPNAVBARNAME: "Stirling PDF"
    restart: unless-stopped
volumes:
  v_stirlingpdf_trainingdata:
    external: true
  v_stirlingpdf_config:
    external: true
  v_stirlingpdf_files:
    external: true
  v_stirlingpdf_logs:
    external: true
  v_stirlingpdf_pipeline:
    external: true

networks:
  default:
    name: net_home
    external: true

Relevant Log Output

Stack Trace is empty

Additional Information

No response

Browsers Affected

Chrome

No Duplicate of the Issue

  • I have verified that there are no existing issues raised related to my problem.
@dosubot dosubot bot added the Back End Issues related to back-end development label Jan 27, 2025
@Stirling-Tools Stirling-Tools deleted a comment from dosubot bot Jan 27, 2025
@Ludy87 Ludy87 added the needs investigation Issues that require further investigation label Jan 27, 2025
@Fact0ryy

Hello,

I had a similar issue with Stirling-PDF running behind OPNsense with an HAProxy setup. In my case, the problem was related to HAProxy timeouts. I increased the timeout values in HAProxy, and after that, everything worked fine without any further issues.

Unfortunately, I don’t remember the exact settings I changed, but you might want to check and increase the following timeout values in HAProxy:
• Client Timeout
• Server Timeout
• Queue Timeout
• Connect Timeout
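
For reference, in a plain haproxy.cfg these four settings correspond to the `timeout` directives sketched below. This is only a minimal illustration: the values are placeholder assumptions, not the ones used in this setup, and on OPNsense the same settings are normally changed through the HAProxy plugin's GUI rather than by editing the file.

```haproxy
defaults
    # Placeholder values (assumptions); size them to your longest OCR job.
    timeout client  5m   # max inactivity on the browser side
    timeout server  5m   # max time waiting for the backend (Stirling-PDF) to respond
    timeout queue   5m   # max time a request may wait in the queue for a free server slot
    timeout connect 10s  # max time to establish the TCP connection to the backend
```

A `timeout server` or `timeout connect` set inside a specific `backend` section takes precedence over the value in `defaults`, so it may be enough to raise the limits only for the Stirling-PDF backend.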

I hope this helps you in troubleshooting the issue!

Best regards

@bjinthahouse
Author

Hi,

Thank you very much for your input! I do indeed run Stirling-PDF behind HAProxy. When I decreased the server timeout to 10s, I could reproduce the error in Stirling-PDF after 10 seconds. I then increased the HAProxy server and client timeouts to 300s.
However, I still receive the same "Failed to fetch" error, now after exactly 60 seconds. After some searching I found the following GitHub issue: #1094 (comment)
I then added SYSTEM_CONNECTIONTIMEOUTMINUTES: 10m to my Portainer stack and rebuilt it, but the issue persists.

I took a look at htop on my Docker host and saw that tesseract keeps running even after the error.


The file I'm using is 1.8 MB and contains only text.

Cheers

@Fact0ryy

Fact0ryy commented Feb 26, 2025

Maybe the following excerpts from my HAProxy config will help you:

defaults
    log global
    option redispatch -1
    timeout client 30s
    timeout connect 30s
    timeout server 30s

    retries 3
    default-server init-addr last,libc

#Backend: PDF_StirlingPDF_backend ()
backend PDF_StirlingPDF_backend
    # health checking is DISABLED
    mode http
    balance source
    # stickiness
    stick-table type ip size 50k expire 30m
    stick on src
    timeout connect 10s
    timeout server 10m

    http-reuse safe
    option forwarded
    option forwardfor
    server PDF_StirlingPDF ...:8085
