[source-greenhouse] Check fails as API timeouts #45411

Closed
etiennecallies opened this issue Sep 12, 2024 · 2 comments · Fixed by #45625
Comments

@etiennecallies
Contributor

Connector Name

source-greenhouse

Connector Version

0.5.15

What step the error happened?

During the sync

Relevant information

Hi,

There is a major issue with the Greenhouse source connector at the check step. The check sends a GET to https://harvest.greenhouse.io/v1/applications?per_page=100&created_after=1970-01-01T00%3A00%3A00.000Z, which in our case returns 500 {"message":"Server timeout in API"}. We are indeed a heavy Greenhouse user with a large number of application items. Greenhouse support advised us to use the skip_count parameter to avoid pagination; I tried it, and it is indeed 10x faster.
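For anyone hitting the same timeout, here is a minimal sketch of the workaround, assuming the Harvest API accepts skip_count as a query parameter as Greenhouse support advised (verify the parameter name against their docs; check_connection and build_check_params are hypothetical helper names, not the connector's actual code):

```python
import base64
import urllib.error
import urllib.parse
import urllib.request

HARVEST_URL = "https://harvest.greenhouse.io/v1/applications"


def build_check_params(skip_count: bool = True) -> dict:
    """Query parameters for the connection-check request.

    skip_count=true asks the API to skip the expensive record count
    (parameter name taken from Greenhouse support's advice; verify
    against the Harvest API documentation).
    """
    params = {
        "per_page": "100",
        "created_after": "1970-01-01T00:00:00.000Z",
    }
    if skip_count:
        params["skip_count"] = "true"
    return params


def check_connection(api_key: str) -> bool:
    """Return True if the Harvest API answers the check request with 200."""
    url = HARVEST_URL + "?" + urllib.parse.urlencode(build_check_params())
    # Harvest uses HTTP Basic auth: the API key is the username, empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.status == 200
    except urllib.error.HTTPError:
        return False
```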

If someone can make this small addition, that would be great; otherwise we'll do it ourselves.

Thanks

Relevant log output

2024-09-11 19:51:38 platform > Retry State: RetryManager(completeFailureBackoffPolicy=BackoffPolicy(minInterval=PT10S, maxInterval=PT30M, base=3), partialFailureBackoffPolicy=null, successiveCompleteFailureLimit=5, totalCompleteFailureLimit=10, successivePartialFailureLimit=1000, totalPartialFailureLimit=20, successiveCompleteFailures=0, totalCompleteFailures=0, successivePartialFailures=1, totalPartialFailures=1)
2024-09-11 19:51:38 platform > Backing off for: 0 seconds.
2024-09-11 19:51:39 platform > Docker volume job log path: /tmp/workspace/154476/1/logs.log
2024-09-11 19:51:39 platform > Executing worker wrapper. Airbyte version: 0.63.13
2024-09-11 19:51:39 platform > Using default value for environment variable SIDECAR_KUBE_CPU_LIMIT: '2.0'
2024-09-11 19:51:39 platform > 
2024-09-11 19:51:39 platform > Using default value for environment variable SOCAT_KUBE_CPU_LIMIT: '2.0'
2024-09-11 19:51:39 platform > ----- START CHECK -----
2024-09-11 19:51:39 platform > 
2024-09-11 19:51:39 platform > Using default value for environment variable SIDECAR_KUBE_CPU_REQUEST: '0.1'
2024-09-11 19:51:39 platform > Using default value for environment variable SOCAT_KUBE_CPU_REQUEST: '0.1'
2024-09-11 19:51:39 platform > Checking if airbyte/source-greenhouse:0.5.15 exists...
2024-09-11 19:51:39 platform > airbyte/source-greenhouse:0.5.15 was found locally.
2024-09-11 19:51:39 platform > Creating docker container = source-greenhouse-check-154476-1-xbonv with resources io.airbyte.config.ResourceRequirements@1ec18270[cpuRequest=,cpuLimit=,memoryRequest=,memoryLimit=,additionalProperties={}] and allowedHosts io.airbyte.config.AllowedHosts@687b175f[hosts=[harvest.greenhouse.io, *.datadoghq.com, *.datadoghq.eu, *.sentry.io],additionalProperties={}]
2024-09-11 19:51:39 platform > Preparing command: docker run --rm --init -i -w /data/154476/1 --log-driver none --name source-greenhouse-check-154476-1-xbonv --network host -v airbyte_workspace:/data -v oss_local_root:/local -e DEPLOYMENT_MODE=OSS -e WORKER_CONNECTOR_IMAGE=airbyte/source-greenhouse:0.5.15 -e AUTO_DETECT_SCHEMA=true -e LAUNCHDARKLY_KEY= -e SOCAT_KUBE_CPU_REQUEST=0.1 -e SOCAT_KUBE_CPU_LIMIT=2.0 -e FIELD_SELECTION_WORKSPACES= -e USE_STREAM_CAPABLE_STATE=true -e WORKER_ENVIRONMENT=DOCKER -e AIRBYTE_ROLE=dev -e APPLY_FIELD_SELECTION=false -e WORKER_JOB_ATTEMPT=1 -e OTEL_COLLECTOR_ENDPOINT=http://host.docker.internal:4317 -e FEATURE_FLAG_CLIENT=config -e AIRBYTE_VERSION=0.63.13 -e WORKER_JOB_ID=154476 airbyte/source-greenhouse:0.5.15 check --config source_config.json
2024-09-11 19:51:39 platform > Reading messages from protocol version 0.2.0
2024-09-11 19:52:31 platform > Backing off _send(...) for 0.0s (airbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Request URL: https://harvest.greenhouse.io/v1/applications?per_page=100&created_after=1970-01-01T00%3A00%3A00.000Z, Response Code: 500, Response Text: {"message":"Server timeout in API"})
2024-09-11 19:52:31 platform > Retrying. Sleeping for 20 seconds
2024-09-11 19:53:43 platform > Backing off _send(...) for 0.0s (airbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Request URL: https://harvest.greenhouse.io/v1/applications?per_page=100&created_after=1970-01-01T00%3A00%3A00.000Z, Response Code: 500, Response Text: {"message":"Server timeout in API"})
2024-09-11 19:53:43 platform > Retrying. Sleeping for 80 seconds
2024-09-11 19:55:55 platform > Backing off _send(...) for 0.0s (airbyte_cdk.sources.streams.http.exceptions.UserDefinedBackoffException: Request URL: https://harvest.greenhouse.io/v1/applications?per_page=100&created_after=1970-01-01T00%3A00%3A00.000Z, Response Code: 500, Response Text: {"message":"Server timeout in API"})
2024-09-11 19:55:55 platform > Retrying. Sleeping for 320 seconds
2024-09-11 20:01:39 platform > Retry State: RetryManager(completeFailureBackoffPolicy=BackoffPolicy(minInterval=PT10S, maxInterval=PT30M, base=3), partialFailureBackoffPolicy=null, successiveCompleteFailureLimit=5, totalCompleteFailureLimit=10, successivePartialFailureLimit=1000, totalPartialFailureLimit=20, successiveCompleteFailures=1, totalCompleteFailures=1, successivePartialFailures=0, totalPartialFailures=1)
 Backoff before next attempt: 10 seconds
2024-09-11 20:01:59 platform > Check connection job subprocess finished with exit code 143
2024-09-11 20:01:59 platform > Lost connection to the source: 
java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.ensureOpen(BufferedInputStream.java:206) ~[?:?]
        at java.base/java.io.BufferedInputStream.implRead(BufferedInputStream.java:411) ~[?:?]
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:405) ~[?:?]
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:350) ~[?:?]
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:393) ~[?:?]
        at java.base/sun.nio.cs.StreamDecoder.lockedRead(StreamDecoder.java:217) ~[?:?]
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:171) ~[?:?]
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:188) ~[?:?]
        at java.base/java.io.BufferedReader.fill(BufferedReader.java:160) ~[?:?]
        at java.base/java.io.BufferedReader.implReadLine(BufferedReader.java:370) ~[?:?]
        at java.base/java.io.BufferedReader.readLine(BufferedReader.java:347) ~[?:?]
        at java.base/java.io.BufferedReader.readLine(BufferedReader.java:436) ~[?:?]
        at io.airbyte.workers.WorkerUtils.getStdErrFromErrorStream(WorkerUtils.java:255) ~[io.airbyte-airbyte-commons-worker-0.63.13.jar:?]

Contribute

  • Yes, I want to contribute
@marcosmarxm
Member

@etiennecallies let me know if you need any help to make the contribution!

@etiennecallies
Contributor Author


@marcosmarxm thanks! I created this mini-PR. The idea is to switch the health check to a different, lighter stream, which is safer than adding a custom parameter used only for checks. Please tell me if you disagree.
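A rough illustration of that idea (a sketch only, not the actual PR; the callable argument and error handling are assumptions): instead of probing the heavy applications stream, the connection check reads at most one record from an inexpensive stream.

```python
from typing import Callable, Iterable, Tuple


def check_connection(read_lightweight_stream: Callable[[], Iterable]) -> Tuple[bool, str]:
    """Health-check by reading at most one record from a cheap stream.

    `read_lightweight_stream` is any callable returning an iterable of
    records from an inexpensive endpoint (hypothetical; in the connector
    this would be one of the existing, smaller stream classes rather
    than the large `applications` stream).
    """
    try:
        records = read_lightweight_stream()
        next(iter(records), None)  # one record (or an empty page) is enough
        return True, ""
    except Exception as exc:  # surface the API error to the user
        return False, f"Unable to connect: {exc}"
```

The point of the design is that the check only needs to prove credentials and connectivity, so any endpoint that responds quickly works; no check-only request parameter has to leak into the sync path.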
