🐛 Source salesforce: failed async bulk jobs #8009

Merged
merged 13 commits into from
Nov 17, 2021

Conversation

antixar
Contributor

@antixar antixar commented Nov 16, 2021

What

Periodically, the Salesforce connector does not wait for bulk jobs to complete. The default wait timeout is 10 minutes.

How

This PR implements two new features:

  1. Users can change the wait timeout.
  2. The connector retries the same job several times after a failure.
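For readers skimming the thread, here is a rough sketch of how the two features could fit together. It is not the connector's actual code: `create_job`, `get_status`, and the injectable `sleep`/`clock` parameters are hypothetical, and the `"JobComplete"` success status is assumed from the Bulk API job states rather than taken from this diff.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("bulk_job_sketch")

PENDING_STATUSES = ("UploadComplete", "InProgress")
MAX_RETRY_NUMBER = 3  # same constant name as in the diff


def wait_for_job(
    get_status: Callable[[], str],
    wait_timeout_in_minutes: float,
    check_interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the job leaves the pending states or the timeout expires."""
    deadline = clock() + wait_timeout_in_minutes * 60.0
    status = "InProgress"
    while clock() < deadline:
        status = get_status()
        if status not in PENDING_STATUSES:
            return status
        sleep(check_interval_seconds)
    return status  # still pending, so the caller treats it as a timeout


def execute_job(
    create_job: Callable[[], str],
    get_status: Callable[[str], str],
    wait_timeout_in_minutes: float = 10.0,
    max_retries: int = MAX_RETRY_NUMBER,
    **wait_kwargs,
) -> Optional[str]:
    """Return the id of a completed job, or None if every attempt timed out."""
    for attempt in range(1, max_retries + 1):
        job_id = create_job()  # hypothetical: submits the same query again
        status = wait_for_job(lambda: get_status(job_id), wait_timeout_in_minutes, **wait_kwargs)
        if status == "JobComplete":  # assumed success status
            return job_id
        if status in ("Aborted", "Failed"):
            raise RuntimeError(f"Bulk job {job_id} ended with status {status}")
        logger.warning("Job %s timed out (attempt %d of %d)", job_id, attempt, max_retries)
    return None
```

Passing `sleep` and `clock` as parameters is only there so the loop can be exercised in a unit test without real waiting.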

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

@antixar antixar temporarily deployed to more-secrets November 16, 2021 12:56 Inactive
@github-actions github-actions bot added the area/connectors Connector related issues label Nov 16, 2021
@antixar antixar self-assigned this Nov 16, 2021
@antixar antixar linked an issue Nov 16, 2021 that may be closed by this pull request
@github-actions github-actions bot added area/documentation Improvements or additions to documentation area/frontend area/platform issues related to the platform labels Nov 16, 2021
@antixar antixar temporarily deployed to more-secrets November 16, 2021 14:07 Inactive
@antixar antixar force-pushed the antixar/7857-salesforce-failed-normalization branch from 118a207 to f2f748c Compare November 16, 2021 14:27
@github-actions github-actions bot removed area/platform issues related to the platform area/frontend labels Nov 16, 2021
@antixar antixar temporarily deployed to more-secrets November 16, 2021 14:29 Inactive
@antixar antixar temporarily deployed to more-secrets November 16, 2021 15:12 Inactive
@antixar antixar temporarily deployed to more-secrets November 16, 2021 15:22 Inactive
@antixar antixar temporarily deployed to more-secrets November 16, 2021 15:42 Inactive
@antixar
Contributor Author

antixar commented Nov 16, 2021

/test connector=connectors/source-salesforces

🕑 connectors/source-salesforces https://github.com/airbytehq/airbyte/actions/runs/1467612083

@jrhizor jrhizor temporarily deployed to more-secrets November 16, 2021 15:46 Inactive
@antixar
Contributor Author

antixar commented Nov 16, 2021

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1467636677
❌ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1467636677
🐛 https://gradle.com/s/3kdmcoh4ncg3g

@jrhizor jrhizor temporarily deployed to more-secrets November 16, 2021 15:51 Inactive
@antixar antixar requested a review from davinchia November 16, 2021 16:11
@jrhizor jrhizor temporarily deployed to more-secrets November 16, 2021 16:14 Inactive
@antixar antixar marked this pull request as ready for review November 16, 2021 16:44
Contributor

@davinchia davinchia left a comment

Good refactor + nice tests. I have one comment on the versioning. Please fix that before merging.

@antixar
Contributor Author

antixar commented Nov 17, 2021

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1473976936
❌ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1473976936
🐛 https://gradle.com/s/sqcu2xq2xpdrk

@antixar antixar temporarily deployed to more-secrets November 17, 2021 22:45 Inactive
@antixar antixar temporarily deployed to more-secrets November 17, 2021 22:47 Inactive
@antixar
Contributor Author

antixar commented Nov 17, 2021

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1473984918
❌ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1473984918
🐛 https://gradle.com/s/557j3hkpz6xmm

@jrhizor jrhizor temporarily deployed to more-secrets November 17, 2021 22:50 Inactive
@antixar
Contributor Author

antixar commented Nov 17, 2021

/test connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1474038559
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1474038559
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        75      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              200     94    53%
	 source_acceptance_test/tests/test_full_refresh.py       38     27    29%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  41     24    41%
	 source_acceptance_test/utils/compare.py                 62     25    60%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  896    440    51%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                 Stmts   Miss  Cover
	 --------------------------------------------------------
	 source_salesforce/__init__.py            2      0   100%
	 source_salesforce/api.py               114     58    49%
	 source_salesforce/exceptions.py          1      0   100%
	 source_salesforce/rate_limiting.py      22      6    73%
	 source_salesforce/source.py             57     24    58%
	 source_salesforce/streams.py           203     41    80%
	 --------------------------------------------------------
	 TOTAL                                  399    129    68%

@antixar antixar temporarily deployed to more-secrets November 17, 2021 23:06 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets November 17, 2021 23:07 Inactive
@antixar
Contributor Author

antixar commented Nov 17, 2021

/publish connector=connectors/source-salesforce

🕑 connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1474087572
✅ connectors/source-salesforce https://github.com/airbytehq/airbyte/actions/runs/1474087572

@jrhizor jrhizor temporarily deployed to more-secrets November 17, 2021 23:24 Inactive
@antixar antixar temporarily deployed to more-secrets November 17, 2021 23:44 Inactive
@antixar antixar merged commit 9a3a327 into master Nov 17, 2021
@antixar antixar deleted the antixar/7857-salesforce-failed-normalization branch November 17, 2021 23:45
Contributor

@sherifnada sherifnada left a comment

blocking change on spec.json

@@ -1,7 +1,7 @@
 #!/usr/bin/env sh

 # Build latest connector image
-docker build . -t $(cat acceptance-test-config.yml | grep "connector_image" | head -n 1 | cut -d: -f2)
+docker build . -t $(cat acceptance-test-config.yml | grep "connector_image" | head -n 1 | cut -d: -f2-)
Contributor

should we also do this in templates? (can be separate pr)

Contributor Author

@antixar antixar Nov 18, 2021

I'm planning to collect more improvements for a separate PR. For example, I want to fix an issue with running custom acceptance tests locally: these tests can currently be executed only by GitHub CI, and developers can't run them with the acceptance-test-docker.sh script.
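As a side note on the `-f2` to `-f2-` change discussed in this thread, the difference can be illustrated in Python. This is only an analogy for the field selection done by `cut`; the `connector_image` value below is a hypothetical example, not copied from the repository.

```python
def cut_field_2(line: str) -> str:
    """Roughly what `cut -d: -f2` keeps: only the second colon-separated field."""
    return line.split(":")[1]


def cut_fields_2_onward(line: str) -> str:
    """Roughly what `cut -d: -f2-` keeps: the second field and everything after it."""
    return line.split(":", 1)[1]


config_line = "connector_image: airbyte/source-salesforce:dev"  # hypothetical value
print(repr(cut_field_2(config_line)))          # ' airbyte/source-salesforce'
print(repr(cut_fields_2_onward(config_line)))  # ' airbyte/source-salesforce:dev'
```

With `-f2` the image tag after the second colon is dropped, so the built image would not carry the tag named in the config; `-f2-` keeps it.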

return resp.json()

def generate_schema(self, stream_name: str) -> Mapping[str, Any]:
schema = {"$schema": "http://json-schema.org/draft-07/schema#", "type": "object", "additionalProperties": True, "properties": {}}
def generate_schema(self, stream_name: str = None) -> Mapping[str, Any]:
Contributor

why did the default value change? is it meaningful to call this without a stream name supplied?

@@ -43,6 +43,14 @@
"type": "string",
"enum": ["BULK", "REST"],
"default": "BULK"
},
"wait_timeout": {
Contributor

IMO this option should not be exposed - it is an implementation detail. See the UX handbook for more context behind why it's not a good idea to expose it. Maybe instead we can use a dynamically increasing wait time for each job.

Contributor Author

That's why I forwarded this PR to the Airbyte team.
Your idea of automatically scaling the wait timeout is nice, but we can reproduce this case with long responses locally. To give more detail, I added more informative log messages for troubleshooting similar cases.
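One possible reading of the "dynamically increasing wait time" suggestion is to grow the timeout on each retry instead of exposing it in the spec. The sketch below is an assumption about what that could look like; the function name, the doubling factor, and the 60-minute cap are all hypothetical (only the 10-minute base comes from the PR description).

```python
def timeout_minutes_for_attempt(
    attempt: int,
    base_minutes: float = 10.0,  # default timeout mentioned in the PR description
    factor: float = 2.0,         # hypothetical growth factor
    cap_minutes: float = 60.0,   # hypothetical upper bound
) -> float:
    """Return the wait timeout for a 1-based retry attempt, growing geometrically."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    return min(base_minutes * factor ** (attempt - 1), cap_minutes)


# Timeouts for the first four attempts.
print([timeout_minutes_for_attempt(n) for n in range(1, 5)])  # [10.0, 20.0, 40.0, 60.0]
```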

@@ -734,6 +734,8 @@ List of available streams:

| Version | Date | Pull Request | Subject |
| :--- | :--- | :--- | :--- |
| 0.1.6 | 2021-11-16 | [8009](https://github.com/airbytehq/airbyte/pull/8009) | Fix retring of BULK jobs |
Contributor

Suggested change
-| 0.1.6 | 2021-11-16 | [8009](https://github.com/airbytehq/airbyte/pull/8009) | Fix retring of BULK jobs |
+| 0.1.6 | 2021-11-16 | [8009](https://github.com/airbytehq/airbyte/pull/8009) | Fix retrying of BULK jobs |

MAX_CHECK_INTERVAL_SECONDS = 2.0
MAX_RETRY_NUMBER = 3

def __init__(self, wait_timeout: Optional[int], **kwargs):
Contributor

every time we pass a time duration we should either use a data type which encodes the unit of time or we should embed the unit of time in the var name e.g: wait_timeout_in_minutes
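A minimal sketch of both options from this comment, using the standard library's `timedelta` rather than the `pendulum` types in the diff so that it stays self-contained. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_WAIT_TIMEOUT = timedelta(minutes=10)


def expiration_time(wait_timeout: timedelta = DEFAULT_WAIT_TIMEOUT, now: Optional[datetime] = None) -> datetime:
    """Option 1: the type carries the unit, so no `* 60.0` conversion is needed."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current + wait_timeout


def expiration_time_from_minutes(wait_timeout_in_minutes: int, now: Optional[datetime] = None) -> datetime:
    """Option 2: keep a plain number but put the unit in the parameter name."""
    return expiration_time(timedelta(minutes=wait_timeout_in_minutes), now)
```

Either way, a caller can no longer pass seconds where minutes are expected without it being visible at the call site.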

expiration_time: DateTime = pendulum.now().add(seconds=int(self._wait_timeout * 60.0))
job_status = "InProgress"
delay_timeout = 0
delay_cnt = 0
Contributor

can you use a more expressive variable name?

self.logger.info(f"Sleeping {self.CHECK_INTERVAL_SECONDS} seconds while waiting for Job: {job_id} to complete")
time.sleep(self.CHECK_INTERVAL_SECONDS)
def execute_job(self, query: Mapping[str, Any], url: str) -> str:
job_status = "Failed"
Contributor

it's probably better to encode these job statuses using an enum

Contributor Author

Yes, this was only a bug fix with minimal refactoring. Sure, we can do it during the next review.
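For reference, the enum suggested above might look roughly like this. The class and property names are hypothetical; the status strings are the ones that appear in this diff, plus `"JobComplete"`, which is assumed from the Bulk API job states.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Bulk job states; the `str` mixin keeps comparisons with raw API strings working."""

    UPLOAD_COMPLETE = "UploadComplete"
    IN_PROGRESS = "InProgress"
    JOB_COMPLETE = "JobComplete"  # assumed success state, not shown in the diff
    ABORTED = "Aborted"
    FAILED = "Failed"

    @property
    def is_pending(self) -> bool:
        return self in (JobStatus.UPLOAD_COMPLETE, JobStatus.IN_PROGRESS)

    @property
    def is_failure(self) -> bool:
        return self in (JobStatus.ABORTED, JobStatus.FAILED)


status = JobStatus("InProgress")  # parse the raw string from an API response
print(status.is_pending)          # True
```

Parsing with `JobStatus(...)` raises `ValueError` on an unknown status, which surfaces unexpected API values instead of silently treating them as a timeout.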


self.logger.info(f"Sleeping {self.CHECK_INTERVAL_SECONDS} seconds while waiting for Job: {job_id} to complete")
time.sleep(self.CHECK_INTERVAL_SECONDS)
def execute_job(self, query: Mapping[str, Any], url: str) -> str:
Contributor

signature should be -> Optional[str]

job_status = self.wait_for_job(url=job_full_url)
if job_status not in ["UploadComplete", "InProgress"]:
break
self.logger.error(f"Waiting error. Try to run this job again {i+1}/{self.MAX_RETRY_NUMBER}...")
Contributor

Suggested change
-self.logger.error(f"Waiting error. Try to run this job again {i+1}/{self.MAX_RETRY_NUMBER}...")
+self.logger.error(f"Waiting error. Running retry {i+1} out of {self.MAX_RETRY_NUMBER}...")


if job_status in ["Aborted", "Failed"]:
self.delete_job(url=job_full_url)
raise Exception(f"Job for {self.name} stream using BULK API was failed.")
Contributor

shouldn't we be retrying this job instead of failing?

Contributor Author

I was not sure about the possible reasons for failures. Unfortunately, I didn't manage to catch any real failed or aborted job. Maybe there are some internal job failures (timeouts or resource issues).

schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* update dockerfile

* update version

* update changelogs

* remove test config

* fix after flake8

* bump version to 0.1.6

* remove secrets from config

* update source specs
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Source Salesforce: failed during normalization step
5 participants