🐛 Source Intercom: backoff for companies' scrolling #8395

Merged
antixar merged 6 commits into master from antixar/oncall-38-intercom-companies-sync-failure
Dec 2, 2021

Conversation

antixar
Contributor

@antixar antixar commented Dec 1, 2021

What

OnCall issue: https://github.com/airbytehq/oncall/issues/39
Reason:
The Intercom API allows only one scroll request at a time, and every "scroll" request should load all of its data pages. If a previous scroll has not loaded all of its data, creating a new "scroll" request is blocked for up to a minute. More details here.

How

Implement a custom backoff case for this error response. The API returns status code 400, and the response body includes a JSON item "code" with the value scroll_exists.
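
A minimal sketch of how such a check could look, written against plain values rather than the connector's real classes. The helper names, the 60-second wait, and the assumption that the "code" item may sit either at the top level of the body or inside an "errors" list are illustrative and are not taken from source.py:

```python
from typing import Any, Mapping, Optional

SCROLL_EXISTS_CODE = "scroll_exists"
# Assumption: wait about a minute, since a new scroll is blocked for roughly that long.
SCROLL_BACKOFF_SECONDS = 60.0


def is_scroll_conflict(status_code: int, body: Any) -> bool:
    """Return True when the response says another scroll is still open."""
    if status_code != 400 or not isinstance(body, Mapping):
        return False
    # Assumption: the "code" item is either top-level or nested in an "errors" list.
    candidates = [body]
    errors = body.get("errors")
    if isinstance(errors, list):
        candidates.extend(item for item in errors if isinstance(item, Mapping))
    return any(item.get("code") == SCROLL_EXISTS_CODE for item in candidates)


def backoff_seconds(status_code: int, body: Any) -> Optional[float]:
    """Return a custom wait for the scroll conflict, or None to use default handling."""
    if is_scroll_conflict(status_code, body):
        return SCROLL_BACKOFF_SECONDS
    return None
```

In a connector, a retry hook would call something like is_scroll_conflict with the response's status code and parsed JSON body, and fall back to the default retry rules for every other error.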

Additionally, we've fixed another possible bug. The Intercom API supports several versions, and the version in use is configured in each account's profile (docs). Not all streams are available in every version.
However, we can set the required version directly with an additional header.
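
As a rough illustration of pinning the version per request, the sketch below builds the request headers. The function name is hypothetical, the header name Intercom-Version is the one Intercom's public API docs describe to the best of my knowledge, and the version value passed in the usage line is only a placeholder, not necessarily the one this PR uses:

```python
from typing import Dict


def build_headers(access_token: str, api_version: str) -> Dict[str, str]:
    """Build request headers that pin the API version for every call."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
        # Overrides the version configured in the account's profile.
        "Intercom-Version": api_version,
    }


# Placeholder values, for illustration only.
headers = build_headers("my-token", "2.2")
```

Sending the header on every request keeps the set of available streams independent of how each account is configured.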

Both bugs are covered by tests.

Recommended reading order

  1. integration_test.py
  2. source.py

Pre-merge Checklist

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • Credentials added to Github CI. Instructions.
  • /test connector=connectors/<name> command is passing.
  • New Connector version released on Dockerhub by running the /publish command described here
  • After the new connector version is published, connector version bumped in the seed directory as described here
  • Seed specs have been re-generated by building the platform and committing the changes to the seed spec files, as described here

@github-actions github-actions bot added the area/connectors Connector related issues label Dec 1, 2021
@antixar
Contributor Author

antixar commented Dec 1, 2021

/test connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1527074508
✅ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1527074508
Python tests coverage:

	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                                                 Stmts   Miss  Cover
	 ------------------------------------------------------------------------
	 source_acceptance_test/__init__.py                       2      0   100%
	 source_acceptance_test/base.py                          10      4    60%
	 source_acceptance_test/config.py                        76      8    89%
	 source_acceptance_test/conftest.py                     108    108     0%
	 source_acceptance_test/plugin.py                        47     47     0%
	 source_acceptance_test/tests/__init__.py                 4      0   100%
	 source_acceptance_test/tests/test_core.py              235     95    60%
	 source_acceptance_test/tests/test_full_refresh.py       38     27    29%
	 source_acceptance_test/tests/test_incremental.py        69     38    45%
	 source_acceptance_test/utils/__init__.py                 6      0   100%
	 source_acceptance_test/utils/asserts.py                 37      2    95%
	 source_acceptance_test/utils/common.py                  54     24    56%
	 source_acceptance_test/utils/compare.py                 62     25    60%
	 source_acceptance_test/utils/connector_runner.py        82     49    40%
	 source_acceptance_test/utils/json_schema_helper.py     115     14    88%
	 ------------------------------------------------------------------------
	 TOTAL                                                  945    441    53%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                          Stmts   Miss  Cover
	 -------------------------------------------------
	 source_intercom/__init__.py       2      0   100%
	 source_intercom/source.py       165     20    88%
	 -------------------------------------------------
	 TOTAL                           167     20    88%
	 ---------- coverage: platform linux, python 3.8.10-final-0 -----------
	 Name                          Stmts   Miss  Cover
	 -------------------------------------------------
	 source_intercom/__init__.py       2      0   100%
	 source_intercom/source.py       165     76    54%
	 -------------------------------------------------
	 TOTAL                           167     76    54%

@jrhizor jrhizor temporarily deployed to more-secrets December 1, 2021 17:52 Inactive
@antixar antixar requested a review from grubberr December 1, 2021 18:52
@antixar antixar self-assigned this Dec 1, 2021

return False

def should_retry(self, response: requests.Response) -> bool:
Contributor

if the scroll exists, shouldn't we use it instead? why would we retry the same request if the scroll exists?

Also, if we restart the scroll, wouldn't that re-load all data from scratch? From the docs:

When this occurs you will need to restart your scroll query as it is not possible to continue from a specific point when using scroll.

Contributor Author

@antixar antixar Dec 2, 2021

Every request must include a next-token ID to load the next page. If this token is not set, the request starts a new scroll.
Restarting is possible only after one of 2 events:

  • all pages have been loaded
  • no page has been loaded within a minute.

Thus, working with this scroll endpoint is atomic: only one script (connector, etc.) can use it at a time. If there are problems with resources or requests, the Intercom API returns a different error (code value), which our connector handles with the default CDK logic, and the sync fails.
All other connectors (scripts, etc.) have to wait.
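
To make the token handling concrete, here is a small sketch of a scroll reader. It assumes each page is a JSON object with a "data" list and a "scroll_param" token, and that an empty "data" list marks the end of the scroll; fetch_page is a hypothetical callable standing in for the real HTTP request:

```python
from typing import Any, Callable, Iterator, Mapping, Optional


def read_scroll(
    fetch_page: Callable[[Optional[str]], Mapping[str, Any]],
) -> Iterator[Mapping[str, Any]]:
    """Yield every record of one scroll, passing the token back on each request."""
    scroll_param: Optional[str] = None  # no token yet: the first call opens a new scroll
    while True:
        page = fetch_page(scroll_param)
        records = page.get("data") or []
        if not records:
            return  # assumption: an empty page means all pages were loaded
        yield from records
        scroll_param = page.get("scroll_param")
        if not scroll_param:
            return  # without a token the next call would start a new scroll


# Usage with canned pages instead of real HTTP calls.
_pages = iter([
    {"data": [{"id": 1}, {"id": 2}], "scroll_param": "abc"},
    {"data": [{"id": 3}], "scroll_param": "abc"},
    {"data": [], "scroll_param": "abc"},
])
company_ids = [record["id"] for record in read_scroll(lambda token: next(_pages))]
```

Reading until the empty page matters here: stopping early leaves the scroll open, and any other reader is then blocked until the scroll times out.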

Contributor

I see, so scrolls might have been created from external processes. very interesting.

I wonder if this means we should just use the other endpoint by default. Maybe using the scroll endpoint only on the first sync, or if there is an ongoing external scroll, just use the other endpoint by default. It would allow us to offer incremental sync for this endpoint as well by exploiting the sorting of the data.

Maybe the strategy to use here is:

  1. on the very first sync, use the scroll endpoint. If there is an ongoing scroll, wait until it is done and try to acquire a scroll.
  2. on every other sync, use the other endpoint. It is highly unlikely that more than 10,000 companies will have been added since the previous time the data was read, so the limitation of that endpoint will not be relevant.

WDYT?

Contributor Author

@sherifnada ,
This "scroll" endpoint is a bottleneck for some of the streams. I propose implementing only your second point, with some adjustments:

  1. load the first page with only one record by the "non-scroll" endpoint (it doesn't have any restrictions).
  2. read its response value "total_count" and:
    2a. use the "non-scroll" endpoint further if total_count <= N (100, 1000, 10^5 ?)
    2b. use the "scroll" endpoint if total_count > N

Sure, we could add a new spec property for setting the variable "N", but I understand that this is a bad idea)
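
The proposal above could be sketched roughly as follows. The function name, the returned labels, and the default threshold are hypothetical; 10,000 is used only because that limit of the non-scroll endpoint is mentioned earlier in this thread, and the real value of "N" is still open:

```python
# Assumption: placeholder for "N"; the thread has not settled on a value.
DEFAULT_THRESHOLD = 10_000


def choose_endpoint(total_count: int, threshold: int = DEFAULT_THRESHOLD) -> str:
    """Pick the endpoint from the total_count reported by a one-record first page."""
    if total_count < 0:
        raise ValueError("total_count must be non-negative")
    # Small data sets stay on the unrestricted endpoint; large ones need the scroll.
    return "scroll" if total_count > threshold else "list"
```

For example, choose_endpoint(250) picks the non-scroll endpoint, while choose_endpoint(25_000) falls back to the scroll. This only works if total_count is reported correctly for large accounts, which still has to be validated.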

@antixar antixar temporarily deployed to more-secrets December 2, 2021 13:06 Inactive
@antixar
Contributor Author

antixar commented Dec 2, 2021

/test connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1530695434
❌ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1530695434
🐛 https://gradle.com/s/x6c2bm53jtruo

@antixar antixar requested a review from sherifnada December 2, 2021 13:07
@jrhizor jrhizor temporarily deployed to more-secrets December 2, 2021 13:09 Inactive
@antixar
Contributor Author

antixar commented Dec 2, 2021

/test connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1531449354
✅ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1531449354

@antixar antixar temporarily deployed to more-secrets December 2, 2021 16:15 Inactive
@jrhizor jrhizor temporarily deployed to more-secrets December 2, 2021 16:16 Inactive
Contributor

@sherifnada sherifnada left a comment

Let's ship this as an immediate fix, but apply the strategy you describe in the comment below. We need to validate one thing, though: if there are more than 10k companies, will the total_count returned from the non-scroll list endpoint contain the correct total number?

@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Dec 2, 2021
@antixar antixar temporarily deployed to more-secrets December 2, 2021 22:48 Inactive
@antixar
Contributor Author

antixar commented Dec 2, 2021

/test connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1532907405
✅ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1532907405

@jrhizor jrhizor temporarily deployed to more-secrets December 2, 2021 22:52 Inactive
@antixar
Contributor Author

antixar commented Dec 2, 2021

/publish connector=connectors/source-intercom

🕑 connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1532933089
✅ connectors/source-intercom https://github.com/airbytehq/airbyte/actions/runs/1532933089

@jrhizor jrhizor temporarily deployed to more-secrets December 2, 2021 23:02 Inactive
@antixar antixar temporarily deployed to more-secrets December 2, 2021 23:17 Inactive
@antixar antixar merged commit 64bd0a6 into master Dec 2, 2021
@antixar antixar deleted the antixar/oncall-38-intercom-companies-sync-failure branch December 2, 2021 23:17
schlattk pushed a commit to schlattk/airbyte that referenced this pull request Jan 4, 2022
* backoff for companies scroll

* remove a unused companies stream property

* fix tests

* bump version

* update source_specs
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation
3 participants