🐛 Source Intercom: backoff for companies' scrolling #8395
Conversation
/test connector=connectors/source-intercom
airbyte-integrations/connectors/source-intercom/acceptance-test-docker.sh
airbyte-integrations/connectors/source-intercom/source_intercom/source.py
    return False

def should_retry(self, response: requests.Response) -> bool:
if the scroll exists, shouldn't we use it instead? why would we retry the same request if the scroll exists?
Also, if we restart the scroll, wouldn't that re-load all data from scratch? From the docs:
When this occurs you will need to restart your scroll query as it is not possible to continue from a specific point when using scroll.
Every request must include the scroll token ID from the previous response to load the next page; if this token is not set, a new scroll is created.
Restarting is possible after only 2 events:
- all pages were loaded before, or
- no page was loaded within the last minute.
Thus work with this scroll endpoint is exclusive: at any one time only one script (connector, etc.) can use it. If there is some other problem with resources or requests, the Intercom API will return a different error code, and our connector will handle it with the default CDK logic, resulting in a sync failure.
All other connectors (scripts, etc.) have to wait.
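The token mechanics can be sketched as a small pagination helper. The field names ("scroll_param", "data") follow Intercom's public companies-scroll documentation, but the helper itself is illustrative, not the connector's actual code.

```python
from typing import Optional

def next_page_token(response_json: dict) -> Optional[dict]:
    """Return request params for the next scroll page, or None when done."""
    # An empty "data" page means the scroll is fully consumed; only then
    # (or after ~1 minute of inactivity) can a new scroll be started.
    if not response_json.get("data"):
        return None
    scroll_param = response_json.get("scroll_param")
    # Passing the token back continues the same scroll; omitting it would
    # try to open a brand-new scroll and hit the one-scroll-at-a-time limit.
    return {"scroll_param": scroll_param} if scroll_param else None
```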
I see, so scrolls might have been created from external processes. very interesting.
I wonder if this means we should just use the other endpoint by default: maybe use the scroll endpoint only on the first sync, and if there is an ongoing external scroll, fall back to the other endpoint. That would also allow us to offer incremental sync for this endpoint by exploiting the sorting of the data.
Maybe the strategy to use here is:
- on the very first sync, use the scroll endpoint. If there is an ongoing scroll, wait until it is done and try to acquire a scroll.
- every other sync, use the other endpoint. It is highly unlikely there will be more than 10,000 new companies since the previous time the data was read, so the limitation of that endpoint will not be relevant.
WDYT?
@sherifnada ,
This "scroll" endpoint is a bottleneck for a subset of streams. I propose implementing only your second point, with these adjustments:
1. Load the first page with only one record via the "non-scroll" endpoint (it doesn't have any restrictions).
2. Read the "total_count" value from its response and:
   2a. keep using the "non-scroll" endpoint if total_count <= N (100, 1000, 10^5?)
   2b. use the "scroll" endpoint if total_count > N
Sure, we could add a new spec property to configure the value of N, but I understand that would be a bad idea.)
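A minimal sketch of this selection logic, assuming a hypothetical threshold N and function name; the endpoint paths are taken from Intercom's companies API, but nothing here is the connector's actual code.

```python
# Hypothetical threshold; the proposal above leaves N open (100, 1000, 10^5?).
N = 10_000

def pick_companies_endpoint(total_count: int) -> str:
    """Choose between the plain list endpoint and the scroll endpoint."""
    # 2a: small workspaces fit under the plain endpoint's limits
    if total_count <= N:
        return "companies"
    # 2b: large workspaces need the exclusive scroll endpoint
    return "companies/scroll"
```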
…ntercom-companies-sync-failure
/test connector=connectors/source-intercom
/test connector=connectors/source-intercom
Let's ship this as an immediate fix, but apply the strategy you describe in the comment below. We need to validate, though: if there are more than 10k companies, will the total_count returned from the non-scroll list endpoint contain the correct total number?
/test connector=connectors/source-intercom
/publish connector=connectors/source-intercom
* backoff for companies scroll
* remove an unused companies stream property
* fix tests
* bump version
* update source_specs
What
OnCall issue: https://github.com/airbytehq/oncall/issues/39
Reason:
The Intercom API allows only one scroll request at a time, and every "scroll" request must load all data pages. Creation of a new "scroll" request is blocked for up to a minute if a previous request did not load all of its data. More details here.
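The one-scroll-at-a-time constraint can be illustrated with a simple wait-and-retry loop; the attempt count, wait time, and function names here are assumptions for illustration, not the connector's actual values.

```python
import time

def acquire_scroll(start_scroll, max_attempts: int = 5, wait_seconds: float = 60.0):
    """Retry starting a scroll until the previous one finishes or we give up."""
    for _ in range(max_attempts):
        token = start_scroll()  # stand-in: returns None while a scroll is held
        if token is not None:
            return token
        # The API frees the scroll about a minute after its last page read,
        # so back off before trying again.
        time.sleep(wait_seconds)
    raise RuntimeError("scroll is still held by another client")
```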
How
Implementation of a custom backoff case for this error response: the API returns HTTP status 400 and the response body includes a JSON field "code" with the value scroll_exists.
Additionally, we've fixed another possible bug: the Intercom API supports several versions, and the active version is managed in the account's profile (docs). Not all streams are available in every version, but we can set the required version directly with an additional request header.
Both bugs have been covered by tests.
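The backoff condition and the version header might look roughly like this. Plain functions stand in for the connector's Airbyte CDK overrides, and the exact error-body layout and default version value are assumptions based on Intercom's public docs.

```python
def is_scroll_exists_error(status_code: int, body: dict) -> bool:
    """Detect the 'scroll already in progress' error that warrants backoff."""
    if status_code != 400:
        return False
    # Intercom error bodies carry a list of error items, each with a "code";
    # this layout is an assumption based on the public error docs.
    return any(err.get("code") == "scroll_exists" for err in body.get("errors", []))

def request_headers(api_version: str = "2.0") -> dict:
    """Pin the API version explicitly instead of relying on the account profile."""
    # Header name per Intercom docs; the default version here is hypothetical.
    return {"Intercom-Version": api_version}
```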
Recommended reading order
integration_test.py
source.py
Pre-merge Checklist
Updating a connector
Community member or Airbyter
- Secrets in the connector's spec are annotated with airbyte_secret
- Integration tests pass: ./gradlew :airbyte-integrations:connectors:<name>:integrationTest
- Connector's README.md and bootstrap.md updated. See description and examples
- docs/integrations/<source or destination>/<name>.md updated, including changelog. See changelog example

Airbyter
If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.
- /test connector=connectors/<name> command is passing
- /publish command described here