-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Source Hubspot: Fix issue with getting 414 HTTP error for streams #6954
🐛 Source Hubspot: Fix issue with getting 414 HTTP error for streams #6954
Conversation
/test connector=connectors/source-hubspot
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think we should go with a character-length-based check to prevent this issue from happening again (what is the max length we can use?)
- we should have very thorough unit tests around this functionality to ensure data integrity and correct pagination/merging records/making the correct requests/etc....
airbyte-integrations/connectors/source-hubspot/source_hubspot/api.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-hubspot/source_hubspot/api.py
Outdated
Show resolved
Hide resolved
airbyte-integrations/connectors/source-hubspot/source_hubspot/api.py
Outdated
Show resolved
Hide resolved
…urochkin/hubspot-source-issue-with-414-errors � Conflicts: � airbyte-integrations/connectors/source-hubspot/sample_files/configured_catalog_for_oauth_config.json � airbyte-integrations/connectors/source-hubspot/sample_files/full_refresh_catalog.json � docs/integrations/sources/hubspot.md
/test connector=connectors/source-hubspot
|
/test connector=connectors/source-hubspot
|
@sherifnada could you please make code review one more time? |
airbyte-integrations/connectors/source-hubspot/unit_tests/test_client.py
Outdated
Show resolved
Hide resolved
@@ -17,6 +17,8 @@ | |||
from base_python.entrypoint import logger | |||
from source_hubspot.errors import HubspotAccessDenied, HubspotInvalidAuth, HubspotRateLimited, HubspotTimeout | |||
|
|||
PROPERTIES_PARAM_MAX_LENGTH = 15000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
were you able to confirm this is the max allowed length? is there a reference you can include for the next person looking at the code? (or rather, how did you get this number? can you leave a comment about it?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not found any information on this in the Hubspot documentation or support anywhere. The user has a similar problem as we do, he described it here, but he was also not answered correctly to his question.
I obtained this value experimentally, using various URLs for Hubspot, I found that somewhere on the border of 16400 characters in URL, the Hubspot starts returning us a 414 error.
I put 15000 as a constant for the length of the "property" value only, and leaving about 1000 for the rest of the URL (basic URL, pagination, additional fields such as associations and contacts).
I`ll leave a comment inside the code about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonderful. Very helpful comment, thank you!
airbyte-integrations/connectors/source-hubspot/source_hubspot/api.py
Outdated
Show resolved
Hide resolved
response = getter(params=params) | ||
if stream_records: | ||
for record in self._transform(self.parse_response(response)): | ||
index = next((i for i, item in enumerate(stream_records) if item.get("id") == record.get("id")), -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why don't we key stream_records
in a dict of ID to record? This way we don't need to loop repeatedly e.g:
id_to_record = dict(map(lambda record: (record["id"],record), stream_records))
for record in self._transform....
It also makes it clearer how we handle the case where a record with a different ID was added
Also, are we guaranteed that all streams have IDs whose name is id
?
We should have a LOT of testing around this merging logic because it is super critical -- a mistake here would output incorrect data for a user
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, for all streams that use the properties
parameter there is id
for record.
/test connector=connectors/source-google-directory
|
/test connector=connectors/source-hubspot
|
/test connector=connectors/source-hubspot
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
I have one question about why we are changing schemas and the impact users should expect, but otherwise we should be ready to ship
@@ -17,6 +17,8 @@ | |||
from base_python.entrypoint import logger | |||
from source_hubspot.errors import HubspotAccessDenied, HubspotInvalidAuth, HubspotRateLimited, HubspotTimeout | |||
|
|||
PROPERTIES_PARAM_MAX_LENGTH = 15000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonderful. Very helpful comment, thank you!
@@ -2,67 +2,25 @@ | |||
"$schema": "http://json-schema.org/draft-07/schema#", | |||
"type": ["null", "object"], | |||
"properties": { | |||
"portalId": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come we changed so many schemas? were all of these changes schemas previously incorrect? Most importantly: were these fields not being populated before?
The thing I'm trying to verify here is whether a user who has built a workflow based on the output of this connector (e.g: SQL statements or dashboard) will have some breaking changes as a result of this change, or if this is just declaring the data that has always been output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All but one of the changed schemas were previously incorrect and the schemas did not match the records.
The only moment where I changed the scheme, but it was correct, was Forms
Stream. I changed it, since I also changed the URL to get data on it. Before that, we used an outdated URL with version 2 (/forms/v2/form
), I updated it to the third version (/marketing/v3/forms
), we get the same data, but the format for them is slightly changed. I did this because we refer to the Hubspot documentation, but there is no longer any description for getting forms
using the API version 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. thanks for the context!
airbyte-integrations/connectors/source-hubspot/unit_tests/test_client.py
Outdated
Show resolved
Hide resolved
params.update({"properties": ",".join(properties)}) | ||
response = getter(params=params) | ||
for record in self._transform(self.parse_response(response)): | ||
if record["id"] not in stream_records: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonderful, this is really easy to follow now! also simpler than my earlier suggestion ;)
…urochkin/hubspot-source-issue-with-414-errors
/publish connector=connectors/source-hubspot
|
…irbytehq#6954) * Source Hubspot: Fix issue with getting 414 HTTP error for streams * update code and schemas * bump version
What
resolves #3977 and #5835.
Pre-merge Checklist
Community member or Airbyter
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
bootstrap.md
. See description and examplesdocs/integrations/<source or destination>/<name>.md
including changelog. See changelog example