-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hs_date_entered_*
and hs_date_exited_*
fields for the Deals Stream
#133
Conversation
tap_hubspot/__init__.py
Outdated
if force_extras is not None: | ||
extras = force_extras | ||
else: | ||
extras = entity_name != 'contacts' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels weird the way the code is written. Why not make force_extras a boolean instead of an Optional of... I guess a boolean? Have I mentioned I miss type hints in python? I wish we used them... but that's not important right now. I see... you allow force extras to be passed as either true or false.
Is entity_name != 'contacts'
really the most common default use case? I would rather you just didn't have a default value and passed it in, you only us this method twice, ever right? So what's the benefit of pushing the logic into the method and not from the caller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is entity_name != 'contacts' really the most common default use case?
Yes, it's what it was before. I didn't spend too much thing clarifying the logic, but I needed to the ability to override that value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but why do you need to put this logic inside the method. I see this method called twice below. Why not just pass in extras = entity_name != 'contacts'
as the second required parameter to this method for the only time it's not set to extras=False
in the method get_v3_schema
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that the one time I use force_extras
, the boolean would come out to false
anyway. So I took out this addition
tap_hubspot/__init__.py
Outdated
if entity_name in ["deals"]: | ||
v3_schema = get_v3_schema(entity_name) | ||
for key, value in v3_schema.items(): | ||
if 'hs_date_entered' in key or 'hs_date_exited' in key or 'hs_time_in' in key: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure, but could you use any
to make this line a bit easier on the eyes? It's essentially saying are any of the strings in key
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it's checking for any of the three prefixes in key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if it's a prefix why not do startswith
(I think that's the method in python)? And why does that mean you can't use any
?
https://stackoverflow.com/questions/3389574/check-if-multiple-strings-exist-in-another-string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned this up with an any()
tap_hubspot/__init__.py
Outdated
hapikey = CONFIG['hapikey'] | ||
if hapikey is None: | ||
if CONFIG['token_expires'] is None or CONFIG['token_expires'] < datetime.datetime.utcnow(): | ||
acquire_access_token_from_refresh_token() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I slightly remember discussing this in a previous PR... but why not make the code simply:
headers = {'Authorization': 'Bearer {}'.format(access_token())}
And then pushing this logic of if CONFIG['token_expires'] is None or CONFIG['token_expires'] < datetime.datetime.utcnow():
into an access_token method which would take care of it? It feels like a leaky abstraction right now that seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this into a function we use in both functions now
tap_hubspot/__init__.py
Outdated
for record in v3_data: | ||
new_properties = {field_name : {'value': field_value} | ||
for field_name, field_value in record['properties'].items() | ||
if 'hs_date_entered' in field_name or 'hs_date_exited' in field_name or 'hs_time_in' in field_name} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again I think any
could be used to reduce the number of if
and or
in this line...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tap_hubspot/__init__.py
Outdated
@@ -522,9 +609,14 @@ def sync_deals(STATE, ctx): | |||
params['includeAllProperties'] = True | |||
params['allPropertiesFetchMode'] = 'latest_version' | |||
|
|||
# Grab selected `hs_date_entered/exited` fields to call the v3 endpoint with | |||
v3_fields = [x[1].replace('property_', '') | |||
for x,y in mdata.items() if x and (y.get('selected') == True or has_selected_properties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Niptick, but please do not name these x,y. Name them something longer so I understand better what information they represent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
# Select only the expected streams tables | ||
expected_streams = self.expected_streams() | ||
catalog_entries = [ce for ce in found_catalogs if ce['tap_stream_id'] in expected_streams] | ||
self.select_all_streams_and_fields(conn_id, catalog_entries, select_all_fields=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the exception of this line, the majority of this code above and some of the test code below is identical. Can you move them into the base? You can look at tap-square for examples. In my opinion most of this should be moved to be shared tap-tester code... since all our tests do it.
- run_and_verify_check_mode: https://github.com/singer-io/tap-square/blob/master/tests/base.py#L513
- run_and_verify_sync: https://github.com/singer-io/tap-square/blob/master/tests/base.py#L573
if entity_name in ["deals"]: | ||
v3_schema = get_v3_schema(entity_name) | ||
for key, value in v3_schema.items(): | ||
if any(prefix in key for prefix in V3_PREFIXES): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice
authentication values available. If there is an `hapikey` in the config, we | ||
need that in `params` and not in the `headers`. Otherwise, we need to get an | ||
`access_token` to put in the `headers` and not in the `params` | ||
""" | ||
params = params or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like this could also just be a default value, but unfortunately dict default values can be modified which linters catch... so just leave it like this...
interval=10) | ||
def request(url, params=None): | ||
|
||
params, headers = get_params_and_headers(params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like how clean this looks to me now
@@ -307,8 +331,59 @@ def lift_properties_and_versions(record): | |||
record['properties_versions'] += versions | |||
return record | |||
|
|||
def post_search_endpoint(url, data, params=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same with this entire method
v3_url = get_url('deals_v3_batch_read') | ||
v3_resp = post_search_endpoint(v3_url, v3_body) | ||
return v3_resp.json()['results'] | ||
|
||
#pylint: disable=line-too-long |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not important for this PR but I normally add line-too-long to the disables in the pylint command. It's from a time when screens were not so large...
@@ -499,7 +589,7 @@ def sync_deals(STATE, ctx): | |||
max_bk_value = start | |||
LOGGER.info("sync_deals from %s", start) | |||
most_recent_modified_time = start | |||
params = {'count': 250, | |||
params = {'limit': 100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't ask last time, but is there a reason we dropped this to paginate by 100 and not by 250?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The V3 endpoint has a limit of 100. So I needed the V1 endpoint to match it in order for my "search for this deal IDs" strategy to work without pagination
Description of change
This PR hits the V3 Deals CRM API to get all
hs_date_entered_*
andhs_date_exited_*
fields.The "batch read" endpoint that Hubspot has lets us search for specific IDs. Given that V1 Deals and V3 Deals have the same primary key (
"dealId"
and"id"
respectively) values, we can merge these new fields into the existing recordsManual QA steps
Risks
Rollback steps