Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add hs_date_entered_* and hs_date_exited_* fields for the Deals Stream #133

Merged
merged 46 commits into from
Dec 2, 2020

Conversation

luandy64
Copy link
Contributor

Description of change

This PR hits the V3 Deals CRM API to get all hs_date_entered_* and hs_date_exited_* fields.

The "batch read" endpoint that Hubspot has lets us search for specific IDs. Given that V1 Deals and V3 Deals have the same primary key ("dealId" and "id" respectively) values, we can merge these new fields into the existing records

Manual QA steps

  • Ran the tap before and after the change to get records
  • Diffed the two sets of records

Risks

  • Lower than Add hs_date_entered/exited fields from v3 #124. The main issue there is we did not paginate the V3 response and could not prove whether the first page of V1 records contains the same IDs as the first page of V3 records. But now that we are searching the V3 for explicit IDs we don't have to worry about that

Rollback steps

  • revert this branch and bump the version

if force_extras is not None:
extras = force_extras
else:
extras = entity_name != 'contacts'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels weird the way the code is written. Why not make force_extras a boolean instead of an Optional of... I guess a boolean? Have I mentioned I miss type hints in python? I wish we used them... but that's not important right now. I see... you allow force extras to be passed as either true or false.

Is entity_name != 'contacts' really the most common default use case? I would rather you just didn't have a default value and passed it in, you only us this method twice, ever right? So what's the benefit of pushing the logic into the method and not from the caller?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is entity_name != 'contacts' really the most common default use case?

Yes, it's what it was before. I didn't spend too much thing clarifying the logic, but I needed to the ability to override that value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but why do you need to put this logic inside the method. I see this method called twice below. Why not just pass in extras = entity_name != 'contacts' as the second required parameter to this method for the only time it's not set to extras=False in the method get_v3_schema below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that the one time I use force_extras, the boolean would come out to false anyway. So I took out this addition

if entity_name in ["deals"]:
v3_schema = get_v3_schema(entity_name)
for key, value in v3_schema.items():
if 'hs_date_entered' in key or 'hs_date_exited' in key or 'hs_time_in' in key:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure, but could you use any to make this line a bit easier on the eyes? It's essentially saying are any of the strings in key right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's checking for any of the three prefixes in key

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it's a prefix why not do startswith (I think that's the method in python)? And why does that mean you can't use any?

https://stackoverflow.com/questions/3389574/check-if-multiple-strings-exist-in-another-string

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned this up with an any()

hapikey = CONFIG['hapikey']
if hapikey is None:
if CONFIG['token_expires'] is None or CONFIG['token_expires'] < datetime.datetime.utcnow():
acquire_access_token_from_refresh_token()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly remember discussing this in a previous PR... but why not make the code simply:
headers = {'Authorization': 'Bearer {}'.format(access_token())}

And then pushing this logic of if CONFIG['token_expires'] is None or CONFIG['token_expires'] < datetime.datetime.utcnow(): into an access_token method which would take care of it? It feels like a leaky abstraction right now that seems unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this into a function we use in both functions now

for record in v3_data:
new_properties = {field_name : {'value': field_value}
for field_name, field_value in record['properties'].items()
if 'hs_date_entered' in field_name or 'hs_date_exited' in field_name or 'hs_time_in' in field_name}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again I think any could be used to reduce the number of if and or in this line...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -522,9 +609,14 @@ def sync_deals(STATE, ctx):
params['includeAllProperties'] = True
params['allPropertiesFetchMode'] = 'latest_version'

# Grab selected `hs_date_entered/exited` fields to call the v3 endpoint with
v3_fields = [x[1].replace('property_', '')
for x,y in mdata.items() if x and (y.get('selected') == True or has_selected_properties)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Niptick, but please do not name these x,y. Name them something longer so I understand better what information they represent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# Select only the expected streams tables
expected_streams = self.expected_streams()
catalog_entries = [ce for ce in found_catalogs if ce['tap_stream_id'] in expected_streams]
self.select_all_streams_and_fields(conn_id, catalog_entries, select_all_fields=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the exception of this line, the majority of this code above and some of the test code below is identical. Can you move them into the base? You can look at tap-square for examples. In my opinion most of this should be moved to be shared tap-tester code... since all our tests do it.

if entity_name in ["deals"]:
v3_schema = get_v3_schema(entity_name)
for key, value in v3_schema.items():
if any(prefix in key for prefix in V3_PREFIXES):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

authentication values available. If there is an `hapikey` in the config, we
need that in `params` and not in the `headers`. Otherwise, we need to get an
`access_token` to put in the `headers` and not in the `params`
"""
params = params or {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feels like this could also just be a default value, but unfortunately dict default values can be modified which linters catch... so just leave it like this...

interval=10)
def request(url, params=None):

params, headers = get_params_and_headers(params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how clean this looks to me now

@@ -307,8 +331,59 @@ def lift_properties_and_versions(record):
record['properties_versions'] += versions
return record

def post_search_endpoint(url, data, params=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with this entire method

v3_url = get_url('deals_v3_batch_read')
v3_resp = post_search_endpoint(v3_url, v3_body)
return v3_resp.json()['results']

#pylint: disable=line-too-long
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not important for this PR but I normally add line-too-long to the disables in the pylint command. It's from a time when screens were not so large...

@@ -499,7 +589,7 @@ def sync_deals(STATE, ctx):
max_bk_value = start
LOGGER.info("sync_deals from %s", start)
most_recent_modified_time = start
params = {'count': 250,
params = {'limit': 100,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't ask last time, but is there a reason we dropped this to paginate by 100 and not by 250?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The V3 endpoint has a limit of 100. So I needed the V1 endpoint to match it in order for my "search for this deal IDs" strategy to work without pagination

@luandy64 luandy64 merged commit 1d590dc into master Dec 2, 2020
@luandy64 luandy64 deleted the v3_deals_pagination branch December 2, 2020 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants