Add `hs_date_entered_` and `hs_date_exited_` fields for the Deals Stream #133

luandy64 · 2020-11-20T15:30:19Z

Description of change

This PR hits the V3 Deals CRM API to get all hs_date_entered_* and hs_date_exited_* fields.

HubSpot's "batch read" endpoint lets us look up specific IDs. Because V1 Deals and V3 Deals share the same primary key values ("dealId" and "id" respectively), we can merge these new fields into the existing records.
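For illustration, the merge described above could look roughly like the sketch below. It is not the tap's actual code: `merge_v3_fields` and the record shapes are assumptions (V1 records keyed by `dealId`, V3 records keyed by `id`, each with a `properties` dict), and the `{'value': ...}` wrapping mirrors the diff excerpt quoted later in this review.

```python
# Hypothetical prefixes; the PR description names the first two.
V3_PREFIXES = ("hs_date_entered", "hs_date_exited")


def merge_v3_fields(v1_records, v3_records):
    """Return copies of the V1 records with matching V3 properties merged in."""
    # Index V3 records by ID. IDs are compared as strings in case the two
    # API versions return different types for the same key.
    v3_by_id = {str(record["id"]): record for record in v3_records}

    merged = []
    for v1_record in v1_records:
        out = dict(v1_record)
        out["properties"] = dict(v1_record.get("properties", {}))
        v3_record = v3_by_id.get(str(v1_record["dealId"]))
        if v3_record is not None:
            for name, value in v3_record["properties"].items():
                if any(prefix in name for prefix in V3_PREFIXES):
                    out["properties"][name] = {"value": value}
        merged.append(out)
    return merged
```

A V1 record with no V3 counterpart passes through unchanged, and the input lists are not mutated.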

Manual QA steps

Ran the tap before and after the change to get records
Diffed the two sets of records

Risks

Lower than in "Add hs_date_entered/exited fields from v3" #124. The main issue there was that we did not paginate the V3 response and could not prove that the first page of V1 records contained the same IDs as the first page of V3 records. Now that we query V3 for explicit IDs, we don't have to worry about that.

Rollback steps

Revert this branch and bump the version.

asaf-erlich · 2020-12-01T18:10:40Z

tap_hubspot/__init__.py

+    if force_extras is not None:
+        extras = force_extras
+    else:
+        extras = entity_name != 'contacts'


The way this code is written feels weird. Why not make force_extras a boolean instead of an Optional of... I guess a boolean? Have I mentioned I miss type hints in Python? I wish we used them... but that's not important right now. I see... you allow force_extras to be passed as either true or false.

Is entity_name != 'contacts' really the most common default use case? I would rather you just didn't have a default value and passed it in; you only use this method twice, ever, right? So what's the benefit of pushing the logic into the method instead of leaving it with the caller?

Is entity_name != 'contacts' really the most common default use case?

Yes, it's what it was before. I didn't spend too much time clarifying the logic, but I needed the ability to override that value.

Ok, but why do you need to put this logic inside the method? I see this method called twice below. Why not just pass in extras = entity_name != 'contacts' as the second required parameter for the one call where it isn't set to extras=False (the other call, in get_v3_schema below, passes False)?

I noticed that the one time I use force_extras, the boolean would come out to false anyway. So I took out this addition

asaf-erlich · 2020-12-01T18:13:18Z

tap_hubspot/__init__.py

+        if entity_name in ["deals"]:
+            v3_schema = get_v3_schema(entity_name)
+            for key, value in v3_schema.items():
+                if 'hs_date_entered' in key or 'hs_date_exited' in key or 'hs_time_in' in key:


I'm not 100% sure, but could you use any to make this line a bit easier on the eyes? It's essentially asking whether any of the strings are in key, right?

No, it's checking for any of the three prefixes in key

If it's a prefix, why not use startswith (I think that's the method in Python)? And why does that mean you can't use any?

https://stackoverflow.com/questions/3389574/check-if-multiple-strings-exist-in-another-string

I cleaned this up with an any()
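A small sketch of the two options discussed in this thread. The contents of `V3_PREFIXES` are assumed from the three strings in the original condition, and both helper names are hypothetical. `any(... in ...)` matches the string anywhere in the key, while `str.startswith` (which accepts a tuple) only matches at the beginning, so the two differ for keys that carry a leading `property_`.

```python
# Assumed contents, taken from the three strings in the original condition.
V3_PREFIXES = ("hs_date_entered", "hs_date_exited", "hs_time_in")


def contains_v3_prefix(key):
    """True if any of the strings appears anywhere in key."""
    return any(prefix in key for prefix in V3_PREFIXES)


def starts_with_v3_prefix(key):
    """True only if key begins with one of the strings."""
    return key.startswith(V3_PREFIXES)


print(contains_v3_prefix("property_hs_date_entered_closedwon"))     # True
print(starts_with_v3_prefix("property_hs_date_entered_closedwon"))  # False
print(starts_with_v3_prefix("hs_time_in_closedwon"))                # True
```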

asaf-erlich · 2020-12-01T18:16:39Z

tap_hubspot/__init__.py

+    hapikey = CONFIG['hapikey']
+    if hapikey is None:
+        if CONFIG['token_expires'] is None or CONFIG['token_expires'] < datetime.datetime.utcnow():
+            acquire_access_token_from_refresh_token()


I vaguely remember discussing this in a previous PR... but why not make the code simply:
headers = {'Authorization': 'Bearer {}'.format(access_token())}

and then push the logic of if CONFIG['token_expires'] is None or CONFIG['token_expires'] < datetime.datetime.utcnow(): into an access_token method that takes care of it? Right now it feels like an unnecessarily leaky abstraction.

I moved this into a function we use in both functions now
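As a rough sketch of what such a shared helper might look like (not the merged implementation): the signature below is an assumption made so the example is self-contained. In particular, passing `config` and a `refresh_token` callback as arguments is hypothetical, since the tap reads a module-level `CONFIG` and calls `acquire_access_token_from_refresh_token()`. The sketch also assumes `token_expires` is a timezone-aware datetime.

```python
import datetime


def get_params_and_headers(config, params=None, refresh_token=None):
    """Put the API key in params if present, else a bearer token in headers."""
    params = dict(params or {})
    headers = {}

    hapikey = config.get("hapikey")
    if hapikey is not None:
        params["hapikey"] = hapikey
        return params, headers

    # No API key: make sure the OAuth token is fresh before using it.
    now = datetime.datetime.now(datetime.timezone.utc)
    expires = config.get("token_expires")
    if expires is None or expires < now:
        # Hypothetical callback; must set config["access_token"] and
        # config["token_expires"].
        refresh_token(config)

    headers["Authorization"] = "Bearer {}".format(config["access_token"])
    return params, headers
```

With this shape, callers never see the expiry check; they only receive the `params` and `headers` to send.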

asaf-erlich · 2020-12-01T18:21:15Z

tap_hubspot/__init__.py

+    for record in v3_data:
+        new_properties = {field_name : {'value': field_value}
+                          for field_name, field_value in record['properties'].items()
+                          if 'hs_date_entered' in field_name or 'hs_date_exited' in field_name or 'hs_time_in' in field_name}


Again I think any could be used to reduce the number of if and or in this line...

asaf-erlich · 2020-12-01T18:24:33Z

tap_hubspot/__init__.py

@@ -522,9 +609,14 @@ def sync_deals(STATE, ctx):
        params['includeAllProperties'] = True
        params['allPropertiesFetchMode'] = 'latest_version'

+        # Grab selected `hs_date_entered/exited` fields to call the v3 endpoint with
+        v3_fields = [x[1].replace('property_', '')
+                     for x,y in mdata.items() if x and (y.get('selected') == True or has_selected_properties)


Nitpick, but please do not name these x,y. Name them something longer so I understand better what information they represent.
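One possible renaming, as a sketch only. It assumes `mdata` maps breadcrumb tuples such as `('properties', 'property_hs_date_entered_x')` to metadata dicts, with an empty tuple for the stream-level entry. The diff excerpt above is cut off after the selection condition, so the prefix filter at the end is an assumption based on the code comment, and `selected_v3_fields` and `V3_PREFIXES` are hypothetical names.

```python
# Assumed prefixes for the fields fetched from the v3 endpoint.
V3_PREFIXES = ("hs_date_entered", "hs_date_exited", "hs_time_in")


def selected_v3_fields(mdata, has_selected_properties=False):
    """Return selected v3 field names with the 'property_' prefix removed."""
    return [
        breadcrumb[1].replace("property_", "")
        for breadcrumb, field_metadata in mdata.items()
        if breadcrumb  # skip the empty, stream-level breadcrumb
        and (field_metadata.get("selected") is True or has_selected_properties)
        and any(prefix in breadcrumb[1] for prefix in V3_PREFIXES)
    ]
```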

asaf-erlich · 2020-12-01T18:29:13Z

tests/test_hubspot_automatic_fields_test.py

+        # Select only the expected streams tables
+        expected_streams = self.expected_streams()
+        catalog_entries = [ce for ce in found_catalogs if ce['tap_stream_id'] in expected_streams]
+        self.select_all_streams_and_fields(conn_id, catalog_entries, select_all_fields=False)


With the exception of this line, the majority of this code above and some of the test code below is identical. Can you move them into the base? You can look at tap-square for examples. In my opinion most of this should be moved to be shared tap-tester code... since all our tests do it.

run_and_verify_check_mode: https://github.com/singer-io/tap-square/blob/master/tests/base.py#L513

run_and_verify_sync: https://github.com/singer-io/tap-square/blob/master/tests/base.py#L573

asaf-erlich · 2020-12-02T15:51:35Z

tap_hubspot/__init__.py

+        if entity_name in ["deals"]:
+            v3_schema = get_v3_schema(entity_name)
+            for key, value in v3_schema.items():
+                if any(prefix in key for prefix in V3_PREFIXES):


asaf-erlich · 2020-12-02T15:53:26Z

tap_hubspot/__init__.py

+    authentication values available. If there is an `hapikey` in the config, we
+    need that in `params` and not in the `headers`. Otherwise, we need to get an
+    `access_token` to put in the `headers` and not in the `params`
+    """
    params = params or {}


Feels like this could also just be a default value, but unfortunately a dict default value is shared between calls and can be modified, which linters catch... so just leave it like this.
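A short, generic demonstration of the pitfall mentioned here. The function names are made up for the example and are not from the tap.

```python
def add_key_shared(key, params={}):  # pylint: disable=dangerous-default-value
    """The default dict is created once and reused by every call."""
    params[key] = True
    return params


def add_key_safe(key, params=None):
    """A fresh dict is created whenever the caller passes nothing."""
    params = params or {}
    params[key] = True
    return params


first = add_key_shared("a")
second = add_key_shared("b")
print(first is second)    # True: both names refer to the same default dict
print(second)             # {'a': True, 'b': True}
print(add_key_safe("a"))  # {'a': True}
print(add_key_safe("b"))  # {'b': True}
```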

asaf-erlich · 2020-12-02T15:53:45Z

tap_hubspot/__init__.py

+                      interval=10)
+def request(url, params=None):
+
+    params, headers = get_params_and_headers(params)


I like how clean this looks to me now

asaf-erlich · 2020-12-02T15:53:59Z

tap_hubspot/__init__.py

@@ -307,8 +331,59 @@ def lift_properties_and_versions(record):
            record['properties_versions'] += versions
    return record

+def post_search_endpoint(url, data, params=None):


same with this entire method

asaf-erlich · 2020-12-02T15:54:52Z

tap_hubspot/__init__.py

+    v3_url = get_url('deals_v3_batch_read')
+    v3_resp = post_search_endpoint(v3_url, v3_body)
+    return v3_resp.json()['results']
+
 #pylint: disable=line-too-long


Not important for this PR but I normally add line-too-long to the disables in the pylint command. It's from a time when screens were not so large...

asaf-erlich · 2020-12-02T15:55:47Z

tap_hubspot/__init__.py

@@ -499,7 +589,7 @@ def sync_deals(STATE, ctx):
    max_bk_value = start
    LOGGER.info("sync_deals from %s", start)
    most_recent_modified_time = start
-    params = {'count': 250,
+    params = {'limit': 100,


I didn't ask last time, but is there a reason we dropped this to paginate by 100 and not by 250?

The V3 endpoint has a limit of 100. So I needed the V1 endpoint to match it in order for my "search for these deal IDs" strategy to work without pagination.
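To make the constraint concrete, here is a hedged sketch of building one batch-read request body from one page of V1 deals. The helper name is hypothetical, and the `inputs`/`properties` body shape is my understanding of HubSpot's v3 batch read API rather than something shown in this PR. The limit of 100 comes from the comment above.

```python
V3_BATCH_LIMIT = 100  # per the comment above


def build_batch_read_body(v1_deals, v3_fields):
    """Build a batch-read body for exactly one page of V1 deals."""
    if len(v1_deals) > V3_BATCH_LIMIT:
        # A larger V1 page could not be covered by a single V3 request.
        raise ValueError("page of {} deals exceeds the V3 batch limit of {}"
                         .format(len(v1_deals), V3_BATCH_LIMIT))
    return {
        # Assumed body shape: one {"id": ...} entry per deal to look up.
        "inputs": [{"id": str(deal["dealId"])} for deal in v1_deals],
        "properties": list(v3_fields),
    }
```

Keeping the V1 page size at or below the V3 limit means each V1 page maps to exactly one V3 request, so no second pagination loop is needed.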

luandy64 added 8 commits September 4, 2020 20:04

added pagination logic for v3 deals sync

1642065

Add failing test to look for v3 fields

8b3f25e

Use batch read endpoint instead of search endpoint

8b42f76

Fix test_can_fetch_hs_date_entered_props to call function properly

e61bb63

Add test for process_v3_deals_records

e8e36a8

Add test for merge_responses

60b3799

Fix refactor error

a5db641

Merge branch 'master' into v3_deals_pagination

c5ec082

luandy64 mentioned this pull request Nov 20, 2020

Add hs_date_entered_* and hs_date_exited_* fields for Deals #132

Closed

luandy64 added 21 commits November 20, 2020 15:34

Fix error from merge conflict

51c86f3

Update expected mock call

4c32cd7

Merge branch 'master' into v3_deals_pagination

1cb9f72

Get automatic_fields_test working

0689c83

Get start_date_test working

26109b9

Fix more missing code from git-fu mistakes

37902d9

Automatic fields test actually works

95fa202

Fix mocks

0667cc3

Remove asserts that are expected to fail

13ce96e

Add to start_date_test: verify second sync data appears in first sync

90b907c

Stop syncing empty streams to speed up test runs

9bc2d83

Remove unused property

d8bb839

Fix more git mistakes

c05e825

Last one: fix mistake from merge conflict

b8be332

Update unit tests

c31644b

hs_time_in are discovered in v1, only sync in v3

00c6b43

Add pagination test

a2dd110

Add failing all_fields_test

10d8ff1

Make pagination test more strict

07b39fc

Add passing all_fields_test for deals

da50754

Clarify the comments

b833a98

asaf-erlich reviewed Dec 1, 2020

tap_hubspot/__init__.py

asaf-erlich reviewed Dec 1, 2020

tap_hubspot/__init__.py

asaf-erlich reviewed Dec 1, 2020

luandy64 added 2 commits December 1, 2020 18:26

Update tests to use base.py methods

c669786

Get tests passing, remove incorrect comment

8db54a8

asaf-erlich reviewed Dec 1, 2020

luandy64 added 7 commits December 1, 2020 18:42

PR Feedback: simplify parse_custom_schema

226b98f

PR Feedback: user clearer variable names

1cc1b5b

PR Feedback: Use any()

afbf863

PR Feedback: Dry up params+headers logic

cbd9113

PR Feedback: Use connection's ensure_connection

8082598

Remove unused imports

3189a2e

Correctly call the function

d0ba700

KAllan357 reviewed Dec 2, 2020

tap_hubspot/__init__.py

PR Feedback: Raise exception on bad API response

66bb35e

KAllan357 approved these changes Dec 2, 2020

asaf-erlich reviewed Dec 2, 2020

asaf-erlich approved these changes Dec 2, 2020

luandy64 merged commit 1d590dc into master Dec 2, 2020

luandy64 deleted the v3_deals_pagination branch December 2, 2020 16:01

luandy64 mentioned this pull request Dec 2, 2020

Bump to v2.9.0, update changelog #135

Merged
