[Bug] Poor performance on int_hubspot__contact_merge_adjust #109

kcraig-ats · 2023-05-24T00:23:08Z

Is there an existing issue for this?

I have searched the existing issues

Describe the issue

With the changes that came in v0.9.0, our project has seen an increase in run times with the HubSpot package. The slowdown appears to come from int_hubspot__contact_merge_adjust, specifically the merge_contacts macro it uses. We've seen run times ranging from 15 to 35 minutes.

Relevant error log or model output

No response

Expected behavior

I don't expect the query to run as long as it does.

dbt Project configurations

vars:
  hubspot_source:
    hubspot_schema: hubspot_fivetran 
  hubspot_email_event_forward_enabled: false
  hubspot_email_event_print_enabled: false
  hubspot_email_event_spam_report_enabled: false
  hubspot_service_enabled: false
  hubspot_contact_property_enabled: false
  hubspot__pass_through_all_columns: true

Package versions

  - package: fivetran/hubspot
    version: [">=0.9.0", "<0.10.0"]

What database are you using dbt with?

redshift

dbt Version

1.3

Additional Context

My guess is that the merge_contacts query is suboptimal on Redshift. I tried a few changes that sped up my run:

I filtered the generated numbers so that each number is <= (select max(json_array_length(json_serialize(split_to_array(calculated_merged_vids, ';')), true)) from contacts), i.e. the largest number of merged VIDs on any single contact. I added an extra subquery to numbers to achieve this, but I also considered setting the upper_bound variable with the run_query macro instead.
I also filtered contacts to those where calculated_merged_vids is not null. I'm not sure this gave a significant boost, but calculated_merged_vids is populated for <.01% of our contacts.

The query time averaged a little over a minute with the changes.
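To make the effect of the two changes concrete, here is a minimal Python sketch (not the package's SQL) that emulates the split. It assumes the merge_contacts macro explodes calculated_merged_vids by pairing every contact with a generated series of numbers 1..upper_bound and keeping only the positions that exist in the ';'-delimited list. FIXED_UPPER_BOUND = 150 and the sample rows are hypothetical, and the "work" counter is simply the number of contact/number pairs evaluated, a rough proxy rather than a Redshift timing.

```python
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[str]]

# Hypothetical sample data: (contact_id, calculated_merged_vids).
# Most contacts have no merged VIDs, mirroring the <.01% described above.
contacts: List[Row] = [
    (1, None),
    (2, "101;102"),
    (3, None),
    (4, "201"),
]

# Hypothetical fixed size of the generated numbers series (assumption, not the package's value).
FIXED_UPPER_BOUND = 150


def split_merged_vids(rows: List[Row], upper_bound: int, skip_nulls: bool):
    """Emulate a numbers-series split: pair each contact with 1..upper_bound and
    keep position n only if the ';'-delimited list has at least n entries.
    Returns (exploded (contact_id, merged_vid) pairs, number of pairs evaluated)."""
    exploded = []
    evaluated = 0
    for contact_id, vids in rows:
        if skip_nulls and vids is None:
            continue  # change 2: drop contacts with no merged VIDs up front
        parts = vids.split(";") if vids else []
        for n in range(1, upper_bound + 1):
            evaluated += 1
            if n <= len(parts):
                exploded.append((contact_id, int(parts[n - 1])))
    return exploded, evaluated


def capped_upper_bound(rows: List[Row]) -> int:
    """Change 1: the largest number of ';'-delimited VIDs on any contact,
    a Python analogue of max(json_array_length(...))."""
    return max((len(v.split(";")) for _, v in rows if v), default=0)


baseline_pairs, baseline_work = split_merged_vids(contacts, FIXED_UPPER_BOUND, skip_nulls=False)
tuned_pairs, tuned_work = split_merged_vids(contacts, capped_upper_bound(contacts), skip_nulls=True)

print(baseline_pairs == tuned_pairs)  # True: same output rows
print(baseline_work, tuned_work)      # 600 4
```

The exploded rows are identical in both runs; only the number of pairs that have to be considered shrinks, which is the intuition behind capping the numbers series. The alternative mentioned above, computing upper_bound with run_query, would apply the same cap before the model's SQL is rendered rather than in a subquery.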

Are you willing to open a PR to help address this issue?

Yes.
Yes, but I will need assistance and will schedule time during our office hours for guidance
No.

kcraig-ats mentioned this issue Jun 1, 2023

[Issue-109] Optimize int_hubspot__contact_merge_adjust for redshift users #110

Merged

kcraig-ats closed this as completed Jun 7, 2023

fivetran-jamie mentioned this issue Jun 8, 2023

Releases/v0.10.latest #112

Merged
