Is there an existing issue for this?
Describe the issue
With the changes that came in v0.9.0, our project has seen an increase in run times with the HubSpot package. It looks like `int_hubspot__contact_merge_adjust` is the issue; specifically, the `merge_contacts` macro is driving the performance regression. We've seen run times range between 15 and 35 minutes.

Relevant error log or model output
No response
Expected behavior
I don't expect the query to run as long as it does.
dbt Project configurations
Package versions
What database are you using dbt with?
redshift
dbt Version
1.3
Additional Context
My guess is the `merge_contacts` query is suboptimal in Redshift. I played around with a few solutions that sped up my run:

- I filtered the return set of `numbers` so that the generated number was `<= (select max(json_array_length(json_serialize(split_to_array(calculated_merged_vids, ';')), true)) from contacts)`. I added an additional subquery to `numbers` to achieve this, but I also considered doing this by setting the `upper_bound` variable using the `run_query` macro.
- I also filtered contacts where `calculated_merged_vids is not null`. I'm not sure this achieved a significant boost, but `calculated_merged_vids` is populated for fewer than 0.01% of our contacts.

With these changes, the query time averaged a little over a minute.
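The filtering described above can be sketched roughly as follows. This is illustrative only: the CTE names (`unfiltered_numbers`, `contacts`) and the use of `calculated_merged_vids` follow the wording of this issue, not necessarily the macro's actual internals.

```sql
-- Sketch of the proposed optimization (illustrative names, not the
-- macro's real CTEs). Bound the generated numbers by the longest
-- merged-vids list instead of a fixed upper bound, and skip contacts
-- that were never merged.
with max_merges as (

    select
        max(json_array_length(
            json_serialize(split_to_array(calculated_merged_vids, ';')),
            true
        )) as max_len
    from contacts
    where calculated_merged_vids is not null

),

numbers as (

    -- only generate as many index values as the widest array needs
    select generated_number
    from unfiltered_numbers
    where generated_number <= (select max_len from max_merges)

)

select *
from numbers
```

Alternatively, the bound could be resolved at compile time by calling dbt's `run_query` macro against the contacts model and passing the result in as `upper_bound`, which would avoid carrying the subquery into the generated SQL at all.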
Are you willing to open a PR to help address this issue?