
[CT-2141] [Spike] Sanity check on full parse performance #7005

Closed
jtcohen6 opened this issue Feb 17, 2023 · 3 comments
Labels: performance, spike, tech_debt (Behind-the-scenes changes, with little direct impact on end-user functionality)

Comments

@jtcohen6
Contributor

Anecdotally, we've noticed (or can imagine) some potential for slowdown in parsing (namely "full" parses) since we last had parsing performance as a top-level priority.

As a first go, we can do some basic benchmarking on a large (ideally real-world) project, comparing the main branch versus v1.0.
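
A minimal timing harness for that comparison might look like the sketch below. The project path and per-version dbt install locations are placeholders, and it simply times repeated full-parse invocations using the same CLI flags quoted later in this thread (flag availability may need adjusting for very old versions):

    # Sketch of a timing harness, assuming one virtualenv per dbt version
    # under test; paths below are placeholders, not real locations.
    import statistics
    import subprocess
    import time

    PROJECT_DIR = "/path/to/large-project"          # placeholder
    DBT_BINARIES = {                                 # placeholder venvs
        "v1.0": "/venvs/dbt-1.0/bin/dbt",
        "main": "/venvs/dbt-main/bin/dbt",
    }
    RUNS = 5

    for version, dbt_bin in DBT_BINARIES.items():
        timings = []
        for _ in range(RUNS):
            start = time.perf_counter()
            # Flags as used in the benchmarking comment below; adjust per version.
            subprocess.run(
                [dbt_bin, "--no-partial-parse", "--no-version-check", "parse"],
                cwd=PROJECT_DIR,
                check=True,
                capture_output=True,
            )
            timings.append(time.perf_counter() - start)
        print(f"{version}: median {statistics.median(timings):.2f}s over {RUNS} runs")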

@jtcohen6 added the performance, tech_debt, and Team:Language labels on Feb 17, 2023
@github-actions changed the title from Sanity check on full parse performance to [CT-2141] Sanity check on full parse performance on Feb 17, 2023
@jtcohen6 added the spike label on Feb 17, 2023
@jtcohen6 changed the title from [CT-2141] Sanity check on full parse performance to [CT-2141] [Spike] Sanity check on full parse performance on Feb 17, 2023
@peterallenwebb
Contributor

@jtcohen6 Good news, I think.

Below are excerpts of the perf_info report when running dbt --no-partial-parse --no-version-check parse on the internal_analytics project for the latest 1.3, 1.4, and 1.5 releases. I have speedscope profiles for each of the runs listed here, and I'll take a closer look at them, but I don't see any reason at this level of analysis to suspect a performance degradation from 1.3 -> 1.5.

dbt 1.3.3:

    "is_partial_parse_enabled": false,
    "is_static_analysis_enabled": true,
    "read_files_elapsed": 0.7166596669703722,
    "load_macros_elapsed": 1.905235208105296,
    "parse_project_elapsed": 14.996926792198792,
    "patch_sources_elapsed": 0.1171803749166429,
    "process_manifest_elapsed": 0.7198715419508517,
    "load_all_elapsed": 19.084472500020638,

dbt 1.4.5:

    "is_partial_parse_enabled": false,
    "is_static_analysis_enabled": true,
    "read_files_elapsed": 0.7186416250187904,
    "load_macros_elapsed": 1.8487964579835534,
    "parse_project_elapsed": 15.356004958041012,
    "patch_sources_elapsed": 0.12841375009156764,
    "process_manifest_elapsed": 0.6919153342023492,
    "load_all_elapsed": 19.337341208010912,

dbt 1.5.0rc1:

    "is_partial_parse_enabled": false,
    "is_static_analysis_enabled": true,
    "read_files_elapsed": 0.7262364169582725,
    "load_macros_elapsed": 1.8892155408393592,
    "parse_project_elapsed": 15.864262333139777,
    "patch_sources_elapsed": 0.1376477919984609,
    "process_manifest_elapsed": 0.7611132920719683,
    "load_all_elapsed": 20.021727832965553,

@peterallenwebb
Contributor

peterallenwebb commented Apr 17, 2023

Looking at the profiles for the above runs, I didn't see much difference between them, and for the most part we are spending time in the Jinja library code.

There is one interesting quirk that may be worth mentioning. We are spending almost 10% of the execution time in MacroGenerator.__init__(), which I presume is because it is called so many times. The function itself does almost nothing.
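
(For anyone who wants to reproduce that kind of measurement without speedscope, here is a rough cProfile sketch; it assumes dbt-core 1.5's programmatic dbtRunner entry point and that it runs from inside the project directory, and it is not the exact profiling setup used above.)

    # Rough sketch: profile a full parse with cProfile and surface
    # MacroGenerator entries. Assumes dbt-core 1.5+ (dbtRunner) and the
    # current working directory is the dbt project.
    import cProfile
    import pstats

    from dbt.cli.main import dbtRunner

    profiler = cProfile.Profile()
    profiler.enable()
    # Same flags as the CLI invocation above, forwarded through dbtRunner.
    dbtRunner().invoke(["--no-partial-parse", "--no-version-check", "parse"])
    profiler.disable()

    stats = pstats.Stats(profiler).sort_stats("cumulative")
    stats.print_stats(25)                # hottest entries overall
    stats.print_stats("MacroGenerator")  # anything mentioning MacroGenerator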

In summary, there is no evidence of recent regressions in parse performance, but a review of the higher-level algorithms could allow us to do less rework during parsing, and rewriting some of the lower-level processing in native-compiled code could still be considered if this is a pain point.

@jtcohen6
Contributor Author

@peterallenwebb That is good news!! Thank you for looking into this in detail.

My suspicion about MacroGenerator.__init__() is that we're passing in the rendering context (a dictionary), which can be big. And we do it over and over again, because we have to create a slightly different context for every node (even if only a handful of context attributes actually differ). If I recall correctly, @gshank saw this and had some thoughts back when performance speedup was our top priority in 2021.
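
(Purely as an illustration of that pattern, not dbt's actual internals: the hypothetical sketch below shows how wrapping every macro with a freshly built per-node context turns a near-trivial __init__ into a visible slice of the profile.)

    # Hypothetical illustration only; the names below are invented and do
    # not mirror dbt's real classes or call sites.
    class MacroWrapper:
        """Stands in for a per-macro callable that captures a rendering context."""
        def __init__(self, macro, context):
            self.macro = macro        # trivially cheap on its own...
            self.context = context    # ...but repeated for every macro * node

    def build_context(shared, node):
        # A slightly different dict per node, even though most keys are shared.
        ctx = dict(shared)            # re-copying a large dict each time
        ctx["this"] = node
        return ctx

    def wrap_all(macros, nodes, shared):
        # len(macros) * len(nodes) tiny __init__ calls: individually cheap,
        # collectively a noticeable share of parse time.
        return [
            MacroWrapper(macro, build_context(shared, node))
            for node in nodes
            for macro in macros
        ]

In a structure like this sketch, keeping a shared base context and layering only the genuinely per-node attributes on top of it (rather than rebuilding and re-capturing the full dictionary each time) would be one way to cut down on the rework.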
