Cancel unneeded materializations #18101
We have a pipeline that integrates data from several sources. We do not know when each source will update, so we build results from partial data and refresh them whenever any source changes. We use the following policy:

dagster.AutoMaterializePolicy.eager().without_rules(
    dagster.AutoMaterializeRule.skip_on_parent_missing()
)

Everything works fine when a single source update triggers the pipeline, which is wonderful. However, we encounter problems when multiple sources are updated:
This sequence of events is perfectly valid in theory, but we face two problems here:
Both of these problems could be solved with a sensor that checks the run queue. If the sensor finds a run that will materialize C-D, it can cancel any earlier queued or started C-D materialization. However, the runs generated by auto materialization make this difficult because each run contains multiple steps. For example, if I want to cancel C only, I have to cancel the entire run and queue a new run for D.

If we could turn off this multi-step run feature, we could easily write such a sensor. It would be perfectly fine for us if auto materialization created separate runs for the root assets and then, once those complete, created further runs for the downstream assets. The end result would be equivalent to the multi-step approach.

So, my question is: is there an easy way to tell auto materialization how many steps can be put into a single run? I would like to set it to 1.

Alternatively, if you have any other ideas, I'm open to them. Maybe we can achieve the same result with a different approach.

Thanks in advance!
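To make the sensor idea concrete, here is a minimal sketch of only the selection logic, written in plain Python. The `RunInfo` record and the `runs_to_cancel` helper are hypothetical stand-ins, not Dagster classes; fetching the in-flight runs from the instance and actually cancelling them is left out. It assumes an older run is safe to cancel only when a newer in-flight run covers every asset the older run would materialize.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


# Hypothetical stand-in for a run record; this is not a Dagster class.
@dataclass(frozen=True)
class RunInfo:
    run_id: str
    created_at: float        # creation timestamp; larger means newer
    status: str              # e.g. "QUEUED", "STARTED", "SUCCESS"
    assets: FrozenSet[str]   # asset keys the run will materialize


# Only runs that have not finished yet are candidates for cancellation.
CANCELLABLE = {"QUEUED", "STARTED"}


def runs_to_cancel(runs: List[RunInfo]) -> List[str]:
    """Return ids of in-flight runs fully covered by a newer in-flight run."""
    active = sorted(
        (r for r in runs if r.status in CANCELLABLE),
        key=lambda r: r.created_at,
    )
    superseded = []
    for i, older in enumerate(active):
        newer_runs = active[i + 1:]
        # Cancel only if some newer run materializes every asset of the older one.
        if older.assets and any(older.assets <= n.assets for n in newer_runs):
            superseded.append(older.run_id)
    return superseded


runs = [
    RunInfo("r1", 1.0, "STARTED", frozenset({"C", "D"})),
    RunInfo("r2", 2.0, "QUEUED", frozenset({"C", "D"})),
    RunInfo("r3", 3.0, "QUEUED", frozenset({"D"})),
]
print(runs_to_cancel(runs))  # ['r1']
```

Note that `r2` is not cancelled even though `r3` will re-materialize D: `r2` also contains C, so cancelling it would drop the C step. This is the multi-step difficulty described above; with one asset per run, the subset check would reduce to a simple equality on the asset key.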
Replies: 1 comment
Hi @janosroden! Two thoughts here:
You could look at the skip_on_not_all_parents_updated rule, as described here: https://docs.dagster.io/concepts/assets/asset-auto-execution#customizing-auto-materialize-policies, although it's not clear to me that this would be the exact behavior that you want (as it seems that in some cases you're OK with a single parent update causing the children to update).
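For reference, attaching that rule to an eager policy might look like the sketch below. It assumes a Dagster version that exposes `AutoMaterializeRule.skip_on_not_all_parents_updated()`; the asset names `source_a`, `source_b`, and `combined` are made up for illustration and are not from the pipeline above.

```python
from dagster import AutoMaterializePolicy, AutoMaterializeRule, asset

# Eager policy that additionally skips a materialization until
# every parent has been updated since the last run of the child.
wait_for_all_parents = AutoMaterializePolicy.eager().with_rules(
    AutoMaterializeRule.skip_on_not_all_parents_updated()
)


@asset
def source_a():
    # Hypothetical upstream asset.
    return 1


@asset
def source_b():
    # Hypothetical upstream asset.
    return 2


@asset(auto_materialize_policy=wait_for_all_parents)
def combined(source_a, source_b):
    # Hypothetical downstream asset that waits for both parents.
    return source_a + source_b
```

With this policy, `combined` would not be refreshed from partial data after only one source changes, which may or may not be acceptable given the requirement described in the question.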