Support Materialized Views #1162
Looks like this was added in 0.13.1.
@clausherther not so! #1432 fixed a problem where dbt couldn't run if a materialized view lived in the dbt schema. dbt still does not support the creation of materialized views on Snowflake, though it is something I've been experimenting with recently. In your mind, what's the advantage of using a materialized view over a dbt table model?
Ah, I misread this in the release notes: "Add "MaterializedView" relation type to the Snowflake adapter". Personally, I don't think I have a good use case for MVs in Snowflake; that need has largely been replaced by dbt. I think I opened the issue alongside the one for transient tables, to keep track of new Snowflake features you may potentially want to support.
Alright, thanks for the additional context! Happy to leave this open to track the Snowflake feature, but I also welcome input from anyone who 👍'd the issue to chime in with some use cases!
I'm trying to think about how we would deal with MViews in dbt. You don't really want to drop and recreate them every time, because then there would be a lot of extra time and cost from Snowflake bringing them back up to speed with the underlying data. They do act like views when they are "stale" or not completely refreshed, so at least there wouldn't be any problems with how they behave if they are queried before being completely rebuilt.
Perhaps they would be like incremental tables, where they are only re-created when a) they don't exist or b) a full refresh is forced? There is also the potential for a lot of run-time errors with MViews in Snowflake, because they are limited in what they support from a SQL perspective. My concern is that it would be pretty difficult for dbt to parse and do a compile-time validation of what is not legitimate for an MView, which would in many cases lead to dbt runs failing: the MView compiles, but then fails creation or update at run-time. The list of restrictions is too long for me to write here, but see the link above.
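For illustration, a minimal sketch of what that incremental-style treatment could look like on Snowflake (object names are made up; this is not an official dbt implementation):

```sql
-- First run, or an explicit full refresh: (re)build the materialized view.
-- Snowflake keeps it up to date afterwards, so routine runs can leave it alone.
CREATE OR REPLACE MATERIALIZED VIEW analytics.order_totals AS
SELECT customer_id, SUM(amount) AS total_amount
FROM raw.orders
GROUP BY customer_id;

-- Subsequent runs: no-op. Dropping and recreating the MV would only force
-- Snowflake to re-materialize from scratch and incur extra compute cost.
```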
@drewbanin materialized views are useful for:

```sql
CREATE MATERIALIZED VIEW cdc_highwater
AS
(
    SELECT primary_key, column_name, MAX(wal_ts) AS wal_ts
    FROM cdc_table
    GROUP BY primary_key, column_name
)
```

to accelerate consolidation into current-state tables like:

```sql
SELECT primary_key
     , DECODE(column_name, 'col1', value) AS col1
     , DECODE(column_name, 'col2', value) AS col2
     , ...
     , DECODE(column_name, 'colN', value) AS colN
FROM cdc_table
JOIN cdc_highwater USING (primary_key, column_name, wal_ts)
```

Frankly, I think only use case 1 is value-generating, since the performance improvements are (allegedly) nontrivial for end users.
Thanks @mike-weinberg! I don't think it would be too challenging to create a materialization for these.
I still think that the list of caveats is too restrictive for most modeling use cases (no window functions, no unions, limited aggregates, can't query views, etc.). Still, happy to support these, as I do think the first use case you've pointed out here @mike-weinberg is good and legitimate. Plus, I imagine that these materialized views will only become more capable over time. If someone is interested in creating this materialization, it should all be pretty doable in user-space! Would love to discuss the implementation further if anyone picks it up :)
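For anyone curious what the user-space route might look like, here's a rough sketch of a custom Snowflake materialization (simplified; a real one would also handle pre-existing relations, full refreshes, grants, and transactions):

```sql
{% materialization materialized_view, adapter='snowflake' %}

  {#- Use the model's target relation as-is; a fuller version would
      also check what already exists in the warehouse. -#}
  {%- set target_relation = this -%}

  {{ run_hooks(pre_hooks) }}

  {#- Build (or replace) the MV from the model's compiled SQL. -#}
  {% call statement('main') -%}
    create or replace materialized view {{ target_relation }} as (
      {{ sql }}
    )
  {%- endcall %}

  {{ run_hooks(post_hooks) }}

  {{ return({'relations': [target_relation]}) }}

{% endmaterialization %}
```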
Redshift is also planning on supporting materialized views (currently in preview): https://docs.aws.amazon.com/redshift/latest/dg/mv.html
It appears that many of the same limitations apply as with Snowflake materialized views. In addition, Redshift appears to keep the underlying materialized data static until a refresh command is issued, at which time it either incrementally updates the data if certain conditions are met, or does a full rebuild. While I agree that there will be limited cases where this makes sense in modeling, surely there are valid and interesting use cases.
Thanks @automagic! I used to be not-so-bullish on materialized views, but the idea of these incremental refreshes is really appealing. One thing I particularly like about them: refreshing can happen concurrently with external queries. When dbt replaces tables, it currently blocks if there are long-running select statements hitting the table. The one challenge here is going to be knowing when dbt should rebuild the MV vs. just refresh it. We can leverage the full-refresh flag here.
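To make the refresh-vs-rebuild distinction concrete, here is roughly how it plays out in Redshift SQL (names are placeholders, not dbt-generated code):

```sql
-- Build once; AUTO REFRESH lets Redshift decide between incremental and full refresh.
CREATE MATERIALIZED VIEW daily_revenue
AUTO REFRESH YES
AS
SELECT order_date, SUM(amount) AS revenue
FROM sales
GROUP BY order_date;

-- "Refresh": cheap, and can run while other sessions keep querying the old data.
REFRESH MATERIALIZED VIEW daily_revenue;

-- "Rebuild": only needed when the defining query itself changes.
DROP MATERIALIZED VIEW daily_revenue;
-- ...followed by CREATE MATERIALIZED VIEW again with the new SQL.
```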
BigQuery has a similar feature, which would definitely be useful: https://cloud.google.com/bigquery/docs/materialized-views-intro
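For reference, a small illustrative example of the BigQuery flavor (dataset and column names are invented):

```sql
CREATE MATERIALIZED VIEW my_dataset.clicks_by_day
OPTIONS (enable_refresh = true, refresh_interval_minutes = 30)
AS
SELECT
  DATE(event_ts) AS event_date,
  COUNT(*) AS clicks
FROM my_dataset.raw_events
GROUP BY event_date;
```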
Even with the limitations native Snowflake MVs have, supporting them in dbt will become more useful once the Transparent MV usage functionality has been released (in private preview at the time of writing). Basically, this is functionality akin to Oracle's QUERY_REWRITE.
Hey @bashyroger, check out:
Since this thread is warming up, let's make it spicy 🌶️! Snowflake recently announced a clever feature they're calling "transparent materialized views". In the future, if a plan node of a query is equivalent to or is a subset of the plan node of a materialized view, then the materialized view will be used, even if it isn't referenced, because it is provably equivalent. So, if you identify common aggregations and build materialized views for those aggregations, then queries with steps equivalent to those materialized views will be rewritten by the planner to use them. Alternatively, you could maintain multiple copies of the same table, but with different clustering keys, and Snowflake will automatically pick the best one. This is analogous to Vertica's concept of projections. Personally, I think this is a big deal, and if Snowflake expands support to include INNER JOIN and UNION ALL clauses, then chains of MVs might become the best way to express many DAGs. Even in the absence of that, the ability to maintain copies of tables with various clustering keys means that MVs could accelerate regular views enough to enable decent-performing, low-latency reporting when combined with a CDC ELT vendor.
EDIT: When I wrote this, I either didn't see the prior two comments, or my page needed to be refreshed. Embarrassing! I'll leave this here though =)
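A toy illustration of the rewrite idea (purely hypothetical names): the second query never references the MV, but its plan is subsumed by the MV's, so the optimizer may serve it from the materialization.

```sql
-- Maintain a common aggregation once:
CREATE MATERIALIZED VIEW sales_by_region AS
SELECT region, SUM(amount) AS total_amount
FROM sales
GROUP BY region;

-- A user query against the base table; with transparent MV usage the planner
-- can answer it from sales_by_region instead of rescanning sales.
SELECT region, SUM(amount) AS total_amount
FROM sales
WHERE region = 'EMEA'
GROUP BY region;
```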
What is the latest state on this one? Is this issue still open?
I also want to know this, as I am going to put Snowflake under source control, and views need to be source-controlled just like tables.
Hey @nareshkumaryadav, this issue is definitely still open.
All of which is to say: I'd find it really valuable to hear more about what you're thinking of doing with MVs in your own project.
@kamleshsinhchauhan It sounds like what you're after may be slightly different: we have long supported views as a standard materialization.
I thought I'd share another use case. We are considering the use of materialized views in Snowflake. Our use case is to load Debezium CDC data into a staging table with a variant column that accumulates the stream. Materialized views would then be created over the staging table to produce a history table for the corresponding source table. We could create the materialized views out of band from dbt and then treat them as dbt sources, but we would prefer to have it all in dbt if possible. It would be great to hear if folks are having success with the experimental implementation, and/or if that is likely to make its way into a dbt release in the near future.
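For what it's worth, a rough sketch of that shape in Snowflake SQL (all object and field names are hypothetical, and the Debezium payload structure is simplified):

```sql
-- Staging table that accumulates the raw CDC stream as VARIANT:
CREATE TABLE IF NOT EXISTS cdc_staging.orders_raw (
    record VARIANT
);

-- Materialized view projecting the change records into a typed history shape:
CREATE MATERIALIZED VIEW history.orders_history AS
SELECT
    record:payload:after:order_id::NUMBER AS order_id,
    record:payload:after:status::STRING   AS status,
    record:payload:op::STRING             AS cdc_op,
    record:payload:ts_ms::NUMBER          AS cdc_ts_ms
FROM cdc_staging.orders_raw;
```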
Not sure I understand why this is an issue? Redshift does support MVs now, including automated refresh, and it internally knows whether they can be incrementally updated or require a full update. The refresh occurs asynchronously (scheduled to avoid conflict with other uses). dbt should simply recognize that when a view is an MV, it needs to be left alone unless the query itself has changed.
Our analytics team is suffering from poor performance of
I didn't know this! I just updated the sample code in dbt-labs/dbt-labs-experimental-features#11 to add an option for it.
The question at this point isn't, "Can you use materialized views with dbt?" You absolutely can, by including some custom macros and materializations in your project. The question is, "Do we have a recommended implementation of MVs we feel great about, such that we want to set it as the standard for all new projects going forward?" Materialized views have come a long way in just the past year, but they still have significant limitations. Particularly on Redshift, MVs do not support late binding, and they can't be identified in
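For anyone wanting to try the user-space route mentioned above, model usage might look something like this, assuming a custom materialization named materialized_view has been added to the project (the config value and the 'stg_orders' ref are illustrative and depend on the macros you install):

```sql
{{ config(
    materialized = 'materialized_view'
) }}

-- Note: on Snowflake an MV can reference only a single table, so the upstream
-- model referenced here must resolve to a table, not a view.
select
    customer_id,
    sum(amount) as total_amount
from {{ ref('stg_orders') }}
group by 1
```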
We support a large number of event types, which we split out into separate tables (one for each event) for a variety of reasons. Currently we keep all of these up to date by running hourly incremental updates. When moving this part of our setup to materialized views:
+1 Would really love to see this implemented. Auto-refreshed materialized views in Redshift are much more performant and efficient than incremental materializations, based on my tests.
+1 for Redshift materialised views built into dbt.
+1 for BigQuery materialised views
+1 for Snowflake materialized views
Materialised views have definite uses in anything latency-sensitive. The warehouse will update the MV on insert into the underlying table rather than waiting for dbt to run. It would be great to create these from dbt and then create references to them for downstream batch dbt runs. Not sure what the plan is with ClickHouse, but it has additional features around MVs, such as the SummingMergeTree and ReplacingMergeTree engines, which can also be really useful to get the semantics you need. Another +1 for this feature, Snowflake primarily.
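To illustrate the ClickHouse angle (table and column names are made up): the MV below is populated at insert time, and the SummingMergeTree target collapses rows by summing the numeric columns.

```sql
CREATE TABLE events
(
    event_date Date,
    user_id    UInt64,
    clicks     UInt64
)
ENGINE = MergeTree
ORDER BY (event_date, user_id);

-- Updated on every insert into events; no batch run required.
CREATE MATERIALIZED VIEW clicks_per_user_mv
ENGINE = SummingMergeTree
ORDER BY (event_date, user_id)
AS
SELECT event_date, user_id, clicks
FROM events;
```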
+1 MVs in Snowflake
+1 MVs in Postgres
+1 MVs in Snowflake
+1 for MVs in general, but given this has been open since 2018 and still no MVs... not so hopeful.
At this point we might need a community contributor to write the custom materialization with the logic @drewbanin outlined above. #1162 (comment)
+1 for MVs. Would help solve this issue with cloning: dbt-labs/dbt-snowflake#118
Any update here from the dbt side? This would be very helpful for the performance reasons mentioned many times earlier.
+1, really need this for performance.
+1 MVs in Postgres
I am using PostgreSQL, and when I started using dbt, I really wanted it to support MVs in PostgreSQL. That was when my thinking was to run dbt on demand to create the structure of my data mart. Since then I think I've come more to the "dbt way", where I run dbt on a schedule in order to populate my data mart. In PG, there is no standard way to schedule MV refreshes and no support for log-based refresh of MVs. This says to me that an MV in PG is equivalent to materializing a dbt model as a table and then scheduling a dbt run at the OS level to refresh it. I see the value of MV support for platforms with either internal scheduling (e.g. the Oracle REFRESH clause) or incremental refresh (e.g. the Oracle FAST clause) of MVs based on underlying changes, such as Oracle or, I think, Snowflake. I no longer feel like I need this for PG.
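To spell out the comparison (object names are placeholders): a native Postgres MV only moves when you tell it to, so in practice it behaves like a dbt table model plus an external scheduler.

```sql
-- Native MV: built once, then frozen until an explicit refresh.
CREATE MATERIALIZED VIEW mart.daily_signups AS
SELECT created_at::date AS signup_date, COUNT(*) AS signups
FROM app.users
GROUP BY 1;

-- ...refreshed later from cron (or whatever scheduler you already use):
REFRESH MATERIALIZED VIEW mart.daily_signups;
```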
+1 for ClickHouse MVs!
+1 for MVs in BigQuery and Snowflake
Any updates on this ticket?
Use https://github.com/dbt-labs/dbt-labs-experimental-features/tree/main/materialized-views and give feedback there.
Sharing some early work on integrating dbt, Coral, and Spark to implement materialized views and incremental view maintenance on various engines, starting with Spark. This dbt extension defines a new materialization mode called
Thank you for sharing @wmoustafa! The slides were well-written and explained a couple of big ideas (transpilation via Coral and
Closing in favor of: |
Docs: https://docs.snowflake.net/manuals/sql-reference/sql/create-materialized-view.html
Ideally this would also support the `cluster by` keyword.
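For completeness, an illustrative example of the `cluster by` keyword on a Snowflake MV (table and column names are placeholders):

```sql
CREATE MATERIALIZED VIEW orders_by_customer
  CLUSTER BY (customer_id)
AS
SELECT customer_id, order_id, order_date, amount
FROM orders;
```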