This repository has been archived by the owner on Sep 26, 2023. It is now read-only.
Snowflake has some capabilities when it comes to transforming data during a load. From my very basic understanding of what the transformer does, it seems like much of its logic could be replaced with a transformation on load, which means we could also load data directly from the enriched files in S3.
That's a really good idea to explore, and I have to admit we always took it for granted that all loaders should use the same Analytics SDK transformation. Still, I see several problems with this approach (although I haven't looked into it deeply yet):
The first one is somewhat related to the paragraph above. We'd need to have a lot of logic inside SQL. Right now it's well-tested, battle-proven Scala code shared across all loaders, which makes it very easy to add additional logic to the Transformer. I believe that will be a lot harder to do with SQL.
Bad rows. I don't think it will be possible to emit the bad rows that we added in 0.5.0 with the SQL approach.
Am I right that this approach can work only with a static set of schemas? Also, I don't see how we could mutate the table when a new type is discovered.
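To make the schema question concrete, here is a minimal Python sketch of the bookkeeping a SQL-only loader would still need somewhere: map each self-describing schema seen in a batch to a column name and work out which columns are missing from the table. The function names, the `unstruct_event` prefix, the column-naming rule and the `VARIANT` column type are assumptions for illustration, not the loader's actual code.

```python
import re

# Iglu schema URI, e.g. iglu:com.acme/link_click/jsonschema/1-0-0
IGLU_URI = re.compile(
    r"^iglu:(?P<vendor>[\w.-]+)/(?P<name>[\w-]+)/jsonschema/(?P<model>\d+)-\d+-\d+$"
)


def column_for_schema(uri: str, prefix: str = "unstruct_event") -> str:
    """Derive a column name from a schema URI (hypothetical naming rule)."""
    match = IGLU_URI.match(uri)
    if match is None:
        raise ValueError(f"not an Iglu schema URI: {uri}")
    # Replace anything that is not a letter or digit with an underscore.
    vendor = re.sub(r"[^a-zA-Z0-9]", "_", match["vendor"]).lower()
    name = re.sub(r"[^a-zA-Z0-9]", "_", match["name"]).lower()
    return f"{prefix}_{vendor}_{name}_{match['model']}"


def alter_statements(table, seen_uris, existing_columns):
    """Return one ALTER TABLE statement per column the table does not have yet."""
    existing = {column.lower() for column in existing_columns}
    wanted = sorted({column_for_schema(uri) for uri in seen_uris})
    return [
        f"ALTER TABLE {table} ADD COLUMN {column} VARIANT"
        for column in wanted
        if column not in existing
    ]
```

The point of the sketch is that the set of schemas has to be discovered by reading the data before the load statement can be written, which is the step a pure transform-on-load approach does not obviously cover.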
At the same time, I also see two big benefits:
Reduced costs. That Spark job is far from ideal in terms of optimization.
Reduced delay. Basically, we would cut it by the time it takes to run the transformer.
Snowflake has some capabilities when it comes to transforming data during a load. From my very basic understanding of what the transformer does, it seems like much of its logic could be replaced with a transformation on load, which means we could also load data directly from the enriched files in S3.
Example test file in the enriched file format:
I was able to load this into Snowflake with each type of unstructured event in the right format.
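For readers unfamiliar with transform-on-load, the following Python sketch builds the kind of `COPY INTO` statement this would involve: selecting positional fields from a staged tab-separated file and parsing the JSON ones. It is an assumption-laden illustration, not the statement used in the test above; the helper name, the stage and table names, and the field positions passed in are all hypothetical, and it assumes `PARSE_JSON` is permitted inside a COPY transformation.

```python
def copy_with_transform(table, stage, columns):
    """Build a Snowflake COPY INTO statement that transforms fields on load.

    columns is a list of (target_column, position, is_json) tuples, where
    position is the 1-based field position in the staged TSV file.
    """
    targets = ", ".join(name for name, _, _ in columns)
    expressions = ", ".join(
        # JSON-valued fields are parsed into VARIANT; the rest are copied as-is.
        f"PARSE_JSON(t.${position})" if is_json else f"t.${position}"
        for _, position, is_json in columns
    )
    return (
        f"COPY INTO {table} ({targets})\n"
        f"FROM (SELECT {expressions} FROM @{stage} t)\n"
        "FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '\\t')"
    )


# Hypothetical usage: positions here are placeholders, not the real enriched layout.
statement = copy_with_transform(
    "events",
    "enriched_stage",
    [("app_id", 1, False), ("unstruct_event", 2, True)],
)
```

A statement like this only covers a fixed list of target columns, so it would have to be regenerated whenever the set of event types changes.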
Open questions
If this is indeed possible, could there potentially be just a single loader step, without the transformer Spark job at all?