Support for arbitrary SQL Statement / idea for streamlined generalized "fast" load support #73

Closed
hmayer1980 opened this issue Jan 4, 2021 · 2 comments

Comments

@hmayer1980

This is supposed to be the successor of https://github.com/Azure/azure-sqldb-spark, and the old connector had the option to use sqlContext.sqlDBQuery(config) to execute arbitrary SQL code. There should be a way to migrate from the old connector to the new one - and for that you need to support arbitrary SQL.

This would also help many other migration scenarios and speed up loads, or at least enable more generic loading scenarios. It could cover issues #68 and #46. I already posted this idea there once, but thought it was worth its own topic.

I think you already have this capability in your code, because you need to run TRUNCATE TABLE or CREATE TABLE; it's just not exposed externally - or am I missing something?

I have been coding this manually with the old connector: bulk loading into STAGING tables and running sqlContext.sqlDBQuery(config) before and after the load for table creation / partition switching / table removal.

If you do not want to expose something like sqlContext.sqlDBQuery(config), I could envision something like what Azure Data Factory has on the Copy activity - an arbitrary "PRE-COPY-SCRIPT" (https://docs.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal#create-pipelines).
Adding a PRE/POST-COPY-SCRIPT would allow for a streamlined "dataframe" interface. For example, using the PRE-COPY-SCRIPT on an empty dataframe would allow arbitrary SQL to be run.

Options like "overwrite" or "truncate: true" would then just be simple special cases of an arbitrary pre-copy script.
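To make the proposal concrete, here is a minimal sketch of the pre/post-copy-script pattern. It is not the connector's API: `write_with_scripts`, `truncate_then_write`, and the table names are hypothetical, and Python's built-in `sqlite3` stands in for SQL Server only so the example runs without a server.

```python
import sqlite3
from typing import Callable, Optional


def write_with_scripts(
    conn: sqlite3.Connection,
    write: Callable[[sqlite3.Connection], None],
    pre_copy_script: Optional[str] = None,
    post_copy_script: Optional[str] = None,
) -> None:
    """Hypothetical helper: run an optional SQL script, the bulk write, then another optional script."""
    if pre_copy_script:
        conn.executescript(pre_copy_script)
    write(conn)  # stands in for the dataframe bulk load
    if post_copy_script:
        conn.executescript(post_copy_script)
    conn.commit()


def truncate_then_write(conn: sqlite3.Connection, table: str, write) -> None:
    # "truncate: true" expressed as a pre-copy script.
    # SQLite has no TRUNCATE TABLE, so DELETE is used in this stand-in.
    write_with_scripts(conn, write, pre_copy_script=f"DELETE FROM {table};")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
conn.execute("INSERT INTO target VALUES (0, 'stale')")

rows = [(1, "a"), (2, "b")]


def bulk_write(c: sqlite3.Connection) -> None:
    # Load into a staging table that the pre-copy script created.
    c.executemany("INSERT INTO staging VALUES (?, ?)", rows)


write_with_scripts(
    conn,
    bulk_write,
    pre_copy_script=(
        "DROP TABLE IF EXISTS staging;"
        "CREATE TABLE staging (id INTEGER, name TEXT);"
    ),
    # On SQL Server this step could be a partition switch instead.
    post_copy_script=(
        "DELETE FROM target;"
        "INSERT INTO target SELECT * FROM staging;"
        "DROP TABLE staging;"
    ),
)

result = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
print(result)
```

Passing a `write` callable that does nothing corresponds to the empty-dataframe case: only the scripts run, which amounts to executing arbitrary SQL.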

Whatever options you intend to provide, you cannot cover every use case, so it is better to give users the option to do what they want if they know what they are doing.

Curious to hear other options.

@gbrueckl

gbrueckl commented Jan 4, 2021

I am having the very same issues here.
While TRUNCATE is useful in a lot of scenarios, PRE- or POST-copy scripts would definitely increase the usability of this connector, especially in more complex scenarios that involve partitioning etc. in the destination table.

Having the option to execute any arbitrary SQL command via this connector would also solve this issue and allow even more flexible use of the connector - e.g. to start a stored procedure.

@rajmera3
Contributor

Arbitrary SQL is out of scope of this connector. #77 has more information as does #21
