This connector is supposed to be the successor of https://github.com/Azure/azure-sqldb-spark, and the old connector had the option to use sqlContext.sqlDBQuery(config) to execute arbitrary SQL code. There should be a way to migrate from the old connector to the new one, and for that you need to support arbitrary SQL.
This would also help in many other migration scenarios and speed up loads, or at least enable more generic loading scenarios. It could cover issues #68 and #46. I already posted this idea there once, but thought it was worth its own topic.
I think you already have this capability in your code, because you need to run TRUNCATE TABLE or CREATE TABLE; it's just not exposed externally. Or am I missing something?
I have been doing this manually with the old connector: I bulk load into STAGING tables and run sqlContext.sqlDBQuery(config) before and after the load for table creation, partition switching, and table removal.
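To make the staging workflow concrete, here is a minimal sketch of the "SQL before, bulk load, SQL after" pattern. It is not connector code: it assumes Python's built-in sqlite3 as a stand-in for SQL Server, a plain table rename as a stand-in for partition switching, and the hypothetical names load_via_staging, sales, and sales_staging.

```python
import sqlite3


def load_via_staging(conn, rows):
    """Load rows into a staging table, then swap it in for the target table."""
    cur = conn.cursor()

    # "Before" step: arbitrary SQL that prepares an empty staging table.
    cur.execute("DROP TABLE IF EXISTS sales_staging")
    cur.execute("CREATE TABLE sales_staging (id INTEGER, amount REAL)")

    # Bulk load step: stands in for the connector's DataFrame write.
    cur.executemany("INSERT INTO sales_staging VALUES (?, ?)", rows)

    # "After" step: arbitrary SQL that swaps the staging table in.
    # (On SQL Server this is where a partition switch would go.)
    cur.execute("DROP TABLE IF EXISTS sales")
    cur.execute("ALTER TABLE sales_staging RENAME TO sales")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.execute("INSERT INTO sales VALUES (0, 0.0)")  # stale row to be replaced

load_via_staging(conn, [(1, 10.0), (2, 20.5)])
loaded = conn.execute("SELECT id, amount FROM sales ORDER BY id").fetchall()
```

The point is only that the steps around the load are ordinary SQL statements that the user chooses, which is what sqlContext.sqlDBQuery(config) made possible.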
If you do not want to expose something like sqlContext.sqlDBQuery(config), I could envision something like what Azure Data Factory has on the Copy activity: an arbitrary "PRE-COPY-SCRIPT" (https://docs.microsoft.com/en-us/azure/data-factory/tutorial-bulk-copy-portal#create-pipelines).
Adding a PRE/POST-COPY-SCRIPT would allow for a streamlined "dataframe" interface. For example, using the PRE-COPY-SCRIPT on an empty dataframe would allow arbitrary SQL to be run.
Options like "overwrite" or "truncate: true" would then just be simple special cases of an arbitrary pre-copy script.
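A rough sketch of how such options might behave, again using Python's built-in sqlite3 as a stand-in. The function write_rows and its pre_copy_script / post_copy_script parameters are hypothetical and do not exist in the connector; SQLite has no TRUNCATE, so DELETE FROM is used in its place.

```python
import sqlite3


def write_rows(conn, table, rows, pre_copy_script=None, post_copy_script=None):
    """Write rows to a table, running optional SQL scripts before and after."""
    if pre_copy_script:
        conn.executescript(pre_copy_script)
    if rows:  # an empty input means only the scripts run
        placeholders = ", ".join("?" for _ in rows[0])
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    if post_copy_script:
        conn.executescript(post_copy_script)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
conn.execute("INSERT INTO target VALUES (99, 'stale')")

# "truncate: true" expressed as a one-line pre-copy script.
write_rows(conn, "target", [(1, "a"), (2, "b")],
           pre_copy_script="DELETE FROM target;")

# Empty input: the pre-copy script alone runs arbitrary SQL.
write_rows(conn, "target", [],
           pre_copy_script="CREATE TABLE audit (note TEXT);"
                           "INSERT INTO audit VALUES ('ran');")

target_rows = conn.execute("SELECT id, name FROM target ORDER BY id").fetchall()
audit_rows = conn.execute("SELECT note FROM audit").fetchall()
```

With this shape, a dedicated truncate option is just a convenience for a script the user could have written themselves.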
Whatever options you intend to provide, you cannot cover every use case, so it is better to give users the option to do what they want if they know what they are doing.
Curious to hear other options.
Having the very same issues here.
While TRUNCATE is useful in a lot of scenarios, PRE- or POST-copy scripts would definitely increase the usability of this connector, especially in more complex scenarios that involve partitioning etc. in the destination table.
Having the option to execute any arbitrary SQL command via this connector would also solve this issue and allow even more flexible use of the connector, e.g. to start a stored procedure.