feat(clickhouse): add implementations of create_table, create_view, and create_database #4316
Comments
An alternative to dropping a table that already exists is to truncate it when the schema passed in matches the existing table's schema.
@mharrisb1 This is a really great proposal. I'll go in reverse order to get the easier bits out of the way first. Let's dive in.
Preliminaries
It'd be great if each of the
Backend-specific Keyword Arguments
We'll likely have to change the base class APIs
@mharrisb1 Thoughts here? I know @saulpw has been thinking about DDL-in-ibis recently and likely also has some thoughts here.
@cpcloud that all sounds good 👍 Thanks for the point-by-point feedback. If @saulpw has a general approach to DDL in Ibis, I would love to follow their lead, either using another backend as a reference or guinea-pigging some ideas with ClickHouse. We currently have an in-house solution for this, so there is no way I can convince my team to use company time to implement this in Ibis, and I wouldn't count on this being done (at least well) anytime soon. I think a good first step could be creating a topic in Discussions about Ibis + DDL to get some community feedback and see whether people are interested.
create_table, create_view, and create_database
All of these can be accomplished through the raw_sql method, but it would be nice for them to be fully implemented. In addition, it would be useful to have the inverse methods implemented: drop_table, drop_view, and drop_database.
Proposal
From my experience with ClickHouse, and some experience implementing these for my own use cases, I've outlined a brief proposal for implementing these methods. Feedback would be greatly appreciated.
Create Table
This one would be the most complicated to implement. There would be two major use cases:
1. Creating a table from a schema (Schema object)
2. Creating a table from an expression (Expr object)
Implementation for the first use case would also be used in the second.
For the first use case, a Schema object would be passed in. You would be able to grab the name and type for each field of the table from that object.
For the second use case, an Expr object would be passed in. You would be able to grab the name and type of each field from the Expr schema attribute using the same code as the previous use case.
One thing to note for creating a table from an expression: in my experience it is better to call the CREATE TABLE ... DDL statement and then separately call the INSERT ... statement instead of CREATE TABLE ... AS ....
You would also need to check whether the table already exists. If it does, there are a few ways to replace it. You could create a temp table and, once that is complete, drop the target table and rename the temp table. This is how dbt-clickhouse handles it, but I've found this can present challenges with the ReplicatedMergeTree table engine. Another method is to drop the target table and then run the DDL and insertion statements. The downside is that there is a longer period of time during which no table with that identifier is available for users to query.
Backing up a bit, there is also the question of whether a full replace is desirable to the user. Should the user have an option to abandon the process if the table already exists?
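The replacement strategies described above can be sketched as a small statement planner. This is only an illustration under stated assumptions: the IfExists policy, the plan_create_table function, the __tmp suffix, and the fixed MergeTree engine clause are all hypothetical names and choices, not part of Ibis or dbt-clickhouse, and identifiers are not quoted or validated.

```python
from enum import Enum
from typing import List


class IfExists(Enum):
    """Hypothetical policy for a target table that already exists."""

    FAIL = "fail"  # abandon the process
    DROP = "drop"  # drop the target, then create and insert
    SWAP = "swap"  # build a temp table, then drop the target and rename


def plan_create_table(
    name: str,
    columns_sql: str,
    select_sql: str,
    *,
    exists: bool,
    if_exists: IfExists = IfExists.FAIL,
) -> List[str]:
    """Return the statements to run, in order, as plain SQL strings."""

    def create(target: str) -> str:
        # Assumption: a MergeTree table with no sorting key, for brevity.
        return f"CREATE TABLE {target} ({columns_sql}) ENGINE = MergeTree ORDER BY tuple()"

    def insert(target: str) -> str:
        # A separate INSERT rather than CREATE TABLE ... AS ...
        return f"INSERT INTO {target} {select_sql}"

    if not exists:
        return [create(name), insert(name)]
    if if_exists is IfExists.FAIL:
        raise ValueError(f"table {name!r} already exists")
    if if_exists is IfExists.DROP:
        # The target is unavailable from the DROP until the INSERT finishes.
        return [f"DROP TABLE {name}", create(name), insert(name)]
    # SWAP: the target stays queryable until the temp table is fully loaded.
    tmp = f"{name}__tmp"
    return [create(tmp), insert(tmp), f"DROP TABLE {name}", f"RENAME TABLE {tmp} TO {name}"]
```

Returning the statements instead of executing them keeps the sketch independent of any client library; a real implementation would also need to determine the exists flag by querying the server.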
There are also a number of additional options that would need to be available to the user when creating a table:
connect()
should be usedstr
Expr
child objects have a name attribute, this would need to be provided by the userstr
str
str
toYYYYMM(some_timestamp)
)tuple[str]
tuple[str]
tuple[str]
dict[str, Any]
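As a rough idea of how options like these could feed the codegen, here is a minimal sketch that renders a CREATE TABLE statement from field names and types plus a few keyword arguments. The parameter names (database, engine, partition_by, order_by, settings) are assumptions made for illustration and may not match the options intended above; a plain mapping of column name to ClickHouse type stands in for the Schema object, and a MergeTree-family engine is assumed.

```python
from typing import Any, Mapping, Optional, Sequence


def create_table_ddl(
    name: str,
    schema: Mapping[str, str],  # column name -> ClickHouse type string
    *,
    database: Optional[str] = None,
    engine: str = "MergeTree",
    partition_by: Optional[str] = None,
    order_by: Sequence[str] = (),
    settings: Optional[Mapping[str, Any]] = None,
) -> str:
    """Render a CREATE TABLE statement; identifiers are not quoted here."""
    target = f"{database}.{name}" if database else name
    columns = ", ".join(f"{col} {typ}" for col, typ in schema.items())
    parts = [f"CREATE TABLE {target} ({columns})", f"ENGINE = {engine}"]
    # MergeTree-family engines need a sorting key; tuple() means "none".
    if order_by:
        parts.append(f"ORDER BY ({', '.join(order_by)})")
    else:
        parts.append("ORDER BY tuple()")
    if partition_by:
        parts.append(f"PARTITION BY {partition_by}")
    if settings:
        # repr() renders ints as-is and strings in single quotes.
        rendered = ", ".join(f"{key} = {value!r}" for key, value in settings.items())
        parts.append(f"SETTINGS {rendered}")
    return " ".join(parts)


ddl = create_table_ddl(
    "events",
    {"id": "UInt64", "ts": "DateTime"},
    database="analytics",
    partition_by="toYYYYMM(ts)",
    order_by=("id", "ts"),
    settings={"index_granularity": 8192},
)
```

The same function would serve both use cases, since an expression's schema yields the same name and type pairs as a schema passed in directly.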
Create View
This one would be much easier to implement, and ClickHouse provides the standard CREATE OR REPLACE VIEW ... option. The view would just be created from an Expr object.
Only the following options would need to be provided to the user:
connect()
should be usedstr
Expr
child objects have a name attribute, this would need to be provided by the userstr
str
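A minimal sketch of the view codegen might look like the following. The function name and its parameters are hypothetical, and the SELECT statement is taken as an already-compiled SQL string, whereas a real implementation would compile it from the Expr object.

```python
from typing import Optional


def create_view_ddl(
    name: str,
    select_sql: str,  # assumed to be the compiled SQL of the expression
    *,
    database: Optional[str] = None,
    replace: bool = True,
) -> str:
    """Render a CREATE [OR REPLACE] VIEW statement; identifiers are not quoted."""
    target = f"{database}.{name}" if database else name
    verb = "CREATE OR REPLACE VIEW" if replace else "CREATE VIEW"
    return f"{verb} {target} AS {select_sql}"


view_ddl = create_view_ddl(
    "recent_events",
    "SELECT id, ts FROM analytics.events WHERE ts > now() - INTERVAL 1 DAY",
    database="analytics",
)
```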
Create Database
This one is likely the lowest priority out of the three. It would also likely be the easiest to implement. The user would need to be presented the following options:
str
str
str
IF NOT EXISTS Behavior
It would likely be beneficial to also allow some way to only create the object if it doesn't already exist.
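One way this could be exposed is a boolean flag on each create method. The sketch below shows the idea for databases only; the function name, the engine option, and the if_not_exists flag are assumptions for illustration rather than an existing Ibis API.

```python
from typing import Optional


def create_database_ddl(
    name: str,
    *,
    engine: Optional[str] = None,  # hypothetical option, e.g. "Atomic"
    if_not_exists: bool = False,   # hypothetical flag
) -> str:
    """Render a CREATE DATABASE statement; the name is not quoted here."""
    guard = "IF NOT EXISTS " if if_not_exists else ""
    ddl = f"CREATE DATABASE {guard}{name}"
    if engine:
        ddl += f" ENGINE = {engine}"
    return ddl
```

With if_not_exists=True the statement becomes a no-op on the server when the database is already present, so the caller does not need a separate existence check.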
[TODO]
Notes
I'll keep working on this proposal and flesh out some ideas in a project I already have going. I'm pretty sure I've got the codegen down for the create table piece but I'll need to do some more digging to see how it is already handled in Ibis so I'm following the current approaches.
I'm happy to be the one to implement this over a weekend or two but I'd like some feedback before I dive into the code.