-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix querying and defining table / view names with period #4530
Conversation
@@ -206,7 +206,7 @@ impl DFSchema { | |||
(Some(qq), None) => { | |||
// the original field may now be aliased with a name that matches the | |||
// original qualified name | |||
let table_ref: TableReference = field.name().as_str().into(); | |||
let table_ref = TableReference::parse_str(field.name().as_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this is correct as it potentially splits on "." -- but I chose to keep the existing behavior
@@ -1058,7 +1060,7 @@ pub struct CreateCatalogSchema { | |||
#[derive(Clone)] | |||
pub struct DropTable { | |||
/// The table name | |||
pub name: String, | |||
pub name: OwnedTableReference, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the key change -- previously in DataFusion relation (table, view, etc) were stored as String which was then "parsed" (really split on .
). This PR changes the code so the original parsed multi-part TableReference is used instead
@@ -915,6 +917,29 @@ message StringifiedPlan { | |||
string plan = 2; | |||
} | |||
|
|||
message BareTableReference { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the corresponding serialization support for this feature.
@@ -542,3 +542,11 @@ pub(crate) fn normalize_ident(id: &Ident) -> String { | |||
None => id.value.to_ascii_lowercase(), | |||
} | |||
} | |||
|
|||
// Normalize an owned identifier to a lowercase string unless the identifier is quoted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove the non-owned variant in a follow on PR -- but I was trying to minimize the size of this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -262,7 +264,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
}; | |||
|
|||
Ok(LogicalPlan::CreateMemoryTable(CreateMemoryTable { | |||
name: name.to_string(), | |||
name: object_name_to_table_reference(name)?, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than encoding the name in a string, it is now encoded as a reference
0 => Err(ParserError("Missing table name.".to_string()).into()), | ||
1 => object_name_to_table_reference(names.pop().unwrap()), | ||
_ => { | ||
Err(ParserError("Multiple objects not supported".to_string()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously if multiple objects were specified, the second was ignored: #4531
@@ -592,6 +602,9 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
|
|||
let schema = self.build_schema(columns)?; | |||
|
|||
// External tables do not support schemas at the moment, so the name is just a table name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the parser returns a String not an ObjectIdentifier, so I kept the behavior the same
})) | ||
} | ||
} | ||
// only support "schema.table" type identifiers here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has the same behavior as master but I think the logic is clearer
// First, use sqlparser to get table name and insert values | ||
let table_name; | ||
let table_reference; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the test harness to use the new name handling as well
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed insert.slt
and appended to it.
Writing tests using sqllogictest
is life changing . Thank you @mvanschellebeeck and @xudong963 for driving this work forward!
I am sorry to request a large PR code review @xudong963 and @andygrove but it is mostly tests and generated code. cc @returnString in case you are around as you wrote the initial |
cc @NGA-TRAN in case you are interested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR does what described
########## | ||
|
||
# Verify that creating tables with periods in their name works | ||
# (note "foo.bar" is the table name, NOT table "bar" in schema "foo") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the period in a table name only works if the name is in double quotes, right? Without double quotes. foo will be treated as a schema?
Is this "foo.bar.baz" a valid table name? can we add test for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without double quotes. foo foo will be treated as a schema?
Yes
Is this "foo.bar.baz" a valid table name? can we add test for it?
It is a valid name. Added tests in c03cf2b -- thank you for the suggestion
I'll review it carefully in the evening. |
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed this file to ddl.slt
and added some more tests, but the github UI doesn't seem to recognize that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ticket makes sense to me, thanks @alamb
FYI, databend processes the case in parser: catalog, database, and table are separated in parser
// Table name
Table {
span: &'a [Token<'a>],
catalog: Option<Identifier<'a>>,
database: Option<Identifier<'a>>,
table: Identifier<'a>,
alias: Option<TableAlias<'a>>,
travel_point: Option<TimeTravelPoint<'a>>,
},
/// Parse one two three idents seperated by a period, fulfilling from the right.
///
/// Example: `db.table.column`
#[allow(clippy::needless_lifetimes)]
pub fn peroid_separated_idents_1_to_3<'a>(
i: Input<'a>,
) -> IResult<'a, (Option<Identifier>, Option<Identifier>, Identifier)> {
map(
rule! {
#ident ~ ("." ~ #ident ~ ("." ~ #ident)?)?
},
|res| match res {
(ident2, None) => (None, None, ident2),
(ident1, Some((_, ident2, None))) => (None, Some(ident1), ident2),
(ident0, Some((_, ident1, Some((_, ident2))))) => (Some(ident0), Some(ident1), ident2),
},
)(i)
}
Makes sense @xudong963 -- this is basically what sqlparser-rs does as well (parses into |
I am merging this one in as I have a few PRs backed up behind it. Thanks again @NGA-TRAN and @xudong963 for your quick review. |
Benchmark runs are scheduled for baseline = 4c1f60c and contender = fbadebb. fbadebb is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4513
Closes #4531
This is really not that large of a PR
This looks like a massive change but a substantial amount of it is generated protobuf code
Rationale for this change
See more details on #4513
TLDR is that this SQL:
Should not be treating "foo" as a schema name
The issue is that the SQL planner was flattening multi part identifiers like
['"foo"', 'bar']
into Strings like'"foo".bar'
and then the strings were being parsed by https://github.com/apache/arrow-datafusion/blob/740a4fa2c6ba4b85875a433bb86e5b00435a5969/datafusion/common/src/table_reference.rs#L97 which does not parse them properly (instead it simply splits them on.
rather than handling"
properly)What changes are included in this PR?
Changes:
OwnedTableReference
, and change the planner and relevant DML plan nodes to use that rather than aString
impl<'a> From<&'a str> for TableReference<'a> {
to document its limitations..slt
tests for sameThe correct way to parse strings to references is to run through the SQL parser and our planner, but this API seems to be a fairly large part of the public API. I filed a follow on ticket to track properly supporting name resolution via the API: #4532
Are these changes tested?
Yes (both extensively with existing tests as well as new tests)
Are there any user-facing changes?
Better support for names with periods