Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review expression generated column names #5583

Closed
jdunkerley opened this issue Feb 7, 2023 · 8 comments · Fixed by #5850
Closed

Review expression generated column names #5583

jdunkerley opened this issue Feb 7, 2023 · 8 comments · Fixed by #5850
Assignees
Labels
-libs Libraries: New libraries to be implemented l-derived-columns x-chore Type: chore

Comments

@jdunkerley
Copy link
Member

Currently, we remove all non alpha numeric characters and replace with _.

Need to check that DB escaping is safe and disable this behavior.

@jdunkerley jdunkerley added x-chore Type: chore -libs Libraries: New libraries to be implemented l-derived-columns labels Feb 7, 2023
@jdunkerley jdunkerley added this to the Beta Release milestone Feb 7, 2023
@github-project-automation github-project-automation bot moved this to ❓New in Issues Board Feb 7, 2023
@jdunkerley jdunkerley assigned jdunkerley and unassigned jdunkerley Feb 7, 2023
@jdunkerley
Copy link
Member Author

Rules for SQL92 identifiers
Ordinary identifiers are identifiers not surrounded by double quotation marks.
Delimited identifiers are identifiers surrounded by double quotation marks.

A delimited identifier can contain any characters within the double quotation marks. The enclosing double quotation marks are not part of the identifier; they serve only to mark its beginning and end. Spaces at the end of a delimited identifier are insignificant (truncated). Derby translates two consecutive double quotation marks within a delimited identifier as one double quotation mark-that is, the "translated" double quotation mark becomes a character in the delimited identifier.

Postgres:
Quoted identifiers can contain any character, except the character with code zero. (To include a double quote, write two double quotes.)

SQLite:
"keyword" A keyword in double-quotes is an identifier.

@jdunkerley jdunkerley moved this from ❓New to 📤 Backlog in Issues Board Feb 14, 2023
@jdunkerley
Copy link
Member Author

We should reject names which are empty, Nothing or contain \0

@radeusgd radeusgd moved this from 📤 Backlog to 🔧 Implementation in Issues Board Mar 4, 2023
@radeusgd
Copy link
Member

radeusgd commented Mar 6, 2023

TODO:

  • Change "Result" new name in column operations to an expression of that operation (where possible, where not, an approximation)
  • Add tests verifying that generated column names stemming from operation are consistent between in-memory and Database.
  • Add tests for irregular column names in databases (Unicode, weird quotes etc.)
  • Check behaviour of Unicode across DBs.
    • Decide how to handle it.

@radeusgd
Copy link
Member

radeusgd commented Mar 6, 2023

Summary of my findings of weird character support:

Postgres and SQLite are fine with all weird stuff that I threw at it. I added tests to our suite and they are passing. I will add a few more, but it seems that for our current backends we can safely use any characters as column names :)
Zrzut ekranu 2023-03-06 150316

Even slightly older versions of Postgres seem to do just fine:
Zrzut ekranu 2023-03-06 150920

I also checked a few other backends that we may want to potentially support in the future.

  1. SQLServer seems to be doing just fine:

Zrzut ekranu 2023-03-06 150939

  1. There are some problems with MySQL it does allow basic Unicode like characters with accents, but does not support emojis:

Zrzut ekranu 2023-03-06 150835

Zrzut ekranu 2023-03-06 150845

  1. Apparently **Oracle ** database is weird as it allows Unicode and Emojis but does not allow quotes.

Zrzut ekranu 2023-03-06 151059

Zrzut ekranu 2023-03-06 151032

Zrzut ekranu 2023-03-06 150956

@radeusgd
Copy link
Member

radeusgd commented Mar 6, 2023

So for now the rules will be:

  1. empty or Nothing names will raise Illegal_Argument
  2. names containing \0 will raise Illegal_Argument
  3. all other names will be accepted.

We may need to revise this once we start supporting more backends which do not support some of the weirder names.

@radeusgd
Copy link
Member

radeusgd commented Mar 6, 2023

Note:
Newer MySQL seems to accept emojis, but it replaces them with ?:
image
image

@enso-bot
Copy link

enso-bot bot commented Mar 7, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-03-06):

Progress: Added weird name tests. Researched what other DB engines allow. Discussions on NaN handling in Map. Added tests for column operation names. It should be finished by 2023-03-08.

Next Day: Next day I will be working on the same task. Implement new column operation names. Add a hook for name verification for DB Dialects. Start work on mismatched quotes.

@radeusgd radeusgd moved this from 🔧 Implementation to 👁️ Code review in Issues Board Mar 8, 2023
@enso-bot
Copy link

enso-bot bot commented Mar 8, 2023

Radosław Waśko reports a new STANDUP for yesterday (2023-03-07):

Progress: Implemented new column operation names, updated the tests. Added a hook for name verification for DB Dialects. Encountered some weird NullPtr error. It should be finished by 2023-03-08.

Next Day: Next day I will be working on the same task. Find out the error and prepare a PR. Do the common core lib as a separate PR. Prepare the pending mismatched quotes PR. Start work on cross_tab unique naming.

@mergify mergify bot closed this as completed in #5850 Mar 10, 2023
mergify bot pushed a commit that referenced this issue Mar 10, 2023
@github-project-automation github-project-automation bot moved this from 👁️ Code review to 🟢 Accepted in Issues Board Mar 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-libs Libraries: New libraries to be implemented l-derived-columns x-chore Type: chore
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants