Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support specifying collations on columns and databases #20602

Merged
merged 3 commits into from
Apr 14, 2020

Conversation

roji
Copy link
Member

@roji roji commented Apr 11, 2020

  • Added metadata and migration support for column- and database- level collations.
  • Scaffolding implemented for SQL Server. Sqlite seems to not be reporting the collation in pragma_table_info (need newer version?)

Closes #19275
Closes #6577

Some notes:

  • The builder extension methods are called UseCollation, not HasCollation. This is because collations are are logically separate thing which a column or database can use, but which can (at least in some cases) be managed separately. For example, PostgreSQL allows creating and dropping collations just like tables or sequences, so I'd use HasCollation on the database there.
  • At the database/model, I called the builder extension methods UseCollation like for the columns. It may be better to have UseDefaultCollation, or in this case maybe even HasDefaultCollation (like HasDefaultSchema). Thoughts?
  • Re Sqlite collation scaffolding, outside of EF Core I could see the collation inside the column type (exactly as it had been defined), but somehow in EF Core it doesn't show up (could this be another issue with an old version of SQLitePCLRaw?). I'm also not thrilled about starting to parse this, @bricelam what do you think?

@roji roji requested review from bricelam and AndriySvyryd April 11, 2020 14:24
* Added metadata and migration support for column- and database- level
  collations.
* Scaffolding implemented for SQL Server. Sqlite seems to not be
  reporting the collation in pragma_table_info (need newer version?)

Closes dotnet#19275
Closes dotnet#6577
@roji roji force-pushed the CollationMetadataMigrations branch from 55bd7e2 to d2b8650 Compare April 11, 2020 18:10
@lauxjpn
Copy link
Contributor

lauxjpn commented Apr 14, 2020

What about table level collations?

@roji
Copy link
Member Author

roji commented Apr 14, 2020

@lauxjpn

What about table level collations?

I looked into this - table-level collations is a MySQL-only extension, whereas database and column collations seem to be near-universal. So I think it makes total sense to do the same metadata and migrations changes for table-leve collations in your provider (using the same annotation etc.), but it doesn't seem to belong in Relational. Makes sense?

@lauxjpn
Copy link
Contributor

lauxjpn commented Apr 14, 2020

I looked into this - table-level collations is a MySQL-only extension, whereas database and column collations seem to be near-universal.

Interesting. Makes sense then.

@roji roji merged commit 64695e3 into dotnet:master Apr 14, 2020
@roji roji deleted the CollationMetadataMigrations branch April 14, 2020 14:22
@roji
Copy link
Member Author

roji commented Apr 14, 2020

Grrr, I'm sure I hit squash... 😠

@bricelam
Copy link
Contributor

I'm also not thrilled about starting to parse [the collation SQL], @bricelam what do you think?

Me either. File an issue. Same problem as #8800 and #8802

@JonPSmith
Copy link

I don't know if this is useful, but I did a study of Sqlite case-sensitivity via EF Core. Here are the answers, with the SQL that EF Core 5.0.0-preview.2.20159.4 produced.

As you will see, the default case-sensitivity varies for different string comparison, and some of that is due to the way the LINQ is converted to SQL.

== on Sqlite is	 case-sensitive. SQL created is:
      SELECT "b"."BookId", "b"."Description", "b"."ImageUrl", "b"."Price", "b"."PublishedOn", "b"."Publisher", "b"."SoftDeleted", "b"."Title"
      FROM "Books" AS "b"
      WHERE NOT ("b"."SoftDeleted") AND ("b"."Title" = 'Entity Framework in Action')
Equals on Sqlite is	 case-sensitive. SQL created is:
      SELECT "b"."BookId", "b"."Description", "b"."ImageUrl", "b"."Price", "b"."PublishedOn", "b"."Publisher", "b"."SoftDeleted", "b"."Title"
      FROM "Books" AS "b"
      WHERE NOT ("b"."SoftDeleted") AND ("b"."Title" = 'Entity Framework in Action')
StartsWith on Sqlite is	 case-INsensitive. SQL created is:
      SELECT "b"."BookId", "b"."Description", "b"."ImageUrl", "b"."Price", "b"."PublishedOn", "b"."Publisher", "b"."SoftDeleted", "b"."Title"
      FROM "Books" AS "b"
      WHERE NOT ("b"."SoftDeleted") AND ("b"."Title" IS NOT NULL AND ("b"."Title" LIKE 'Entity%'))
EndsWith on Sqlite is	 case-INsensitive. SQL created is:
      SELECT "b"."BookId", "b"."Description", "b"."ImageUrl", "b"."Price", "b"."PublishedOn", "b"."Publisher", "b"."SoftDeleted", "b"."Title"
      FROM "Books" AS "b"
      WHERE NOT ("b"."SoftDeleted") AND ("b"."Title" IS NOT NULL AND ("b"."Title" LIKE '%Action'))
Contains on Sqlite is	 case-sensitive. SQL created is:
      SELECT "b"."BookId", "b"."Description", "b"."ImageUrl", "b"."Price", "b"."PublishedOn", "b"."Publisher", "b"."SoftDeleted", "b"."Title"
      FROM "Books" AS "b"
      WHERE NOT ("b"."SoftDeleted") AND (('Framework' = '') OR (instr("b"."Title", 'Framework') > 0))
IndexOf on Sqlite is	 case-sensitive. SQL created is:
      SELECT "b"."BookId", "b"."Description", "b"."ImageUrl", "b"."Price", "b"."PublishedOn", "b"."Publisher", "b"."SoftDeleted", "b"."Title"
      FROM "Books" AS "b"
      WHERE NOT ("b"."SoftDeleted") AND ((instr("b"."Title", 'Entity') - 1) = 0)
Like on Sqlite is	 case-INsensitive. SQL created is:
      SELECT "b"."BookId", "b"."Description", "b"."ImageUrl", "b"."Price", "b"."PublishedOn", "b"."Publisher", "b"."SoftDeleted", "b"."Title"
      FROM "Books" AS "b"
      WHERE NOT ("b"."SoftDeleted") AND ("b"."Title" LIKE 'Entity Framework in Action')

@roji
Copy link
Member Author

roji commented Apr 24, 2020

Thanks for this valuable info @JonPSmith! @bricelam was just mentioning the pecularities of Sqlite case-sensitivity...

The basic thing here seems to be that LIKE is case-insensitive by default, and is not sensitive to collations in any way (see these docs). There's a case_sensitive_like pragma to make it case-sensitive though, but that's still not a very good situation. All the examples seen above (Contains, StartsWith...) basically vary based on what they translate to - if it's LIKE they're insensitive, if it's INSTR they're sensitive.

I'm not sure there's anything we can actually do here, except provide guidance to users... I'll also take this into account when working about #19402 - we should avoid different pattern types (constant, parameter, column) producing different case-sensitivity behavior on Sqlite because of LIKE vs. INSTR.

Some tests to show this weird behavior
SELECT sqlite_version(); -- 3.25.1

CREATE TABLE data
(
    name TEXT,
    name_cs TEXT COLLATE BINARY
);
INSERT INTO data (name, name_cs) VALUES ('foo', 'foo');

-- Equality operator is sensitive to collation
SELECT COUNT(*) FROM data WHERE name = 'foo'; -- 1
SELECT COUNT(*) FROM data WHERE name = 'FOO'; -- 0
SELECT COUNT(*) FROM data WHERE name = 'FOO' COLLATE NOCASE; -- 1

-- LIKE is case-insensitive, regardless of collation
SELECT COUNT(*) FROM data WHERE name LIKE 'f%'; -- 1
SELECT COUNT(*) FROM data WHERE name LIKE 'F%'; -- 1
SELECT COUNT(*) FROM data WHERE name LIKE 'F%' COLLATE BINARY; -- 1
SELECT COUNT(*) FROM data WHERE name COLLATE BINARY LIKE 'F%'; -- 1

-- Same if the collation is defined on the column
SELECT COUNT(*) FROM data WHERE name_cs LIKE 'f%'; -- 1
SELECT COUNT(*) FROM data WHERE name_cs LIKE 'F%'; -- 1
SELECT COUNT(*) FROM data WHERE name_cs LIKE 'F%' COLLATE BINARY; -- 1
SELECT COUNT(*) FROM data WHERE name_cs COLLATE BINARY LIKE 'F%'; -- 1

-- There's a pragma to make LIKE case-sensitive pragma
PRAGMA case_sensitive_like=ON;
SELECT COUNT(*) FROM data WHERE name LIKE 'f%'; -- 1
SELECT COUNT(*) FROM data WHERE name LIKE 'F%'; -- 0
SELECT COUNT(*) FROM data WHERE name LIKE 'F%' COLLATE BINARY; -- 0
SELECT COUNT(*) FROM data WHERE name LIKE 'F%' COLLATE NOCASE; -- 0
SELECT COUNT(*) FROM data WHERE name COLLATE BINARY LIKE 'F%'; -- 0
SELECT COUNT(*) FROM data WHERE name COLLATE NOCASE LIKE 'F%'; -- 0

@JonPSmith
Copy link

JonPSmith commented Jun 25, 2020

Hi @roji,

Just to say that I'm updating my book to include collations and I built some more tests. They all work but a couple of questions:

Sorry, my edits managed to muck things up.

@roji
Copy link
Member Author

roji commented Jun 25, 2020

Hey @JonPSmith,

SQLite is case-insensitive now, but the query code only seems to have changed for the IndexOf version. I assume you did something in the setup of a SQLite database to make the simple '==' change from case-sensitive to case-INsensitive.

I don't recall making any changes for IndexOf, and definitely didn't change the general SQLite database setup to make equality insensitive... Your SQL for IndexOf is a bit odd too - we should be translating it to the SQLite instr function, which I'm not seeing. Can you please verify that it's the right query, and if so, post a small code sample? I'll take a deeper look...

Can I provide a collation to SQLite? I tried adding "BINARY" (which I got from here) via the EF.Functions.Collate method but it failed with the this exception message

The exception message seems to be coming from SqlClient, are you sure you haven't gotten your providers crossed? :)

@JonPSmith
Copy link

Doh!

Sorry about that. I have edited my last entry and remove that rubbish - big refactoring of unit tests needs more care by me.

The results for SQLite are:

  • Default SQLite is the same as before - mixed case sensitivity as shown in previous comment.
  • SQLite with BINARY collation has the same output as the default SQLite, i.e. mixed case sensitivity - I assume that is the default.
  • SQLite with NOCASE collation only changed the == and Equals (which produce the same code) to case-insensitive.

I think SQLite is always going to be less 'standard' than the other relational database. Cosmos DB will also be different, although they have added the ability to turn case-insensitive search on the case-sensitively in some API commands I doubt that will be added to EF Core.

I feel like I have the information to update my book - sorry for bothering you.

@roji
Copy link
Member Author

roji commented Jun 25, 2020

No problem @JonPSmith, thanks for the additional information on SQLite. At the end of the day, EF Core exposes the mechanism to specify the collation at various levels, but what the database actually does with it is going to be database-dependent...

Re Cosmos, I strongly suspect that the API around collations/case-sensitivity is going to look quite different there - note that all of the new support we brought into EF 5.0 is in Relational only (so does not apply to Cosmos). Would you mind opening an issue for our backlog to look at Cosmos collations/case-sensitivity?

@JonPSmith
Copy link

JonPSmith commented Jun 25, 2020

You are very kind - I should I seen that.

I did ask about Cosmos DB a while ago and it wasn't being used much so it was on the back burner. I think there are many other issues that come before collation, like aggregates so I'll let someone else ask for Cosmos collations/case-sensitivity when they need it.

@roji
Copy link
Member Author

roji commented Jun 25, 2020

Sure thing. It never hurts to have an issue in the backlog, but as you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specify collation on columns Ability to specify database collation
5 participants