Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: SqlFunction to compare byte arrays (necessary to filter on rowversion) #5936

Closed
jnm2 opened this issue Jul 1, 2016 · 24 comments
Closed
Labels
closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. customer-reported

Comments

@jnm2
Copy link

jnm2 commented Jul 1, 2016

Right now you can look at https://stackoverflow.com/questions/7437970/how-to-query-code-first-entities-based-on-rowversion-timestamp-value and see how difficult this is- it's next to impossible to get right.

Since we can't have comparison operators on byte arrays, the next most idiomatic thing would be to have the following methods in SqlFunctions (or DbFunctions, if possible), naming convention following the Expression class:

  • bool LessThan(byte[], byte[])
  • bool LessThanOrEqual(byte[], byte[])
  • bool GreaterThan(byte[], byte[])
  • bool GreaterThanOrEqual(byte[], byte[])

Questions

  1. I don't know if there is any value in Equal and NotEqual. They would be useful if evaluated client-side, because they would be semantically purer than the == operator which compares byte arrays by reference. The semantics of byte array == are incorrect because SQL compares binary by value (as it should).
  2. Could they be extension methods? Might be a nice experience, might not.

Considered alternatives

Casting

A function that would be cool for other purposes would be long SqlFunctions.Cast(byte[]), but that would not work for this scenario because SQL Server only has signed comparisons, plus it's less performant server-side:

select case when cast(0x0FFFFFFFFFFFFFFF as bigint) < cast(0xFFFFFFFFFFFFFFFF as bigint)
    then 'unsigned' else 'signed - oops' end

Doing a multi-step comparison would get around the signed comparison issue, but that's hacky and slower. Casting to char(8) would subject you to collation comparisons.

Idiomatic binary type

Another alternative would be to use a Binary primitive struct that wraps a byte array and provides all the value-comparison operators and has an implicit conversion to and from a byte array. Additionally, a Timestamp primitive type (mine) for rowversion columns. This puts comparison operators back on the table and would be ideal over SqlFunctions, but obviously will have to wait for #242. If it speeds up #242 all the better!

@rowanmiller
Copy link
Contributor

We want to discuss this with @divega when he is back in the office

@divega
Copy link
Contributor

divega commented Jul 21, 2016

Clearing up for triage. I think for EF Core we should initially consider adding support for translating existing idioms that express this kind of comparisons between byte arrays in .NET, e.g.:

  • StructuralComparisons.StructuralComparer.Compare(x, y) >= 1 would translate to x >= y in SQL.
  • StructuralComparisons.StructuralEqualityComparer.Equals(x, y) and a.SequenceEqual(b) would both translate to x = y in SQL.

The proposed methods LessThan.../GreaterThan... seem like they could be easier to use and more discoverable than StructuralComparisons.StructuralComparer.Compare() however that would be better addressed by adding such methods or even operator overloads on the .NET base class libraries instead of doing it as a one-off usability improvement in EF Core.

For EF 6.x I solved the problem some time ago for a customer using model functions. I have added that as an alternative answer to the question in StackOverflow.com.

PS: @jnm2 your answer in StackOverflow.com is a nice hack!

@divega divega removed this from the 1.1.0 milestone Jul 21, 2016
@divega divega removed their assignment Jul 21, 2016
@jnm2
Copy link
Author

jnm2 commented Jul 21, 2016

@divega, what is your take on a Binary struct and a Timestamp struct where operators have SQL semantics? StructuralComparisons.StructuralComparer.Compare is okay, but nothing beats a strong type model. Can that be the long-term goal?

@divega
Copy link
Contributor

divega commented Jul 21, 2016

@jnm2 I don't have a strong opinion on that right now . I can see that byte[] misses a few details that a more strongly typed alternative could capture, e.g.:

  1. A property of type Timestamp (or rather RowVersion) would imply a length of 8
  2. Properties of type Timestamp would imply value generation on updates and concurrency checks.
  3. Timestamp and Binary could have more well-defined order-comparability semantics than byte[]

We have other means to specify (1) and (2) that are not terribly verbose, e.g. data annotations and fluent APIs. Also I think that (3) is actually debatable because StructuralComparision clearly makes a choice of how two byte[] are compared (i.e. it assigns more weight to initial elements of the array, just like SQL Server when comparing two binaries).

In the end it is also a matter of how general purpose (i.e. provider agnostic) those types are vs. the value they add. E.g. it seems to me that Timestamp could add some value but would probably be SQL Server specific and Binary could be more general purpose but would add very little value (unless I missing something).

@jnm2
Copy link
Author

jnm2 commented Jul 22, 2016

I only really care about 3. I can add conventions for 1 and 2; I don't care if EF has provider-specific knowledge of them. Or UseSqlServer can wire it up. What I really want is purity and expressiveness in the code that I write.

It adds value in polish and making it easy to write the right thing by default. It's always going to feel hackish to write byte[] == byte[] and know that it's a value comparison if it gets executed serverside, and it's always going to be a pain and look ugly to type StructuralComparisons.StructuralComparer.Compare(x, y) < 0 when what I'm thinking is x < y. I want the freedom to be able to get away from workarounds, doing gymnastics on top of C# array semantics when C# array semantics should have nothing to do with it (until the point where you implicitly cast Binary to a byte[]). The Binary mental model should be simple and single-purpose.

I don't believe operators imply big-endian comparisons and less or more than StructuralComparisons. No storage provider that I'm aware of does little-endian binary comparisons. It's not a very useful concept, for either binary or strings. Plus, are you going to support more kinds of structural comparers, like LittleEndianStructuralComparer? I doubt it.

Rather than comparer instances, I think operators are the ideal level at which to work in terms of conciseness, obviousness, and semantics.

@divega
Copy link
Contributor

divega commented Jul 22, 2016

@jnm2 Personally I agree that easy to use relational operators or methods with the existing comparability semantics would be nice, though I doubt this would require coming up with a new type (byte[] could be already fine) and I doubt EF is the right place to do it.

Though from experience, when something so obvious isn't supported, there is usually some reason I didn't know about 😄 I will follow up on that.

@rowanmiller rowanmiller changed the title Feature request for SqlFunction to compare byte arrays, necessary to filter on rowversion Query: SqlFunction to compare byte arrays (necessary to filter on rowversion) Jul 25, 2016
@rowanmiller rowanmiller added this to the Backlog milestone Jul 25, 2016
@MichalPavlik
Copy link
Member

MichalPavlik commented Nov 8, 2017

I'm in the similar situation. I need to get entries whose timestamp/rowversion is greater than some number. FromSql is ugly solution so I tried some syntax hacks and I found one acceptable:

...Where(e => (long)((object)e.RowVersion) > 2017)

translates to

...WHERE [e].[RowVersion] > 2017

It's also working with shadow property and as a bonus, you don't need to care about endianness. Nevertheless, the built-in support for this column type would be appreciated :)

@jnm2
Copy link
Author

jnm2 commented Nov 8, 2017

@MichalPavlik So long as EF never decides to run that client-side, of course.

@MichalPavlik
Copy link
Member

MichalPavlik commented Nov 8, 2017

@jnm2 Yeah, that's why I'm looking forward the custom type mapping. For now it seems that explicit casting is ignored (at least in this case). Need to check with every feature version :)

@werwolfby
Copy link

@MichalPavlik thank you very much for this workaround!

@shahafal
Copy link

@divega are there any news since the last message in this thread? or still the best approach is @MichalPavlik 's workaround?

@kvpt
Copy link

kvpt commented Jun 18, 2018

To provide an update on the thread,

With the release of the 2.1 version and the support of value conversions, it's now easier to do that.

In Entity

public class Entity
{
    public ulong RowVersion { get; set; } // anymore byte[]
}

In Context

modelBuilder.Entity<Entity>(entity =>
{
     entity
         .Property(e => e.RowVersion)
         .HasConversion(new NumberToBytesConverter<ulong>())
         .IsRowVersion();
});

In Query

ulong sinceRowVersion;

...
      .Where(e => e.RowVersion > sinceRowVersion);
...

@skyflyer
Copy link

@kvpt / MS team, I must be doing something wrong, as in my case, the Timestamp field is created as varbinary(max) in the DB, and even though it is marked with ValueGeneratedOnAdd in initial migration, the value is not generated and the insert, of course, fails with the following error:

Cannot insert the value NULL into column 'Timestamp', table 'EFConcurrencyTest.dbo.Orders'; column does not allow nulls. INSERT fails.
The statement has been terminated.

The generated migration is clearly wrong when using ulong for the field, as seen here:

        protected override void Up(MigrationBuilder migrationBuilder)
        {
            migrationBuilder.CreateTable(
                name: "Orders",
                columns: table => new
                {
                    Id = table.Column<Guid>(nullable: false),
                    ReferenceNumber = table.Column<string>(nullable: true),
                    Timestamp = table.Column<byte[]>(nullable: false)
                },
                constraints: table =>
                {
                    table.PrimaryKey("PK_Orders", x => x.Id);
                });
        }

As you can see, the field is only nullable: false -- it is missing the rowVersion: true parameter, so the generated SQL is wrong as well.

I was testing this with "localdb" provider (which should be the same as SQL Server?). The whole TestDbContext I'm using is here.

@kvpt, can you share your setup with which you've been successful? (I though there might be something off 'cause I'm using Guid, but even after changing to int the situation is still the same).

@divega
Copy link
Contributor

divega commented Jun 18, 2018

@skyflyer I think the problem is that @kvpt's description of the workaround is incomplete. I tried a few things and I believe the minimal configuration to make it work is this (notice the HasColumnType is added, and the HasConversion call isn't needed):

            modelBuilder.Entity<Order>(entity =>
            {
                entity
                    .Property(e => e.RowVersion)
                    .HasColumnType("RowVersion")
                    .IsRowVersion();
            });

@ajcvickers do we have a test for this? Otherwise, should we add a test to make sure this continues to work in future rleases?

@skyflyer
Copy link

@divega, excellent! Thanks!

The generated migration is still not the same as before, but since the db type hint is present, it is correctly created in the DB and EF concurrency handling works correct as well.. For example:

CREATE TABLE [Orders] (
    [Id] uniqueidentifier NOT NULL,
    [ReferenceNumber] nvarchar(max) NULL,
    [Timestamp] RowVersion NOT NULL,
    CONSTRAINT [PK_Orders] PRIMARY KEY ([Id])
);

@kvpt
Copy link

kvpt commented Jun 18, 2018

Mine is a little different, I use the RowVersion type, like this :

[RowVersion]  RowVersion    NOT NULL

Because timestamp seem to be deprecated in the documentation.

@ajcvickers
Copy link
Contributor

@divega Yeah, tests would be good. Did you test whether the query generated is correct? (For example, not client evalled?)

@divega
Copy link
Contributor

divega commented Jun 18, 2018

Now yes 😄 Here is an example of a translated query for a constant:

var q = db.Orders.Where(o => o.RowVersion > 2017).ToList();
SELECT [o].[Id], [o].[RowVersion]
FROM [Orders] AS [o]
WHERE [o].[RowVersion] > 0x00000000000007E1

@divega divega removed this from the Backlog milestone Jun 18, 2018
@divega
Copy link
Contributor

divega commented Jun 18, 2018

Clearing up milestone to bring this up in next triage.

  • It appears to me that one of the most important scenarios for the issue is addressed by type conversions
  • The rest of the scenarios (not for timestamp) that this issue was about would be better addressed by having some sort of binary type with stronger order semantics than byte[] and operator support as @jnm2 mentioned. For that, it is probably better to ask @jnm2 to re-report the issue in corefx or some other repo.
  • This is a good example for the type conversions documentation
  • We may want to add a test to make sure we don't ever regress this

@ajcvickers
Copy link
Contributor

@skyflyer
Copy link

@kvpt, is your [RowVersion] a custom attribute? I know timestamp is deprecated, but it is still an alias and the [Timestamp] attribute is the only one that I can find.

Having said that, the @divega's approach with HasColumnType works well.

@kvpt
Copy link

kvpt commented Jun 22, 2018

@skyflyer RowVersion is just the name of my column of RowVersion type.
So the sql generated [ColumnName] TYPE become [RowVersion] ROWVERSION.
In Visual Studio it's displayed like this :
image

@skyflyer
Copy link

@kvpt, OK, I misread [RowVersion] as a c# property annotation. :) Everything clear.

We were referring to different layers (I was talking about c# prop annotation, which (should) drive the model field type as well), whereas you were referring to the DB type... I believe it is clear now.

The renaming of [Timestamp] attribute to better reflect the underlying data type would be a candidate for another ticket, though...

@jtheisen
Copy link

At least the following simple way of doing it in EF6 should work on Core also:

https://stackoverflow.com/a/42765541/870815

@ajcvickers ajcvickers added the closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. label Mar 10, 2022
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. customer-reported
Projects
None yet
Development

No branches or pull requests

10 participants