
Performance issues when selecting megabytes of text from nvarchar(max) column using ExecuteScalarAsync #689

Closed
alser opened this issue Aug 13, 2020 · 16 comments


alser commented Aug 13, 2020

Describe the bug

Selecting ~20 MB of text data from an nvarchar(max) column using ExecuteScalarAsync has very poor performance: the call takes up to 10 minutes, with 100% CPU consumption (one core) and lots of garbage collections.

This probably happens because the DataReader creates a new byte array for each portion of data received from SQL Server.

Selecting the same text in SSMS completes in less than a second.

To reproduce

Sample project to reproduce the issue: https://github.com/alser/SqlDataClientStringPerformanceIssue

It's a console app connecting to SQL Server on localhost, so no network latency is involved.
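
In essence it does the following (a minimal sketch rather than the linked project verbatim; the table/column names and connection string are illustrative):

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

class Program
{
    static async Task Main()
    {
        // Assumed setup: a local database with a single row holding
        // ~20 MB of text in an nvarchar(max) column.
        using var connection = new SqlConnection(
            "Server=localhost;Database=Test;Integrated Security=true");
        await connection.OpenAsync();

        using var command = new SqlCommand(
            "SELECT Body FROM Docs WHERE Id = 1", connection);

        var sw = Stopwatch.StartNew();
        // This async scalar read is the slow path; the synchronous
        // command.ExecuteScalar() completes in a couple of seconds.
        var text = (string)await command.ExecuteScalarAsync();
        Console.WriteLine($"{text.Length} chars in {sw.Elapsed}");
    }
}
```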

Expected behavior

Selecting the data should take at most a few seconds, with low CPU consumption and without excessive GC activity.

The DataReader should probably rent byte arrays when reading string fields instead of allocating new ones for each portion of data.
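
For illustration, the renting pattern being suggested could look roughly like this, using System.Buffers.ArrayPool (a sketch of the general technique, not the driver's actual internals):

```csharp
using System.Buffers;
using System.IO;

static class PooledPacketRead
{
    // Sketch: instead of allocating a fresh ~8000-byte buffer per network
    // packet (new byte[8000]), rent one from the shared pool and return it
    // once the packet has been consumed, so steady-state reads allocate
    // almost nothing.
    public static void ReadOnePacket(Stream network)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(8000);
        try
        {
            int read = network.Read(buffer, 0, 8000);
            // ... parse `read` bytes of packet data here ...
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```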

Further technical details

Microsoft.Data.SqlClient version: 2.0.0 (nuget)
.NET target: Core 3.1.7
SQL Server version: SQL Server 2016 (13.0.1742.0)
Operating system: Windows 10 version 10.0.19041.450

karinazhou (Member) commented:

Hi @alser,

Thank you for reporting this. I will try your repro project on my side. Does an older driver version, like v1.1.3, have the same performance problem?

alser (Author) commented Aug 13, 2020

Hello, @karinazhou

Yes, I've tried 1.1.3; it has the same problem.

System.Data.SqlClient 4.8.2 also has the same issue (after switching the NuGet package and namespace to System.Data.SqlClient).

I've also checked against SQL Server 2012 (11.0.5388.0); same result there.

alser (Author) commented Aug 13, 2020

I added the Microsoft.Data.SqlClient project to my sample solution, built from the latest commit on master (dbbcde8).

When ExecuteScalarAsync was executed, there were 2521 allocations of new byte[8000] at the following location:

file: src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\TdsParserStateObject.cs

method: internal bool TryReadNetworkPacket()

line: _inBuff = new byte[_inBuff.Length];

roji (Member) commented Aug 14, 2020

Related to #593, possibly even a dup?

alser (Author) commented Aug 14, 2020

@roji, it seems so; a synchronous ExecuteScalar() call returns in 2 seconds.

alser (Author) commented Aug 14, 2020

When ExecuteScalarAsync was executed, there were 2521 allocations of new byte[8000] at the following location:

On second thought, that alone shouldn't affect performance this much: creating all these arrays is almost immediate, and their total size is only ~20 MB.

Wraith2 (Contributor) commented Aug 16, 2020

It isn't just the memory use (though that can be quite poor); it's the fact that the async path resets its state and replays from the first packet of the data each time a new packet arrives, which drives up CPU usage and wastes time. The larger the field, the more packets it spans and the longer it takes compared to the sync version.
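
(Back-of-envelope, using the ~2521 packets counted above: replaying from the first packet on each arrival costs on the order of 2521 × 2522 / 2 ≈ 3.2 million packet parses instead of 2521, i.e. the work grows quadratically with the size of the value.)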

If you can detect the data size, try a scalar approach when the field is larger than around 2/3 of your packet size. Or try using one of the streaming functions to get it, because those won't buffer packets. A sketch of one reading of this advice follows.
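
(The DATALENGTH pre-query, the 8000-byte packet size, and the table/column names below are all assumptions for illustration:)

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

static class SizeAwareRead
{
    public static async Task<string> ReadBodyAsync(SqlConnection connection)
    {
        const int packetSize = 8000; // assumed TDS packet size

        // Ask the server how big the value is before deciding how to read it.
        using var sizeCmd = new SqlCommand(
            "SELECT DATALENGTH(Body) FROM Docs WHERE Id = 1", connection);
        long bytes = Convert.ToInt64(await sizeCmd.ExecuteScalarAsync());

        using var cmd = new SqlCommand(
            "SELECT Body FROM Docs WHERE Id = 1", connection);

        // Large values span many packets, which is where the buffered
        // async path degrades; fall back to the sync call (or a
        // streaming read) for those.
        if (bytes > packetSize * 2 / 3)
            return (string)cmd.ExecuteScalar();

        return (string)await cmd.ExecuteScalarAsync();
    }
}
```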

alser (Author) commented Aug 16, 2020

@Wraith2, thanks for the advice.

I'm actually working on a large enterprise product that uses an ORM to access different databases (SQL Server and Postgres), with multiple layers of business logic on top. It's quite challenging to detect which type of request is executing, i.e. which fields are being selected, from which data source, and how the result should be processed.

Given the situation described in #593, I'm thinking of introducing some low-level hacks into the product's code base so that, for SQL Server, a synchronous method is called instead of the async one, but that can't be done everywhere.

Do you know when we can expect a fix for the async methods in the Microsoft.Data.SqlClient package? Thanks.

Wraith2 (Contributor) commented Aug 16, 2020

Do you know when we can expect a fix for the async methods in the Microsoft.Data.SqlClient package? Thanks.

It isn't broken. It's slower than we'd like, but don't confuse that with not working.

alser (Author) commented Aug 16, 2020

It's slower than we'd like, but don't confuse that with not working.

I meant a performance fix; I didn't say "not working". If there are no plans, please just say so; I wasn't demanding that the problem be addressed immediately.

Wraith2 (Contributor) commented Aug 16, 2020

I'm an external contributor; I have no say in what happens to this library. I mainly contribute performance improvements.

This particular performance problem is complicated, difficult, risky and time consuming. If I found the time to rework the library and change this behaviour, it would take me months, and the review would take months more; even then, given its scope, the change might still not be approved. It would also likely involve subtle behavioural changes, and the risk of new errors emerging is high. So it's a time-consuming and thankless task.

roji (Member) commented Aug 16, 2020

@Wraith2 just to confirm: is this the same issue as #593? In other words, after #603 (shipped in SqlClient 2.0.0), at least streaming should be good? If so, we should close this as a dup.

@alser, if so, then as @Wraith2 suggested above you can work around this by dropping down to raw ADO.NET and using the streaming API DbDataReader.GetTextReader. The same code should work as-is across both SqlClient and Npgsql.
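
For example (a sketch; the query and names are assumed), using CommandBehavior.SequentialAccess so the provider streams the column instead of buffering the whole row:

```csharp
using System.Data;
using System.Data.Common;
using System.IO;
using System.Threading.Tasks;

static class StreamingRead
{
    // Written against DbConnection/DbDataReader so the same code runs
    // on both SqlClient and Npgsql.
    public static async Task<string> ReadBodyAsync(DbConnection connection)
    {
        using DbCommand command = connection.CreateCommand();
        command.CommandText = "SELECT Body FROM Docs WHERE Id = 1";

        // SequentialAccess tells the provider to stream column data
        // as it arrives instead of buffering each row first.
        using DbDataReader reader =
            await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess);
        await reader.ReadAsync();

        using TextReader text = reader.GetTextReader(0);
        // ReadToEndAsync still materializes one big string, but it avoids
        // the packet-replay problem; for true streaming, read in chunks
        // into a fixed char buffer instead.
        return await text.ReadToEndAsync();
    }
}
```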

Another, possibly more reasonable option is to simply use sync I/O for the time being. I don't know what your application looks like or what its perf characteristics are, but that may be a completely reasonable option without any real drawbacks.

Wraith2 (Contributor) commented Aug 16, 2020

Looks like it. The repro code is simple: it creates a large string, then reads it back, and we know that's slow and why.

roji (Member) commented Aug 17, 2020

@cheenamalhotra as above, this seems like it can be closed as a dup of #593.

cheenamalhotra (Member) commented:

Closing as duplicate of #593

alser (Author) commented Aug 18, 2020

Thanks a lot for the answers! Yes, it's definitely a dup. It's easier to use the sync API in my case (given that queries are executed through an ORM), and in some cases streaming can be useful.

In my app the text column usually holds JSON data, which can be several megabytes in size.
