-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance issues when selecting megabytes of text from nvarchar(max) column using ExecuteScalarAsync #689
Comments
Hi @alser , Thank you for reporting this. I will try your repro project on my side. Does old driver like v1.1.3 have the same performance? |
Hello, @karinazhou Yes, I've tried 1.1.3, same problems Also System.Data.SqlClient 4.8.2 has the same issue (changing nuget and namespace to System.Data.SqlClient) And I've checked SQL Server 2012 (11.0.5388.0) - same here |
I added when file: method: line: |
Related to #593, possibly even a dup? |
@roji , it seems so, synchronous ExecuteScalar() call returns in 2 seconds |
On second thought, it shouldn't affect performance in such way, just creating all this arrays is almost immediate, and total size is ~20 MB |
it isn't just memory use (thought that can be quite poor) it's the fact that it resets state and replays from the first packet of the data load each time a packet arrives, this drives up cpu and wastes time. The larger the field the more packets it will span and the longer it will take over the sync version. If you can detect the data size and use a scalar approach when the field is larger than around 2/3 of your packet size try that. Or try using a streaming function to get it because those won't buffer packets. |
@Wraith2 thanks for advice Actually I'm working on large enterprise product using ORM to access different databases, SqlServer and Postrgres, then there is multiple layers of business logic. Its quite challenging to detect which type of request is executing, i.e. what fields are being selected, from what data source, and how it should be processed. Given situation described in #593 , I think of introducing several low level hacks in that product`s code base, when for SqlServer synchronous method will be called instead of async, but it can't be done in every place. Do you know when we can expect fix for async methods in Microsoft.Data.SqlClient package ? thanks |
It isn't broken. It's slower than we'd like but don't confuse that with not working. |
I meant performance fix, I didn't say "not working". If there are no plans, please, say that, there were no demands from my end to immediately address the problem. |
I'm an external contributor. I have no say on what happens to this library. I mainly contribute performance improvements. This particular performance problem is complicated, difficult, risky and time consuming. If I found the time needed to rework the library to change the behaviour it would take me months and would then take more months to be reviewed and at that point given the scope may still not be approved because of the scope. It would also likely involve subtle behavioural changes and the risk of new errors emerging is high. So it's a time consuming and thankless task for me. |
@Wraith2 just to confirm for me, is this the same issue as #593? In other words, after #603 (in SqlClient 2.0.0) at least streaming should be good? If so, we should close this as a dup. @alser if so, then as @Wraith2 suggested above, as a workaround you can drop down to raw ADO.NET and use the streaming API DbDataReader.GetTextReader. The same code should work as-is across both SqlClient and Npgsql. Another, possibly more reasonable option, is to simply use sync I/O for the time being. I don't know what your application looks like and what its perf characteristics are like, but that may be a completely reasonable option without any real drawbacks. |
Looks like it. The repro code is simple, it creates a large string then reads it and we know that's slow and why. |
@cheenamalhotra as above, this seems like it can be closed as a dup of #593. |
Closing as duplicate of #593 |
Thanks a lot for answers! Yes, its definetely dup. Its easier to use sync api in my case (given queries are executed using ORM) , and in some cases streaming can be useful. In my app text column is usually json data, it can be several megabytes in size. |
Describe the bug
Selecting ~20 MB of text data from nvarchar(max) column using ExecuteScalarAsync has very poor performance, executing up to 10 minutes with 100% CPU consumption (1 core) and lots of garbage collections.
It happens probably because of DataReader is creating multiple byte arrays for each portion of data received from SQL.
Selecting the text in SSMS occures in less than a second.
To reproduce
Sample project to reproduce the issue: https://github.com/alser/SqlDataClientStringPerformanceIssue
Its console app connecting to localhost to SQL Server (no network issues involved)
Expected behavior
Selecting data should occur in few seconds with low CPU consumptions and without lots of GCs.
Probably DataReader should rent byte arrays when reading string fields instead of creating new ones for each portion of data.
Further technical details
Microsoft.Data.SqlClient version: 2.0.0 (nuget)
.NET target: Core 3.1.7
SQL Server version: SQL Server 2016 (13.0.1742.0)
Operating system: Windows 10 version 10.0.19041.450
The text was updated successfully, but these errors were encountered: