Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlClient's connection resiliency adds 10 seconds to database existence checks. Consider exposing a way to override it at runtime #29

Closed
divega opened this issue Dec 20, 2016 · 14 comments
Labels
💡 Enhancement Issues that are feature requests for the drivers we maintain. 📈 Performance Issues that are targeted to performance improvements.

Comments

@divega
Copy link

divega commented Dec 20, 2016

This is to follow up on a specific issue a customer created today on EF Core at dotnet/efcore#7283, but the problem has been brought to our attention before.

Both EF6 and EF Core provide APIs that allow applications to create and initialize databases, check for their existence and modify their schemas. These operations can happen at design time (e.g. when using the migrations feature) or at runtime. Attempting to connect to a database that doesn't exist is part of the normal flow of these operations.

In .NET Framework 4.5.1 SqlClient introduced a connection resiliency feature that performs automatic retries if failures occur during SqlConnection.Open(). Two new settings ConnectRetryCount and ConnectRetryInterval were added to the list of settings recognized in connection strings and a default behavior was adopted so if the first attempt to connect to a database fails a retry will occur after 10 seconds if these settings are not specified. The same feature is now included in the .NET Core version of SqlClient.

With only the default behavior this feature introduces a lag of 10 second for any calling code that attempts to connect to a database that doesn't exist. It also prevents any calling code from implementing its own (potentially more efficient) retry logic correctly. This severely affects customer experience and can potentially affect runtime performance.

Two main approaches have been proposed to mitigate this issue on EF code, but they have severe disadvantages:

  1. Recommend customers to disable SqlClient's connection resiliency altogether by setting ConnectRetryCount=0 when working EF and make sure the feature is disabled any time EF runtime or tooling creates a connection string, or even throw if an attempt is made to use a connection that has the feature enabled. The main disadvantages of this approach are:
  • It is not discoverable for customers and so users will still experience the 10 second lags (or exceptions if we decide to throw) for the default case.
  • It prevents the connection resiliency feature in SqlClient from being available in scenarios in which it could actually have been helpful.
  • EF6 supports older versions of .NET Framework than 4.5.1 and for those versions these settings are not valid. The logic would need to take the executing version of .NET Framework into account.
  1. Store aside a copy of the original connection string so that we can modify it with ConnectRetryCount = 0 and create a separate SqlConnection object to perform existence checks.
  • The main problem with this approach is that both EF Core and EF6 support passing a SqlConnection object to be used in the EF context. It is possible that the password would have already been removed from the ConnectionString in that connection object if it was open before, so in that case the connection string would not contain enough credentials to be able to create a separate functional connection object.

At this point we believe that if SqlConnection exposed a way to programmatically disable connection retries without requiring the modification of the connection string we could modify EF6 and EF Core code to restore the correct behavior and eliminate 10 second lags. We are happy to discuss other options with the SqlClient team.

Also note that this issue applies to both .NET Core and .NET Framework.

cc @ajcvickers

Note on how EF Core implements its own retry logic to check for database existence:

In general, any code that is calling SqlConnection.Open() can leverage contextual knowledge to make connection retries more efficient. E.g.:

  • When EF Core performs existence checks, it will only retry on certain errors if they occur immediately after a database has been created.
  • For a regular existence checks (i.e. those that don't happen immediately after the database has been created) EF Core assumes that immediate failures coming from a database server mean that the database effectively doesn't exist and can avoid any retry logic.
@divega divega changed the title SqlClient's connection resiliency causes performance issues to EF customers. Consider exposing a way to override it at runtime SqlClient's connection resiliency hinders fail-fast health or database existence checks. Consider exposing a way to override it at runtime Jul 31, 2018
@divega divega changed the title SqlClient's connection resiliency hinders fail-fast health or database existence checks. Consider exposing a way to override it at runtime SqlClient's connection resiliency adds 10 seconds to database existence checks. Consider exposing a way to override it at runtime Oct 13, 2018
@divega
Copy link
Author

divega commented May 16, 2019

As recently announced in the .NET Blog, focus on new SqlClient features an improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDB, and general ADO.NET and .NET data access issues.

@divega divega transferred this issue from dotnet/corefx May 16, 2019
@David-Engel David-Engel added the 💡 Enhancement Issues that are feature requests for the drivers we maintain. label May 20, 2019
@cheenamalhotra cheenamalhotra added the 📈 Performance Issues that are targeted to performance improvements. label Oct 7, 2019
@cheenamalhotra
Copy link
Member

cheenamalhotra commented Feb 25, 2020

@ajcvickers @roji @saurabh500 @David-Engel

I was looking at this issue seems a simple to do if we can conclude a solution..

Diego's proposition:

"At this point we believe that if SqlConnection exposed a way to programmatically disable connection retries without requiring the modification of the connection string we could modify EF6 and EF Core code to restore the correct behavior and eliminate 10 second lags. We are happy to discuss other options with the SqlClient team."

With my initial investigation, I think we can do this if this is needed. In order to do this, we'd have to expose a SqlConnection property "ConnectRetryCount" (internal reference is already on connection level) and handle it's value change in SqlConnectionPoolKey to make sure it's sent back to the matching connection pool when connection is closed.

Are there any other scenarios/limitations to this change that I'm missing? Opening this old request for discussions/alternative ideas.

@roji
Copy link
Member

roji commented Feb 26, 2020

Thanks for looking at this again @cheenamalhotra... I think we should consider @divega's 2nd point above:

Store aside a copy of the original connection string so that we can modify it with ConnectRetryCount = 0 and create a separate SqlConnection object to perform existence checks.
The main problem with this approach is that both EF Core and EF6 support passing a SqlConnection object to be used in the EF context. It is possible that the password would have already been removed from the ConnectionString in that connection object if it was open before, so in that case the connection string would not contain enough credentials to be able to create a separate functional connection object.

I also had a version of this exact problem in Npgsql, wher connection string changes are required. My solution was to simply add a CloneWith(string connectionString), which accepts a new connection string to override settings in the old one, and returns a new connection instance. Security settings are carried over from the original connection to the cloned one.

This solution seems better because it works for any connection string parameter, not just the retry policy. For example, it's sometimes necessary to create a non-pooling connection from an existing connection which has already been opened; I do this for database existence checks in the EF Core provider, to make sure an actual physical (unpooled) connection is done.

How does that sound? I'm assuming that the actual credentials are kept inside SqlClient, since IIRC it should be possible today to call regular Clone on an open connection, and open the cloned connection (so the password is kept somewhere). Adding CloneWith (or similar) should be trivial. If we feel this is useful, we could even add this to DbConnection in ADO.NET.

PS Some quick testing seems to show that Microsoft.Data.SqlClient continues to expose the password in the connection string even after Open is called, opened #448.

@roji
Copy link
Member

roji commented Feb 26, 2020

For an example of how the Npgsql EF Core provider does database connection checks with CloneWith, see this.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Feb 26, 2020

Thanks @roji

Store aside a copy of the original connection string so that we can modify it with ConnectRetryCount = 0 and create a separate SqlConnection object to perform existence checks.

That does sound like a good solution to me. This can be better handled at EF layer than for us to manage pool modification properties, in case more properties need similar support in future.

Edit: SqlClient would expose CloneWith method on SqlConnection to clone properties from provided Connection String.

I will look into the new issue, thanks!

@roji
Copy link
Member

roji commented Feb 26, 2020

Just to be sure we're saying the same thing, what I was describing would require SqlClient to add a CloneWith method. Upper layers such as EF can't fully handle this because they need to support scenarios where they're given an already-opened SqlConnection object and not a connection string.

@ajcvickers
Copy link

Taking a little step back here, the real problem is the 10 second hang when the server is reachable but the database does not exist. This would solve all the cases in EF where people get the unwanted delay.

To state this a different way, if the server can't be reached, then it's not unreasonable to spend some time trying to reach it. But if the server can be reached and gives a definitive answer that no database with the given name exists, then this answer should be returned immediately.

@roji I'm very reluctant to start cloning connections. It has been a huge and horrifying can of worms in the past.

@ajcvickers
Copy link

We should discuss this in our sync next week. /cc @bricelam

@roji
Copy link
Member

roji commented Mar 4, 2020

Taking a little step back here, the real problem is the 10 second hang when the server is reachable but the database does not exist. This would solve all the cases in EF where people get the unwanted delay.

Ah, I see - that behavior indeed seems odd. However, if memory serves, right after creating a new database in SQL Server, there's a period of time during which the database still does not exist, as SQL Server is preparing it (or whatever). If that is so, then retrying for a non-existing database does make some sense, by allowing people to connect successfully immediately after creating a new database. But I'm not sure if the above is actually correct.

Regardless of the above, I think CloneWith could be necessary for other purposes, e.g. to disable pooling while performing a database existence check. Our current check in EF Core opens pooled connections (which doesn't actually do anything if there are idle pooled connections). Granted, we do execute SELECT 1 after that - so if the database is down we'd know it - but there some scenarios where executing something on an existing connection could work, while creating a new physical connection could fail. So I think we should ideally disable pooling and open a new connection, an in order to do that we'd need CloneWith. This can be a completely separate issue which I can open.

I'm very reluctant to start cloning connections. It has been a huge and horrifying can of worms in the past.

OK, I guess I don't have that history. This mechanism has worked quite well in Npgsql but may be more complicated for SqlClient, I'd be interested in hearing about the problems you've seen.

@saurabh500
Copy link
Contributor

We could introduce an overload of Open and OpenAsync which take in some connection override options instead of providing a set of APIs to target only the Connection Resiliency feature override.

I propose the following

class SqlConnection {
    //... Other APIs
    public void Open(SqlConnectionOverrides options);
    public Task OpenAsync(SqlConnectionOverrides options);

    // Other Apis
}

[Flags] 
enum SqlConnectionOverrides : int
{
    None = 0,
    DisableAzureConnectionResiliency = 1, // This probably requires a better name
    // More options can be added to this flag if needed for any other overrides
}

@David-Engel
Copy link
Contributor

@roji @ajcvickers
In testing the proposal, I discovered that if you use OpenAsync(), you get a connection failure instantly with the current code (SDS or MDS).

@saurabh500 OpenAsync() looks completely wrong to me from an async perspective. I'd like to discuss it in our next sync to make sure I'm understanding it correctly.

@cheenamalhotra
Copy link
Member

Hi @ajcvickers

Closing the issue as PR #463 addresses the issue.

@ajcvickers
Copy link

@David-Engel This recently regressed. See the linked EF issue.

/cc @SamMonoRT

@David-Engel
Copy link
Contributor

Fixed in MDS 5.1.6 and current versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💡 Enhancement Issues that are feature requests for the drivers we maintain. 📈 Performance Issues that are targeted to performance improvements.
Projects
None yet
Development

No branches or pull requests

6 participants