Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Duplicating the same connection twice gives "Integrity error, probably unique constraint" #18050

Closed
1 of 2 tasks
jyotsa09 opened this issue Sep 6, 2021 · 12 comments · Fixed by #18161
Closed
1 of 2 tasks
Assignees
Labels

Comments

@jyotsa09
Copy link

jyotsa09 commented Sep 6, 2021

Apache Airflow version

2.2.0

Operating System

Debian buster

Versions of Apache Airflow Providers

No response

Deployment

Astronomer

Deployment details

astro dev start

What happened

When I have tried to duplicate a connection first time it has created a connection with hello_copy1 and duplicating the same connection second time giving me an error of
Connection hello_copy2 can't be added. Integrity error, probably unique constraint.

What you expected to happen

It should create a connection with name hello_copy3 or the error message should be more user friendly.

How to reproduce

#15574 (comment)

Anything else

Suggested Error

Connection hello_copy2 can't be added because it already exists.

or
Change the number logic in _copyN so the new name will be always unique.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@jyotsa09 jyotsa09 added area:core kind:bug This is a clearly a bug labels Sep 6, 2021
@boring-cyborg
Copy link

boring-cyborg bot commented Sep 6, 2021

Thanks for opening your first issue here! Be sure to follow the issue template!

@jyotsa09 jyotsa09 changed the title Duplicating the same connection twice gives **Integrity error, probably unique constraint** Duplicating the same connection twice gives *Integrity error, probably unique constraint* Sep 6, 2021
@jyotsa09 jyotsa09 changed the title Duplicating the same connection twice gives *Integrity error, probably unique constraint* Duplicating the same connection twice gives "Integrity error, probably unique constraint" Sep 6, 2021
@potiuk
Copy link
Member

potiuk commented Sep 6, 2021

Is it possible that you make a PR fixing it? Seems like simple issue to fix?

@potiuk
Copy link
Member

potiuk commented Sep 6, 2021

Airflow is build by the community - so anyone can make such fixes.

@subkanthi
Copy link
Contributor

@potiuk can I take this please.

@uranusjr
Copy link
Member

uranusjr commented Sep 8, 2021

Please do!

@subkanthi
Copy link
Contributor

It looks definitely easy to change the error message, to implement creating new connections with _copy[number+1] , we have to query the existing connections, not sure if thats the way to go.

Thoughts, @potiuk , @uranusjr

@potiuk
Copy link
Member

potiuk commented Sep 11, 2021

I think we have to query the connections anyway ;) so why not a short loop :)

@uranusjr
Copy link
Member

The problem with quering is it could be quite expensive to find the correct number if you have a ton of existing _copyX, the time complexity is polynomial. IMO it’s good enough to simply refuse to duplicate _copy1 if _copy2 already exists amd show an error message to tell the user to first rename _copy2 or duplicate _copy2 instead.

@BasPH
Copy link
Contributor

BasPH commented Sep 11, 2021

Copies will be created after a user action so I can't imagine ever having to query for e.g. 10000 connections to find N. In my opinion it's not very user-friendly to refuse copy2.

To mitigate querying the DB for every copy we could implement e.g. a search_connection method which queries filter(Connection.conn_id.startswith(search_id)) and find N in code?

@potiuk
Copy link
Member

potiuk commented Sep 11, 2021

Pragmatic proposal (which I mentioned in the PR) is to have for loop with 1-10 and try them in sequence (and fail if we already have 10 copies). I can imagine someone making 2-3 copies, but really if someone tries to create 10th copy from the UI, it can fail. I am mostly thinking about a user trying to have several variants of the same connection one after another, which is rather likely (but unlikely to scale up)

@subkanthi
Copy link
Contributor

My opinion also sides with @BasPH , to query with a regex pattern to find all the connections which has _copy[0-9], but should we add a column to indicate that it was a duplicate, what are the chances that the user will also create a connection with the same syntax

@uranusjr
Copy link
Member

The problem with a partial match is that LIKE queries (which is what I imagine startswith() would produce) can easily end up producing a table scan, so I’d be extremely wary to the proposal unless someone carefully analyses the situations for all database backends we support. A practical 0-9 (or 1-10, don’t remember what scheme we’re using right now) would work though since we can simply flatten the query as a series of ORs instead. That’s guaranteed to be quite efficient, so I’d be fine with that. The logic would be like

  1. When duplicating a connection, generate up to 10 possible ids for the duplication and query the db to find an empty slot by querying the 10 ids all at once.
  2. Loop through the query result to find the lowerest-numbered empty slot to use.
  3. If the query returns 10 results (i.e. no empty slots available), emit an error asking the user to rename one of the existing connections first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
5 participants