v.0.14.0 [AWS] External-DNS cannot remove records from 2 Route 53 hosted zones (InvalidChangeBatch: [The request contains an invalid set of changes]) #4241
Comments
I can confirm this was introduced with #3747 |
Can confirm, reverting to 0.13.6 addresses this issue, and |
@cilindrox Thank you for sharing that here. Can you confirm that your use case is the same: 2 hosted zones with overlapping names (such as internal.dev.yourdomain.com & dev.yourdomain.com)? |
correct, several instances of the above ^ We deploy this with another |
I can confirm that the issue is also present for us since updating to the latest build. Reverting to a pre-0.14.0 version fixed the issue. |
Thank you for reporting this |
@leonardocaylent I can try to add a test case; I just need to know which records are in play, and possibly more logs, to understand what's happening.
(if you are generating custom builds for testing, you could log external-dns/controller/controller.go Line 248 in 52460ba
Do you see a log message like:
|
Hi @cronik, here are the debugging logs for the 2 versions 0.13.6 and 0.14.0: At creation (0.13.6)(Success):
At removal (0.13.6)(Success):
At creation (0.14.0)(Success):
At removal (0.14.0)(Failure):
|
More information about the Delete Requests:
Failure on DELETE RECORDS (0.14.0) 1 ChangeResourceRecordSet call to AWS:
Seems like version 0.14.0 is grouping the 6 DELETEs across the 2 batches, when there should be 3 DELETEs per batch (3 records per hosted zone) |
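To make the expected grouping concrete, here is a minimal sketch in Python (not external-dns code, which is written in Go). It assumes each change is a plain dict carrying a `zone_id`; the zone IDs, the record names, and the `batch_changes_per_zone` helper are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def batch_changes_per_zone(changes):
    """Group record changes by hosted zone, skipping exact duplicates."""
    batches = defaultdict(list)
    seen = set()
    for change in changes:
        key = (change["zone_id"], change["action"], change["name"], change["type"])
        if key in seen:
            continue  # the same change must not appear twice in one batch
        seen.add(key)
        batches[change["zone_id"]].append(change)
    return dict(batches)


# Hypothetical input: 3 records in each of 2 overlapping zones.
RECORDS = [
    ("A", "testapplication.internal.dev.yourdomain.com"),
    ("TXT", "testapplication.internal.dev.yourdomain.com"),
    ("TXT", "cname-testapplication.internal.dev.yourdomain.com"),
]
deletes = [
    {"zone_id": zone_id, "action": "DELETE", "type": rtype, "name": name}
    for zone_id in ("ZONE1", "ZONE2")  # hypothetical zone IDs
    for rtype, name in RECORDS
]

# Feeding every delete twice simulates duplicated input; duplicates are dropped.
batches = batch_changes_per_zone(deletes + deletes)
for zone_id, batch in batches.items():
    print(zone_id, len(batch))
```

With this grouping each hosted zone ends up with its own batch of 3 DELETEs, which is the shape described above for a successful removal.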
Ack, I've seen this and will try to reproduce it and see if we can ship a fix. I was planning a release of the next version, I will consider this a showstopper if I manage to reproduce it. Will keep you posted, probably next week. |
Thank you!
|
@leonardocaylent I am not sure I understand what the desired behavior should be. I haven't worked with overlapping zones, so I may be confused about what you actually want to happen. Can you give an example of what the behavior with overlapping zones was, what it is today, and what you expect it to be? I would personally assume that we don't double-write records to zones that overlap. |
@Raffo this is the behavior of overlapping zones on all versions of external-dns: The hostname for the ingress is: The Route 53 Hosted Zones:
The 3 Records on Hosted Zone internal.sandbox.yourdomain.com:
The 3 Records on Hosted Zone sandbox.yourdomain.com:
The records are identical in the 2 hosted zones. Creating the Route53 records in every hosted zone whose name ends in sandbox.yourdomain.com is expected behavior; the issue that started in 0.14.0 is that external-dns is not able to delete the records. If we had another private or public zone called yourdomain.com, we would probably have another 3 records in that hosted zone as well. It would be great if external-dns could know that we only want to create the records in the internal.sandbox.yourdomain.com private hosted zone, but I believe that, for backward compatibility and other users' use cases, the behavior may need to stay as it is right now, fixing only the grouping of the DELETEs in the ChangeResourceRecordSet API call |
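The overlapping-zone behavior described above can be sketched as a suffix match on DNS label boundaries. This is a simplified Python illustration, not the external-dns implementation; the `matching_zones` helper and the hostname `app.internal.sandbox.yourdomain.com` are assumptions made for the example.

```python
def matching_zones(hostname, zone_names):
    """Return every zone whose name equals the hostname or is a parent of it."""
    host = hostname.rstrip(".").lower()
    matches = []
    for zone in zone_names:
        name = zone.rstrip(".").lower()
        # Match on a label boundary so "notsandbox.x.com" does not match "sandbox.x.com".
        if host == name or host.endswith("." + name):
            matches.append(zone)
    return matches


zones = [
    "internal.sandbox.yourdomain.com",
    "sandbox.yourdomain.com",
    "yourdomain.com",
]

# Hypothetical ingress hostname: it falls under all three zones.
print(matching_zones("app.internal.sandbox.yourdomain.com", zones))
```

Under this matching rule a hostname under internal.sandbox.yourdomain.com also falls under sandbox.yourdomain.com (and yourdomain.com, if such a zone exists), which is why the same records appear in each of those zones.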
@Raffo @cronik 1) Found the culprit of this issue: We could fix that by doing something like this:
We will also add more granular debug logs, as they were very useful in fixing this issue.
Waiting for thoughts/comments |
Behavior with the fix:
|
Adding more details on each file call: On Create at version 0.14.0 with the fix:
On Delete at version 0.14.0 with the fix:
On Create at version 0.14.0 without the fix: Same behavior (no changes) On Delete at version 0.14.0 without the fix:
It's creating the Route53 record twice, so that should maybe also be grouped, or fixed in another ticket |
@leonardocaylent please open a PR with the proposed fix. I would love to understand what is the impact of this change and it's hard to reason about it without a proposed code change. |
I'm not sure I agree with that 🤔. => IMHO, the expected behavior should be to create and delete records only in |
@leonardocaylent Am I wrong to think there is an easy workaround? I mean: if you run two different instances of external-dns, one per overlapping zone, then it may behave as (you) expect. |
@mloiseleur Maybe there is some confusion about what is "expected" and how external-dns behaved in all previous versions. The bug was reported because external-dns lost the ability to delete Route53 records in multiple hosted zones with overlapping names, which wouldn't be needed if records were only created in the correct/best-matching hosted zone (a feature that I guess external-dns does not have yet). |
@mloiseleur I considered doing something like that, but it would have a huge impact on people who have more than 1 overlapping hosted zone, or more than 5 EKS clusters. It would dramatically increase the number of pods or the amount of IaC code to maintain, and they would need different filters on each deployment. A possible solution is to "feature flag" the FilterEndpointsByOwnerId function and make it optional, so users can choose between the previous behavior and the new one. What do you think about that? |
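For illustration only, a "best matching" zone could be chosen as the longest zone name that is a suffix of the hostname. The Python sketch below is an assumption about how such a feature might work, not existing external-dns behavior; `best_matching_zone` is a hypothetical helper.

```python
def best_matching_zone(hostname, zone_names):
    """Return the most specific (longest) zone containing the hostname, or None."""
    host = hostname.rstrip(".").lower()
    best = None
    best_len = -1
    for zone in zone_names:
        name = zone.rstrip(".").lower()
        if host == name or host.endswith("." + name):
            if len(name) > best_len:
                best, best_len = zone, len(name)
    return best


zones = ["dev.yourdomain.com", "internal.dev.yourdomain.com"]

print(best_matching_zone("testapplication.internal.dev.yourdomain.com", zones))
print(best_matching_zone("testapplication.dev.yourdomain.com", zones))
```

A rule like this does not settle ties, for example a private and a public zone that share exactly the same name, so a real implementation would need an additional criterion for that case.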
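A rough Python sketch of the feature-flag idea follows. It assumes the owner filter keeps only endpoints whose owner label equals the configured owner ID; the `filter_by_owner` flag, the `owner` label key, and the dict-based endpoints are hypothetical and do not correspond to an actual external-dns option.

```python
def filter_endpoints_by_owner_id(owner_id, endpoints):
    """Keep only endpoints whose (hypothetical) owner label matches owner_id."""
    return [ep for ep in endpoints if ep.get("labels", {}).get("owner") == owner_id]


def select_endpoints(owner_id, endpoints, filter_by_owner=True):
    """Apply the owner filter only when the hypothetical flag is enabled."""
    if filter_by_owner:
        return filter_endpoints_by_owner_id(owner_id, endpoints)
    return list(endpoints)  # flag disabled: pass every endpoint through unfiltered


endpoints = [
    {"name": "a.dev.yourdomain.com", "labels": {"owner": "cluster-1"}},
    {"name": "b.dev.yourdomain.com", "labels": {"owner": "cluster-2"}},
    {"name": "c.dev.yourdomain.com", "labels": {}},
]

print(len(select_endpoints("cluster-1", endpoints)))
print(len(select_endpoints("cluster-1", endpoints, filter_by_owner=False)))
```

The trade-off of a flag like this is an extra configuration path to test and document, which is one reason a fix that needs no flag may be preferable.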
@mloiseleur Small update: there is a new commit on #4296 that is a good candidate to solve the issue without using feature flags |
What happened:
The External-DNS pod can create records but, since 0.14.0, cannot delete them from 2 different hosted zones. This doesn't happen on 0.13.6.
What you expected to happen:
External-DNS detects A & TXT records on 2 Hosted zones and can remove them without making the pod crash
On version 0.14.0:
level=error msg="Failure in zone internal.dev.mydomain.com. [Id: /hostedzone/<HOSTEDZONE1>] when submitting change batch: InvalidChangeBatch: [The request contains an invalid set of changes for a resource record set 'A ....
On version 0.13.6 and earlier:
How to reproduce it (as minimally and precisely as possible):
Create 2 Hosted Zones with overlapping names (internal.dev.yourdomain.com & dev.yourdomain.com)
Install External-DNS 0.14.0 on EKS
Create an ingress whose host is testapplication.internal.dev.yourdomain.com
Wait for external-dns to detect the changes
External-DNS will create the records correctly in the 2 hosted zones
Remove the ingress created
Wait for external-dns to detect the changes
Error will show up in the external-dns pod logs:
How to reproduce the expected/previous behaviour?:
Create 2 Hosted Zones with overlapping names (internal.dev.yourdomain.com & dev.yourdomain.com)
Install External-DNS 0.13.6 on EKS
Create an ingress whose host is testapplication.internal.dev.yourdomain.com
Wait for external-dns to detect the changes
External-DNS will create the records correctly in the 2 hosted zones
Remove the ingress created
Wait for external-dns to detect the changes
Success will show up in the external-dns pod logs:
Anything else we need to know?:
This works fine for ingresses that use only 1 hosted zone (it can easily be tested with the same ingress example using the host testapplication.dev.yourdomain.com)
Environment:
External-DNS version (use external-dns --version): 0.14.0