-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix planning for multi-cluster dual stack record types #3747
Fix planning for multi-cluster dual stack record types #3747
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is the right approach. It is not AAAA that is special, it is CNAME that is special per RFC 1034 3.6.2.
This approach is likely to stop working once #3605 lands.
To address this issue, the planner will need to resolve conflicts between CNAME and other types.
@johngmyers thanks for the review and the additional RFC context. If I read between the lines in your comment, it sounds like you agree there is an issue with the current planner implementation, but my solution needs to be refactored around CNAME rather than a configurable set "dual-stack-record-types". |
That is correct. My apologies, I intended that to be the text, not the subtext. |
ea36062
to
572c341
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem correct. The conflict resolver doesn't appear to be examining the record types of the current and candidates. A CNAME should conflict with records of any type, but no two records of different types, neither type CNAME, should conflict with each other.
With one key exception: a CNAME that is an alias should not conflict with any records of different type.
plan/plan.go
Outdated
// dog.com | [->1.1.1.2] | | = delete (dog.com [-> 1.1.1.2]) | ||
// -------------------------------------------------------------- | ||
// cat.com | [->::1, ->1.1.1.3] | [->1.1.1.3] | = update old (cat.com [-> ::1, -> 1.1.1.3]) new (cat.com [-> 1.1.1.3]) | ||
// -------------------------------------------------------------- | ||
// big.com | [->1.1.1.4] | [->ing.elb.com] | = update old (big.com [-> 1.1.1.4]) new (big.com [-> ing.elb.com]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spacing is off.
plan/plan.go
Outdated
if len(row.current) > 0 && len(row.candidates) == 0 { | ||
for _, current := range row.current { | ||
changes.Delete = append(changes.Delete, current) | ||
} | ||
} | ||
|
||
// TODO: allows record type change, which might not be supported by all dns providers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this comment.
Perhaps CNAME conflicts should be resolved as a post-processing step? For each row, if the type is not CNAME and there is a corresponding CNAME row with >1 candidate, save the row keyed by domain and identifier. Then for each saved <domain, identifier> tuple, determine whether the CNAME wins or one of the other records wins. Depending on who wins, delete either the candidates for the CNAME or the candidates for all the other types. |
572c341
to
c5047c0
Compare
@johngmyers I took another cut at this. In order for the planner to correctly handle record type changes I had to introduce a owned record filter to the planner, sourced from the registry. This filter allows the planner to add create records only when the controller has a clear ownership claim on it. With this filter available the planning became much easier and the change sets more accurate. With respect to the conflicting record types, I wasn't sure how to best resolve conflicts when the sources produce conflicting desired record types. I ended up punting on this by skipping any updates and logging a warning. |
c5047c0
to
1e251b7
Compare
47b5fda
to
03379e4
Compare
Moving I suspect all the conflict resolution code should be pushed down into the
Conflicts between CNAME and other record types should be resolved in favor of what exists in |
@johngmyers I don't follow the request to only move Regarding the interface, I extracted
I think these are covered in the unit test cases I added (See
I will add a new |
The With the noop registry, the controller owns all records returned by the provider. We might need to modify the noop registry to add ownership labels to any returned records lacking them. Or we carve out a mechanism to tell the planner it shouldn't care about lack of ownership.
The missing test case is if there is an existing unowned SRV and a candidate A, then the planner should create the A. There is no conflict in that case. |
The filter interface pattern is getting refactored out, I get that now. But I'm still not clear exactly what you have in mind as a replacement. Since I'm not aware of all these refactorings or designs in the works, can you please be more specific in how you would like both the planner and the registries to deal with determining ownership? Should the planner be configured with the ownerID? If so, where should that come from? The controller doesn't have that information, it's currently only stored as a private field in the registry. If you layout the flow of information I can try to execute that rather than through trial and error.
external-dns/pkg/apis/externaldns/types.go Line 437 in bc61d4d
|
|
When AAAA multi-target / dual stack support was added via kubernetes-sigs#2461 it broke ownership of domains across different clusters with different ingress records types. For example if 2 clusters manage the same zone, 1 cluster uses A records and the other uses CNAME records, when each record type is treated as a separate planning record, it will cause ownership to bounce back and forth and records to be constantly created and deleted. This change updates the planner to keep track of multiple current records for a domain. This allows for A and AAAA records to exist for a domain while allowing record type changes. The planner will ignore desired records for a domain that represent conflicting record types allowed by RFC 1034 3.6.2. For example if the desired records for a domain contains a CNAME record plus any other record type no changes for that domain will be planned. The planner now contains an owned record filter provided by the registry. This allows the planner to accurately plan create updates when there are record type changes between the current and desired endpoints. Without this filter the planner could add create changes for domains not owned by the controller.
When AAAA multi-target / dual stack support was added via kubernetes-sigs#2461 it broke ownership of domains across different clusters with different ingress records types. For example if 2 clusters manage the same zone, 1 cluster uses A records and the other uses CNAME records, when each record type is treated as a separate planning record, it will cause ownership to bounce back and forth and records to be constantly created and deleted. This change updates the planner to keep track of multiple current records for a domain. This allows for A and AAAA records to exist for a domain while allowing record type changes. The planner will ignore desired records for a domain that represent conflicting record types allowed by RFC 1034 3.6.2. For example if the desired records for a domain contains a CNAME record plus any other record type no changes for that domain will be planned. The planner now contains an owned record filter provided by the registry. This allows the planner to accurately plan create updates when there are record type changes between the current and desired endpoints. Without this filter the planner could add create changes for domains not owned by the controller.
8a56a39
to
635da46
Compare
There's one special case where a CNAME record doesn't conflict with other record types: when there is a ProviderSpecificProperty |
I don't really understand this special case. Can I just ignore CNAME endpoints with "alias" specific property "true" when search for conflicts? |
@Raffo @szuecs @mloiseleur I'd like opinions as to how we should handle this problem: This PR is trying to update the planner to deal with the fact that several providers enforce the RFC 1034 3.6.2 restriction against there being both CNAME and other record types for the same domain. So it's resolving the conflict between CNAME and other record types for the same domain. (Prior to #2461 there couldn't be multiple record types for the same domain.) The problem is with AWS alias records. An AWS alias record of type A is represented to the planner as a record of type CNAME with an I see two ways of fixing this:
The problem with (2) is that existing alias records of type A will have new-format TXT registry ownership records registered with the type "cname". So we would also have to add special-case code to the TXT registry to do this mapping. I think it's worth doing (2) as that will eventually lead to simpler code, especially if we migrate off of the current "new format" TXT records. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All minor stuff; we're getting close.
plan/conflict.go
Outdated
|
||
// conflict was found, remove candiates of non-preferred record types | ||
if cname && other { | ||
log.Warnf("Domain %s contains conflicting record type candidates, keeping CNAME record", key.dnsName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warn is for deprecations; please use Infof
The "keeping CNAME record" part of the log message is no longer correct.
plan/conflict.go
Outdated
records := map[string]*domainEndpoints{} | ||
for recordType, recs := range row.records { | ||
if recordType != endpoint.RecordTypeCNAME { | ||
// policy is to prefer the A and AAAA record type when a conflict is found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// policy is to prefer the A and AAAA record type when a conflict is found | |
// policy is to prefer the non-CNAME record types when a conflict is found |
plan/conflict.go
Outdated
log.Warnf("Domain %s contains conflicting record type candidates, keeping CNAME record", key.dnsName) | ||
records := map[string]*domainEndpoints{} | ||
for recordType, recs := range row.records { | ||
if recordType != endpoint.RecordTypeCNAME { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be clearer to invert the condition of the if statement, putting the CNAME case first.
plan/plan.go
Outdated
type planTableRow struct { | ||
current *endpoint.Endpoint | ||
// current corresponds to the records currently occupying dns name on the dns provider. More than 1 record may |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// current corresponds to the records currently occupying dns name on the dns provider. More than 1 record may | |
// current corresponds to the records currently occupying dns name on the dns provider. More than one record may |
plan/plan.go
Outdated
type planTableRow struct { | ||
current *endpoint.Endpoint | ||
// current corresponds to the records currently occupying dns name on the dns provider. More than 1 record may | ||
// be represented here, for example A and AAAA. If current domain record is CNAME, no other record types are allowed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// be represented here, for example A and AAAA. If current domain record is CNAME, no other record types are allowed | |
// be represented here: for example A and AAAA. If the current domain record is a CNAME, no other record types are allowed |
UpdateNew: endpoint.FilterEndpointsByOwnerID(im.ownerID, changes.UpdateNew), | ||
UpdateOld: endpoint.FilterEndpointsByOwnerID(im.ownerID, changes.UpdateOld), | ||
Delete: endpoint.FilterEndpointsByOwnerID(im.ownerID, changes.Delete), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the filtering in the registry still needed now that the planner is filtering by OwnerID?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One might argue that the registry shouldn't assume the state of the arguments provided by the caller since technically plan.Changes
is mutable. But practically speaking the controller is the only caller (except for tests) so it's probably a safe assumption. Would suggest making this optimization in a separate PR since it will likely require a significant changeset to cleanup the registry tests.
9cc5da4
to
4c8b5dc
Compare
plan/conflict.go
Outdated
|
||
// conflict was found, remove candiates of non-preferred record types | ||
if cname && other { | ||
log.Infof("Domain %s contains conflicting record type candidates, discarding CNAME record", key.dnsName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log.Infof("Domain %s contains conflicting record type candidates, discarding CNAME record", key.dnsName) | |
log.Infof("Domain %s contains conflicting record type candidates; discarding CNAME record", key.dnsName) |
/lgtm |
endpoint.Targets.Less gives priority to A and AAAA over CNAME records so the plan record type conflict resolver should also prefer A/AAAA.
4c8b5dc
to
55ae13e
Compare
/lgtm |
/assign @Raffo |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: johngmyers The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
still seeing an error with chart version 6.28.0 and app version 0.13.6. Getting the error below: time="2023-11-01T13:29:42Z" level=fatal msg="googleapi: Error 400: The resource record set 'entity.change.additions[www.beta.domain.com.][A]' is invalid because the DNS name 'www.beta.domain.com.' has a resource record set of the type 'CNAME'. A DNS name may have either one CNAME resource record set or resource record sets of other types, but not both., cnameResourceRecordSetConflict" www. is CNAMED to a cloudfront distro outside of external-dns |
@leonardocaylent Took a quick look at where the error is occurring and the logs from your issue and it seems to me like an issue with aws provider not batching planned changes correctly. The log message provided seems to cut off the full "InvalidChangeBatch" reason and I'm not familiar with aws route 53, certainly not the details of change batch rules. But I can see from the logs that it appears to batch both hosted zone 1 and 2 changes together while before there was a "successful update" message between the desired change messages. v0.14.0 contained changes to the AWS provider as well via #3910, maybe it's related to that given the working log messages attempt to adjust "cname-testapplication.internal.sandbox.yourdomain.com"? |
@cronik I compiled and created Docker Images from master on commits 030342f and 65db0c7 and I confirmed the one that introduced the change is this PR #3747 . This is AWS Cloudtrail for ChangeResourceRecordSets:
I don't know if this use case could be added to the test suite (since the behavior needs to be tested with the AWS API and cannot be faked) because it's really common and we should get coverage for that (if possible) |
When AAAA multi-target / dual stack support was
added via #2461 it broke ownership of domains across different clusters with different ingress records types.
For example if 2 clusters manage the same zone,
1 cluster uses A records and the other uses CNAME
records, when each record type is treated as a separate planning record, it will cause ownership to bounce back and forth and records to be constantly created and deleted.
Description
This change updates the planner to keep track of multiple
current records for a domain. This allows for A and AAAA
records to exist for a domain while allowing record type
changes.
The planner will ignore desired records for a domain that
represent conflicting record types allowed by RFC 1034 3.6.2.
For example if the desired records for a domain contains
a CNAME record plus any other record type no changes for
that domain will be planned.
The planner now contains an owned record filter provided
by the registry. This allows the planner to accurately plan
create updates when there are record type changes between
the current and desired endpoints. Without this filter the
planner could add create changes for domains not owned
by the controller.
Fixes #2461
Checklist