Surface service-controller LB provisioning failures through status #245

Merged (1 commit) on Jun 10, 2019
10 changes: 3 additions & 7 deletions manifests/00-cluster-role.yaml
@@ -1,4 +1,6 @@
# Cluster role for the operator itself.
# TODO: A lot of these should be replaced by roles in the namespaces for which
# the operator actually needs access.
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
@@ -14,15 +16,9 @@ rules:
- services
- secrets
- pods
verbs:
- "*"

- apiGroups:
- ""
resources:
- events
verbs:
- create
- "*"

- apiGroups:
- apps
23 changes: 13 additions & 10 deletions pkg/operator/controller/controller.go
@@ -45,13 +45,12 @@ var log = logf.Logger.WithName("controller")
//
// The controller will be pre-configured to watch for IngressController resources
// in the manager namespace.
func New(mgr manager.Manager, statusCache *ingressStatusCache, config Config) (controller.Controller, error) {
func New(mgr manager.Manager, config Config) (controller.Controller, error) {
reconciler := &reconciler{
Config: config,
client: mgr.GetClient(),
cache: mgr.GetCache(),
recorder: mgr.GetEventRecorderFor("operator-controller"),
statusCache: statusCache,
Config: config,
client: mgr.GetClient(),
cache: mgr.GetCache(),
recorder: mgr.GetEventRecorderFor("operator-controller"),
}
c, err := controller.New("operator-controller", mgr, controller.Options{Reconciler: reconciler})
if err != nil {
@@ -106,8 +105,6 @@ type reconciler struct {
client client.Client
cache cache.Cache
recorder record.EventRecorder

statusCache *ingressStatusCache
}

// Reconcile expects request to refer to an ingresscontroller in the operator
@@ -401,7 +398,8 @@ func (r *reconciler) ensureIngressController(ci *operatorv1.IngressController, d
Controller: &trueVar,
}

if lbService, err := r.ensureLoadBalancerService(ci, deploymentRef, infraConfig); err != nil {
lbService, err := r.ensureLoadBalancerService(ci, deploymentRef, infraConfig)
if err != nil {
errs = append(errs, fmt.Errorf("failed to ensure load balancer service for %s: %v", ci.Name, err))
} else if lbService != nil {
if err := r.ensureDNS(ci, lbService, dnsConfig); err != nil {
@@ -415,7 +413,12 @@
errs = append(errs, fmt.Errorf("failed to integrate metrics with openshift-monitoring for ingresscontroller %s: %v", ci.Name, err))
}

if err := r.syncIngressControllerStatus(deployment, ci); err != nil {
operandEvents := &corev1.EventList{}
if err := r.cache.List(context.TODO(), operandEvents, client.InNamespace("openshift-ingress")); err != nil {
errs = append(errs, fmt.Errorf("failed to list events in namespace %q: %v", "openshift-ingress", err))
Contributor: "openshift-ingress" could be lbService.Namespace.

Contributor Author: Okay, so this is really subtle and dangerous, but lbService can be nil. Would love to separately take some action on our prior discussions about:

  1. result types vs. nil for dealing with things that are not found
  2. coming up with a strategy for dealing with references that doesn't involve hard-coded namespaces or inferring from random resources

Basically I wonder if there's any local fix right here (where's the canonical place to discover the operand namespace in this context?).

Contributor: Oh, snap, good point! Well, does it make sense to look for events if the service does not exist?

Contributor Author: The lookup isn't dependent on the service, and the lookup is from a cache, so we could choose to provide whatever context we can (consistent with the other possibly-nil inputs).

Contributor: If the service doesn't exist, then computeLoadBalancerStatus won't even look at the events, and logically, why would it? Do we anticipate that we might, with future changes, care about events that are not related to the service?

Contributor Author (@ironcladlou), Jun 7, 2019:

If the service doesn't exist, then computeLoadBalancerStatus won't even look at the events

computeLoadBalancerStatus won't, but this function doesn't know that. This function just knows computeLoadBalancerStatus wants events in a namespace. I propose this contract:

// syncIngressControllerStatus takes whatever you give it and updates status.
// Give it whatever you have and it will do its best.
func (r *reconciler) syncIngressControllerStatus(ic *operatorv1.IngressController, deployment *appsv1.Deployment, service *corev1.Service, operandEvents []corev1.Event) error {

😁

Contributor: That seems reasonable... or would it be simpler for syncIngressControllerStatus to look up events itself?

Contributor Author: Could perhaps ask the same question regarding all the other arguments... for now, unless there's a logic error, maybe we can continue the style discussion on subsequent PRs?

Contributor:

could perhaps ask the same question regarding all the other arguments...

No, syncIngressControllerStatus needs to know the ingress controller for which it is updating status, and ensureIngressController just created or got the deployment and service, which are sound reasons for ensureIngressController to pass those values to syncIngressControllerStatus, whereas listing events in ensureIngressController instead of in syncIngressControllerStatus gratuitously separates the logic of listing events from the logic that determines whether the events need to be listed.

That is not to say that your current approach is unacceptable; the above is only responding to the above-quoted assertion.

That said, if ensureIngressController does handle listing events but gets an error, how about passing a nil slice to syncIngressControllerStatus?

for now, unless there's a logic error, maybe we can continue the style discussion on subsequent PRs?

That's fine.
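To make the nil-slice suggestion concrete, the caller-side flow might look like this (sketch only; the merged code below appends the error and passes operandEvents.Items, which is empty after a failed List anyway, so the behavior is effectively the same):

    // Sketch of listing operand events and degrading gracefully on failure.
    // Assumes the reconciler fields from this PR; "openshift-ingress" is the
    // hard-coded operand namespace discussed above.
    operandEvents := &corev1.EventList{}
    if err := r.cache.List(context.TODO(), operandEvents, client.InNamespace("openshift-ingress")); err != nil {
        errs = append(errs, fmt.Errorf("failed to list events in namespace %q: %v", "openshift-ingress", err))
        // Pass nil events rather than failing the whole status sync; the
        // status computation treats missing events as "no extra context".
        operandEvents.Items = nil
    }
    if err := r.syncIngressControllerStatus(ci, deployment, lbService, operandEvents.Items); err != nil {
        errs = append(errs, fmt.Errorf("failed to sync ingresscontroller status: %v", err))
    }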

}

if err := r.syncIngressControllerStatus(ci, deployment, lbService, operandEvents.Items); err != nil {
errs = append(errs, fmt.Errorf("failed to sync ingresscontroller status: %v", err))
}
}
11 changes: 2 additions & 9 deletions pkg/operator/controller/controller_lb.go
@@ -14,7 +14,6 @@ import (

"k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/types"
utilerrors "k8s.io/apimachinery/pkg/util/errors"
)

@@ -51,12 +50,6 @@ func (r *reconciler) ensureLoadBalancerService(ci *operatorv1.IngressController,
return currentLBService, nil
}

// TODO: This should take operator config into account so that the operand
// namespace isn't hard-coded.
func loadBalancerServiceName(ci *operatorv1.IngressController) types.NamespacedName {
return types.NamespacedName{Namespace: "openshift-ingress", Name: "router-" + ci.Name}
}

// desiredLoadBalancerService returns the desired LB service for a
// ingresscontroller, or nil if an LB service isn't desired. An LB service is
// desired if the high availability type is Cloud. An LB service will declare an
@@ -67,7 +60,7 @@ func desiredLoadBalancerService(ci *operatorv1.IngressController, deploymentRef
}
service := manifests.LoadBalancerService()

name := loadBalancerServiceName(ci)
name := LoadBalancerServiceName(ci)

service.Namespace = name.Namespace
service.Name = name.Name
@@ -95,7 +88,7 @@
// ingresscontroller.
func (r *reconciler) currentLoadBalancerService(ci *operatorv1.IngressController) (*corev1.Service, error) {
service := &corev1.Service{}
if err := r.client.Get(context.TODO(), loadBalancerServiceName(ci), service); err != nil {
if err := r.client.Get(context.TODO(), LoadBalancerServiceName(ci), service); err != nil {
if errors.IsNotFound(err) {
return nil, nil
}
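The private loadBalancerServiceName helper is deleted here in favor of an exported LoadBalancerServiceName, whose definition isn't visible in this diff. Based on the deleted code, it presumably keeps the same body (a sketch; assumes k8s.io/apimachinery/pkg/types is imported wherever the helper now lives):

    // LoadBalancerServiceName returns the namespaced name of the LB service
    // owned by the given ingresscontroller. Reconstructed from the deleted
    // private helper; the operand namespace is still hard-coded.
    func LoadBalancerServiceName(ci *operatorv1.IngressController) types.NamespacedName {
        return types.NamespacedName{Namespace: "openshift-ingress", Name: "router-" + ci.Name}
    }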
174 changes: 50 additions & 124 deletions pkg/operator/controller/ingress_status.go
@@ -7,24 +7,17 @@ import (
"github.com/google/go-cmp/cmp"
"github.com/google/go-cmp/cmp/cmpopts"

"github.com/openshift/cluster-ingress-operator/pkg/manifests"

operatorv1 "github.com/openshift/api/operator/v1"

appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"

"k8s.io/apimachinery/pkg/api/meta"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"

"sigs.k8s.io/controller-runtime/pkg/cache"
"sigs.k8s.io/controller-runtime/pkg/client"
)

// syncIngressControllerStatus computes the current status of ic and
// updates status upon any changes since last sync.
func (r *reconciler) syncIngressControllerStatus(deployment *appsv1.Deployment, ic *operatorv1.IngressController) error {
func (r *reconciler) syncIngressControllerStatus(ic *operatorv1.IngressController, deployment *appsv1.Deployment, service *corev1.Service, operandEvents []corev1.Event) error {
selector, err := metav1.LabelSelectorAsSelector(deployment.Spec.Selector)
if err != nil {
return fmt.Errorf("deployment has invalid spec.selector: %v", err)
@@ -36,7 +29,7 @@ func (r *reconciler) syncIngressControllerStatus(deployment *appsv1.Deployment,

updated.Status.Conditions = []operatorv1.OperatorCondition{}
updated.Status.Conditions = append(updated.Status.Conditions, computeIngressStatusConditions(updated.Status.Conditions, deployment)...)
updated.Status.Conditions = append(updated.Status.Conditions, r.statusCache.computeLoadBalancerStatus(ic)...)
updated.Status.Conditions = append(updated.Status.Conditions, computeLoadBalancerStatus(ic, service, operandEvents)...)

for i := range updated.Status.Conditions {
newCondition := &updated.Status.Conditions[i]
@@ -127,119 +120,17 @@ func ingressStatusesEqual(a, b operatorv1.IngressControllerStatus) bool {
return true
}

func isProvisioned(service *corev1.Service) bool {
ingresses := service.Status.LoadBalancer.Ingress
return len(ingresses) > 0 && (len(ingresses[0].Hostname) > 0 || len(ingresses[0].IP) > 0)
}

func isPending(service *corev1.Service) bool {
return !isProvisioned(service)
}

func getServiceOwnerIfMatches(service *corev1.Service, matches func(*corev1.Service) bool) []string {
if !matches(service) {
return []string{}
}
controller, ok := service.Labels[manifests.OwningIngressControllerLabel]
if !ok {
return []string{}
}
return []string{controller}
}

func indexLoadBalancerControllerByName(obj runtime.Object) []string {
c, ok := obj.(*operatorv1.IngressController)
if !ok {
return []string{}
}
if c.Status.EndpointPublishingStrategy != nil &&
c.Status.EndpointPublishingStrategy.Type == operatorv1.LoadBalancerServiceStrategyType {
return []string{c.Name}
}
return []string{}
}

func indexProvisionedLoadBalancerServiceByOwner(obj runtime.Object) []string {
service, ok := obj.(*corev1.Service)
if !ok {
return []string{}
}
if service.Spec.Type != corev1.ServiceTypeLoadBalancer {
return []string{}
}
return getServiceOwnerIfMatches(service, isProvisioned)
}

func indexPendingLoadBalancerServiceByOwner(obj runtime.Object) []string {
service, ok := obj.(*corev1.Service)
if !ok {
return []string{}
}
if service.Spec.Type != corev1.ServiceTypeLoadBalancer {
return []string{}
}
return getServiceOwnerIfMatches(service, isPending)
}

// ingressStatusCache knows how to compute status for ingress controllers by
// querying indexes caches.
type ingressStatusCache struct {
// contains returns true if there are >0 matches on value for the named index
// for the given kind.
//
// listKind must be the List type for the object being indexed.
contains func(listKind runtime.Object, name, value string) bool
}

// serviceIndexers are all the Service indexes supported by the cache.
var serviceIndexers = map[string]client.IndexerFunc{
"pending-for": indexPendingLoadBalancerServiceByOwner,
"provisioned-for": indexProvisionedLoadBalancerServiceByOwner,
}

// controllerIndexers are all the IngressController indexes supported by the
// cache.
var controllerIndexers = map[string]client.IndexerFunc{
"wants-load-balancer": indexLoadBalancerControllerByName,
}

// cacheContains is a contains fuction which knows how to query a cache.Cache.
func cacheContains(cache cache.Cache, list runtime.Object, name, value string) bool {
err := cache.List(context.TODO(), list, client.MatchingField(name, value))
if err != nil {
return false
}
// TODO: after rebase, replace with:
// meta.LenList(list) > 0
items, _ := meta.ExtractList(list)
return len(items) > 0
}

func NewIngressStatusCache(c cache.Cache) *ingressStatusCache {
add := func(cache cache.Cache, kind runtime.Object, indexers map[string]client.IndexerFunc) {
for name := range indexers {
cache.IndexField(kind, name, indexers[name])
}
}
add(c, &operatorv1.IngressController{}, controllerIndexers)
add(c, &corev1.Service{}, serviceIndexers)
return &ingressStatusCache{
contains: func(kind runtime.Object, name, value string) bool {
return cacheContains(c, kind, name, value)
},
}
}

// computeLoadBalancerStatus returns the complete set of current
// LoadBalancer-prefixed conditions for the given ingress controller.
func (c *ingressStatusCache) computeLoadBalancerStatus(ic *operatorv1.IngressController) []operatorv1.OperatorCondition {
if !c.contains(&operatorv1.IngressControllerList{}, "wants-load-balancer", ic.Name) {
func computeLoadBalancerStatus(ic *operatorv1.IngressController, service *corev1.Service, operandEvents []corev1.Event) []operatorv1.OperatorCondition {
if ic.Status.EndpointPublishingStrategy == nil ||
ic.Status.EndpointPublishingStrategy.Type != operatorv1.LoadBalancerServiceStrategyType {
return []operatorv1.OperatorCondition{
{
Type: operatorv1.LoadBalancerManagedIngressConditionType,
Status: operatorv1.ConditionFalse,
Reason: "UnsupportedEndpointPublishingStrategy",
Message: fmt.Sprintf("The %s endpoint publishing strategy does not support a load balancer", ic.Status.EndpointPublishingStrategy.Type),
Message: fmt.Sprintf("The endpoint publishing strategy does not support a load balancer"),
},
}
}
@@ -249,33 +140,68 @@ func (c *ingressStatusCache) computeLoadBalancerStatus(ic *operatorv1.IngressCon
conditions = append(conditions, operatorv1.OperatorCondition{
Type: operatorv1.LoadBalancerManagedIngressConditionType,
Status: operatorv1.ConditionTrue,
Reason: "HasLoadBalancerEndpointPublishingStrategy",
Message: "IngressController has LoadBalancer endpoint publishing strategy",
Reason: "WantedByEndpointPublishingStrategy",
Contributor Author: I threw this in for discussion. Reflecting on the k8s API guidelines, I wonder if this condition type should have been "LoadBalancerUnmanaged=True" — too late!

Contributor Author (quoting the Kubernetes API conventions):

Condition types should indicate state in the "abnormal-true" polarity. For example, if the condition indicates when a policy is invalid, the "is valid" case is probably the norm, so the condition should be called "Invalid".

Contributor:

Condition types should indicate state in the "abnormal-true" polarity.

We already violate that guideline with Available.

Contributor Author: Any thoughts on the Reason I'm proposing here? I can revert if the old one is better (and am open to new suggestions).

Contributor: No preference. I'm not sure a reason is strictly necessary for the "normal" state, but what you have is fine.

Message: "The endpoint publishing strategy supports a managed load balancer",
})

switch {
case c.contains(&corev1.ServiceList{}, "pending-for", ic.Name):
case service == nil:
conditions = append(conditions, operatorv1.OperatorCondition{
Type: operatorv1.LoadBalancerReadyIngressConditionType,
Status: operatorv1.ConditionFalse,
Reason: "LoadBalancerPending",
Message: "The LoadBalancer service is pending",
Reason: "ServiceNotFound",
Message: "The LoadBalancer service resource is missing",
})
case c.contains(&corev1.ServiceList{}, "provisioned-for", ic.Name):
case isProvisioned(service):
conditions = append(conditions, operatorv1.OperatorCondition{
Type: operatorv1.LoadBalancerReadyIngressConditionType,
Status: operatorv1.ConditionTrue,
Reason: "LoadBalancerProvisioned",
Message: "The LoadBalancer service is provisioned",
})
default:
case isPending(service):
Contributor: Why not make this the default case?

Contributor: @Miciah if "Pending" is the default case, does the current default ConditionUnknown get removed?

Contributor: Yes. It is dead code anyway.

Contributor Author: I went ahead and removed the default case as it's unreachable, but I left isProvisioned and isPending. I believe isProvisioned and default (replacing isPending) would be a functional alternative, but even so I guessed having explicitly named cases would aid readability. What do you think?

Contributor: It's fine. I wonder whether the compiler is smart enough (and the language semantics are flexible enough) to optimize the isPending call out.

reason := "LoadBalancerPending"
message := "The LoadBalancer service is pending"

// Try and find a more specific reason for the pending status.
createFailedReason := "CreatingLoadBalancerFailed"
failedLoadBalancerEvents := getEventsByReason(operandEvents, "service-controller", createFailedReason)
Contributor Author: Perhaps these should be sorted by time, descending?

Contributor: This would be useful once we check for more than 1 event type, but otherwise it is not strictly necessary since we ignore events once the LB is provisioned.

for _, event := range failedLoadBalancerEvents {
involved := event.InvolvedObject
if involved.Kind == "Service" && involved.Namespace == service.Namespace && involved.Name == service.Name {
reason = "CreatingLoadBalancerFailed"
message = fmt.Sprintf("The %s component is reporting CreatingLoadBalancerFailed events like: %s\n%s",
event.Source.Component, event.Message, "The kube-controller-manager logs may contain more details.")
break
}
}
conditions = append(conditions, operatorv1.OperatorCondition{
Type: operatorv1.LoadBalancerReadyIngressConditionType,
Status: operatorv1.ConditionFalse,
Reason: "LoadBalancerNotFound",
Message: "The LoadBalancer service resource is missing",
Reason: reason,
Message: message,
})
}

return conditions
}

func isProvisioned(service *corev1.Service) bool {
ingresses := service.Status.LoadBalancer.Ingress
return len(ingresses) > 0 && (len(ingresses[0].Hostname) > 0 || len(ingresses[0].IP) > 0)
}

func isPending(service *corev1.Service) bool {
return !isProvisioned(service)
}

func getEventsByReason(events []corev1.Event, component, reason string) []corev1.Event {
filtered := []corev1.Event{}
for i := range events {
event := events[i]
if event.Source.Component == component && event.Reason == reason {
filtered = append(filtered, event)
}
}
return filtered
}
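Since computeLoadBalancerStatus is now a pure function of its inputs, it can be unit tested without the cache and indexer machinery this PR removes. A minimal table-driven sketch (hypothetical test, not part of this PR; uses the testing package plus the corev1/operatorv1 aliases from the file above) covering the three service states:

    func TestComputeLoadBalancerStatus(t *testing.T) {
        ic := &operatorv1.IngressController{}
        ic.Name = "default"
        ic.Status.EndpointPublishingStrategy = &operatorv1.EndpointPublishingStrategy{
            Type: operatorv1.LoadBalancerServiceStrategyType,
        }

        provisioned := &corev1.Service{}
        provisioned.Status.LoadBalancer.Ingress = []corev1.LoadBalancerIngress{{IP: "192.0.2.1"}}

        pending := &corev1.Service{}
        pending.Namespace, pending.Name = "openshift-ingress", "router-default"

        failedEvent := corev1.Event{
            Reason:  "CreatingLoadBalancerFailed",
            Source:  corev1.EventSource{Component: "service-controller"},
            Message: "failed to ensure load balancer",
            InvolvedObject: corev1.ObjectReference{
                Kind: "Service", Namespace: pending.Namespace, Name: pending.Name,
            },
        }

        tests := []struct {
            name       string
            service    *corev1.Service
            events     []corev1.Event
            wantReason string
        }{
            {"missing service", nil, nil, "ServiceNotFound"},
            {"provisioned", provisioned, nil, "LoadBalancerProvisioned"},
            {"pending with failure event", pending, []corev1.Event{failedEvent}, "CreatingLoadBalancerFailed"},
        }
        for _, test := range tests {
            conditions := computeLoadBalancerStatus(ic, test.service, test.events)
            // conditions[0] is LoadBalancerManaged; conditions[1] is LoadBalancerReady.
            if got := conditions[1].Reason; got != test.wantReason {
                t.Errorf("%s: expected reason %q, got %q", test.name, test.wantReason, got)
            }
        }
    }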