Feat: [WIP] Add Graphite metrics provider to address issue #1403 #1404

Closed · wants to merge 20 commits

Commits (all changes shown):
9ea97c6 feat: add support for Graphite metrics provider (mdb, Jul 24, 2021)
d4b4ba6 improve testability via API interface and mock (mdb, Jul 26, 2021)
1a0124b add test for AnalysisPhaseFailed scenario (mdb, Aug 1, 2021)
32e96f9 add graphite.Provider#Type test (mdb, Aug 1, 2021)
f2d75b0 add graphite.APIClient#Query test (mdb, Aug 2, 2021)
a6c2261 chore: github release action was using incorect docker cache (#1387) (jessesuen, Aug 2, 2021)
7486254 chore: release workflow docker build context should use local path an… (jessesuen, Aug 3, 2021)
6980947 chore: Raname variables, import pkg for clarification (#1313) (huikang, Aug 3, 2021)
6b66e9d feat: configurable and more aggressive cleanup of old AnalysisRuns an… (perenesenko, Aug 5, 2021)
91be160 fix: retarget blue-green previewService before scaling up preview Rep… (jessesuen, Aug 6, 2021)
d07c8a8 docs: add custom namespace name tips (#1354) (dabaooline, Aug 6, 2021)
e33bd17 test invalid Graphite JSON response scenario (mdb, Aug 8, 2021)
c4aa543 test Graphite error response code scenario (mdb, Aug 8, 2021)
3820303 Graphite provider test exercising evaluation error (mdb, Aug 9, 2021)
1bc173d add provider test exercising Graphite query error (mdb, Aug 9, 2021)
5efba24 add Graphite provider #Resume and #Terminate tests (mdb, Aug 9, 2021)
e5ff86b test Graphite Provider #Run measurement error (mdb, Aug 9, 2021)
971d12b add test exercising invalid query string (mdb, Aug 10, 2021)
8513292 test if Graphite returns a data point with 1 item (mdb, Aug 10, 2021)
f107673 test various Graphite JSON unmarshaling scenarios (mdb, Aug 10, 2021)
14 changes: 11 additions & 3 deletions .github/workflows/release.yaml
@@ -17,6 +17,10 @@ jobs:
with:
ref: ${{ github.event.inputs.tag }}

- name: Get SHA
id: get-sha
run: echo "::set-output name=sha::$(git log -1 --format='%H')"

- name: Set up QEMU
uses: docker/setup-qemu-action@v1

@@ -27,9 +31,7 @@ jobs:
uses: actions/cache@v2
with:
path: /tmp/.buildx-cache
key: ${{ runner.os }}-buildx-${{ github.sha }}
restore-keys: |
${{ runner.os }}-buildx-
key: ${{ runner.os }}-buildx-${{ steps.get-sha.outputs.sha }}

- name: Print Disk Usage
run: |
@@ -45,6 +47,8 @@ jobs:
ghcr.io/argoproj/argo-rollouts
tags: |
type=semver,pattern={{version}},prefix=v,value=${{ github.event.inputs.tag }}
flavor: |
latest=false

- name: Docker meta (plugin)
id: plugin-meta
@@ -55,6 +59,8 @@ jobs:
ghcr.io/argoproj/kubectl-argo-rollouts
tags: |
type=semver,pattern={{version}},prefix=v,value=${{ github.event.inputs.tag }}
flavor: |
latest=false

- name: Login to GitHub Container Registry
if: github.event_name != 'pull_request'
@@ -75,6 +81,7 @@ jobs:
- name: Build and push (controller-image)
uses: docker/build-push-action@v2
with:
context: .
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.controller-meta.outputs.tags }}
@@ -84,6 +91,7 @@ jobs:
- name: Build and push (plugin-image)
uses: docker/build-push-action@v2
with:
context: .
target: kubectl-argo-rollouts
platforms: linux/amd64,linux/arm64
push: true
4 changes: 2 additions & 2 deletions cmd/rollouts-controller/main.go
@@ -99,7 +99,7 @@ func newCommand() *cobra.Command {

kubeClient, err := kubernetes.NewForConfig(config)
checkError(err)
rolloutClient, err := clientset.NewForConfig(config)
argoprojClient, err := clientset.NewForConfig(config)
checkError(err)
dynamicClient, err := dynamic.NewForConfig(config)
checkError(err)
@@ -150,7 +150,7 @@ func newCommand() *cobra.Command {
cm := controller.NewManager(
namespace,
kubeClient,
rolloutClient,
argoprojClient,
dynamicClient,
smiClient,
discoveryClient,
12 changes: 6 additions & 6 deletions controller/controller.go
@@ -5,8 +5,8 @@ import (
"fmt"
"time"

"github.com/argoproj/notifications-engine/pkg/api"
"github.com/argoproj/notifications-engine/pkg/controller"
notificationapi "github.com/argoproj/notifications-engine/pkg/api"
notificationcontroller "github.com/argoproj/notifications-engine/pkg/controller"
"github.com/pkg/errors"
smiclientset "github.com/servicemeshinterface/smi-sdk-go/pkg/gen/client/split/clientset/versioned"
log "github.com/sirupsen/logrus"
@@ -72,7 +72,7 @@ type Manager struct {
analysisController *analysis.Controller
serviceController *service.Controller
ingressController *ingress.Controller
notificationsController controller.NotificationController
notificationsController notificationcontroller.NotificationController

rolloutSynced cache.InformerSynced
experimentSynced cache.InformerSynced
@@ -148,10 +148,10 @@ func NewManager(
ingressWorkqueue := workqueue.NewNamedRateLimitingQueue(queue.DefaultArgoRolloutsRateLimiter(), "Ingresses")

refResolver := rollout.NewInformerBasedWorkloadRefResolver(namespace, dynamicclientset, discoveryClient, argoprojclientset, rolloutsInformer.Informer())
apiFactory := api.NewFactory(record.NewAPIFactorySettings(), defaults.Namespace(), secretInformer.Informer(), configMapInformer.Informer())
apiFactory := notificationapi.NewFactory(record.NewAPIFactorySettings(), defaults.Namespace(), secretInformer.Informer(), configMapInformer.Informer())
recorder := record.NewEventRecorder(kubeclientset, metrics.MetricRolloutEventsTotal, apiFactory)
notificationsController := controller.NewController(dynamicclientset.Resource(v1alpha1.RolloutGVR), rolloutsInformer.Informer(), apiFactory,
controller.WithToUnstructured(func(obj metav1.Object) (*unstructured.Unstructured, error) {
notificationsController := notificationcontroller.NewController(dynamicclientset.Resource(v1alpha1.RolloutGVR), rolloutsInformer.Informer(), apiFactory,
notificationcontroller.WithToUnstructured(func(obj metav1.Object) (*unstructured.Unstructured, error) {
data, err := json.Marshal(obj)
if err != nil {
return nil, err
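The `WithToUnstructured` hook in the hunk above converts a typed object into generic form via a JSON round trip. A minimal self-contained sketch of that round trip, using a plain `map[string]interface{}` in place of `unstructured.Unstructured` (which would pull in Kubernetes dependencies):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toUnstructured sketches the WithToUnstructured callback: marshal the
// typed object to JSON, then unmarshal into the generic map layout that
// backs unstructured.Unstructured. (Simplified: the real code wraps the
// resulting map in an unstructured.Unstructured value.)
func toUnstructured(obj interface{}) (map[string]interface{}, error) {
	data, err := json.Marshal(obj)
	if err != nil {
		return nil, err
	}
	var out map[string]interface{}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	rollout := map[string]string{"kind": "Rollout", "name": "demo"}
	u, err := toUnstructured(rollout)
	fmt.Println(u["kind"], err) // Rollout <nil>
}
```

The JSON round trip is the conventional way to bridge typed and unstructured Kubernetes objects when a scheme-based converter is unavailable.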
29 changes: 20 additions & 9 deletions docs/features/bluegreen.md
@@ -68,6 +68,25 @@ spec:
scaleDownDelayRevisionLimit: *int32
```

## Sequence of Events

The following describes the sequence of events that happen during a blue-green update.

1. Beginning from a fully promoted, steady state, the revision 1 ReplicaSet is pointed to by both the `activeService` and `previewService`.
1. A user initiates an update by modifying the pod template (`spec.template.spec`).
1. The revision 2 ReplicaSet is created with size 0.
1. The `previewService` is modified to point to the revision 2 ReplicaSet. The `activeService` remains pointing to revision 1.
1. The revision 2 ReplicaSet is scaled to either `spec.replicas` or `previewReplicaCount` if set.
1. Once the revision 2 ReplicaSet Pods are fully available, `prePromotionAnalysis` begins.
1. Upon success of `prePromotionAnalysis`, the rollout pauses if `autoPromotionEnabled` is false or `autoPromotionSeconds` is non-zero.
1. The rollout is resumed either manually by a user, or automatically once `autoPromotionSeconds` has elapsed.
1. The revision 2 ReplicaSet is scaled to `spec.replicas`, if the `previewReplicaCount` feature was used.
1. The rollout "promotes" the revision 2 ReplicaSet by updating the `activeService` to point to it. At this point, no services point to revision 1.
1. `postPromotionAnalysis` begins.
1. Once `postPromotionAnalysis` completes successfully, the update is successful and the revision 2 ReplicaSet is marked as stable. The rollout is considered fully promoted.
1. After waiting `scaleDownDelaySeconds` (default 30 seconds), the revision 1 ReplicaSet is scaled down.
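The sequence above maps onto a Rollout spec along these lines (an illustrative sketch; the metadata and service names are placeholders, not from this PR):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: example-bluegreen        # placeholder name
spec:
  replicas: 4
  strategy:
    blueGreen:
      activeService: example-active    # receives traffic after promotion
      previewService: example-preview  # points at revision 2 before promotion
      previewReplicaCount: 1           # optional: smaller preview footprint
      autoPromotionEnabled: false      # pause after prePromotionAnalysis
      scaleDownDelaySeconds: 30        # delay before revision 1 scales down
```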


### autoPromotionEnabled
The AutoPromotionEnabled will make the rollout automatically promote the new ReplicaSet to the active service once the new ReplicaSet is healthy. This field is defaulted to true if it is not specified.

@@ -111,15 +130,6 @@ This feature is used to provide an endpoint that can be used to test a new versi

Defaults to an empty string

Here is a timeline of how the active and preview services work (if you use a preview service):

1. During the Initial deployment there is only one ReplicaSet. Both active and preview services point to it. This is the **old** version of the application.
1. A change happens in the Rollout resource. A new ReplicaSet is created. This is the **new** version of the application. The preview service is modified to point to the new ReplicaSet. The active service still points to the old version.
1. The blue/green deployment is "promoted". Both active and preview services are pointing to the new version. The old version is still there but no service is pointing at it.
1. Once the blue/green deployment is scaled down (see the `scaleDownDelaySeconds` field) the old ReplicaSet has 0 replicas and we are back to the initial state. Both active and preview services point to the new version (which is the only one present anyway)



### previewReplicaCount
The PreviewReplicaCount field will indicate the number of replicas that the new version of an application should run. Once the application is ready to promote to the active service, the controller will scale the new ReplicaSet to the value of the `spec.replicas`. The rollout will not switch over the active service to the new ReplicaSet until it matches the `spec.replicas` count.

@@ -136,3 +146,4 @@ Defaults to 30
The ScaleDownDelayRevisionLimit limits the number of old active ReplicaSets to keep scaled up while they wait for the scaleDownDelay to pass after being removed from the active service.

If omitted, all ReplicaSets will be retained for the specified scaleDownDelay

8 changes: 8 additions & 0 deletions docs/features/specification.md
@@ -11,6 +11,14 @@ spec:
# Number of desired pods.
# Defaults to 1.
replicas: 5
analysis:
# limits the number of successful analysis runs and experiments to be stored in a history
# Defaults to 5.
successfulRunHistoryLimit: 10
# limits the number of unsuccessful analysis runs and experiments to be stored in a history.
# Stages for unsuccessful: "Error", "Failed", "Inconclusive"
# Defaults to 5.
unsuccessfulRunHistoryLimit: 10

# Label selector for pods. Existing ReplicaSets whose pods are selected by
# this will be the ones affected by this rollout. It must match the pod
3 changes: 3 additions & 0 deletions docs/installation.md
@@ -9,6 +9,9 @@ kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/rele

This will create a new namespace, `argo-rollouts`, where Argo Rollouts controller will run.

!!! tip
    If you are using another namespace name, please update the `install.yaml` ClusterRoleBinding's ServiceAccount namespace name.

!!! tip
When installing Argo Rollouts on Kubernetes v1.14 or lower, the CRD manifests must be kubectl applied with the --validate=false option. This is caused by use of new CRD fields introduced in v1.15, which are rejected by default in lower API servers.

3 changes: 2 additions & 1 deletion examples/analysis-templates.yaml
@@ -24,7 +24,7 @@ spec:
args: [exit 0]
restartPolicy: Never
backoffLimit: 0

count: 1
---
# This AnalysisTemplate will run a Kubernetes Job every 5 seconds, with a 50% chance of failure.
# When the number of accumulated failures exceeds failureLimit, it will cause the analysis run to
@@ -36,6 +36,7 @@ metadata:
spec:
metrics:
- name: random-fail
count: 2
interval: 5s
failureLimit: 1
provider:
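The `count`/`failureLimit` semantics described for the `random-fail` metric above can be sketched as a small accounting loop (an illustrative model of the documented behavior, not the controller's actual code):

```go
package main

import "fmt"

// evaluate models how an analysis metric with count and failureLimit
// settles: run at most count measurements; once accumulated failures
// exceed failureLimit, the run fails, otherwise it succeeds.
// (Illustrative model only.)
func evaluate(results []bool, count, failureLimit int) string {
	failures := 0
	for i, ok := range results {
		if i >= count {
			break // no more than count measurements are taken
		}
		if !ok {
			failures++
		}
		if failures > failureLimit {
			return "Failed"
		}
	}
	return "Successful"
}

func main() {
	// count: 2, failureLimit: 1, as in the random-fail metric above
	fmt.Println(evaluate([]bool{false, false}, 2, 1)) // Failed
	fmt.Println(evaluate([]bool{true, false}, 2, 1))  // Successful
}
```

With a 50% chance of failure per measurement and `failureLimit: 1`, two consecutive failed measurements are what push the run into the Failed state.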
9 changes: 9 additions & 0 deletions manifests/crds/rollout-crd.yaml
@@ -47,6 +47,15 @@ spec:
type: object
spec:
properties:
analysis:
properties:
successfulRunHistoryLimit:
format: int32
type: integer
unsuccessfulRunHistoryLimit:
format: int32
type: integer
type: object
minReadySeconds:
format: int32
type: integer
9 changes: 9 additions & 0 deletions manifests/install.yaml
@@ -9745,6 +9745,15 @@ spec:
type: object
spec:
properties:
analysis:
properties:
successfulRunHistoryLimit:
format: int32
type: integer
unsuccessfulRunHistoryLimit:
format: int32
type: integer
type: object
minReadySeconds:
format: int32
type: integer
9 changes: 9 additions & 0 deletions manifests/namespace-install.yaml
@@ -9745,6 +9745,15 @@ spec:
type: object
spec:
properties:
analysis:
properties:
successfulRunHistoryLimit:
format: int32
type: integer
unsuccessfulRunHistoryLimit:
format: int32
type: integer
type: object
minReadySeconds:
format: int32
type: integer