V1.0.0 scale down is never happening because msgsReceived is not updated #98

Closed
AmeerAssi opened this issue Jul 11, 2020 · 4 comments · Fixed by #99
Assignees: alok87
Labels: bug (Something isn't working)

Comments

@AmeerAssi

I am testing autoscaler version 1.0.0, which I see was recently released, and I am seeing the following behavior: after a scale up, once the work is finished, scale down never happens.

Looking at my queue in the AWS console, it has been empty, with no messages in flight, for more than 20 minutes:

[screenshot: SQS console showing the queue empty with no messages in flight]

Here is the monitoring view for the queue, where you can see that all message handling finished before 22:00:

[screenshot: SQS monitoring graphs showing message handling finished before 22:00]

Looking at the autoscaler logs, the replicas are not scaled down because the controller still reports received messages. Below are two log snapshots taken more than 20 minutes apart (according to the documentation in the code, the cache should be 1 minute):

[screenshot: autoscaler log snapshot]

[screenshot: autoscaler log snapshot, more than 20 minutes later]

it looks like the msgsReceived cache is never refreshed.
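
To illustrate what I think is happening (a rough sketch with made-up names, not the actual WPA code): scale down to minReplicas appears to be gated on the queue being empty and on no messages having been received recently, so a cached msgsReceived value that is never refreshed blocks scale down indefinitely.

```go
// Rough sketch only: hypothetical names, not the actual WPA source.
package main

import "fmt"

// desiredReplicas mimics the scale-down gating: go to minReplicas only when
// the queue is empty AND no messages were received recently. A stale cached
// msgsReceived value therefore blocks scale down forever.
func desiredReplicas(queueMessages int64, msgsReceived float64, current, min int32) int32 {
	if queueMessages == 0 && msgsReceived == 0 {
		return min // queue drained and nothing arriving: safe to scale down
	}
	if queueMessages == 0 && msgsReceived > 0 {
		return current // messages (apparently) still arriving: hold replicas
	}
	// normal target-based calculation elided
	return current
}

func main() {
	// Queue empty for 20+ minutes, but the cached msgsReceived is stale,
	// so the controller keeps all 31 replicas instead of dropping to 1.
	fmt.Println(desiredReplicas(0, 720, 31, 1)) // prints 31
}
```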

Pod describe info:

Name:                 workerpodautoscaler-57fc6bf9d9-225db
Namespace:            kube-system
Priority:             1000
Priority Class Name:  infra-normal-priority
Node:                 ip-192-168-127-142.us-east-2.compute.internal/192.168.127.142
Start Time:           Sun, 12 Jul 2020 00:41:20 +0300
Labels:               app=workerpodautoscaler
                      pod-template-hash=57fc6bf9d9
Annotations:          kubernetes.io/psp: eks.privileged
Status:               Running
IP:                   192.168.126.38
Controlled By:        ReplicaSet/workerpodautoscaler-57fc6bf9d9
Containers:
  wpa:
    Container ID:   docker://4898ad92c38baed27d84a0f206ee60b85f0b149526142a2abfd956dccc676069
    Image:          practodev/workerpodautoscaler:v1.0.0
    Image ID:       docker-pullable://practodev/workerpodautoscaler@sha256:2bdcaa251e2a2654e73121721589ac5bb8536fbeebc2b7a356d24199ced84e73
    Port:
    Host Port:
    Command:
      /workerpodautoscaler
      run
      --resync-period=60
      --wpa-threads=10
      --aws-regions=us-east-2
      --sqs-short-poll-interval=20
      --sqs-long-poll-interval=20
      --wpa-default-max-disruption=0
    State:          Running
      Started:      Sun, 12 Jul 2020 00:41:22 +0300
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  100Mi
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment Variables from:
      workerpodautoscaler-secret-env  Secret  Optional: false
    Environment:
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from workerpodautoscaler-token-j8lvc (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  workerpodautoscaler-token-j8lvc:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  workerpodautoscaler-token-j8lvc
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/os=linux
Tolerations:     :NoExecute
                 :NoSchedule
Events:
  Type    Reason     Age  From                                                    Message
  Normal  Scheduled  45m  default-scheduler                                       Successfully assigned kube-system/workerpodautoscaler-57fc6bf9d9-225db to ip-192-168-127-142.us-east-2.compute.internal
  Normal  Pulling    45m  kubelet, ip-192-168-127-142.us-east-2.compute.internal  Pulling image "practodev/workerpodautoscaler:v1.0.0"
  Normal  Pulled     45m  kubelet, ip-192-168-127-142.us-east-2.compute.internal  Successfully pulled image "practodev/workerpodautoscaler:v1.0.0"
  Normal  Created    45m  kubelet, ip-192-168-127-142.us-east-2.compute.internal  Created container wpa
  Normal  Started    45m  kubelet, ip-192-168-127-142.us-east-2.compute.internal  Started container wpa

WorkerPodAutoScaler resource:
apiVersion: k8s.practo.dev/v1alpha1
kind: WorkerPodAutoScaler
metadata:
  creationTimestamp: "2020-01-28T14:59:16Z"
  generation: 5316
  name: processor-ip4m
  namespace: default
  resourceVersion: "52253623"
  selfLink: /apis/k8s.practo.dev/v1alpha1/namespaces/default/workerpodautoscalers/processor-ip4m
  uid: c111ba43-41de-11ea-b4d5-066ce59a32e8
spec:
  deploymentName: processor-ip4m
  maxDisruption: null
  maxReplicas: 80
  minReplicas: 1
  queueURI: **************
  secondsToProcessOneJob: 10
  targetMessagesPerWorker: 720
status:
  CurrentMessages: 0
  CurrentReplicas: 31
  DesiredReplicas: 31
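
For reference, with CurrentMessages: 0, targetMessagesPerWorker: 720, and minReplicas: 1, I would expect the desired replica count to drop back to the minimum (roughly ceil(0 / 720) = 0, clamped up to minReplicas = 1), yet the status keeps both CurrentReplicas and DesiredReplicas at 31.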

@alok87 alok87 added the bug Something isn't working label Jul 12, 2020
@alok87 alok87 self-assigned this Jul 12, 2020
alok87 added a commit that referenced this issue Jul 12, 2020
…urrent time

This is done to solve #98. Since this was not initialized, its value was zero, which caused the cache to behave in an abnormal manner.
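
(A minimal sketch of the fix class described in the commit message above, with hypothetical names; see #99 for the actual change: the cache entry's timestamp is initialized to the current time instead of being left at Go's zero value.)

```go
// Hypothetical illustration only, not the actual change in #99.
package main

import (
	"fmt"
	"time"
)

type queueMetrics struct {
	msgsReceived  float64
	lastUpdatedAt time.Time
}

func newQueueMetrics() *queueMetrics {
	return &queueMetrics{
		lastUpdatedAt: time.Now(), // previously left uninitialized (zero value)
	}
}

func main() {
	m := newQueueMetrics()
	fmt.Println(m.lastUpdatedAt.IsZero()) // false after initialization
}
```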
@alok87 (Contributor) commented Jul 12, 2020:

Thanks for reporting this. Working on the fix #99

@alok87 (Contributor) commented Jul 12, 2020:

@AmeerAssi

We have not published a GitHub release yet, but have pushed the following Docker images for use:

pushed: practodev/workerpodautoscaler:v1.0.0-21-gfdb7dcd
pushed: practodev/workerpodautoscaler:v1.0
pushed: practodev/workerpodautoscaler:v1

Please try it out and let us know if the issue gets fixed for you!
Thanks again for reporting this. 👍

@agconti commented Jul 13, 2020:

I ran into this same issue. Updating from v1.0.0 to v1.0 worked for me.

@alok87 (Contributor) commented Jul 14, 2020:

Yes, this was a major issue. We are planning to publish the fix soon as the latest GitHub release, v1.1.0.
