Symptoms
API queries were timing out, which affected keymanweb.com, help.keyman.com, and keyman.com. KeymanWeb in particular was failing to load, e.g. with heavy queries such as:
https://api.keyman.com/cloud/4.0/keyboards?jsonp=keyman.register&languageidtype=bcp47&version=17.0&keyboardid=khmer_angkor,basic_kbdkni
Also reported by @LornaSIL this morning.
Diagnostics
Looking at the cluster, the api.keyman.com database pod had been showing 100% CPU utilization for the last 2 days.
It is unclear why the CPU would spike at that point. There is no evidence of changes to the database, or of a spike in api.keyman.com visits. Memory and disk usage show a spike that starts hours after the high CPU starts. That is a bit odd too, but may be SQL Server resource management?
Mitigation
Restarted the pod, which resolved the immediate issue.
Will continue to monitor.
Additional actions
Monitor
We should set up an alert for persistent high CPU (e.g. >10 minutes at >0.9 CPU avg?)
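As a starting point for the persistent high CPU alert suggested above, here is a minimal sketch of the check itself. It assumes CPU samples arrive as (timestamp in seconds, CPU fraction) pairs from whatever metrics source the cluster exposes. The class name `SustainedCpuAlert` and its fields are hypothetical, and the 0.9 threshold and 10 minute window are just the example values from this issue, not settled numbers.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SustainedCpuAlert:
    """Hypothetical helper: fires when average CPU over a full window exceeds a threshold."""

    threshold: float = 0.9         # CPU fraction, from the ">0.9 CPU avg" suggestion
    window_seconds: float = 600.0  # 10 minutes, from the ">10 minutes" suggestion
    samples: deque = field(default_factory=deque)  # (timestamp, cpu_fraction) pairs
    first_timestamp: Optional[float] = None

    def add_sample(self, timestamp: float, cpu_fraction: float) -> bool:
        """Record one sample and return True if the alert condition currently holds."""
        if self.first_timestamp is None:
            self.first_timestamp = timestamp
        self.samples.append((timestamp, cpu_fraction))

        # Discard samples that have fallen out of the sliding window.
        cutoff = timestamp - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

        # Do not alert until a full window of observation has elapsed,
        # so that a brief spike right after startup cannot trigger it.
        if timestamp - self.first_timestamp < self.window_seconds:
            return False

        average = sum(value for _, value in self.samples) / len(self.samples)
        return average > self.threshold


if __name__ == "__main__":
    alert = SustainedCpuAlert()
    # Simulated samples: one per minute, pinned at 100% CPU.
    for t in range(0, 901, 60):
        if alert.add_sample(float(t), 1.0):
            print(f"ALERT: sustained high CPU at t={t}s")
```

This sketch does not handle gaps in the sample stream: if samples are sparse, the average may rest on very few data points. In practice the same rule would more likely be expressed in the alerting system the cluster already uses, rather than in custom code.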
cc @darcywong00 @tim-eves