control-service: add timeouts to shedlock's database operations #693
Conversation
We have experienced cases when operations against the database appear to hang indefinitely (usually after a connectivity issue with the database). As a result, tasks that attempt to obtain a distributed lock fail to start. This is particularly problematic for important tasks, such as the data job watch. This commit attempts to resolve this by specifying a query timeout on the JdbcTemplate used by ShedLock to operate with the database. Link: https://dzone.com/articles/threads-stuck-in-javanetsocketinputstreamsocketrea Testing done: started the service locally and verified that the locks operate normally with the timeout. Signed-off-by: Tsvetomir Palashki <[email protected]>
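The change described above can be sketched roughly as follows. This is an illustrative Spring configuration, not the PR's actual code: the bean wiring, class name, and the 30-second timeout value are assumptions.

```java
import javax.sql.DataSource;

import net.javacrumbs.shedlock.core.LockProvider;
import net.javacrumbs.shedlock.provider.jdbctemplate.JdbcTemplateLockProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
public class LockConfigurationSketch {

    // Hypothetical bean; the timeout value is an illustrative assumption.
    @Bean
    public LockProvider lockProvider(DataSource dataSource) {
        JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
        // Abort lock queries that hang (e.g. after a broken DB connection)
        // instead of blocking the scheduler thread indefinitely.
        jdbcTemplate.setQueryTimeout(30); // seconds

        return new JdbcTemplateLockProvider(
                JdbcTemplateLockProvider.Configuration.builder()
                        .withJdbcTemplate(jdbcTemplate)
                        .usingDbTime()
                        .build());
    }
}
```

`JdbcTemplate.setQueryTimeout` applies the timeout to every statement the template creates, so all of ShedLock's lock/unlock queries are covered by the single setting.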
Isn't ShedLock affected by generic DB connectivity issues? Maybe we need to address JDBC connectivity recovery before fine-tuning ShedLock, which reuses the same datasource configuration. That is found at Line 18 in 29aa26b
For example, we install the Pipelines Control Service using helm install ... --set database.jdbcUrl=$DATABASE_JDBC_URL https://github.com/vmware/versatile-data-kit/blob/main/projects/control-service/projects/helm_charts/pipelines-control-service/templates/deployment.yaml#L99 Maybe consider providing some defaults/config docs for:
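As a rough illustration of the kind of defaults the comment asks for, a chart values fragment could expose the connection-pool timeouts alongside the JDBC URL. The key names below are hypothetical, not the actual pipelines-control-service chart schema:

```yaml
# Hypothetical values.yaml fragment -- key names are illustrative only.
database:
  jdbcUrl: ""                    # set via --set database.jdbcUrl=$DATABASE_JDBC_URL
  pool:
    connectionTimeoutMs: 30000   # HikariCP: max wait for a connection from the pool
    validationTimeoutMs: 5000    # HikariCP: max wait for a connection aliveness check
    maxLifetimeMs: 1800000       # HikariCP: retire connections after 30 minutes
```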
I see only 3 database configurations.
As far as I know, only ShedLock is affected (@tpalashki correct me if I am wrong). I would recommend merging this PR and tweaking the datasource in a separate one.
Judging by the thread dump, these are the threads that appear to be hung:
All of them are blocked reading a query result:
All of them use the HikariCP configuration, which should supposedly take care (?) of reviving the connections according to its defaults: https://github.com/brettwooldridge/HikariCP
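One caveat worth noting: HikariCP can evict broken connections from the pool, but it cannot interrupt a thread that is already blocked inside a socket read, which is the failure mode the linked DZone article describes. The usual complement is a driver-level socket timeout. A hypothetical properties sketch, assuming Spring Boot property names and the PostgreSQL JDBC driver (the actual driver and values here are assumptions):

```properties
# Assumption: PostgreSQL JDBC driver; socketTimeout is in seconds and
# aborts reads that would otherwise block forever.
spring.datasource.url=jdbc:postgresql://db:5432/pipelines?socketTimeout=60
# HikariCP settings: these manage pool health, but they do NOT
# interrupt a thread already blocked in a socket read.
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.validation-timeout=5000
spring.datasource.hikari.max-lifetime=1800000
```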