-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vdk-meta-jobs: implement a limit on starting jobs at once #1681
Conversation
Signed-off-by: Yoan Salambashev <[email protected]>
Signed-off-by: Yoan Salambashev <[email protected]>
Signed-off-by: Yoan Salambashev <[email protected]>
for more information, see https://pre-commit.ci
ec827d4
to
5742cde
Compare
Please link the github issue that you are addressing in this PR |
Good user experience is very important. The current solution moves the burden to the user to make sure they setup their dag. It is much better if we handle it for them. If we automatically submit at max N jobs to start concurrently (regardless of how much has been setup). Option A) Implementing a job type of queue logic : A job queue can help us manage the order in which jobs are executed. We can set a maximum number of jobs that can be added to the queue at any given time, and once this limit is reached, new jobs will be queued until there is the capacity to run them. The current solution, which involves raising an error and not allowing users to set up a DAG with a certain number of concurrent jobs, can be a good solution in cases where it is important to enforce strict limits on the system but we do not have such requirements. This type of discussion is better done in the issue before the PR is open (though it's ok if it's here but it will slow down the PR review). I strongly recommend outlying your ideas in the github issue first. |
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_dag.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_dag.py
Outdated
Show resolved
Hide resolved
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_dag.py
Outdated
Show resolved
Hide resolved
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_dag.py
Outdated
Show resolved
Hide resolved
Btw the correct prefix in the commit mesage is just the name of the plugin We use the titles to auto-generate the release notes: https://github.com/vmware/versatile-data-kit/releases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit hard to understand what the code does. Just by looking at it and without having tried to run it, it should do the following:
- Sort the DAG topologically
- Get each level of the sorted DAG
- If a level exceeds the max limit, raise an error
If this is correct, there should be a good library to use without having to implement it yourself. In case you still decide to implement it yourself, I suggest you break the code down into simple pieces, so it's more understandable, e.g. have one function that sorts, one that gets the levels and one that checks the length. This is a good approach regardless of how you handle the levels that are over the limit (raising an Error, @tozka's suggestion or something else)
Signed-off-by: Yoan Salambashev <[email protected]>
We met face-to-face (zoom to zoom) with @yonitoo and agreed on an approach where the MetaDag class ensures that there are at most N (configurable) concurrent jobs running by delaying the start of a new job if it exceeds the limit. We will reuse the already-built delay job functionality. We also realized that the current validation approach would not work because it doesn't account for the fact that job duration could be different, and job start can be delayed for different reasons. As a result, it's possible that parts of the DAG may be ahead of other parts. |
de1b2d8
to
8960e23
Compare
for more information, see https://pre-commit.ci
Thanks a lot for the feedback! @tozka @ivakoleva @DeltaMichael |
Actually, I just saw that @tozka already wrote the update, sorry for the spam. |
Signed-off-by: Yoan Salambashev <[email protected]>
Signed-off-by: Yoan Salambashev <[email protected]>
Kubernetes, service accounts are used by pods to authenticate with the Kubernetes API server and access resources in the cluster. Each pod is assigned a service account, and by default, the service account token is mounted in the pod's file system and used for authentication. However, not all pods need to access the Kubernetes API server or cluster resources, and in some cases, including the service account token can pose a security risk. For example, if a pod is compromised or runs untrusted code (which data job generally do), an attacker could use the service account token to escalate privileges and access sensitive resources in the cluster. To mitigate this risk, Kubernetes provides the `automountServiceAccountToken` property, which can be used to disable the automatic mounting of the service account token in pods. By setting this property to false, the service account token is not included by default, and the pod does not have access to the token unless it is manually mounted in a volume. It is a best practice to follow the principle of least privilege when designing and configuring Kubernetes workloads. By disabling the automatic mounting of the service account token, we can reduce the attack surface of our pods (and hence data jobs) and minimize the risk of compromise or data breaches. Testing Done: existing integration tests cover that the main functionality will not be broken by setting automountServiceAccountToken to false. Signed-off-by: Antoni Ivanov <[email protected]>
… in /projects/control-service/projects/pipelines_control_service (#1673) Bumps [io.micrometer:micrometer-core](https://github.com/micrometer-metrics/micrometer) from 1.10.3 to 1.10.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/micrometer-metrics/micrometer/releases">io.micrometer:micrometer-core's releases</a>.</em></p> <blockquote> <h2>1.10.4</h2> <h2>:star: New Features / Enhancements</h2> <ul> <li>Use Meter.Id for logging in DynatraceExporterV2 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3616">#3616</a></li> </ul> <h2>:lady_beetle: Bug Fixes</h2> <ul> <li>Fixing Scope setting and resetting via ObservationThreadLocalAccessor <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3613">#3613</a></li> <li>ObservationThreadLocalAccessor fixes <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3603">#3603</a></li> <li>LogbackMetrics does not protect against StackOverflowError in arbitrary Counter implementations <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3623">#3623</a></li> <li>[Dynatrace v2] Ensure synchronization when taking snapshots <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3615">#3615</a></li> <li>Statsd UDS protocol does not work on linux ARM64 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3605">#3605</a></li> <li>Hazelcast Cache Metrics, "put" operation count incorrect <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3556">#3556</a></li> </ul> <h2>:hammer: Dependency Upgrades</h2> <ul> <li>Upgrade netty to 4.1.89 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3642">#3642</a></li> <li>Upgrade context-propagation to 1.0.2 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3638">#3638</a></li> <li>Upgrade Testcontainers to 1.17.6 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3641">#3641</a></li> <li>Upgrade JUnit to 5.9.2, AssertJ to 3.24.2, Mockito to 4.11.0, Archunit to 1.0.1 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3640">#3640</a></li> <li>Upgrade Dropwizard Metrics to 4.2.16 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3639">#3639</a></li> <li>Upgrade to wavefront-sdk-java 3.0.4 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3636">#3636</a></li> <li>Upgrade to reactor-bom 2020.0.27 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3634">#3634</a></li> <li>Upgrade signalfx-java to 1.0.28 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3633">#3633</a></li> <li>Update aws-java-sdk-cloudwatch to 1.12.405 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3632">#3632</a></li> <li>Bump com.gradle.enterprise from 3.12.2 to 3.12.3 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3609">#3609</a></li> <li>Update Elasticsearch Docker image versions to the latest in integration tests <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3595">#3595</a></li> <li>Update samples to use Spring Boot 2.7.8 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3593">#3593</a></li> </ul> <h2>:memo: Tasks</h2> <ul> <li>Remove snapshot repository for 1.10.4 <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3637">#3637</a></li> <li>Fix deprecation warnings from Spotless <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3630">#3630</a></li> <li>Revert Java to 18 on CircleCI for 1.10.x and above <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3612">#3612</a></li> <li>Set dependabot target-branch to 1.9.x <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3610">#3610</a></li> <li>Clean up compile warnings in micrometer-core <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3604">#3604</a></li> <li>Add circleci config file to the circleci cache key <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/issues/3611">#3611</a></li> <li>Add tests for OtlpConfig.resourceAttributes() <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3608">#3608</a></li> <li>[Dynatrace registry v1] Use header instead of query parameter <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3596">#3596</a></li> <li>Remove broken link in StackdriverSample <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3591">#3591</a></li> <li>Update CircleCI images to the latest <a href="https://github-redirect.dependabot.com/micrometer-metrics/micrometer/pull/3550">#3550</a></li> </ul> <h2>:heart: Contributors</h2> <p>Thank you to all the contributors who worked on this release:</p> <p><a href="https://github.com/izeye"><code>@izeye</code></a>, <a href="https://github.com/pirgeo"><code>@pirgeo</code></a>, and <a href="https://github.com/taer"><code>@taer</code></a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/3eee5ea299d550d7735a53c7257a231b427b1a18"><code>3eee5ea</code></a> Upgrade netty to 4.1.89</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/9061a070325d658fd82b1b8509a502d862b8f603"><code>9061a07</code></a> Merge branch '1.9.x' into 1.10.x</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/2c237a1a91456c470999f35f97baee6e112baaaf"><code>2c237a1</code></a> Upgrade Testcontainers to 1.17.6</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/65ba079bd00ff65219dbd2414cd833db3fcee91c"><code>65ba079</code></a> Upgrade JUnit to 5.9.2, AssertJ to 3.24.2, Mockito to 4.11.0, Archunit to 1.0.1</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/7f7b5534208a8c5821d94cebf2499b21b38d759a"><code>7f7b553</code></a> Upgrade Dropwizard Metrics to 4.2.16</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/ce4819cc89579481ed3730dc2be763badb4459eb"><code>ce4819c</code></a> Upgrade context-propagation to 1.0.2</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/03484181150de292f29a2ed1a0a6039e5c96d80f"><code>0348418</code></a> Remove snapshot repository for 1.10.4</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/ea2e72eb85a93f7abf21e8786f0e9a16dbce7d65"><code>ea2e72e</code></a> Merge branch '1.9.x' into 1.10.x</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/c11349500e41196e9b25a04364fda7b76677ba17"><code>c113495</code></a> Upgrade to wavefront-sdk-java 3.0.4</li> <li><a href="https://github.com/micrometer-metrics/micrometer/commit/9a27e86c257a40a0962a98cf69b56d956db0183a"><code>9a27e86</code></a> Upgrade to reactor-bom 2020.0.27</li> <li>Additional commits viewable in <a href="https://github.com/micrometer-metrics/micrometer/compare/v1.10.3...v1.10.4">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Antoni Ivanov <[email protected]>
Automatically merge Dependabot PRs. Dependabot will wait until all your status checks pass before merging.
Add license header
…projects/control-service/projects/pipelines_control_service (#1653) Bumps [net.bytebuddy:byte-buddy](https://github.com/raphw/byte-buddy) from 1.13.0 to 1.14.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/raphw/byte-buddy/releases">net.bytebuddy:byte-buddy's releases</a>.</em></p> <blockquote> <h2>Byte Buddy 1.14.0</h2> <ul> <li>Add <code>Step.Factory.ForDelegation</code> in <code>MemberSubstitution</code> that allows for delegation similar to <code>MethodDelegation</code> but in-code.</li> <li>Add handlers for <code>MethodDelegation</code> and <code>Advice</code> that leverage method handles for field access and self-invocation.</li> <li>Add <code>Step.Factory</code> for type assignment that allows casting the return value from a previous step to another type.</li> <li>Avoid usage of <code>URL</code> class loader as it is deprecated, and use newer method if available.</li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/raphw/byte-buddy/blob/master/release-notes.md">net.bytebuddy:byte-buddy's changelog</a>.</em></p> <blockquote> <h3>18. February 2023: version 1.14.0</h3> <ul> <li>Add <code>Step.Factory.ForDelegation</code> in <code>MemberSubstitution</code> that allows for delegation similar to <code>MethodDelegation</code> but in-code.</li> <li>Add handlers for <code>MethodDelegation</code> and <code>Advice</code> that leverage method handles for field access and self-invocation.</li> <li>Add <code>Step.Factory</code> for type assignment that allows casting the return value from a previous step to another type.</li> <li>Avoid usage of <code>URL</code> class loader as it is deprecated, and use newer method if available.</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/raphw/byte-buddy/commit/00da41ec542a0cee1029b500522bdfd8145e34da"><code>00da41e</code></a> [maven-release-plugin] prepare release byte-buddy-1.14.0</li> <li><a href="https://github.com/raphw/byte-buddy/commit/c601a7886aadcc7cc7965e4ab15ecd4702417310"><code>c601a78</code></a> [release] Release new version</li> <li><a href="https://github.com/raphw/byte-buddy/commit/102ee254402c22e7730f74b9808c5550966945c5"><code>102ee25</code></a> Avoid test execution on Java 7.</li> <li><a href="https://github.com/raphw/byte-buddy/commit/3b5f4dcb8a15e969a2b4befa9a4d850ff7f494ad"><code>3b5f4dc</code></a> Remove unused packages.</li> <li><a href="https://github.com/raphw/byte-buddy/commit/b42f78b2a5de74abe32180058f06c75a4469a21b"><code>b42f78b</code></a> Add scheme part to URI.</li> <li><a href="https://github.com/raphw/byte-buddy/commit/d94d66a18c320cdd1b8e951a8a1dae1a2745d78a"><code>d94d66a</code></a> Add scheme part to URI.</li> <li><a href="https://github.com/raphw/byte-buddy/commit/aaa18f29c901949d7109bff2372ce0e282df1171"><code>aaa18f2</code></a> Adjust expectation for dynamic lambda invocation.</li> <li><a href="https://github.com/raphw/byte-buddy/commit/0f3bd5f2e6af453110dd1289fbc90664b81027fb"><code>0f3bd5f</code></a> Add hints for spotbugs.</li> <li><a href="https://github.com/raphw/byte-buddy/commit/c2242c7f3c0b91e502cf3584aafb917c4457ece3"><code>c2242c7</code></a> fix javadoc.</li> <li><a href="https://github.com/raphw/byte-buddy/commit/0a2e4ea5d20340869f65c323f867c19c4ea09ecb"><code>0a2e4ea</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/raphw/byte-buddy/issues/1407">#1407</a> from capthua/master</li> <li>Additional commits viewable in <a href="https://github.com/raphw/byte-buddy/compare/byte-buddy-1.13.0...byte-buddy-1.14.0">compare view</a></li> </ul> </details> <br /> [](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Antoni Ivanov <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…1684) We need a formatting pre-commit hook, to cover the new tech stack added with the Frontend component. Adding and configured a pre-commit hook. Testing done: tested locally; ci/cd --------- Signed-off-by: ivakoleva <[email protected]> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
for more information, see https://pre-commit.ci
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_dag.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Yoan Salambashev <[email protected]>
for more information, see https://pre-commit.ci
Signed-off-by: Yoan Salambashev <[email protected]>
Signed-off-by: Yoan Salambashev <[email protected]>
Signed-off-by: Yoan Salambashev <[email protected]>
Signed-off-by: Yoan Salambashev <[email protected]>
Improving your code reading abilities is essential for becoming a better programmer. So let me offer you some tips which I've found useful for me and I hope you find useful as well:
|
Signed-off-by: Yoan Salambashev <[email protected]>
for more information, see https://pre-commit.ci
Thanks for the tips, @tozka! Appreciate them! |
Signed-off-by: Yoan Salambashev <[email protected]>
for more information, see https://pre-commit.ci
… checks Signed-off-by: Yoan Salambashev <[email protected]>
for more information, see https://pre-commit.ci
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_dag.py
Outdated
Show resolved
Hide resolved
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_dag.py
Outdated
Show resolved
Hide resolved
projects/vdk-plugins/vdk-meta-jobs/src/vdk/plugin/meta_jobs/meta_dag.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Yoan Salambashev <[email protected]>
Signed-off-by: Yoan Salambashev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Assuming the tests pass let's merge it.
Signed-off-by: Yoan Salambashev <[email protected]>
Signed-off-by: Yoan Salambashev <[email protected]>
for more information, see https://pre-commit.ci
What:
Create a check that checks for every job if there are more than a configurable amount of concurrent starting jobs (15 by default) in a Meta Jobs DAG. If so, the ones that exceed the limit would be delayed.
Why: We want to limit the number of simultaneously starting Meta jobs in order not to overload the SC
Tests: unit test is introduced
Signed-off-by: Yoan Salambashev [email protected]