control-service: killed job was shown as successful #2103

Closed


mivanov1988
Collaborator

Why

We recently received the following feedback from an internal client: a data job was listed as successful even though it hit the 12-hour limit and was killed. The logs do not show this either: the last log entry is just the last object sent for ingestion, and there is no summary of the data job.

The problem was introduced by the fix in #1586.

When a job hits the 12-hour limit, the K8S Pod is terminated and we construct a partial JobExecutionStatus. That object enters an if statement in KubernetesService.java, which returns Optional.empty() rather than the constructed object.

As a result, the job execution stays stuck in the Running status until the emergency logic detects it; that logic marks such executions as successful because they have no associated Pods.

What

Added validation for an already completed job in a more appropriate place.
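
The failure path described under "Why", and the effect of moving the validation, can be sketched roughly as follows. This is a simplified illustration, not the actual control-service code: `KilledJobSketch`, `PartialStatus`, `Status`, `readStatusBuggy`, `readStatusFixed`, and `resolve` are hypothetical names, and the guard condition (a missing termination message) is an assumption made only to reproduce the described behaviour.

```java
import java.util.Optional;

// Hypothetical, simplified stand-ins; not the real control-service classes.
final class KilledJobSketch {

    enum Status { RUNNING, SUCCEEDED, FAILED }

    // Partial status built from a Pod that was killed: no termination message is available.
    static final class PartialStatus {
        final String executionId;
        final String terminationMessage; // null when the Pod was killed
        final boolean deadlineExceeded;

        PartialStatus(String executionId, String terminationMessage, boolean deadlineExceeded) {
            this.executionId = executionId;
            this.terminationMessage = terminationMessage;
            this.deadlineExceeded = deadlineExceeded;
        }
    }

    // Assumed shape of the bug: the guard swallows the partial status,
    // so the caller has nothing to persist and the execution stays RUNNING.
    static Optional<PartialStatus> readStatusBuggy(PartialStatus status) {
        if (status.terminationMessage == null) {
            return Optional.empty();
        }
        return Optional.of(status);
    }

    // Assumed shape of the fix: skip the update only when the stored
    // execution has already reached a final state.
    static Optional<PartialStatus> readStatusFixed(PartialStatus status, Status stored) {
        if (stored != Status.RUNNING) {
            return Optional.empty();
        }
        return Optional.of(status);
    }

    // Applies an update if there is one; otherwise falls back to the emergency
    // logic, which marks a RUNNING execution without a Pod as successful.
    static Status resolve(Status stored, Optional<PartialStatus> update, boolean podExists) {
        if (update.isPresent()) {
            return update.get().deadlineExceeded ? Status.FAILED : Status.SUCCEEDED;
        }
        if (stored == Status.RUNNING && !podExists) {
            return Status.SUCCEEDED;
        }
        return stored;
    }

    public static void main(String[] args) {
        PartialStatus killed = new PartialStatus("exec-1", null, true);
        Status buggy = resolve(Status.RUNNING, readStatusBuggy(killed), false);
        Status fixed = resolve(Status.RUNNING, readStatusFixed(killed, Status.RUNNING), false);
        System.out.println("buggy=" + buggy + " fixed=" + fixed); // prints "buggy=SUCCEEDED fixed=FAILED"
    }
}
```

In the sketch, the only difference between the two paths is where the "already completed" check sits: checking against the stored execution state lets the partial status of a killed Pod reach the caller instead of being dropped.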

Testing Done

Added integration test

Signed-off-by: Miroslav Ivanov [email protected]

mivanov1988 and others added 2 commits May 22, 2023 15:47
https://github.com/vmware/versatile-data-kit/blob/4763ba877f43b270fbd4770bc1533216f7c5d618/projects/control-service/projects/pipelines_control_service/src/main/java/com/vmware/taurus/service/KubernetesService.java#L1656

mivanov1988 and others added 4 commits May 22, 2023 15:57
<!--pre-commit.ci start-->
updates:
- https://github.com/asottile/reorder_python_imports → https://github.com/asottile/reorder-python-imports
- [github.com/asottile/pyupgrade: v3.3.1 →
v3.4.0](asottile/pyupgrade@v3.3.1...v3.4.0)
- [github.com/pre-commit/mirrors-prettier: v3.0.0-alpha.6 →
v3.0.0-alpha.9-for-vscode](pre-commit/mirrors-prettier@v3.0.0-alpha.6...v3.0.0-alpha.9-for-vscode)
<- simply republished package; no need
<!--pre-commit.ci end-->

---------

Signed-off-by: ivakoleva <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ivakoleva <[email protected]>
DeltaMichael and others added 10 commits May 22, 2023 16:50
## Why

A recent test run in CI broke with an error related to the staticmethod
decorator
https://gitlab.com/vmware-analytics/versatile-data-kit/-/jobs/4325250349

The latest heartbeat changes were tested on Python 3.11. They turned out
not to be compatible with Python 3.7.

## What

Remove the @staticmethod decorator from the test instantiation method
(we're not using this method outside the runner).
Rename the config member field of the test classes to _config.

## How was this tested

Locally, in a virtual environment using Python 3.7

## What kind of change is this

Bugfix

Signed-off-by: Dilyan Marinov <[email protected]>
…2.472 in /projects/control-service/projects/pipelines_control_service (#2111)

Bumps
[com.amazonaws:aws-java-sdk-sts](https://github.com/aws/aws-sdk-java)
from 1.12.468 to 1.12.472.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/aws/aws-sdk-java/commit/2e98b99f26af3e82b36431b46f03b277b678c7ae"><code>2e98b99</code></a>
AWS SDK for Java 1.12.472</li>
<li><a
href="https://github.com/aws/aws-sdk-java/commit/209a868639f959c497f4f64099c1b3fd5b8dc624"><code>209a868</code></a>
Update GitHub version number to 1.12.472-SNAPSHOT</li>
<li><a
href="https://github.com/aws/aws-sdk-java/commit/f9f380828de6200b076b9d728873a508a61d847d"><code>f9f3808</code></a>
AWS SDK for Java 1.12.471</li>
<li><a
href="https://github.com/aws/aws-sdk-java/commit/56c257574afcd973f9a4c7935a82927fe5726c0c"><code>56c2575</code></a>
Update GitHub version number to 1.12.471-SNAPSHOT</li>
<li><a
href="https://github.com/aws/aws-sdk-java/commit/eda03ac5e368d3cfbb63324b7d1305bf8ae858f2"><code>eda03ac</code></a>
AWS SDK for Java 1.12.470</li>
<li><a
href="https://github.com/aws/aws-sdk-java/commit/031a2c783caa27e18d4ea83fca4f91c014096c59"><code>031a2c7</code></a>
Update GitHub version number to 1.12.470-SNAPSHOT</li>
<li><a
href="https://github.com/aws/aws-sdk-java/commit/b4fa3d162884584caed370859c8b44e26396d4c7"><code>b4fa3d1</code></a>
AWS SDK for Java 1.12.469</li>
<li><a
href="https://github.com/aws/aws-sdk-java/commit/96a0c2f8f11842c7e58498dbf294173fe2356221"><code>96a0c2f</code></a>
Update GitHub version number to 1.12.469-SNAPSHOT</li>
<li>See full diff in <a
href="https://github.com/aws/aws-sdk-java/compare/1.12.468...1.12.472">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
… github.com:vmware/versatile-data-kit into person/miroslavi/killed-job-was-shown-as-successful
@mivanov1988 mivanov1988 deleted the person/miroslavi/killed-job-was-shown-as-successful branch May 23, 2023 14:18

4 participants