This repository has been archived by the owner on Jan 28, 2022. It is now read-only.

Sl/105 run refresh #118

Merged on Dec 12, 2019 (2 commits)


Collaborator

@stuartleeks stuartleeks commented Nov 20, 2019

Fixes #105

To address the issue of runs failing to reconcile (#105), this PR removes the check that turned an error reported in the run state from the API into a reconciler error. With this PR, the reconciler now completes successfully and the run state is reported in the status of the Run CRD instance (and is available via kubectl describe run myrun).
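
The change described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the operator's actual code: the `RunState` and `RunStatus` types and both `refresh...` functions are hypothetical names, and the "before" variant is a guess at the general shape of the removed check.

```go
package main

import (
	"errors"
	"fmt"
)

// RunState holds the fields of the "state" object returned by the
// Databricks runs/get API. The Go type itself is hypothetical.
type RunState struct {
	LifeCycleState string
	ResultState    string
	StateMessage   string
}

// RunStatus is a hypothetical stand-in for the status of a Run CRD instance.
type RunStatus struct {
	State RunState
}

// refreshBefore sketches the assumed old behaviour: a failure reported in the
// run state is returned as a reconciler error, so the status is never updated.
func refreshBefore(status *RunStatus, latest RunState) error {
	if latest.ResultState == "FAILED" {
		return errors.New(latest.StateMessage)
	}
	status.State = latest
	return nil
}

// refreshAfter sketches the behaviour this PR describes: the run state is
// always copied into the status, and a failed run is not a reconciler error.
func refreshAfter(status *RunStatus, latest RunState) error {
	status.State = latest
	return nil
}

func main() {
	failed := RunState{
		LifeCycleState: "INTERNAL_ERROR",
		ResultState:    "FAILED",
		StateMessage:   "Library installation failed",
	}

	var before, after RunStatus
	fmt.Println("before:", refreshBefore(&before, failed), before.State.LifeCycleState)
	fmt.Println("after: ", refreshAfter(&after, failed), after.State.LifeCycleState)
}
```

In the "after" variant a failed run is treated as data to report rather than as a failure of the reconcile loop itself, which is what lets the state show up in kubectl output.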

@Azadehkhojandi
Contributor

Can you please cherry-pick your changes related to #105 and send them as a separate PR?

@Azadehkhojandi Azadehkhojandi self-requested a review November 25, 2019 09:14
@stuartleeks stuartleeks force-pushed the sl/105-run-refresh branch 3 times, most recently from e8657ef to 43cfd55 Compare November 26, 2019 08:43
@stuartleeks
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Contributor

@Azadehkhojandi Azadehkhojandi left a comment


Let's catch up about this issue and see how we can add tests

@stuartleeks
Collaborator Author

I've rebased this on master and re-tested following the steps outlined in the PR:

$ kubectl get run
NAME         AGE     RUNID   STATE
run-sample   2m23s   15      PENDING

After a while, the job is in the error state:

export RUN_ID=15 # RUNID taken from the kubectl get run output above
curl -H "Authorization: Bearer $DATABRICKS_TOKEN" "$DATABRICKS_HOST/api/2.0/jobs/runs/get?run_id=$RUN_ID"
{
  "job_id": 24,
  "run_id": 15,
  "number_in_job": 1,
  "state": {
    "life_cycle_state": "INTERNAL_ERROR",
    "result_state": "FAILED",
    "state_message": "Library installation failed for library jar: \"dbfs:/my-jar.jar\"\n. Error messages:\njava.lang.Throwable: java.io.FileNotFoundException: dbfs:/my-jar.jar"
  },
// rest omitted for brevity
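
For reference, the state fields in a response like the one above can be decoded in Go along these lines. This is a minimal sketch: the `runResponse` and `runState` types and the `parseRun` helper are hypothetical names rather than the operator's own, and the sample body is a trimmed copy of the output above with a shortened state message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runState maps the "state" object of a runs/get response.
type runState struct {
	LifeCycleState string `json:"life_cycle_state"`
	ResultState    string `json:"result_state"`
	StateMessage   string `json:"state_message"`
}

// runResponse maps only the fields shown above; other fields are ignored.
type runResponse struct {
	JobID int64    `json:"job_id"`
	RunID int64    `json:"run_id"`
	State runState `json:"state"`
}

// parseRun decodes a runs/get response body (hypothetical helper).
func parseRun(body []byte) (runResponse, error) {
	var r runResponse
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	// Trimmed sample based on the curl output above.
	body := []byte(`{
  "job_id": 24,
  "run_id": 15,
  "number_in_job": 1,
  "state": {
    "life_cycle_state": "INTERNAL_ERROR",
    "result_state": "FAILED",
    "state_message": "Library installation failed for library jar: \"dbfs:/my-jar.jar\""
  }
}`)

	run, err := parseRun(body)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(run.RunID, run.State.LifeCycleState, run.State.ResultState)
}
```

Note that the job ID (24) and the run ID (15) are distinct values; the RUNID column in kubectl get run corresponds to run_id.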

Once the operator next refreshes the run, the INTERNAL_ERROR state is reflected in the run status:

$ kubectl get run
NAME         AGE   RUNID   STATE
run-sample   10m   15      INTERNAL_ERROR

Observing the operator logs shows that the status is still being reconciled every 30 seconds.

@Azadehkhojandi Azadehkhojandi merged commit 282291b into Azure:master Dec 12, 2019
Development

Successfully merging this pull request may close these issues.

Run CRD reports incorrect state information when Run fails in DataBricks