
Databricks CLI authenticates with azure-cli, but bundle deployment does not #1722

Closed
Pim-Mostert opened this issue Aug 27, 2024 · 6 comments · Fixed by #1734
Labels: Bug (Something isn't working), DABs (DABs related issues)

Describe the issue

I want to deploy a Databricks Asset Bundle from an Azure DevOps pipeline using the Databricks CLI. While authentication works fine for regular CLI commands (such as databricks experiments list-experiments), it fails for bundle deployment (databricks bundle deploy).

In the pipeline I'm using the AzureCLI task, which enables the Databricks CLI to use azure-cli type authentication.

As mentioned in databricks/databricks-sdk-go#1025 (comment), the issue appears to be:

The issue that CLI authenticates with azure-cli type but bundles failed to do so is separate one and might be related to some miss on bundles side where we don't pass all necessary env variables. If this is an issue for you, please feel free to open a separate ticket for this in Databricks CLI repo.

Configuration

# azure-pipelines.yml
variables:
  databricksHost: "https://adb-XXX.azuredatabricks.net"

pool:
  vmImage: "ubuntu-latest"

jobs:
  - job: databricks_asset_bundle
    displayName: "Deploy Databricks Asset Bundle"
    steps:
      - bash: |
          # Install Databricks CLI - see https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

          # Verify installation
          databricks --version

          # Create databricks config file ("~" does not expand inside quotes, so use $HOME)
          file="$HOME/.databrickscfg"

          if [ -f "$file" ] ; then
              rm "$file"
          fi

          echo "[DEFAULT]" >> "$file"
          echo "host = $databricksHost" >> "$file"
        displayName: Setup Databricks CLI
      - task: AzureCLI@2
        displayName: Deploy Asset Bundle
        inputs:
          azureSubscription: "my-workload-identity-federation-service-connection"
          addSpnToEnvironment: true
          scriptType: "bash"
          scriptLocation: "inlineScript"
          inlineScript: |
            # As described in https://devblogs.microsoft.com/devops/public-preview-of-workload-identity-federation-for-azure-pipelines/
            export ARM_CLIENT_ID=$servicePrincipalId
            export ARM_OIDC_TOKEN=$idToken
            export ARM_TENANT_ID=$tenantId
            export ARM_SUBSCRIPTION_ID=$(az account show --query id -o tsv)
            export ARM_USE_OIDC=true
            
            # Databricks authentication itself works fine
            echo ------------- List experiments -------------
            databricks experiments list-experiments
            
            # But bundle deployment does not
            echo ------------- Deploy bundle -------------
            databricks bundle deploy --log-level=debug --target dev
# databricks.yml
bundle:
  name: my_project

variables:
  service_principle:
    description: Service principal used by the DevOps agent
    default: my-service-principle-id
    
run_as:
  service_principal_name: ${var.service_principle}

# Example resources to deploy
resources:
  experiments:
    my_experiment:
      name: "/Workspace/Users/${var.service_principle}/my_experiment"

targets:
  dev:
    mode: production
    default: true
    workspace:
      host: https://adb-XXX.azuredatabricks.net

Steps to reproduce the behavior

  1. Create a DevOps service connection with Workload Identity Federation
  2. Create an Azure Pipeline with the above azure-pipelines.yml (replace placeholders), using the service connection from step 1
  3. Create a Databricks Asset Bundle with the above databricks.yml (replace placeholders)
  4. Trigger pipeline
  5. Observe error

Expected Behavior

The deployment of the asset bundle should succeed.

Actual Behavior

------------- Deploy bundle -------------
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/index.json
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_SHA256SUMS.72D7468F.sig
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_SHA256SUMS
2024/08/27 08:40:59 [DEBUG] GET https://releases.hashicorp.com/terraform/1.5.5/terraform_1.5.5_linux_amd64.zip
Uploading bundle files to /Users/***/.bundle/my_project/dev/files...
Deploying resources...
Updating deployment state...
Deployment complete!
Error: terraform apply: exit status 1

Error: cannot create mlflow experiment: failed during request visitor: default auth: azure-cli: cannot get access token: ERROR: Please run 'az login' to setup account.
. Config: host=https://adb-XXX.azuredatabricks.net/, azure_client_id=***, azure_tenant_id=XXX. Env: DATABRICKS_HOST, ARM_CLIENT_ID, ARM_TENANT_ID

  with databricks_mlflow_experiment.main,
  on bundle.tf.json line 17, in resource.databricks_mlflow_experiment.main:
  17:       }

Note that the listing of experiments works fine:

------------- List experiments -------------
[
   (expected list of experiments, redacted)
  {
      ...
  },
  ...
]
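
For comparison, a quick way to inspect which auth method the CLI itself resolves (assuming a CLI version that includes the auth describe command):

# Run inside the same AzureCLI@2 step; prints the resolved auth type
# (expected here: azure-cli) and the configuration sources considered
databricks auth describe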

OS and CLI version

As output by the Azure pipeline:

azure-cli                         2.63.0

core                              2.63.0
telemetry                          1.1.0

Extensions:
azure-devops                       1.0.1

Dependencies:
msal                              1.30.0
azure-mgmt-resource               23.1.1

Databricks CLI: v0.227.0

OS: Ubuntu (Microsoft-hosted agent, latest version)

Is this a regression?

I don't know; I'm new to Databricks.

Debug Logs

Output of databricks experiments list-experiments --log-level TRACE:
experiment-list.txt

Output of databricks bundle deploy --log-level=debug --target dev:
bundle-deploy.txt

pietern (Contributor) commented Aug 28, 2024

Chiming in as I ran into the same thing a few weeks ago.

The culprit is the Azure CLI configuration file location. We currently don't forward the AZURE_CONFIG_FILE environment variable, which the AzureCLI@2 task sets (perhaps for isolation, but I don't know for sure). To work around this, set useGlobalConfig so the Azure CLI uses the default configuration file location, where it will always find its session:

- task: AzureCLI@2
  inputs:
    # ...
    useGlobalConfig: true
    # ...


Pim-Mostert (Author) commented

@pietern That indeed works for me too, thanks!

For reference, here is my full working configuration:

variables:
  databricksHost: "https://adb-XXX.azuredatabricks.net"

pool:
  vmImage: "ubuntu-latest"

jobs:
  - job: databricks_asset_bundle
    displayName: "Deploy Databricks Asset Bundle"
    steps:
      - bash: |
          # Install Databricks CLI - see https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-azure-devops
          curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh

          # Verify installation
          databricks --version

          # Create databricks config file ("~" does not expand inside quotes, so use $HOME)
          file="$HOME/.databrickscfg"

          if [ -f "$file" ] ; then
              rm "$file"
          fi

          echo "[DEFAULT]" >> "$file"
          echo "host = $databricksHost" >> "$file"
        displayName: Setup Databricks CLI
      - task: AzureCLI@2
        displayName: Deploy Asset Bundle
        inputs:
          azureSubscription: "my-wif-serviceconnection"
          useGlobalConfig: true
          scriptType: "bash"
          scriptLocation: "inlineScript"
          inlineScript: |
            databricks bundle deploy --target dev

pietern added a commit that referenced this issue Aug 29, 2024
This ensures that the CLI and Terraform can both use an Azure CLI session
configured under a non-standard path. This is the default behavior
on Azure DevOps when using the AzureCLI@2 task.

Fixes #1722.
github-merge-queue bot pushed a commit that referenced this issue Aug 29, 2024
## Changes

This ensures that the CLI and Terraform can both use an Azure CLI
session configured under a non-standard path. This is the default
behavior on Azure DevOps when using the AzureCLI@2 task.

Fixes #1722.

## Tests

Unit test.

pabtorres commented

(quoting Pim-Mostert's working configuration from the comment above)

Hello @Pim-Mostert, in your:

echo "host = $databricksHost" >> ~/.databrickscfg

Did you add the host, client_id and client_secret of the Service Principal?

Pim-Mostert (Author) commented

@pabtorres I only added the host. The necessary credentials are injected under the hood by the AzureCLI task.
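
For clarity: the resulting ~/.databrickscfg contains only the host; the service principal credentials come from the task's environment (the ARM_* variables shown earlier) and never touch disk. The file ends up looking like this (placeholder host):

[DEFAULT]
host = https://adb-XXX.azuredatabricks.net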

rikjansen-hu (Contributor) commented

This issue still persists.

PR #1734 passes along the wrong variable: it should be AZURE_CONFIG_DIR instead of AZURE_CONFIG_FILE.

The suggested workaround (useGlobalConfig) can pose a security risk (e.g. on self-hosted DevOps agents), as authorization credentials are written to a global location, exposing them to future users of the agent.
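
To see which variable is actually in play, print it from inside an AzureCLI@2 inline script (illustrative; the exact path varies per agent and job):

# The task scopes the Azure CLI session to a job-local directory
echo "$AZURE_CONFIG_DIR"    # e.g. /home/vsts/work/_temp/.azclitask
az account show             # succeeds because az reads its session from that directory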

pietern (Contributor) commented Jan 29, 2025

Thanks for flagging. I don't know how I got this wrong... There is no mention of AZURE_CONFIG_FILE anywhere. I must have mixed it up with useGlobalConfig and mistook the task working for the environment variable passthrough working.

andrewnester pushed a commit that referenced this issue Feb 4, 2025
## Changes

Solves #1722 (the current solution passes the wrong variable)

## Tests

None; this is a simple find-and-replace on a previous PR.
Proof that this is the correct variable:
https://learn.microsoft.com/en-us/cli/azure/azure-cli-configuration#cli-configuration-file
This just passes the variable along to the Terraform environment, where it
should be picked up by Terraform (hashicorp/terraform#25416).

Co-authored-by: Rik Jansen <[email protected]>
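
For readers following the mechanics: the bug class here is that a child process launched with a scrubbed environment does not see AZURE_CONFIG_DIR unless the parent explicitly forwards it. A toy shell demonstration (hypothetical path; not the actual CLI code):

# Simulate the CLI (parent) and Terraform (child); env -i starts
# the child from an empty environment
export AZURE_CONFIG_DIR=/tmp/azclitask

env -i PATH="$PATH" sh -c 'echo "without passthrough: [$AZURE_CONFIG_DIR]"'
env -i PATH="$PATH" AZURE_CONFIG_DIR="$AZURE_CONFIG_DIR" \
    sh -c 'echo "with passthrough:    [$AZURE_CONFIG_DIR]"'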