Update metadata-service to latest version + docs #35419

Merged: 1 commit, Feb 19, 2024

@@ -1,6 +1,6 @@
[tool.poetry]
name = "metadata-service"
-version = "0.3.3"
+version = "0.3.4"
description = ""
authors = ["Ben Church <[email protected]>"]
readme = "README.md"
92 changes: 67 additions & 25 deletions airbyte-ci/connectors/metadata_service/orchestrator/README.md
@@ -1,14 +1,15 @@
# Connector Orchestrator

This is the Orchestrator for Airbyte metadata built on Dagster.

# Setup

## Prerequisites

#### Poetry

Before you can start working on this project, you will need to have Poetry installed on your system.
Please follow the instructions below to install Poetry:

1. Open your terminal or command prompt.
2. Install Poetry using the recommended installation method:
@@ -23,125 +24,165 @@ Alternatively, you can use `pip` to install Poetry:

```bash
pip install --user poetry
```

3. After the installation is complete, close and reopen your terminal to ensure the newly installed
`poetry` command is available in your system's PATH.

For more detailed instructions and alternative installation methods, please refer to the official
Poetry documentation: https://python-poetry.org/docs/#installation

### Using Poetry in the Project

Once Poetry is installed, you can use it to manage the project's dependencies and virtual
environment. To get started, navigate to the project's root directory in your terminal and follow
these steps:

## Installation

```bash
poetry install
cp .env.template .env
```

## Create a GCP Service Account and Dev Bucket

Developing against the orchestrator requires a development bucket in GCP.

The orchestrator will use this bucket to:

- store important output files (e.g. reports).
- watch for changes to the `registry` directory in the bucket.

However, all temporary files are stored in a local directory.

To create a development bucket:

1. Create a GCP Service Account with the following permissions:
   - Storage Admin
   - Storage Object Admin
   - Storage Object Creator
   - Storage Object Viewer
2. Create a PUBLIC GCS bucket
3. Add the service account as a member of the bucket with the following permissions:
   - Storage Admin
   - Storage Object Admin
   - Storage Object Creator
   - Storage Object Viewer

4. Add the following environment variables to your `.env` file:
   - `METADATA_BUCKET`
   - `GCS_CREDENTIALS`

Note that `GCS_CREDENTIALS` should be the raw JSON string of the service account credentials.

Here is an example of how to import the service account credentials into your environment:

```bash
export GCS_CREDENTIALS=`cat /path/to/credentials.json`
```
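
Because `GCS_CREDENTIALS` must hold the JSON itself rather than a path to the file, a quick check before starting the orchestrator can catch a common mistake. The following is a minimal sketch, not part of the project: the function name `check_orchestrator_env` and its messages are hypothetical, and it only assumes the two variable names listed above.

```python
import json


def check_orchestrator_env(env):
    """Return a list of problems found in the orchestrator environment variables.

    `check_orchestrator_env` is a hypothetical helper for illustration only.
    """
    problems = []

    # METADATA_BUCKET should name the development bucket.
    if not env.get("METADATA_BUCKET"):
        problems.append("METADATA_BUCKET is not set")

    # GCS_CREDENTIALS should hold the credentials JSON itself, not a file path.
    raw = env.get("GCS_CREDENTIALS", "")
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        problems.append("GCS_CREDENTIALS is not valid JSON (is it a file path?)")
    else:
        if not isinstance(parsed, dict):
            problems.append("GCS_CREDENTIALS should be a JSON object")

    return problems
```

Calling `check_orchestrator_env(os.environ)` after loading your `.env` file returns an empty list when both variables look plausible. It does not verify that the credentials actually grant access to the bucket.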

## The Orchestrator

The orchestrator (built using Dagster) is responsible for orchestrating the various metadata
processes.

Dagster has a number of concepts that are important to understand before working on the
orchestrator.

1. Assets
2. Resources
3. Schedules
4. Sensors
5. Ops

Refer to the [Dagster documentation](https://docs.dagster.io/concepts) for more information on these
concepts.

### Starting the Dagster Daemons

Start the orchestrator with the following command:

```bash
poetry run dagster dev
```

Then you can access the Dagster UI at http://localhost:3000

Note that it's important to use `dagster dev` instead of `dagit` because `dagster dev` starts
additional services that are required for the orchestrator to run, namely the sensor service.

### Materializing Assets with the UI

When you navigate to the orchestrator in the UI, you will see a list of assets that are available to
be materialized.

From here you have the following options:

1. Materialize all assets
2. Select a subset of assets to materialize
3. Enable a sensor to automatically materialize assets

### Materializing Assets without the UI

In some cases you may want to run the orchestrator without the UI. To learn more about Dagster's CLI
commands, see the [Dagster CLI documentation](https://docs.dagster.io/_apidocs/cli).

## Running Tests

```bash
poetry run pytest
```

## Deploying to Dagster Automatically

Review comment (Contributor):

Nit: I would potentially separate these into

"How the orchestrator is deployed" (which happens on any changes to the orchestrator, including actual orchestrator code changes, or updating the lib dependency)

and

"how to release updates to the lib"

which maybe goes in the lib readme honestly.

Just a nit though, we can move stuff around later - let's get the code in!


GitHub Actions is used to automatically deploy the orchestrator to Dagster Cloud
([GitHub Action](https://github.com/airbytehq/airbyte/blob/master/.github/workflows/metadata_service_deploy_orchestrator_dagger.yml)).

1. Update the version of your code (`../lib`) and update the version of the package in
`pyproject.toml`
1. In this project (`../orchestrator`), run `poetry lock --no-update` to bump the version of the
requirements you may have changed in
`airbyte-ci/connectors/metadata_service/orchestrator/poetry.lock`
1. Push your changes to the `master` branch and the orchestrator will be automatically deployed to
Dagster Cloud.
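
Before pushing, it can help to confirm that the version in `pyproject.toml` really was bumped, as this PR does by moving from `0.3.3` to `0.3.4`. The following is a minimal sketch under stated assumptions: the helpers `read_poetry_version` and `is_version_bump` are hypothetical, not part of the repository, and they only handle plain numeric `X.Y.Z` versions.

```python
import re


def read_poetry_version(pyproject_text):
    """Extract the first `version = "..."` value from pyproject.toml text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version field found")
    return match.group(1)


def is_version_bump(old, new):
    """Return True when `new` is strictly greater than `old`.

    Assumes plain numeric dotted versions such as 0.3.4 (no pre-release tags).
    """
    def as_tuple(version):
        return tuple(int(part) for part in version.split("."))

    return as_tuple(new) > as_tuple(old)
```

The versions are compared as integer tuples rather than as strings so that `0.3.10` counts as newer than `0.3.9`.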

Review comment on lines +136 to +148 (Contributor Author):

This was the path I took

Review comment (Contributor Author):

Note: There was no changelog to update?
Note: There was no changelog to update?

## Deploying to Dagster Cloud manually
-Note: This is a temporary solution until we have a CI/CD pipeline setup.

-Getting the CICD setup is currently blocked until we hear back from Dagster on a better way to use relative imports in a Dagster Cloud Deployment.
+This should only be needed if the above (automatic deployment) fails.

### Installing the dagster-cloud cli

```bash
pip install dagster-cloud
dagster-cloud config
```

### Deploying the orchestrator

```bash
cd orchestrator
DAGSTER_CLOUD_API_TOKEN=<YOUR-DAGSTER-CLOUD-TOKEN> airbyte-ci metadata deploy orchestrator
```

# Using the Orchestrator to create a Connector Registry for Development

The orchestrator can be used to create a connector registry for development purposes.

## Setup

First you will need to set up the orchestrator as described above.

Then you will want to do the following:

### 1. Mirror the production bucket

Use the Google Cloud Console to mirror the production bucket
(prod-airbyte-cloud-connector-metadata-service) to your development bucket.

[Docs](https://cloud.google.com/storage-transfer/docs/cloud-storage-to-cloud-storage)

### 2. Upload any local metadata files you want to test changes with

```bash
# assuming your terminal is in the same location as this readme
cd ../lib
@@ -150,6 +191,7 @@ poetry run metadata_service upload <PATH TO METADATA FILE> <NAME OF YOUR BUCKET>
```

### 3. Generate the registry

```bash
poetry run dagster dev
open http://localhost:3000
```
25 changes: 23 additions & 2 deletions airbyte-ci/connectors/metadata_service/orchestrator/poetry.lock
