From 1be10819a5168cecdc35b6826bbefcaaff814675 Mon Sep 17 00:00:00 2001
From: Nikki Everett
Date: Mon, 14 Oct 2024 10:29:26 -0500
Subject: [PATCH] DOC-648 flytesnacks edits needed for Neptune and W&B flytekit plugins documentation (#1748)

* update refs

Signed-off-by: nikki everett

* clean up integrations information architecture

Signed-off-by: nikki everett

* fix databricks agent title

Signed-off-by: nikki everett

* move k8s pod plugin to deprecated integrations section and add deprecation notice

Signed-off-by: nikki everett

---------

Signed-off-by: nikki everett
---
 .../deprecated_integrations/index.md |  10 -
 docs/integrations/index.md           | 238 +++++++++++-------
 examples/databricks_agent/README.md  |   2 +-
 examples/k8s_pod_plugin/README.md    |   4 +
 examples/neptune_plugin/README.md    |   6 +-
 examples/wandb_plugin/README.md      |   2 +-
 6 files changed, 150 insertions(+), 112 deletions(-)
 delete mode 100644 docs/integrations/deprecated_integrations/index.md

diff --git a/docs/integrations/deprecated_integrations/index.md b/docs/integrations/deprecated_integrations/index.md
deleted file mode 100644
index ff5c12e2a..000000000
--- a/docs/integrations/deprecated_integrations/index.md
+++ /dev/null
@@ -1,10 +0,0 @@
-# Deprecated integrations
-
-```{toctree}
-:maxdepth: -1
-:hidden:
-
-BigQuery plugin
-Databricks plugin
-Snowflake plugin
-```
diff --git a/docs/integrations/index.md b/docs/integrations/index.md
index 34da66778..283ac34d7 100644
--- a/docs/integrations/index.md
+++ b/docs/integrations/index.md
@@ -8,83 +8,79 @@ Flyte is designed to be highly extensible and can be customized in multiple ways
 Want to contribute an example? Check out the {ref}`Documentation contribution guide `.
 ```

-## Flytekit Plugins
+## Flytekit plugins

-Flytekit plugins are simple plugins that can be implemented purely in python, unit tested locally and allow extending
-Flytekit functionality. These plugins can be anything and for comparison can be thought of like
-[Airflow Operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/index.html).
+Flytekit plugins can be implemented purely in Python, unit tested locally, and allow extending
+Flytekit functionality. For comparison, these plugins can be thought of like
+[Airflow operators](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/index.html).

 ```{list-table}
 :header-rows: 0
 :widths: 20 30

-* - {doc}`SQL `
-  - Execute SQL queries as tasks.
-* - {doc}`Great Expectations `
-  - Validate data with `great_expectations`.
-* - {doc}`Papermill `
-  - Execute Jupyter Notebooks with `papermill`.
-* - {doc}`Pandera `
-  - Validate pandas dataframes with `pandera`.
-* - {doc}`Modin `
-  - Scale pandas workflows with `modin`.
-* - {doc}`Dolt `
-  - Version your SQL database with `dolt`.
 * - {doc}`DBT `
   - Run and test your `dbt` pipelines in Flyte.
-* - {doc}`WhyLogs `
-  - `whylogs`: the open standard for data logging.
-* - {doc}`MLFlow `
-  - `mlflow`: the open standard for model tracking.
-* - {doc}`ONNX `
-  - Convert ML models to ONNX models seamlessly.
+* - {doc}`Dolt `
+  - Version your SQL database with `dolt`.
 * - {doc}`DuckDB `
   - Run analytical queries using DuckDB.
-* - {doc}`Weights and Biases `
-  - `wandb`: Machine learning platform to build better models faster.
+* - {doc}`Great Expectations `
+  - Validate data with `great_expectations`.
+* - {doc}`MLFlow `
+  - `mlflow`: the open standard for model tracking.
+* - {doc}`Modin `
+  - Scale pandas workflows with `modin`.
 * - {doc}`Neptune `
   - `neptune`: Neptune is the MLOps stack component for experiment tracking.
 * - {doc}`NIM `
   - Serve optimized model containers with NIM.
 * - {doc}`Ollama `
   - Serve fine-tuned LLMs with Ollama in a Flyte workflow.
+* - {doc}`ONNX `
+  - Convert ML models to ONNX models seamlessly.
+* - {doc}`Pandera `
+  - Validate pandas dataframes with `pandera`.
+* - {doc}`Papermill `
+  - Execute Jupyter Notebooks with `papermill`.
+* - {doc}`SQL `
+  - Execute SQL queries as tasks.
+* - {doc}`Weights and Biases `
+  - `wandb`: Machine learning platform to build better models faster.
+* - {doc}`WhyLogs `
+  - `whylogs`: the open standard for data logging.
 ```

-:::{dropdown} {fa}`info-circle` Using flytekit plugins
+:::{dropdown} {fa}`info-circle` Using Flytekit plugins
 :animate: fade-in-slide-down

-Data is automatically marshalled and unmarshalled in and out of the plugin. Users should mostly implement the
-{py:class}`~flytekit.core.base_task.PythonTask` API defined in Flytekit.
+Data is automatically marshalled and unmarshalled in and out of the plugin. Users should mostly implement the {py:class}`~flytekit.core.base_task.PythonTask` API defined in Flytekit.

-Flytekit Plugins are lazily loaded and can be released independently like libraries. We follow a convention to name the
-plugin like `flytekitplugins-*`, where `*` indicates the package to be integrated into Flytekit. For example
-`flytekitplugins-papermill` enables users to author Flytekit tasks using [Papermill](https://papermill.readthedocs.io/en/latest/).
+Flytekit plugins are lazily loaded and can be released independently like libraries. The naming convention is `flytekitplugins-*`, where `*` indicates the package to be integrated into Flytekit. For example, `flytekitplugins-papermill` enables users to author Flytekit tasks using [Papermill](https://papermill.readthedocs.io/en/latest/).

 You can find the plugins maintained by the core Flyte team [here](https://github.com/flyteorg/flytekit/tree/master/plugins).
 :::

-## Native Backend Plugins
+## Native backend plugins

-Native Backend Plugins are the plugins that can be executed without any external service dependencies because the compute is
-orchestrated by Flyte itself, within its provisioned Kubernetes clusters.
+Native backend plugins can be executed without any external service dependencies because the compute is orchestrated by Flyte itself, within its provisioned Kubernetes clusters.

 ```{list-table}
 :header-rows: 0
 :widths: 20 30

-* - {doc}`K8s Pods `
-  - Execute K8s pods for arbitrary workloads.
-* - {doc}`K8s Cluster Dask Jobs `
-  - Run Dask jobs on a K8s Cluster.
-* - {doc}`K8s Cluster Spark Jobs `
-  - Run Spark jobs on a K8s Cluster.
 * - {doc}`Kubeflow PyTorch `
   - Run distributed PyTorch training jobs using `Kubeflow`.
 * - {doc}`Kubeflow TensorFlow `
   - Run distributed TensorFlow training jobs using `Kubeflow`.
+* - {doc}`Kubernetes pods `
+  - Execute Kubernetes pods for arbitrary workloads.
+* - {doc}`Kubernetes cluster Dask jobs `
+  - Run Dask jobs on a Kubernetes Cluster.
+* - {doc}`Kubernetes cluster Spark jobs `
+  - Run Spark jobs on a Kubernetes Cluster.
 * - {doc}`MPI Operator `
   - Run distributed deep learning training jobs using Horovod and MPI.
-* - {doc}`Ray Task `
+* - {doc}`Ray `
   - Run Ray jobs on a K8s Cluster.
 ```

@@ -98,54 +94,53 @@ orchestrated by Flyte itself, within its provisioned Kubernetes clusters.
 :header-rows: 0
 :widths: 20 30

+* - {doc}`AWS SageMaker Inference agent `
+  - Deploy models and create, as well as trigger inference endpoints on AWS SageMaker.
 * - {doc}`Airflow agent `
   - Run Airflow jobs in your workflows with the Airflow agent.
 * - {doc}`BigQuery agent `
   - Run BigQuery jobs in your workflows with the BigQuery agent.
 * - {doc}`ChatGPT agent `
   - Run ChatGPT jobs in your workflows with the ChatGPT agent.
-* - {doc}`Databricks `
+* - {doc}`Databricks agent `
   - Run Databricks jobs in your workflows with the Databricks agent.
-* - {doc}`Memory Machine Cloud `
+* - {doc}`Memory Machine Cloud agent `
   - Execute tasks using the MemVerge Memory Machine Cloud agent.
 * - {doc}`OpenAI Batch `
   - Submit requests for asynchronous batch processing on OpenAI.
-* - {doc}`SageMaker Inference `
-  - Deploy models and create, as well as trigger inference endpoints on SageMaker.
-* - {doc}`Sensor `
+* - {doc}`Sensor agent `
   - Run sensor jobs in your workflows with the sensor agent.
-* - {doc}`Snowflake `
+* - {doc}`Snowflake agent `
   - Run Snowflake jobs in your workflows with the Snowflake agent.
 ```

 (external_service_backend_plugins)=

-## External Service Backend Plugins
+## External service backend plugins

-As the term suggests, external service backend plugins rely on external services like
-[Hive](https://docs.qubole.com/en/latest/user-guide/engines/hive/index.html) for handling the workload defined in the Flyte task that uses the respective plugin.
+As the term suggests, these plugins rely on external services to handle the workload defined in the Flyte task that uses the plugin.

 ```{list-table}
 :header-rows: 0
 :widths: 20 30

-* - {doc}`AWS Athena plugin `
+* - {doc}`AWS Athena `
   - Execute queries using AWS Athena
-* - {doc}`AWS Batch plugin `
+* - {doc}`AWS Batch `
   - Running tasks and workflows on AWS batch service
 * - {doc}`Flyte Interactive `
   - Execute tasks using Flyte Interactive to debug.
-* - {doc}`Hive plugin `
+* - {doc}`Hive `
   - Run Hive jobs in your workflows.
 ```

 (enable-backend-plugins)=

-::::{dropdown} {fa}`info-circle` Enabling Backend Plugins
+::::{dropdown} {fa}`info-circle` Enabling backend plugins
 :animate: fade-in-slide-down

-To enable a backend plugin you have to add the `ID` of the plugin to the enabled plugins list. The `enabled-plugins` is available under the `tasks > task-plugins` section of FlytePropeller's configuration.
-The plugin configuration structure is defined [here](https://pkg.go.dev/github.com/flyteorg/flytepropeller@v0.6.1/pkg/controller/nodes/task/config#TaskPluginConfig). An example of the config follows,
+To enable a backend plugin, you must add the `ID` of the plugin to the enabled plugins list. The `enabled-plugins` is available under the `tasks > task-plugins` section of FlytePropeller's configuration.
+The plugin configuration structure is defined [here](https://pkg.go.dev/github.com/flyteorg/flytepropeller@v0.6.1/pkg/controller/nodes/task/config#TaskPluginConfig). An example of the config follows:

 ```yaml
 tasks:
@@ -160,15 +155,15 @@ tasks:
       container_array: k8s-array
 ```

-**Finding the `ID` of the Backend Plugin**
+**Finding the `ID` of the backend plugin**

-This is a little tricky since you have to look at the source code of the plugin to figure out the `ID`. In the case of Spark, for example, the value of `ID` is used [here](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L424) here, defined as [spark](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L41).
+To find the `ID` of the backend plugin, look at the source code of the plugin. For examples, in the case of Spark, the value of `ID` is used [here](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L424), defined as [spark](https://github.com/flyteorg/flyteplugins/blob/v0.5.25/go/tasks/plugins/k8s/spark/spark.go#L41).

 ::::

-## SDKs for Writing Tasks and Workflows
+## SDKs for writing tasks and workflows

-The {ref}`community ` would love to help you with your own ideas of building a new SDK. Currently the available SDKs are:
+The {ref}`community ` would love to help you build new SDKs. Currently, the available SDKs are:

 ```{list-table}
 :header-rows: 0
@@ -180,7 +175,7 @@ The {ref}`community ` would love to help you with your own ideas of b
   - The Java/Scala SDK for Flyte.
 ```

-## Flyte Operators
+## Flyte operators

 Flyte can be integrated with other orchestrators to help you leverage Flyte's
 constructs natively within other orchestration tools.
@@ -196,42 +191,91 @@ constructs natively within other orchestration tools.
 ```{toctree}
 :maxdepth: -1
 :hidden:
+:caption: Flytekit plugins
+
+DBT
+Dolt
+DuckDB
+Great Expectations
+MLFlow
+Modin
+Neptune
+NIM
+Ollama
+ONNX
+Pandera
+Papermill
+SQL
+Weights & Biases
+WhyLogs
+```
+
+```{toctree}
+:maxdepth: -1
+:hidden:
+:caption: Native backend plugins
+
+Kubeflow PyTorch
+Kubeflow TensorFlow
+Kubernetes cluster Dask jobs
+Kubernetes cluster Spark jobs
+MPI Operator
+Ray
+```
+
+```{toctree}
+:maxdepth: -1
+:hidden:
+:caption: Flyte agents
+
+Airflow agent
+AWS Sagemaker inference agent
+BigQuery agent
+ChatGPT agent
+Databricks agent
+Memory Machine Cloud agent
+OpenAI batch agent
+Sensor agent
+Snowflake agent
+```
+
+```{toctree}
+:maxdepth: -1
+:hidden:
+:caption: External service backend plugins
+
+AWS Athena
+AWS Batch
+Flyte Interactive
+Hive
+
+```
+
+```{toctree}
+:maxdepth: -1
+:hidden:
+:caption: SDKs for writing tasks and workflows
+
+flytekit
+flytekit-java
+
+```
+
+```{toctree}
+:maxdepth: -1
+:hidden:
+:caption: Flyte operators
+
+Airflow
+```
+
+```{toctree}
+:maxdepth: -1
+:hidden:
+:caption: Deprecated integrations

-/auto_examples/airflow_agent/index
-/auto_examples/airflow_plugin/index
-/auto_examples/athena_plugin/index
-/auto_examples/aws_batch_plugin/index
-/auto_examples/bigquery_agent/index
-/auto_examples/chatgpt_agent/index
-/auto_examples/k8s_dask_plugin/index
-/auto_examples/databricks_agent/index
-/auto_examples/dbt_plugin/index
-/auto_examples/dolt_plugin/index
-/auto_examples/duckdb_plugin/index
-/auto_examples/flyteinteractive_plugin/index
-/auto_examples/greatexpectations_plugin/index
-/auto_examples/hive_plugin/index
-/auto_examples/k8s_pod_plugin/index
-/auto_examples/mlflow_plugin/index
-/auto_examples/mmcloud_agent/index
-/auto_examples/modin_plugin/index
-/auto_examples/kfmpi_plugin/index
-/auto_examples/neptune_plugin/index
-/auto_examples/nim_plugin/index
-/auto_examples/ollama_plugin/index
-/auto_examples/onnx_plugin/index
-/auto_examples/openai_batch_agent/index
-/auto_examples/papermill_plugin/index
-/auto_examples/pandera_plugin/index
-/auto_examples/kfpytorch_plugin/index
-/auto_examples/ray_plugin/index
-/auto_examples/sagemaker_inference_agent/index
-/auto_examples/sensor/index
-/auto_examples/snowflake_agent/index
-/auto_examples/k8s_spark_plugin/index
-/auto_examples/sql_plugin/index
-/auto_examples/kftensorflow_plugin/index
-/auto_examples/wandb_plugin/index
-/auto_examples/whylogs_plugin/index
-Deprecated integrations
+BigQuery plugin
+Databricks plugin
+Kubernetes pods
+Snowflake plugin
 ```
diff --git a/examples/databricks_agent/README.md b/examples/databricks_agent/README.md
index 6d6f4e754..0c1148926 100644
--- a/examples/databricks_agent/README.md
+++ b/examples/databricks_agent/README.md
@@ -1,6 +1,6 @@
 (databricks_agent)=

-# Databricks agent example
+# Databricks agent

 ```{eval-rst}
 .. tags:: Spark, Integration, DistributedComputing, Data, Advanced
diff --git a/examples/k8s_pod_plugin/README.md b/examples/k8s_pod_plugin/README.md
index 4a1646be5..2f18928dd 100644
--- a/examples/k8s_pod_plugin/README.md
+++ b/examples/k8s_pod_plugin/README.md
@@ -4,6 +4,10 @@
 .. tags:: Integration, Kubernetes, Advanced
 ```

+```{important}
+This plugin is no longer needed and is here only for backwards compatibility. No new versions will be published after v1.13.x Please use the `pod_template` and `pod_template_name` arguments to `@task` as described in the {ref}`Kubernetes task pod configuration guide ` instead.
+```
+
 Flyte tasks, represented by the {py:func}`@task ` decorator, are essentially single functions that run in one container.
 However, there may be situations where you need to run a job with more than one container or require additional capabilities, such as:
diff --git a/examples/neptune_plugin/README.md b/examples/neptune_plugin/README.md
index bbf76fbf4..67c4d845d 100644
--- a/examples/neptune_plugin/README.md
+++ b/examples/neptune_plugin/README.md
@@ -1,6 +1,6 @@
-(neptune)=
+(neptune_plugin)=

-# Neptune
+# Neptune plugin

 ```{eval-rst}
 .. tags:: Integration, Data, Metrics, Intermediate
@@ -10,7 +10,7 @@

 ## Installation

-To install the Flyte Neptune plugin, , run the following command:
+To install the Flyte Neptune plugin, run the following command:

 ```bash
 pip install flytekitplugins-neptune
diff --git a/examples/wandb_plugin/README.md b/examples/wandb_plugin/README.md
index 53c3ddfd4..b5f1da584 100644
--- a/examples/wandb_plugin/README.md
+++ b/examples/wandb_plugin/README.md
@@ -1,4 +1,4 @@
-(wandb)=
+(wandb_plugin)=

 # Weights and Biases