Cluster configuration updates (#2029)
* Static and dynamic configuration + Clusters concept updates

* added link

* more updates

* Format

* what have i done

* built out cluster configuration details

* Update docs-src/references/configuration.md

* Update docs-src/references/configuration.md

* Format

* added commonly used dynamic config keys

* added commonly used dynamic config keys for reference

* fix things

* Format

* minor corrections and fixes

* Format

* addressed some comments; more updates to come

* Format

* Updates per feedback from yycpt, tihomir. and dnr

* Updated per David's feedback

* Apply suggestions from code review

Co-authored-by: David Reiss <[email protected]>

* more updates from feedback

* typo fixes

* fixes typos and some details from David's feedback

* cloud limit update for EDU-726

* Addressing comments from David and Yichao

* Apply suggestions from Dail's code review

Co-authored-by: Dail Magee Jr <[email protected]>

* Format

* Updates per Dail's review and general cleanup.

* minor update to persistenceNamespaceMaxQPS descriptions per Yichao's clarification.

* Apply suggestions from code review

Co-authored-by: David Reiss <[email protected]>

* Format

* Apply suggestions from code review

Co-authored-by: Dail Magee Jr <[email protected]>

* final updates

---------

Co-authored-by: aarohib <[email protected]>
Co-authored-by: David Reiss <[email protected]>
Co-authored-by: Dail Magee Jr <[email protected]>
4 people authored Jun 15, 2023
1 parent 6429238 commit 924378c
Showing 42 changed files with 1,253 additions and 301 deletions.
14 changes: 9 additions & 5 deletions ASSEMBLY_REPORT.md

20 changes: 20 additions & 0 deletions assembly/guide-configs/concepts/clusters.json
@@ -43,10 +43,30 @@
"type": "h3",
"id": "concepts/what-is-a-retention-period"
},
{
"type": "h2",
"id": "concepts/what-is-persistence"
},
{
"type": "h2",
"id": "concepts/what-is-visibility"
},
{
"type": "h2",
"id": "concepts/what-is-archival"
},
{
"type": "h2",
"id": "concepts/what-is-cluster-configuration"
},
{
"type": "h3",
"id": "concepts/what-is-cluster-security-configuration"
},
{
"type": "h3",
"id": "concepts/what-is-cluster-obervability"
},
{
"type": "h2",
"id": "concepts/what-is-multi-cluster-replication"
2 changes: 2 additions & 0 deletions docs-src/clusters/how-to-set-up-archival.md
@@ -89,6 +89,8 @@ The following table showcases acceptable values for each configuration and what
| `namespaceDefaults.archival.history.state` | `enabled`, `disabled` | Default state of the Archival feature whenever a new Namespace is created without specifying the Archival state. |
| `namespaceDefaults.archival.history.URI` | Valid URI | Must be a URI of the file store location and match a schema that correlates to a provider. |

Additional resources: [Cluster configuration reference](/references/configuration).

#### Namespace creation

Although Archival is configured at the cluster level, it operates independently within each Namespace.
113 changes: 6 additions & 107 deletions docs-src/concepts/what-is-a-temporal-cluster.md
@@ -2,128 +2,27 @@
id: what-is-a-temporal-cluster
title: What is a Temporal Cluster?
sidebar_label: Temporal Cluster
description: A Temporal Cluster is the Temporal Server paired with persistence.
description: A Temporal Cluster is a Temporal Server paired with Persistence and Visibility stores.
tags:
- term
- explanation
---

A Temporal Cluster is the group of services, known as the [Temporal Server](/concepts/what-is-the-temporal-server), combined with persistence stores, that together act as a component of the Temporal Platform.
A Temporal Cluster is the group of services, known as the [Temporal Server](/concepts/what-is-the-temporal-server), combined with [Persistence](/concepts/what-is-persistence) and [Visibility](/concepts/what-is-visibility) stores, that together act as a component of the Temporal Platform.

- [How to quickly install a Temporal Cluster for testing and development](/kb/all-the-ways-to-run-a-cluster)
- [Cluster deployment guide](/cluster-deployment-guide)

![A Temporal Cluster (Server + persistence)](/diagrams/temporal-cluster.svg)

### Persistence

A Temporal Cluster's only required dependency for basic operation is a database.
Multiple types of databases are supported.

![Persistence](/diagrams/temporal-database.svg)

The database stores the following types of data:

- Tasks: Tasks to be dispatched.
- State of Workflow Executions:
- Execution table: A capture of the mutable state of Workflow Executions.
- History table: An append only log of Workflow Execution History Events.
- Namespace metadata: Metadata of each Namespace in the Cluster.
- Visibility data: Enables operations like "show all running Workflow Executions".
For production environments, we recommend using Elasticsearch.

An Elasticsearch database must be added to enable [Advanced Visibility](/concepts/what-is-advanced-visibility) on Temporal Server versions 1.19.1 and earlier.

With Temporal Server version 1.20 and later, Advanced Visibility features are available on SQL databases like MySQL (version 8.0.17 and later), PostgreSQL (version 12 and later), SQLite (v3.31.0 and later) and Elasticsearch.

#### Dependency versions

Temporal tests compatibility by spanning the **minimum** and **maximum** stable non-EOL major versions for each supported database.
As of time of writing, these specific versions are used in our test pipelines and actively tested before we release any version of Temporal:

- **Cassandra v3.11 and v4.0**
- **PostgreSQL v10.18 and v13.4**
- **MySQL v5.7 and v8.0** (specifically 8.0.19+ due to a bug)

We update these support ranges once a year.
The release notes of each Temporal Server declare when we plan to drop support for database versions reaching End of Life.

- Because Temporal Server primarily relies on core database functionality, we do not expect compatibility to break often.
Temporal has no opinions on database upgrade paths; as long as you can upgrade your database according to each project's specifications, Temporal should work with any version within supported ranges.
- We do not run tests with vendors like Vitess and CockroachDB, so you rely on their compatibility claims if you use them.
Feel free to discuss them with fellow users [in our forum](https://community.temporal.io/).
- Temporal also supports SQLite v3.x persistence, but this is meant only for development and testing, not production usage.

### Monitoring and observation

Temporal emits metrics by default in a format that is supported by Prometheus.
Monitoring and observing those metrics is optional.
Any metrics software that supports the same format can be used. Currently, we test with the following Prometheus and Grafana versions:

- **Prometheus >= v2.0**
- **Grafana >= v2.5**

### Visibility
<!-- ### Visibility
Commenting this out because it is out of place. Using the what is visibility concept topic in the guide instead.
Also these details are covered in the Visibility store setup under cluster deployment.
Temporal has built-in [Visibility](/concepts/what-is-visibility) features.
To enhance this feature, Temporal supports an [integration with Elasticsearch](/clusters/how-to-integrate-elasticsearch-into-a-temporal-cluster).
- Elasticsearch v8 is supported from Temporal version 1.18.0 onwards
- Elasticsearch v7.10 is supported from Temporal version 1.7.0 onwards
- Elasticsearch v6.8 is supported up to Temporal version 1.17.x
- Elasticsearch v6.8 and v7.10 versions are explicitly supported with AWS Elasticsearch

### mTLS encryption

Temporal supports Mutual Transport Layer Security (mTLS) as a method of encrypting network traffic between services within a Temporal Cluster, or between application processes and a Cluster.

Mutual TLS can be enabled in Temporal’s [TLS configuration](/references/configuration#tls).
This configuration can be passed through `WithConfig` or `WithConfigLoader`.

This configuration includes two sections that serve to separate intra-cluster and external traffic. That way, different certificates and settings can be used to encrypt each section of traffic:

- `internode`: configuration for encrypting communication between nodes within the Cluster.
- `frontend`: configuration for encrypting the Frontend's public endpoints

### Temporal Client connections

A client's network access can be limited by using certificates issued by a specific Certificate Authority (CA).

To restrict access to Temporal Cluster endpoints, use the `clientCaFiles` or `clientCaData` property and the `requireClientAuth` property.
These properties can be specified in both the `internode` and `frontend` sections of the [mTLS configuration](/references/configuration#tls).

#### Server name specification

Specify the `serverName` in the `client` section of your mTLS configuration to prevent spoofing and [MITM attacks](https://en.wikipedia.org/wiki/Man-in-the-middle_attack).

Entering a value for `serverName` enables established connections to authenticate the endpoint.
This ensures that the server certificate presented to any connected client has the specified server name in its CN property.

This measure can be used for `internode` and `frontend` endpoints.

For more information on mTLS configuration, refer to our [TLS configuration guide](/references/configuration#tls).

### Auth

**Authentication** is the process of verifying users who want to access your application are actually the users you want accessing it.
**Authorization** is the verification of applications and data that a user on your Cluster or application has access to.

Temporal has several authentication protocols that can be set to restrict access to your data.
These protocols address three areas: servers, client connections, and users.

Server attacks can be prevented by specifying `serverName` in the `client` section of your mTLS configuration.
This can be done for both `frontend` and `internode` endpoints.

Client connections can be restricted to certain endpoints by requiring certificates from a specific CA.
Modify the `clientCaFiles`, `clientCaData`, and `requireClientAuth` properties in the `internode` and `frontend` sections of the mTLS configuration.

User access can be restricted through extensibility points and plugins.
When implemented, the `frontend` invokes the plugin before executing the requested operation.

Temporal offers two plugin interfaces for API call authentication and authorization.

- [`ClaimMapper`](/concepts/what-is-a-claimmapper-plugin)
- [`Authorizer`](/concepts/what-is-an-authorizer-plugin)

The logic of both plugins can be customized to fit a variety of use cases.
When provided, the frontend invokes the implementation of the plugins before running the requested operation.
- Elasticsearch v6.8 and v7.10 versions are explicitly supported with AWS Elasticsearch -->
16 changes: 9 additions & 7 deletions docs-src/concepts/what-is-a-workflow-execution.md
@@ -148,19 +148,21 @@ For example, it may be reasonable to use Continue-As-New once per day for a long
Each pending Activity generates a metadata entry in the Workflow's mutable state.
Too many entries create a large mutable state, which causes unstable persistence.

To protect the system, Temporal enforces a maximum of 50,000 pending Activities, Child Workflows, external Workflows, and Signals.
To protect the system, Temporal enforces a maximum of 50,000 pending Activities, Child Workflows, Signals, and Workflow cancellation requests.
Currently, there is no limit on the total number of Signals that a Workflow Execution can receive. <!--From Temporal server v1.21, the default maximum number of Signals that a Workflow Execution can receive is 10000. -->
These limits are set with the following [dynamic configuration keys](https://github.com/temporalio/temporal/blob/master/service/history/configs/config.go):

- `NumPendingChildExecutionsLimit`
- `NumPendingActivitiesLimit`
- `NumPendingSignals`
- `NumPendingCancelRequestsLimit`
- `limit.numPendingChildExecutions.error`
- `limit.numPendingActivities.error`
- `limit.numPendingSignals.error`
- `limit.numPendingCancelRequests.error`
- `history.maximumSignalsPerExecution`

By default, Temporal fails Workflow Task Executions that would cause the Workflow to surpass 50,000 pending Activities, Child Workflows, external Workflows, or Signals.
By default, Temporal fails Workflow Task Executions that would cause the Workflow to surpass 50,000 <!--2000 in from v1.21--> pending Activities, Child Workflows, Workflow cancellation requests, or Signals. <!-- The Workflow Execution fails if the number of pending Signals exceeds 2000, or if the total number of Signals received exceeds 10000. -->
Similar constraints are enforced for `SignalExternalWorkflowExecution`, `RequestCancelExternalWorkflowExecution`, and `StartChildWorkflowExecution` Commands.
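
As a rough illustration, a dynamic configuration file for a self-hosted Cluster could lower these limits as sketched below. The values and the Namespace name `example-namespace` are assumptions made for this example, not recommendations. Check the dynamic configuration reference to confirm whether a given key honors a `namespace` constraint.

```yaml
# Sketch of dynamic configuration overrides; all values are illustrative.
limit.numPendingActivities.error:
  - value: 2000 # Cluster-wide override of the 50,000 default
    constraints: {}
limit.numPendingSignals.error:
  - value: 1000 # applies only to the hypothetical Namespace named below
    constraints:
      namespace: "example-namespace"
  - value: 2000 # fallback for every other Namespace
    constraints: {}
```

Each key maps to a list of `value` and `constraints` pairs; an entry with empty constraints acts as the general override.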

:::note

Cloud users are limited to 2,000 each of pending Activities, Child Workflows, external Workflows, and Signals.
Cloud users are limited to 2,000 each of pending Activities, Child Workflows, Workflow cancellation requests, and Signals.

:::
55 changes: 55 additions & 0 deletions docs-src/concepts/what-is-cluster-configuration.md
@@ -0,0 +1,55 @@
---
id: what-is-cluster-configuration
title: What is Cluster configuration?
sidebar_label: Cluster configuration
description: Cluster Configuration is the setup and configuration details of your Temporal Cluster, defined using YAML.
tags:
- term
- explanation
---

Cluster configuration is the setup and configuration details of your self-hosted Temporal Cluster, defined using YAML.
You must define your Cluster configuration when setting up your self-hosted Temporal Cluster.

For details on using Temporal Cloud, see [Temporal Cloud documentation](/cloud).

Cluster configuration is composed of two types of configuration: [Static configuration](#static-configuration) and [Dynamic configuration](#dynamic-configuration).

### Static configuration

Static configuration contains details of how the Cluster should be set up.
The static configuration is read just once and used to configure service nodes at startup.
Depending on how you want to deploy your self-hosted Temporal Cluster, your static configuration must contain details for setting up:

- Temporal Services—Frontend, History, Matching, Worker
- Membership ports for the Temporal Services
- Persistence (including History Shard count), Visibility and Advanced Visibility, Archival store setups.
- TLS, authentication, authorization
- Server log level
- Metrics
- Cluster metadata
- Dynamic config Client

Static configuration values cannot be changed at runtime.
Some values, such as the Metrics configuration or Server log level, can be changed in the static configuration but require restarting the Cluster for the changes to take effect.
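
The fragment below is a minimal sketch of what a few of these static configuration sections can look like. It is loosely modeled on the development examples in the Temporal repository, deliberately incomplete (the `datastores` definitions are omitted), and the specific values are assumptions made for illustration.

```yaml
# Partial static configuration sketch; not a complete or production-ready file.
log:
  stdout: true
  level: info # Server log level; changing it requires a restart

persistence:
  defaultStore: default # refers to a datastore entry omitted from this sketch
  visibilityStore: visibility # refers to a datastore entry omitted from this sketch
  numHistoryShards: 4 # History Shard count; illustrative value

services:
  frontend:
    rpc:
      grpcPort: 7233
      membershipPort: 6933 # membership port for the Frontend Service
```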

For details on static configuration keys, see [Cluster configuration reference](/references/configuration).

For static configuration examples, see <https://github.com/temporalio/temporal/tree/master/config>.

### Dynamic configuration

Dynamic configuration contains configuration keys that you can update in your Cluster setup without having to restart the server processes.

All dynamic configuration keys provided by Temporal have default values that are used by the Cluster.
You can override the default values by setting different values for the keys in a YAML file and setting the [dynamic configuration client](/references/configuration#dynamicconfigclient) to poll this file for updates.
Setting dynamic configuration for your Cluster is optional.
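
As a sketch of how the two pieces fit together, the static configuration can point the dynamic configuration client at a YAML file, and that file then holds the overrides. The file path, poll interval, and override value here are assumptions chosen for illustration.

```yaml
# In the static configuration: tell the Cluster where to find overrides.
dynamicConfigClient:
  filepath: "config/dynamicconfig/example.yaml" # hypothetical path
  pollInterval: "60s" # how often the file is re-read
```

```yaml
# In config/dynamicconfig/example.yaml (hypothetical file): the overrides.
frontend.persistenceMaxQPS:
  - value: 3000 # illustrative value, not a recommendation
    constraints: {}
```

Keys that are absent from the file keep their default values.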

Setting overrides for some configuration keys updates the Cluster configuration immediately.
However, for configuration fields that are checked at startup (such as thread pool size), you must restart the server for the changes to take effect.

Use dynamic configuration keys to fine-tune your self-deployed Cluster setup.

For details on dynamic configuration keys, see [Dynamic configuration reference](/references/dynamic-configuration).

For dynamic configuration examples, see <https://github.com/temporalio/temporal/tree/master/config/dynamicconfig>.
32 changes: 32 additions & 0 deletions docs-src/concepts/what-is-cluster-observability.md
@@ -0,0 +1,32 @@
---
id: what-is-cluster-obervability
title: What is Cluster observability?
sidebar_label: Monitoring and observation
description: Monitor and observe Cluster performance with metrics emitted by your self-hosted Temporal Cluster or by Temporal Cloud.
tags:
- explanation
---

You can monitor and observe performance with metrics emitted by your self-hosted Temporal Cluster or by Temporal Cloud.

Temporal emits metrics by default in a format that is supported by Prometheus.
Any metrics software that supports the same format can be used.
Currently, we test with the following Prometheus and Grafana versions:

- **Prometheus >= v2.0**
- **Grafana >= v2.5**

Temporal Cloud emits metrics through a Prometheus HTTP API endpoint, which can be directly used as a Prometheus data source in Grafana or to query and export Cloud metrics to any observability platform.

For details on Cloud metrics and setup, see the following:

- [Temporal Cloud metrics reference](/cloud/how-to-monitor-temporal-cloud-metrics)
- [Set up Grafana with Temporal Cloud observability to view metrics](/kb/prometheus-grafana-setup-cloud#data-sources-configuration-for-temporal-cloud-and-sdk-metrics-in-grafana)

On self-hosted Temporal Clusters, expose Prometheus endpoints in your Cluster configuration and configure Prometheus to scrape metrics from the endpoints.
You can then set up your observability platform (such as Grafana) to use Prometheus as a data source.
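
For a self-hosted Cluster, the two steps could look roughly like the following sketch. The listen address, job name, and scrape target are assumptions for a single local process; a real deployment typically exposes one endpoint per Temporal Service.

```yaml
# In the Cluster's static configuration: expose a Prometheus endpoint.
global:
  metrics:
    prometheus:
      timerType: "histogram"
      listenAddress: "127.0.0.1:8000" # assumed address
```

```yaml
# In prometheus.yml: scrape the endpoint exposed above.
scrape_configs:
  - job_name: "temporal" # hypothetical job name
    static_configs:
      - targets: ["127.0.0.1:8000"]
```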

For details on self-hosted Cluster metrics and setup, see the following:

- [Temporal Cluster OSS metrics reference](/references/cluster-metrics)
- [Set up Prometheus and Grafana to view SDK and self-hosted Cluster metrics](/kb/prometheus-grafana-setup)