S3 archive errors with global namespaces in multiple regions #440

Open
dhiaayachi opened this issue Sep 5, 2024 · 0 comments

Expected Behavior

S3 archival works with global namespaces in a multi-cluster deployment across different regions, based on this excerpt from the docs:

Archival is supported in Global Namespaces (Namespaces that span multiple clusters). When Archival is running in a Global Namespace, it first runs on the active cluster; later it runs on the standby cluster. Before archiving, a history check is done to see what has been previously archived.

Actual Behavior

A global namespace was created with S3 archival enabled in region A (both the active cluster and the archive bucket). After failing the namespace over to the cluster in region B (which has its own archive bucket in region B), archival in region B fails with errors similar to the following. Note that the log is from cluster B but contains an archival URI that points to the bucket in region A (the default for cluster A). A minimal sketch of the suspected region mismatch follows the log.

{
	"id": "<redacted>",
	"content": {
		"timestamp": "2023-02-26T01:06:25.798Z",
		"tags": [
			"region:<region-b>"
		],
		"service": "temporal",
		"attributes": {
			"msg": "failed to archive target",
			"shard-id": 24,
			"level": "error",
			"logger": "temporal",
			"source": "stdout",
			"error": "BadRequest: Bad Request\n\tstatus code: 400, request id: <redacted>, host id: <redacted>",
			"archival-caller-service-name": "history",
			"archival-URI": "s3://<region-a-bucket>",
			"target": "history",
			"caller": "log/with_logger.go:72",
			"stacktrace": "go.temporal.io/server/common/log.(*withLogger).Error\n\t/go/pkg/mod/go.temporal.io/[email protected]/common/log/with_logger.go:72\ngo.temporal.io/server/common/log.(*withLogger).Error\n\t/go/pkg/mod/go.temporal.io/[email protected]/common/log/with_logger.go:72\ngo.temporal.io/server/service/history/archival.(*archiver).recordArchiveTargetResult\n\t/go/pkg/mod/go.temporal.io/[email protected]/service/history/archival/archiver.go:244\ngo.temporal.io/server/service/history/archival.(*archiver).archiveHistory\n\t/go/pkg/mod/go.temporal.io/[email protected]/service/history/archival/archiver.go:191\ngo.temporal.io/server/service/history/archival.(*archiver).Archive.func2\n\t/go/pkg/mod/go.temporal.io/[email protected]/service/history/archival/archiver.go:162",
			"archival-request-namespace-id": "<redacted>",
			"service": "temporal",
			"archival-request-workflow-id": "<redacted>",
			"archival-request-run-id": "<redacted>",
			"archival-request-namespace": "<redacted>",
			"archival-request-close-failover-version": 2,
			"timestamp": 1677373585798,
			"ts": 1677373585.797902
		}
	}
}
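
For illustration only (this is not Temporal's archiver code, and the bucket and region names are placeholders), here is a minimal aws-sdk-go v1 sketch of the suspected mismatch: a client pinned to one region writing to a bucket that lives in another region, which S3 typically rejects with a 400 error or a redirect depending on the operation and addressing style.

package main

import (
	"bytes"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// The client is pinned to region B, mirroring a cluster whose archival
	// provider region is fixed at startup.
	sess := session.Must(session.NewSession(&aws.Config{
		Region: aws.String("us-east-2"), // stand-in for region B
	}))
	client := s3.New(sess)

	// The namespace's archival URI still points at the region A bucket, so
	// the request is signed for region B against a bucket in region A.
	// S3 rejects this with a 400 (as in the log above) or a redirect.
	_, err := client.PutObject(&s3.PutObjectInput{
		Bucket: aws.String("region-a-bucket"), // placeholder bucket name
		Key:    aws.String("history/test"),
		Body:   bytes.NewReader([]byte("payload")),
	})
	if err != nil {
		log.Fatalf("archive upload failed: %v", err)
	}
}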

Steps to Reproduce the Problem

  1. Deploy a multi-cluster setup with separate S3 buckets in their respective regions
  2. Create a global namespace with archival enabled, pointing to the bucket in region A
  3. Fail the global namespace over to the cluster in region B

Specifications

  • Version: 1.20.0
  • Platform: linux amd64

Comments

I believe this is because the various S3 calls reuse the same underlying AWS session, which is configured at launch with the cluster's default archival region (i.e. cluster A uses an S3 client pointed at region A, cluster B uses an S3 client pointed at region B, but after failover, cluster B attempts to interact with S3 objects in the other region).
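
One possible direction for a fix, sketched here purely as an assumption rather than as Temporal's actual implementation: resolve the bucket's region from the archival URI at call time using the AWS SDK's GetBucketRegion, and build the S3 client accordingly instead of reusing the session pinned to the cluster's configured region. The helper name, bucket, and region hint below are illustrative only.

package main

import (
	"context"
	"log"
	"net/url"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

// s3ClientForURI is a hypothetical helper (not a Temporal API) that builds an
// S3 client in the bucket's own region rather than the cluster's default.
func s3ClientForURI(ctx context.Context, archivalURI string) (*s3.S3, error) {
	u, err := url.Parse(archivalURI) // e.g. "s3://region-a-bucket"
	if err != nil {
		return nil, err
	}
	bucket := u.Host

	base := session.Must(session.NewSession())
	// Ask S3 where the bucket actually lives; "us-east-1" is only a hint
	// used to route the lookup request.
	region, err := s3manager.GetBucketRegion(ctx, base, bucket, "us-east-1")
	if err != nil {
		return nil, err
	}
	return s3.New(base, aws.NewConfig().WithRegion(region)), nil
}

func main() {
	client, err := s3ClientForURI(context.Background(), "s3://region-a-bucket")
	if err != nil {
		log.Fatal(err)
	}
	_ = client // archival reads and writes would go through this region-correct client
}

The resolved region could presumably be cached per archival URI to avoid a GetBucketRegion lookup on every archive call, but whether that belongs in the s3store archiver or in a shared client factory is a design question for the maintainers.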
