archival + task error after migration from 1.19.1 to 1.20.0 #4270

Closed
nordiste opened this issue May 3, 2023 · 7 comments · Fixed by #4304

@nordiste

nordiste commented May 3, 2023

Expected Behavior

No errors.

Actual Behavior

After migrating from 1.19.1 to 1.20.0, we get a lot of errors:
"failed to archive target" -> "invalid search attribute type: Unspecified"
"failed to archive workflow" -> "invalid search attribute type: Unspecified"
"failed to process task" -> "invalid search attribute type: Unspecified"

"failed to archive target" returns this stack trace:

go.temporal.io/server/common/log.(*zapLogger).Error
    /home/builder/temporal/common/log/zap_logger.go:150
go.temporal.io/server/service/history/archival.(*archiver).recordArchiveTargetResult
    /home/builder/temporal/service/history/archival/archiver.go:244
go.temporal.io/server/service/history/archival.(*archiver).archiveVisibility
    /home/builder/temporal/service/history/archival/archiver.go:218
go.temporal.io/server/service/history/archival.(*archiver).Archive.func3
    /home/builder/temporal/service/history/archival/archiver.go:167

"failed to process task" returns this stack trace:

go.temporal.io/server/common/log.(*zapLogger).Error
    /home/builder/temporal/common/log/zap_logger.go:150
go.temporal.io/server/common/log.(*lazyLogger).Error
    /home/builder/temporal/common/log/lazy_logger.go:68
go.temporal.io/server/service/history/queues.(*executableImpl).HandleErr
    /home/builder/temporal/service/history/queues/executable.go:321
go.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask.func1
    /home/builder/temporal/common/tasks/fifo_scheduler.go:232
go.temporal.io/server/common/backoff.ThrottleRetry.func1
    /home/builder/temporal/common/backoff/retry.go:175
go.temporal.io/server/common/backoff.ThrottleRetryContext
    /home/builder/temporal/common/backoff/retry.go:199
go.temporal.io/server/common/backoff.ThrottleRetry
    /home/builder/temporal/common/backoff/retry.go:176
go.temporal.io/server/common/tasks.(*FIFOScheduler[...]).executeTask
    /home/builder/temporal/common/tasks/fifo_scheduler.go:241
go.temporal.io/server/common/tasks.(*FIFOScheduler[...]).processTask
    /home/builder/temporal/common/tasks/fifo_scheduler.go:217

I also tried migrating to 1.20.2, but the problem is the same.

Workers seem to run normally, but I get errors in the Temporal logs.

Steps to Reproduce the Problem

Migrate from v1.19.1 to v1.20.0.

Specifications

  • Version: 1.20.2
  • Platform: Kubernetes
  • Archival: S3
  • Database: PostgreSQL (migrated to v12, but the problem was already present before that)
  • Processes are started using schedules
temporal-tools-84c7c5cb75-5m55t:~$ tctl adm cl gsa
Custom search attributes:
+------+------+
| NAME | TYPE |
+------+------+
+------+------+
System search attributes:
+----------------------------+-------------+
|            NAME            |    TYPE     |
+----------------------------+-------------+
| BatcherNamespace           | Keyword     |
| BatcherUser                | Keyword     |
| BinaryChecksums            | KeywordList |
| CloseTime                  | Datetime    |
| ExecutionDuration          | Int         |
| ExecutionStatus            | Keyword     |
| ExecutionTime              | Datetime    |
| HistoryLength              | Int         |
| HistorySizeBytes           | Int         |
| RunId                      | Keyword     |
| StartTime                  | Datetime    |
| StateTransitionCount       | Int         |
| TaskQueue                  | Keyword     |
| TemporalChangeVersion      | KeywordList |
| TemporalNamespaceDivision  | Keyword     |
| TemporalSchedulePaused     | Bool        |
| TemporalScheduledById      | Keyword     |
| TemporalScheduledStartTime | Datetime    |
| WorkflowId                 | Keyword     |
| WorkflowType               | Keyword     |
+----------------------------+-------------+
Storage mappings:
+-------------+-------------+
| COLUMN NAME | COLUMN TYPE |
+-------------+-------------+
+-------------+-------------+
Workflow info:
{

}
@MichaelSnowden
Contributor

What kind of workflows are you archiving? Specifically, how old are they?

@yux0
Contributor

yux0 commented May 6, 2023

I think there is an issue around

enumspb.IndexedValueType_value[string(value.Metadata[MetadataType])]

Could the value type be 'Unspecified' while the type from the payload metadata is also 'Unspecified'?
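One way a lookup of this shape can produce 'Unspecified' is Go's zero-value behavior for map reads: a missing metadata key or an unknown type name silently yields 0. The sketch below illustrates only that mechanism. The enum, the name map, and the "type" metadata key are local stand-ins written for this example, not the real enumspb package, and nothing here confirms that this is the cause of the errors in this issue.

```go
package main

import "fmt"

// IndexedValueType is a local stand-in for enumspb.IndexedValueType (assumption).
type IndexedValueType int32

const (
	Unspecified IndexedValueType = iota
	Text
	Keyword
	Int
	Double
	Bool
	Datetime
	KeywordList
)

// metadataType is assumed to be the payload metadata key that holds the type name.
const metadataType = "type"

// indexedValueTypeByName is a hypothetical stand-in for enumspb.IndexedValueType_value.
var indexedValueTypeByName = map[string]IndexedValueType{
	"Unspecified": Unspecified,
	"Text":        Text,
	"Keyword":     Keyword,
	"Int":         Int,
	"Double":      Double,
	"Bool":        Bool,
	"Datetime":    Datetime,
	"KeywordList": KeywordList,
}

// typeFromMetadata has the same shape as the quoted lookup: a missing key or
// an unknown name falls through to the zero value, which is Unspecified.
func typeFromMetadata(metadata map[string][]byte) IndexedValueType {
	return indexedValueTypeByName[string(metadata[metadataType])]
}

// checkedTypeFromMetadata uses the comma-ok form so that a missing or unknown
// type is reported as an error instead of silently becoming Unspecified.
func checkedTypeFromMetadata(metadata map[string][]byte) (IndexedValueType, error) {
	name, ok := metadata[metadataType]
	if !ok {
		return Unspecified, fmt.Errorf("payload metadata has no %q key", metadataType)
	}
	t, ok := indexedValueTypeByName[string(name)]
	if !ok || t == Unspecified {
		return Unspecified, fmt.Errorf("invalid search attribute type: %q", name)
	}
	return t, nil
}

func main() {
	withType := map[string][]byte{"encoding": []byte("json/plain"), "type": []byte("Keyword")}
	withoutType := map[string][]byte{"encoding": []byte("json/plain")}

	fmt.Println(typeFromMetadata(withType) == Keyword)        // type name resolves
	fmt.Println(typeFromMetadata(withoutType) == Unspecified) // missing key falls back to zero value

	_, err := checkedTypeFromMetadata(withoutType)
	fmt.Println(err)
}
```

If the payloads reaching the archival path lacked the type metadata, the unchecked lookup would return Unspecified without any sign of where the information was lost; the checked variant makes that case explicit.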

@nordiste
Author

nordiste commented May 9, 2023

What kind of workflows are you archiving? Specifically, how old are they?

These are schedule-generated tasks.
I tried deleting the schedules and recreating them, but the problem continues.
These tasks were created with v1.20.0 to v1.20.2 (from 1 minute old to more than 2 days old).

Here is the describe output for one task:

temporal-tools-c67bd679f-bnfnh:~$ tctl workflow describe --w xxxxxxxxxxx-2023-05-04T15:16:00Z --raw
{
  "executionConfig": {
    "taskQueue": {
      "name": "xxxxxxxxxx",
      "kind": "Normal"
    },
    "defaultWorkflowTaskTimeout": "10s"
  },
  "workflowExecutionInfo": {
    "execution": {
      "workflowId": "xxxxxxxxx-2023-05-04T15:16:00Z",
      "runId": "5efdb3f9-3e41-4280-a120-0d2cf3da6a7d"
    },
    "type": {
      "name": "xxxxxxxxxxx"
    },
    "startTime": "2023-05-04T15:16:00.115727404Z",
    "closeTime": "2023-05-04T15:16:00.595368783Z",
    "status": "Completed",
    "historyLength": "11",
    "executionTime": "2023-05-04T15:16:00.115727404Z",
    "memo": {

    },
    "searchAttributes": {
      "indexedFields": {
        "BinaryChecksums": {
          "metadata": {
            "encoding": "anNvbi9wbGFpbg==",
            "type": "S2V5d29yZExpc3Q="
          },
          "data": "WyIzMWJkNmVhZmZlMDI4ZjJhYTRhZTZkOTdlZDlkZTg4NyJd"
        },
        "TemporalScheduledById": {
          "metadata": {
            "encoding": "anNvbi9wbGFpbg==",
            "type": "S2V5d29yZA=="
          },
          "data": "ImJhbk9uQ2RuIg=="
        },
        "TemporalScheduledStartTime": {
          "metadata": {
            "encoding": "anNvbi9wbGFpbg==",
            "type": "RGF0ZXRpbWU="
          },
          "data": "IjIwMjMtMDUtMDRUMTU6MTY6MDBaIg=="
        }
      }
    },
    "autoResetPoints": {
      "points": [
        {
          "binaryChecksum": "31bd6eaffe028f2aa4ae6d97ed9de887",
          "runId": "5efdb3f9-3e41-4280-a120-0d2cf3da6a7d",
          "firstWorkflowTaskCompletedId": "4",
          "createTime": "2023-05-04T15:16:00.232380603Z",
          "resettable": true
        }
      ]
    },
    "taskQueue": "xxxxxxxxxxxxxxxxxx",
    "stateTransitionCount": "7",
    "historySizeBytes": "1513"
  }
}

Thanks for the help.
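The "encoding", "type" and "data" values in the --raw output above are base64-encoded, which makes it hard to see which search attribute types the payloads actually carry. A minimal Go sketch for decoding them follows; the struct and function names are invented for this example, and only the strings are copied from the output above.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// rawPayload mirrors one entry under "indexedFields" in the --raw output.
// The struct is a hypothetical helper for this example only.
type rawPayload struct {
	Name     string
	Encoding string
	Type     string
	Data     string
}

// decodeField decodes one standard-base64 string into readable text.
func decodeField(s string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", fmt.Errorf("decode %q: %w", s, err)
	}
	return string(b), nil
}

func main() {
	// Values copied from the describe output in this comment.
	payloads := []rawPayload{
		{"BinaryChecksums", "anNvbi9wbGFpbg==", "S2V5d29yZExpc3Q=", "WyIzMWJkNmVhZmZlMDI4ZjJhYTRhZTZkOTdlZDlkZTg4NyJd"},
		{"TemporalScheduledById", "anNvbi9wbGFpbg==", "S2V5d29yZA==", "ImJhbk9uQ2RuIg=="},
		{"TemporalScheduledStartTime", "anNvbi9wbGFpbg==", "RGF0ZXRpbWU=", "IjIwMjMtMDUtMDRUMTU6MTY6MDBaIg=="},
	}

	for _, p := range payloads {
		encoding, err := decodeField(p.Encoding)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		typ, err := decodeField(p.Type)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		data, err := decodeField(p.Data)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: encoding=%s type=%s data=%s\n", p.Name, encoding, typ, data)
	}
}
```

Decoded, the three "type" values are KeywordList, Keyword and Datetime, so the describe response itself carries type metadata for every attribute. That does not show what the archival code path receives, only what this API call returns.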

@MichaelSnowden
Contributor

I'm not able to repro this exact issue, but here's what I have so far.

  1. I made a branch off of v1.20.2 that minimizes the archival delay: https://github.com/temporalio/temporal/compare/v1.20.2...snowden/4270-repo?expand=1
  2. I started the server and ran it against my s3 localstack
  3. I ran some workflows with Maru, and I then tried to describe them with tctl. I do see a warning about the KeywordList search attribute being invalid.
 tctl --namespace benchtest workflow describe --workflow_id 4
Warning: unable to stringify search attribute: invalid search attribute type: KeywordList
{
    ...
    "searchAttributes": {
      "indexedFields": {
        "BinaryChecksums": "[\"035d2a38385e9836a609b434c77c2389\"]"
      }
    },
}

@nordiste could you go into more detail about how you were using schedules to repro this issue?

@MichaelSnowden
Contributor

I was able to reproduce this locally with schedules, but not with other workflow types. I'll update here when I find out what's wrong.

@MichaelSnowden
Contributor

@nordiste it looks like this is a bug that exists when archiving workflows created with schedules. For now, you should be able to work around this by setting history.durableArchivalEnabled to false. I'm currently working on a long-term fix.
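For a deployment that loads dynamic config from a YAML file, the workaround might look like the fragment below. The key name is the one given in the comment above; the file-based layout and the empty constraints (meaning the value applies everywhere) are assumptions about the deployment, so adapt it to however dynamic config is supplied in your cluster.

```yaml
# Dynamic config fragment (assumed file-based dynamic config).
# Turns off durable archival as a temporary workaround.
history.durableArchivalEnabled:
  - value: false
    constraints: {}
```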

@MichaelSnowden
Contributor

@nordiste You can try patching #4304 locally in the meantime to see if this fixes the issue.
