
GC: Error status and object is not found but job is running ? #14966

Closed · guyguy333 opened this issue May 25, 2021 · 16 comments

@guyguy333

Expected behavior and actual behavior:

As long as the GC job is running, I shouldn't see an error status. However, I'm fairly sure the GC job is still running, because the registry is being spammed with DELETE requests. Moreover, I'm unable to fetch the job log, so I believe the job is not done.

(Screenshot: 2021-05-25 at 11:31:52)

Here is the output when clicking on Log:

{"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"log entity: 3ae399a817334a6a372bccd0\"}"}]}

Steps to reproduce the problem:

Run a long GC job (we have 1.5 TB and ~175k objects on S3)

Versions:
Please specify the versions of the following systems.

  • harbor version: 2.22
  • docker engine version: Using containerd
  • docker-compose version: Using K8S

Additional context:

  • Harbor config files: You can get them by packaging harbor.yml and the files in the same directory, including subdirectories.
  • Log files: You can get them by packaging the /var/log/harbor/ directory.
@wy65701436
Contributor

wy65701436 commented May 31, 2021

Can you share the job service log as well? It's helpful for narrowing down the problem.

Also, can you confirm whether other jobs (Replication/Scan or Retention) were running at the time of the GC execution? We found that the job log may fail to be created in some particular cases, such as when the job service is under high load.

@guyguy333
Author

guyguy333 commented Jun 1, 2021

Sure. However, after a failure, if I don't restart Harbor (the jobservice pod), I get this response when requesting the log of any garbage collection run (even one that failed multiple days ago and after which a restart already happened):

{"errors":[{"code":"UNKNOWN","message":"internal server error"}]}

Storage is on S3. I have many job service logs with context deadline exceeded that I can share if you want.

Here is an example of a job service log:

jobserver.log

So far I have never been able to run a complete GC. I start GC, it fails a few hours later, I restart the job service and restart GC, and it fails again ...

@guyguy333
Author

I got another error retrieving the log after the job failed: {"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"log entity: 2c456ecc4b695fa3505e6fd0\"}"}]}

Maybe it's ultimately related to #12948?

@steven-zou
Contributor

2021-05-31T07:25:18Z [ERROR] [/jobservice/job/impl/gc/garbage_collection.go:262]: failed to delete manifest with v2 API, xxxxx/xxxxx/xxxxx, sha256:1a793c56994d3fd017dbf95ec73f8c3615477da8e3d44dc8c0957580eb6e3e05, Delete "http://harbor-harbor-registry:5000/v2/xxxxx/xxxxx/xxxxx/manifests/sha256:1a793c56994d3fd017dbf95ec73f8c3615477da8e3d44dc8c0957580eb6e3e05": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-05-31T07:25:18Z [ERROR] [/jobservice/job/impl/gc/garbage_collection.go:165]: failed to execute GC job at sweep phase, error: failed to delete manifest with v2 API: xxxxx/xxxxx/xxxxx, sha256:1a793c56994d3fd017dbf95ec73f8c3615477da8e3d44dc8c0957580eb6e3e05: Delete "http://harbor-harbor-registry:5000/v2/xxxxx/xxxxx/xxxxx/manifests/sha256:1a793c56994d3fd017dbf95ec73f8c3615477da8e3d44dc8c0957580eb6e3e05": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
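
For context, the "(Client.Timeout exceeded while awaiting headers)" part of this error is what Go's net/http client reports when a fixed Client.Timeout elapses before the server answers. A minimal sketch of that failure mode; the registry URL, digest and 30s timeout here are hypothetical, not Harbor's actual values:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// A client with a hard overall timeout, as the GC error message implies.
	client := &http.Client{Timeout: 30 * time.Second}

	req, err := http.NewRequest(http.MethodDelete,
		"http://harbor-harbor-registry:5000/v2/example/repo/manifests/sha256:0000000000000000000000000000000000000000000000000000000000000000",
		nil)
	if err != nil {
		panic(err)
	}

	// If the registry (e.g. backed by slow S3 deletes) takes longer than the
	// timeout to respond, Do returns: context deadline exceeded
	// (Client.Timeout exceeded while awaiting headers)
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("delete failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("delete status:", resp.Status)
}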

@wy65701436
Contributor

You can upgrade to the latest Harbor; it introduces a retry on GC failure within one minute.
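
The general idea, sketched below; this is not Harbor's actual implementation, just an illustration of retrying a failing operation for up to roughly a minute before giving up:

package main

import (
	"errors"
	"fmt"
	"time"
)

// retryFor keeps calling op until it succeeds or the time budget is spent.
func retryFor(budget time.Duration, op func() error) error {
	deadline := time.Now().Add(budget)
	var err error
	for attempt := 1; ; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("giving up after %d attempts: %w", attempt, err)
		}
		time.Sleep(5 * time.Second) // fixed backoff, purely for illustration
	}
}

func main() {
	// Stand-in for a flaky manifest/blob delete against the registry.
	err := retryFor(time.Minute, func() error {
		return errors.New("context deadline exceeded")
	})
	fmt.Println(err)
}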

@guyguy333
Author

Thanks @wy65701436 I will try that :)

@wy65701436
Contributor

@guyguy333 please reopen it if you still encounter the problem.

@Grounz

Grounz commented Oct 21, 2021

Hi,

I have upgraded Harbor to 2.3.3 and I have the same error:

json { "errors": [ { "code": "NOT_FOUND", "message": "{\"code\":10010,\"message\":\"object is not found\",\"details\":\"f7d54004a275590aeccd94ee\"}" } ] }

Jobserver.log in time range:

Oct 17 02:00:01 172.18.0.1 jobservice[2857]: 2021-10-17T00:00:01Z [ERROR] [/jobservice/runner/redis.go:152]: Run job GARBAGE_COLLECTION:f7d54004a275590aeccd94ee error: runtime error: interface conversion: interface {} is nil, not string; stack: goroutine 61 [running]:
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/runner.(*RedisJob).Run.func2(0xc0013217d0, 0xc00078ba40)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/runner/redis.go:150 +0xa5
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: panic(0x28b80c0, 0xc000cfbdd0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/usr/local/go/src/runtime/panic.go:975 +0x47a
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).parseParams(0xc000d0ef30, 0xc000c592c0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:116 +0x365
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).init(0xc000d0ef30, 0x2eb5860, 0xc000e29e50, 0xc000c592c0, 0x7, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:109 +0x313
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).Run(0xc000d0ef30, 0x2eb5860, 0xc000e29e50, 0xc000c592c0, 0xc000e29e50, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:151 +0x4d
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/runner.(*RedisJob).Run(0xc000413a70, 0xc00078ba40, 0x0, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/runner/redis.go:212 +0xc55
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.

Oct 17 02:00:01 172.18.0.1 jobservice[2857]: 2021-10-17T00:00:01Z [ERROR] [/jobservice/runner/redis.go:113]: Job 'GARBAGE_COLLECTION:f7d54004a275590aeccd94ee' exit with error: runtime error: interface conversion: interface {} is nil, not string; stack: goroutine 61 [running]:
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/runner.(*RedisJob).Run.func2(0xc0013217d0, 0xc00078ba40)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/runner/redis.go:150 +0xa5
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: panic(0x28b80c0, 0xc000cfbdd0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/usr/local/go/src/runtime/panic.go:975 +0x47a
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).parseParams(0xc000d0ef30, 0xc000c592c0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:116 +0x365
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).init(0xc000d0ef30, 0x2eb5860, 0xc000e29e50, 0xc000c592c0, 0x7, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:109 +0x313
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).Run(0xc000d0ef30, 0x2eb5860, 0xc000e29e50, 0xc000c592c0, 0xc000e29e50, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:151 +0x4d
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/runner.(*RedisJob).Run(0xc000413a70, 0xc00078ba40, 0x0, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/runner/redis.go:212 +0xc55
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.
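
Side note: "interface conversion: interface {} is nil, not string" is the panic Go raises when an unchecked type assertion hits a value that was never set, for example a missing job parameter. A minimal sketch of that failure mode, not Harbor's actual code; the parameter name is hypothetical:

package main

import "fmt"

func main() {
	params := map[string]interface{}{} // the expected key was never set

	defer func() {
		if r := recover(); r != nil {
			// Prints: interface conversion: interface {} is nil, not string
			fmt.Println("panic:", r)
		}
	}()

	// Unchecked assertion on a missing key panics; the safe form would be
	// v, ok := params["redis_url_reg"].(string) with an ok check.
	v := params["redis_url_reg"].(string) // "redis_url_reg" is a hypothetical name
	fmt.Println(v)
}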

@miminar

miminar commented Jan 4, 2023

We are having the same issue in 2.6.2.
Using S3 storage, we are also hitting #12948. The job keeps running for a very long time; it is reported as Running, but opening the Log results in:

{"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"228f7a1eb1175ebabcaafd76\"}"}]}


@wy65701436 or @steven-zou, can you please re-open? Or shall I open a new one? What other information do you want me to provide?

@Antiarchitect

Antiarchitect commented Jan 20, 2023

Same symptoms. I use Ceph RadosGW as S3 storage. I have 2.66 TB of registry space used overall (and growing). Not a single job (weekly, because of #14774) has succeeded since September 2022.

P.S. The jobservice pod's CPU consumption is close to zero and the pod's stdout doesn't contain anything worth mentioning. Could it be that it's doing nothing?

I just looked at the jobservice dashboard: all jobs are in Pending state. There is one worker with concurrency 100 and it hasn't picked up any jobs.

@Antiarchitect

Removing all Redis instances (KeyDB in my case) along with all the PVCs seems to have done the trick, and now one worker is doing GC after a manual restart.

@salmon5

salmon5 commented Sep 27, 2023

My Harbor version is 2.2.0 and I have the same issue.
I updated Harbor to 2.4.0 and the issue is fixed.
https://goharbor.io/docs/2.4.0/administration/upgrade/

@mdavid01

mdavid01 commented May 16, 2024

Hi Team - What is the log entity? How can it help me locate the root cause?
{"errors":[{"code":"NOT_FOUND","message":"{"code":10010,"message":"object is not found","details":"log entity: ef8aa08703def705401ad8fb"}"}]}

@kdambekalns

Seeing this with v2.11.0 – should this be reopened or shall I create a new issue?

@mdavid01

mdavid01 commented Oct 7, 2024

Thx for following up. We are just now installing 2.11. I would say leave it closed and we'll open new issues as we encounter them with 2.11.

@imcom

imcom commented Nov 10, 2024

I am running version v2.11.1-6b7ecba1 and just encountered a failed GC. I then ran a new GC, and immediately the log gives me this:

{"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"fa1fa3b7095092f69a588140\"}"}]}

The job status is Running, and from the logs of harbor-jobservice I can see that the job is indeed running and deleting things:

2024-11-10T02:51:36Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:424]: [2837ff8f-735c-4beb-8bb1-b148b7445a75][89868/779771] delete blob from storage: sha256:8fa693c326245c4397a5e8409ea08f354c0c2177d29d6970278675200f02455a
2024-11-10T02:51:36Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:453]: [2837ff8f-735c-4beb-8bb1-b148b7445a75][89868/779771] delete blob record from database: 335257, sha256:8fa693c326245c4397a5e8409ea08f354c0c2177d29d6970278675200f02455a
