
GC: Error status and object is not found but job is running ? #14966

Closed · guyguy333 opened this issue May 25, 2021 · 16 comments

@guyguy333

Expected behavior and actual behavior:

As long as the GC job is running, I shouldn't see an error status. However, I'm fairly sure the GC job is still running, because the registry is being spammed with DELETE requests. Moreover, I'm unable to fetch the job log, so I believe the job is not done.

(Screenshot: 2021-05-25 at 11:31:52)

Here is the output when clicking on Log:

{"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"log entity: 3ae399a817334a6a372bccd0\"}"}]}

Steps to reproduce the problem:

Run a long GC job (we have 1.5 TB and ~175k objects on S3)

Versions:
Please specify the versions of the following systems.

  • harbor version: 2.22
  • docker engine version: Using containerd
  • docker-compose version: Using K8S

Additional context:

  • Harbor config files: You can get them by packaging harbor.yml and the files in the same directory, including subdirectories.
  • Log files: You can get them by packaging the /var/log/harbor/ directory.
@wy65701436
Contributor

wy65701436 commented May 31, 2021

Can you share the job service log as well? It's helpful for narrowing down the problem.

Also, can you confirm whether other jobs (Replication/Scan or Retention) were running at the time of the GC execution? We found that the job log may fail to be created in some particular cases, such as when the job service is under high load.

@guyguy333
Author

guyguy333 commented Jun 1, 2021

Sure. However, after a failure, if I don't restart Harbor (the jobservice pod), I get this response when requesting the log of any garbage collection run (even one that failed multiple days ago and after which a restart already happened):

{"errors":[{"code":"UNKNOWN","message":"internal server error"}]}

Storage is on S3. I have many job service logs with context deadline exceeded that I can share if you want.

Here is an example of a job service log:

jobserver.log

So far I have never been able to run a complete GC. I start GC, it fails a few hours later, I restart the job service and restart GC, and it fails again ...

@guyguy333
Author

I got another error retrieving the log after the job failed: {"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"log entity: 2c456ecc4b695fa3505e6fd0\"}"}]}

Maybe it's ultimately related to #12948?

@steven-zou
Contributor

2021-05-31T07:25:18Z [ERROR] [/jobservice/job/impl/gc/garbage_collection.go:262]: failed to delete manifest with v2 API, xxxxx/xxxxx/xxxxx, sha256:1a793c56994d3fd017dbf95ec73f8c3615477da8e3d44dc8c0957580eb6e3e05, Delete "http://harbor-harbor-registry:5000/v2/xxxxx/xxxxx/xxxxx/manifests/sha256:1a793c56994d3fd017dbf95ec73f8c3615477da8e3d44dc8c0957580eb6e3e05": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2021-05-31T07:25:18Z [ERROR] [/jobservice/job/impl/gc/garbage_collection.go:165]: failed to execute GC job at sweep phase, error: failed to delete manifest with v2 API: xxxxx/xxxxx/xxxxx, sha256:1a793c56994d3fd017dbf95ec73f8c3615477da8e3d44dc8c0957580eb6e3e05: Delete "http://harbor-harbor-registry:5000/v2/xxxxx/xxxxx/xxxxx/manifests/sha256:1a793c56994d3fd017dbf95ec73f8c3615477da8e3d44dc8c0957580eb6e3e05": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
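
For context, the "(Client.Timeout exceeded while awaiting headers)" part of this error is what Go's net/http client reports when a fixed Client.Timeout elapses before the server answers. A minimal sketch of that failure mode; the registry URL, digest and 30s timeout here are hypothetical, not Harbor's actual values:

package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// A client with a hard overall timeout, as the GC error message implies.
	client := &http.Client{Timeout: 30 * time.Second}

	req, err := http.NewRequest(http.MethodDelete,
		"http://harbor-harbor-registry:5000/v2/example/repo/manifests/sha256:0000000000000000000000000000000000000000000000000000000000000000",
		nil)
	if err != nil {
		panic(err)
	}

	// If the registry (e.g. backed by slow S3 deletes) takes longer than the
	// timeout to respond, Do returns: context deadline exceeded
	// (Client.Timeout exceeded while awaiting headers)
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("delete failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("delete status:", resp.Status)
}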

@wy65701436
Contributor

You can upgrade to the latest Harbor; it introduces a retry on GC failure within one minute.
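
The general idea, sketched below; this is not Harbor's actual implementation, just an illustration of retrying a failing operation for up to roughly a minute before giving up:

package main

import (
	"errors"
	"fmt"
	"time"
)

// retryFor keeps calling op until it succeeds or the time budget is spent.
func retryFor(budget time.Duration, op func() error) error {
	deadline := time.Now().Add(budget)
	var err error
	for attempt := 1; ; attempt++ {
		if err = op(); err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("giving up after %d attempts: %w", attempt, err)
		}
		time.Sleep(5 * time.Second) // fixed backoff, purely for illustration
	}
}

func main() {
	// Stand-in for a flaky manifest/blob delete against the registry.
	err := retryFor(time.Minute, func() error {
		return errors.New("context deadline exceeded")
	})
	fmt.Println(err)
}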

@guyguy333
Author

Thanks @wy65701436 I will try that :)

@wy65701436
Contributor

@guyguy333 please reopen it if you still encounter the problem.

@Grounz

Grounz commented Oct 21, 2021

Hi,

I have upgraded Harbor to 2.3.3 and I have the same error:

json { "errors": [ { "code": "NOT_FOUND", "message": "{\"code\":10010,\"message\":\"object is not found\",\"details\":\"f7d54004a275590aeccd94ee\"}" } ] }

Jobserver.log in time range:

Oct 17 02:00:01 172.18.0.1 jobservice[2857]: 2021-10-17T00:00:01Z [ERROR] [/jobservice/runner/redis.go:152]: Run job GARBAGE_COLLECTION:f7d54004a275590aeccd94ee error: runtime error: interface conversion: interface {} is nil, not string; stack: goroutine 61 [running]:
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/runner.(*RedisJob).Run.func2(0xc0013217d0, 0xc00078ba40)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/runner/redis.go:150 +0xa5
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: panic(0x28b80c0, 0xc000cfbdd0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/usr/local/go/src/runtime/panic.go:975 +0x47a
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).parseParams(0xc000d0ef30, 0xc000c592c0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:116 +0x365
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).init(0xc000d0ef30, 0x2eb5860, 0xc000e29e50, 0xc000c592c0, 0x7, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:109 +0x313
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).Run(0xc000d0ef30, 0x2eb5860, 0xc000e29e50, 0xc000c592c0, 0xc000e29e50, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:151 +0x4d
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/runner.(*RedisJob).Run(0xc000413a70, 0xc00078ba40, 0x0, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/runner/redis.go:212 +0xc55
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.

Oct 17 02:00:01 172.18.0.1 jobservice[2857]: 2021-10-17T00:00:01Z [ERROR] [/jobservice/runner/redis.go:113]: Job 'GARBAGE_COLLECTION:f7d54004a275590aeccd94ee' exit with error: runtime error: interface conversion: interface {} is nil, not string; stack: goroutine 61 [running]:
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/runner.(*RedisJob).Run.func2(0xc0013217d0, 0xc00078ba40)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/runner/redis.go:150 +0xa5
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: panic(0x28b80c0, 0xc000cfbdd0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/usr/local/go/src/runtime/panic.go:975 +0x47a
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).parseParams(0xc000d0ef30, 0xc000c592c0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:116 +0x365
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).init(0xc000d0ef30, 0x2eb5860, 0xc000e29e50, 0xc000c592c0, 0x7, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:109 +0x313
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/job/impl/gc.(*GarbageCollector).Run(0xc000d0ef30, 0x2eb5860, 0xc000e29e50, 0xc000c592c0, 0xc000e29e50, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/job/impl/gc/garbage_collection.go:151 +0x4d
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.com/goharbor/harbor/src/jobservice/runner.(*RedisJob).Run(0xc000413a70, 0xc00078ba40, 0x0, 0x0)
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: #011/harbor/src/jobservice/runner/redis.go:212 +0xc55
Oct 17 02:00:01 172.18.0.1 jobservice[2857]: github.
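
Side note: "interface conversion: interface {} is nil, not string" is the panic Go raises when an unchecked type assertion hits a value that was never set, for example a missing job parameter. A minimal sketch of that failure mode, not Harbor's actual code; the parameter name is hypothetical:

package main

import "fmt"

func main() {
	params := map[string]interface{}{} // the expected key was never set

	defer func() {
		if r := recover(); r != nil {
			// Prints: interface conversion: interface {} is nil, not string
			fmt.Println("panic:", r)
		}
	}()

	// Unchecked assertion on a missing key panics; the safe form would be
	// v, ok := params["redis_url_reg"].(string) with an ok check.
	v := params["redis_url_reg"].(string) // "redis_url_reg" is a hypothetical name
	fmt.Println(v)
}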

@miminar

miminar commented Jan 4, 2023

We are having the same issue in 2.6.2.
Using S3 storage, we are also hitting #12948. The job keeps running for a very long time; it is reported as Running, but opening the Log results in:

{"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"228f7a1eb1175ebabcaafd76\"}"}]}


@wy65701436 or @steven-zou, can you please re-open? Or shall I open a new one? What other information do you want me to provide?

@Antiarchitect

Antiarchitect commented Jan 20, 2023

Same symptoms. I use Ceph RadosGW as S3 storage. I have 2.66 TB of registry space used overall (and growing). Not a single job (weekly, because of #14774) has succeeded since September 2022.

P.S. The jobservice pod's CPU consumption is close to zero and the pod's stdout doesn't contain anything worth mentioning. Could it be that it's doing nothing?

I just looked at the jobservice dashboard: all jobs are in Pending state. There is one worker with concurrency 100 and it hasn't picked up any jobs.

@Antiarchitect

Removing all Redis instances (KeyDB in my case) along with all the PVCs seems to have done the trick, and now one worker is doing GC after a manual restart.

@salmon5

salmon5 commented Sep 27, 2023

My Harbor version is 2.2.0 and I have the same issue.
I updated Harbor to 2.4.0 and the issue is fixed.
https://goharbor.io/docs/2.4.0/administration/upgrade/

@mdavid01

mdavid01 commented May 16, 2024

Hi Team - What is the log entity? How can it help me locate the root cause?
{"errors":[{"code":"NOT_FOUND","message":"{"code":10010,"message":"object is not found","details":"log entity: ef8aa08703def705401ad8fb"}"}]}

@kdambekalns

Seeing this with v2.11.0 – should this be reopened or shall I create a new issue?

@mdavid01

mdavid01 commented Oct 7, 2024

Thx for following up. We are just now installing 2.11. I would say leave it closed and we'll open new issues as we encounter them with 2.11.

@imcom

imcom commented Nov 10, 2024

I am running version v2.11.1-6b7ecba1 and just encountered a failed GC. I then ran a new GC, and immediately the log gives me this:

{"errors":[{"code":"NOT_FOUND","message":"{\"code\":10010,\"message\":\"object is not found\",\"details\":\"fa1fa3b7095092f69a588140\"}"}]}

The job status is Running, and from the logs of harbor-jobservice I can see that the job is indeed running and deleting things:

2024-11-10T02:51:36Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:424]: [2837ff8f-735c-4beb-8bb1-b148b7445a75][89868/779771] delete blob from storage: sha256:8fa693c326245c4397a5e8409ea08f354c0c2177d29d6970278675200f02455a
2024-11-10T02:51:36Z [INFO] [/jobservice/job/impl/gc/garbage_collection.go:453]: [2837ff8f-735c-4beb-8bb1-b148b7445a75][89868/779771] delete blob record from database: 335257, sha256:8fa693c326245c4397a5e8409ea08f354c0c2177d29d6970278675200f02455a
