Replication taking longer each time #16656

Closed
piroh1990 opened this issue Apr 6, 2022 · 11 comments

Comments

@piroh1990

Hello

Since we updated from Harbor 2.2.1 to 2.4.1, and now to 2.4.2, there has been a problem with replication.
Every time the replication runs, it takes slightly longer than the previous run.
Over time this leads to extremely long replication times.

Syncing is done with another Harbor instance running version v2.1.4-e2426603.

The replication rule is configured as follows (see the attached screenshot):
Replication_Rule

As a workaround, we are running a cron job that deletes all the charts with the following two commands:
rm -f /data/chart_storage/cm2/*
rm -f /data/redis/dump.rdb
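
A minimal crontab sketch of this cleanup (the schedule is arbitrary; the paths are the ones above, and whether chartmuseum needs a restart afterwards is untested):

# Nightly at 03:00: wipe the replicated chart storage and the redis dump (cron runs this via sh, which expands the glob).
0 3 * * * rm -f /data/chart_storage/cm2/* /data/redis/dump.rdb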

Does anyone know why this is happening and how we can fix it?
If you need additional log files, I will of course provide everything you need.

@chlins chlins self-assigned this Apr 6, 2022
@chlins
Member

chlins commented Apr 11, 2022

@piroh1990 How many artifacts do you have? Could you also share the execution tab and the job logs?

@dioguerra

dioguerra commented Apr 11, 2022

I'm running version v2.4.1-c4b06d79, and the same thing happened to me while replicating Helm charts from https://artifacthub.io/ locally.
The chart replication also fails, but then it starts again, looping over and over (maybe this is related to me deleting the core pods).
I had to delete the core pods because this problem caused the service to become unresponsive, with many in-flight requests pending.
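
Restarting them was roughly the following (a sketch assuming Harbor was installed via its Helm chart into a harbor namespace; the component=core label is an assumption from the chart's conventions):

# Assumption: the Harbor chart labels core pods with component=core.
kubectl -n harbor get pods -l component=core
# Deleting them lets the Deployment recreate fresh pods.
kubectl -n harbor delete pods -l component=core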

We saw a similar issue with in-flight requests accumulating in #15474, but we have since updated and it didn't happen for a long time. Now, this issue. Our configuration is below (screenshot attached):
Our filter is set to: {ceph-csi,cert-manager,cluster-autoscaler,coredns,deliveryhero,ingress-nginx,metrics-server,node-feature-discovery,kubernetes,prometheus-community,traefik}/**

2022-04-11T01:10:14Z [ERROR] [/pkg/notifier/notifier.go:203]: Error occurred when triggering handler *artifact.ReplicationHandler of topic REPLICATION: project prometheus-community not found
2022-04-11T01:10:14Z [ERROR] [/controller/event/handler/webhook/artifact/replication.go:195]: failed to get project prometheus-community, error: project prometheus-community not found
2022-04-11T01:10:14Z [ERROR] [/pkg/notifier/notifier.go:203]: Error occurred when triggering handler *artifact.ReplicationHandler of topic REPLICATION: project prometheus-community not found
2022-04-11T01:10:15Z [ERROR] [/controller/event/handler/webhook/artifact/replication.go:195]: failed to get project prometheus-community, error: project prometheus-community not found
2022-04-11T01:10:15Z [ERROR] [/pkg/notifier/notifier.go:203]: Error occurred when triggering handler *artifact.ReplicationHandler of topic REPLICATION: project prometheus-community not found
2022-04-11T01:10:15Z [ERROR] [/controller/event/handler/webhook/artifact/replication.go:195]: failed to get project cert-manager, error: project cert-manager not found
2022-04-11T01:10:15Z [ERROR] [/pkg/notifier/notifier.go:203]: Error occurred when triggering handler *artifact.ReplicationHandler of topic REPLICATION: project cert-manager not found
2022-04-11T01:10:16Z [ERROR] [/controller/event/handler/webhook/artifact/replication.go:195]: failed to get project cluster-autoscaler, error: project cluster-autoscaler not found
2022-04-11T01:10:16Z [ERROR] [/pkg/notifier/notifier.go:203]: Error occurred when triggering handler *artifact.ReplicationHandler of topic REPLICATION: project cluster-autoscaler not found

Now it appears that the job is stuck on these replications, which keep failing.

@chlins
Member

chlins commented Apr 12, 2022

@dioguerra The replication job retries on failure; the maximum number of retries is 3.

@piroh1990
Author

Hello @chlins

Thank you for your response.
Attached you will find the jobservice.log and a screenshot of the execution tab.
All in all, there are 391 charts and 16 artifacts being replicated.

Execution Job
jobservice.log

@chlins
Member

chlins commented Apr 14, 2022

@piroh1990 If you want to shorten the replication time, I suggest configuring the name and tag filters to be more specific, e.g. {project1,project2}, because if they are left empty Harbor needs to list all artifacts.
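
As a rough sketch, the filters can also be set through the replication policy API (host, credentials, and policy ID below are placeholders; a real PUT must carry the full policy body, so check the payload against your version's swagger):

# Hypothetical example: narrow replication policy 1 to two projects via a name filter.
# Trimmed for brevity; the PUT endpoint expects the complete policy object.
curl -u admin:password -X PUT "https://harbor.example.com/api/v2.0/replication/policies/1" \
  -H "Content-Type: application/json" \
  -d '{"filters":[{"type":"name","value":"{project1,project2}/**"}]}'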

@chlins
Member

chlins commented Apr 18, 2022

Could you also try excluding charts from the replication and replicating only the artifacts, to compare the performance?

@piroh1990
Author

piroh1990 commented Apr 20, 2022

@chlins
There is only one project we can access on the other Harbor.
The problem is not that a single replication takes too long; the problem is that it takes longer each time we replicate.
If we replicate once every hour, the first replication takes maybe 5 minutes; by the end of the day this has gone up to 15 minutes, and by the next day it is up to an hour.

@chlins
Member

chlins commented May 5, 2022

There may be some performance issues in the ChartMuseum component. We suggest migrating charts from ChartMuseum to OCI charts, as ChartMuseum will be deprecated in a future version. You can refer to https://goharbor.io/docs/2.3.0/working-with-projects/working-with-images/managing-helm-charts/#manage-helm-charts-with-the-oci-compatible-registry-of-harbor.
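
For reference, with Helm 3.8+ the OCI workflow against Harbor looks roughly like this (host, project, and chart names are placeholders):

# Log in to Harbor's OCI-compatible registry, package a chart, and push it.
helm registry login harbor.example.com
helm package ./mychart                  # produces mychart-0.1.0.tgz
helm push mychart-0.1.0.tgz oci://harbor.example.com/myproject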

@piroh1990
Author

Unfortunately, it's not possible for us to use OCI charts, since Rancher does not support them.
There is even a bug regarding this issue: rancher/rancher#29105

Is there any way we can debug this?
Since the feature is not yet deprecated, I would like to find a solution.
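
For example, would raising the log level help? Something like the following (a sketch for the docker-compose install; the paths are the defaults):

# In harbor.yml, set the log level (default is info):
#   log:
#     level: debug
# Regenerate the config and restart:
sudo ./prepare
docker-compose down && docker-compose up -d
# Then follow the jobservice log while a replication runs:
tail -f /var/log/harbor/jobservice.log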

@github-actions

github-actions bot commented Jul 5, 2022

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

@github-actions

github-actions bot commented Aug 5, 2022

This issue was closed because it has been stalled for 30 days with no activity. If this issue is still relevant, please re-open a new issue.
