-
Notifications
You must be signed in to change notification settings - Fork 245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add valid retry solution to mvn-verify [skip ci] #9609
Conversation
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Signed-off-by: YanxuanLiu <[email protected]>
Why are we retrying verify? That includes running tests. If the connection to maven central is not ideal and needs to be retired I am okay with doing that, but if a test is failing randomly and ends up being retired until it passes I don't want that to happen. |
We don't run tests in github actions. Only things like compile,RAT,scalastyle,docgen TImeouts during spark-rapids-jni downloads are the most common reason for the failed checks there. I think the proper solution is to implement a |
Signed-off-by: YanxuanLiu <[email protected]>
cache action is used in MONAILabel project, but we need more attention to maintain the disc space of cache, and it may bring more unexpected issues. https://github.com/Project-MONAI/MONAILabel/actions/runs/6563649076. For this feature, I think we can try if dependencies download issue can be solved by retry first. |
Signed-off-by: YanxuanLiu <[email protected]>
@peixin and I tried to run It failed with log:
These dependencies cannot be found on sonatype, but will be built in following mvn commands. Actually, we still need to download dependencies in following steps and we may still need retry then. We also ran Do you have suggestion on using |
@gerashegalov yes I forgot about that. We don't run any tests so it is all about compiling. @YanxuanLiu I think I found an alternative maven plugin that is designed to fix the problems that we are seeing.
The github page for it is at https://github.com/qaware/go-offline-maven-plugin |
I am not sure we can rely on local paths to share dependency cache between matrix job runners. We can add some logging to actions to see if they share anything even when they land on the same VM. However, I suggest using a front door github feature https://github.com/actions/setup-java#caching-packages-dependencies instead of cooking up a backdoor workaround. Even with the cache action, our main issue is a DDOS fetching of spark-rapids-jni from parallel matrix jobs. So we should add an upstream job doing this step with github dependency cache enabled |
We do not deploy plugin snapshot intermediate artifacts of plugin repo to public due to security requirements |
From my monitoring, its not only JNI, when the maven(sonatype) service is not responsive it would randomly fail any dep downloading
this could be an option to help decrease the frequency of similar issue. This would also not entirely resolve this issue as pre-baked cache from that command would not cover all following module builds. mvn verify/package could still fail downloading non-pre-baked deps from remote. But we should try everything to help mitigate issue @YanxuanLiu (the root cause is still the unstable maven repo service which we cannot help too much) |
I tried the plugin locally, the result seems not that well.
Also, we need to add all dynamic dependencies to plugin config, I'm not sure if it's a good practice of management. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed the PR as partially fix the #9559 to help mitigate the issue first (still failing different CI everyday)
please help retrigger the run multiple times and check the log to see if it actually helped some scenarios resolve the issue (and the cost analysis), thanks
Triggered the action several times, and caught one example of retry. I set sleep time for 3 retries to |
build |
part of #9559