[Navi21][HIP] test_pooling2d and other unit tests are failing #1141
Comments
@junliume I tested with both rocm-4.1 and rocm-4.3, on both the HIP and OCL backends, but I can't seem to reproduce this Navi crash of
@carlushuang It is still pretty persistent (the following pipeline is from just now):
@junliume Hi, I tested on ixt-sjc2-63 with the latest MIOpen: I rebuilt the docker image and MIOpen and ran the tests (all commands are the same as in that Jenkins stage), but everything still passes. Below is the command, and here is the log.
Indeed, an interesting and tricky one. You can check the pipelines here:
Hi @junliume, yes, I checked this pipeline, and it seems even the clinfo output of this node and of ixt-sjc2-63 is the same. Very strange.
@atamazov and @carlushuang I tried to see if CTEST_PARALLEL_LEVEL=2 would stabilize the CI too, because running the tests serially is really slow.
test_pooling2d and the other unit tests are stable when run serially.
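For readers who want to compare serial and parallel runs locally, the sketch below shows one way to set the parallelism discussed above. It assumes the unit tests are driven by `ctest` from a build directory; the helper names `ctest_invocation` and `run_ctest` and the default `build` path are hypothetical and not taken from MIOpen's CI scripts.

```python
import os
import subprocess


def ctest_invocation(parallel_level, base_env=None):
    """Return (argv, env) for a ctest run at the given parallelism.

    Hypothetical helper: it only prepares the command, it does not run it.
    """
    if parallel_level < 1:
        raise ValueError("parallel_level must be >= 1")
    env = dict(os.environ if base_env is None else base_env)
    # ctest reads CTEST_PARALLEL_LEVEL when no -j option is passed.
    env["CTEST_PARALLEL_LEVEL"] = str(parallel_level)
    # --output-on-failure prints the log of any test that fails.
    argv = ["ctest", "--output-on-failure"]
    return argv, env


def run_ctest(parallel_level, build_dir="build"):
    """Run ctest in build_dir (assumed path) and return its exit code."""
    argv, env = ctest_invocation(parallel_level)
    return subprocess.run(argv, cwd=build_dir, env=env).returncode
```

Calling `run_ctest(1)` would correspond to the serial run reported as stable, and `run_ctest(2)` to the CTEST_PARALLEL_LEVEL=2 experiment.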
@junliume Do we know the root cause of the issue?
No, unfortunately. It is unlikely to be MIOpen's own issue, since it is very platform-specific. Meanwhile, if we try to submit a ticket against the runtime or compiler, it is hard to persuade them to accept it, since these are MIOpen's unit tests ...
Let's agree that the root cause of this issue is #1148. In that case we can keep this issue closed.
The following tests are failing consistently on gfx1030, please take a look:
http://micimaster.amd.com/blue/rest/organizations/jenkins/pipelines/MLLibs/pipelines/MIOpen/branches/ci_gfx1030_ocl_to_hip/runs/7/nodes/10/steps/40/log/?start=0
Hence, when the HIP check is run on gfx1030, several more unit tests fail, for example:
Let's discuss the priorities of these failing checks separately; this issue is to track the problems we will be facing soon.