MueLu: MueLu hangs when trying to "export data" such as matrices after repartitioning has occurred #3991
Comments
@pwxy That would be nice, but that's not the way GitHub works. |
@pwxy I started looking into fixing this. Using a try-catch block, A gets dumped correctly. I'm trying to track down the missing coordinates now. |
CC: @srajama1 (Trilinos Linear Solvers Product Area Lead) @pwxy, can you please provide reproducibility info? Is this using the ATDM Trilinos build configuration described at: If not, then this is not something we can support as an ATDM issue. Please see the standard ATDM Trilinos GitHub issue template at: |
@bartlettroscoe This is not an ATDM issue that came from the dashboard builds, but it is still an ATDM issue. We can support this. If we cannot support runs of @pwxy then we are in real trouble :) |
@srajama1, I simply mean that if we can't reproduce this problem with the ATDM Trilinos build configuration, then we can't support this using the triaging and resolution process described in Triaging and Addressing ATDM Trilinos Failures because we can't provide reproducibility instructions. It is out of our hands. But if you can support it, then that is fine. NOTE: According to the policy: please make sure that someone adds a test that exposes this defect (that can be run in the ATDM Trilinos builds) first and then fixes the code. Please don't just "fix the code" and move on. |
@bartlettroscoe, yes I used the ATDM Trilinos build configuration described at: https://github.com/trilinos/Trilinos/blob/develop/cmake/std/atdm/README.md However, I used the SEMS rhel6 environment to build Trilinos on the CEE LAN after my question yesterday morning regarding the current CEE rhel6 environment being hardwired to load SPARC modules. |
@pwxy said:
Excellent! @jhux2, why are we not seeing an automated test failing? Do we need to add a test that runs:
? |
Because this is a newly discovered bug. |
Addresses Issue trilinos#3991
@bartlettroscoe I'm waiting on some feedback from an application. It did appear to fix things for me, but I want to wait before I declare success. The defect was not exposed by any tests. |
@cgcgcg, would it be possible to add a test to MueLu that showed this bug before your change and then verified that you fixed the bug after your change? |
@bartlettroscoe I can do that. The only thing I'm afraid of is that I will be creating another test that breaks the ATDM builds. It seems that tests that do I/O are randomly failing quite frequently. |
@cgcgcg, tests that do I/O don't need to be flaky. You just need to not have a race between writing and reading the same file in the same executable. If you read that file in a separate process it seems to help. I have found that to fix problems like this. |
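The advice above (finish writing the file, then read it back from a separate process) can be sketched roughly as follows. This is only an illustration in Python, not a Trilinos or MueLu test: the helper names and the file name `A.mm` are hypothetical, and a real test would launch the actual reader executable instead of a one-line Python script.

```python
import os
import subprocess
import sys
import tempfile


def write_matrix_file(path, rows):
    """Write rows to path and push the data to disk before returning."""
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(str(v) for v in row) + "\n")
        f.flush()
        os.fsync(f.fileno())  # the file is complete before any reader starts


def read_in_separate_process(path):
    """Read the file from a fresh process and return what it printed."""
    reader = "import sys; sys.stdout.write(open(sys.argv[1]).read())"
    result = subprocess.run(
        [sys.executable, "-c", reader, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def roundtrip(rows):
    """Write rows, read them back in another process, and parse them."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "A.mm")  # hypothetical file name
        write_matrix_file(path, rows)
        text = read_in_separate_process(path)
    return [[float(v) for v in line.split()] for line in text.splitlines()]


if __name__ == "__main__":
    print(roundtrip([[1.0, 2.0], [3.0, 4.0]]))
```

Because the writer closes and syncs the file before the reader process is started, the two steps cannot overlap, which is the race the comment warns about.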
@pwxy Can you please check if this is fixed? |
Better question, was an automated test ever written to demonstrate this defect and show that it was fixed? |
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. |
This issue was closed due to inactivity for 395 days. |
MueLu hangs when trying to "export data" such as matrices after repartitioning has occurred.
The MPI processes that have dropped out after repartitioning will throw and the run hangs:
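As an illustration only (this is not MueLu code and not the original error output), the failure mode described here can be mimicked without MPI. In the sketch below, threads stand in for MPI ranks, a `threading.Barrier` stands in for a collective call, and a short timeout stands in for the indefinite hang. The names `run_export` and `NoLocalData` and the `guard` flag are hypothetical; the guard shows one possible remedy, not necessarily the fix applied in MueLu.

```python
import threading


class NoLocalData(Exception):
    """Raised by a simulated rank that holds no matrix after repartitioning."""


def run_export(num_ranks, active_ranks, guard, timeout=0.5):
    """Return one outcome per rank: 'done', 'threw', or 'hung'."""
    barrier = threading.Barrier(num_ranks)  # stand-in for an MPI collective
    outcome = [None] * num_ranks

    def rank_main(rank):
        try:
            has_data = rank in active_ranks
            if not has_data and not guard:
                # A dropped-out rank fails before reaching the collective.
                raise NoLocalData(f"rank {rank} has no local matrix")
            # Every rank must arrive here for the collective to complete.
            barrier.wait(timeout=timeout)
            outcome[rank] = "done"
        except NoLocalData:
            outcome[rank] = "threw"
        except threading.BrokenBarrierError:
            # In a real MPI run there is no timeout: the rank waits forever.
            outcome[rank] = "hung"

    threads = [threading.Thread(target=rank_main, args=(r,))
               for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcome


if __name__ == "__main__":
    print(run_export(4, {0, 1}, guard=False))
    print(run_export(4, {0, 1}, guard=True))
```

Without the guard, the ranks that throw never reach the collective, so the remaining ranks wait on it; with the guard, every rank participates and the export step completes.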