Framework: PR failures in unrelated packages 2022-07 #10782
Comments
Reported in TRILINOSHD-123. |
Here's the error, which is completely unrelated to any changes in the PRs.
|
@brian-kelley Thanks. I don't understand how that wasn't flagged by PR testing. New machine/configuration, maybe? |
Yeah, must be something like that. |
Maybe it has to do with switching from C++14 to C++17 in the test scripts? |
@rppawlo Wait. They did that? |
It appears this might be possible. I will speak with @srbdev about this. |
We moved one build to C++17 about a year ago and moved a couple more this week. We are trying to slowly move them over to support the new version of Kokkos anticipated next month. |
After reviewing the failures in #10777 more carefully, we are pursuing reverting the C++17 changes from the last couple days temporarily until the resulting failures can be resolved. |
This is impacting my PR #10784 as well. |
@jhux2, the issue is that changes to the PR build configurations don't actually pass through PR testing. I would recommend making those configuration changes go through PR testing as well. Knowing how the GenConfig repos are handled, I think I know how that could be done if people are interested. |
The switch to C++17 appeared to pass PR testing, but we suspect that the changes never triggered a full build and so passed without going through the full test suite. Since they "passed" and were set with the
You can add this to PR build failures that fail the compiler check due to running out of disk space as shown in the PR test iteration #10784 (comment) and on CDash here showing errors like:
and:
It looks like these failures are all coming from one machine:
Is there no automated process checking disk space on these machines and sending notification emails when they get low? |
Ran into the out-of-disk-space issue with the intel/17 build in PR #10783. That was on the same machine Ross mentioned above: |
I cleared up @bartlettroscoe I'm looking into the tools to help with resource monitoring and alerting, and eventually add some automation to keep the testing nodes healthy. |
@srbdev Thank you! |
@srbdev, here is a dirt-simple tool that does the job: monitor-disk-usage.sh. Just set up a Jenkins job or a cron job on each node to run it once a day and send out email to the Trilinos framework team email list. I have this running on testing.sandia.gov and testing-dev.sandia.gov so that we don't get caught flat-footed when the disk fills up too much. Don't let the perfect be the enemy of the better. |
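For illustration only, a daily disk-usage check of the kind described above could be sketched in Python as follows. This is not the contents of monitor-disk-usage.sh; the function names, the 90% threshold, and the default path are assumptions made for this sketch. Sending the email is left to whatever runs the check (for example, the cron or Jenkins job mentioned above).

```python
import shutil


def percent_used(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def find_full_disks(paths, threshold_percent=90.0, probe=percent_used):
    """Return (path, percent) pairs whose usage is at or above the threshold.

    `threshold_percent` is a hypothetical default; `probe` can be swapped
    out so the logic can be exercised without touching a real disk.
    """
    full = []
    for path in paths:
        pct = probe(path)
        if pct >= threshold_percent:
            full.append((path, pct))
    return full


def main(paths):
    """Print one warning per full filesystem; return a nonzero status if any."""
    full = find_full_disks(paths or ["/"])
    for path, pct in full:
        print(f"WARNING: {path} is {pct:.1f}% full")
    return 1 if full else 0
```

Calling `main(["/"])` from a scheduled job and treating a nonzero return value as "send a notification" would be one way to wire this up; the exit-status convention is a design choice of this sketch, not something taken from the thread.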
And now it appears we have another set of failures in 'develop', not related to a particular PR, that is impacting PRs #10775 (@jhux2), #10777 (@brian-kelley), #10783 (@masterleinad), #10784 (@bartlettroscoe), and #10785 (@srbdev), as shown in this query, which shows the failures:
This can't be related to the PRs because there is zero chance my PR #10784, for example, could trigger a failure like this. Trilinos really needs a set of post-merge CI builds to catch errors like this. And we need to figure out how errors like this are getting into the 'develop' branch and adjust the processes so this happens less often. |
I force merged #10776 as explained in detail in that issue. |
This issue has had no activity for 365 days and is marked for closure. It will be closed after an additional 30 days of inactivity. |
Closing, this was fixed long ago. |
Bug Report
@trilinos/framework @jwillenbring
Description
A number of PRs are failing due to failures in packages that are apparently unrelated to changes in the PR.
See #10776 and #10777 and #10751 for examples.