PR: Skip several failing tests on Windows and one on a specific CI build #6044

Merged
4 commits merged on Dec 26, 2017

Conversation

CAM-Gerlach
Member

Per @ccordoba12's request, I went through the tests and disabled the four that were consistently failing, at least for me on Windows. I also skipped, for that build only, a test that randomly failed on a specific Travis build (Py3.5 with PyQt5; Py2 and PyQt4 were already skipped). It didn't fail on any of the other CIs or builds, had nothing to do with the code I added, and succeeded when I merely changed some whitespace in my next, otherwise identical commit, so I was almost sure the failure was spurious; it already had flaky(3) but failed all three runs. I ran the tests about 20 times in total over the course of my testing.

A few key points to look at for in reviewing:

  • Did I choose the right level of specificity for my skipifs (skipping the test on Windows, but not drilling down to my specific Python or PyQt version, nor generalizing to all builds)? This was more or less consistent with the other skipifs, but I wasn't sure exactly what @ccordoba12 wanted (see the sketch after this list).
  • Should we skip tests on the CIs when they fail all three flaky runs only once in many builds, as I did here?
  • Is there a better/cleaner way to get the Python version that I should have used?
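For concreteness, a rough sketch of the two levels of specificity in question (the version numbers and reason strings are illustrative, not the ones actually used in this PR):

```python
import os
import sys

import pytest


# Broad: skip on all Windows builds, regardless of Python/Qt version.
@pytest.mark.skipif(os.name == 'nt',
                    reason="Fails consistently on Windows")
def test_windows_sensitive():
    ...


# Narrow: additionally pin the Python version; sys.version_info is one clean
# way to get the running Python version, per the last question above.
@pytest.mark.skipif(os.name == 'nt' and sys.version_info[:2] == (3, 6),
                    reason="Fails only on this Windows + Python combination")
def test_version_sensitive():
    ...
```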

Thanks!

@CAM-Gerlach CAM-Gerlach added this to the v3.2.6 milestone Dec 24, 2017
@jnsebgosselin
Member

This is just a comment, I'm not arguing against this PR, but it seems like we have been disabling a lot of tests on Windows lately... I'm not sure that is a good thing. Even before this PR, the Windows build is about 2% behind the Linux build in test coverage.

@CAM-Gerlach Have you seen some sort of correlation between the tests you are disabling and the issues we are receiving from our users on Windows? I'm under the impression we are receiving a lot of bug reports about the introspection one.

@CAM-Gerlach
Member Author

CAM-Gerlach commented Dec 24, 2017

@jnsebgosselin Thanks, really appreciate the feedback. I went through the tests in more detail to determine why they were failing and if they might have any relation to the issues folks were reporting. The results:

  • The calltip test doesn't seem to cover a major issue, and if I understand it correctly I haven't seen any bug reports directly related to it. I tried a number of tweaks to get it to work but was unsuccessful, so I left it skipped.
  • The notebook test also doesn't appear to cover a core feature, nor has it been the subject of any recent bug reports, and there didn't appear to be a quick/obvious way to get it working, so I left it skipped as well.
  • The run-in-dedicated-console test seemed like it could be tangentially related to a few bug reports, and in any case I tracked down why it wasn't working on my machine: like a few others, it hardcodes the default keyboard shortcuts (in this case F5 for run file), and I have remapped a number of them because I find some of the defaults inefficient or unintuitive. When I tested with the default shortcuts, the test passed unmodified, so I unskipped it.
  • The introspection test, while apparently able to find issues with introspection, seems primarily aimed at a rather specific use case: ensuring introspection still works after the PATH is changed. The overwhelming majority of recent bug reports involved incompatible dependencies, specific packages (both seen on multiple platforms), or (Windows-specific) autocomplete and editor help failing after the system was locked, slept, or left idle, none of which this test would detect. However, I was able to get it working by greatly lengthening a few of the timeouts and adding flaky (see the sketch after this list); interestingly, it needed at least twice as much time to pass when run with all the tests via runtests.py as when running the individual file with pytest. This would probably also allow it to succeed on the CI servers; however, it adds over 30 s to the build time when it succeeds, or around 90 s when it fails all three runs (we could re-disable flaky to avoid that worst case), so I'm not sure whether you all think it's worth it. I did unskip it for Windows, since it now runs.
  • Finally, the static analysis test also seemed somewhat relevant to the issues reported with that component lately. However, most of those were due to rather specific user actions (cancelling the analysis in progress, running it on particular files, dependency issues, etc.) that the test wouldn't catch either. In any case, three issues prevented it from running as-is: the shortcut issue from before (I had run-static-analysis mapped to something other than the default); the test not finding the expected number of "Convention" problems, because my .pylintrc disables several of the unnecessary/false-positive checks (pylint is very aggressive about calling things out), which happened to include several of the ones the test looks for; and the pause between starting the static analysis and checking the results being too short. I was able to fix the last one directly, and since the test then passes in the default configuration, I re-enabled it. Also of note: if a successful static analysis has ever been run on the test script before, the test will erroneously pass even when it should fail because the analysis didn't complete, since it just loads the previous results from the pylint.results file rather than ensuring they are actually new.
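To make the introspection fix concrete, here is a rough sketch of the flaky-plus-longer-timeout pattern I mean; the fixture name and widget accessors are hypothetical placeholders, not the actual test code:

```python
from flaky import flaky


@flaky(max_runs=3)
def test_editor_introspection(main_window, qtbot):
    editor = main_window.editor.get_current_editor()  # hypothetical accessor
    # Allow far more time than a lone pytest run of this file needs: under
    # runtests.py the test needed roughly twice the timeout to pass.
    qtbot.waitUntil(lambda: editor.completion_widget.isVisible(),  # hypothetical check
                    timeout=30000)
```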

However, there remains the issue that user configuration, specifically keyboard shortcuts and .pylintrc, affects the tests because of hardcoding and assumptions. This has no effect on the CI runs, which always start from a regenerated configuration, but it makes it impossible to run the tests successfully locally if any of those settings have been changed, short of manually backing up, resetting, and restoring them every time the tests are run, or marking those tests to always skip. Worse, because of the -x flag in the call to pytest, the first of these erroneous failures stops the entire test sequence, so none of the remaining results can be seen. Obviously this is far from ideal, but I'm not sure of the best way to avoid hardcoding the shortcuts, how best to check the pylint results without knowing which checks are enabled, or how to clear a specific file's pylint results (or ensure they have actually been refreshed). Thoughts?
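As a stopgap for local runs, pytest can be invoked directly, bypassing runtests.py and therefore its -x flag, so one configuration-dependent failure doesn't hide the remaining results. A minimal sketch, assuming the affected tests live in test_mainwindow.py as in the current repo layout:

```python
# Run the file without -x; the path is an assumption about the repo layout,
# adjust as needed.
import pytest

pytest.main(['spyder/app/tests/test_mainwindow.py', '-v'])
```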

@jnsebgosselin
Member

jnsebgosselin commented Dec 24, 2017

Thank you very much @CAM-Gerlach for further investigating this issue! Just to be sure I understood correctly: you are skipping some of the tests because they are failing on your machine, even though they are passing on our CI servers? When I look at our currently open PRs, everything seems to pass just fine on our CI.

We can always investigate and re-enable the tests later, so it's not a big deal. I'll try to investigate this on my end as well; I just want to make sure I understand what is going on.

IMO, ideally the solution for failing tests should not be to skip them :D. But well... we are unfunded and short-handed, so I understand this is probably the best temporary fix for now...

@CAM-Gerlach
Member Author

CAM-Gerlach commented Dec 24, 2017

Just to be sure I understood correctly: you are skipping some of the tests because they are failing on your machine, even though they are passing on our CI servers? When I look at our currently open PRs, everything seems to pass just fine on our CI.

Not all of them :)

  • calltip: Currently skipped on the CIs due to timeout; it also times out on my machine despite several tweaks and low-hanging-fruit fixes, so this PR now skips it on Windows.
  • open_notebook_in_project_explorer: Currently skipped on the CIs due to timeout; it also times out/fails on my machine with no obvious fix, so this PR now skips it on Windows as well.
  • dedicated_consoles: Currently passing on the CIs, skipped for Windows and Python 2. The first version of this PR skipped it on all Windows builds because it was failing on my machine, but I isolated the issue to its reliance on hardcoded keyboard shortcuts; it passes when I temporarily reset my shortcuts to the defaults, so it is left unchanged in this PR.
  • editor_introspection: Currently skipped on the CIs because it fails frequently. My original PR skipped it on Windows as well for the same reason, but I fixed it with strategically placed waits/timeouts, so it now passes locally and probably would on the CIs. However, it would add around 30 extra seconds to each remote test run, so I left it skipped on the CIs for now.
  • run_static_code_analysis: Currently skipped on non-PyQt5 builds, both on the CIs and locally, due to timeout. It was also skipped on Windows in the original version of this PR, but I fixed one issue (a slightly longer timeout) and isolated two others (reliance on default keybindings and on user-set .pylintrc settings), so it now passes. However, there is still a false negative locally (it passes when it should fail) if pylint has previously run successfully on the file.
  • move_to_first_breakpoint: Currently skipped for all non Py3 + PyQt5 builds and usually passing for Py3 + PyQt5; however, I've noticed at least one random failure on the Py3.5 + PyQt5 CI build that affected a PR of mine, so this PR currently skips it for that build as well (see the sketch after this list). However, I've rethought this, since the failure may be connected to an actual intermittent behavior I've observed, so I think I should roll it back.
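For reference, a sketch of how the targeted skip in the last bullet could be expressed; TRAVIS_PYTHON_VERSION is the environment variable Travis sets, but the exact condition, fixtures, and reason string here are illustrative:

```python
import os

import pytest
from qtpy import PYQT5

ON_PY35_PYQT5_CI = (os.environ.get('TRAVIS_PYTHON_VERSION', '').startswith('3.5')
                    and PYQT5)


@pytest.mark.skipif(ON_PY35_PYQT5_CI,
                    reason="Randomly fails only on the Py3.5 + PyQt5 Travis build")
def test_move_to_first_breakpoint(main_window, qtbot):
    ...
```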

IMO, ideally the solution for failing tests should not be to skip them :D. But well... we are unfunded and short-handed, so I understand this is probably the best temporary fix for now...

Indeed! So at the moment, I'm only skipping two additional tests on Windows, which were already skipped on the CIs or on certain Windows builds, and I fixed the rest that were failing so they'll at least pass on stock Spyder.

To summarize, the remaining proximate issues that should probably be fixed, not necessarily in this PR, in rough (IMHO) priority order:

  1. Keybindings used to activate certain routines (pylint, run file, etc.) are hardcoded to the defaults; this affects static_analysis, dedicated_console, and several skipped checks. It prevents me, or anyone else with custom shortcuts, from running the tests past the first few (due to the -x option in runtests.py), at least without a tedious, time-consuming workaround every time. Possible solutions would be to call the routines directly, to read the actual keybindings from the config file, or to have the tests use (temporarily write, then restore) a clean config file; two of these are sketched after this list. Longer term, maybe project-based config files are the answer?
  2. The static analysis test relies on no changes having been made to the user's .pylintrc that disable the Convention messages the test checks for, which are probably among the most commonly disabled. This can probably be fixed by manually ensuring those checks are enabled for the file with # pylint: enable=C0XXX comments at the beginning of the test script, which I can try to test/implement shortly. EDIT: Done!
  3. The go-to-first-breakpoint check fails only rarely on the CI, but it may be indicative of a problem I'm experiencing locally: Spyder sometimes goes to the first breakpoint, and sometimes it just stays at the first line. If that can be reproduced and fixed, that would be ideal. In the meantime, I'm thinking we just re-enable the test if it is only failing rarely?
  4. The static analysis test compares against the previous results for the file (if they exist) instead of the current run when the latter doesn't finish in time, leading to the possibility of a local false negative (it passes when it should fail; the inverse should be ruled out by my timeout increase). Not a huge issue, but it could maybe be fixed by either ensuring the run being compared against is actually fresh, or by clearing any previous results first.
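A sketch of two of the options floated in item 1; every attribute and config key below is a guess, not the actual API these tests use:

```python
from spyder.config.main import CONF  # Spyder's config object


def test_run_file_without_hardcoded_shortcut(main_window, qtbot):
    # Option A: trigger the underlying Qt action directly, so the test does
    # not care which key the user has bound to "run file".
    run_action = main_window.run_action  # hypothetical attribute
    run_action.trigger()

    # Option B: read the user's actual binding from the config instead of
    # assuming the default F5 ('_/run file' is a guess at the key name).
    keystr = CONF.get('shortcuts', '_/run file')
    # ...convert keystr to a key/modifier pair and replay it with qtbot...
```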

I will hopefully push a new version shortly with a fix for 2 (assuming my suggestion above works) and a rollback of the targeted skip of the breakpoint check.

Thanks!

@CAM-Gerlach
Member Author

CAM-Gerlach commented Dec 25, 2017

Well, that's what I get for not running the full test suite; silly me. Would it be best to just create a new script file (e.g. script_pylint.py) for that one test, with the relevant enable statements, as inelegant as that is, rather than modifying every one of the (many) other tests that depend on the current script.py? The file appears to be saved automatically when pylint is run, so I can't just have the test inject the necessary statements into the file while it is open, or they'll get saved and baked in for any following tests.
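If a dedicated script_pylint.py does turn out to be the cleanest route, its header might look something like this; the message codes and the deliberately non-conforming lines below are purely illustrative:

```python
# pylint: enable=C0103, C0111, C0326
# The inline pragma above is meant to force the Convention checks the test
# counts back on, even if a user's .pylintrc disables them (the codes shown
# are examples, not necessarily the ones the test expects).

badlyNamedVariable = 1   # deliberately violates the naming convention (C0103)


def undocumented():      # deliberately missing a docstring (C0111)
    return badlyNamedVariable
```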

@pep8speaks

pep8speaks commented Dec 25, 2017

Hello @CAM-Gerlach! Thanks for updating the PR.

Line 1:1: E265 block comment should start with '# '
Line 10:1: E265 block comment should start with '# '
Line 13:1: E265 block comment should start with '# '
Line 14:1: E402 module level import not at top of file

CAM-Gerlach EDIT: Should be able to ignore these ^ as they are intentional.

@jnsebgosselin
Member

Thank you very much @CAM-Gerlach. I understand better what you want to do now.

@CAM-Gerlach
Member Author

CAM-Gerlach commented Dec 26, 2017

@ccordoba12 Any thoughts on how item 1 above (several tests hardcode the default keybindings and fail if they have been changed) should be addressed? That's the last remaining thing I'd like to see fixed, or at least worked around, in this PR (the others are minor/secondary or not directly related); otherwise, I or anyone else using custom keyboard shortcuts still can't run the tests locally, since the first non-skipped test (dedicated_consoles) fails and prevents any of the other tests from even running. I can naively think of a few possible approaches, as mentioned above, but I'm not sure which you would prefer or think most feasible, or whether there's a better solution I should implement instead.

@ccordoba12
Member

Any thoughts on how item 1 above (several tests hardcode the default keybindings and fail if they have been changed) should be addressed?

Yes, we should run our tests with a clean configuration. I tried to do that some time ago, but I couldn't make it work reliably. Perhaps you could give it a try.
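For what it's worth, one possible shape for that, sketched as a pytest fixture; SPYDER_CONFIG_DIR is a hypothetical environment variable used only to illustrate the back-up/point-elsewhere/restore pattern, not an option Spyder is known to honour:

```python
import pytest


@pytest.fixture
def clean_spyder_config(tmpdir, monkeypatch):
    # Redirect whatever mechanism locates the user config to an empty
    # temporary directory for the duration of the test.
    monkeypatch.setenv('SPYDER_CONFIG_DIR', str(tmpdir))  # hypothetical variable
    yield str(tmpdir)
    # monkeypatch restores the original environment automatically on teardown.
```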

@ccordoba12
Member

That's the last remaining thing I'd like to see fixed, or at least worked around, in this PR

But this is not for this PR, please leave it for another one.

@CAM-Gerlach
Member Author

CAM-Gerlach commented Dec 26, 2017

@ccordoba12 Okay; thanks much for your insight. Anything else you need me to do for this PR? Should I create a new issue for the clean config problem and we can discuss it there?

@ccordoba12
Member

ccordoba12 commented Dec 26, 2017

Anything else you need me to do for this PR?

Nope, I'll merge it right away :-)

Should I create a new issue for the clean config problem and we can discuss it there?

Yes, please.
