-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: Implement end-to-end testing with Playwright #420
Conversation
This is an in-progress cleaned version of the final PR (numberscope#361) of the Delft student user interface project, which it supersedes. Comments from the original PR that remain relevant: * The end-to-end tests run using Firefox and Chromium. * New tests are in the e2e folder. * The tests often depend on specific classes and IDs, so they may need to be updated upon changes to Numberscope. * The tests can be executed as the following npm script: `npm run test:e2e` * An interactive testing UI and debugger can be executed as the following npm script: `npm run test:e2e:ui` Caveats concerning trying this cleaned PR and its status: * Make certain to run `npm install` after pulling this PR. * Many of the tests do not yet pass, perhaps because of the "specific classes and IDs" point mentioned above and the fact that ui2 has diverged significantly from the Delft PR series. * Tests are not yet run automatically prior to commit. * I do not think there are any image tests yet, we need to try to add them. * Tests are not yet performed in the continuous integration checks to be run on GitHub; they should be.
Wow, the testing is actually doing what it is supposed to! It uncovered #419 as part of its testing. That is to say, the first test I looked at was simply stale with respect to our diverged code. But the second test was operating exactly as designed, and it detected that the caption on the default speciment (modfill with random) is not correct if it is saved and then the gallery is viewed. So I will have to fix #419 in this PR to get it to a point in which it can be merged. This experience makes me very positive on this PR, and makes me feel that after this PR, every PR should be required to update testing: if it is a bugfix PR, it should include a test that would fail with the bug, and passes with the PR. And if it is a feature PR, it should include either unit or end-to-end tests of the feature, preferably both. Anyhow, I will continue by investigating #419 as soon as I have a chance. |
OK, I have resolved #419 in this branch, and now all of the end-to-end tests that the Delft team implemented are passing! As you can see from the above lists, there is still a lot to do before this PR is ready to merge, but @katestange @Vectornaut I would be happy for some initial review: code review, does pulling and |
Currently,
and
and the scope appears to work. |
Wow, looks like you actually ran the tests! Great. For all of those warnings at the beginning, note that version 16 of |
The strong desire for image tests led to a cascade of changes in this commit, mostly driven by the need to have reproducible images: - Removes all use of `sketch.noLoop()` and `sketch.loop()` in favor of the previously existing `stop()` and `continue()` visualizer methods, to allow: - Adds a `frames=NNN` query parameter to URLs to set the maximum number of frames a visualization may draw - Switches from the "static" instance of mathjs to a "dynamic" one, to allow its random number configuration to be controlled. In conjunction with this, moves all math functions into a single math module, as extensions of mathjs. - Removes all use of `Math.random()` in favor of the mathjs random generator - Adds a `randomSeed=AAAA` query parameter to URLs to make the mathjs random generator reproducible. - Documents all of the above changes.
OK, I have pushed the code that facilitates image tests and institutes them for all featured gallery specimens. Presuming it is reproducible on other machines, this approach gives us the benefit that any time another featured specimen is added, another test is automatically added. (When in the future we remove featured specimens, we may want to move them to a dedicated test specimen list, if they are still of interest for testing purposes.) @katestange @Vectornaut it would be awesome to see if the newly-added tests pass on either of your machines. Note they will definitely fail on MacOS or Windows, as Playwright image tests are enforced to be OS-specific. When we need to enable development on other platforms, I'll have to create a Windows PR (I have a dual-boot machine kicking around somewhere), and someone with access to a Mac will have to do likewise. Fingers crossed and looking forward to hearing back your results on these new tests. |
Also adds new tests for `src/shared/defineFeatured.ts` and corrects the documentation extraction facility for the package manager scripts. Resolves numberscope#25. Resolves numberscope#73. Resolves numberscope#246.
In other words, it should never call calculate twice for the same index. This is tested by 10K random accesses to indices less than 1M, followed by accessing the first 10K entries, followed by accessing the last 10K entries. Hopefully that should suffice. Resolves numberscope#54.
This is an initial pass at addressing numberscope#113. Note, however, that ModFill is not reporting to the person doing visualization that it is running with different parameter values than shown. So that still must be done, but for that part we will need a resolution to numberscope#112, which will be a sufficiently involve change that we should leave it to a spearate PR from this numberscope#420.
Resolves numberscope#174.
Just to make sure I understand correctly, this PR is going to mean that every time I try to commit some new code, even just along the way in something I'm working on but isn't polished yet, we're running alllll these tests, right, and I can't commit if anything fails? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some more thoughts.
Indeed, this PR adopts a model that a commit is a checkpoint at which the code works. That's not an uncommon code hygiene practice. (But it is by no means universal.) Your working version can be as broken as you like, but it should work at each commit. We could switch to only enforcing tests on PRs but in my experience that leads to people making PRs that don't actually pass the tests, since they end up not always bothering to run all the tests before pushing. Since the modal PR is just one or two commits, tests on every commit seems reasonable to me. I suppose we could look into tests as pre-push rather than pre-commit actions, if you feel testing on every commit is too onerous. |
Nope, I'm happy, I just wanted to make sure I understand. |
OK, I think I've addressed your latest batch of comments (thanks for them!). Looking forward to your further review when you get a chance. |
In histogram, because you can pan and zoom, the mouseover isn't working right anymore (it doesn't adjust properly to the panned and zoomed setup). |
Question: how do the tests decide how many workers to use? |
Ok, at the moment I can't think of too much more to test or ask. I've tried out all the things I can think of trying. Tests are passing. So just the mouseover in histogram. |
I literally have no idea. Is this something you'd like me to investigate? |
Yeah, I knew that. The original code has no contemplation that the display could have changed at all. I was sort of hoping that could be left for the future, but that was fatigue on my part. (Frankly I am not quite sure how the mouseover code can tell which box the mouse is over if there's been a combination of panning and zooming, but I am sure it is possible somehow...) I will try to fix tonight. If not, it will likely be Friday at the soonest because I am heading to Seattle tomorrow for a SUMM meeting. |
No, was just curious. |
Well it could be added to the Histogram overhaul issue if you'd prefer, that would be fine with me. You could also turn off mouse controls for now, and add turning them with mouseover working again to Histogram overhaul. I don't have anything else for this PR to ask for at the moment, so maybe we should just go for merging? It is an unusually large PR. :) |
No, it's OK. Should be done now. There were other related aspects to fix, such as not having the text get unreadably tiny when zoomed out, and keeping dragging proportional when zoomed in or out. Take a look and let me know what you think; it seems from your comments that this is the last thing on your list for merging. Sorry I was kvetching, it was just me being lazy and not wanting to figure out the coordinate transform from mouse coordinates to plot coordinates depending on the pan and zoom. But fortunately the situation is simple enough (no rotations, for example) and I've done this sort of thing enough that I could just do it by trial and error: get it working for arbitrary pan, no zoom; then get it working for no pan, arbitrary zoom; and then finally combining those to get the general case... |
I just put it in a fixed position on screen, regardless of zoom or pan. How's that? |
As far as I can tell, that only occurs on the last frame or maybe the last two frames before zooming all the way through the graph so you can't see anything (that is literally what happens). In other words, the formulae are working just right except in case of an extremely close-up view. Mathematically, there should be no difference for those last two frames, as far as I can see. I am supposing that due to some kind of quantization in possible text sizes, or some kind of integer rounding phenomenon, at this extreme close-up range, the text height is actually being reported inaccurately by p5. So I couldn't figure out how to get that last frame or two to look perfect. So I let well enough be, as an edge case that's not of too much practical import. It's certainly readable all the way up to the last frame but one, and looking at the graph at that very last zoom level is unlikely to be useful, and anyway you would already know what that caption says by the time you get there. You are welcome to: Sorry if that was a cranky response ;-) |
Haha, you're entitled to a cranky response! I think we should merge this and celebrate! |
* test: Implement end-to-end testing with Playwright This is an in-progress cleaned version of the final PR (#361) of the Delft student user interface project, which it supersedes. Comments from the original PR that remain relevant: * The end-to-end tests run using Firefox and Chromium. * New tests are in the e2e folder. * The tests often depend on specific classes and IDs, so they may need to be updated upon changes to Numberscope. * The tests can be executed as the following npm script: `npm run test:e2e` * An interactive testing UI and debugger can be executed as the following npm script: `npm run test:e2e:ui` Caveats concerning trying this cleaned PR and its status: * Make certain to run `npm install` after pulling this PR. * Many of the tests do not yet pass, perhaps because of the "specific classes and IDs" point mentioned above and the fact that ui2 has diverged significantly from the Delft PR series. * Tests are not yet run automatically prior to commit. * I do not think there are any image tests yet, we need to try to add them. * Tests are not yet performed in the continuous integration checks to be run on GitHub; they should be. * fix: Repair gallery/click on featured item test * fix: Repair gallery/saving a specimen test * fix: Repair scope tests. * test: Run end-to-end tests prior to any commit * test: Add image tests to e2e The strong desire for image tests led to a cascade of changes in this commit, mostly driven by the need to have reproducible images: - Removes all use of `sketch.noLoop()` and `sketch.loop()` in favor of the previously existing `stop()` and `continue()` visualizer methods, to allow: - Adds a `frames=NNN` query parameter to URLs to set the maximum number of frames a visualization may draw - Switches from the "static" instance of mathjs to a "dynamic" one, to allow its random number configuration to be controlled. In conjunction with this, moves all math functions into a single math module, as extensions of mathjs. - Removes all use of `Math.random()` in favor of the mathjs random generator - Adds a `randomSeed=AAAA` query parameter to URLs to make the mathjs random generator reproducible. - Documents all of the above changes. * doc: describe running and creating code tests Also adds new tests for `src/shared/defineFeatured.ts` and corrects the documentation extraction facility for the package manager scripts. Resolves #25. Resolves #73. Resolves #246. * test: Test that the caching mechanism won't double-calculate In other words, it should never call calculate twice for the same index. This is tested by 10K random accesses to indices less than 1M, followed by accessing the first 10K entries, followed by accessing the last 10K entries. Hopefully that should suffice. Resolves #54. * fix: Prevent ModFill from hanging on extremely large input This is an initial pass at addressing #113. Note, however, that ModFill is not reporting to the person doing visualization that it is running with different parameter values than shown. So that still must be done, but for that part we will need a resolution to #112, which will be a sufficiently involve change that we should leave it to a spearate PR from this #420. * doc: Update PR checklist. Resolves #174. * maintenance: Remove and disallow trailing whitespace in code Resolves #219. * maintenance: Make TypeScript target ES2022 in all cases. Resolves #226. * maintenance: Run typecheck in CI Resolves #292. * maintenance: Run GitHub CI on pushes to main as well Resolves #217 * test: Add OEIS Sequence-Visualizer test grid Use a list of "stressful sequences" to perform at least two image tests on each visualizer. As this change uncovered several existing errors, there are numerous other changes in this commit to return to a state of all tests passing. Here is an enumeration of other changes, in no particular order: * Change husky pre-commit actions so that the end-to-end tests are not run if there is an existing successful test since any files in the project last changed. * On most image tests, snapshot only the visualizer canvas. * Make 'axios' a runtime dependency, rather than just development, as it is used when obtaining OEIS sequence values in the running frontend. * So many other changes are made to Histogram.ts that this PR also adds hatching to show the elements with unknown factorization. Adds the "p5.brush" package to draw the hatching. Since this necessitates WebGL, adds a new 'P5GLVisualizer' base class for visualizers that wish to use a WebGL canvas. * Updates several dependencies to latest to make sure their out-of-dateness was not contributing to test issues. * Switches to the (ES) "module" import system from "commonjs" to ensure that was not contributing to test issues. Resolves #294. * Specifies font locations in a way that vite understands how to relocate in both the 'dev' and 'build' versions of the app. * Properly marks both factor cache and value cache as empty at the initialization of an OEIS sequence. * Allows ±Infinity as valid values of INTEGER param fields for convenience. * Allows specimenQuery to take the output of parseSpecimenQuery to recreate the same specimen query as was parsed, for the sake of testing. * Turns off the browser default context menu on visualizers, which wasn't particularly useful and was interfering with the UI for some visualizers. * Fixes typos in Differences visualizer (this.first -> sequence.first) * Makes FactorHistogram visualizer a p5 WebGL-based visualizer, and adds hatching to the "0 factors" bar to indicate the terms with unknown factorizations. * Uses the WebGL default controls to implement panning, zooming, and (rather uselessly but somewhat spectacularly) rotating the plot in three dimensions for the FactorHistogram visualizer. * On FactorHistogram's first pass over the sequence data, collects the counts for each possible number of factors, as opposed to the number of factors for each entry, to avoid a possibly disastrously long loop on the second pass that accumulates bin counts. This also streamlines the computation of the largest number of factors in the data. * Factors out repeated code in FactorHistogram, for labeling bars and displaying text. * Fits the hover box in FactorHistogram to the text to be displayed. * FactorHistogram displays a temporary message if factoring is taking a long time. * Moves the bar labels of FactorHistogram just under their respective bars, to make sure they are visible. * Selects y-axis ticks at round numbers for FactorHistogram, and moves the tick labels closer to their ticks. * Prevents FactorHistogram from looping except when there is mouse activity. * Prevents ModFill from freezing up if too large a "mod dimension" is chosen. Note TODO: display a warning when a smaller mod dimension than requested is actually used. * Prevents Vue's reactivity system from attempting to modify the behavior of p5 sketches, using the Vue `markRaw` method. This change prevents some instances of infinite loops caused by cascading change notifications. * Tightens up the typing of P5Visualizer so that it is possible to derive another visualizer base class P5GLVisualizer from it. Also splits up the inhabit method so that P5GLVisualizer can modify it as needed. * Allow negative start indices for Show Factors visualizer. * Show at most 100 terms in Show Factors (no room on screen for more than that). * Updates TypeScript target versions of JavaScript Resolves #226. * refactor: better typing for param descriptions * feat: Add ExtendedBigint ParamType * refactor: Remove generic parameters from top levels of Paramable hierarchy * feat: Uniform sequence bounds controls All params for controlling which terms of a sequence will be used in the visualization are removed from individual visualizers. Instead, there are uniform params in the Sequence classes themselves. Resolves #411. In this implementation, several other changes are necessary and/or expedient to make. In particular, the type of sequence indices is changed to bigint and bounds to ExtendedBigint (the union of bigint and ±Infinity), in preparation for addressing #455. Additional changes include: * Removal of SequenceDefault class, as everything now derives from Cached. * More care in caching of OEIS sequences, to provide some help with #459. * Individual-parameter validation functions now take a validation status to update based on the finding. This refactor avoids a frequent need to merge status objects. * Individual-parameter validation functions are called in a context of `this` set to the Paramable object, so that other data from the paramable can be accessed. * New `math` functions for dealing with ExtendedBigints. * NumberGlyph visualizer now tries to display as many terms as are available and will fit on the screen, except in case there are infinitely many terms available, in which case it still defaults to 64 terms. This change is needed so that the general length parameter for Sequences would affect what NumberGlyph displayed. * Puts reactivation of "known" OEIS sequences into a function rather than at top level of sequences.ts so that the timing of when it occurs can be controlled. * Renames property `_size` of P5Visualizer to `size` as multiple derived classes seem to need this value (i.e., not just used internally in the base class). * Makes sure the p5 loop is restarted if need be every time `.show()` is called on a visualizer. * Temporarily disables A000521 transversal tests until we can get incremental OEIS loading to work. * Adds a test for starting a visualization deep into a sequence. * refactor: Apply OEIS modulus on the fly, so cache can point to communal one * test: First working e2e tests with text and transversal Unfortunately, due to the intricacies of end-to-end testing with image comparison, this is a very large commit. It introduces running tests in a Docker container for the sake of reproducibility. (So note that henceforth you must have docker installed and running on your machine to perform testing, and hence to make a commit.) However, it turns out not all tests can be run in a Docker container -- Firefox is not able to supply a WebGL context inside of Docker. Hence, some tests still need to be run directly on the host machine. To deal with all this, I felt it necessary to introduce the "make" tool. In particular, creating the necessary Docker image is slow, and I didn't want to have to repeat that except when truly necessary. That, in turn, means there are now many more configuration files, as well as auxiliary files created by the Makefile. The directory tree was becoming hopelessly cluttered. To keep things straight in my head, I felt it was really necessary to reorganize the directory structure. The biggest change is to put as many of the configuration files as possible in a new etc/ directory (the old-school unix name for where to put such things). I managed to get almost everything put in there. The one major holdout is the tsconfig files; I just couldn't find a way to get TypeScript to run without its config files at the top level of the project. Slightly annoying, but then, there are so many more annoying things about TypeScript. Anyhow, accomplishing the configuration file rearrangement resulted in updating eslint. Unfortunately, it went through a _major_ rewrite from version 8 to 9, and prettier-eslint has not been updated to work with the new version. I tried to work on that update myself, but prettier-eslint is incredibly intricate in the way that it handles configurations because it is trying to provide a great deal of flexibility and power, and it is trying to infer prettier configuration from eslint configuration (and maybe even vice-versa as well, I'm not sure). Since we were only using a tiny fraction of that power, it turned out to be significantly easier to just replace prettier-eslint with a custom script tools/prettiest.js that runs first prettier, then eslint. It also meant we could get rid of the lint-staged tool, so there is that. (While I was at it, I ran the "depcheck" tool and got rid of some other unneeded cruft that had accumulated.) But now it all works, and as a side benefit since just about every npm script now runs through make, you should never need to worry about running install before dev or build before preview, etc. Make keeps track of what depends on what and whether/how to redo the prerequisites before taking the requested action. Even better, if you just ran successful end-to-end tests, which take a while now, then `git commit` will not have to re-run them. Those are the major points. In addition, there are a number of minor changes resulting from errors uncovered in getting all the tests to work, etc: - Reordering of element attributes and other reformatting in vue components, because with the lint updates and reconfiguration, it now seems significantly stricter about vue templates. - First pass at an updated User Guide for ui2; it definitely needs more work. - Some other more minor documentation updates. - Updates tools/editor/autoformat.el so that emacs can use the new "prettiest" tool for formatting (even though I know I am likely the only one in the world who will ever use that). * test: Add skipped tests reflecting known shortcomings As per Numberscope discussion today, this is the last missing element of the end-to-end testing PR. With these skipped tests, we are flagging that there are known concerns we want to resolve at least by beta, if not alpha. Also fixes a small gap in the docs for running from source that Aaron noticed in the meeting. With this, the PR will be marked ready for final review. * fix: Generate docker image even with no existing test results * chore: Attempt to run e2e tests in CI workflow * test: No WebGL tests in CI :-( * fix: format of Playwright reporter parameter * chore: Try to grab GitHub CI actuals to make separate CI test file * fix: oops test must succeed to generate artifact * chore: Another try to grab ci snapshots * test: Add in the extracted CI snapshots * doc: Note that PR reviewers must run e2e tests themselves, too. * chore: Stop grabbing snapshots when we don't need them * fix: Changes as per review comments * doc: more info on Docker (and fix remaining typo) * test: Fuzz the pixel comparison in Firefox WebGL tests * test: Really fuzz the pixel comparison in Firefox WebGL tests * test: Check if docker tests passed before updating result directory * doc: Improvements per code review * doc: show the expected output of successful end-to-end test * chore: remove stray comment that no longer applies * fix: useful default camera controls for WebGL: left drag pans, wheel zooms * fix: correct Husky action inclusion and test it * fix: don't alter the URL just loaded, and reset frame limit on changes * fix: adjust dragging, detected mouse position, and text size for zoom/pan * fix: Keep the 'too many bins' message in a fixed absolute canvas position
This is an in-progress cleaned version of the final PR (#361) of
the Delft student user interface project, which it supersedes.
Comments from the original PR that remain relevant:
npm run test:e2e
npm run test:e2e:ui
[but note that these tests run on your local machine natively, rather than in Docker, and so some image comparison tests will fail that will succeed when run non-interactively].Caveats concerning trying this cleaned PR and its status:
npm install
after pulling this PR.Many of the tests do not yet pass, perhaps because of the "specific classes and IDs" point mentioned above and the fact that ui2 has diverged significantly from the Delft PR series.Tests are not yet run automatically prior to commit.I do not think there are any image tests yet, we need to try to add them.Tests are not yet performed in the continuous integration checks to be run on GitHub; they should be.In addition, this PR corrects a number of accumulated issues and issues that were surfaced as a result of implementing the testing. Namely:
Resolves #25.
Resolves #54.
Resolves #73.
Resolves #174.
Resolves #217.
Resolves #219.
Resolves #225.
Resolves #226.
Resolves #246.
Resolves #292.
Resolves #294.
Resolves #311.
Resolves #419.
Resolves #458.
noLoop()
andloop()
methods with visualizerstop()
andcontinue()
methods, and documented the need to do this (necessary to ensure that the play/pause control is updated properly, as the testing watches this control to know when the visualization is done and so it is safe to take a snapshot).stop()
method to control the number of frames that a visualizer will run (again, needed to make the images produced by a visualizer deterministic).make
for building the app and testing it, to avoid expensive steps like generating a Docker image and so on from being re-run when they don't need to be.By submitting this PR, I am indicating to the Numberscope maintainers that I have read and understood the contributing guidelines and that this PR follows those guidelines to the best of my knowledge. I have also read the pull request checklist and followed the instructions therein.