Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

acc: add a helper to diff with replacements #2352

Merged
merged 6 commits into from
Feb 14, 2025
Merged

acc: add a helper to diff with replacements #2352

merged 6 commits into from
Feb 14, 2025

Conversation

denik
Copy link
Contributor

@denik denik commented Feb 13, 2025

Changes

diff.py is like "diff -r -U2" but it applies replacements first to the argument.

This allows comparing different output files and directories but ignore differences that are going to be replaced by placeholders.

This is useful for tests that record large amount of files, specifically "bundle init" with standard templates. In those tests, changing one parameter results in a small diff so recording the full directory is not helpful, because it's hard to see what changed there. I'm using it in implementation of serverless mode for templates that need it: #2348 The serverless templates are slightly different from classic, capturing the diff helps to see exactly where.

Related small changes:

  • Add [TESTROOT] replacement for absolute path to acceptance directory in git repo.
  • Add $TESTDIR env var for absolute path to a given test in git repo.

Tests

diff.py is like "diff -r -U2" but it applies replacements first to the argument.

This allows comparing different output files and directories but ignore differences
that are going to be replaced by placeholders.

This is useful for tests that record large amount of files, specifically "bundle init" with
standard templates. In those tests, changing one parameter results in a small diff so recording
the full directory is not helpful, because it's hard to see what changed there.

I'm using it in implementation of serverless mode for templates that need it: #2348

Related small changes: add [TESTROOT] replacement for absolute path to acceptance directory in git repo.
Add $TESTDIR env var for absolute path to a given test in git repo.
@denik denik temporarily deployed to test-trigger-is February 13, 2025 09:27 — with GitHub Actions Inactive
@@ -56,6 +56,7 @@ const (
EntryPointScript = "script"
CleanupScript = "script.cleanup"
PrepareScript = "script.prepare"
ReplsFile = "repls.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment what's this file used for? From first glance it's not clear if it's an input for replacement or some sort of output

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment.

@@ -65,6 +66,10 @@ var Scripts = map[string]bool{
PrepareScript: true,
}

var Ignored = map[string]bool{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to make this configurable, maybe in the future?

Copy link
Contributor Author

@denik denik Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can discuss it once we have use case for it.

@@ -320,6 +333,10 @@ func runTest(t *testing.T, dir, coverDir string, repls testdiff.ReplacementsCont
cmd.Env = append(cmd.Env, "GOCOVERDIR="+coverDir)
}

absDir, err := filepath.Abs(dir)
require.NoError(t, err)
cmd.Env = append(cmd.Env, "TESTDIR="+absDir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't absDir value be in quotes? It might contain spaces at least on Windows

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This passes on Windows CI and it's also not the only directory we have, so judging from practice, no. Unless you have a Windows machine where it does not work without quotes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as it passes on Windows CI it's fine I guess

p1 = d1 / f
p2 = d2 / f
if f not in set2:
print(f"Only in {d1}: {f}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make it a bit more explicit phrasing? Something like "File X is found only in Y directory". When I was reading test output it was not immediately clear what's going on

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This follows "diff -r" phrasing. If you're familiar with that, it makes sense.

@@ -27,6 +27,9 @@ type TestConfig struct {
// If true, do not run this test against cloud environment
LocalOnly bool

// if true, save file repls.json with all the replacemnts
SaveRepls bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead of saving and then loading replacements just do the output replacement always (maybe to a separate temp folder) and then compare it? Or just always save it and ignore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather avoid adding this overhead to all tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, File i/o is cheap so this should not add much overhead? Doing replacement on all output files by default makes sense since we do that anyways when performing comparisons. WDYT?

Copy link
Contributor

@shreyas-goenka shreyas-goenka Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also get to avoid the added complexity of having SaveRepls and repls.json.

And we could use diff directly instead of relying on the python script to perform replacements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@denik by overhead you mean performance overhead? If so, how do we balance between performance and code complexity then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I benchmarked it, there is no detectable difference on my laptop, so I removed the option: c8b4196

We have open PRs with 100s of tests #2260
so we might reconsider it then if we measure that it makes a difference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And we could use diff directly instead of relying on the python script to perform replacements.

I don't see how that is possible, the replacements are applied by test runner but on file system the files are without replacements so diff will see the actual values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the files are without replacements so diff will see the actual values

I was proposing we change this. If the files on the file system are with the replacement then we could use diff directly.

But on thinking about this more it would make debugging harder since since you don't know the pre-replacement values. The python approach makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shreyas-goenka I don't think this is possible as the replacements could be done only after script is finished and diff call is a part of the script

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively we could introduce some sort of post script run validation script which would call the diff and compare expected output but it might be more complex

@denik denik temporarily deployed to test-trigger-is February 14, 2025 09:56 — with GitHub Actions Inactive
@denik denik requested a review from andrewnester February 14, 2025 09:57
@denik denik temporarily deployed to test-trigger-is February 14, 2025 10:29 — with GitHub Actions Inactive
if config.SaveRepls {
replsJson, err := json.MarshalIndent(repls.Repls, "", " ")
require.NoError(t, err)
testutil.WriteFile(t, filepath.Join(tmpDir, ReplsFile), string(replsJson))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be unlikely to happen but if my test script also produces repls.json this will override it, correct? Shall we error out in this case just to be safe?

Copy link
Contributor Author

@denik denik Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is written before your script is run, so your script will override it, which will break diff.py if you use it.

You should be able to modify your script to use different file or rename the files like we do with .gitignore.

@denik denik requested a review from andrewnester February 14, 2025 10:33
@@ -0,0 +1,56 @@
#!/usr/bin/env python3
"""This script implements "diff -r -U2 dir1 dir2" but applies replacements first"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you have a chance to measure performance difference between running this test with diff and with repls.json writing + diff.py?

Copy link
Contributor Author

@denik denik Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand -- diff is doing the wrong thing there, e.g. it will look at actual username rather than [USERNAME] and detect that as a difference. So performance is not important because they are not equivalent, functionality-wise.

We can bench diff.py against diff but even if it's 2x slower there is no action to take, because we use diff.py for handling replacements.

That's said I don't think diffing will be a bottleneck in your tests, so it makes sense to prefer diff.py because it has less cross-platform quirks than diff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relates to discussion in the other thread about potentially replacing the output without storing repl.json file and using diff instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since it's not really possible to use diff anyway, performance here indeed doesn't really matter

@denik denik requested a review from andrewnester February 14, 2025 10:35
@denik denik added this pull request to the merge queue Feb 14, 2025
Merged via the queue into main with commit c0a56a9 Feb 14, 2025
9 checks passed
@denik denik deleted the denik/acc-diff branch February 14, 2025 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants