The main problem was that the file exists checker used Go's built-in `filepath.Glob`, which doesn't support double-asterisk patterns (golang/go#11862). Additionally, a pattern like `*.js` must match all JS files, not only on the first level but also in nested directories. Following the investigation, this problem was fixed and additional test coverage was added to document that contract.

* Refactor code to remove lint issues
* Update Go min version to 1.14 on CI
Showing 18 changed files with 434 additions and 48 deletions.
57 changes: 57 additions & 0 deletions
docs/investigation/file_exists_checker/file_matcher_libs_bench_test.go
@@ -0,0 +1,57 @@
// Always record the result of func execution to prevent
// the compiler eliminating the function call.
// Always store the result to a package level variable
// so the compiler cannot eliminate the Benchmark itself.
package file_exists_checker

import (
	"fmt"
	"log"
	"os"
	"path"
	"testing"

	"github.com/bmatcuk/doublestar/v2"
	"github.com/mattn/go-zglob"
	"github.com/yargevad/filepathx"
)

var pattern string
func init() {
	curDir, err := os.Getwd()
	if err != nil {
		log.Fatal(err)
	}
	pattern = path.Join(curDir, "..", "..", "**", "*.md")
	fmt.Println(pattern)
}

var pathx []string

func BenchmarkPathx(b *testing.B) {
	var r []string
	for n := 0; n < b.N; n++ {
		r, _ = filepathx.Glob(pattern)
	}
	pathx = r
}

var zGlob []string

func BenchmarkZGlob(b *testing.B) {
	var r []string
	for n := 0; n < b.N; n++ {
		r, _ = zglob.Glob(pattern)
	}
	zGlob = r
}

var double []string

func BenchmarkDoublestar(b *testing.B) {
	var r []string
	for n := 0; n < b.N; n++ {
		r, _ = doublestar.Glob(pattern)
	}
	double = r
}
@@ -0,0 +1,61 @@
## File exists checker

This document describes the investigation into the [`file exists`](../../../internal/check/file_exists.go) checker, which needs to handle the gitignore pattern syntax.

### Problem

A [CODEOWNERS](https://docs.github.com/en/free-pro-team@latest/github/creating-cloning-and-archiving-repositories/about-code-owners#codeowners-syntax) file uses patterns that follow the same rules as [gitignore](https://git-scm.com/docs/gitignore#_pattern_format) files.
Gitignore files support two consecutive asterisks (`**`) in patterns that match against the full path name. Unfortunately, the core Go library function `filepath.Glob` does not support [`**`](https://github.com/golang/go/issues/11862) at all.
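
The limitation can be reproduced with the standard library alone. The following is a minimal sketch, not code from the checker: the helper name `globInTempTree` and the three-file layout are made up for illustration, and `os.MkdirTemp` and `os.WriteFile` need Go 1.16 or newer. It relies on `filepath.Match` treating consecutive asterisks like a single `*`, which never crosses a path separator.

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// globInTempTree is a hypothetical helper for this demonstration. It creates
// a temporary tree with one Markdown file at each of three depths and returns
// what filepath.Glob reports for "<root>/**/*.md", relative to the root.
func globInTempTree() ([]string, error) {
	root, err := os.MkdirTemp("", "glob-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(root)

	for _, f := range []string{"top.md", "a/one.md", "a/b/two.md"} {
		full := filepath.Join(root, filepath.FromSlash(f))
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return nil, err
		}
		if err := os.WriteFile(full, []byte("x"), 0o644); err != nil {
			return nil, err
		}
	}

	matches, err := filepath.Glob(filepath.Join(root, "**", "*.md"))
	if err != nil {
		return nil, err
	}

	// Strip the temporary root so the result is stable across runs.
	rel := make([]string, 0, len(matches))
	for _, m := range matches {
		r, err := filepath.Rel(root, m)
		if err != nil {
			return nil, err
		}
		rel = append(rel, filepath.ToSlash(r))
	}
	return rel, nil
}

func main() {
	rel, err := globInTempTree()
	if err != nil {
		log.Fatal(err)
	}
	// Only the file exactly one directory below the root is reported.
	fmt.Println(rel)
}
```

Here `**` behaves like one `*`, so only `a/one.md` is returned; `top.md` and `a/b/two.md` are missed, whereas gitignore semantics would match all three.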

Because of this, the [`file exists`](../../../internal/check/file_exists.go) checker didn't work properly for some patterns, see [issue#22](https://github.com/mszostok/codeowners-validator/issues/22).

Additionally, we need to support a single asterisk at the beginning of the pattern. For example, `*.js` should match all JS files in the whole git repository. To achieve that, we need to detect such a pattern and change it from `*.js` to `**/*.js`.

```go
pattern := "*.js"
// Prepend "**/" when the pattern starts with a single asterisk (but not with "**").
if len(pattern) >= 2 && pattern[:1] == "*" && pattern[1:2] != "*" {
	pattern = "**/" + pattern
}
```
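
The same condition can be wrapped in a small function to see how it treats different inputs. This is only a sketch: the name `anyDepth` is hypothetical and is not taken from the checker's code.

```go
package main

import (
	"fmt"
	"strings"
)

// anyDepth (hypothetical name) rewrites a pattern that starts with a single
// asterisk so that it matches at any directory depth. All other patterns are
// returned unchanged.
func anyDepth(pattern string) string {
	if len(pattern) >= 2 && strings.HasPrefix(pattern, "*") && !strings.HasPrefix(pattern, "**") {
		return "**/" + pattern
	}
	return pattern
}

func main() {
	for _, p := range []string{"*.js", "**/*.js", "docs/*.md", "*"} {
		fmt.Printf("%q -> %q\n", p, anyDepth(p))
	}
}
```

Only `*.js` is rewritten (to `**/*.js`). Patterns that already start with `**`, patterns that start with a directory name, and a lone `*` (shorter than two characters) are left as they are, which mirrors the length check in the snippet above.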

### Investigation

Instead of creating a dedicated solution, I decided to look for an existing library that supports two consecutive asterisks.
Several open-source libraries can be used for that purpose. I selected three:
- https://github.com/bmatcuk/doublestar/v2
- https://github.com/mattn/go-zglob
- https://github.com/yargevad/filepathx
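
For comparison, a dedicated solution would have to walk the directory tree itself. The sketch below is neither the checker's code nor how any of the three libraries works; `matchAnyDepth` is a hypothetical helper built only on the standard library, and it covers just the simplest case, a file-name pattern that should match at any depth (the equivalent of `**/<name pattern>`).

```go
package main

import (
	"fmt"
	"log"
	"os"
	"path/filepath"
)

// matchAnyDepth (hypothetical) returns every non-directory entry under root
// whose base name matches namePattern, regardless of how deep it is nested.
func matchAnyDepth(root, namePattern string) ([]string, error) {
	var matches []string
	err := filepath.Walk(root, func(p string, info os.FileInfo, err error) error {
		if err != nil {
			return err // propagate I/O errors instead of hiding them
		}
		if info.IsDir() {
			return nil // keep descending, directories are never reported
		}
		ok, err := filepath.Match(namePattern, info.Name())
		if err != nil {
			return err
		}
		if ok {
			matches = append(matches, p)
		}
		return nil
	})
	return matches, err
}

func main() {
	files, err := matchAnyDepth(".", "*.md")
	if err != nil {
		log.Fatal(err)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Patterns with `**` in the middle of a path, a leading `/`, or a trailing `/` are all outside what this sketch handles, which illustrates why reusing a library is attractive here.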

I tested all three libraries, and all of them supported the `**` pattern properly. As a final criterion, I created benchmark tests.

#### Benchmarks

Run benchmarks with 1 CPU for 5 seconds:

```bash
go test -bench=. -benchmem -cpu 1 -benchtime 5s ./file_matcher_libs_bench_test.go

goos: darwin
goarch: amd64
BenchmarkPathx            79   72276938 ns/op   7297258 B/op   40808 allocs/op
BenchmarkZGlob           126   47206545 ns/op    840973 B/op   10550 allocs/op
BenchmarkDoublestar      157   38041578 ns/op   3521379 B/op   22150 allocs/op
```

Run benchmarks with 12 CPUs for 5 seconds:

```bash
go test -bench=. -benchmem -cpu 12 -benchtime 5s ./file_matcher_libs_bench_test.go

goos: darwin
goarch: amd64
BenchmarkPathx-12            78   73096386 ns/op   7297114 B/op   40807 allocs/op
BenchmarkZGlob-12           637    9234632 ns/op    914239 B/op   10564 allocs/op
BenchmarkDoublestar-12      151   38372922 ns/op   3522899 B/op   22151 allocs/op
```
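
To make the raw `ns/op` figures easier to compare, they can be converted to milliseconds and to relative speedups. The snippet below is a small illustrative calculation; the `result` type and `speedup` function are hypothetical names, and the numbers are copied from the output above.

```go
package main

import "fmt"

// result holds one row copied from the benchmark output above.
type result struct {
	name    string
	nsPerOp float64
}

// speedup reports how many times faster b is than a, based on ns/op.
func speedup(a, b result) float64 {
	return a.nsPerOp / b.nsPerOp
}

func main() {
	zglob1 := result{"BenchmarkZGlob", 47206545}
	zglob12 := result{"BenchmarkZGlob-12", 9234632}
	double12 := result{"BenchmarkDoublestar-12", 38372922}

	for _, r := range []result{zglob1, zglob12, double12} {
		fmt.Printf("%-24s %6.1f ms/op\n", r.name, r.nsPerOp/1e6)
	}
	fmt.Printf("zglob, 12 CPUs vs 1 CPU:      %.1fx faster\n", speedup(zglob1, zglob12))
	fmt.Printf("zglob vs doublestar, 12 CPUs: %.1fx faster\n", speedup(double12, zglob12))
}
```

With these figures, `z-glob` takes roughly 47.2 ms/op with 1 CPU and 9.2 ms/op with 12 CPUs, about 5.1 times faster, and at 12 CPUs it is about 4.2 times faster than `doublestar` (38.4 ms/op).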

#### Summary

With 1 CPU, the `doublestar` library has the shortest time, but it allocates more memory than the `z-glob` library.
With 12 CPUs, `z-glob` is the winner both in time and in memory allocation. The worst one in each test was the `filepathx` library.

> **NOTE:** The `z-glob` library has an issue with error handling. I've opened a PR to fix that problem: https://github.com/mattn/go-zglob/pull/37.
@@ -0,0 +1,9 @@
module github.com/mszostok/codeowners-validator/docs/investigation/file_exists_checker

go 1.15

require (
	github.com/bmatcuk/doublestar/v2 v2.0.1
	github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6
	github.com/yargevad/filepathx v0.0.0-20161019152617-907099cb5a62
)
@@ -0,0 +1,6 @@
github.com/bmatcuk/doublestar/v2 v2.0.1 h1:EFT91DmIMRcrUEcYUW7AqSAwKvNzP5+CoDmNVBbcQOU=
github.com/bmatcuk/doublestar/v2 v2.0.1/go.mod h1:QMmcs3H2AUQICWhfzLXz+IYln8lRQmTZRptLie8RgRw=
github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6 h1:nw6OKTHiQIVOSaT4xJ5STrLfUFs3xlU5dc6H4pT5bVQ=
github.com/mattn/go-zglob v0.0.4-0.20201017022353-70beb5203ba6/go.mod h1:MxxjyoXXnMxfIpxTK2GAkw1w8glPsQILx3N5wrKakiY=
github.com/yargevad/filepathx v0.0.0-20161019152617-907099cb5a62 h1:pZlTNPEY1N9n4Frw+wiRy9goxBru/H5KaBxJ4bFt89w=
github.com/yargevad/filepathx v0.0.0-20161019152617-907099cb5a62/go.mod h1:VtdjfTSVslSOB39qCxkH9K3m2qUauaJk/6y+pNkvCQY=