feat: parse generate property in sdf #143

Merged
merged 21 commits into from
Aug 30, 2024
Changes from 18 commits
7 changes: 6 additions & 1 deletion README.md
@@ -269,11 +269,16 @@ have additional information for identifying the kind of content to expect:
which are only available for certain architectures. Example:
`/usr/bin/hello: {arch: amd64}` will instruct Chisel to extract and install
the "/usr/bin/hello" file only when chiselling an amd64 filesystem.
- **generate**: accepts a `manifest` value to instruct Chisel to generate the
manifest files in the directory. Example: `/var/lib/chisel/**:{generate:
manifest}`. NOTE: the provided path has to be of the form
`/slashed/path/to/dir/**` and no wildcards can appear apart from the trailing
`**`.

## TODO

- [ ] Preserve ownerships when possible
- [ ] GPG signature checking for archives
- [x] GPG signature checking for archives
- [ ] Use a fake server for the archive tests
- [ ] Functional tests

151 changes: 108 additions & 43 deletions internal/setup/setup.go
@@ -60,11 +60,12 @@ type SliceScripts struct {
type PathKind string

const (
DirPath PathKind = "dir"
CopyPath PathKind = "copy"
GlobPath PathKind = "glob"
TextPath PathKind = "text"
SymlinkPath PathKind = "symlink"
DirPath PathKind = "dir"
CopyPath PathKind = "copy"
GlobPath PathKind = "glob"
TextPath PathKind = "text"
SymlinkPath PathKind = "symlink"
GeneratePath PathKind = "generate"

// TODO Maybe in the future, for binary support.
//Base64Path PathKind = "base64"
@@ -77,14 +78,22 @@ const (
UntilMutate PathUntil = "mutate"
)

type GenerateKind string

const (
GenerateNone GenerateKind = ""
GenerateManifest GenerateKind = "manifest"
)

type PathInfo struct {
Kind PathKind
Info string
Mode uint

Mutable bool
Until PathUntil
Arch []string
Mutable bool
Until PathUntil
Arch []string
Generate GenerateKind
}

// SameContent returns whether the path has the same content properties as some
@@ -95,7 +104,8 @@ func (pi *PathInfo) SameContent(other *PathInfo) bool {
return (pi.Kind == other.Kind &&
pi.Info == other.Info &&
pi.Mode == other.Mode &&
pi.Mutable == other.Mutable)
pi.Mutable == other.Mutable &&
pi.Generate == other.Generate)
}

type SliceKey struct {
@@ -141,10 +151,19 @@ func ReadRelease(dir string) (*Release, error) {

func (r *Release) validate() error {
keys := []SliceKey(nil)
paths := make(map[string]*Slice)
globs := make(map[string]*Slice)

// Check for info conflicts and prepare for following checks.
// Check for info conflicts and prepare for following checks. A conflict
// means that two slices attempt to extract different files or directories
// to the same location.
// Conflict validation is done without downloading packages which means that
// if we are extracting content from different packages to the same location
// we cannot be sure that it will be the same. On the contrary, content
// extracted from the same package will never conflict because it is
// guaranteed to be the same.
// The above also means that generated content (e.g. text files, directories
// with make:true) will always conflict with extracted content, because we
// cannot validate that they are the same without downloading the package.
paths := make(map[string]*Slice)
for _, pkg := range r.Packages {
for _, new := range pkg.Slices {
keys = append(keys, SliceKey{pkg.Name, new.Name})
@@ -157,12 +176,35 @@ func (r *Release) validate() error {
}
return fmt.Errorf("slices %s and %s conflict on %s", old, new, newPath)
}
} else {
if newInfo.Kind == GlobPath {
globs[newPath] = new
// Note: We do not have to record newPath because conflict
// is a transitive relation.
Contributor

This is not true. Consider the check above: the package is taken into account to determine whether there is a conflict. If these items are in different packages and we ignore new, keeping only old, we might miss conflicts that should not be ignored.

The original code here seems straightforward, and it would be nice not to change that with too many implied assumptions. Note how above we're simply checking whether two things conflict, with very straightforward rules: given that newPath and oldPath are the exact same string, we consider whether their content is exactly the same (SameContent) to spot a conflict. But this is only true if we are either extracting that content from the package or explicitly creating it.

Collaborator Author

@letFunny letFunny Aug 29, 2024

Discussed offline: the code and its assumptions were okay, basically the same ones we had on master. The problem was the comment, which attempted to explain a nuanced relation in a very short sentence. I have changed the comment to something that captures the intent more precisely, because it is true that conflict is NOT a transitive relation; it is more akin to "equivalence classes" of no-conflict where we partition by paths. However, that is again too complex, so I have written the comment in the most straightforward way I could think of.
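
The same-path rule discussed in this thread can be sketched roughly as follows. This is only an illustration with simplified, hypothetical names (`entry`, `sameContent`, `conflictsOnSamePath`), not the code in `setup.go`, and it assumes that copy and glob entries are the only ones extracted from a package.

```go
package main

import "fmt"

// PathKind mirrors the kinds named in the diff; only three are needed here.
type PathKind string

const (
	CopyPath PathKind = "copy"
	GlobPath PathKind = "glob"
	TextPath PathKind = "text"
)

// entry is a hypothetical, flattened stand-in for one slice's claim on a path.
type entry struct {
	Package string
	Kind    PathKind
	Info    string
	Mode    uint
}

// sameContent compares the declared content properties, ignoring the package.
func sameContent(a, b entry) bool {
	return a.Kind == b.Kind && a.Info == b.Info && a.Mode == b.Mode
}

// conflictsOnSamePath assumes both entries refer to the exact same path string.
func conflictsOnSamePath(a, b entry) bool {
	if !sameContent(a, b) {
		return true
	}
	// The declarations match, but extracted content is only known to be
	// identical when both entries come from the same package.
	extracted := b.Kind == CopyPath || b.Kind == GlobPath
	return extracted && a.Package != b.Package
}

func main() {
	copyA := entry{Package: "pkg-a", Kind: CopyPath}
	copyB := entry{Package: "pkg-b", Kind: CopyPath}
	textA := entry{Package: "pkg-a", Kind: TextPath, Info: "hello"}
	textB := entry{Package: "pkg-b", Kind: TextPath, Info: "hello"}

	fmt.Println("copy vs copy, same package:      ", conflictsOnSamePath(copyA, copyA))
	fmt.Println("copy vs copy, different packages:", conflictsOnSamePath(copyA, copyB))
	fmt.Println("text vs text, same content:      ", conflictsOnSamePath(textA, textB))
	fmt.Println("copy vs text:                    ", conflictsOnSamePath(copyA, textA))
}
```

Under these assumptions, explicitly created content with identical properties never conflicts, while extracted content is only trusted to match within a single package.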

continue
}

// Check for glob and generate conflicts.
for oldPath, old := range paths {
oldInfo := old.Contents[oldPath]
if !(newInfo.Kind == GlobPath || newInfo.Kind == GeneratePath ||
oldInfo.Kind == GlobPath || oldInfo.Kind == GeneratePath) {
continue
Contributor

The overall loop here was de-optimized, probably in an unnecessary way.

The for loops above are already executing for every file (content), inside every slice, of every package, so that is already quite a significant expansion. Now the new code is also adding almost every one of those items to a list and, for every inner iteration of the three earlier loops, looping over that whole list again. As a quick exercise, assume 10k items in the earlier loops: how many times are we executing the logic here? How many times did we go through the exact same items before rejecting them? (Hint: just for the first element of the list, 10k-1 times.)

That's why the original code had a globs helper here. The cost was similar, but we were paying only for items that we knew had to be handled as globs. I think we still want something similar, but need different conditions for its use as you've spotted.

Collaborator Author

@letFunny letFunny Aug 29, 2024

Discussed offline: even if GlobPath is the most expensive operation here, the approach in the PR does not make sense. The code that was in the PR was going to do roughly O(n^2) loop iterations (combinatorial explosion), while the previous code was using globs to reduce that by the % of globs in the slice definitions. For example, given 24.04 from chisel-releases, which has ~25% globs (as of today), the previous algorithm does, in theory, 1/4 of the iterations of the new one. This is especially relevant for use cases where the % is even lower, which is what we envision for the future of the releases. The only cost is one extra map, which is pretty reasonable.

I have changed the code to "tweak" the previous algorithm while fixing the bugs and adding support for generate. The only thing I am not sure about is the naming of the new map, but that is very minor.
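
To make the cost argument in this thread concrete, here is a rough counting sketch. It does not call any Chisel code: `item`, `makeItems`, `allPairs` and `globsOnly` are hypothetical names, and the loops only count inner iterations instead of calling a real glob matcher. It assumes the two strategies described above: comparing every new entry with every entry recorded so far, versus comparing only the glob entries against the recorded paths.

```go
package main

import "fmt"

// item is a hypothetical stand-in for one content entry of a slice.
type item struct {
	Path   string
	IsGlob bool
}

// makeItems builds n entries where every `every`-th entry is a glob.
func makeItems(n, every int) []item {
	items := make([]item, 0, n)
	for i := 0; i < n; i++ {
		it := item{Path: fmt.Sprintf("/dir%d/file", i)}
		if i%every == 0 {
			it = item{Path: fmt.Sprintf("/dir%d/**", i), IsGlob: true}
		}
		items = append(items, it)
	}
	return items
}

// allPairs counts inner iterations when each new entry is compared with
// every entry recorded before it, whether or not a glob is involved.
func allPairs(items []item) int {
	iterations := 0
	var seen []item
	for _, it := range items {
		for range seen {
			iterations++
		}
		seen = append(seen, it)
	}
	return iterations
}

// globsOnly counts inner iterations when plain paths are only recorded and
// just the globs are compared against everything recorded so far.
func globsOnly(items []item) int {
	var paths, globs []item
	for _, it := range items {
		if it.IsGlob {
			globs = append(globs, it)
		} else {
			paths = append(paths, it)
		}
	}
	iterations := 0
	for _, g := range globs {
		for range paths {
			iterations++
		}
		paths = append(paths, g)
	}
	return iterations
}

func main() {
	items := makeItems(1000, 4) // roughly 25% globs
	fmt.Println("all pairs: ", allPairs(items))
	fmt.Println("globs only:", globsOnly(items))
}
```

The exact saving depends on the share of globs and on how iterations are counted, but the second strategy never pays for comparisons between two plain paths.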

}
if (newInfo.Kind == GlobPath || newInfo.Kind == CopyPath) &&
(oldInfo.Kind == GlobPath || oldInfo.Kind == CopyPath) {
if new.Package == old.Package {
continue
}
}
if strdist.GlobPath(newPath, oldPath) {
if (old.Package > new.Package) || (old.Package == new.Package && old.Name > new.Name) ||
(old.Package == new.Package && old.Name == new.Name && oldPath > newPath) {
old, new = new, old
oldPath, newPath = newPath, oldPath
}
return fmt.Errorf("slices %s and %s conflict on %s and %s", old, new, oldPath, newPath)
}
paths[newPath] = new
}

paths[newPath] = new
}
}
}
@@ -173,22 +215,6 @@ func (r *Release) validate() error {
return err
}

// Check for glob conflicts.
for newPath, new := range globs {
for oldPath, old := range paths {
if new.Package == old.Package {
continue
}
if strdist.GlobPath(newPath, oldPath) {
if old.Package > new.Package || old.Package == new.Package && old.Name > new.Name {
old, oldPath, new, newPath = new, newPath, old, oldPath
}
return fmt.Errorf("slices %s and %s conflict on %s and %s", old, new, oldPath, newPath)
}
}
paths[newPath] = new
}

return nil
}

@@ -357,8 +383,9 @@ type yamlPath struct {
Symlink string `yaml:"symlink"`
Mutable bool `yaml:"mutable"`

Until PathUntil `yaml:"until"`
Arch yamlArch `yaml:"arch"`
Until PathUntil `yaml:"until"`
Arch yamlArch `yaml:"arch"`
Generate GenerateKind `yaml:"generate"`
}

// SameContent returns whether the path has the same content properties as some
@@ -583,7 +610,19 @@ func parsePackage(baseDir, pkgName, pkgPath string, data []byte) (*Package, erro
var mutable bool
var until PathUntil
var arch []string
if strings.ContainsAny(contPath, "*?") {
var generate GenerateKind
if yamlPath != nil && yamlPath.Generate != "" {
zeroPathGenerate := zeroPath
zeroPathGenerate.Generate = yamlPath.Generate
if !yamlPath.SameContent(&zeroPathGenerate) || yamlPath.Until != UntilNone {
return nil, fmt.Errorf("slice %s_%s path %s has invalid generate options",
pkgName, sliceName, contPath)
}
if _, err := validateGeneratePath(contPath); err != nil {
return nil, fmt.Errorf("slice %s_%s has invalid generate path: %s", pkgName, sliceName, err)
}
kinds = append(kinds, GeneratePath)
} else if strings.ContainsAny(contPath, "*?") {
if yamlPath != nil {
if !yamlPath.SameContent(&zeroPath) {
return nil, fmt.Errorf("slice %s_%s path %s has invalid wildcard options",
@@ -595,6 +634,7 @@ func parsePackage(baseDir, pkgName, pkgPath string, data []byte) (*Package, erro
if yamlPath != nil {
mode = yamlPath.Mode
mutable = yamlPath.Mutable
generate = yamlPath.Generate
if yamlPath.Dir {
if !strings.HasSuffix(contPath, "/") {
return nil, fmt.Errorf("slice %s_%s path %s must end in / for 'make' to be valid",
@@ -644,12 +684,13 @@ func parsePackage(baseDir, pkgName, pkgPath string, data []byte) (*Package, erro
return nil, fmt.Errorf("slice %s_%s mutable is not a regular file: %s", pkgName, sliceName, contPath)
}
slice.Contents[contPath] = PathInfo{
Kind: kinds[0],
Info: info,
Mode: mode,
Mutable: mutable,
Until: until,
Arch: arch,
Kind: kinds[0],
Info: info,
Mode: mode,
Mutable: mutable,
Until: until,
Arch: arch,
Generate: generate,
}
}

@@ -659,6 +700,22 @@ func parsePackage(baseDir, pkgName, pkgPath string, data []byte) (*Package, erro
return &pkg, err
}

// validateGeneratePath validates that the path follows the following format:
// - /slashed/path/to/dir/**
//
// Wildcard characters can only appear at the end as **, and the path before
// those wildcards must be a directory.
func validateGeneratePath(path string) (string, error) {
if !strings.HasSuffix(path, "/**") {
return "", fmt.Errorf("%s does not end with /**", path)
}
dirPath := strings.TrimSuffix(path, "**")
if strings.ContainsAny(dirPath, "*?") {
return "", fmt.Errorf("%s contains wildcard characters in addition to trailing **", path)
}
return dirPath, nil
}
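
As a quick illustration of which paths this helper accepts, the following standalone program copies `validateGeneratePath` from the diff above and runs it on a few sample paths. The sample paths are made up for this example and do not come from the PR's tests.

```go
package main

import (
	"fmt"
	"strings"
)

// validateGeneratePath is copied from the diff above.
func validateGeneratePath(path string) (string, error) {
	if !strings.HasSuffix(path, "/**") {
		return "", fmt.Errorf("%s does not end with /**", path)
	}
	dirPath := strings.TrimSuffix(path, "**")
	if strings.ContainsAny(dirPath, "*?") {
		return "", fmt.Errorf("%s contains wildcard characters in addition to trailing **", path)
	}
	return dirPath, nil
}

func main() {
	// Hypothetical sample paths, chosen to hit each branch.
	samples := []string{
		"/var/lib/chisel/**", // valid: directory followed by /**
		"/var/lib/chisel/*",  // invalid: single trailing wildcard
		"/var/*/chisel/**",   // invalid: wildcard before the trailing **
		"/var/lib/chisel**",  // invalid: no slash before **
	}
	for _, p := range samples {
		dir, err := validateGeneratePath(p)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", p, "->", dir)
	}
}
```

Note that the returned directory keeps its trailing slash, because only the `**` suffix is trimmed.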

func stripBase(baseDir, path string) string {
// Paths must be clean for this to work correctly.
return strings.TrimPrefix(path, baseDir+string(filepath.Separator))
@@ -691,9 +748,17 @@ func Select(release *Release, slices []SliceKey) (*Selection, error) {
}
return nil, fmt.Errorf("slices %s and %s conflict on %s", old, new, newPath)
}
continue
} else {
paths[newPath] = new
}
// An invalid "generate" value should only throw an error if that
// particular slice is selected. Hence, the check is here.
Contributor

Seems okay for now, but it's a bit unclear what the final place should be, due to the potential automatic manifest inclusion which could make this be better placed elsewhere.
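
The design choice under discussion, accepting any `generate` value while parsing and rejecting unknown ones only for slices that are actually selected, can be sketched as below. The `slice` type and `checkSelected` function are hypothetical simplifications for illustration; only the `GenerateKind` constants and the shape of the `switch` come from the diff.

```go
package main

import "fmt"

type GenerateKind string

const (
	GenerateNone     GenerateKind = ""
	GenerateManifest GenerateKind = "manifest"
)

// slice is a hypothetical, simplified stand-in for Chisel's Slice type:
// it maps each content path to the generate value found while parsing.
type slice struct {
	Name     string
	Generate map[string]GenerateKind
}

// checkSelected validates generate values only for the slices passed in,
// so an unknown value in an unselected slice never produces an error.
func checkSelected(selected []slice) error {
	for _, s := range selected {
		for path, kind := range s.Generate {
			switch kind {
			case GenerateNone, GenerateManifest:
			default:
				return fmt.Errorf("slice %s has invalid 'generate' for path %s: %q", s.Name, path, kind)
			}
		}
	}
	return nil
}

func main() {
	known := slice{Name: "pkg_manifest", Generate: map[string]GenerateKind{
		"/var/lib/chisel/**": GenerateManifest,
	}}
	// "future" is a made-up value standing in for one added by a newer release.
	unknown := slice{Name: "pkg_other", Generate: map[string]GenerateKind{
		"/var/lib/other/**": "future",
	}}

	fmt.Println(checkSelected([]slice{known}))          // unknown slice not selected
	fmt.Println(checkSelected([]slice{known, unknown})) // unknown slice selected
}
```

In this sketch an older client can still use a release that contains newer `generate` values, as long as it does not select the slices that rely on them.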

switch newInfo.Generate {
case GenerateNone, GenerateManifest:
default:
return nil, fmt.Errorf("slice %s has invalid 'generate' for path %s: %q, consider an update if available",
new, newPath, newInfo.Generate)
}
paths[newPath] = new
}
}
