Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add argument to make_codebase_resource to make the created CodebaseResource path relative to the project work path rather than the project codebase path #378

Closed
wants to merge 30 commits into from

Conversation

JonoYang
Copy link
Member

@JonoYang JonoYang commented Dec 8, 2021

I have a pipeline that performs something similar to a Merkle tree calculation on a codebase. In order for this to work, I need a root directory to roll my results up to. scanpipe currently does not have the concept of a root, so my current workaround is to use the project's codebase directory as the root directory.

I create a new pipeline that creates a CodebaseResource for the project codebase directory, but when I call scanpipe.pipes.scancode.make_codebase_resource, the created CodebaseResources do not have codebase as the prefix.

I've modified scanpipe.pipes.scancode.make_codebase_resource to have a new argument relative_to_work_path such that the path created for the CodebaseResource is related to the project work path, rather than the project codebase path. This means that the created CodebaseResource paths will be prefixed with codebase e.g. codebase/notice.NOTICE.

@JonoYang JonoYang requested a review from tdruez December 8, 2021 02:48
@JonoYang JonoYang force-pushed the 180-projectcodebase-root-workaround branch 2 times, most recently from a4d4a8b to 4faafad Compare December 8, 2021 03:49
@tdruez
Copy link
Contributor

tdruez commented Dec 9, 2021

@JonoYang I'm not confortable with the idea of storing the codebase/ prefix as a choice.
This lack of consistency in the way CodebaseResource path would be treated will likely be a pain in the future as features will now have to deal with both possibility (prefix or no-prefix).

In order for this to work, I need a root directory to roll my results up to

What about a special "root" CodebaseResource instance instead?
Maybe a simple . or ./ coule be good enough?

Note that we used to always store the codebase/ prefix in the database but this had a real estate cost in the UI and storage cost in the database without much benefits (at least at the time).
We could bring back the prefix is we do not find a better approach, but this would not be a choice and should be enforced for all CodebaseResource instances.

@JonoYang
Copy link
Member Author

JonoYang commented Dec 9, 2021

@tdruez Thanks for the review!

In order for this to work, I need a root directory to roll my results up to

What about a special "root" CodebaseResource instance instead?
Maybe a simple . or ./ coule be good enough?

This would be acceptable too. The thing I am interested in is the ability to use the ProjectCodebase class to iterate through the CodebaseResources using code that was originally written for use with scancode-toolkit's Codebase and VirtualCodebase classes. In particular, the code I have uses Codebase.walk() and Resource.walk(). If we are to create a special root CodebaseResource named . or ./, I would assume that all of the other CodebaseResources in our codebase would need to have . as a prefix since CodebaseResource.walk() in scancode.io uses the CodebaseResource path to determine its children.

For example:

./
./foo/bar.c
./README.txt

I'll modify my pipeline to create a CodebaseResource with the path ./ and then modify make_codebase_resource to allow me to create CodebaseResources with . as a path prefix.

@tdruez
Copy link
Contributor

tdruez commented Dec 10, 2021

@JonoYang my point is that we do not want to store any prefix in the database.
I would suggest to refine the walk() functions logic instead to support a special case for a root.

Copy link
Member Author

@JonoYang JonoYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdruez I've made some changes where we now create the "." CodebaseResource when running the scan_codebase pipeline. I've done this by modifying Project.walk_codebase_path() to return the project codebase/ directory and by modifying make_codebase_resource to handle the cases where it is making a CodebaseResource for the project codebase/ directory.

"""
return self.codebase_path.rglob("*")
return itertools.chain(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdruez I am trying to get walk_codebase_path() to return the codebase/ directory in addition to the subdirectories and files of codebase/. Is there a better way to do this?

scanpipe/models.py Outdated Show resolved Hide resolved
@JonoYang JonoYang force-pushed the 180-projectcodebase-root-workaround branch 2 times, most recently from 8c690ce to cf8a8bf Compare December 15, 2021 20:52
Copy link
Contributor

@tdruez tdruez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not implemented a way to create "." when creating CodebaseResources for the docker pipeline. An immediate solution would be to check to see if a "." exists before creating the CodebaseResources for a layer (https://github.com/nexB/scancode.io/blob/main/scanpipe/pipes/docker.py#L134)

Maybe a better idea would be to create "." when the project is created?

No, let's not create resources at project creation time, resources should be created during pipelines execution.

@@ -668,9 +669,10 @@ def get_latest_output(self, filename):

def walk_codebase_path(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a unit test for this method.

@@ -1580,7 +1588,12 @@ def descendants(self):
database query on the current CodebaseResource `path`.
The current CodebaseResource is not included.
"""
return self.project.codebaseresources.filter(path__startswith=f"{self.path}/")
if self.path == ".":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a is_root model property for this code and replace here with "self.is_root"
This will make the code more readable and we'll be able to change the Root logic in the future by simply updating the property.
Also, the "." should be stored in a global variable, something like ROOT_SYMBOL, for the same reasons as above.

@@ -1580,7 +1588,12 @@ def descendants(self):
database query on the current CodebaseResource `path`.
The current CodebaseResource is not included.
"""
return self.project.codebaseresources.filter(path__startswith=f"{self.path}/")
if self.path == ".":
return self.project.codebaseresources.exclude(path=".")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition does not seem to be tested. Also, could you document in the dostring why the root need to be treated differently?

"""
exactly_one_sub_directory = "[^/]+$"
children_regex = rf"^{self.path}/{exactly_one_sub_directory}"
if self.path == ".":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, it's not clear nor documented why we have to treat the root differently.

tdruez and others added 6 commits May 2, 2022 11:31
    * Remove create_discovered_packages2 and create_codebase_resources2

Signed-off-by: Jono Yang <[email protected]>
    * Normalize package_uids before comparing results in tests
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
@tdruez tdruez mentioned this pull request May 10, 2022
JonoYang added 13 commits May 10, 2022 12:11
    * Mark ProjectCodebase tests with expectedFailure
    * We will revisit ProjectCodebase and update it to fit our current models

Signed-off-by: Jono Yang <[email protected]>
    * We are using a scancode scan results for tests since asgiref-3.3.0_scan.json is not exactly the same format as scancode's json output

Signed-off-by: Jono Yang <[email protected]>
    * Update regen_test_data.py to generate asgiref-3.3.0_walk_test_fixtures.json

Signed-off-by: Jono Yang <[email protected]>
    * No need to explicity get license_clarity_score in make_results_summary()
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
    * Add .vscode to .gitignore

Signed-off-by: Jono Yang <[email protected]>
    * Add argument to set the codebase path of a CodebaseResource created by make_codebase_resource relative to the project work path

Signed-off-by: Jono Yang <[email protected]>
    * `codebase` has been added as an unused argument to CodebaseResource.save() to provide compatibility with commoncode.resource.Resource

Signed-off-by: Jono Yang <[email protected]>
    * Have test to ensure that the codebase prefix is present when relative_to_work_path is True

Signed-off-by: Jono Yang <[email protected]>
    * Update expected test results
    * Revert previous changes to make_codebase_resource

Signed-off-by: Jono Yang <[email protected]>
    * Use different regex to determine the children of the "." CodebaseResource
    * Update test expectations

Signed-off-by: Jono Yang <[email protected]>
JonoYang added 5 commits May 10, 2022 16:04
    * replace_root_path_and_name is a helper function that replaces the root of a Codebase with "." and strips the previous root prefix from the remaining Resources
    * Update tests

Signed-off-by: Jono Yang <[email protected]>
    * Update docstrings
    * Update expected test results

Signed-off-by: Jono Yang <[email protected]>
@JonoYang JonoYang force-pushed the 180-projectcodebase-root-workaround branch from 48e489c to a9410f8 Compare May 11, 2022 02:24
    * Format code

Signed-off-by: Jono Yang <[email protected]>
@JonoYang JonoYang force-pushed the 180-projectcodebase-root-workaround branch from 95ad5af to 24902bf Compare May 11, 2022 02:48
@tdruez
Copy link
Contributor

tdruez commented Jun 14, 2022

@JonoYang now that we merged the toolkit upgrade, would it be a good time to rebase et complete this PR?

@JonoYang
Copy link
Member Author

@tdruez

I'm merging in the changes from main into this branch. I'm having trouble with replace_root_path_and_name (https://github.com/nexB/scancode.io/blob/180-projectcodebase-root-workaround/scanpipe/pipes/scancode.py#L504) that I made. My function renames the root resource path and name to ".", and removes the root path from the remaining resource paths. It's conflicting with the changes made to how Resource in a Codebase in scancode are related to one another. The walk method on commoncode.Resource uses the path of a Resource to find its children. If we walk from the root Resource of a Codebase, the walk fails because it cannot find the rest of the Resources in the codebase because no other Resource paths start with ".".

I think what I am trying to do in this function is best implemented somewhere in commoncode.Codebase, where I can pass some arguments to the Codebase to do this path transformations and modify the logic of walk to find the child of a root resource whose path is ".".

@tdruez tdruez mentioned this pull request Mar 2, 2023
@JonoYang
Copy link
Member Author

JonoYang commented Mar 2, 2023

Closing this in favor of #624. Instead of creating a single root CodebaseResource, ProjectCodebase has been updated to walk, not from a single root, but from every CodebaseResource in the root of a Project. This effectively will get the same tree traversal done without large changes to the Project/CodebaseResource models.

@JonoYang JonoYang closed this Mar 2, 2023
@JonoYang JonoYang deleted the 180-projectcodebase-root-workaround branch March 3, 2023 17:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants