Add LAYERS sheet to XLSX output for Docker scans uploaded to SCIO #926

Closed
mjherzog opened this issue Sep 14, 2023 · 11 comments
Labels
enhancement (New feature or request), outputs (This issue is related to one of the SCIO output files)

Comments

@mjherzog
Member

It would be nice to get the LAYERS sheet in the XLSX output for a Docker scan added to the SCIO database with a load_inventory pipeline.

@mjherzog added the enhancement and outputs labels Sep 14, 2023
@tdruez
Contributor

tdruez commented Sep 14, 2023

@mjherzog Could you clarify what you would like to be imported from the LAYERS sheet? I'm not sure I understand what's missing from the load_inventory pipeline import.

@mjherzog
Member Author

I am looking for the same LAYERS data that we get in the XLSX output when running a Docker image scan directly in SCIO with a docker pipeline, as opposed to loading a SCIO Docker scan (JSON) that was run on a different SCIO instance.
The fields are:

  • layer_tag
  • created_by (Docker command)
  • layer_id
  • image_id
  • created (date)
  • size
  • author
  • comment
  • xlsx_errors

The most important data are: created_by and the xref of layer_id to image_id.
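
As a rough illustration of what those rows amount to, the sketch below flattens an images/layers structure into one row per layer. It assumes the Project.extra_data layout from the example posted in this thread; `layers_rows` is a hypothetical helper, not SCIO's actual code, the IDs are shortened for readability, and layer_tag and xlsx_errors are left out because their derivation is not shown here.

```python
def layers_rows(extra_data):
    """Flatten images -> layers into one row per layer (hypothetical helper)."""
    rows = []
    for image in extra_data.get("images", []):
        for layer in image.get("layers", []):
            rows.append(
                {
                    "layer_id": layer.get("layer_id"),
                    # The xref: each layer row carries its parent image_id.
                    "image_id": image.get("image_id"),
                    "created_by": layer.get("created_by"),
                    "created": layer.get("created"),
                    "size": layer.get("size"),
                    "author": layer.get("author"),
                    "comment": layer.get("comment"),
                }
            )
    return rows


# Trimmed-down sample with shortened IDs (illustrative values only).
sample = {
    "images": [
        {
            "image_id": "33ebbbd3ccb4",
            "layers": [
                {
                    "layer_id": "8826df928cad",
                    "size": 305152,
                    "created_by": "ADD alpine.tar / # buildkit",
                },
            ],
        }
    ]
}

for row in layers_rows(sample):
    print(row["layer_id"], "->", row["image_id"], "|", row["created_by"])
```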

@tdruez
Contributor

tdruez commented Sep 14, 2023

The LAYERS sheet is generated from the Project.extra_data field.

In the case of a JSON export, the entirety of that field is stored in the JSON and can be restored on load_inventory, but in the case of XLSX, most of the value is lost as we only keep the layers entries from the Project.extra_data.

We could convert the layers data back into the Project.extra_data field, but the original data and structure would be lost, and this may be problematic for the next export to XLSX.
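
To make the concern concrete, here is a minimal sketch of rebuilding extra_data from flat LAYERS rows. The helper name, the row keys, and the sample values are assumptions for illustration only; the point is that image-level fields (distro in this sample) have no column to come back from.

```python
def extra_data_from_rows(rows):
    """Regroup flat LAYERS rows by image_id (hypothetical helper)."""
    images = {}
    for row in rows:
        image = images.setdefault(
            row["image_id"], {"image_id": row["image_id"], "layers": []}
        )
        # Everything except the xref column goes back onto the layer.
        layer = {key: value for key, value in row.items() if key != "image_id"}
        image["layers"].append(layer)
    return {"images": list(images.values())}


# Illustrative original data with shortened IDs.
original = {
    "images": [
        {
            "image_id": "33ebbbd3ccb4",
            "distro": {"identifier": "alpine", "version_id": "3.16.1"},
            "layers": [{"layer_id": "8826df928cad", "size": 305152}],
        }
    ]
}

# Rows as a sheet would hold them: layer fields plus the parent image_id.
rows = [
    {"image_id": image["image_id"], **layer}
    for image in original["images"]
    for layer in image["layers"]
]

rebuilt = extra_data_from_rows(rows)
lost = set(original["images"][0]) - set(rebuilt["images"][0])
print(lost)  # image-level keys that did not survive the round trip
```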


Example of Project.extra_data

images:
  - os: linux
    tags:
      - alpine-mini:latest
    author:
    distro:
      os: linux
      logo:
      name: Alpine Linux
      id_like: []
      variant:
      version:
      build_id:
      cpe_name:
      home_url: https://alpinelinux.org/
      extra_data: {}
      identifier: alpine
      variant_id:
      version_id: 3.16.1
      pretty_name: Alpine Linux v3.16
      support_url:
      architecture: amd64
      bug_report_url: https://gitlab.alpinelinux.org/alpine/aports/-/issues
      version_codename:
      documentation_url:
      privacy_policy_url:
    labels: {}
    layers:
      - os:
        size: 305152
        author:
        labels: []
        sha256: 8826df928cad11bdd3830d029971f952660954d6afa8d3413093a77602d420e8
        comment: buildkit.dockerfile.v0
        created: '2022-08-02T00:49:43.155600296+02:00'
        variant:
        layer_id: 8826df928cad11bdd3830d029971f952660954d6afa8d3413093a77602d420e8
        created_by: 'ADD alpine.tar / # buildkit'
        os_version:
        architecture:
        docker_version:
        is_empty_layer: no
        archive_location: bbab8f037289d59a7464ec66c1b710097fe9bffe8e70467f4f6d8f54b8422aca/layer.tar
        extracted_location: docker-mini-with-license-alpine.tar.xz-extract/8826df928cad11bdd3830d029971f952660954d6afa8d3413093a77602d420e8
    sha256:
    comment:
    created: '2022-08-02T00:49:43.155600296+02:00'
    history:
      - comment: buildkit.dockerfile.v0
        created: '2022-08-02T00:49:43.155600296+02:00'
        created_by: 'ADD alpine.tar / # buildkit'
    variant:
    image_id: 33ebbbd3ccb4c86576d32eb638fbcaec9f600e715fbf918d1a6e7e7f00f95742
    os_version:
    architecture: amd64
    image_format: docker
    config_digest: sha256:33ebbbd3ccb4c86576d32eb638fbcaec9f600e715fbf918d1a6e7e7f00f95742
    docker_version:

Exported XLSX:
Screenshot 2023-09-14 at 22 31 00

@mjherzog
Member Author

I am hoping to get this from a use case where the imported/loaded Scan is in json format. I am not expecting this to work from loading an XLSX scan file.

@tdruez
Contributor

tdruez commented Sep 19, 2023

The images data is stored at the Project level in the extra_data field.
When generating outputs such as the XLSX, this extra_data is used to generate the LAYERS sheet.

Now, when using load_inventory, the pipeline only loads the inventory (packages, dependencies, files) into a new Project. Because it supports multiple "inventory" inputs side by side, Project-specific data such as extra_data is not loaded, as there are potentially multiple Projects to load from.

> imported/loaded Scan is in JSON

To clarify, this pipeline is not a "Project importer" that would restore a full SCIO project state. This could be a new feature but would be limited to a single Project input for the "import" part.

@mjherzog
Member Author

@tdruez Thank you for the detailed explanation. Adding this feature seems ultimately more complicated than beneficial at this time given all the variables including the fact that it would apply only to Docker data.

@pombredanne
Member

> Now, when using load_inventory, the pipeline only loads the inventory (packages, dependencies, files) into a new Project. Because it supports multiple "inventory" inputs side by side, Project-specific data such as extra_data is not loaded, as there are potentially multiple Projects to load from.

@mjherzog @tdruez IMHO we should have a way to reload all the data from a scan in a project. Maybe this would not work if more than one scan is loaded, but in the general case this should work.

@pombredanne reopened this Sep 25, 2023
@pombredanne
Member

> To clarify, this pipeline is not a "Project importer" that would restore a full SCIO project state. This could be a new feature but would be limited to a single Project input for the "import" part.

This makes sense... @mjherzog I do not think this would be a complex feature... this is basically loading the inventory as we do now, plus loading the project's extra data.

@mjherzog
Member Author

I was worried that adding it only for Docker scans with a single image would add significant complexity in general.

@tdruez
Contributor

tdruez commented Sep 26, 2023

The complexity only exists when trying to load two, or more, projects into a single one.
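
A tiny sketch of why, using made-up data: if the extra_data of two projects is merged naively into one dict, top-level keys such as images collide and the later input silently wins.

```python
# Made-up extra_data from two separate projects.
project_a = {"images": [{"image_id": "aaa"}]}
project_b = {"images": [{"image_id": "bbb"}]}

merged = {}
merged.update(project_a)
merged.update(project_b)  # overwrites the "images" key that came from project_a

print(merged)
```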

@tdruez
Contributor

tdruez commented Jan 6, 2025

> Now, when using load_inventory, the pipeline only loads the inventory (packages, dependencies, files) into a new Project. Because it supports multiple "inventory" inputs side by side, Project-specific data such as extra_data is not loaded, as there are potentially multiple Projects to load from.

I've revisited this issue and added support for loading the extra_data value(s) from JSON input files in #1507

In the case of a single JSON file as input for the load_inventory, the extra_data is restored as-is from the original Project.

In the case of multiple JSON files as inputs for the load_inventory, each input's extra_data is loaded prefixed with the input file name. For example:

  • output1.json
  • output2.json

Content of the extra_data field:

output1.json:
   <extra_data>

output2.json:
   <extra_data>
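
The behavior described above could be sketched roughly as follows. This is not the code from #1507: `combine_extra_data` is a hypothetical helper, it assumes each input's extra_data has already been parsed into a dict, and it assumes "prefixed with the input file name" means keyed by file name as in the layout shown above.

```python
def combine_extra_data(extra_data_by_input):
    """Combine per-input extra_data (hypothetical helper).

    One input: its extra_data is returned as-is.
    Several inputs: each extra_data is kept under its input file name.
    """
    if len(extra_data_by_input) == 1:
        return next(iter(extra_data_by_input.values()))
    return dict(extra_data_by_input)


# Made-up extra_data values for illustration.
single = combine_extra_data({"output1.json": {"images": [{"image_id": "aaa"}]}})
several = combine_extra_data(
    {
        "output1.json": {"images": [{"image_id": "aaa"}]},
        "output2.json": {"images": [{"image_id": "bbb"}]},
    }
)

print(single)
print(several)
```

Keying by file name keeps both sets of images side by side, at the cost of a different top-level structure than the single-input case.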


3 participants