Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

isisdataeval - ISISDATA Evaluation and Verification #5111

Merged
merged 11 commits into from
Mar 29, 2023

Conversation

KrisBecker
Copy link
Contributor

Description

Motivation for this tool is to validate the ISISDATA installation on a local processing system. Recent changes to how the USGS provides ISISDATA to ISIS users separates the kernel configuration (USGS/AWS) from the kernel download resources (typically NAIF).

This can be a useful tool to quickly determine if spiceinit failures may be due to missing files or improper local ISIS installation. It can also be used to validate a local ISISDATA install where kernel file download filters have been applied to ISISDATA installation processing. For example, AWS S3 storage does not support symbolic links to files, so a great deal of effort has been made to identify and report these types of files (they lead to copies/redundancy of the linked file). When given the $ISISDATA directory (the default), this tool will inspect all files and directories contained in the installation for a valid ISISDATA configuration. As it pertains to isisdataeval, a valid ISISDATA installation is confirmed when all the contents of the kernel database (DB) and configuration files are translated into files that exist in the system. isisdataeval does not, however, confirm all necessary or required files have been downloaded to the local system, but it can be verified by comparing results of the application on each install of ISISDATA.

I apologize I do not have tests developed for this. Honestly, I am open for suggestions on how to test this application.

Related Issue

Related Issues:
#5110
#5109
#5105

How Has This Been Validated?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Documentation change (update to the documentation; no code change)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Infrastructure change (changes to things like CI or the build system that do not impact users)

Checklist:

  • I have read and agree to abide by the Code of Conduct
  • I have read the CONTRIBUTING document.
  • My change requires a change to the documentation and I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • I have added myself to the .zenodo.json document.
  • I have added any user impacting changes to the CHANGELOG.md document.

Licensing

This project is mostly composed of free and unencumbered software released into the public domain, and we are unlikely to accept contributions that are not also released into the public domain. Somewhere near the top of each file should have these words:

This work is free and unencumbered software released into the public domain. In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain.

  • I dedicate any and all copyright interest in this software to the public domain. I make this dedication for the benefit of the public at large and to the detriment of my heirs and successors. I intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

This new application will evaluate all the files in an ISISDATA local installation for a valid kernel configuration and verify content
@Kelvinrr
Copy link
Collaborator

Testing it locally, this might be useful to run during the nightly syncs to diagnose data issues.

@KrisBecker
Copy link
Contributor Author

My suggestion is to run this application after every ISISDATA update to your sources and make them available to users. Run the app on each $ISISDATA/{mission} directory independently and make them available as well.

This provides hashes for all data and each mission after every source sync and users can now check this resource and determine any timeline of ISISDATA integrity.

Some additional (perhaps Python) tools to run comparisons of the data would be super helpful. (NOTE the app contains the ability to generate a JSON file for output, too.)

@acpaquette acpaquette self-requested a review January 31, 2023 21:13
Copy link
Collaborator

@Kelvinrr Kelvinrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small comments for now. Still messing with it.

For testing: If it is possible to test on a subset, we can test it by creating temporary kerne.db files from string literals to a temp dir similar to how we test ISDs. Then you can create empty kernel files to test things like missing, external etc. If it helps I put up boilerplate of what that would look like. But you can see some examples in tests/Fixtures.cpp

for kernel DB files using the form <b>kernels.????.db</b>. The
highest version of these files are selected and its contents are read.
For every <b>File</b> specification in a <b>Selection</b> group,
the highested version is determined and then checked for existance.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

highested -> highest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll fix this, thanks.

Comment on lines 690 to 693
satisfy spiceinit requirements. A status of <em>external</em>
indicate the file <em>as specified</em> in the db/conf file
does not exist in the ISISDATA inventory. For the occurances
in the above results indicate an extra '/' in the file name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to be clear, external means that it is a file NOT mentioned in any .db and therefore essential a dead kernel to ISIS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. external file status as determined in isisdateaveal has several meanings that may include:

  • A file that is referred to in a kernel DB/conf file, but is not found in the full ISISDATA file inventory (it can still exist but the full path to the file must contain the expanded/specified $ISISDATA path or its deemed external). These situations are common when ISISDATA == DATADIR.
  • A misspelled kernel file specification that contains an extra / in its file name (as seen in Voyager SPK kernel configs in ISISDATA Evaluation and Verification #5110). The file is present and can be read in ISIS with the file specification, but it's still identified as an issue nonetheless.
  • When you choose DATADIR as a mission directory in $ISISDATA, e.g., DATADIR=$ISISDATA/osirisrex, any file specified in a kernel DB or conf file that is not contained within the $ISISDATA/osirisrex directory hierarchy (i.e., inventory), is considered external. This identifies ISISDATA dependencies outside the mission directory even if the file is in $ISISDATA but not in DATADIR - a quite useful feature. And these cases may not indicate an error, but it is good to know from a management perspective (if your ISIS mission kernel spec has outside dependencies).

Note its difficult to make assumptions in isisdataeval regarding a dead file because all files in ISISDATA are considered inventory, including calibration files, which are typically not part of the kernel infrastructure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems kinda broad, does it make sense to make these different classifications? It seems useful to flag files that match case 2 and 3 specifically rather than being pooled together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What classification do you suggest?

Case 2 is difficult to identify specifically as a misspelled file in the config file and may have different OS behaviors. It is determined by string comparisons of the environment variable expansion and the result of absolute path by OS resources via Qt::QFileInfo which seems to clean up the path.

I can add more documentation based upon this narrative if that will be helpful.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for now the extra text in the XML docs that breaks down these cases is good for now.

This commit updates documentation base upon code review.

Some additional modifications were made to code to clean/clarify some processing and ensure special cases are identified.

Added the VERIFY parameter as its not very straight forward if you don’t add the right parameter (need to hash files to find bad filenames). This option will not require any output files to run both validation (always ran) and verification (optional).
The isisdata_mockup.py tool is provide to create an ISISDATA mockup to test the application. This submission documents the process to set up testing in the ISIS Google Test environment.
This directory contains a minimal configuration to provide testing for isisdataeval. Note that the total size of all these files is about 120K. See the README.md in ISIS3/isis/src/system/apps/isisdataeval/tools for details on how this data is created.
There are some cube files and other files that end up being excluded in .gitignore. An entry was added that included all the files in this commit.
@KrisBecker KrisBecker requested review from Kelvinrr and removed request for acpaquette March 24, 2023 23:03
Copy link
Collaborator

@Kelvinrr Kelvinrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some questions inline.

Comment on lines +1 to +9
{
"source": "$ISISDATA/hayabusa/kernels/spk/hay_osbj_050911_051118_v1n.bsp",
"filesize": 81458176,
"createtime": "2022-08-12T22:29:59.000000",
"modifiedtime": "2022-11-23T23:37:37.651359",
"md5hash": "bbc5587a59fd5d5d9683ac5d957649ce",
"sha256hash": "8577f45d89d376a9bb1351b89e53c8d4e26327e465cc70b3bf361c975db53515",
"sha1hash": "072e199a7b2697b97e79c1fa6aa1a186b5e03dc3"
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the tests depend on having real binary kernels? Could the test setup not simply create an empty kernel? Part of the application is comparing hashes, is this hash comparison critical to the apps tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the tests depend on having real binary kernels?

Only to compute (meaningful) hash values, which is why there is a reference to the real kernel/file.

Could the test setup not simply create an empty kernel?

It could. But that does not test the validation aspect of this application. You could (and highly recommended to) pick kernels which you know its file name, size and contents should not change or file not exist (all part of verification) after any download/install. Modify the test to only compute hashes for these kernels/files and it automatically validates your environment. I don't know how to make a better/useful test.

Part of the application is comparing hashes, is this hash comparison critical to the apps tests?

Yes. This validates $ISISDATA. By definition, this framework should also be completely consistent with conditions in any valid (i.e., fully downloaded/populated) ISIS test environment since it requires $ISISDATA to run the tests.

There are two things I cannot/do not test: file times and symbolic links since support/results have proven to be inconsistent across OSes for these file/directory/device characteristics. However, these data have proven to be useful on local systems, particularly when creating/configuring new mission kernel support systems in ISIS.

Comment on lines +88 to +89
With this tool and the files listed in `isisdataeval_isisdata_mockup_files.lis`, the test for isisdataeval can be recreated from any ISISDATA installation. Note that from time to time, the files used in this ISISDATA test could change which would cause failures. Here are the commands to create the `isisdataeval` test ISISDATA mockup directory - from the Git install directory:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appreciate the thorough instructions. Is it possible to simple make this mocked ISISDATA area static and not rely on the real data for comparison? From the tests it seems it's comparing hashes, my first thought is that we don't need to compare hashes from the original file? probably something I am missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure its possible to make data up, but as I mentioned above, this does not validate $ISISDATA. That would tell us nothing about ISISDATA or isisdataeval unless some additional effort or procedures are pursued on real data. These tests use selected files from four very stable $ISISDATA directories that should not change. This approach integrates verification and validation of required data for proper and more thorough testing of the ISIS system.

Secondarily, but just as important, this provides tools to that can help address ongoing problems with ISISDATA encountered by ISIS users by first ensuring the problem is not related to their local ISISDATA installation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on the second part, after this is merged I plan to add it to the nightly ISISDATA script and find a good way to report results in AWS such that we can get a better idea of the changes other than did download/did not download.

I think there might be a way to test the hash comparisons without real data, as the goal here is to test the app not the data itself, the data itself would be tested elsewhere. But there is already a ton of good work here and I'd rather get it in than split hairs on testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Can you hold off for a bit on the merge.I have an update coming.

It might address some of the things you mentioned.

Copy link
Collaborator

@Kelvinrr Kelvinrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests look good on Jenkins.

@Kelvinrr Kelvinrr merged commit 4e39c98 into DOI-USGS:dev Mar 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants