Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run configuration implementation security issue #479

Closed
marxide opened this issue Mar 23, 2021 · 3 comments · Fixed by #483
Closed

Run configuration implementation security issue #479

marxide opened this issue Mar 23, 2021 · 3 comments · Fixed by #483
Assignees
Labels
discussion topic to discuss openly enhancement New feature or request high priority An issue that needs to be prioritised. python Pull requests that update Python code

Comments

@marxide
Copy link
Contributor

marxide commented Mar 23, 2021

The run configuration file is just a Python file that is loaded by importlib. When it's loaded, it's executed outside of any user context. This means that any arbitrary code in a run config file will be executed without any permission checks. Since we allow users to write and upload their own config files, this presents a serious security concern. A user could, among other things, delete an entire run that wasn't theirs, make themselves an administrator, basically anything you can do in a Django shell session.

@ajstewart and I discussed this briefly and think we should change the run configuration to use configparser. It's in the Python stdlib, is made for configuration files, and maintains a reasonable level of human readability.

The downside is that we lose the flexibility to do things like glob and sort paths for input files. This was pretty handy (I use it all the time) but thinking about it more it's probably better to provide the input files explicitly so we can be sure exactly what files were used in a run and the order in which they were provided.

@marxide marxide self-assigned this Mar 23, 2021
@marxide marxide added enhancement New feature or request high priority An issue that needs to be prioritised. python Pull requests that update Python code labels Mar 23, 2021
@marxide
Copy link
Contributor Author

marxide commented Mar 23, 2021

We may want to consider using a strict YAML subset (provided by strictyaml) instead of configparser. Below are some limited examples of both (I haven't replicated the full config).

I find both options viable. Personally, I find both to be human readable and suitable for users to edit by hand through the UI. From a development standpoint, I like the YAML option because of the neat schema validation features. I welcome your thoughts!

configparser

Pros:

  • in the stdlib

Cons:

  • values are scalar only - no lists

Examples

Since we can't use lists, the choices for defining the input files are either: a large, comma-separated string of paths; or a series of config options with no value in their own sections. I've opted for the latter in the example below since I think the strings would become difficult to work with when large.

[Run settings]
run path = /foo/bar/pipeline-runs/run1
default survey = none
suppress astropy warnings = true

[Image files]
/foo/bar/image1.epoch1.fits
/foo/bar/image2.epoch1.fits
/foo/bar/image1.epoch2.fits
/foo/bar/image2.epoch2.fits

[Selavy files]
/foo/bar/selavy-image1.epoch1.txt
/foo/bar/selavy-image2.epoch1.txt
/foo/bar/selavy-image1.epoch2.txt
/foo/bar/selavy-image2.epoch2.txt

[Noise files]
/foo/bar/noisemap.image1.epoch1.fits
/foo/bar/noisemap.image2.epoch1.fits
/foo/bar/noisemap.image1.epoch2.fits
/foo/bar/noisemap.image2.epoch2.fits

[Background files]
/foo/bar/meanmap.image1.epoch1.fits
/foo/bar/meanmap.image2.epoch1.fits
/foo/bar/meanmap.image1.epoch2.fits
/foo/bar/meanmap.image2.epoch2.fits

[Source monitoring]
monitor = true
min sigma = 3.0
edge buffer scale = 1.2
cluster threshold = 3.0
allow nan = false

For epoch-based association, we can set the value for the input file "options" with the epoch.

[Image files]
/foo/bar/image1.epoch1.fits = 1
/foo/bar/image2.epoch1.fits = 1
/foo/bar/image1.epoch2.fits = 2
/foo/bar/image2.epoch2.fits = 2

Strict YAML

Pros:

  • schema validation with feedback

Cons:

  • stricter syntax compared with configparser, e.g. susceptible to bad indentation (but then so is our existing method)

Example

Run settings:
  run path: /foo/bar/pipeline-runs/run1
  default survey: none
  suppress astropy warnings: true
Input files:
  Images:
  - /foo/bar/image1.epoch1.fits
  - /foo/bar/image2.epoch1.fits
  - /foo/bar/image1.epoch2.fits
  - /foo/bar/image2.epoch2.fits
  Selavy:
  - /foo/bar/selavy-image1.epoch1.txt
  - /foo/bar/selavy-image2.epoch1.txt
  - /foo/bar/selavy-image1.epoch2.txt
  - /foo/bar/selavy-image2.epoch2.txt
  Noise:
  - /foo/bar/noiseMap.image1.epoch1.fits
  - /foo/bar/noiseMap.image2.epoch1.fits
  - /foo/bar/noiseMap.image1.epoch2.fits
  - /foo/bar/noiseMap.image2.epoch2.fits
  Background:
  - /foo/bar/meanMap.image1.epoch1.fits
  - /foo/bar/meanMap.image2.epoch1.fits
  - /foo/bar/meanMap.image1.epoch2.fits
  - /foo/bar/meanMap.image2.epoch2.fits
Source monitoring:
  monitor: true
  min sigma: 3.0
  edge buffer scale: 1.2
  cluster threshold: 3.0
  allow nan: false

Since YAML supports lists and mappings, we can activate the epoch-based association similarly to how we do it now.

Input files:
  Images:
    1:
    - /foo/bar/image1.epoch1.fits
    - /foo/bar/image2.epoch1.fits
    2:
    - /foo/bar/image1.epoch2.fits
    - /foo/bar/image2.epoch2.fits

The best part of the YAML option from my point of view is the validation. The strictyaml package allows schemas to be defined and the config files to be validated against it. It even allows multiple schema options so we can validate against regular and epoch-based association at the same time.

import strictyaml

# define the input file schemas
_standard_mode_mapping = strictyaml.UniqueSeq(strictyaml.Str())
_epoch_mode_mapping = strictyaml.MapPattern(strictyaml.Str(), strictyaml.UniqueSeq(strictyaml.Str()))
# allow either schema for the input files
_input_file_mapping = _standard_mode_mapping | _epoch_mode_mapping

schema = strictyaml.Map({
    "Run settings": strictyaml.Map({
        "run path": strictyaml.Str(),
        "default survey": strictyaml.Str(),
        "suppress astropy warnings": strictyaml.Bool(),
    }),
    "Input files": strictyaml.Map({
        "Images": _input_file_mapping,
        "Selavy": _input_file_mapping,
        "Noise": _input_file_mapping,
        "Background": _input_file_mapping,
    }),
    "Source monitoring": strictyaml.Map({
        "monitor": strictyaml.Bool(),
        "min sigma": strictyaml.Float(),
        "edge buffer scale": strictyaml.Float(),
        "cluster threshold": strictyaml.Float(),
        "allow nan": strictyaml.Bool(),
    }),
})

# config_no_run_path is the first YAML example above but with the run path removed
try:
    yaml = strictyaml.load(config_no_run_path, schema, label="run config")
except strictyaml.YAMLValidationError as e:
    print(e)
while parsing a mapping
  in "run config", line 1, column 1:
    Run settings:
     ^ (line: 1)
required key(s) 'run path' not found
  in "run config", line 3, column 1:
      suppress astropy warnings: 'true'
    ^ (line: 3)

@marxide marxide added the discussion topic to discuss openly label Mar 23, 2021
@marxide
Copy link
Contributor Author

marxide commented Mar 23, 2021

I've looked over the current run config validation. The following validation steps can be removed in favor of YAML schema validation:

  • No unknown config options
  • Required config options present
  • No duplicates in the input file lists
  • Number of input files match for each type (i.e. images, selavy, noise)
  • Selected source finder is a valid choice (i.e. selavy)
  • Selected association method is a valid choice (i.e. basic, advanced, deruiter)
  • Background files are provided if source monitoring is turned on, and the number of background files matches the other inputs

As far as I can tell, the following validation steps are not possible with schema validation and would still be required:

  • Number of input files does not exceed the configured maximum
  • Input files exist and are readable
  • At least 2 input images are provided

@ajstewart
Copy link
Contributor

I would be in support of switching to YAML from what you've said here!

marxide added a commit that referenced this issue Apr 1, 2021
Changes the pipeline run configuration file format to YAML.
Implements a strict YAML validator.
Closes #479.
marxide added a commit that referenced this issue Apr 1, 2021
Changes the pipeline run configuration file format to YAML.
Implements a strict YAML validator.
Closes #479.
marxide added a commit that referenced this issue Apr 23, 2021
* replace run config .py files with .yaml
Changes the pipeline run configuration file format to YAML.
Implements a strict YAML validator.
Closes #479.

* update docs

* updated CHANGELOG

* Add pipeline/config.py to code reference nav.

* Fix UI arrow file set to False.

* Apply suggestions from code review

Co-authored-by: Adam Stewart <[email protected]>

* docs and docstring fixes

* improve pipeline config error messages

* background section always given in run config
Run config template will now always contain the background section. If source monitoring is off, it will be commented out.

* more specific run config validation messages

* lock deps

Co-authored-by: Adam Stewart <[email protected]>
Co-authored-by: Adam Stewart <[email protected]>
marxide added a commit that referenced this issue Apr 28, 2021
Changes the pipeline run configuration file format to YAML.
Implements a strict YAML validator.
Closes #479.
marxide added a commit that referenced this issue May 21, 2021
* Bump y18n from 3.2.1 to 3.2.2

Bumps [y18n](https://github.com/yargs/y18n) from 3.2.1 to 3.2.2.
- [Release notes](https://github.com/yargs/y18n/releases)
- [Changelog](https://github.com/yargs/y18n/blob/master/CHANGELOG.md)
- [Commits](https://github.com/yargs/y18n/commits)

Signed-off-by: dependabot[bot] <[email protected]>

* Add epoch based association clarification to documentation

* .env file text and dark mode

- Added dark mode to docs.
- Added text about the .env file to documentation.
- Clarified and made clearer the Django Q requirement for running jobs.

* Changed VAST logo to icon format to not appear stretched

* Apply suggestions from code review

Co-authored-by: Andrew O'Brien <[email protected]>

* Bump minimum mkdocs-material to 7.1.1 to ensure dark mode functionality

* Updated CHANGELOG.md

* Fixed CHANGELOG.md

* Updated the docs layout, home page and FAQs

- Enabled a new home page with wave design and announcement bar.
- Changed layout to tabs and sections mode.
- Changed location of some pages to other tab headings.
- Added to FAQ section.

* Updated CHANGELOG.md

* Moved CSS to own stylesheet file

- Fixed Safari image width

* align bokehJS and bokeh python versions (#489)

* align bokehJS and bokeh python versions
Fixes #488

* added comment for bokeh to pyproject.toml

* replace run config .py files with .yaml (#483)

* replace run config .py files with .yaml
Changes the pipeline run configuration file format to YAML.
Implements a strict YAML validator.
Closes #479.

* update docs

* updated CHANGELOG

* Add pipeline/config.py to code reference nav.

* Fix UI arrow file set to False.

* Apply suggestions from code review

Co-authored-by: Adam Stewart <[email protected]>

* docs and docstring fixes

* improve pipeline config error messages

* background section always given in run config
Run config template will now always contain the background section. If source monitoring is off, it will be commented out.

* more specific run config validation messages

* lock deps

Co-authored-by: Adam Stewart <[email protected]>
Co-authored-by: Adam Stewart <[email protected]>

* Added announcement bar update step to making a release doc

* Fix bokeh hover tool for lightcurve plots (#493)

* fix missing lightcurve hovertool for non-variables

* add default x-range of 1 day to lightcurve

* updated CHANGELOG
Fixes #490

* Added lightgallery support

* Complete lightgallery support and more

* Fixed docstring in config.py.
* Updated all image captions.
* Split off `DataTables` and `Tags and Favourites` to their own pages.
* Expanded documentation of the documentation.
* Enclosed all file examples in example admonition.
* Renamed docs/quickstart -> docs/gettingstarted.
* Added acknowledgements page.
* Moved FAQs to before Reference.
* Fixed MathJax and instant navigation.

* fix broken cone search when coord = 0 (#501)

* fix broken cone search when coord = 0
Fixes #495

* updated CHANGELOG

* Change arrow file generation from vaex to pyarrow

* Removed vaex dependency.
* Removed UI warnings of arrow files not being generated when the run is processed via the website.
* Updated docs.

Fixes #497.

* Minor fixes

* added cos(dec) scaling to aladin lite footprint (#502)

* Renamed FAQs -> FAQ



* Added missing lightgallery css and js files



* Added "Relations False Variability" section to association page



* fix "back to top" TOC link

The code to add a "back to top" link to the table of contents would fail with an error message on pages that don't have a TOC.
The link was also never added if the user started on a page without a TOC as the mkdocs-material instant loading feature prevents JS from being executed when navigating between local docs pages. This patch ensures the link is always added.

* Update docs/design/association.md

Co-authored-by: Andrew O'Brien <[email protected]>

* set default auto field model (#507)

* set default auto field model
Silences warnings introduced in Django 3.2. Closes #505.
 since this doesn't change the model, it only tells Django what it already is.

* updated CHANGELOG

* Add glob expression support to yaml run config (#504)

* added support for globs in yaml run config
Input files can be specified with one or more glob expressions. See docs for details.
Resolved input file paths are matched and printed to the run log.
Closes #485
updated changelog

* fixed missing background images from run log

* wrap config snippets in example admonishment

Apply suggestions from code review.

Co-authored-by: Adam Stewart <[email protected]>

* add docstring and fix docs

* fix outdated ref in tests docs

Co-authored-by: Adam Stewart <[email protected]>

* add tmate action debugging (#512)

Enable by triggering a manual test run. See https://github.com/marketplace/actions/debugging-with-tmate


* Revert "add tmate action debugging (#512)" (#513)

This reverts commit 836e5a6.

* Draw selected source components on top in JS9 (#508)

* Draw selected source components on top in JS9
Fixes #200
 no changes covered by tests

* update changelog


* changed to single JS9 shape layer
JS9 only allows the top shape layer to respond to mouse events. Reverted
back to a single layer with the shapes internally ordered.


* Fix rogue relations (#510)

- Fixed rogue relations
- Fixed docstring typos.
- Replace deprecated pandas kwarg
- Update expected regression test data
- Updated changelog to reflect test changes
- Refactored test_most_relations() to sort by relations and RA.

Co-authored-by: Andrew O'Brien <[email protected]>

* Added created and updated dates to doc pages. (#514)

* Added mkdocs-git-revision-date-localized-plugin to docs

* Added date created to pages with override

- Also changed docs workflow to fetch with fetch-depth: 0.

* Updated CHANGELOG.md

* Updated documentation docs page

* Corrected tree on docs page

* remove linebreaks from coordinates (#515)

* remove linebreaks from coordinates
Removes the excess whitespace between the lat and lon coordinates which was being included in the clipboard copy text.


* pin djangorestframework-datatables to 0.5.1 (#517)

* pin djangorestframework-datatables to 0.5.1
Later versions give the wrong total number of results during pagination. It only seems to affect cases when a different model is paginated, e.g. the list of measurements of a run returned by views.RunViewSet.measurements.


* update doc related to default dev branch (#521)

* update doc related to default dev branch

Co-authored-by: Andrew O'Brien <[email protected]>

* Fixed external search results table nan values (#523)

Co-authored-by: Andrew O'Brien <[email protected]>

* Updated mkdocs-material for native creation date support (#518)

Co-authored-by: Andrew O'Brien <[email protected]>

* removed actions ssh key
forced-phot is now public and doesn't require authentication to clone 

* Updated source detail page layout (#526)

* Updated source detail page layout

- Added first detection JS9 at the top of the page.
- Extended the external results row count.
- Swap position of JS9 and aladin on measurement detail page to match.

* Minor tidy of new lines

- 

* fixed broken image detail link (#528)

* fixed broken image detail link
Fixes #527 

* updated changelog

* remove action git config to use ssh
With no private repo dependencies, we can use the default anonymous https instead.

* Bump lodash from 4.17.20 to 4.17.21 (#525)

Bumps [lodash](https://github.com/lodash/lodash) from 4.17.20 to 4.17.21.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](lodash/lodash@4.17.20...4.17.21)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump hosted-git-info from 2.8.8 to 2.8.9 (#524)

Bumps [hosted-git-info](https://github.com/npm/hosted-git-info) from 2.8.8 to 2.8.9.
- [Release notes](https://github.com/npm/hosted-git-info/releases)
- [Changelog](https://github.com/npm/hosted-git-info/blob/v2.8.9/CHANGELOG.md)
- [Commits](npm/hosted-git-info@v2.8.8...v2.8.9)

Signed-off-by: dependabot[bot] <[email protected]>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Versioning (#532)

* add versioning and updated release docs
Closes #492

* add version to run pipeline log
Closes #529

* updated migration docs
We should no longer instruct devs to squash migrations since the pipeline models are mostly stable and large runs exist

* exclude version file from docs

* added discussions and defines issues and PRs

* updated changelog


* simplify git flow description


* add version to page footer


* updated npm deps; removed bootstrap (#533)

* updated npm deps; removed bootstrap
Updated npm deps according to semver rules.
Removed bootstrap in favour of the version bundled with startbootstrap-sb-admin-2.

* updated changelog

* fixed breadcrumb truncation


* Update homepage text (#534)

* Update homepage text

- Reference to documentation.
- Removed reference to users not being able to run pipeline runs.
- Added version to homepage.

* Minor change to homepage text


* added help and acknowledgement doc page (#535)

* added help and acknowledgement doc page
Updated acknowledgements
Removed email link from docs footer
Closes #522

* docs: fix email links, removed slack from footer

* updated changelog


* changed source naming convention (#536)

* changed source naming convention
Replaced ASKAP_<coord> naming convention with J<coord>. Coords have an RA precision of 1 and a Dec precision of 0, matching the convention used in the VAST-P1 paper.
Sources can be searched by name with an optional "VAST" prefix which is silently removed before searching.
Includes a data migration to update existing source names.

* updated changelog
Closes #530

* remove redundant types from docstrings
Co-authored-by: Adam Stewart <[email protected]>

* fixed formatting of examples in docstrings

* define reverse data migration

* Removed survey source models, commands and references (#538)

* Removed survey source models, commands and references

- Removed `Survey`, `SurveySource` and `SurveySourceQuerySet` from models.
- Removed related fields from Source model and crossmatch relation.
- Removed `importsurvey` command and docs page.
- Updated schema diagrams.

Non related minor fixes:
- Updated command line docs page for `clearpiperun` command.
- Added type annotation to `remove_forced_meas()` in `forced_extraction.py`.

Fixes #462.

* Added merged migration file

* delete leftover refs to removed models

* added required fields to SourceSerializer

* Removed default_survey option from run config
- Also removed `SURVEYS_WORKING_DIR` from settings and env.
- Removed references from documentation.

Co-authored-by: Andrew O'Brien <[email protected]>

* prep for release v1.0.0

* updated release docs


Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam Stewart <[email protected]>
Co-authored-by: Shibli Saleheen <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion topic to discuss openly enhancement New feature or request high priority An issue that needs to be prioritised. python Pull requests that update Python code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants