Documentation update #471

Merged
merged 28 commits into from
Apr 1, 2021
28 commits
7cb82f2
Docs update - Using the Pipeline
ajstewart Jan 31, 2021
d4cc538
Added ingest page.
ajstewart Mar 11, 2021
30c67f8
Added near complete association page
ajstewart Mar 11, 2021
f40aab8
Added new source and monitoring pages
ajstewart Mar 12, 2021
d31392f
Added source stats page.
ajstewart Mar 15, 2021
5a8734e
Added pipeline outputs page.
ajstewart Mar 15, 2021
dc406a3
Completed output section
ajstewart Mar 18, 2021
54fc7ee
Added new navbar options: docs and start a discussion
ajstewart Mar 18, 2021
4ba056f
Finished website section
ajstewart Mar 19, 2021
65b1874
Merge branch 'master' into docs-update-adam
ajstewart Mar 19, 2021
c7713a5
Update CHANGELOG.md
ajstewart Mar 19, 2021
62b616b
Fix internal page links
ajstewart Mar 20, 2021
3dc71ac
Added a code reference section to the documentation
ajstewart Mar 26, 2021
3c73982
Apply suggestions from code review
ajstewart Mar 26, 2021
d674012
Moved Mathjax to `docs/theme/js/extra.js`
ajstewart Mar 26, 2021
45e9a5a
Merge branch 'master' into docs-update-adam
ajstewart Mar 26, 2021
6237536
Merge branch 'docs-update-adam' into docs-update-code-reference
ajstewart Mar 26, 2021
9925605
Fix quickstart usage links
ajstewart Mar 26, 2021
840662b
Merge branch 'master' into docs-update-adam
ajstewart Mar 26, 2021
52ab2da
Merge branch 'docs-update-adam' into docs-update-code-reference
ajstewart Mar 26, 2021
9a66aa4
Updated urls and admin docstrings.
ajstewart Mar 26, 2021
c1a5353
Apply suggestions from code review
ajstewart Mar 30, 2021
94f5c6a
Apply suggestions from code review
ajstewart Mar 30, 2021
f13c0d3
Manually commit monitor.md suggestion
ajstewart Mar 30, 2021
34104a3
Updated adminusage/cli.md
ajstewart Mar 30, 2021
7a21d31
Merge branch 'docs-update-adam' into docs-update-code-reference
ajstewart Mar 30, 2021
99da822
Added docstrings to translators and converters
ajstewart Mar 30, 2021
ede5a73
Merge pull request #480 from askap-vast/docs-update-code-reference
ajstewart Apr 1, 2021
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

#### Added

- Added script to auto-generate code reference documentation pages [#480](https://github.com/askap-vast/vast-pipeline/pull/480).
- Added code reference section to documentation [#480](https://github.com/askap-vast/vast-pipeline/pull/480).
- Added new pages and sections to documentation [#471](https://github.com/askap-vast/vast-pipeline/pull/471).
- Added `requirements/environment.yml` to make it easier for Miniconda users to get the non-Python dependencies [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
- Added `pyproject.toml` and `poetry.lock` [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
- Added `init-tools/init-db.py` [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
@@ -21,6 +24,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

#### Changed

- Changed non-google format docstrings to google format [#480](https://github.com/askap-vast/vast-pipeline/pull/480).
- Changed some documentation layout and updated content [#471](https://github.com/askap-vast/vast-pipeline/pull/471).
- Changed the `vaex` dependency to `vaex-arrow` [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
- Set `CREATE_MEASUREMENTS_ARROW_FILES = True` in the basic association test config [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
- Bumped minimum Python version to 3.7.1 [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
@@ -56,6 +61,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

#### List of PRs

- [#480](https://github.com/askap-vast/vast-pipeline/pull/480) feat: Code reference documentation update.
- [#471](https://github.com/askap-vast/vast-pipeline/pull/471) feat: Documentation update.
- [#472](https://github.com/askap-vast/vast-pipeline/pull/472) feat: Simplify install.
- [#473](https://github.com/askap-vast/vast-pipeline/pull/473) fix: discard the selavy unit row before reading.
- [#466](https://github.com/askap-vast/vast-pipeline/pull/466) fix: Fixed initial job processing from the UI.
File renamed without changes.
332 changes: 332 additions & 0 deletions docs/adminusage/cli.md

Large diffs are not rendered by default.

117 changes: 117 additions & 0 deletions docs/design/association.md
@@ -0,0 +1,117 @@
# Source Association

This page details the association stage of a pipeline run.

Three association methods are available. They are summarised in the table below and detailed in the following sections.

!!! tip
    For complex fields and large surveys the `de Ruiter` method is recommended.

| Method | Fixed Assoc. Radius | Astropy function | Possible Relation Types |
| ------------------------------ | ------------------------ | ---------------------- | ----------------------- |
| [Basic](#basic) | Yes | `match_coordinates_sky` | one-to-many |
| [Advanced](#advanced) | Yes | `search_around_sky` | many-to-many, many-to-one, one-to-many |
| [de Ruiter (TraP)](#de-ruiter) | No | `search_around_sky` | many-to-many, many-to-one, one-to-many |


## General Association Notes

### Terminology

During association, `measurements` are associated into unique `sources`.

### Association Process

By default, association is performed on an image-by-image basis, ordered by the observational date. The only time this isn't the case is when [Epoch Based Association](#epoch-based-association) is used.

!!! note
    Epoch Based Association is not an association method, rather it changes how the measurements are handled when passed to one of the three methods for association.

### Weighted Average Coordinates

After every iteration of each association method, the average RA and Dec, weighted by the positional uncertainty, are calculated for each source. These weighted averages are then used as the base catalogue for the next association iteration. In other words, as the measurements are associated, new measurements are associated against the weighted average of the sources identified to that point in the process.

Source positions are reported using the weighted averages.
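
As a rough illustration of the weighted average described above, the sketch below computes an inverse-variance weighted mean for RA and Dec. The weighting scheme (`w = 1 / sigma**2`), the helper name `weighted_mean`, and the measurement values are all assumptions made for this example; the pipeline's actual weighting may differ, and RA wrap-around at 0/360 degrees is ignored here.

```python
def weighted_mean(values, uncertainties):
    """Inverse-variance weighted mean (assumed weighting: w = 1 / sigma**2)."""
    weights = [1.0 / (u ** 2) for u in uncertainties]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical measurements of one source: (ra, dec, ra_err, dec_err) in degrees.
measurements = [
    (150.0010, -30.0005, 0.0002, 0.0002),
    (150.0020, -30.0015, 0.0004, 0.0004),
]

ra_values, dec_values, ra_errs, dec_errs = zip(*measurements)
wavg_ra = weighted_mean(ra_values, ra_errs)
wavg_dec = weighted_mean(dec_values, dec_errs)

# The measurement with the smaller uncertainty dominates the average.
print(f"{wavg_ra:.4f}, {wavg_dec:.4f}")
```

With these inputs the first measurement carries four times the weight of the second, so the averaged position sits much closer to it.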

## Association Methods

!!! tip
    For a better understanding of the underlying process, see [this page](https://docs.astropy.org/en/stable/coordinates/matchsep.html#matching-catalogs){:target="_blank"} in the astropy documentation for examples of matching catalogues.

### Basic
The most basic association method uses the astropy [`match_coordinates_sky`](https://docs.astropy.org/en/stable/api/astropy.coordinates.match_coordinates_sky.html){:target="_blank"} function. With this method:

* Measurements are associated using only the nearest neighbour for each source when comparing catalogues.
* A fixed association radius is used as the threshold for a 'match'.
* Only one-to-many [relations](#relations) are possible.

### Advanced
This method uses the same process as `Basic`, but the astropy function [`search_around_sky`](https://docs.astropy.org/en/stable/api/astropy.coordinates.search_around_sky.html){:target="_blank"} is used instead. This means:

* All possible matches between the two catalogues are found, rather than only the nearest neighbour.
* A fixed association radius is still applied as the threshold.
* All types of [relations](#relations) are possible.
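
The difference between the `Basic` and `Advanced` behaviour can be sketched without astropy. The example below is not the pipeline's code: it swaps the astropy functions for a plain-Python haversine separation, and the function names, coordinates and the 10 arcsec radius are hypothetical. It assumes the nearest neighbour is sought for each new measurement against the existing source catalogue.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def nearest_neighbour_matches(new, catalogue, radius_deg):
    """Basic-style: each new measurement pairs only with its nearest source."""
    matches = []
    for i, (ra, dec) in enumerate(new):
        seps = [angular_sep_deg(ra, dec, cra, cdec) for cra, cdec in catalogue]
        j = min(range(len(seps)), key=seps.__getitem__)
        if seps[j] <= radius_deg:
            matches.append((i, j))
    return matches


def all_matches(new, catalogue, radius_deg):
    """Advanced-style: every pair within the fixed radius is kept."""
    return [
        (i, j)
        for i, (ra, dec) in enumerate(new)
        for j, (cra, cdec) in enumerate(catalogue)
        if angular_sep_deg(ra, dec, cra, cdec) <= radius_deg
    ]


# Hypothetical (ra, dec) positions in degrees.
catalogue = [(10.0, -45.0), (10.002, -45.0)]  # two closely spaced sources
new = [(10.0005, -45.0), (10.5, -45.0)]       # second measurement is far away
radius_deg = 10.0 / 3600.0                    # assumed 10 arcsec radius

basic = nearest_neighbour_matches(new, catalogue, radius_deg)
advanced = all_matches(new, catalogue, radius_deg)
print(basic)     # pairs of (measurement index, source index)
print(advanced)
```

Here the first measurement lies within the radius of both sources, so the all-pairs search returns two candidate matches where the nearest-neighbour search returns one.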

### de Ruiter
The de Ruiter method is a translation of the association method used by the [LOFAR Transients Pipeline (TraP)](https://tkp.readthedocs.io/en/latest/){:target="_blank"}, which uses the `de Ruiter radius` in order to define associations.

The `search_around_sky` astropy method is still used, but the threshold for a potential match is first limited by a `beamwidth limit` value which is defined in the pipeline run configuration file (`ASSOCIATION_BEAMWIDTH_LIMIT`), such that the initial threshold separation distance is set to

$$
\text{beamwidth limit} \times \frac{\theta_{\text{bmaj,img}}}{2},
$$

where $\theta_{\text{bmaj,img}}$ is the major axis of the restoring beam of the image being associated. The de Ruiter radius is then calculated for all potential matches, and is defined as

$$
r_{i,j} = \sqrt{
    \frac{ (\alpha_{i} - \alpha_{j})^{2} \cos^{2}((\delta_{i} + \delta_{j})/2)}{\sigma^{2}_{\alpha_{i}} + \sigma^{2}_{\alpha_{j}}}
    + \frac{(\delta_{i} - \delta_{j})^{2}}{\sigma^{2}_{\delta_{i}} + \sigma^{2}_{\delta_{j}}}
}
$$

where $\alpha_{n}$ is the right ascension of source $n$, $\delta_{n}$ is its declination, and $\sigma_{y}$ is the error on the quantity $y$. Matches are then identified by applying a maximum threshold to the de Ruiter radius, which the user defines in the pipeline run configuration file (`ASSOCIATION_DE_RUITER_RADIUS`).
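
A minimal sketch of the two thresholds follows. It uses the standard form of the de Ruiter radius, with a $\cos^{2}$ factor of the mean declination on the right ascension term and the declination difference in the second term. The function name `de_ruiter_radius` and all numerical values (a beamwidth limit of 1.5, a 15 arcsec beam, a radius threshold of 5.68) are illustrative assumptions, not values taken from this page.

```python
import math


def de_ruiter_radius(ra_i, dec_i, ra_err_i, dec_err_i,
                     ra_j, dec_j, ra_err_j, dec_err_j):
    """Dimensionless de Ruiter radius; all inputs are in degrees."""
    mean_dec = math.radians((dec_i + dec_j) / 2)
    ra_term = ((ra_i - ra_j) ** 2 * math.cos(mean_dec) ** 2
               / (ra_err_i ** 2 + ra_err_j ** 2))
    dec_term = (dec_i - dec_j) ** 2 / (dec_err_i ** 2 + dec_err_j ** 2)
    return math.sqrt(ra_term + dec_term)


# Initial separation threshold: beamwidth limit x (bmaj / 2).
beamwidth_limit = 1.5   # hypothetical ASSOCIATION_BEAMWIDTH_LIMIT
bmaj_arcsec = 15.0      # hypothetical restoring beam major axis
initial_limit_arcsec = beamwidth_limit * bmaj_arcsec / 2

# Two hypothetical positions near the equator with 0.001 deg errors.
r = de_ruiter_radius(10.0, 0.0, 0.001, 0.001,
                     10.003, 0.004, 0.001, 0.001)

de_ruiter_limit = 5.68  # hypothetical ASSOCIATION_DE_RUITER_RADIUS
is_match = r <= de_ruiter_limit
print(initial_limit_arcsec, round(r, 4), is_match)
```

Because the radius divides each offset by the combined positional error, it is unitless: it measures separation in units of uncertainty rather than angle.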

All relation types are possible using this method.

## Relations
Situations can arise where a source is associated with more than one source in the catalogue being cross-matched (or vice versa). Internally these types of associations are called:

* `many-to-many`
* `one-to-many`
* `many-to-one`

A good explanation of these situations is presented in the TraP documentation [here](https://tkp.readthedocs.io/en/latest/devref/database/assoc.html#database-assoc){:target="_blank"}. The VAST Pipeline follows the TraP methods in handling these types of associations, which are also detailed in the linked documentation. In short:

* `many-to-many` associations are reduced to `one-to-one` or `one-to-many` associations.
* `one-to-many` and `many-to-one` associations create "forked" unique sources. I.e. an individual datapoint can belong to two different sources.

The VAST Pipeline reports the `one-to-many` and `many-to-one` associations by `relating` sources. A source may have one or more `relations`, which signify that the source could be associated with more than one other source. This often happens for complex sources with many closely packed components.

A read-through of the [TraP documentation](https://tkp.readthedocs.io/en/latest/devref/database/assoc.html#database-assoc){:target="_blank"} is highly encouraged on this point as it contains an excellent description.
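
To make the relation types concrete, the sketch below labels candidate match pairs by counting how often each source and each measurement appears. It is only an illustration: the helper `classify_pairs`, the identifiers, and the convention that `one-to-many` means one existing source matched to several new measurements are assumptions made here, and the pipeline's own bookkeeping follows the TraP documentation instead.

```python
from collections import Counter


def classify_pairs(pairs):
    """Label each (source_id, measurement_id) pair by relation type."""
    per_source = Counter(s for s, _ in pairs)
    per_measurement = Counter(m for _, m in pairs)
    labels = {}
    for s, m in pairs:
        many_meas = per_source[s] > 1      # source matched several measurements
        many_src = per_measurement[m] > 1  # measurement matched several sources
        if many_meas and many_src:
            labels[(s, m)] = "many-to-many"
        elif many_meas:
            labels[(s, m)] = "one-to-many"
        elif many_src:
            labels[(s, m)] = "many-to-one"
        else:
            labels[(s, m)] = "one-to-one"
    return labels


# Hypothetical candidate matches: (existing source, new measurement).
pairs = [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (4, "d")]
labels = classify_pairs(pairs)
for pair, label in labels.items():
    print(pair, label)
```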

## Epoch Based Association
The pipeline can associate inputs on an epoch basis. For example, all VAST Pilot Epoch 1 measurements are grouped together and associated with the grouped Epoch 2 measurements, and so on. During grouping, duplicate measurements within the same epoch are removed, keeping the measurement that is closest to the centre of its respective image. The separation distance that defines a duplicate is set in the pipeline run configuration file (`ASSOCIATION_EPOCH_DUPLICATE_RADIUS`).
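
The duplicate removal within an epoch could be sketched as follows. This is a simple greedy approach written for illustration, not the pipeline's implementation; the function name, the dictionary keys (`id`, `ra`, `dec`, `dist_from_centre`) and the 2.5 arcsec radius are all hypothetical.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def drop_epoch_duplicates(measurements, duplicate_radius_deg):
    """Among measurements closer than the radius, keep the most central one."""
    kept = []
    # Visit measurements from most to least central, so that the preferred
    # measurement of any duplicate group is always kept first.
    for meas in sorted(measurements, key=lambda m: m["dist_from_centre"]):
        is_duplicate = any(
            angular_sep_deg(meas["ra"], meas["dec"], k["ra"], k["dec"])
            <= duplicate_radius_deg
            for k in kept
        )
        if not is_duplicate:
            kept.append(meas)
    return kept


# Hypothetical measurements from one epoch; "a" and "b" are the same object
# seen in two overlapping images. Distances from image centre are in degrees.
epoch_measurements = [
    {"id": "a", "ra": 20.0000, "dec": -10.0000, "dist_from_centre": 2.0},
    {"id": "b", "ra": 20.0002, "dec": -10.0001, "dist_from_centre": 0.5},
    {"id": "c", "ra": 21.0000, "dec": -10.0000, "dist_from_centre": 1.0},
]

kept = drop_epoch_duplicates(epoch_measurements, 2.5 / 3600.0)
print([m["id"] for m in kept])
```

Measurement `a` is dropped because `b` lies within the duplicate radius and is closer to the centre of its image.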

The mode is activated by entering the images to be processed as `dictionary` objects, using an orderable string as the key and lists of images as the values, as demonstrated below.

```python
IMAGE_FILES = {
    "epoch01": ["/full/path/to/image1.fits", "/full/path/to/image2.fits"],
    "epoch02": ["/full/path/to/image3.fits"],
}
```

The lightcurves below show the difference between 'regular' association (top) and 'epoch based' association (bottom) for a source.

[![Regular Association](../img/regular_association.png){: loading=lazy }](../img/regular_association.png)
[![Epoch Based Association](../img/epoch_based_association.png){: loading=lazy }](../img/epoch_based_association.png)

For large surveys where transient and variability searches on the epoch timescale are required, using this mode can greatly speed up the association stage.

!!! warning
    Epoch based association eliminates the full time resolution of your data! The base time resolution becomes the interval between the defined epochs.

## Parallel Association
When parallel association is used, the images to process are analysed and grouped into distinct patches of the sky that do not overlap. These distinct regions are then processed through the source association in parallel. It is recommended to use parallel association when your dataset covers three or more distinct patches of sky.
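
One way to picture the grouping step is as finding connected sets of overlapping images. The sketch below is an assumption-laden illustration rather than the pipeline's method: it treats each image footprint as a circle with a hypothetical `radius` in degrees, calls two images overlapping when their centres are closer than the sum of their radii, and merges overlapping images with a small union-find.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def group_overlapping_images(images):
    """Group images into sets whose (assumed circular) footprints overlap."""
    parent = list(range(len(images)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            a, b = images[i], images[j]
            sep = angular_sep_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if sep < a["radius"] + b["radius"]:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, img in enumerate(images):
        groups.setdefault(find(i), []).append(img["name"])
    return sorted(groups.values())


# Hypothetical image centres and footprint radii, in degrees.
images = [
    {"name": "f1", "ra": 0.0, "dec": 0.0, "radius": 3.0},
    {"name": "f2", "ra": 4.0, "dec": 0.0, "radius": 3.0},
    {"name": "f3", "ra": 90.0, "dec": -30.0, "radius": 3.0},
    {"name": "f4", "ra": 200.0, "dec": 10.0, "radius": 3.0},
]

sky_groups = group_overlapping_images(images)
print(sky_groups)
```

In this example `f1` and `f2` overlap and form one patch, giving three distinct patches in total, which is the point at which the text above recommends parallel association.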
