Merge branch 'master' into dependabot/npm_and_yarn/prismjs-1.23.0
ajstewart committed Apr 1, 2021
2 parents c07b1ee + 6a3d5ad commit e2d1b79
Showing 129 changed files with 4,585 additions and 1,294 deletions.
7 changes: 7 additions & 0 deletions CHANGELOG.md
Expand Up @@ -8,6 +8,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

#### Added

- Added script to auto-generate code reference documentation pages [#480](https://github.com/askap-vast/vast-pipeline/pull/480).
- Added code reference section to documentation [#480](https://github.com/askap-vast/vast-pipeline/pull/480).
- Added new pages and sections to documentation [#471](https://github.com/askap-vast/vast-pipeline/pull/471).
- Added `requirements/environment.yml` to make it easier for Miniconda users to get the non-Python dependencies [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
- Added `pyproject.toml` and `poetry.lock` [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
- Added `init-tools/init-db.py` [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
Expand All @@ -22,6 +25,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
#### Changed

- Dependabot: Bump prismjs from 1.22.0 to 1.23.0 [#469](https://github.com/askap-vast/vast-pipeline/pull/469).
- Changed non-google format docstrings to google format [#480](https://github.com/askap-vast/vast-pipeline/pull/480).
- Changed some documentation layout and updated content [#471](https://github.com/askap-vast/vast-pipeline/pull/471).
- Changed the `vaex` dependency to `vaex-arrow` [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
- Set `CREATE_MEASUREMENTS_ARROW_FILES = True` in the basic association test config [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
- Bumped minimum Python version to 3.7.1 [#472](https://github.com/askap-vast/vast-pipeline/pull/472).
Expand Down Expand Up @@ -58,6 +63,8 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
#### List of PRs

- [#469](https://github.com/askap-vast/vast-pipeline/pull/469) dep: Bump prismjs from 1.22.0 to 1.23.0.
- [#480](https://github.com/askap-vast/vast-pipeline/pull/480) feat: Code reference documentation update.
- [#471](https://github.com/askap-vast/vast-pipeline/pull/471) feat: Documentation update.
- [#472](https://github.com/askap-vast/vast-pipeline/pull/472) feat: Simplify install.
- [#473](https://github.com/askap-vast/vast-pipeline/pull/473) fix: discard the selavy unit row before reading.
- [#466](https://github.com/askap-vast/vast-pipeline/pull/466) fix: Fixed initial job processing from the UI.
Expand Down
File renamed without changes.
332 changes: 332 additions & 0 deletions docs/adminusage/cli.md

Large diffs are not rendered by default.

117 changes: 117 additions & 0 deletions docs/design/association.md
@@ -0,0 +1,117 @@
# Source Association

This page details the association stage of a pipeline run.

There are three association methods available which are summarised in the table below, and detailed in the following sections.

!!! tip
    For complex fields and large surveys, the `de Ruiter` method is recommended.

| Method | Fixed Assoc. Radius | Astropy function | Possible Relation Types |
| ------------------------------ | ------------------------ | ---------------------- | ----------------------- |
| [Basic](#basic) | Yes | `match_coordinates_sky` | one-to-many |
| [Advanced](#advanced) | Yes | `search_around_sky` | many-to-many, many-to-one, one-to-many |
| [de Ruiter (TraP)](#de-ruiter) | No | `search_around_sky` | many-to-many, many-to-one, one-to-many |


## General Association Notes

### Terminology

During association, `measurements` are associated into unique `sources`.

### Association Process

By default, association is performed on an image-by-image basis, ordered by observation date. The only exception is when [Epoch Based Association](#epoch-based-association) is used.

!!! note
    Epoch Based Association is not an association method; rather, it changes how the measurements are handled when they are passed to one of the three association methods.

### Weighted Average Coordinates

After every iteration of each association method, the average RA and Dec, weighted by the positional uncertainty, are calculated for each source. These weighted averages are then used as the base catalogue for the next association iteration. In other words, as the measurements are associated, new measurements are associated against the weighted average of the sources identified to that point in the process.

Source positions are reported using the weighted averages.
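The weighted average can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: it assumes inverse-variance weights ($1/\sigma^{2}$), which is one common reading of "weighted by the positional uncertainty", and the function name `weighted_mean_position` is hypothetical. It also ignores the wrap-around of right ascension at 0/360 degrees.

```python
def weighted_mean_position(measurements):
    """Return the weighted mean (ra, dec) of a source, in degrees.

    Each measurement is a tuple (ra, dec, sigma_ra, sigma_dec) in degrees.
    Assumption: weights are the inverse variances, 1 / sigma**2.
    """
    w_ra = [1.0 / m[2] ** 2 for m in measurements]
    w_dec = [1.0 / m[3] ** 2 for m in measurements]
    ra = sum(w * m[0] for w, m in zip(w_ra, measurements)) / sum(w_ra)
    dec = sum(w * m[1] for w, m in zip(w_dec, measurements)) / sum(w_dec)
    return ra, dec


# Two measurements of the same source; the second is twice as uncertain,
# so it carries a quarter of the weight.
source_measurements = [
    (10.000, -30.000, 0.001, 0.001),
    (10.002, -30.002, 0.002, 0.002),
]
avg_ra, avg_dec = weighted_mean_position(source_measurements)
print(avg_ra, avg_dec)
```

The resulting position sits closer to the better-constrained measurement, and it is this position that would serve as the source's entry in the base catalogue for the next iteration.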

## Association Methods

!!! tip
    For a better understanding of the underlying process, see [this page](https://docs.astropy.org/en/stable/coordinates/matchsep.html#matching-catalogs){:target="_blank"} in the astropy documentation for examples of matching catalogues.

### Basic
The most basic association method uses the astropy [`match_coordinates_sky`](https://docs.astropy.org/en/stable/api/astropy.coordinates.match_coordinates_sky.html){:target="_blank"} function. With this method:

* Measurements are associated using only the nearest neighbour for each source when comparing catalogues.
* A fixed association radius is used as the threshold for a 'match'.
* Only one-to-many [relations](#relations) are possible.

### Advanced
This method follows the same process as `Basic`, but uses the astropy function [`search_around_sky`](https://docs.astropy.org/en/stable/api/astropy.coordinates.search_around_sky.html){:target="_blank"} instead. This means:

* All possible matches between the two catalogues are found, rather than only the nearest neighbour.
* A fixed association radius is still applied as the threshold.
* All types of [relations](#relations) are possible.
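The difference between the `Basic` and `Advanced` methods can be illustrated with a small pure-Python sketch. It does not call astropy; the helper names are hypothetical and the brute-force loops only mimic the behaviour described above (nearest neighbour within a fixed radius versus every pair within a fixed radius). It assumes the nearest-neighbour search is done from each new measurement to the existing sources.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def nearest_neighbour_matches(sources, measurements, radius_deg):
    """Basic-like: each measurement is paired with its nearest source only."""
    pairs = []
    for m_idx, (m_ra, m_dec) in enumerate(measurements):
        seps = [angular_separation_deg(m_ra, m_dec, s_ra, s_dec)
                for s_ra, s_dec in sources]
        s_idx = min(range(len(sources)), key=seps.__getitem__)
        if seps[s_idx] <= radius_deg:
            pairs.append((s_idx, m_idx))
    return pairs


def all_matches(sources, measurements, radius_deg):
    """Advanced-like: every (source, measurement) pair within the radius."""
    return [
        (s_idx, m_idx)
        for s_idx, (s_ra, s_dec) in enumerate(sources)
        for m_idx, (m_ra, m_dec) in enumerate(measurements)
        if angular_separation_deg(s_ra, s_dec, m_ra, m_dec) <= radius_deg
    ]


sources = [(10.000, -30.0), (10.003, -30.0)]
measurements = [(10.001, -30.0), (10.002, -30.0)]
radius_deg = 10.0 / 3600.0  # illustrative 10 arcsec association radius

basic_pairs = nearest_neighbour_matches(sources, measurements, radius_deg)
advanced_pairs = all_matches(sources, measurements, radius_deg)
print(basic_pairs)     # [(0, 0), (1, 1)]
print(advanced_pairs)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

In this crowded example the nearest-neighbour search returns one pair per measurement, while the all-pairs search returns every combination within the radius, which is how the additional relation types arise.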

### de Ruiter
The de Ruiter method is a translation of the association method used by the [LOFAR Transients Pipeline (TraP)](https://tkp.readthedocs.io/en/latest/){:target="_blank"}, which uses the `de Ruiter radius` to define associations.

The astropy `search_around_sky` function is still used, but potential matches are first limited by a `beamwidth limit` value defined in the pipeline run configuration file (`ASSOCIATION_BEAMWIDTH_LIMIT`). The initial threshold separation distance is set to

$$
\text{beamwidth limit} \times \frac{\theta_{\text{bmaj,img}}}{2},
$$

where $\theta_{\text{bmaj,img}}$ is the major axis of the restoring beam of the image being associated. The de Ruiter radius is then calculated for every potential match. It is defined as

$$
r_{i,j} = \sqrt{
\frac{ (\alpha_{i} - \alpha_{j})^{2} \cos^{2}((\delta_{i} + \delta_{j})/2)}{\sigma^{2}_{\alpha_{i}} + \sigma^{2}_{\alpha_{j}}}
+ \frac{(\delta_{i} - \delta_{j})^{2}}{\sigma^{2}_{\delta_{i}} + \sigma^{2}_{\delta_{j}}}
}
$$

where $\alpha_{n}$ is the right ascension of source $n$, $\delta_{n}$ is its declination, and $\sigma_{y}$ represents the error on the quantity $y$. Matches are then identified by applying a maximum threshold to the de Ruiter radius, which the user defines in the pipeline run configuration file (`ASSOCIATION_DE_RUITER_RADIUS`).

All relation types are possible using this method.
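The two thresholds can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the numerical values are made up, and the radius is written in its standard form, with the right ascension offset scaled by the cosine of the mean declination and the declination term taken as a difference.

```python
import math


def initial_separation_limit(beamwidth_limit, bmaj_deg):
    """Initial cut: beamwidth limit times half the beam major axis (degrees)."""
    return beamwidth_limit * bmaj_deg / 2.0


def de_ruiter_radius(pos_i, pos_j):
    """Dimensionless de Ruiter radius between two positions.

    Each position is a tuple (ra, dec, sigma_ra, sigma_dec) in degrees.
    """
    ra_i, dec_i, sig_ra_i, sig_dec_i = pos_i
    ra_j, dec_j, sig_ra_j, sig_dec_j = pos_j
    mean_dec = math.radians((dec_i + dec_j) / 2.0)
    d_ra = (ra_i - ra_j) * math.cos(mean_dec)
    d_dec = dec_i - dec_j
    return math.sqrt(
        d_ra ** 2 / (sig_ra_i ** 2 + sig_ra_j ** 2)
        + d_dec ** 2 / (sig_dec_i ** 2 + sig_dec_j ** 2)
    )


# Illustrative values standing in for ASSOCIATION_BEAMWIDTH_LIMIT and
# ASSOCIATION_DE_RUITER_RADIUS; they are not taken from the pipeline.
beamwidth_limit = 1.5
de_ruiter_limit = 5.68
bmaj_deg = 0.004  # major axis of the restoring beam of the image

limit_deg = initial_separation_limit(beamwidth_limit, bmaj_deg)

source = (10.0000, -0.001, 0.0005, 0.001)
measurement = (10.0015, 0.001, 0.0005, 0.001)
r = de_ruiter_radius(source, measurement)
is_match = r <= de_ruiter_limit
print(limit_deg, r, is_match)
```

Because the de Ruiter radius is normalised by the positional errors, a well-measured pair must lie closer together to be matched than a poorly measured pair, which is why this method does not use a fixed association radius.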

## Relations
Situations can arise where a source is associated with more than one source in the catalogue being cross-matched (or vice versa). Internally these types of associations are called:

* `many-to-many`
* `one-to-many`
* `many-to-one`

A good explanation of these situations is presented in the TraP documentation [here](https://tkp.readthedocs.io/en/latest/devref/database/assoc.html#database-assoc){:target="_blank"}. The VAST Pipeline follows the TraP methods in handling these types of associations, which are also detailed in the linked documentation. In short:

* `many-to-many` associations are reduced to `one-to-one` or `one-to-many` associations.
* `one-to-many` and `many-to-one` associations create "forked" unique sources, i.e. an individual datapoint can belong to two different sources.

The VAST Pipeline reports the `one-to-many` and `many-to-one` associations by `relating` sources. A source may have one or more `relations`, which signifies that the source could be associated with more than one other source. This often happens for complex sources with many closely packed components.

A read-through of the [TraP documentation](https://tkp.readthedocs.io/en/latest/devref/database/assoc.html#database-assoc){:target="_blank"} is highly encouraged on this point as it contains an excellent description.
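As a rough illustration of how the association types differ, the sketch below labels each matched pair by counting how often its source and its measurement appear in the list of matches. The function name `classify_matches` and the `(source_index, measurement_index)` pair layout are assumptions made for this example, and the labelling is a simplification of the handling described in the TraP documentation.

```python
from collections import Counter


def classify_matches(pairs):
    """Label each (source_index, measurement_index) pair by association type."""
    per_source = Counter(s for s, _ in pairs)
    per_measurement = Counter(m for _, m in pairs)
    labels = {}
    for s, m in pairs:
        several_measurements = per_source[s] > 1   # one source, many measurements
        several_sources = per_measurement[m] > 1   # many sources, one measurement
        if several_measurements and several_sources:
            labels[(s, m)] = "many-to-many"
        elif several_measurements:
            labels[(s, m)] = "one-to-many"
        elif several_sources:
            labels[(s, m)] = "many-to-one"
        else:
            labels[(s, m)] = "one-to-one"
    return labels


pairs = [
    (0, 0),                          # unambiguous match
    (1, 1), (1, 2),                  # source 1 matches two measurements
    (2, 3), (3, 3),                  # sources 2 and 3 match the same measurement
    (4, 4), (4, 5), (5, 4), (5, 5),  # fully cross-matched pair of sources
]
labels = classify_matches(pairs)
for pair, label in labels.items():
    print(pair, label)
```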

## Epoch Based Association
The pipeline is able to associate inputs on an epoch basis. For example, all VAST Pilot Epoch 1 measurements are grouped together and associated with the grouped Epoch 2 measurements, and so on. In doing this, duplicate measurements within the same epoch are removed, keeping the measurement that is closest to the centre of its respective image. The separation distance that defines a duplicate is set in the pipeline run configuration file (`ASSOCIATION_EPOCH_DUPLICATE_RADIUS`).
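The duplicate removal within an epoch can be pictured with the greedy sketch below. It is an assumption-laden illustration rather than the pipeline's code: the dictionary keys, the function name `drop_epoch_duplicates` and the radius value are hypothetical, and measurements are visited in order of distance from their image centre so that the closest one is always the one kept.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def drop_epoch_duplicates(measurements, duplicate_radius_deg):
    """Keep one measurement per position, preferring the one nearest its image centre."""
    kept = []
    for meas in sorted(measurements, key=lambda m: m["centre_dist"]):
        is_duplicate = any(
            angular_separation_deg(meas["ra"], meas["dec"], k["ra"], k["dec"])
            <= duplicate_radius_deg
            for k in kept
        )
        if not is_duplicate:
            kept.append(meas)
    return kept


# Measurements from two overlapping images of the same epoch.
# "centre_dist" is the distance (degrees) from the centre of the parent image.
epoch_measurements = [
    {"name": "A", "ra": 10.0000, "dec": -30.0, "centre_dist": 0.5},
    {"name": "B", "ra": 10.0005, "dec": -30.0, "centre_dist": 0.2},
    {"name": "C", "ra": 11.0000, "dec": -30.0, "centre_dist": 0.9},
]
duplicate_radius_deg = 2.5 / 3600.0  # illustrative 2.5 arcsec

kept = drop_epoch_duplicates(epoch_measurements, duplicate_radius_deg)
print([m["name"] for m in kept])  # ['B', 'C']
```

Here `A` and `B` fall within the duplicate radius of each other, and `B` is kept because it lies closer to the centre of its image.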

The mode is activated by entering the images to be processed as `dictionary` objects, using an orderable string as the key and lists of images as the values, as demonstrated below.

```python
# Keys are orderable epoch labels; values are the lists of images in each epoch.
IMAGE_FILES = {
    "epoch01": ["/full/path/to/image1.fits", "/full/path/to/image2.fits"],
    "epoch02": ["/full/path/to/image3.fits"],
}
```

The lightcurves below show the difference between 'regular' association (top) and 'epoch based' association (bottom) for a source.

[![Regular Association](../img/regular_association.png){: loading=lazy }](../img/regular_association.png)
[![Epoch Based Association](../img/epoch_based_association.png){: loading=lazy }](../img/epoch_based_association.png)

For large surveys where transient and variability searches on the epoch timescale are required, using this mode can greatly speed up the association stage.

!!! warning
    Epoch based association eliminates the full time resolution of your data! The base time resolution will be the spacing between the defined epochs.

## Parallel Association
When parallel association is used, the images to process are analysed and grouped into distinct patches of the sky that do not overlap. These distinct regions are then processed through the source association in parallel. It is recommended to use parallel association when your dataset covers three or more distinct patches of sky.
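One way to picture the grouping step is as finding connected groups of overlapping images. The sketch below is a hypothetical illustration, not the pipeline's algorithm: it assumes each image can be approximated by a circle (centre plus radius), treats two images as overlapping when their centres are closer than the sum of their radii, and merges overlapping images with a small union-find structure.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def group_overlapping_images(images):
    """Group images into sky patches; images are (name, ra, dec, radius_deg)."""
    parent = list(range(len(images)))

    def find(i):
        # Follow parent links to the group root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            _, ra_i, dec_i, rad_i = images[i]
            _, ra_j, dec_j, rad_j = images[j]
            if angular_separation_deg(ra_i, dec_i, ra_j, dec_j) < rad_i + rad_j:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, image in enumerate(images):
        groups.setdefault(find(i), []).append(image[0])
    return sorted(groups.values())


images = [
    ("img1.fits", 10.0, -30.0, 3.0),
    ("img2.fits", 14.0, -30.0, 3.0),   # overlaps img1.fits
    ("img3.fits", 200.0, 10.0, 3.0),
    ("img4.fits", 90.0, -60.0, 3.0),
]
patches = group_overlapping_images(images)
print(patches)
```

Each resulting group is independent of the others, so the groups could be handed to separate worker processes, for example with `concurrent.futures.ProcessPoolExecutor`.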
