Code for automatically detecting Large Scale Traveling Ionospheric Disturbances (LSTIDs) from ham radio spot data.
Developed by the HamSCI NASA Space Weather Operations to Research (SWO2R) Team with major contributions by:
- Nathaniel Frissell W2NAF
- Nicholas Callahan
- Diego Sanchez KD2RLM
- Bill Engelke AB4EJ
- Mary Lou West KC2NMC
This code was tested on an x86 Ubuntu 22.04 LTS Linux machine with python v3.11.9 and the following libraries:
dask==2024.5.0
matplotlib==3.8.4
numpy==2.1.0
pandas==2.2.2
scipy==1.14.1
statsmodels==0.14.2
xarray==2023.6.0
- Clone GitHub Repository
pip install -e .
- Place raw spot data files into
raw_data
directory.- Raw spot data should be bzip2 compressed daily files.
- Names should be in the form of:
2018-11-01_PSK.csv.bz2
,2018-11-01_RBN.csv.bz2
, and2018-11-01_WSPR.csv.bz2
, etc. - Data files for 1 November 2018 - 30 April 2019 are available from https://doi.org/10.5281/zenodo.10673981. Due to Zenodo file number limitations, data files in this repository are combined into one *.tar file for each month.
- Edit parameters in the top of
run_LSTID_detection.py
. - Run
./run_LSTID_detection.py
Using multiprocessing on a 64-thread machine with 512 GB RAM, this code takes about 12 minutes to process the 1 November 2018 - 30 April 2019 data from https://doi.org/10.5281/zenodo.10673981.
Data Loading and Gridding is handled by hamsci_LSTID_detect.data_loading.RawSpotProcessor()
.
- For each day, RBN, PSK, and WSPRNet spot data is combined into a single data frame.
- Data is filtered based on frequency, TX-RX midpoint location, and TX-RX ground range. For Frissell et al. (2024, GRL), the following filters are used, which corresponds to 14 MHz signals over North America:
$$20^{\circ} < lat < 60^{\circ}$$ $$-160^{\circ} < lon < -60^{\circ}$$ - 14 MHz < f < 15 MHz
- 0 km < R_gc < 3000 km
- Filtered data is gridded into 10 km range by 1 minute bins.
Gridded array re-scaling is handled by hamsci_LSTID_detect.data_loading.create_xarr()
.
- Data array is trimmed so that only daylight hours in North America are used (1200-2359 UTC).
- A scaled version
$M_{ad}$ of the gridded array$A$ is computed byhamsci_LSTID_detect.data_loading.mad()
as follows:$$M_{ad} = \frac{|A-\mbox{Med}(A)|}{\mbox{max}(\mbox{Med}(A),0.05)}$$
Skip distance edge-detection is handled by hamsci_LSTID_detect.edge_detection.run_edge_detect()
.
- The x- and y- dimensions of the gridded array are trimmed by 8%.
- A
scipy.ndimage.gaussian_filter()
with$\sigma=(4.2, 4.2)$ is applied to the gridded array. - The gridded array has the minimum subtracted from the array so the lower bound is 0.
- A maximum of the gridded array, excluding upper outliers, is taken by selecting the value of the
occurence_max=60
th pixel's largest value. - The gridded array is re-scaled by the maximum and
i_max=30
values so that the outlier adjusted maximum is re-scaled to 30, and all float values are rounded to integers, so the array consists of approximately 30 discrete thresholds. - For each unique value in the integer array, a lower threshold line is calculated by taking the index of the lowest point in each column such that the value at that point is greater than the threshold.
- All thresholds for the integer array are stacked vertically, and threshold values below a specified y location are set to
np.nan
. - Column-wise, for each value
q
inqs=[.4,.5,.6]
, a scalar value is selected to represent the column as the detected edge by selecting theq
th quantile value with nans ignored. - Using the edges for each value in
[.4,.5,.6]
, each edge has outlier points removed, where any point that deviates more thanmax_abs_dev=20
vertical pixels from the smoothed edge is interpolated withscipy.interpolate.CubicSpline
. - After outlier removal, the edge with the lowest standard deviation against the smoothed and interpolated version of the edge is returned as
min_line
andminz_line
, in the original and smoothed form respectively. -
min_line
is returned as the raw detected edge for the image, along with the unselected edges corresponding to theqs
, and theminz_line
for comparison.
A theoretical sinusoid is fit to the detected edge by hamsci_LSTID_detect.edge_detection.run_edge_detect()
.
- A 15 minute rolling coefficient of variation
$CV = \sigma/\mu$ is computed on the raw detected edge for use as a quality parameter. - The largest contiguous time period between 1330 and 2230 UTC where
$CV < 0.5$ is selected as "good". - A 2nd-degree polynomial is fit to the good period.
- The raw detected edge within this time period is detrended using a least-squares best-fit second degree polynomial.
- A
$1 < T < 4.5$ hr band-pass filter is applied to the detrended edge. - This filtered, detrended result is curve-fit to
$$A\sin(2\pi ft+\phi) + mt +b$$ to determine the LSTID amplitude$A$ and period$T=1/f$ .-
scipy.optimize.curve_fit()
is used as the curve fitter. - The curve fit routine is run for each of the following initial period guesses:
$T_{hr}$ = [1,1.5,2,2.5,3,3.5,4]. - Other parameter initial guesses are as follows:
guess['amplitude_km'] = np.ptp(data_detrend)/2.
guess['phase_hr'] = 0.
guess['offset_km'] = np.mean(data_detrend)
guess['slope_kmph'] = 0.
- The fit with the highest
$r^2$ value is selected as the best fit.
-
Figure 1 shows LSTID automatic detection plot for 15 December 2018.
- Panel (a): Heatmap of re-scaled, smoothed, and thresholded RBN, PSKReporter, and WSPRNet data as a function of communications ground range versus time. Only 14 MHz band data with communications midpoints over the US (
$$20^{\circ} < lat < 60^{\circ}$$ and$$-160^{\circ} < lon < -60^{\circ}$$ ) are used. Data are gridded in 10 km x 1 min bins. - Panel (b): Heatmap data from (a) with algorithm fit overlays. Blue line is raw detected edge; white dashed line is the final sinusoid fit with the trend added back in. Gray line is the 15-minute rolling coefficient of variation quality parameter. Green dashed vertical lines mark the start and end times used for curve fitting.
- Panel (c): Blue line is the detrended, band-pass filtered detected edge. Red dashed line is the sinusoid fit to the detrended edge. Green dashed vertical lines mark the start and end times used for curve fitting.
- Panel (d): Curve-fit parameters for the 2nd degree polynomial detrend fit and the sinusoid curve fit.
This work was supported by NASA Grants 80NSSC21K1772, 80NSSC23K0848 and United States National Science Foundation (NSF) Grant AGS-2045755.