diff --git a/_files/template.md b/_files/template.md new file mode 100644 index 0000000..3626c77 --- /dev/null +++ b/_files/template.md @@ -0,0 +1,87 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.0 + kernelspec: + display_name: Python 3 (ipykernel) + language: python + name: python3 +--- + +# Descriptive Title of the Content of the Notebook +--- +Header Section: include the following information. + +- **Description:** A template on how to write a notebook tutorial on Sciserver. +- **Level:** Beginner | Intermediate | Advanced. +- **Data:** Describe what data, if any, will be used. If None, write: NA. +- **Requirements:** Describe what is needed to run the notebooks. For example: "Run in the (heasoft) conda environment on Sciserver". Or "python packages: [`heasoftpy`, `astropy`, `numpy`]". +- **Credit:** Who wrote the notebook and when. +- **Support:** How to get help. +- **Last verified to run:** (00/00/0000) When was this last tested. + +--- + + +## 1. Introduction +Describe the content. It can contain plain text, bullets, and/or images as needed. +Use `Markdown` when writing. + +The following are suggested subsections. Not all are needed: +- Motivation / Science background. +- Learning goals. +- Details about the requirements, and on running the notebook outside Sciserver. +- Type of outcome or end product. + +You may want to include the following section on how to run the notebook outside Sciserver. +
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +

+Running Outside Sciserver:
+This notebook runs in the (heasoft) conda environment on Sciserver. +If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft are correctly installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine.
+
+ +The following are example sections. + + +## 2. Imports + +```python +# add imports here +import heasoftpy as hsp +``` + +## 3. Define Input if Needed +This section will include things like: +- obsIDs +- Plot settings +- Work directory +- Detector settings +- etc. + +```python +obsid = '000000000' +``` + +## 4. Data Access +Describe how the data used here can be found and accessed (e.g. copied, downloaded, etc.). + + +## 5. Data Processing or Extraction +Include this section if needed (choose a relevant title). + + +## 6. Data Products +If the notebook produces some data products, describe them here, and how the user will be using them. + +```python + +``` diff --git a/ixpe_intro.md b/analysis-ixpe-example.md similarity index 78% rename from ixpe_intro.md rename to analysis-ixpe-example.md index 4340d7a..4af1f73 100644 --- a/ixpe_intro.md +++ b/analysis-ixpe-example.md @@ -5,33 +5,70 @@ jupyter: extension: .md format_name: markdown format_version: '1.3' - jupytext_version: 1.15.2 + jupytext_version: 1.16.0 kernelspec: display_name: (heasoft) language: python name: heasoft --- -# Getting Started with IXPE data +# Getting Started with IXPE Data +
-This notebook is a tutorial on accessing IXPE data on Sciserver and getting started with analysing the data. You will learn to download the data, extract the source and background regions and perform spectro-polarimetric fits. +- **Description:** An example on analysing IXPE data on Sciserver using heasoftpy. +- **Level:** Intermediate. +- **Data:** IXPE observation of blazar **Mrk 501** (ObsID 01004701). +- **Requirements:** `heasoftpy`, `xspec`, `matplotlib` +- **Credit:** Kavitha Arur (Jun 2023). +- **Support:** Contact the [IXPE Guest Observer Facility (GOF)](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback?selected=ixpe) or the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback). +- **Last verified to run:** 01/26/2024. + +
+ + +## 1. Introduction + +This notebook is a tutorial on accessing IXPE data on Sciserver and getting started with analysing them. You will learn to download the data, extract the source and background regions and perform spectro-polarimetric fits. It is also highly recommended that new users read the IXPE Quick Start Guide ([linked here](https://heasarc.gsfc.nasa.gov/docs/ixpe/analysis/IXPE_quickstart.pdf)) and the recommended practices for statistical treatment of IXPE results [here](https://heasarcdev.gsfc.nasa.gov/docs/ixpe/analysis/IXPE_Stats-Advice.pdf). +
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +
+Also, this notebook requires heasoftpy, which is available in the (heasoft) conda environment. You should see (heasoft) at the top right of the notebook. If not, click there and select it. + +Running Outside Sciserver:
+If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft are installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine.
+
+ + +## 2. Module Imports +We need the following python modules: + +
+In this example, reprocessing the data is not required. Instead, the level 2 data products are sufficient. If you need to reprocess the data, the IXPE tools are available with from heasoftpy import ixpe, as sketched below. +
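+A minimal sketch of that import (the version guard mirrors the pattern used in the NICER and NuSTAR tutorials in this collection and is an assumption here, not something the IXPE example itself uses): + +```python +import heasoftpy as hsp +from packaging import version + +# the IXPE tasks live in their own submodule in newer heasoftpy releases +if version.parse(hsp.__version__) < version.parse('1.4'): + ixpe = hsp # assumed fallback for older releases +else: + from heasoftpy import ixpe +```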
+ +```python +import glob +import matplotlib.pyplot as plt +import heasoftpy as hsp +import xspec +``` -## Finding and exploring data +## 3. Finding the Data -All the heasarc data is mounted into the compute under /FTP/, so once we have the path to the data, we can directly access it without the need to download it. +On Sciserver, all the HEASARC data is mounted locally under `/FTP/`, so once we have the path to the data, we can directly access it without the need to download it. -For our exploratory data analysis, we will use an observation of the blazar Mrk 501 (ObsID 01004701). +For our exploratory data analysis, we will use an observation of the blazar **Mrk 501** (ObsID 01004701). -(For more information on how to locate datasets of interest, see the [data access notebook](data_access.ipynb).) +You can also see the [Getting Started](getting-started.md), [Data Access](data-access.md) and [Finding and Downloading Data](data-find-download.md) tutorials for examples on how to find data. ```python -paths_txt = """ -/FTP/ixpe/data/obs/01/01004701 -""" -paths = paths_txt.split('\n')[1:-1] +data_path = "/FTP/ixpe/data/obs/01/01004701" ``` Check the contents of this folder @@ -44,26 +81,22 @@ It should contain the standard IXPE data files, which include: For a complete description of data formats of the level 1, level 2 and calibration data products, see the support documentation on the [IXPE Website](https://heasarc.gsfc.nasa.gov/docs/ixpe/analysis/#supportdoc) ```python -import glob -glob.glob(f'{paths[0]}/*') +glob.glob(f'{data_path}/*') ``` ---- -## Analyzing The Data +## 4. Exploring The Data To analyze the data within the notebook, we use `heasoftpy`. -In the folder for each observation, check for a README file. This file is included with a description of known issues (if any) with the processing for that observation. +In the folder for each observation, check for a `README` file. This file is included with a description of known issues (if any) with the processing for that observation. In this *IXPE* example, it is not necessary to reprocess the data. Instead, the level 2 data products can be analysed directly. ```python -import heasoftpy as hsp - # set some input -indir = paths[0] +indir = data_path obsid = indir.split('/')[-1] -filelist = glob.glob(f'{paths[0]}/event_l2/*') +filelist = glob.glob(f'{indir}/event_l2/*') filelist ``` @@ -74,16 +107,15 @@ det1_fits = filelist[0] det2_fits = filelist[1] det3_fits = filelist[2] -#print the file structure for event 1 dectector file +#print the file structure for event 1 detector file out = hsp.fstruct(infile=det1_fits).stdout print(out) ``` ---- -## Extracting the spectro polarimetric data +## 5. Extracting the Spectro-Polarimetric Data -### Defining the source and background regions +### 5.1 Defining the Source and Background Regions To obtain the source and background spectra from the Level 2 files, we need to define a source region and background region for the extraction. This can also be done using `ds9`. @@ -101,7 +133,7 @@ f.write('annulus(16:53:51.766,+39:45:44.41,132.000",252.000")') f.close() ``` -### Running the extractor tools +### 5.2 Running the Extractor Tools The `extractor` tool from FTOOLS can now be used to extract I, Q and U spectra from IXPE Level 2 event lists as shown below. 
@@ -165,21 +197,20 @@ if out.returncode != 0: raise Exception('extractor for det3 failed!') ``` ---- -### Obtaining the response files +### 5.3 Obtaining the Response Files For the I spectra, you will need to include the RMF (Response Matrix File), and the ARF (Ancillary Response File). -For the Q and U spectra, you will need to include the RMF and MRF (Modulation Response File). The MRF is defined as defined by the product of the energy-dependent modulation factor, $\mu$(E) and the ARF. +For the Q and U spectra, you will need to include the RMF and MRF (Modulation Response File). The MRF is defined by the product of the energy-dependent modulation factor, $\mu$(E) and the ARF. The location of the calibration files can be obtained through the `hsp.quzcif` tool. Type in `hsp.quzcif?` to get more information on this function. Note that the output of the `hsp.quzcif` gives the path to more than one file. This is because there are 3 sets of response files, corresponding to the different weighting schemes. -For the 'NEFF' weighting, use 'alpha07_02'. -For the 'SIMPLE' weighting, use 'alpha075simple_02'. -For the 'UNWEIGHTED' version, use '20170101_02'. +- For the 'NEFF' weighting, use 'alpha07_02'. +- For the 'SIMPLE' weighting, use 'alpha075simple_02'. +- For the 'UNWEIGHTED' version, use '20170101_02'. ```python # hsp.quzcif? @@ -233,11 +264,10 @@ res = hsp.quzcif(mission='ixpe', instrument='gpd',detector='DU3', mrf3 = [x.split()[0] for x in res.output if 'alpha075_02' in x][0] ``` ---- -### Load data into PyXSPEC and start fitting + +### 5.4 Load Data into PyXSPEC and Start Fitting ```python -import xspec rmf_list = [rmf1,rmf2,rmf3] mrf_list = [mrf1,mrf2,mrf3] @@ -251,6 +281,6 @@ for (du, rmf_file, mrf_file, arf_file) in zip(du_list, rmf_list, mrf_list, arf_l #Load the I data -xspec.AllData("%i:%i ixpe_det%i_src_I.pha"%(du, du+x, du)) + xspec.AllData(f"{du}:{du+x} ixpe_det{du}_src_I.pha") s = xspec.AllData(du+x) # #Load response and background files @@ -318,16 +349,11 @@ xspec.AllModels.show() xspec.Fit.perform() ``` ---- -### Plotting the results +### 5.5 Plotting the Results -This is done through matplotlib. +This is done through `matplotlib`. ```python -import matplotlib.pyplot as plt - - -%matplotlib inline xspec.Plot.area=True xspec.Plot.xAxis='keV' xspec.Plot('lda') @@ -366,8 +392,7 @@ ax.set_xlabel('Energy (keV)') ax.set_ylabel(r'Polangle') ``` ---- -## Interpreting the results from XSPEC +## 6. Interpreting the Results from XSPEC There are two parameters of interest in our example. These are given by the polarization fraction, A, and polarization angle, $\psi$. The XSPEC error (or uncertainty) command can be used @@ -414,7 +439,7 @@ plt.xlabel('A') plt.errorbar(m1.polconst.A.values[0],m1.polconst.psi.values[0],fmt='+') ``` -### Determining the flux and calculating MDP +### 6.1 Determining the Flux and Calculating MDP Note that the detection is deemed "highly probable" (confidence C > 99.9%) as @@ -439,7 +464,7 @@ mean of exposure time of 97243 s gives an MDP99 of 5.70% meaning that, for an un This is consistent with the highly probable detection deduced here of a polarization fraction of 7.45$\pm$1.8%. -## Additional Resources +## 7. Additional Resources Visit the IXPE [GOF Website](https://heasarcdev.gsfc.nasa.gov/docs/ixpe/analysis/) and the IXPE [Project Website at MSFC](https://ixpe.msfc.nasa.gov/for_scientists/index.html) for more resources. 
diff --git a/nicer-example.md b/analysis-nicer-example.md similarity index 62% rename from nicer-example.md rename to analysis-nicer-example.md index 5576780..3a900f7 100644 --- a/nicer-example.md +++ b/analysis-nicer-example.md @@ -5,38 +5,82 @@ jupyter: extension: .md format_name: markdown format_version: '1.3' - jupytext_version: 1.15.2 + jupytext_version: 1.16.0 kernelspec: display_name: (heasoft) language: python name: heasoft --- -# An Example Analysing NICER Data One Sciserver +# Analysing NICER Data On Sciserver +
+ +- **Description:** An example on analysing NICER data on Sciserver using heasoftpy. +- **Level:** Beginner. +- **Data:** NICER observation of **PSR_B0833-45** (obsid = 4142010107) +- **Requirements:** `heasoftpy`, `xspec`, `astropy`, `matplotlib` +- **Credit:** Mike Corcoran / Abdu Zoghbi (May 2022). +- **Support:** Contact the [HEASARC helpdesk or NICER Guest Observer Facility (GOF)](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback). +- **Last verified to run:** 01/26/2024. + +
+ + +## 1. Introduction In this tutorial, we will go through the steps of analyzing a NICER observation of `PSR_B0833-45` (`obsid = 4142010107`) using `heasoftpy`. -The following assumes this notebook is run from the (heasoft) environment on Sciserver. You should see `(Heasoft)` at the top right of the notebook. If not, click there and select `(Heasoft)`. Heasoft higher than v6.31 is required in order to be able to run `nicerl3` tools. -If running outside sciserver, please ensure that heasoft v6.31 or above is installed. +
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +
+Also, this notebook requires heasoftpy, which is available in the (heasoft) conda environment. You should see (heasoft) at the top right of the notebook. If not, click there and select it. + +
+Heasoft v6.31 or higher is required in order to run the nicerl3 tools; a quick version check is sketched below. +

+ +Running Outside Sciserver:
+If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft (higher than v6.31) are installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine.
+
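+A minimal sketch of checking the installed version (this assumes the standard HEASoft fversion tool is exposed through heasoftpy like other HEASoft tasks; adjust if your installation differs): + +```python +import heasoftpy as hsp + +# fversion reports the installed HEASoft release; the nicerl3 tasks need >= 6.31 +out = hsp.fversion() +print(out.stdout) +```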
+ + +## 2. Module Imports +We need the following python modules: + +
+Note that for heasoftpy < 1.4, nicerl2 is accessed via heasoftpy.nicerl2. For heasoftpy >= 1.4, it is available under a separate module called nicer that needs to be imported explicitly with from heasoftpy import nicer. +
```python ## Import libraries that we will need. -import heasoftpy as hsp -import xspec as xs -from astropy.io import fits -from astropy.table import Table import os import sys -import matplotlib.pylab as plt +from astropy.io import fits +from astropy.table import Table import numpy as np +import matplotlib.pylab as plt + +import heasoftpy as hsp +import xspec as xs + +from packaging import version +if version.parse(hsp.__version__) < version.parse('1.4'): + nicer = hsp +else: + from heasoftpy import nicer ``` -# Set up the NICER obsid directory +## 3. Set up the NICER obsid directory -We are using OBSID `4142010107`. The data archive is mounted under `/FTP/..`. To find the exact location of the observation, we can use `pyvo` to query the archive using the VO services, or use Xamin, as illustrated in the `Getting-Started` and `data_access` notebooks +We are using OBSID `4142010107`. The data archive is mounted under `/FTP/..`. To find the exact location of the observation, we can use `pyvo` to query the archive using the VO services, or use Xamin, as illustrated in the [Getting Started](getting-started.md) and [Data Access](data-access.md) tutorials. -Because nicerl2 may modify of the observation directory, we copy it from the data location. +Because `nicerl2` may modify the observation directory, we copy it from the data location. If running the notebook outside Sciserver, the data will need to be downloaded from the archive. + +We will use the current directory as a working directory. ```python nicerobsID = '4020180460' dataLocation = f'/FTP/nicer/data/obs/2021_12/{nicerobsID}' work_dir = os.getcwd() if not os.path.exists(nicerobsID): os.system(f'cp -r {dataLocation} {work_dir}') ``` -# Process and Clean the Data. +## 4. Process and Clean the Data. Next, we run the `nicerl2` pipeline to process and clean the data using `heasoftpy`. -There are different ways of calling a `heasoftpy` task. Here, we first create a dictionary that contains the input parameters for the `nicerl2` task, which is then passed to `hsp.nicerl2` +There are different ways of calling a `heasoftpy` task. Here, we first create a dictionary that contains the input parameters for the `nicerl2` task, which is then passed to `nicer.nicerl2` ```python # input @@ -64,7 +108,7 @@ inPars = { } # run the task -out = hsp.nicerl2(inPars) +out = nicer.nicerl2(inPars) # check that everything ran correctly if out.returncode == 0: @@ -76,21 +120,21 @@ else: fp.write('\n'.join(out.output)) ``` - -# Extract the Spectra using `nicerl3-spect` +## 5. Extract the Spectra using `nicerl3-spect` We use `nicerl3-spect` (which is available in heasoft v6.31 and up). -#### Note -> Note that the `-` symbol in the name is replace by `_` when calling the equivalent python name, so that `nicerl3-spect3` becomes `nicerl3_spect3` +
+Note that the - symbol in the name is replaced by _ (underscore) when calling the equivalent python name, so that nicerl3-spect becomes nicerl3_spect. +
+
For this example, we use the `scorpeon` background model to create a background pha file. You can choose other models too, if needed. The spectra are written to the `spec` directory. Note that we set the parameter `updatepha` to `yes`, so that the header of the spectral file is modified to point to the relevant response and background files. - ```python # Setup the output directory @@ -116,7 +160,7 @@ inPars = { } # run the spectral extraction task -out = hsp.nicerl3_spect(inPars) +out = nicer.nicerl3_spect(inPars) # check that the task ran correctly if out.returncode == 0: @@ -127,16 +171,16 @@ else: print(f'ERROR in nicerl3-spect {nicerobsID}; Writing log to {logfile}') with open(logfile, 'w') as fp: fp.write('\n'.join(out.output)) - ``` -# Extract the Light Curve using `nicerl3-lc` +## 6. Extract the Light Curve using `nicerl3-lc` We use `nicerl3-lc` (which is available in heasoft v6.31 and up). -#### Note -> Note that, similar to `nicerl3_spect`, the `-` symbol in the name is replace by `_` when calling the equivalent python name, so that `nicerl3-lc` becomes `nicerl3_lc` +
+Note that, similar to nicerl3_spect, the - symbol in the name is replaced by _ when calling the equivalent python name, so that nicerl3-lc becomes nicerl3_lc. +
Note that no background light curve is estimated. ```python # extract light curve @@ -158,7 +202,7 @@ inPars = { } # run the light curve task -out = hsp.nicerl3_lc(inPars) +out = nicer.nicerl3_lc(inPars) # check the task runs correctly if out.returncode == 0: @@ -170,9 +214,9 @@ else: fp.write('\n'.join(out.output)) ``` -# Analysis +## 7. Analysis -## 1. Spectral Analysis +### 7.1 Spectral Analysis Here, we will show an example of how the spectra we just extracted can be analyzed using `pyxspec`. The spectrum is loaded and fitted with a broken power-law model. @@ -220,7 +264,7 @@ plt.errorbar(en, cr, xerr=enwid, yerr=crerr, fmt='k.', alpha=0.2) plt.plot(en, mop,'r-') ``` -## 2. Plot the Light Curve +### 7.2 Plot the Light Curve Next, we are going to read the light curve we just generated. Different Good Time Intervals (GTI) are plotted separately. diff --git a/nustar_lightcurve_example.md b/analysis-nustar-lightcurve.md similarity index 68% rename from nustar_lightcurve_example.md rename to analysis-nustar-lightcurve.md index f94a975..9fbaeb9 100644 --- a/nustar_lightcurve_example.md +++ b/analysis-nustar-lightcurve.md @@ -5,41 +5,84 @@ jupyter: extension: .md format_name: markdown format_version: '1.3' - jupytext_version: 1.15.2 + jupytext_version: 1.16.0 kernelspec: display_name: (heasoft) language: python name: heasoft --- +# NuSTAR Lightcurve Extraction On Sciserver +
+ +- **Description:** Run NuSTAR pipeline and extract light curve products. +- **Level:** Beginner. +- **Data:** NuSTAR observation of **SWIFT J2127.4+5654** (obsid = 60001110002) +- **Requirements:** `heasoftpy`, `numpy`, `matplotlib` +- **Credit:** Abdu Zoghbi (Jan 2023). +- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback). +- **Last verified to run:** 01/26/2024. + +
+ + ## 1. Introduction In this tutorial, we will go through the steps of analyzing a NuSTAR observation of the AGN at the center of `SWIFT J2127.4+5654` with `obsid = 60001110002` using `heasoftpy`. -The following assumes this notebook is run from the (heasoft) environment on Sciserver. You should see `(Heasoft)` at the top right of the notebook. If not, click there and select `(Heasoft)` +We will run the NuSTAR pipeline to reprocess the data and then extract a light curve. + +
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +
+Also, this notebook requires heasoftpy, which is available in the (heasoft) conda environment. You should see (heasoft) at the top right of the notebook. If not, click there and select it. + +

+ +Running Outside Sciserver:
+If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft are installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine (see the download sketch below).
+
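+For reference, a minimal download sketch (an assumption for illustration: it uses the public HEASARC archive URL that mirrors the /FTP path used below, with the wget options HEASARC recommends for mirroring a directory): + +```python +import os + +# mirror the observation directory used in this tutorial into the current directory +url = 'https://heasarc.gsfc.nasa.gov/FTP/nustar/data/obs/00/6/60001110002/' +os.system('wget -q -nH --no-check-certificate --cut-dirs=6 -r -l0 -c -N -np' + ' -R "index*" -erobots=off --retr-symlinks ' + url) +```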
+ + +## 2. Module Imports +We need the following python modules: + +
+Note that for heasoftpy < 1.4, nupipeline is accessed via heasoftpy.nupipeline. For heasoftpy >= 1.4, it is available under a separate module called nustar that needs to be imported explicitly with from heasoftpy import nustar. +
```python import os import sys +import numpy as np +import matplotlib.pyplot as plt + import heasoftpy as hsp -``` -```python -print(hsp.__version__) +from packaging import version +if version.parse(hsp.__version__) < version.parse('1.4'): + nustar = hsp +else: + from heasoftpy import nustar ``` +## 3. Run the Reprocessing Pipeline + We are interested in *NuSTAR* observation `60001110002`. To obtain the full path to the data directory, we can use [Xamin](https://heasarc.gsfc.nasa.gov/xamin/) and select `FTP Paths` in `Data Products Cart` to find the path: `/FTP/nustar/data/obs/00/6//60001110002/`. -You can also see the `Getting-Started.ipynb` and `data_access.ipynb` notebooks for examples using `pyVO` to find the data. +You can also see the [Getting Started](getting-started.md), [Data Access](data-access.md) and [Finding and Downloading Data](data-find-download.md) tutorials for examples using `pyVO` to find the data. On Sciserver, all the data is available locally in the path `/FTP/...`. -In the case of *NuSTAR*, we don't even have to copy the data. We can call the pipleline tool using the that data path. +In the case of *NuSTAR*, we don't even have to copy the data. We can call the pipeline tool using that data path. ```python obsid = '60001110002' -path = '/FTP/nustar/data/obs/00/6//60001110002/' +path = f'/FTP/nustar/data/obs/00/6/{obsid}/' ``` @@ -64,15 +107,15 @@ outdir = obsid + '_p/event_cl' stem = 'nu' + obsid # call the tasks -out = hsp.nupipeline(indir=indir, outdir=outdir, steminputs=stem, instrument='FPMA', - clobber='yes', noprompt=True, verbose=True) +out = nustar.nupipeline(indir=indir, outdir=outdir, steminputs=stem, instrument='FPMA', + clobber='yes', noprompt=True, verbose=True) ``` After running for some time, and if things run smoothly, the last few lines of the output may contain a message like: ``` ============================================================================================= -nupipeline_0.4.9: Exit with no errors - Fri Nov 26 13:53:29 EST 2021 +nupipeline_0.4.9: Exit with no errors - ... ============================================================================================= ``` @@ -103,7 +146,7 @@ nupipeline(noprompt=True, verbose=True) --- -### Extracting a light curve +## 4. Extract a Light Curve Now that we have the data processed, we can proceed and extract a light curve for the source. For this, we use `nuproducts` (see [nuproducts](https://heasarc.gsfc.nasa.gov/lheasoft/ftools/caldb/help/nuproducts.html) for details). First, we need to create source and background region files. @@ -152,6 +195,7 @@ out = nuproducts(params, noprompt=True, verbose=True) print('return code:', out.returncode) ``` +## 5. Read and Plot the Light Curve Listing the content of the output directory `60001110002_p/lc`, we see that the task has created source and background light curves (`nu60001110002A01_sr.lc` and `nu60001110002A01_bk.lc`) along with the corresponding spectra. The task also generates a `.flc` file, which contains the background-subtracted light curves. 
@@ -174,13 +218,6 @@ out = hsp.ftlist(infile='60001110002_p/lc/nu60001110002A01.flc', option='T', - After reading the data, we plot the data points with full exposure (`Fraction_exposure == 1`) -```python -import numpy as np -import matplotlib.pyplot as plt - -%matplotlib inline -``` - ```python lc_data = np.genfromtxt('60001110002_p/lc/nu60001110002A01.txt', missing_values='NULL', filling_values=np.nan) good_data = lc_data[:,4] == 1 @@ -203,3 +240,7 @@ plt.errorbar(lc_data[:,0], lc_data[:,2], lc_data[:,3], fmt='o', lw=0.5) plt.xlabel('Time (sec)') plt.ylabel('Count Rate (per sec)') ``` + +```python + +``` diff --git a/analysis-rxte-lightcurve.md b/analysis-rxte-lightcurve.md new file mode 100644 index 0000000..d78e1fc --- /dev/null +++ b/analysis-rxte-lightcurve.md @@ -0,0 +1,354 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.0 + kernelspec: + display_name: (heasoft) + language: python + name: heasoft +--- + +# RXTE Lightcurve Extraction On Sciserver with Parallelization +
+ + +- **Description:** Finding data and extracting light curves from RXTE data, with an example of running tasks in parallel. +- **Level:** Advanced. +- **Data:** RXTE observations of **eta car** taken over 16 years. +- **Requirements:** `heasoftpy`, `pyvo`, `matplotlib`, `tqdm` +- **Credit:** Tess Jaffe (Sep 2021). Parallelization by Abdu Zoghbi (Jan 2024). +- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback). +- **Last verified to run:** 01/26/2024. + +
+ + +## 1. Introduction + +This notebook demonstrates an analysis of 16 years of RXTE data, which would be difficult outside of SciServer. + +The RXTE archive contains standard data products that can be used without re-processing the data. These are described in detail in the [RXTE ABC guide](https://heasarc.gsfc.nasa.gov/docs/xte/abc/front_page.html). + +We first find all of the standard product light curves. Then, realizing that the channel boundaries in the standard data products do not address our science question, we re-extract light curves following the RXTE documentation and using `heasoftpy`. + +As we will see, a single run on one observation takes about 20 seconds, which means that extracting all observations takes about a week. We will show an example of how this can be overcome by parallelizing the analysis, reducing the run time from about a week to a few hours. +
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +
+Also, this notebook requires heasoftpy, which is available in the (heasoft) conda environment. You should see (heasoft) at the top right of the notebook. If not, click there and select it. + +Running Outside Sciserver:
+If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft are installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine.
+
+ + +## 2. Module Imports +We need the following python modules: + + +```python +import sys, os, shutil +import glob +import pyvo as vo +import numpy as np +from astropy.io import fits +from astropy.coordinates import SkyCoord +import matplotlib.pyplot as plt +import datetime + +# tqdm is needed to show progress +from tqdm import tqdm + +import heasoftpy as hsp + +# for parallelization +import multiprocessing as mp +``` + +## 3. Find the Data + +To find the relevant data, we can use [Xamin](https://heasarc.gsfc.nasa.gov/xamin/), the HEASARC web portal, or the Virtual Observatory (VO) python client `pyvo`. Here, we use the latter so it is all in one notebook. + +You can also see the [Getting Started](getting-started.md), [Data Access](data-access.md) and [Finding and Downloading Data](data-find-download.md) tutorials for examples using `pyVO` to find the data. + +Specifically, we want to look at the observation tables. So first we get a list of all the tables HEASARC serves and then look for the ones related to RXTE: + +```python +tap_services = vo.regsearch(servicetype='tap', keywords=['heasarc']) +heasarc_tables = tap_services[0].service.tables +``` + +```python +for tablename in heasarc_tables.keys(): + if "xte" in tablename: + print(" {:20s} {}".format(tablename,heasarc_tables[tablename].description)) + +``` + +The `xtemaster` catalog is the one that we are interested in. + +Let's see what this table has in it. Alternatively, we can google it and find the same information here: + +https://heasarc.gsfc.nasa.gov/W3Browse/all/xtemaster.html + + +```python +for c in heasarc_tables['xtemaster'].columns: + print("{:20s} {}".format(c.name,c.description)) +``` + +We're interested in Eta Carinae, and we want to get the RXTE `cycle`, `proposal`, and `obsid` for every observation it took of this source, based on its position. We use the source position instead of the name in case the name has been entered differently in the table, which can happen. + +We construct a query in the ADQL language to select the columns (`target_name`, `cycle`, `prnb`, `obsid`, `time`, `exposure`, `ra` and `dec`) where the point defined by the observation's RA and DEC lies inside a circle defined by our chosen source position. + +For more examples of how to use these powerful VO services, see the [Data Access](data-access.md) and [Finding and Downloading Data](data-find-download.md) tutorials, or the [NAVO](https://heasarc.gsfc.nasa.gov/vo/summary/python.html) [workshop tutorials](https://nasa-navo.github.io/navo-workshop/). + +```python +# Get the coordinate for Eta Car +pos = SkyCoord.from_name("eta car") +query="""SELECT target_name, cycle, prnb, obsid, time, exposure, ra, dec + FROM public.xtemaster as cat + where + contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1 + and + cat.exposure > 0 order by cat.time + """.format(pos.ra.deg, pos.dec.deg) +``` + +```python +results=tap_services[0].search(query).to_table() +results +``` + +Let's just see how long these observations are: + +```python +plt.plot(results['time'],results['exposure']) +plt.xlabel('Time (s)') +plt.ylabel('Exposure (s)') +``` + +## 4. Read the Standard Products and Plot the Light Curve + +Let's collect all the standard product light curves for RXTE. (These are described on the [RXTE analysis pages](https://heasarc.gsfc.nasa.gov/docs/xte/recipes/cook_book.html).) 
+ +```python +## Need cycle number as well, since after AO9, +## no longer 1st digit of proposal number +ids = np.unique( results['cycle','prnb','obsid','time']) +ids.sort(order='time') +ids +``` + +```python +## Construct a file list. +rxtedata = "/FTP/rxte/data/archive" +filenames = [] +for (k,val) in enumerate(tqdm(ids['obsid'], total=len(ids))): + fname = "{}/AO{}/P{}/{}/stdprod/xp{}_n2a.lc.gz".format( + rxtedata, + ids['cycle'][k], + ids['prnb'][k], + ids['obsid'][k], + ids['obsid'][k].replace('-','')) + + # keep only files that exist + if os.path.exists(fname): + filenames.append(fname) +print("Found {} out of {} files".format(len(filenames), len(ids))) +``` + +Let's collect them all into one light curve: + +```python +lcurves = [] +for file in tqdm(filenames): + with fits.open(file) as hdul: + data = hdul[1].data + lcurves.append(data) + plt.plot(data['TIME'], data['RATE']) +``` + +```python +# combine the light curves into one +lcurve = np.hstack(lcurves) + +# The above LCs are merged per proposal. You can see that some proposals +# had data added later, after other proposals, so you need to sort: +lcurve.sort(order='TIME') + +# plot the light curve +plt.plot(lcurve['TIME'], lcurve['RATE']) + +``` + +## 5. Re-extract the Light Curve + +Let's say we find that we need different channel boundaries than were used in the standard products. We can write a function that does the RXTE data analysis steps for every observation to extract a lightcurve and read it into memory to recreate the above dataset. This function may look complicated, but it only calls three RXTE executables: + +* `pcaprepobsid` +* `maketime` +* `pcaextlc2` + +which extracts the Standard mode 2 data (not to be confused with the "standard products") for the channels we are interested in. It has a bit of error checking that'll help when launching a long job. + + +```python + +class XlcError(Exception): + pass + + +# Define a function that, given an ObsID, does the rxte light curve extraction +def rxte_lc(obsid=None, ao=None , chmin=None, chmax=None, cleanup=True): + rxtedata = "/FTP/rxte/data/archive" + obsdir = "{}/AO{}/P{}/{}/".format( + rxtedata, + ao, + obsid[0:5], + obsid + ) + outdir = f"tmp.{obsid}" + if (not os.path.isdir(outdir)): + os.mkdir(outdir) + + if cleanup and os.path.isdir(outdir): + shutil.rmtree(outdir, ignore_errors=True) + + result = hsp.pcaprepobsid(indir=obsdir, outdir=outdir) + if result.returncode != 0: + raise XlcError(f"pcaprepobsid returned status {result.returncode}.\n{result.stdout}") + + # Recommended filter from RXTE Cookbook pages: + filt_expr = "(ELV > 4) && (OFFSET < 0.1) && (NUM_PCU_ON > 0) && .NOT. ISNULL(ELV) && (NUM_PCU_ON < 6)" + try: + filt_file = glob.glob(outdir+"/FP_*.xfl")[0] + except: + raise XlcError("pcaprepobsid doesn't seem to have made a filter file!") + + result = hsp.maketime( + infile=filt_file, + outfile=os.path.join(outdir,'rxte_example.gti'), + expr=filt_expr, name='NAME', + value='VALUE', + time='TIME', + compact='NO' + ) + if result.returncode != 0: + raise XlcError(f"maketime returned status {result.returncode}.\n{result.stdout}") + + # Running pcaextlc2 + result = hsp.pcaextlc2( + src_infile="@{}/FP_dtstd2.lis".format(outdir), + bkg_infile="@{}/FP_dtbkg2.lis".format(outdir), + outfile=os.path.join(outdir,'rxte_example.lc'), + gtiandfile=os.path.join(outdir,'rxte_example.gti'), + chmin=chmin, + chmax=chmax, + pculist='ALL', layerlist='ALL', binsz=16 + ) + + if result.returncode != 0: + raise XlcError(f"pcaextlc2 returned status {result.returncode}.\n{result.stdout}") + + with fits.open(os.path.join(outdir,'rxte_example.lc')) as hdul: + lc = hdul[1].data + + if cleanup: + shutil.rmtree(outdir,ignore_errors=True) + + return lc + +``` + +Note that each call to this function will take 10-20 seconds to complete. Extracting all the observations will take a while, so we limit this run to 10 observations. We will look into running this in parallel in the next step. + +Our new light curves will be for channels 5-10. + +```python +# For this tutorial, we limit the number of observations to 10 +nlimit = 10 +lcurves = [] +for (k,val) in tqdm(enumerate(ids[:nlimit])): + lc = rxte_lc(obsid=val['obsid'], ao=val['cycle'], chmin="5", chmax="10", cleanup=True) + lcurves.append(lc) +lcurve = np.hstack(lcurves) +``` + +## 6. Running the Extraction in Parallel +As noted, extracting the light curves for all observations will take a while if run in serial. We will look next into parallelizing the `rxte_lc` calls. We will use the `multiprocessing` python module. + +We do this by first creating a wrapper around `rxte_lc` that does a few things: + +- Use `local_pfiles_context` in `heasoftpy` to properly handle parameter files used by the heasoft tasks. This step is required to prevent parallel calls to `rxte_lc` from reading or writing to the same parameter files, which can lead to calls with the wrong parameters. +- Convert `rxte_lc` from a multi-parameter function to a single-parameter one, so `multiprocessing` can handle it. +- Catch all errors in the `rxte_lc` call. + +We will use all CPUs available in the machine. This can be changed by setting the value of `ncpu`. + +```python +def rxte_lc_wrapper(pars): + """A wrapper around rxte_lc so it can be called in parallel + + pars: (obsid, ao, chmin, chmax, cleanup) + + """ + obsid, ao, chmin, chmax, cleanup = pars + + # the following is needed so the parameter files + # in parallel calls do not read or write the same file + with hsp.utils.local_pfiles_context(): + try: + lc = rxte_lc(obsid, ao, chmin, chmax, cleanup) + except XlcError: + lc = None + return lc +``` + +Before running the function in parallel, we construct a list `pars` that holds the parameters that will be passed to `rxte_lc_wrapper` (and hence rxte_lc). + +
+nlimit is now increased to 64. When you run this in full, change the limit to the number of observations. +
+ + +```python +nlimit = 64 +ncpu = mp.cpu_count() +pars = [] +for (k,val) in enumerate(ids[:nlimit]): + pars.append([val['obsid'], val['cycle'], "5", "10", True]) + +with mp.Pool(processes=ncpu) as pool: + lcs = pool.map(rxte_lc_wrapper, pars) +``` + +```python +# combine the light curves into one, skipping any observations that failed +# (the wrapper returns None on error) +lcurve = np.hstack([lc for lc in lcs if lc is not None]) + +# The above LCs are merged per proposal. You can see that some proposals +# had data added later, after other proposals, so you need to sort: +lcurve.sort(order='TIME') + +# plot the light curve +plt.figure(figsize=(8,4)) +plt.plot(lcurve['TIME'], lcurve['RATE']) +plt.xlabel('Time (s)') +plt.ylabel('Rate ($s^{-1}$)') +``` + +With the parallelization, we can do more observations in a fraction of the time. + +If you want to run this notebook on all observations, you can comment out the cell that runs in serial (the cell below where `rxte_lc` is defined), and submit this notebook in the [batch queue](https://apps.sciserver.org/compute/jobs) on Sciserver. + +```python + +``` diff --git a/analysis-rxte-ml.md b/analysis-rxte-ml.md new file mode 100644 index 0000000..dfa6409 --- /dev/null +++ b/analysis-rxte-ml.md @@ -0,0 +1,453 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.0 + kernelspec: + display_name: (heasoft) + language: python + name: heasoft +--- + +# RXTE Spectral Analysis with Minimal Machine Learning +
+ +- **Description:** A basic example of using Machine Learning to explore X-ray spectra. +- **Level:** Intermediate. +- **Data:** RXTE observations of **eta car** taken over 16 years. +- **Requirements:** `pyvo`, `matplotlib`, `tqdm`, `xspec`, `sklearn`, `umap` +- **Credit:** Tess Jaffe, Brian Powell (Sep 2021). +- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback). +- **Last verified to run:** 02/02/2024. + +
+ + +## 1. Introduction +In this tutorial, we will do some data exploration of the RXTE spectra of **Eta Car**. We assume that we know nothing about the stellar system, but want to gain a broad understanding of its behavior and spectral changes in a model-independent way. We can achieve this with very little code and some basic machine learning techniques. + +We will first use the HEASARC Virtual Observatory services to find the data (you can also see the [Getting Started](getting-started.md), [Data Access](data-access.md) and [Finding and Downloading Data](data-find-download.md) tutorials for more examples of finding HEASARC data using `pyvo`). +
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +
+Also, the required python modules are available in the (heasoft) conda environment. You should see (heasoft) at the top right of the notebook. If not, click there and select it. + +Running Outside Sciserver:
+If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft are installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine.
+
+ + + +## 2. Module Imports +We need the following python modules: + + +```python +import os +# pyvo is used for querying the data from the archive +import pyvo as vo +import numpy as np +from astropy.io import fits +from astropy.coordinates import SkyCoord +from tqdm import tqdm +import matplotlib.pyplot as plt + +# sklearn is used for the machine learning modeling +from sklearn.preprocessing import StandardScaler +from sklearn.decomposition import PCA +from sklearn.manifold import TSNE +from sklearn.cluster import DBSCAN +from umap import UMAP + +# xspec is used to read the spectra +import xspec +``` + +## 3. Finding the Data +First, we will need to find the spectral data for **Eta Car**. We will use `pyvo` to query the HEASARC archive for XTE observations of the source. + +We first initialize a table access protocol (TAP) service, and use it to submit a query to the archive to retrieve the information we need. + +`xtemaster` is the name of the table we will query. +For more information and examples on finding and accessing data, see the [Data Access](data-access.md) and [Finding and Downloading Data](data-find-download.md) tutorials. + +There, you will find information on how to find the table and column names. For the `xtemaster` table, the columns are also available on the [Browse](https://heasarc.gsfc.nasa.gov/W3Browse/all/xtemaster.html) page. In this example, we query the following columns: `target_name`, `cycle`, `prnb`, `obsid`, `time`, `exposure`, `ra` and `dec`. + +### 3.1 Query the Archive for Relevant Data + +```python +# Set up the catalog +tap_services = vo.regsearch(servicetype='tap',keywords=['heasarc']) +heasarc_tables = tap_services[0].service.tables +``` + +```python +# create a query +pos = SkyCoord.from_name("Eta Car") +query="""SELECT target_name, cycle, prnb, obsid, time, exposure, ra, dec + FROM public.xtemaster as cat + where + contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1 + and + cat.exposure > 0 order by cat.time + """.format(pos.ra.deg, pos.dec.deg) +``` + +```python +# submit the query and get a table back +results = tap_services[0].search(query).to_table() +``` + +```python +# Let's keep only the columns we will be using +results = np.unique( results['cycle', 'prnb', 'obsid']) +``` + +### 3.2 Identify the Spectral Data +Next, we create a list of spectral files based on the list of observations. + +`filenames` is a list of existing spectral data products that we will use in subsequent analyses. + +```python +## Construct a file list. +rxtedata = "/FTP/rxte/data/archive" +filenames = [] +times = [] +for id in tqdm(results): + # Skip some for a quicker test case + fname = "{}/AO{}/P{}/{}/stdprod/xp{}_s2.pha.gz".format( + rxtedata, + id['cycle'], + id['prnb'], + id['obsid'], + id['obsid'].replace('-','')) + # keep only files that exist in the archive + if os.path.exists(fname): + filenames.append(fname) + # get the start time + with fits.open(fname) as fp: + times.append(fp[0].header['TSTART']) +print(f"Found {len(filenames)} spectra") +``` + + +## 4. Read and Plot the Spectra +Since the spectra are stored in channel space, forward modeling is generally needed to fit them with some physical model. Since we are **not** fitting physical models, we use `xspec` to load the spectra, and then export them as arrays of normalized counts per second versus keV. 
+ +Additionally, to allow these spectra to be manipulated with standard Machine Learning tools such as `sklearn`, we put all the spectra into a common energy grid that runs between 2 and 12 keV, in steps of 0.1 keV. This is generally not a recommended way of analyzing X-ray spectra, but it is reasonable given that we are interested in the general spectral shape. + + +```python + +xspec.Xset.chatter = 0 + +# other xspec settings +xspec.Plot.area = True +xspec.Plot.xAxis = "keV" +xspec.Plot.background = True + + +# Setup some energy grid that the spectra will interpolate over. +# start at 2 keV due to low-resolution noise below that energy - specific to RXTE +# stop at 12 keV due to no visible activity from Eta Carinae above that energy +xref = np.arange(2., 12, 0.1) + +# number of spectra to read; here we read them all. Change as desired. +nspec = len(filenames) + +# current working directory +cwd = os.getcwd() + +# The spectra will be saved in a list +specs = [] +for file in tqdm(filenames[:nspec]): + # clear out any previously loaded dataset + xspec.AllData.clear() + + # change location to the spectrum folder before reading it + os.chdir(os.path.dirname(file)) + spec = xspec.Spectrum(file) + os.chdir(cwd) + + + xspec.Plot("data") + xVals = xspec.Plot.x() + yVals = xspec.Plot.y() + yref = np.interp(xref, xVals, yVals) + specs.append(yref) + +specs = np.array(specs) +stimes = np.array(times)[:nspec] +``` + +Plot the collected spectra in log-log scale. + +```python +xvals = np.tile(xref, (specs.shape[0],1)) +plt.figure(figsize=(10,6)); +plt.loglog(xvals.T, specs.T, linewidth=0.4); +plt.xlabel('Energy (keV)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('Eta Carinae RXTE Spectra (log-log)'); +``` + +## 5. Exploring the Data with Unsupervised Machine Learning +In the following, we will use different models from the `sklearn` module. + +As a general rule, ML models work best when the data are normalized, so that the shape of a spectrum matters rather than just its magnitude. We use [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html). + +Note that after applying the scaler, we switch to plots in a linear scale. + +### 5.1 Normalize the Spectra + +```python + +scaled_specs = [] +for i in tqdm(range(specs.shape[0])): + s = StandardScaler() + scaled_specs.append(s.fit_transform(specs[i].reshape(-1,1)).T[0]) +scaled_specs = np.array(scaled_specs) + +``` + +Visualize the scaled and unscaled spectra for comparison. + +```python +plt.figure(figsize=(10,6)); +plt.plot(xvals.T, scaled_specs.T, linewidth=0.3); +plt.xlabel('Energy (keV)'); +plt.ylabel('Scaled Normalized Count Rate (C/s)'); +plt.title('Scaled Eta Carinae RXTE Spectra (lin-lin)'); + +plt.figure(figsize=(10,6)); +plt.plot(xvals.T, specs.T, linewidth=0.3); +plt.xlabel('Energy (keV)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('Unscaled Eta Carinae RXTE Spectra (lin-lin)'); +``` + +Note that the scaled spectra all have a similar shape AND magnitude, whereas the unscaled spectra have a similar shape but not magnitude. + +Scaling has the effect of making big features smaller, but small features bigger. So, let's cut off the spectra at 9 keV in order to avoid noise driving the analysis, then rescale. 
+ +```python +specs = specs[:,:xref[xref<=9.0001].shape[0]] +xref = xref[:xref[xref<=9.0001].shape[0]] + +scaled_specs = [] +for i in tqdm(range(specs.shape[0])): + s = StandardScaler() + scaled_specs.append(s.fit_transform(specs[i].reshape(-1,1)).T[0]) +scaled_specs = np.array(scaled_specs) +``` + +Plot the scaled and unscaled spectra for comparison again. + +```python +xvals=np.tile(xref,(specs.shape[0],1)) +plt.figure(figsize=(10,6)); +plt.plot(xvals.T, scaled_specs.T, linewidth=0.4); +plt.xlabel('Energy (keV)'); +plt.ylabel('Scaled Normalized Count Rate (C/s)'); +plt.title('Scaled Eta Carinae RXTE Spectra (lin-lin)'); + +plt.figure(figsize=(10,6)); +plt.plot(xvals.T, specs.T, linewidth=0.4); +plt.xlabel('Energy (keV)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('Unscaled Eta Carinae RXTE Spectra (lin-lin)'); +``` + +### 5.2 Dimension Reduction: Principal Component Analysis +The scaled spectra are now ready for analysis. Let's see what we can learn by using [Principal Component Analysis (PCA)](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) in two dimensions... + +We first decompose the spectra and then plot the components. + +```python +# For comparison, compute PCA +pca = PCA(n_components=2) +scaled_specs_pca = pca.fit_transform(scaled_specs) +plt.figure(figsize=(8,8)) +plt.scatter(scaled_specs_pca[:,0],scaled_specs_pca[:,1]); +plt.title('PCA-reduced Eta Carinae RXTE Spectra'); +plt.axis('off'); +``` + +### 5.3 Dimension Reduction: T-distributed Stochastic Neighbor Embedding (TSNE) + +`PCA` preserves distance, but has no concept of high-dimensional groupings. + +For comparison, compute [`TSNE`](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html), which can extract local high-dimensional relationships. + +```python +tsne = TSNE(n_components=2) +scaled_specs_tsne = tsne.fit_transform(scaled_specs) +plt.figure(figsize=(8,8)) +plt.scatter(scaled_specs_tsne[:,0],scaled_specs_tsne[:,1]); +plt.title('TSNE-reduced Eta Carinae RXTE Spectra'); +plt.axis('off'); +``` + +### 5.4 UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction +`TSNE` indeed finds some local groupings, so let's check `UMAP`, which will allow us to understand local and global relationships. + + +```python +um = UMAP(random_state=1) +scaled_specs_umap = um.fit_transform(scaled_specs) +plt.figure(figsize=(8,8)) +plt.scatter(scaled_specs_umap[:,0], scaled_specs_umap[:,1]); +plt.title('UMAP-reduced Eta Carinae RXTE Spectra'); +plt.axis('off'); +``` + +### 5.5 Clustering the Data +`PCA` only represents distance between the high dimensional samples whereas `TSNE` can find local groupings. + +`UMAP` combines the two into a more intelligent representation that understands both local and global distance. +Let's cluster the `UMAP` representation using `DBSCAN` ... + +```python +dbs = DBSCAN(eps=.6, min_samples=2) +clusters = dbs.fit(scaled_specs_umap) +labels = np.unique(clusters.labels_) +plt.figure(figsize=(8,8)) +for i in range(len(np.unique(labels[labels>=0]))): + plt.scatter(scaled_specs_umap[clusters.labels_==i,0],scaled_specs_umap[clusters.labels_==i,1],label='Cluster '+str(i)); +plt.legend() +plt.title('Clustered UMAP-reduced Eta Carinae RXTE Spectra'); +plt.axis('off'); +``` + +Notice how the `DBSCAN` clustering produced some interesting groupings - we should examine the spectra of each group. +For a less crowded plot of the spectra clusters, plot the mean spectrum of each cluster. 
+ + +```python +# Plot the scaled spectra mean +plt.figure(figsize=(10,6)) +for i in range(len(np.unique(labels[labels>=0]))): + plt.plot(xref,scaled_specs[clusters.labels_==i].mean(axis=0),label='Cluster '+str(i)) +plt.legend(); +plt.xlabel('Energy (keV)'); +plt.ylabel('Scaled Normalized Count Rate (C/s)'); +plt.title('Scaled Eta Carinae RXTE Spectra Cluster Mean (lin-lin)'); + +# Plot the unscaled spectra mean +plt.figure(figsize=(10,6)) +for i in range(len(np.unique(labels[labels>=0]))): + plt.plot(xref,specs[clusters.labels_==i].mean(axis=0),label='Cluster '+str(i)) +plt.legend(); +plt.xlabel('Energy (keV)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('Unscaled Eta Carinae RXTE Spectra Cluster Mean (lin-lin)'); + +``` + +### 5.6 Exploring the Spectral Clusters +It appears that the strangest spectra belong to cluster 1 (orange). +How many spectra are in this group? + +```python +scaled_specs[clusters.labels_==1].shape[0] +``` + +So, we can say that this group is not likely an isolated incident caused by an instrument effect, +since similar spectra occur in seven different observations. Let's look at the overall light curve to see where these odd spectra are occurring. + + +```python +# Sum the count rate across the energy range +specsum = specs.sum(axis=1) + +# plot the overall light curve +plt.figure(figsize=(10,6)) +plt.scatter(stimes, specsum) +plt.xlabel('Time (s)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('2-9 keV Eta Carinae RXTE Light Curve'); + + +# plot the clustered light curve +plt.figure(figsize=(10,6)) +for i in range(len(np.unique(labels[labels>=0]))): + plt.scatter(stimes[clusters.labels_==i],specsum[clusters.labels_==i],label='Cluster '+str(i),alpha=1-.1*i) +plt.xlabel('Time (s)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('2-9 keV Eta Carinae RXTE Light Curve with Clustered Spectra'); +plt.legend(); +``` + +We can see that the orange group occurred near the beginning of the RXTE mission. +Let's take a closer look ... + +```python +# plot the clustered light curve +plt.figure(figsize=(10,6)) +for i in range(len(np.unique(labels[labels>=0]))): + plt.scatter(stimes[clusters.labels_==i],specsum[clusters.labels_==i],label='Cluster '+str(i)) +plt.xlabel('Time (s)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('2-9 keV Eta Carinae RXTE Light Curve with Clustered Spectra'); +plt.legend(); +plt.xlim(.6e8,1e8); +``` + +Indeed, the orange group comprises the first seven observations of Eta Car from RXTE. +Given that this type of spectrum does not repeat again, the earlier hypothesis that these +spectra are not due to an instrument issue will need to be revisited. + +Also, given that the blue group also lacks the 2-3 keV noise peak and is only located toward +the beginning of the mission, it may be the case that the background estimation from +that period of time differs substantially. + +So, what else is interesting? +Cluster 5 (the brown group) occurs exclusively at the overall light curve minima. +Looking again at the unscaled spectra means: + +```python +# Plot the unscaled spectra mean +plt.figure(figsize=(10,6)) +for i in range(len(np.unique(labels[labels>=0]))): + plt.plot(xref,specs[clusters.labels_==i].mean(axis=0),label='Cluster '+str(i)) +plt.legend(); +plt.xlabel('Energy (keV)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('Unscaled Eta Carinae RXTE Spectra Cluster Mean (lin-lin)'); +``` + + + +We can see that the broad peak associated with the 3-5 keV energy range is completely absent from the brown group. 
+Since this phenomenon is documented at both X-ray minima from the latter part of the mission (the earlier minimum may be skewed by background estimation as well), we can say that this spectral difference is likely due to a substantial change in the nature of the Eta Carinae stellar system at this time. + + +Also interesting is the green and purple group relationship. Let's exclude the earlier measurements, where we suspect the background estimation may be wrong, and show the overall light curve again: + + + +```python +plt.figure(figsize=(10,6)) +for i in range(len(np.unique(labels[labels>=0]))): + plt.scatter(stimes[clusters.labels_==i],specsum[clusters.labels_==i],label='Cluster '+str(i),alpha=1-.1*i) +plt.xlabel('Time (s)'); +plt.ylabel('Normalized Count Rate (C/s)'); +plt.title('2-9 keV Eta Carinae RXTE Light Curve with Clustered Spectra'); +plt.legend(); +plt.xlim(2.1e8,5.8e8); +``` + + +The green group, which has a lower 3-5 keV peak and a slightly higher energy peak in the 6-7 keV range than the purple group, appears to occur in conjunction with the purple group. This may indicate the presence of two competing behaviors, or spectral states. + + +```python + +``` diff --git a/analysis-rxte-spectra.md b/analysis-rxte-spectra.md new file mode 100644 index 0000000..3847902 --- /dev/null +++ b/analysis-rxte-spectra.md @@ -0,0 +1,197 @@ +--- +jupyter: + jupytext: + text_representation: + extension: .md + format_name: markdown + format_version: '1.3' + jupytext_version: 1.16.0 + kernelspec: + display_name: (heasoft) + language: python + name: heasoft +--- + +# RXTE Spectral Extraction Example +
+ +- **Description:** Finding standard spectral products from RXTE. +- **Level:** Intermediate. +- **Data:** RXTE observations of **eta car** taken over 16 years. +- **Requirements:** `heasoftpy`, `pyvo`, `matplotlib`, `pyxspec` +- **Credit:** Tess Jaffe (Sep 2021). +- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback). +- **Last verified to run:** 01/26/2024. + +
+ + +## 1. Introduction +This notebook demonstrates an analysis of 16 years of RXTE spectra of Eta Car. + +The RXTE archive contains standard data products that can be used without re-processing the data. These are described in detail in the [RXTE ABC guide](https://heasarc.gsfc.nasa.gov/docs/xte/abc/front_page.html). + +We first find all of the standard spectra, then use `pyxspec` to do some basic analysis. +
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +
+Also, this notebook requires heasoftpy, which is available in the (heasoft) conda environment. You should see (heasoft) at the top right of the notebook. If not, click there and select it. + +Running Outside Sciserver:
+If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft are installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine.
+
+
+
+## 2. Module Imports
+We need the following python modules:
+
+
+```python
+import os
+import pyvo as vo
+import numpy as np
+from tqdm import tqdm
+import matplotlib.pyplot as plt
+import astropy.io.fits as fits
+from astropy.coordinates import SkyCoord
+import xspec
+```
+
+## 3. Find the Data
+
+To find the relevant data, we can use [Xamin](https://heasarc.gsfc.nasa.gov/xamin/), the HEASARC web portal, or the Virtual Observatory (VO) python client `pyvo`. Here, we use the latter so it is all in one notebook.
+
+You can also see the [Getting Started](getting-started.md), [Data Access](data-access.md) and [Finding and Downloading Data](data-find-download.md) tutorials for examples using `pyVO` to find the data.
+
+Specifically, we want to look at the observation tables. So first we get a list of all the tables HEASARC serves and then look for the ones related to RXTE:
+
+```python
+# First query the Registry to get the HEASARC TAP service.
+tap_services=vo.regsearch(servicetype='tap',keywords=['heasarc'])
+# Then query that service for the names of the tables it serves.
+heasarc_tables=tap_services[0].service.tables
+
+for tablename in heasarc_tables.keys():
+    if "xte" in tablename:
+        print(" {:20s} {}".format(tablename,heasarc_tables[tablename].description))
+
+```
+
+Query the `xtemaster` catalog for observations of **Eta Car**:
+
+```python
+# Get the coordinate for Eta Car
+pos = SkyCoord.from_name("eta car")
+
+query = """SELECT target_name, cycle, prnb, obsid, time, exposure, ra, dec 
+    FROM public.xtemaster as cat 
+    where 
+    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1 
+    and 
+    cat.exposure > 0 order by cat.time
+    """.format(pos.ra.deg, pos.dec.deg)
+results = tap_services[0].search(query).to_table()
+results
+```
+
+```python
+# Keep the unique combination of these columns
+ids = np.unique( results['cycle','prnb','obsid'])
+ids
+```
+
+At this point, you need to construct a file list. There are a number of ways to do this, but this one is just using our knowledge of how the RXTE archive is structured.
+
+```python
+## Construct a file list.
+rxtedata = "/FTP/rxte/data/archive"
+filenames = []
+for id in tqdm(ids):
+    fname = "{}/AO{}/P{}/{}/stdprod/xp{}_s2.pha.gz".format(
+        rxtedata,
+        id['cycle'],
+        id['prnb'],
+        id['obsid'],
+        id['obsid'].replace('-',''))
+    # keep only files that exist in the archive
+    if os.path.exists(fname):
+        filenames.append(fname)
+print(f"Found {len(filenames)} out of {len(ids)} files")
+```
+
+Now we have to use [PyXspec](https://heasarc.gsfc.nasa.gov/xanadu/xspec/python/html/quick.html) to convert the spectra into physical units. The spectra are read into a list `spectra` that contains the energy values, their errors (from the bin size), the counts (counts cm$^{-2}$ s$^{-1}$ keV$^{-1}$), and their uncertainties. Then we use Matplotlib to plot them, since the Xspec plotter is not available here.
+
+
+The background and response files are set in the header of each spectral file. So before reading a spectrum, we change directory to the location of the file so those files can be read correctly, then move back to the working directory.
+
+We also set the chatter parameter to 0 to reduce the printed text, given the large number of files we are reading.
+
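+
+As an optional aside (a hedged sketch added for illustration, not part of the original tutorial flow), you can confirm which files a spectrum points to by reading the standard OGIP `BACKFILE` and `RESPFILE` keywords from its header; `filenames` is the list constructed above:
+
+```python
+# Illustrative check: print the background and response file names recorded in
+# the header of the first spectrum. These are the files xspec will look for.
+from astropy.io import fits
+
+with fits.open(filenames[0]) as hdul:
+    hdr = hdul['SPECTRUM'].header
+    print(hdr.get('BACKFILE'), hdr.get('RESPFILE'))
+```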
+
+```python
+
+xspec.Xset.chatter = 0
+
+# other xspec settings
+xspec.Plot.area = True
+xspec.Plot.xAxis = "keV"
+xspec.Plot.background = True
+
+# save current working location
+cwd = os.getcwd()
+
+# number of spectra to read. We limit it to 500. Change as desired.
+nspec = 500
+
+# The spectra will be saved in a list
+spectra = []
+for file in filenames[:nspec]:
+    # clear out any previously loaded dataset
+    xspec.AllData.clear()
+    # move to the folder containing the spectrum before loading it
+    os.chdir(os.path.dirname(file))
+    spec = xspec.Spectrum(file)
+    os.chdir(cwd)
+
+    xspec.Plot("data")
+    spectra.append([xspec.Plot.x(), xspec.Plot.xErr(),
+                    xspec.Plot.y(), xspec.Plot.yErr()])
+
+```
+
+```python
+# Now we plot the spectra
+
+fig = plt.figure(figsize=(10,6))
+for x,xerr,y,yerr in spectra:
+    plt.loglog(x, y, linewidth=0.2)
+plt.xlabel('Energy (keV)')
+plt.ylabel(r'counts cm$^{-2}$ s$^{-1}$ keV$^{-1}$')
+```
+
+You can at this stage start adding spectral models using `pyxspec`, or model the spectra in other ways, such as the machine learning modeling in the [Machine Learning Demo](model-rxte-ml.md).
+
+If you prefer to use the Xspec built-in functionality, you can do so by plotting to a file (e.g. GIF, as we show below).
+
+```python
+xspec.Plot.splashPage=None
+xspec.Plot.device='spectrum.gif/GIF'
+xspec.Plot.xLog = True
+xspec.Plot.yLog = True
+xspec.Plot.background = False
+xspec.Plot()
+xspec.Plot.device='/null'
+```
+
+```python
+from IPython.display import Image
+with open('spectrum.gif','rb') as f:
+    display(Image(data=f.read(), format='gif',width=500))
+```
+
+```python
+
+```
diff --git a/data-access.md b/data-access.md
new file mode 100644
index 0000000..e2d6a88
--- /dev/null
+++ b/data-access.md
@@ -0,0 +1,203 @@
+---
+jupyter:
+  jupytext:
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.3'
+    jupytext_version: 1.16.0
+  kernelspec:
+    display_name: (heasoft)
+    language: python
+    name: heasoft
+---
+
+# HEASARC data access on SciServer
+
+
+- **Description:** A general overview of accessing data on Sciserver.
+- **Level:** Intermediate.
+- **Data:** Access XTE data on Eta Car as an example.
+- **Requirements:** `pyvo`.
+- **Credit:** Tess Jaffe (Sep 2021).
+- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback).
+- **Last verified to run:** 01/26/2024.
+
+
+
+
+## 1. Introduction
+This notebook presents a tutorial on how to access HEASARC data using the virtual observatory (VO) python client `pyvo`.
+
+We handle the general case of using the Table Access Protocol (TAP) to query any information about the HEASARC tables. A more specific data access tutorial for when the table is known is given in the [Finding and Downloading Data notebook](data-find-download.md).
+
+The case will be illustrated by querying for XTE observations of **Eta Car**.
+
+
+
+Running On Sciserver:
+The notebook requires `pyvo`, and on Sciserver, it is available in the `heasoft` conda kernel. Make sure you run the notebook using that kernel by selecting it in the top right.
+
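+
+As a compact preview of the workflow detailed below (a sketch for illustration; the table and column names are taken from the `xtemaster` example that follows), finding the HEASARC TAP service and running a small ADQL query takes only a few lines:
+
+```python
+# Sketch: locate the HEASARC TAP service in the VO registry and run a tiny query.
+import pyvo
+
+tap_service = pyvo.regsearch(servicetype='tap', keywords=['heasarc'])[0].service
+print(tap_service.run_sync('SELECT TOP 5 target_name, ra, dec FROM public.xtemaster').to_table())
+```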
+
+
+
+## 2. Module Imports
+We need the following python modules:
+
+
+```python
+import sys
+import os
+import pyvo
+from astropy.coordinates import SkyCoord
+import requests
+import glob
+import numpy as np
+```
+
+## 3. Get the HEASARC TAP service
+
+We can use the Virtual Observatory interfaces to the HEASARC to find the data we're interested in. Specifically, we want to look at the observation tables. So first we get a list of all the tables HEASARC serves and then look for the ones related to RXTE.
+
+### 3.1 Find the Tables
+
+We start with the Registry of all VO services. The HEASARC table service is using the same backend as our [Xamin web interface](https://heasarc.gsfc.nasa.gov/xamin/), the same database that [Browse](https://heasarc.gsfc.nasa.gov/cgi-bin/W3Browse/w3browse.pl) also uses.
+
+
+```python
+tap_services = pyvo.regsearch(servicetype='tap',keywords=['heasarc'])
+```
+
+We then ask the service for all of the tables that are available at the HEASARC:
+
+```python
+heasarc_tables = tap_services[0].service.tables
+```
+
+And then we look for the ones related to XTE:
+
+```python
+for tablename in heasarc_tables.keys():
+    if "xte" in tablename:
+        print(" {:20s} {}".format(tablename, heasarc_tables[tablename].description))
+
+```
+
+The `xtemaster` catalog is the one that we're interested in.
+
+Let's see what this table has in it. Alternatively, we can google it and find the same information here:
+
+https://heasarc.gsfc.nasa.gov/W3Browse/all/xtemaster.html
+
+
+```python
+for column in heasarc_tables['xtemaster'].columns:
+    print("{:20s} {}".format(column.name, column.description))
+```
+
+### 3.2 Build a Search Query
+
+
+We're interested in Eta Carinae, and we want to get the RXTE cycle, proposal, and observation ID etc. for every observation it took of this source based on its position. (Just in case the name has been entered differently, which can happen.)
+
+The following constructs a query in the ADQL language to select the columns (`target_name`, `cycle`, `prnb`, `obsid`, `time`, `exposure`, `ra`, `dec`) where the point defined by the observation's RA and DEC lies inside a circle defined by our chosen source position.
+
+The results will be sorted by time. See the [NAVO website](https://heasarc.gsfc.nasa.gov/vo/summary/python.html) for more information on how to use these services with python and how to construct ADQL queries for catalog searches.
+
+You can also find more details on using these services in the [NASA Virtual Observatory workshop tutorials (NAVO)](https://nasa-navo.github.io/navo-workshop/).
+
+```python
+# Get the coordinate for Eta Car
+pos = SkyCoord.from_name("eta car")
+query = """SELECT target_name, cycle, prnb, obsid, time, exposure, ra, dec 
+    FROM public.xtemaster as cat 
+    where 
+    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1 
+    and 
+    cat.exposure > 0 order by cat.time
+    """.format(pos.ra.deg, pos.dec.deg)
+```
+
+```python
+results = tap_services[0].search(query).to_table()
+results
+```
+
+## 4. Using Xamin's API
+
+
+An alternative method to access the data is to use the Xamin API specifically. [Xamin](https://heasarc.gsfc.nasa.gov/xamin/) is the main web portal for accessing HEASARC data, and it offers an API that can be used to query the same tables.
+
+The base URL for the Xamin query servlet, which we will query using the `requests` module, is:
+
+`https://heasarc.gsfc.nasa.gov/xamin/QueryServlet?`
+
+And it takes the options:
+ * table: e.g., "table=xtemaster"
+ * constraint: e.g., "obsid=10004-01-40-00"
+ * object: "object=andromeda" or "object=10.68,41.27"
+
+So we can do:
+
+```python
+url = "https://heasarc.gsfc.nasa.gov/xamin/QueryServlet?products&"
+result = requests.get(url,params = {"table":"xtemaster",
+                                    "object":"eta car",
+                                    "resultmax":"10"
+                                   })
+result.text.split('\n')[0:2]
+```
+
+And then you can construct a file list from the second to last field in each row, the `obs_root`.
+
+
+## 5. Obtain the Data
+
+If you know the structure of the mission data, you can take the list of observations from XTE above and find the specific files of the type you want for each of those observations.
+
+For example, let's collect all the standard product light curves for RXTE. (These are described on the [RXTE analysis pages](https://heasarc.gsfc.nasa.gov/docs/xte/recipes/cook_book.html).)
+
+A second approach is to use the [Xamin](https://heasarc.gsfc.nasa.gov/xamin/) portal, find the data products, and obtain the links there.
+
+Yet another approach is to use the VO datalink service (via `pyvo`) to find the links to the data. An example of how to do it is shown in the [Finding and Downloading Data notebook](data-find-download.md).
+
+We are working on making more ways to find the data products from the notebook.
+
+The following will use the first approach.
+
+```python
+# obtain information about the observations
+ids = np.unique( results['cycle','prnb','obsid','time'])
+ids.sort(order='time')
+ids
+```
+
+```python
+# Construct a file list.
+rootdir = "/FTP"
+rxtedata = "rxte/data/archive"
+filenames = []
+
+for (k,val) in enumerate(ids['obsid']):
+    fname="{}/{}/AO{}/P{}/{}/stdprod/xp{}_n2a.lc.gz".format(
+        rootdir,
+        rxtedata,
+        ids['cycle'][k],
+        ids['prnb'][k],
+        ids['obsid'][k],
+        ids['obsid'][k].replace('-',''))
+
+    f = glob.glob(fname)
+    if (len(f) > 0):
+        filenames.append(f[0])
+
+print("Found {} out of {} files".format(len(filenames),len(ids)))
+
+```
+
+On Sciserver, the data can be copied directly from the mounted archive located under `/FTP/`.
+
+```python
+
+```
diff --git a/data-catalog-cross-match.md b/data-catalog-cross-match.md
new file mode 100644
index 0000000..ccf2467
--- /dev/null
+++ b/data-catalog-cross-match.md
@@ -0,0 +1,183 @@
+---
+jupyter:
+  jupytext:
+    text_representation:
+      extension: .md
+      format_name: markdown
+      format_version: '1.3'
+    jupytext_version: 1.16.0
+  kernelspec:
+    display_name: (heasoft)
+    language: python
+    name: heasoft
+---
+
+# Searching a Large Catalog With a List of Sources
+
+
+- **Description:** An example of cross-matching a list of sources to data in the archive.
+- **Level:** Advanced.
+- **Data:** A source list of 10,000 sources is generated (simulating the user input) and used to find matches in the HEASARC archive.
+- **Requirements:** [`pyvo`, `astropy`].
+- **Credit:** Tess Jaffe (May 2023).
+- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback).
+- **Last verified to run:** 01/26/2024.
+
+
+
+
+## 1. Introduction
+
+In this example, a user has a catalog of several thousand sources they are interested in. They'd like to find out if they've been observed by HEASARC missions and what total exposure each source has for that mission.
+
+This can be done in a variety of inefficient ways, such as writing a script to call one of the HEASARC APIs for each of the sources. But we encourage users to discover the power of querying databases with Astronomical Data Query Language (ADQL).
+
+This tutorial is a HEASARC-specific example of a more general workflow querying astronomy databases with Virtual Observatory protocols as described in our [NASA Astronomical Virtual Observatories](https://heasarc.gsfc.nasa.gov/navo/summary/about_navo.html) (NAVO) [workshop notebook](https://nasa-navo.github.io/navo-workshop/content/reference_notebooks/catalog_queries.html).
+
+
+Running On Sciserver:
+The notebook requires `pyvo`, and on Sciserver, it is available in the `heasoft` conda kernel. Make sure you run the notebook using that kernel by selecting it in the top right.
+
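+
+One practical note before we begin (an illustrative sketch; the file name and coordinates are made up): the source list uploaded later in this notebook is passed to `pyvo` as an in-memory `astropy` table, but it can equally be passed as a VOTable XML file on disk, and `astropy` converts between the two in one line:
+
+```python
+# Sketch: write an astropy Table to VOTable XML, the other upload form pyvo accepts.
+from astropy.table import Table
+from astropy.io import votable
+
+toy_sources = Table({'ra': [10.68, 83.63], 'dec': [41.27, 22.01]})  # toy source list
+votable.from_table(toy_sources).to_xml('source_list.xml')
+```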
+
+
+
+## 2. Module Imports
+We need the following python modules:
+
+
+```python
+## Generic VO access routines
+import pyvo as vo
+from astropy.table import Table
+from astropy.io import ascii, votable
+```
+
+
+## 3. Find the HEASARC VO Service
+The steps in this tutorial are:
+1. Prepare the input source list as a VO table in XML format.
+2. Find the list of HEASARC missions to be queried.
+3. Submit an ADQL query.
+
+
+As described in the NAVO workshop notebooks linked above, the first step is to create an object that represents a tool to query the HEASARC catalogs.
+
+
+```python
+# Get HEASARC's TAP service:
+tap_services = vo.regsearch(servicetype='tap',keywords=['heasarc'])
+for service in tap_services:
+    if 'heasarc' in service.ivoid:
+        heasarc = service
+        break
+heasarc.describe()
+```
+
+
+## 4. Prepare the Input Source List
+
+To include our list of sources with the query, VO protocols use the `VOTable`, which is both powerful and complicated. But `astropy` has easy tools to handle it.
+
+As we will show, when submitting a query that includes table data (the list of source coordinates in our case), the table can be passed to `pyvo` as either an `astropy` table, or as a file name of the VO table in XML format.
+
+
+Typically, you may start from a list of sources you want to query. In this tutorial, we first create this list in comma-separated value (CSV) format to be used as our input. The file `source_list.csv` contains a list of 10000 RA and DEC values.
+
+We then create a VOTable that can be used in our query below.
+
+
+```python
+# Write a list of ra,dec values to a CSV file to simulate the input of interest
+# Comment out and replace with your own source list file
+result = heasarc.service.run_sync("select ra, dec from xray limit 10000")
+ascii.write(result.to_table(), "source_list.csv", overwrite=True, format='csv')
+
+# Read the list of sources
+input_table = Table.read("source_list.csv",format="csv")
+
+```
+
+## 5. Find the List of HEASARC Missions to Be Queried
+
+
+Note that you may also wish to generate a list of all of our master catalogs. In the case of the HEASARC, we have of order a thousand different catalogs, most of which are scientific results rather than mission observation tables. So you don't want to print all of our catalogs but a selection of them. For instance, you can do it this way:
+
+```python
+master_catalogs=[]
+for table in heasarc.service.tables:
+    if "master" in table.name or "mastr" in table.name:
+        master_catalogs.append(table.name)
+print(master_catalogs)
+```
+
+## 6. Submit an ADQL Query
+
+The next step is to construct a query in the SQL language, specifically a dialect created for astronomical queries, the ADQL. This is also described briefly in the workshop notebook among other places.
+
+Note also that each service can show you examples that its curators have put together to demonstrate, e.g.:
+
+```python
+for example in heasarc.service.examples:
+    print(example['QUERY'])
+```
+
+
+
+For our use case, we need to do something a bit more complicated involving a *cross-match* between our source list and the HEASARC master catalog for a given mission. This is done by uploading our list of sources as part of the query.
+
+While it may be possible to construct an even more complicated query that does all of the HEASARC master catalogs in one go, that may overload the servers, as does repeating the same query 10,000 times for individual sources. The recommended approach is to cross-match the 10,000 sources in a few dozen queries to the master catalogs.
+
+So let's start with the Chandra master catalog `chanmaster`. You can then repeat the exercise for all of the others.
+
+For a cross-match, you can simply upload your catalog with your query as an astropy table object or as an XML file, and tell the service what to name it. In this case, we call it `mytable`. Then you refer to it in the query as `tap_upload.mytable`.
+
+Our list of sources had two columns named RA and DEC, so they are likewise referred to that way in the SQL query.
+
+To compare your source list coordinates with the coordinates in the given master observation table, you can use the special `ADQL` functions `POINT`, `CIRCLE`, and `CONTAINS`, which do basically what they sound like. The query below matches the input source list against `chanmaster` based on a radius of 0.01 degrees. For each source, it gives back the number of observations (`count(*) as num_obs`) and the total exposures (`sum(cat.exposure) as total_exposure`):
+
+
+```python
+# Construct a query to chanmaster to total the exposures
+# for all of the uploaded sources in the list:
+query="""
+    SELECT cat.name, cat.ra, cat.dec, sum(cat.exposure) as total_exposure, count(*) as num_obs
+    FROM chanmaster cat, tap_upload.mytable mt
+    WHERE
+    CONTAINS(POINT('ICRS', cat.ra, cat.dec),CIRCLE('ICRS', mt.ra, mt.dec, 0.01))=1
+    GROUP BY cat.name, cat.ra, cat.dec """
+```
+
+```python
+# Send the query to the HEASARC server:
+result = heasarc.service.run_sync(query, uploads={'mytable': input_table})
+# Convert the result to an Astropy Table
+mytable = result.to_table()
+mytable
+```
+
+The above shows that of our 10k sources, roughly a dozen (since the catalogs are updated daily and the row order may change, these numbers will change between runs of this notebook) were observed anywhere from once to over a thousand times.
+
+Lastly, you can convert the results back into CSV if you wish:
+
+```python
+ascii.write(mytable, "results_chanmaster.csv", overwrite=True, format='csv')
+```
+
+
+Note that sources with slightly different coordinates in the catalogs are summed separately here. If you want to group by the **average** `RA` and `DEC`, the query can be modified to the following, which will average the RA and DEC values that are slightly different for the same source. 
+ +```python +query=""" + SELECT cat.name, AVG(cat.ra) as avg_ra, AVG(cat.dec) as avg_dec, sum(cat.exposure) as total_exposure, count(*) as num_obs + FROM chanmaster cat, tap_upload.mytable mt + WHERE + CONTAINS(POINT('ICRS', cat.ra, cat.dec),CIRCLE('ICRS', mt.ra, mt.dec,0.01))=1 + GROUP BY cat.name """ + +``` + + +```python + +``` diff --git a/data_find_download.md b/data-find-download.md similarity index 60% rename from data_find_download.md rename to data-find-download.md index d6deff4..ef6a5c3 100644 --- a/data_find_download.md +++ b/data-find-download.md @@ -5,49 +5,84 @@ jupyter: extension: .md format_name: markdown format_version: '1.3' - jupytext_version: 1.15.2 + jupytext_version: 1.16.0 kernelspec: display_name: (heasoft) language: python name: heasoft --- -# Content +# Finding and Downloading Data From a Source Using Python +
+
+- **Description:** Tutorial on how to access HEASARC data using the Virtual Observatory client `pyvo`.
+- **Level:** Intermediate.
+- **Data:** Find and download NuSTAR observations of the AGN NGC 4151.
+- **Requirements:** `pyvo`.
+- **Credit:** Abdu Zoghbi (May 2022).
+- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback).
+- **Last verified to run:** 01/26/2024.
+
+
+
+
+## 1. Introduction
 
 This notebook presents a tutorial of how to access HEASARC data using the virtual observatory (VO) python client `pyvo`.
 
-The use case is a user searching for data on a specific Astronomical object from a specific high energy table. The other [data access tutorial](data_access.ipynb) gives examples other ways to search and access the archive with notebooks.
+We handle the case of a user searching for data on a specific astronomical object from a *specific* high energy table. For a more general data access tutorial, see the [data access notebook](data-access.md).
+
+We will find all NuSTAR observations of **NGC 4151** that have an exposure of more than 50 ks.
+
+
+This notebook searches the NuSTAR master catalog `numaster` using pyvo. We specifically use the `conesearch` service, which is the VO service that allows searching around a position in the sky (NGC 4151 in this case).
 
-The first steps in this example are:
-- Assume we are searching the NuSTAR master catalog `numaster`.
-- Use `pyvo` to obtain all the heasarc services that allow access to the table.
-- Select the `conesearch` service, which the VO service that allows for a search on a position in the sky.
+
+Running On Sciserver:
+The notebook requires `pyvo`, and on Sciserver, it is available in the `heasoft` conda kernel. Make sure you run the notebook using that kernel by selecting it in the top right.
+
+
+
+
+## 2. Module Imports
+We need the following python modules:
 
 ```python
-# import the relevant libraries
-import pyvo
 import os
+
+# pyvo for accessing VO services
+import pyvo
+
+# Use SkyCoord to obtain the coordinates of the source
+from astropy.coordinates import SkyCoord
+
 ```
 
-```python
+## 3. Finding and Downloading the Data
+This part assumes we know the ID of the VO service. Generally these are of the form: `ivo://nasa.heasarc/{table_name}`.
+
+### 3.1 The Search Service
+First, we create a cone search service:
 
-# select the services
-nu_services = pyvo.regsearch(ivoid='ivo://nasa.heasarc/numaster')[0]
-# select the cone search service
+```python
+# Create a cone-search service
+nu_services = pyvo.regsearch(ivoid='ivo://nasa.heasarc/numaster')[0]
 cs_service = nu_services.get_service('conesearch')
 ```
 
-Next, we will use the search function in `cs_service` to search for observations around some source, say the AGN `NGC 4151`.
+### 3.2 Find the Data
 
-The `search` function takes as input, the sky position as a variable in the form of an astropy sky coordinate object `SkyCoord`.
+Next, we will use the search function in `cs_service` to search for observations around our source, NGC 4151.
+
+The `search` function takes as input the sky position, either as a list `[RA, DEC]` or as an astropy sky coordinate object `SkyCoord`.
 
 The search result is then printed as an astropy Table for a clean display.
 
 ```python
-from astropy.coordinates import SkyCoord
-
+# Find the coordinates of the source
 pos = SkyCoord.from_name('ngc 4151')
 
 search_result = cs_service.search(pos)
@@ -56,6 +91,8 @@ search_result = cs_service.search(pos)
 search_result.to_table()
 ```
 
+### 3.3 Filter the Results
+
 The search returned several entries. Let's say we are interested only in observations with exposures larger than 50 ks. We do that with a loop over the search results.
 
@@ -67,6 +104,8 @@ obs_to_explore = [res for res in search_result if res['exposure_a'] >= 50000]
 obs_to_explore
 ```
 
+### 3.4 Find Links for the Data
+
 The exposure selection resulted in 3 observations (this may change as more observations are collected). Let's try to download them for analysis. To see what data products are available for these 3 observations, we use the VO's datalinks. A datalink is a way to query data products related to some search result.
 
@@ -81,6 +120,8 @@ dlink = obs.getdatalink()
 dlink.to_table()[['ID', 'access_url', 'content_type']]
 ```
 
+### 3.5 Filter the Links
+
 Three products are available for our selected observation. From the `content_type` column, we see that one is a `directory` containing the observation files. The `access_url` column gives the direct url to the data (The other two include another datalink service for house keeping data, and a document to list publications related to the selected observation).
 
 We can now loop through our selected observations in `obs_to_explore`, and extract the url addresses with `content_type` equal to `directory`.
 
@@ -104,6 +145,8 @@ for obs in obs_to_explore:
 
 ```
 
+### 3.6 Download the Data
+
 On Sciserver, all the data is available locally under `/FTP/`, so all we need is to use the link text after `FTP` and copy them to the current directory. 
@@ -113,12 +156,12 @@ Set the `on_sciserver` to `False` if using this notebook outside Sciserver ```python -on_sciserver = True +on_sciserver = os.environ['HOME'].split('/')[-1] == 'idies' if on_sciserver: # copy data locally on sciserver for link in links: - os.system(f"cp /FTP/{link.split('FTP')[1]} .") + os.system(f"cp -r /FTP/{link.split('FTP')[1]} .") else: # use wget to download the data @@ -129,9 +172,6 @@ else: os.system(wget_cmd.format(link)) ``` ---- -- Last Updated: 06/05/2023 - ```python ``` diff --git a/data_access.md b/data_access.md deleted file mode 100644 index e7b2c28..0000000 --- a/data_access.md +++ /dev/null @@ -1,159 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.15.2 - kernelspec: - display_name: (heasoft) - language: python - name: heasoft ---- - -# HEASARC data access on SciServer - -Here we show several methods for getting the lists of the files you're interested in. - -```python -import sys,os -import pyvo as vo -import astropy.coordinates as coord -import requests -import glob -import numpy as np -# Ignore unimportant warnings -import warnings -warnings.filterwarnings('ignore', '.*Unknown element mirrorURL.*', - vo.utils.xml.elements.UnknownElementWarning) -``` - -### Get the HEASARC TAP service - -We can use the Virtual Observatory interfaces to the HEASARC to find the data we're interested in. Specifically, we want to look at the observation tables. So first we get a list of all the tables HEASARC serves and then look for the ones related to RXTE. If you are interested in finding and downloading data from a specific telescope, you can use the tutorial in the other [data access notebook](data_find_download.ipynb). - -We start with the Registry of all VO services. The HEASARC table service is using the same backend as our [Xamin web interface](https://heasarc.gsfc.nasa.gov/xamin/), the same database that [Browse](https://heasarc.gsfc.nasa.gov/cgi-bin/W3Browse/w3browse.pl) also uses. - -```python -tap_services=vo.regsearch(servicetype='tap',keywords=['heasarc']) -``` - -We then ask the service for all of the tables that are available at the HEASARC: - -```python -heasarc_tables=tap_services[0].service.tables -``` - -And then we look for the ones related to XTE: - -```python -for tablename in heasarc_tables.keys(): - if "xte" in tablename: - print(" {:20s} {}".format(tablename,heasarc_tables[tablename].description)) - -``` - -The "xtemaster" catalog is the one that we're interested in. - -Let's see what this table has in it. Alternatively, we can google it and find the same information here: - -https://heasarc.gsfc.nasa.gov/W3Browse/all/xtemaster.html - - -```python -for c in heasarc_tables['xtemaster'].columns: - print("{:20s} {}".format(c.name,c.description)) -``` - -We're interested in Eta Carinae, and we want to get the RXTE cycle, proposal, and observation ID etc. for every observation it took of this source based on its position. (Just in case the name has been entered differently, which can happen.) This constructs a query in the ADQL language to select the columns (target_name, cycle, prnb, obsid, time, exposure, ra, dec) where the point defined by the observation's RA and DEC lies inside a circle defined by our chosen source position. The results will be sorted by time. 
See the [NAVO website](https://heasarc.gsfc.nasa.gov/vo/summary/python.html) for more information on how to use these services with python and how to construct ADQL queries for catalog searches. - -```python -# Get the coordinate for Eta Car -pos=coord.SkyCoord.from_name("eta car") -query="""SELECT target_name, cycle, prnb, obsid, time, exposure, ra, dec - FROM public.xtemaster as cat - where - contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1 - and - cat.exposure > 0 order by cat.time - """.format(pos.ra.deg, pos.dec.deg) -``` - -```python -results=tap_services[0].search(query).to_table() -results -``` - -### Xamin's servlet API - - -An alternative, if for some reason you don't want to use PyVO, is to use the Xamin API specifically: - -The base URL for the Xamin query servelet is - - https://heasarc.gsfc.nasa.gov/xamin/QueryServlet? - - And it then takes options - * table: e.g., "table=xtemaster" - * constraint: eg., "obsid=10004-01-40-00" - * object: "object=andromeda" or "object=10.68,41.27" - - So we can do: - -```python -url="https://heasarc.gsfc.nasa.gov/xamin/QueryServlet?products&" -result=requests.get(url,params={"table":"xtemaster", - "object":"eta car", - "resultmax":"10" - }) -result.text.split('\n')[0:2] -``` - -```python - -``` - -And then you can construct a file list from the second to last field in each row, the *obs_root. - - -### Know the archive structure - -With either method, you're still going to have to know how to find the specific files you're interested in for the given mission. (We are working on making this easier.) Then you can take the list of observations from XTE above and find the specific files of the type you want for each of those observations. - -Let's collect all the standard product light curves for RXTE. (These are described on the [RXTE analysis pages](https://heasarc.gsfc.nasa.gov/docs/xte/recipes/cook_book.html).) - -```python -## Need cycle number as well, since after AO9, -## no longer 1st digit of proposal number -ids=np.unique( results['cycle','prnb','obsid','time']) -ids.sort(order='time') -ids -``` - -```python -## Construct a file list. -## Though Jupyter Lab container, either works: -#rootdir="/home/idies/workspace/headata/FTP" -## This one is a link -rootdir="/FTP" -rxtedata="rxte/data/archive" -filenames=[] -for (k,val) in enumerate(ids['obsid']): - fname="{}/{}/AO{}/P{}/{}/stdprod/xp{}_n2a.lc.gz".format( - rootdir, - rxtedata, - ids['cycle'][k], - ids['prnb'][k], - ids['obsid'][k], - ids['obsid'][k].replace('-','')) - #print(fname) - f=glob.glob(fname) - if (len(f) > 0): - filenames.append(f[0]) -print("Found {} out of {} files".format(len(filenames),len(ids))) -``` - -```python - -``` diff --git a/demo_rxte_ml.md b/demo_rxte_ml.md deleted file mode 100644 index 9203c7b..0000000 --- a/demo_rxte_ml.md +++ /dev/null @@ -1,408 +0,0 @@ ---- -jupyter: - jupytext: - text_representation: - extension: .md - format_name: markdown - format_version: '1.3' - jupytext_version: 1.15.2 - kernelspec: - display_name: (heasoft) - language: python - name: heasoft ---- - -# SciServer Example: Analysis of HEASARC data with minimal Machine Learning -Assume that we know nothing about the stellar system Eta Carinae, but want to gain a broad understanding of its behavior. We can achieve this with very litte code and some basic machine learning techniques. 
- -```python -#necessary imports -import sys,os -import pyvo as vo -import numpy as np -from astropy.io import fits -import re -from tqdm import tqdm -import matplotlib.pyplot as plt -%matplotlib inline -import xspec -xspec.Xset.allowPrompting = False -# Ignore unimportant warnings -import warnings -warnings.filterwarnings('ignore', '.*Unknown element mirrorURL.*', - vo.utils.xml.elements.UnknownElementWarning) -%matplotlib inline -``` - -```python -# Set up the catalog -tap_services=vo.regsearch(servicetype='tap',keywords=['heasarc']) -heasarc_tables=tap_services[0].service.tables -``` - -Get the coordinates for Eta Carinae and find all observations of it from the xtemaster catalog. - -```python -import astropy.coordinates as coord -pos=coord.SkyCoord.from_name("Eta Car") -## For now, have to google xtemaster to get the columns, find them here: -## https://heasarc.gsfc.nasa.gov/W3Browse/all/xtemaster.html -## (In future, should be a pyvo function.) -query="""SELECT target_name, cycle, prnb, obsid, time, exposure, ra, dec - FROM public.xtemaster as cat - where - contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1 - and - cat.exposure > 0 order by cat.time - """.format(pos.ra.deg, pos.dec.deg) -``` - -Retrieve all RXTE observations of Eta Carinae - -```python -results=tap_services[0].search(query).to_table() -results -``` - -Collect useful information from the table - -```python -pids=np.unique( results['cycle','prnb']) -pids -``` - -Construct a file list of all the spectra in the standard products, keep the times for later reference - -```python -rootdir="/home/idies/workspace/headata/FTP/" -rxtedata="rxte/data/archive/" -sfilenames=[] -bfilenames=[] -rfilenames=[] -times=[] -for i in tqdm(range(pids.shape[0])): - if not os.path.isdir(rootdir+rxtedata+'AO'+str(pids[i][0])+'/P'+str(pids[i][1])): - print(f"Skipping {pids[i]}") - continue - obsdirs=os.listdir(rootdir+rxtedata+'AO'+str(pids[i][0])+'/P'+str(pids[i][1])) - if 'FMI' in obsdirs: - obsdirs.remove('FMI') - for obsdir in obsdirs: - if os.path.isdir(rootdir+rxtedata+'AO'+str(pids[i][0])+'/P'+str(pids[i][1])+'/'+obsdir+'/stdprod'): - obsfiles=os.listdir(rootdir+rxtedata+'AO'+str(pids[i][0])+'/P'+str(pids[i][1])+'/'+obsdir+'/stdprod') - sphafiles=[f for f in obsfiles if f.endswith('s2.pha.gz')] - bphafiles=[f for f in obsfiles if f.endswith('b2.pha.gz')] - rspfiles=[f for f in obsfiles if f.endswith('.rsp.gz')] - if (len(sphafiles)==1) & (len(bphafiles)==1) & (len(rspfiles)==1): - sfilenames.append(rootdir+rxtedata+'AO'+str(pids[i][0])+'/P'+str(pids[i][1])+'/'+obsdir+'/stdprod/'+sphafiles[0]) - bfilenames.append(rootdir+rxtedata+'AO'+str(pids[i][0])+'/P'+str(pids[i][1])+'/'+obsdir+'/stdprod/'+bphafiles[0]) - rfilenames.append(rootdir+rxtedata+'AO'+str(pids[i][0])+'/P'+str(pids[i][1])+'/'+obsdir+'/stdprod/'+rspfiles[0]) - fitsfile=fits.open(sfilenames[-1],memmap=False) - times.append(fitsfile[0].header['TSTART']) -``` - -Change the spectra from channel space to energy space. This is especially necessary for RXTE, where the energy values of channels changed over the course of the mission. 
- - -```python -pids -``` - -```python -specs=[] - -#select energy range to interpolate -#start at 2 keV due to low-resolution noise below that energy - specific to RXTE -#stop at 12 keV due to no visible activity from Eta Carinae above that energy -#step size of .1 keV - -xref=np.arange(2.,12.,.1) - -for src,bkg,rsp in tqdm(zip(sfilenames,bfilenames,rfilenames)): - xspec.AllData.clear() # clear out any previously loaded dataset - s = xspec.Spectrum(src) - s.background=bkg - s.response=rsp - xspec.Plot.area=True; - xspec.Plot.xAxis = "keV"; - xspec.Plot.add = True; - xspec.Plot("data"); - xspec.Plot.background = True; - xVals = xspec.Plot.x(); - yVals = xspec.Plot.y(); - yref= np.interp(xref, xVals, yVals); - specs.append(yref); - -specs=np.array(specs) -``` - -Plot the collected spectra - - -```python -xvals=np.tile(xref,(specs.shape[0],1)) -plt.figure(figsize=(10,6)); -plt.plot(xvals.T,specs.T); -plt.semilogx(); -plt.semilogy(); -plt.xlabel('Energy (keV)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('Eta Carinae RXTE Spectra (log-log)'); -``` - -Here we start the ML work. Scale the spectra in order to compare the behavior, not the magnitude. Note that after applying the scaler, log-log plots will be nonsensical, so we will only plot on a linear scale - - -```python -from sklearn.preprocessing import StandardScaler - -scaled_specs=[] -for i in tqdm(range(specs.shape[0])): - s=StandardScaler() - scaled_specs.append(s.fit_transform(specs[i].reshape(-1,1)).T[0]) -scaled_specs=np.array(scaled_specs) -``` - -Plot the scaled and unscaled spectra for comparison - -```python -plt.figure(figsize=(10,6)); -plt.plot(xvals.T,scaled_specs.T); -plt.xlabel('Energy (keV)'); -plt.ylabel('Scaled Normalized Count Rate (C/s)'); -plt.title('Scaled Eta Carinae RXTE Spectra (lin-lin)'); - -plt.figure(figsize=(10,6)); -plt.plot(xvals.T,specs.T); -plt.xlabel('Energy (keV)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('Unscaled Eta Carinae RXTE Spectra (lin-lin)'); -``` - -Note that the scaled spectra all have a similiar shape AND magnitude, whereas the unscaled spectra have a similar shape but not mangitude. -Scaling has the effect of making big features smaller, but small features bigger. So, let's cut off the spectra at 9 keV in order to avoid noise driving the analysis, then rescale. - -```python -specs=specs[:,:xref[xref<=9.0001].shape[0]] -xref=xref[:xref[xref<=9.0001].shape[0]] - -scaled_specs=[] -for i in tqdm(range(specs.shape[0])): - s=StandardScaler() - scaled_specs.append(s.fit_transform(specs[i].reshape(-1,1)).T[0]) -scaled_specs=np.array(scaled_specs) -``` - -Plot the scaled and unscaled spectra for comparison - -```python -xvals=np.tile(xref,(specs.shape[0],1)) -plt.figure(figsize=(10,6)); -plt.plot(xvals.T,scaled_specs.T); -plt.xlabel('Energy (keV)'); -plt.ylabel('Scaled Normalized Count Rate (C/s)'); -plt.title('Scaled Eta Carinae RXTE Spectra (lin-lin)'); - -plt.figure(figsize=(10,6)); -plt.plot(xvals.T,specs.T); -plt.xlabel('Energy (keV)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('Unscaled Eta Carinae RXTE Spectra (lin-lin)'); -``` - -Great! The scaled spectra are now ready for analysis. Let's see how Principal Component Analysis interprets the spectra in two dimensions... 
- -```python -from sklearn.decomposition import PCA - -# For comparison, compute PCA -pca=PCA(n_components=2) -scaled_specs_pca=pca.fit_transform(scaled_specs) -plt.figure(figsize=(8,8)) -plt.scatter(scaled_specs_pca[:,0],scaled_specs_pca[:,1]); -plt.title('PCA-reduced Eta Carinae RXTE Spectra'); -plt.axis('off'); -``` - -PCA preserves distance, but has no concept of high-dimensional groupings. For comparison, compute TSNE, which can extract local high-dimensional relationships. - - -```python -from sklearn.manifold import TSNE - -tsne=TSNE(n_components=2) -scaled_specs_tsne=tsne.fit_transform(scaled_specs) -plt.figure(figsize=(8,8)) -plt.scatter(scaled_specs_tsne[:,0],scaled_specs_tsne[:,1]); -plt.title('TSNE-reduced Eta Carinae RXTE Spectra'); -plt.axis('off'); -``` - -TSNE indeed finds some local groupings, so let's check UMAP, which will allow us to understand local and global relationships. - - -```python -from umap import UMAP - -um=UMAP(random_state=1) -scaled_specs_umap=um.fit_transform(scaled_specs) -plt.figure(figsize=(8,8)) -plt.scatter(scaled_specs_umap[:,0],scaled_specs_umap[:,1]); -plt.title('UMAP-reduced Eta Carinae RXTE Spectra'); -plt.axis('off'); -``` - -PCA only represents distance between the high dimensional samples whereas TSNE can find local groupings. -UMAP combines the two into a more intelligent representation that understands both local and global distance. -Let's cluster the UMAP representation using DBSCAN.... - - -```python -from sklearn.cluster import DBSCAN - -dbs=DBSCAN(eps=.6,min_samples=2) -clusters=dbs.fit(scaled_specs_umap) -labels=np.unique(clusters.labels_) -plt.figure(figsize=(8,8)) -for i in range(len(np.unique(labels[labels>=0]))): - plt.scatter(scaled_specs_umap[clusters.labels_==i,0],scaled_specs_umap[clusters.labels_==i,1],label='Cluster '+str(i)); -plt.legend() -plt.title('Clustered UMAP-reduced Eta Carinae RXTE Spectra'); -plt.axis('off'); -``` - -The DBSCAN clustering produced some interesting groupings - we should examine the spectra of each group. -For a less crowded plot of the spectra clusters, plot the mean spectrum of each cluster. - - -```python -# Plot the scaled spectra mean -plt.figure(figsize=(10,6)) -for i in range(len(np.unique(labels[labels>=0]))): - plt.plot(xref,scaled_specs[clusters.labels_==i].mean(axis=0),label='Cluster '+str(i)) -plt.legend(); -plt.xlabel('Energy (keV)'); -plt.ylabel('Scaled Normalized Count Rate (C/s)'); -plt.title('Scaled Eta Carinae RXTE Spectra Cluster Mean (lin-lin)'); - -# Plot the unscaled spectra mean -plt.figure(figsize=(10,6)) -for i in range(len(np.unique(labels[labels>=0]))): - plt.plot(xref,specs[clusters.labels_==i].mean(axis=0),label='Cluster '+str(i)) -plt.legend(); -plt.xlabel('Energy (keV)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('Unscaled Eta Carinae RXTE Spectra Cluster Mean (lin-lin)'); - -``` - - -Clearly, the strangest spectra belong to cluster 1 (orange). -How many spectra are in this group? - - -```python -scaled_specs[clusters.labels_==1].shape[0] -``` - - -So, we can say that this group is not likely an isolated incident caused by mechanical malfunction -since similar spectra occur in seven different observations. Let's look at the overall light curve to see where these odd spectra are occuring. 
- - - -```python -# Sum the count rate across the energy range -specsum=specs.sum(axis=1) - -# plot the overall light curve -plt.figure(figsize=(10,6)) -plt.scatter(times,specsum) -plt.xlabel('Time (s)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('2-9 keV Eta Carinae RXTE Light Curve'); - - -# plot the clustered light curve -plt.figure(figsize=(10,6)) -for i in range(len(np.unique(labels[labels>=0]))): - plt.scatter(np.array(times)[clusters.labels_==i],specsum[clusters.labels_==i],label='Cluster '+str(i),alpha=1-.1*i) -plt.xlabel('Time (s)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('2-9 keV Eta Carinae RXTE Light Curve with Clustered Spectra'); -plt.legend(); -``` - - -We can see that the orange group occurred near the beginning of the RXTE mission. -Let's take a closer look... - - -```python -# plot the clustered light curve -plt.figure(figsize=(10,6)) -for i in range(len(np.unique(labels[labels>=0]))): - plt.scatter(np.array(times)[clusters.labels_==i],specsum[clusters.labels_==i],label='Cluster '+str(i)) -plt.xlabel('Time (s)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('2-9 keV Eta Carinae RXTE Light Curve with Clustered Spectra'); -plt.legend(); -plt.xlim(.6e8,1e8); -``` - - -Indeed, the orange group were the first seven observations of Eta Car from RXTE. -Given that this type of spectra does not repeat again, the earlier hypothesis that these -spectra are not due to mechanical issues must be revisted. -Also, given that the blue group also lacks the 2-3 keV noise peak and is only located toward -the beginning of the mission, it may be the case that the background estimation from -that period of time differs substantially. - -So, what else is interesting? -Cluster 5 (the brown group) occurs exclusively at the overall light curve minima. -Looking again at the unscaled spectra means: - - -```python -# Plot the unscaled spectra mean -plt.figure(figsize=(10,6)) -for i in range(len(np.unique(labels[labels>=0]))): - plt.plot(xref,specs[clusters.labels_==i].mean(axis=0),label='Cluster '+str(i)) -plt.legend(); -plt.xlabel('Energy (keV)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('Unscaled Eta Carinae RXTE Spectra Cluster Mean (lin-lin)'); -``` - - - -We can see that the broad peak associated with the 3-5 keV energy range is completely absent from the brown group. -Since this phenomena is documented at both X-ray minimums from the latter part of the mission (the earlier minimum may -be skewed by background estimation as well) we can say that this spectral difference is likely due to a substantial change -in the nature of the Eta Carina stellar system at this time. - - -Also interesting is the Green and Purple group relationship. Let's exlude the earlier measurements, where we suspect the background -estimation may be wrong, and show the overall light curve again: - - - - -```python -plt.figure(figsize=(10,6)) -for i in range(len(np.unique(labels[labels>=0]))): - plt.scatter(np.array(times)[clusters.labels_==i],specsum[clusters.labels_==i],label='Cluster '+str(i),alpha=1-.1*i) -plt.xlabel('Time (s)'); -plt.ylabel('Normalized Count Rate (C/s)'); -plt.title('2-9 keV Eta Carinae RXTE Light Curve with Clustered Spectra'); -plt.legend(); -plt.xlim(2.1e8,5.8e8); -``` - - -The green group, which has a lower 3-5 keV peak and a slightly higher energy peak in the 6-7 keV range than the purple group, -appears to occur in conjunction with the purple group. This may indicate the presence of two competing behaviors. 
-
-
-With very little code, we have now gained a basic understanding of Eta Carinae from a minimal analysis of HEASARC data.
diff --git a/Getting-Started.md b/getting-started.md
similarity index 59%
rename from Getting-Started.md
rename to getting-started.md
index 5e70688..8b770d1 100644
--- a/Getting-Started.md
+++ b/getting-started.md
@@ -5,37 +5,75 @@ jupyter:
     extension: .md
     format_name: markdown
     format_version: '1.3'
-    jupytext_version: 1.15.2
+    jupytext_version: 1.16.0
   kernelspec:
-    display_name: (heasoft)
+    display_name: heasoft
     language: python
     name: heasoft
 ---
 
-## Getting Started
+# Getting Started With HEASARC Data On Sciserver
+---
+- **Description:** A quick introduction to using HEASARC data on Sciserver.
+- **Level:** Beginner.
+- **Data:** We will use 4 observations of `Cyg X-1` from NuSTAR as an example.
+- **Requirements:** Run in the (heasoft) conda environment on Sciserver.
+- **Credit:** Abdu Zoghbi (May 2022).
+- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback).
+- **Last verified to run:** 01/22/2024.
+
+---
+
+
+## 1. Introduction
+In this notebook, we present a brief overview of a typical analysis flow. It can be used as a quick reference.
+We will go through examples of **finding**, **accessing**, and then **analyzing** some example data.
+
+We will be using 4 NuSTAR observations of **Cyg X-1**.
+
+
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +

+Running Outside Sciserver:
+This notebook runs in the heasoftpy conda environment on Sciserver. +If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft are correctly installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine.
+
-This notebooks contains some tips on getting started with accessing and using HEASARC data on Sciserver.
+## 2. Module Imports
+We need the following python modules:
 
-### Finding and Exploring the data
+
+```python
+import glob
+
+# for finding data
+import pyvo
+
+# for data analysis
+import heasoftpy as hsp
+```
+
+## 3. Finding and Exploring the Data
 
 The Heasarc data holdings can be searched and explored in different ways:
 
-- Using the powerful [Xamin Web Interface](https://heasarc.gsfc.nasa.gov/xamin/xamin.jsp).
+- Using [Xamin Web Interface](https://heasarc.gsfc.nasa.gov/xamin/xamin.jsp).
 
 - Using a virtual observatory (VO) client such as [pyVO](https://github.com/astropy/pyvo) (see below) or [Topcat](http://www.star.bris.ac.uk/~mbt/topcat/).
 
 - Using the classical [Browse Mission Interface](https://heasarc.gsfc.nasa.gov/cgi-bin/W3Browse/w3browse.pl).
 
-In Section 1. below, we give a quick example on how to use pyVO to search for NuSTAR data on a specific object. Alternatively, in section 2 assumes you can use Xamin to obtain a list of observations you are interested in.
+In [Section 3.1](#3.1-pyvo-example) below, we give an example of how to use `pyVO` to search for NuSTAR data on a specific object. Alternatively, [Section 3.2](#3.2-using-xamin) assumes you can use Xamin to obtain a list of observations you are interested in. For more details on finding and accessing data, see the [data-find-download](data_find_download.md) notebook.
 
-The outcome of sections 1 and 2 is the same, so you can follow either of them.
+The outcome of sections 3.1 and 3.2 is the same, so you can follow either of them.
 
----
-#### 1. pyVO Example
+### 3.1 Using the Virtual Observatory Client pyVO
 
 We first search the Virtual Observatory (VO) *registry* for data provided by `heasarc`. The registry provides an index of all data providers that allow access using VO standards.
 
 In the following example (`heasarc_service`), we search for all entries in the registry that have the keyword `heasarc`. This can a large and general set. The search can be filtered for more specfic datasets.
 
@@ -98,14 +136,15 @@ for i in range(4):
     print(path)
 ```
 
----
-#### 2. Using Xamin:
+### 3.2 Using the Web Portal Xamin
 
-Here, we use Xamin to find the data. We again use *numaster*, the master catalog for *NuSTAR*, and search for data the X-ray binary **Cyg X-1**.
+Here, we use [Xamin](https://heasarc.gsfc.nasa.gov/xamin) to find the data. We again use *numaster*, the master catalog for *NuSTAR*, and search for data on the X-ray binary **Cyg X-1**.
 
 When using Xamin to find the data, there is an option in the `Data Products Cart` to select `FTP Paths`, which, when selecting the first 4 datasets, provides a text similar to the following:
 
+> Note that for the purpose of this tutorial, you can choose any observations.
+
 ```python
 paths_txt = """
 /FTP/nustar/data/obs/00/3//30001011002/
@@ -116,22 +155,23 @@ paths_txt = """
 paths = paths_txt.split('\n')[1:-1]
 ```
 
+
 ---
----
-### Accessing The Data
+## 4. Accessing The Data
 
 All the heasarc data is mounted into the compute under `/FTP/`, so once we have the path to the data (though `pyVO` or Xamin), we can directly access it without the need to download it.
 
 So to check the content of the observational folder for the first observations of `cyg x-1`, we can do:
 
 ```python
-import glob
 glob.glob(f'{paths[0]}/*')
 ```
 
 ---
 ---
-### Analyzing The Data
-To Analyze the data within the notebook, we use `heasoftpy`. 
In the *NuSTAR* example, we can call the `nupipeline` tool to re-prodduce the cleaned event files.
+## 5. Analyzing The Data
+To analyze the data within the notebook, we use `heasoftpy`. In the NuSTAR example, we can call the `nupipeline` tool to produce the cleaned event files.
+
+We focus on observation `30001011002`.
 
 ```python
 import heasoftpy as hsp
@@ -153,5 +193,9 @@ Once the task finishes running, we see the new cleaned event files in the local
 
 ---
 ---
-### Subsequent Analysis
+## 6. Subsequent Analysis
 For subsequent analysis, you can use `heasoftpy` which provides a python access to all tools in `heasoft`, as well as `pyxspec` to spectral modeling.
+
+```python
+
+```
diff --git a/Introduction.md b/introduction.md
similarity index 96%
rename from Introduction.md
rename to introduction.md
index 1ad4f9d..ed8e630 100644
--- a/Introduction.md
+++ b/introduction.md
@@ -1,11 +1,11 @@
-### Welcome To The HEASARC Sciserver Tutorials
+# HEASARC Sciserver Tutorials
 
 Sciserver is a science platform where you get access to both data and analysis software in the same place. No need to download the data nor install the software.
 
 The following notebook tutorials provide some examples of how to use the software already installed on sciserver and accecss the data from the browser.
 
-
+
Please make sure the HEASARC data drive is mounted when initializing the sciserver compute container. See details here.

Also, make sure to run the notebooks using the (heasoft) kernel

@@ -13,7 +13,6 @@ Please make sure the HEASARC data drive is mounted when initializing the sciserv --- - ◈ [Getting Started](Getting-Started.ipynb): A quick guide on accessing the data and using [heasoftpy](https://github.com/HEASARC/heasoftpy) for analysis. ◈ [Data Access Tutorial](data_access.ipynb): Detailed examples of accessing HEASARC data holdings with the Virtual Observatory protocols using [pyvo](https://pyvo.readthedocs.io/en/latest/). diff --git a/jdaviz-demo.md b/misc-jdaviz-demo.md similarity index 69% rename from jdaviz-demo.md rename to misc-jdaviz-demo.md index 527b99f..7acd97f 100644 --- a/jdaviz-demo.md +++ b/misc-jdaviz-demo.md @@ -5,35 +5,54 @@ jupyter: extension: .md format_name: markdown format_version: '1.3' - jupytext_version: 1.15.2 + jupytext_version: 1.16.0 kernelspec: display_name: (heasoft) language: python name: heasoft --- +# A Demo for Using jdaviz on Sciserver +
+
+- **Description:** A demo of using jdaviz to create region files from an image during data analysis.
+- **Level:** Beginner.
+- **Data:** NuSTAR observation of **3C 382** (ObsID 60001084002).
+- **Requirements:** `heasoftpy`, `jdaviz`, `astropy`
+- **Credit:** Kavitha Arur (Jun 2023).
+- **Support:** Contact the [HEASARC helpdesk](https://heasarc.gsfc.nasa.gov/cgi-bin/Feedback).
+- **Last verified to run:** 02/01/2024.
+
+
+ -## An Demo for using jdaviz on Sciserver +## 1. Introduction [jdaviz](https://jdaviz.readthedocs.io/en/latest/) is a package of astronomical data analysis visualization tools based on the Jupyter platform. These GUI-based tools link data visualization and interactive analysis. `jdaviz` includes several tools. Here, we will focus on using `Imviz`, which is a tool for visualization and quick-look analysis for 2D astronomical images, so it can be used to analyze images, create and modify regions files such as those needed in many X-ray analysis pipelines. -We will walk through the simple steps of using `Imviz` on sciserver. For more details on using the tool, please refer to the main [jdaviz site](https://jdaviz.readthedocs.io/en/latest/). +We will walk through the simple steps of using `Imviz` on Sciserver. For more details on using the tool, please refer to the main [jdaviz site](https://jdaviz.readthedocs.io/en/latest/). +
+Running On Sciserver:
+When running this notebook inside Sciserver, make sure the HEASARC data drive is mounted when initializing the Sciserver compute container. See details here. +
+Also, this notebook requires heasoftpy and jdaviz, which are available in the (heasoft) conda environment. You should see (heasoft) at the top right of the notebook. If not, click there and select it. ---- +Running Outside Sciserver:
+If running outside Sciserver, some changes will be needed, including:
+• Make sure heasoftpy and heasoft are installed (Download and Install heasoft).
+• Unlike on Sciserver, where the data is available locally, you will need to download the data to your machine.
+
- -Say we are analyzing NuSTAR data of some point source and we want to extract the spectra. We typically need to either pass the source and background selection as RA and DEC positions along with selection region information such as the radius, or we can create the region files for the source and backgorund and pass those to the extraction pipeline. In this example, we will use the latter. -For the purpose of this example, we will copy the cleaned event file for the FMPA detector from the archive. We will use observation `60001084002` of `3C 382`. - -Using [`xamin`](https://heasarc.gsfc.nasa.gov/xamin/) to search for NuSTAR observations of `3C 382`, we find that the data for this obsid is located in: `/FTP/nustar/data/obs/00/6//60001084002/`. + -First, we use the `extractor` tool from `heasoftpy` to extract an image from the event file +## 2. Module Imports +We need the following python modules: ```python # import heasoftpy to use for image extraction @@ -41,10 +60,22 @@ import heasoftpy as hsp # Imviz for working with the images from jdaviz import Imviz + +# WCS is need to handle image coordinates from astropy.wcs import WCS -%matplotlib inline + ``` +## 3. Image Extraction + +Say we are analyzing NuSTAR data of some point source and we want to extract the spectra. We typically need to either pass the source and background selection as RA and DEC positions along with selection region information such as the radius, or we can create the region files for the source and backgorund and pass those to the extraction pipeline. In this example, we will use the latter. + +For the purpose of this example, we will copy the cleaned event file for the FMPA detector from the archive. We will use observation `60001084002` of `3C 382`. + +Using [`xamin`](https://heasarc.gsfc.nasa.gov/xamin/) to search for NuSTAR observations of `3C 382`, we find that the data for this obsid is located in: `/FTP/nustar/data/obs/00/6//60001084002/`. + +First, we use the `extractor` tool from `heasoftpy` to extract an image from the event file + ```python evt_file = '/FTP/nustar/data/obs/00/6//60001084002/event_cl/nu60001084002A01_cl.evt.gz' @@ -55,6 +86,7 @@ inPars = { 'phafile' : 'NONE', 'xcolf' : 'X', 'ycolf' : 'Y', + # noprompt is set so the tool does not prompt for additional parameters 'noprompt' : True } @@ -62,6 +94,8 @@ inPars = { res = hsp.extractor(**inPars) ``` +## 4. Create Source and Background Regions + After the image is extracted, we use `Imviz` to load the image, so we can create the region files. We now proceed by creating a source and background region. 
@@ -102,6 +136,8 @@ regions = imviz.get_interactive_regions()
print(regions)
```

+### 4.1 Save the Regions in Image Units
+
```python
# The following writes the region files in image units
regions['Subset 1'].write('source.reg', format='ds9', overwrite=True)
@@ -112,6 +148,8 @@ regions['Subset 2'].write('background.reg', format='ds9', overwrite=True)
!cat background.reg
```

+### 4.2 Save the Regions in WCS Coordinates
+
```python
# To save the region files in WCS coordinates, we can use WCS from astropy
wcs = WCS('nu_image.fits')
diff --git a/rxte_example_lightcurves.md b/rxte_example_lightcurves.md
deleted file mode 100644
index 3b329b0..0000000
--- a/rxte_example_lightcurves.md
+++ /dev/null
@@ -1,300 +0,0 @@
----
-jupyter:
-  jupytext:
-    text_representation:
-      extension: .md
-      format_name: markdown
-      format_version: '1.3'
-    jupytext_version: 1.15.2
-  kernelspec:
-    display_name: (heasoft)
-    language: python
-    name: heasoft
----
-
-# RXTE example
-
-This notebook demonstrates an analysis of 16 years of RXTE data, which would be difficult outside of SciServer. We extract all of the standard product lightcurves, but then we decide that we need different channel boundaries. So we re-extract light curves following the RXTE documentation and using the heasoftpy wrappers.
-
-```python
-import sys,os, shutil
-import pyvo as vo
-import numpy as np
-from astropy.io import fits
-import matplotlib.pyplot as plt
-%matplotlib inline
-import astropy.io.fits as pyfits
-import datetime
-
-# Ignore unimportant warnings
-import warnings
-warnings.filterwarnings('ignore', '.*Unknown element mirrorURL.*',
-                        vo.utils.xml.elements.UnknownElementWarning)
-```
-
-```python
-import subprocess as subp
-from packaging import version
-import importlib
-import heasoftpy as hsp
-print(hsp.__file__)
-```
-
-### Step 1: find the data
-
-We can use the Virtual Observatory interfaces to the HEASARC to find the data we're interested in. Specifically, we want to look at the observation tables. So first we get a list of all the tables HEASARC serves and then look for the ones related to RXTE:
-
-```python
-tap_services=vo.regsearch(servicetype='tap',keywords=['heasarc'])
-heasarc_tables=tap_services[0].service.tables
-```
-
-```python
-for tablename in heasarc_tables.keys():
-    if "xte" in tablename:
-        print(" {:20s} {}".format(tablename,heasarc_tables[tablename].description))
-
-```
-
-The "xtemaster" catalog is the one that we're interested in.
-
-Let's see what this table has in it. Alternatively, we can google it and find the same information here:
-
-https://heasarc.gsfc.nasa.gov/W3Browse/all/xtemaster.html
-
-
-```python
-for c in heasarc_tables['xtemaster'].columns:
-    print("{:20s} {}".format(c.name,c.description))
-```
-
-We're interested in Eta Carinae, and we want to get the RXTE cycle, proposal, and observation ID, etc., for every observation RXTE took of this source, based on its position. (Just in case the name has been entered differently, which can happen.) This constructs a query in the ADQL language to select the columns (target_name, cycle, prnb, obsid, time, exposure, ra, dec) where the point defined by the observation's RA and DEC lies inside a circle defined by our chosen source position. The results will be sorted by time. See the [NAVO website](https://heasarc.gsfc.nasa.gov/vo/summary/python.html) for more information on how to use these services with python and how to construct ADQL queries for catalog searches.
-
-```python
-# Get the coordinate for Eta Car
-import astropy.coordinates as coord
-pos=coord.SkyCoord.from_name("eta car")
-query="""SELECT target_name, cycle, prnb, obsid, time, exposure, ra, dec
-    FROM public.xtemaster as cat
-    where
-    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1
-    and
-    cat.exposure > 0 order by cat.time
-    """.format(pos.ra.deg, pos.dec.deg)
-```
-
-```python
-results=tap_services[0].search(query).to_table()
-results
-```
-
-Let's just see how long these observations are:
-
-```python
-plt.plot(results['time'],results['exposure'])
-```
-
-### Step 2: combine standard products and plot
-
-Let's collect all the standard product light curves for RXTE. (These are described on the [RXTE analysis pages](https://heasarc.gsfc.nasa.gov/docs/xte/recipes/cook_book.html).)
-
-```python
-## Need cycle number as well, since after AO9,
-## no longer 1st digit of proposal number
-ids=np.unique( results['cycle','prnb','obsid','time'])
-ids.sort(order='time')
-ids
-```
-
-```python
-## Construct a file list.
-## In this case, the name changes
-import glob
-# Through the Jupyter Lab container
-rootdir="/FTP"
-# Through batch it shows up differently:
-#rootdir="/home/idies/workspace/HEASARC\ data"
-rxtedata="rxte/data/archive"
-filenames=[]
-for (k,val) in enumerate(ids['obsid']):
-    fname="{}/{}/AO{}/P{}/{}/stdprod/xp{}_n2a.lc.gz".format(
-        rootdir,
-        rxtedata,
-        ids['cycle'][k],
-        ids['prnb'][k],
-        ids['obsid'][k],
-        ids['obsid'][k].replace('-',''))
-    #print(fname)
-    f=glob.glob(fname)
-    if (len(f) > 0):
-        filenames.append(f[0])
-print("Found {} out of {} files".format(len(filenames),len(ids)))
-```
-
-Let's collect them all into one light curve:
-
-```python
-hdul = fits.open(filenames.pop(0))
-data = hdul[1].data
-cnt=0
-lcs=[]
-for f in filenames:
-    if cnt % 100 == 0:
-        print("On file {}".format(f))
-    hdul = fits.open(f)
-    d = hdul[1].data
-    data=np.hstack([data,d])
-    plt.plot(d['TIME'],d['RATE'])
-    lcs.append(d)
-    cnt+=1
-```
-
-```python
-hdul = fits.open(filenames.pop(0))
-data = hdul[1].data
-cnt=0
-for f in filenames:
-    hdul = fits.open(f)
-    d = hdul[1].data
-    data=np.hstack([data,d])
-    if cnt % 100 == 0:
-        print("On file {}".format(f))
-        print("  adding {} rows from TSTART={}".format(d.shape[0],hdul[1].header['TSTARTI']))
-    cnt+=1
-## The above LCs are merged per proposal. You can see that some proposals
-## had data added later, after other proposals, so you need to sort:
-data.sort(order='TIME')
-
-```
-
-```python
-plt.plot(data['TIME'],data['RATE'])
-
-```
-
-### Step 3: Re-extract a light-curve
-
-Now we go out and read about how to analyze RXTE data, and we decide that we need different channel boundaries than were used in the standard products. We can write a little function that does the RXTE data analysis steps for every observation to extract a lightcurve and read it into memory to recreate the above dataset. This function may look complicated, but it only calls three RXTE executables:
-
-* pcaprepobsid
-* maketime
-* pcaextlc2
-
-the last of which extracts the Standard mode 2 data (not to be confused with the "standard products") for the channels you're interested in. It has a bit of error checking that'll help when launching a long job.
-
-Note that each call to this function will take 10-20 seconds to complete. So when we run a whole proposal, we'll have to wait a while.
-
-```python
-
-class XlcError( Exception ):
-    pass
-
-
-# Define a function that, given an ObsID, does the rxte light curve extraction
-def rxte_lc( obsid=None, ao=None , chmin=None, chmax=None, cleanup=True):
-    rootdir="/home/idies/workspace/headata/FTP"
-    rxtedata="rxte/data/archive"
-    obsdir="{}/{}/AO{}/P{}/{}/".format(
-        rootdir,
-        rxtedata,
-        ao,
-        obsid[0:5],
-        obsid
-    )
-    #print("Looking for obsdir={}".format(obsdir))
-    outdir="tmp.{}".format(obsid)
-    if (not os.path.isdir(outdir)):
-        os.mkdir(outdir)
-
-    if cleanup and os.path.isdir(outdir):
-        shutil.rmtree(outdir,ignore_errors=True)
-
-    try:
-        #print("Running pcaprepobsid")
-        result=hsp.pcaprepobsid(indir=obsdir,
-                                outdir=outdir
-                               )
-        print(result.stdout)
-        # This one doesn't seem to return correctly, so this doesn't trap!
-        if result.returncode != 0:
-            raise XlcError("pcaprepobsid returned status {}".format(result.returncode))
-    except:
-        raise
-    # Recommended filter from RXTE Cookbook pages:
-    filt_expr = "(ELV > 4) && (OFFSET < 0.1) && (NUM_PCU_ON > 0) && .NOT. ISNULL(ELV) && (NUM_PCU_ON < 6)"
-    try:
-        filt_file=glob.glob(outdir+"/FP_*.xfl")[0]
-    except:
-        raise XlcError("pcaprepobsid doesn't seem to have made a filter file!")
-
-    try:
-        #print("Running maketime")
-        result=hsp.maketime(infile=filt_file,
-                            outfile=os.path.join(outdir,'rxte_example.gti'),
-                            expr=filt_expr, name='NAME',
-                            value='VALUE',
-                            time='TIME',
-                            compact='NO')
-        #print(result.stdout)
-        if result.returncode != 0:
-            raise XlcError("maketime returned status {}".format(result.returncode))
-    except:
-        raise
-
-    try:
-        #print("Running pcaextlc2")
-        result=hsp.pcaextlc2(src_infile="@{}/FP_dtstd2.lis".format(outdir),
-                             bkg_infile="@{}/FP_dtbkg2.lis".format(outdir),
-                             outfile=os.path.join(outdir,'rxte_example.lc'),
-                             gtiandfile=os.path.join(outdir,'rxte_example.gti'),
-                             chmin=chmin,
-                             chmax=chmax,
-                             pculist='ALL', layerlist='ALL', binsz=16)
-        #print(result.stdout)
-        if result.returncode != 0:
-            raise XlcError("pcaextlc2 returned status {}".format(result.returncode))
-    except:
-        raise
-
-    with pyfits.open(os.path.join(outdir,'rxte_example.lc'),memmap=False) as hdul:
-        lc=hdul[1].data
-    if cleanup:
-        shutil.rmtree(outdir,ignore_errors=True)
-    return lc
-
-```
-
-Let's look just at a small part of the time range, and look at only the first few for speed:
-
-```python
-break_at=10
-for (k,val) in enumerate(ids):
-    if k>break_at: break
-    l=rxte_lc(ao=val['cycle'], obsid=val['obsid'], chmin="5",chmax="10")
-    try:
-        lc=np.hstack([lc,l])
-    except:
-        lc=l
-
-```
-
-```python
-# Because the obsids won't necessarily be processed in time order
-lc.sort(order='TIME')
-```
-
-```python
-plt.plot(lc['TIME'],lc['RATE'])
-```
-
-```python
-hdu = pyfits.BinTableHDU(lc)
-pyfits.HDUList([pyfits.PrimaryHDU(),hdu]).writeto('eta_car.lc',overwrite=True)
-
-```
-
-You could then remove the break in the above loop and submit this job to the [batch queue](https://apps.sciserver.org/compute/jobs).
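Since `eta_car.lc` is the end product of this run, re-loading it in a later session is straightforward; a minimal sketch, assuming the `TIME` and `RATE` columns written above:

```python
# Re-load the combined light curve written by the cell above
from astropy.io import fits
import matplotlib.pyplot as plt

with fits.open('eta_car.lc') as hdul:
    lc = hdul[1].data

plt.plot(lc['TIME'], lc['RATE'])
plt.xlabel('TIME (s)')
plt.ylabel('RATE (count/s)')
plt.show()
```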
-
-```python
-
-```
diff --git a/rxte_example_spectral.md b/rxte_example_spectral.md
deleted file mode 100644
index a370f9e..0000000
--- a/rxte_example_spectral.md
+++ /dev/null
@@ -1,164 +0,0 @@
----
-jupyter:
-  jupytext:
-    text_representation:
-      extension: .md
-      format_name: markdown
-      format_version: '1.3'
-    jupytext_version: 1.15.2
-  kernelspec:
-    display_name: (heasoft)
-    language: python
-    name: heasoft
----
-
-# A simple RXTE spectral extraction example
-
-Here we just show how to get a list of RXTE observations of a given source, construct a file list of the standard products, and extract spectra in physical units using PyXspec.
-
-```python
-import sys,os,glob
-import pyvo as vo
-import numpy as np
-import matplotlib.pyplot as plt
-%matplotlib inline
-import astropy.io.fits as fits
-import xspec
-xspec.Xset.allowPrompting = False
-# Ignore unimportant warnings
-import warnings
-warnings.filterwarnings('ignore', '.*Unknown element mirrorURL.*',
-                        vo.utils.xml.elements.UnknownElementWarning)
-```
-
-First query the HEASARC for its catalogs related to XTE. For more on using PyVO to find observations, see [NAVO's collection of notebook tutorials](https://nasa-navo.github.io/navo-workshop/).
-
-```python
-# First query the Registry to get the HEASARC TAP service.
-tap_services=vo.regsearch(servicetype='tap',keywords=['heasarc'])
-# Then query that service for the names of the tables it serves.
-heasarc_tables=tap_services[0].service.tables
-
-for tablename in heasarc_tables.keys():
-    if "xte" in tablename:
-        print(" {:20s} {}".format(tablename,heasarc_tables[tablename].description))
-
-```
-
-Query the xtemaster catalog for observations of Eta Car:
-
-```python
-# Get the coordinate for Eta Car
-import astropy.coordinates as coord
-pos=coord.SkyCoord.from_name("eta car")
-query="""SELECT target_name, cycle, prnb, obsid, time, exposure, ra, dec
-    FROM public.xtemaster as cat
-    where
-    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},0.1))=1
-    and
-    cat.exposure > 0 order by cat.time
-    """.format(pos.ra.deg, pos.dec.deg)
-results=tap_services[0].search(query).to_table()
-results
-```
-
-```python
-## Need cycle number as well, since after AO9,
-## no longer 1st digit of proposal number
-ids=np.unique( results['cycle','prnb','obsid'])
-ids
-```
-
-At this point, you need to construct a file list. There are a number of ways to do this, but this one is just using our knowledge of how the RXTE archive is structured. This code block limits the results to a particular proposal ID to make this quick, but you could remove that restriction and wait longer:
-
-```python
-## Construct a file list.
-rootdir="/FTP"
-rxtedata="rxte/data/archive"
-filenames=[]
-for (k,val) in enumerate(ids['obsid']):
-    # Skip some for a quicker test case
-    if ids['prnb'][k]!=80001:
-        continue
-    fname="{}/{}/AO{}/P{}/{}/stdprod/xp{}_s2.pha.gz".format(
-        rootdir,
-        rxtedata,
-        ids['cycle'][k],
-        ids['prnb'][k],
-        ids['obsid'][k],
-        ids['obsid'][k].replace('-',''))
-    #print(fname)
-    f=glob.glob(fname)
-    if (len(f) > 0):
-        filenames.append(f[0])
-print("Found {} out of {} files".format(len(filenames),len(ids)))
-```
-
-Now we have to use our knowledge of [PyXspec](https://heasarc.gsfc.nasa.gov/xanadu/xspec/python/html/quick.html) to convert the spectra into physical units. Then we use Matplotlib to plot, since the Xspec plotter is not available here.
-
-(Note that there will be errors when the code tries to read in the background and response files from the working directory. We then specify them explicitly.)
-
-```python
-dataset=[]
-xref=np.arange(0.,50.,1)
-for f in filenames[0:500]:
-    xspec.AllData.clear()  # clear out any previously loaded dataset
-    ## Ignore the errors it will print about being unable
-    ## to find response or background
-    s = xspec.Spectrum(f)
-    ## Then specify with the correct path.
-    s.background=f.replace("_s2.pha","_b2.pha")
-    s.response=f.replace("_s2.pha",".rsp")
-    xspec.Plot.area=True
-    xspec.Plot.xAxis = "keV"
-    xspec.Plot.add = True
-    xspec.Plot("data")
-    xspec.Plot.background = True
-    xVals = xspec.Plot.x()
-    yVals = xspec.Plot.y()
-    yref= np.interp(xref, xVals, yVals)
-    dataset.append( yref )
-
-```
-
-```python
-fig, ax = plt.subplots(figsize=(10,6))
-
-for s in dataset:
-    ax.plot(xref,s)
-ax.set_xlabel('Energy (keV)')
-ax.set_ylabel(r'counts/cm$^2$/s/keV')
-ax.set_xscale("log")
-ax.set_yscale("log")
-```
-
-And now you can put these into your favorite spectral analysis program like [PyXspec](https://heasarc.gsfc.nasa.gov/xanadu/xspec/python/html/quick.html) or into an AI/ML analysis following [our lightcurve example](rxte_example_lightcurves.ipynb).
-
-If you prefer to use the Xspec plot routines, you can do so, but only via an output file, since Xspec cannot open a plot window through a notebook running on SciServer. So here's an example using a GIF output file and then displaying the result in the notebook:
-
-```python
-xspec.Plot.splashPage=None
-xspec.Plot.device='spectrum.gif/GIF'
-xspec.Plot.xLog = True
-xspec.Plot.yLog = True
-xspec.Plot.background = False
-xspec.Plot()
-xspec.Plot.device='/null'
-```
-
-```python
-from IPython.display import Image
-with open('spectrum.gif','rb') as f:
-    display(Image(data=f.read(), format='gif',width=500))
-```
-
-```python
-
-```
diff --git a/source_list_querying.md b/source_list_querying.md
deleted file mode 100644
index 2fc37ec..0000000
--- a/source_list_querying.md
+++ /dev/null
@@ -1,151 +0,0 @@
----
-jupyter:
-  jupytext:
-    text_representation:
-      extension: .md
-      format_name: markdown
-      format_version: '1.3'
-    jupytext_version: 1.15.2
-  kernelspec:
-    display_name: (heasoft)
-    language: python
-    name: heasoft
----
-
-# Example of a large catalog exploration with a list of sources
-
-In this example, a user has a catalog of several thousand sources they are interested in. They'd like to find out if they've been observed by HEASARC missions and what the total exposure each source has for that mission. This can be done in a variety of inefficient ways, such as writing a script to call one of the HEASARC APIs for each of the sources. But we encourage users to discover the power of querying databases with SQL.
-
-This tutorial is a HEASARC-specific example of a more general workflow for querying astronomy databases with Virtual Observatory protocols, as described in our NASA Astronomical Virtual Observatories (NAVO) workshop notebook.
-
-The steps in this tutorial are:
-1. Prepare the input source list as a VO table in XML format.
-2. Find the list of HEASARC missions to be queried.
-3. Submit an SQL query.
-
-```python
-# suppress some specific warnings that are not important
-import warnings
-warnings.filterwarnings("ignore", module="astropy.io.votable.*")
-warnings.filterwarnings("ignore", module="pyvo.utils.xml.*")
-warnings.filterwarnings("ignore", module="astropy.units.format.vounit")
-
-## Generic VO access routines
-import pyvo as vo
-from astropy.table import Table
-from astropy.io.votable import from_table, writeto
-from astropy.io import ascii
-```
-
-As described in the NAVO workshop notebooks linked above, the first step is to create an object that represents a tool to query the HEASARC catalogs.
-
-```python
-# Get HEASARC's TAP service:
-tap_services = vo.regsearch(servicetype='tap',keywords=['heasarc'])
-for s in tap_services:
-    if 'heasarc' in s.ivoid:
-        heasarc = s
-        break
-heasarc.describe()
-```
-
----
-## 1. Prepare the input source list as a VO table in XML format:
-
-VO protocols use the VOTable standard for tables, which is both powerful and complicated. But astropy has easy tools to convert to and from this XML format.
-
-Typically, you may start from a list of sources you want to query. In this tutorial, we first create this list in comma-separated value (CSV) format to be used as our input. The file `inlist_10k.csv` contains a list of 10000 RA and DEC values.
-
-We then create a VOTable version that can be used in our query below.
-
-```python
-## This is how I generated my input list in the first place. Comment out and replace with your own:
-result = heasarc.service.run_sync("select ra, dec from xray limit 10000")
-ascii.write(result.to_table(), "inlist_10k.csv", overwrite=True, format='csv')
-
-## Input a list of sources in CSV format
-input_table = Table.read("inlist_10k.csv",format="csv")
-
-# Convert to VOTable
-votable = from_table(input_table)
-writeto(votable,"longlist.xml")
-```
-
-## 2. Find the list of HEASARC missions to be queried.
-
-
-Note that you may also wish to generate a list of all of our master catalogs. In the case of the HEASARC, we have of order a thousand different catalogs, most of which are scientific results rather than mission observation tables. So you don't want to print all of our catalogs, just a selection of them. For instance, you can do it this way:
-
-```python
-master_catalogs=[]
-for c in heasarc.service.tables:
-    if "master" in c.name or "mastr" in c.name:
-        master_catalogs.append(c.name)
-print(master_catalogs)
-```
-
-## 3. Submit an SQL query.
-
-The next step is to construct a query in the SQL language, specifically ADQL, a dialect created for astronomical queries. This is also described briefly in the workshop notebook among other places.
-
-Note also that each service can show you examples that its curators have put together to demonstrate its use, e.g.:
-
-```python
-for e in heasarc.service.examples:
-    print(e['QUERY'])
-```
-
-<hr>
-
-For our use case, we need to do something a bit more complicated involving a *cross-match* between our source list and the HEASARC master catalog for a given mission. While it may be possible to construct an even more complicated query that does all of the HEASARC master catalogs in one go, that may overload the servers, as does repeating the same query 10 thousand times for individual sources. The recommended approach is to cross-match all 10 thousand sources at once against each master catalog in turn, i.e., a few dozen queries in total.
-
-So let's start with the Chandra master catalog `chanmaster`. You can then repeat the exercise for all of the others.
-
-For a cross-match, you can simply upload your catalog with your query as an XML file, and at that point, you tell the service what to name it. In this case, we call it `mytable`. Then in your SQL, the table name is `tap_upload.mytable` and it otherwise behaves like any other table. Our list of sources had two columns named RA and DEC, so they are likewise referred to that way in the SQL.
-
-To compare your source list coordinates with the coordinates in the given master observation table, you can use the special `ADQL` functions `POINT`, `CIRCLE`, and `CONTAINS`, which do basically what they sound like. The query below matches the input source list against `chanmaster` based on a radius of 0.01 degrees. For each source, it sums all `chanmaster` observations' exposures to give the total exposure and counts the number of observations:
-
-```python
-# Construct a query to chanmaster to total the exposures
-# for all of the uploaded sources in the list:
-query="""
-    SELECT cat.name, cat.ra, cat.dec, sum(cat.exposure) as total_exposure, count(*) as num_obs
-    FROM chanmaster cat, tap_upload.mytable mt
-    WHERE
-    CONTAINS(POINT('ICRS',cat.ra,cat.dec),CIRCLE('ICRS',mt.ra,mt.dec,0.01))=1
-    GROUP BY cat.name, cat.ra, cat.dec """
-```
-
-```python
-# Send the query to the HEASARC server:
-result = heasarc.service.run_sync(query, uploads={'mytable': 'longlist.xml'})
-# Convert the result to an Astropy Table
-mytable = result.to_table()
-mytable
-```
-
-The above shows that of our 10k sources, roughly a dozen (since the catalogs are updated daily and the row order may change, these numbers will change between runs of this notebook) were observed anywhere from once to over a thousand times.
-
-Lastly, you can convert the results back into CSV if you wish:
-
-```python
-ascii.write(mytable, "results_chanmaster.csv", overwrite=True, format='csv')
-```
-
-
-Note that sources with slightly different coordinates in the catalogs are summed separately here. If you want to group by the **average** `RA` and `DEC`, the query can be modified to the following, which will average the RA and DEC values that are slightly different for the same source.
-
-```python
-query="""
-    SELECT cat.name, AVG(cat.ra) as avg_ra, AVG(cat.dec) as avg_dec, sum(cat.exposure) as total_exposure, count(*) as num_obs
-    FROM chanmaster cat, tap_upload.mytable mt
-    WHERE
-    CONTAINS(POINT('ICRS',cat.ra,cat.dec),CIRCLE('ICRS',mt.ra,mt.dec,0.01))=1
-    GROUP BY cat.name """
-
-```
-
-
-```python
-
-```
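To actually run this averaged variant, the same upload mechanism as before applies; a short sketch (the output file name here is just illustrative):

```python
# Send the averaged-coordinate query, re-using the VOTable uploaded earlier
result = heasarc.service.run_sync(query, uploads={'mytable': 'longlist.xml'})
mytable_avg = result.to_table()
# Optionally save to CSV, as before
ascii.write(mytable_avg, "results_chanmaster_avg.csv", overwrite=True, format='csv')
```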