Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update example notebooks to use in-flight JWST data #1680

Merged
merged 28 commits into from
Oct 13, 2022

Conversation

rosteen
Copy link
Collaborator

@rosteen rosteen commented Sep 27, 2022

Does what it says on the tin. Opening as draft since I've only finished Specviz, Cubeviz and Imviz.

@github-actions github-actions bot added the documentation Explanation of code and concepts label Sep 27, 2022
@codecov
Copy link

codecov bot commented Sep 27, 2022

Codecov Report

Base: 87.26% // Head: 87.26% // No change to project coverage 👍

Coverage data is based on head (27f906a) compared to base (8ec968c).
Patch has no changes to coverable lines.

❗ Current head 27f906a differs from pull request most recent head 32c7e90. Consider uploading reports for the commit 32c7e90 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1680   +/-   ##
=======================================
  Coverage   87.26%   87.26%           
=======================================
  Files          95       95           
  Lines       10091    10091           
=======================================
  Hits         8806     8806           
  Misses       1285     1285           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -61,7 +64,9 @@
"metadata": {},
"outputs": [],
"source": [
"fn = download_file('https://stsci.box.com/shared/static/28a88k1qfipo4yxc4p4d40v4axtlal8y.fits', cache=True)\n",
"uri = \"mast:JWST/product/jw02732-c1001_t004_miri_ch1-longmediumshort-_s3d.fits\"\n",
"result = Observations.download_file(uri, base_url=\"https://mast.stsci.edu/api/v0.1/Download/file\")\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't think create a cache dir under the working directory? We need to add that to .gitignore.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, it just downloads the files to the working directory.

@@ -333,7 +332,7 @@
"metadata": {},
"outputs": [],
"source": [
"plt.imshow(mask.to_image(data.shape), origin='lower')"
"plt.imshow(mask.to_image(data.data.shape), origin='lower')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the extra .data?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the extra .data I get AttributeError: 'NDData' object has no attribute 'shape' with these images.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's weird. It is supposed to be glue Data object. What has changed while I was away?

Copy link
Contributor

@pllim pllim Sep 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. NDData having no ndim might be a design oversight. Even CCDData has ndim. I should ask astropy. See astropy/astropy#13775

@@ -517,7 +516,7 @@
"metadata": {},
"outputs": [],
"source": [
"viewer.marker = {'color': 'green', 'alpha': 0.8, 'markersize': 10, 'fill': False}"
"viewer.marker = {'color': 'red', 'alpha': 0.8, 'markersize': 10, 'fill': False}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the color? The point is to show a color that is different from the default (red).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was worried that green over Viridis colormap wasn't visible enough, but didn't realize when I changed this that it doesn't apply to the 50 random markers before, only the my_sky markers. I'll experiment and see what a non-red color with good visibility is.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided green was fine and reverted it.

"fn = download_file('https://stsci.box.com/shared/static/exnkul627fcuhy5akf2gswytud5tazmw.fits', cache=True)"
"uri = \"mast:JWST/product/jw01538-o160_s00004_nirspec_f170lp-g235h-s1600a1-sub2048_s2d.fits\"\n",
"result = Observations.download_file(uri, base_url=\"https://mast.stsci.edu/api/v0.1/Download/file\")\n",
"fn = os.path.join(os.path.abspath('.'), uri.rsplit('/', 1)[-1])"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explain what this is trying to do.

@rosteen rosteen marked this pull request as ready for review October 3, 2022 18:17
@rosteen rosteen added this to the 2.11 milestone Oct 3, 2022
@rosteen
Copy link
Collaborator Author

rosteen commented Oct 3, 2022

Moved to Ready for Review. @pllim I kept the ACS example in a separate notebook, ImvizDitheredExample. We still need to find good NIRSpec data to use for MosvizExample, but don't want to hold up updating the rest of them any longer.

Copy link
Member

@kecnry kecnry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed a few things with these data sets:

  • cubeviz: setting collapse function to median shows a flat line (for the entire cube) - are all JWST cubes median-normalized? If not, maybe we should pick one that isn't so selecting median actually shows something
  • mosviz: the first few entries (non clear/prism) are not displaying a 1D spectrum, which can be confusing for a new user. The image viewer is also blank.
  • mosviz niriss: beautiful image, but the x-limits on the spectra are kind of annoying... maybe we should open a ticket to try to detect and automatically handle cases with so much "padding" on the spectral axis.

@rosteen
Copy link
Collaborator Author

rosteen commented Oct 3, 2022

I forgot I had started working on MosvizExample, I need to revert that to the current version.

@camipacifici
Copy link
Contributor

  • mosviz niriss: beautiful image, but the x-limits on the spectra are kind of annoying... maybe we should open a ticket to try to detect and automatically handle cases with so much "padding" on the spectral axis.

There was a bug in a reference file that caused the padding. Let me see if those files have been reprocessed in MAST.

@rosteen
Copy link
Collaborator Author

rosteen commented Oct 6, 2022

@camipacifici I switched to using the reprocessed data, but it appears that it still has the same x-axis padding problem. @kecnry could you confirm? Pretty sure I avoided any caching issues.

@kecnry
Copy link
Member

kecnry commented Oct 6, 2022

but it appears that it still has the same x-axis padding problem.

Confirmed... and I'm also getting the following traceback when zooming on the 2d spectrum (probably out of scope for this PR, but would be nice to have fixed if we're going to make this the standard example):

AttributeError: 'NoneType' object has no attribute 'world_to_pixel'

@camipacifici
Copy link
Contributor

Ah sorry for not checking this. I guess they have not been reprocessed then.
I think we can live with this now (modulo the bug Kyle found). I would not want to delay all the other notebooks for this.

@@ -92,69 +102,61 @@
"metadata": {},
"outputs": [],
"source": [
"acs_47tuc_1 = download_file('https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/jbqf03gjq_flc.fits', cache=True)\n",
"acs_47tuc_2 = download_file('https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/jbqf03h1q_flc.fits', cache=True)"
"files = ['jw02731-o001_t017_nircam_clear-f090w_i2d.fits',\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe tell people how big these files are. I am still waiting for the first file to download and I am not sure how long it is supposed to take. 👀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To answer my own question (but should still be added to the notebook narrative, I think):

  • jw02731-o001_t017_nircam_clear-f090w_i2d.fits 3.3GB
  • jw02731-o001_t017_nircam_f444w-f470n_i2d.fits 789MB

In contrast, the HST/ACS files are around 161MB each (more or less).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this more, even though it will be less impressive, I wonder if we should stick to smaller subarrays in the example notebook... 💭

" ]\n",
"for fn in files:\n",
" uri = f\"mast:JWST/product/{fn}\"\n",
" result = Observations.download_file(uri, base_url=\"https://mast.stsci.edu/api/v0.1/Download/file\")\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe more of a question for @havok2063 -- Is there no way to have MAST cache this somewhere outside of the source checkout tree or at least let user provide the cache directory? Currently, this downloads the big files into the notebooks directory, which poses two different risks:

  1. If I do a git clean -xdf (which I do from time to time), I will have to redownload these again when I next run the notebook.
  2. If someone accidentally added these to version control (you can overwrite .gitignore ignore if you are not careful though not easy but also not impossible), we will have to do some crazy git-fu to squash it out of git history (especially if it is not caught early).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you can specify a location of the download path with local_path, and you can set a cache boolean flag. If you are downloading files from the production domain, i.e.mast.stsci.edu, then you do not need to specify the base_url keyword. See https://astroquery.readthedocs.io/en/latest/api/astroquery.mast.ObservationsClass.html#astroquery.mast.ObservationsClass.download_file.

Copy link
Contributor

@pllim pllim Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! In that case, I propose we add a local_path=Path.home() / 'mastDownloads' input to the Observations.download_file() calls. This is Windows-friendly, I checked.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@havok2063 It seems that local_path expects a filename, not a directory, contrary to the documentation. Should I report this as an issue in astroquery?

 [Failed]

WARNING: Found cached file /Users/rosteen/MAST_Downloads with size 64 that is different from expected size 603515520 [astroquery.query]

---------------------------------------------------------------------------
IsADirectoryError                         Traceback (most recent call last)
Cell In [4], line 11
      9 for fn in files:
     10     uri = f"mast:JWST/product/{fn}"
---> 11     result = Observations.download_file(uri, local_path=Path.home() / 'MAST_Downloads/', cache=True)
     12     with warnings.catch_warnings():
     13         warnings.simplefilter('ignore')

File ~/opt/anaconda3/envs/viz_dev/lib/python3.10/site-packages/astroquery/mast/observations.py:568, in ObservationsClass.download_file(self, uri, local_path, base_url, cache, cloud_only)
    565             self._download_file(data_url, local_path,
    566                                 cache=cache, head_safe=True, continuation=False)
    567 else:
--> 568     self._download_file(data_url, local_path,
    569                         cache=cache, head_safe=True, continuation=False)
    571 # check if file exists also this is where would perform md5,
    572 # and also check the filesize if the database reliably reported file sizes
    573 if (not os.path.isfile(local_path)) and (status != "SKIPPED"):

File ~/opt/anaconda3/envs/viz_dev/lib/python3.10/site-packages/astroquery/query.py:437, in BaseQuery._download_file(self, url, local_filepath, timeout, auth, continuation, cache, method, head_safe, **kwargs)
    433     progress_stream = io.StringIO()
    435 with ProgressBarOrSpinner(length, f'Downloading URL {url} to {local_filepath} ...',
    436                           file=progress_stream) as pb:
--> 437     with open(local_filepath, open_mode) as f:
    438         for block in response.iter_content(blocksize):
    439             f.write(block)

IsADirectoryError: [Errno 21] Is a directory: '/Users/rosteen/MAST_Downloads'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whoops, I missed this. Yeah I would report this as a bug in astroquery.

@@ -351,7 +353,7 @@
"metadata": {},
"outputs": [],
"source": [
"c = SkyCoord('00h24m07.33s -71d52m50.71s')\n",
"c = SkyCoord('10h36m49.211s -58d35m56.871s')\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have this actually center on some feature on the image rather than just area of blank sky.

@@ -497,8 +499,8 @@
"source": [
"# Add 100 randomly generated X,Y (0-indexed) w.r.t. reference image\n",
"# using default marker properties.\n",
"t_xy = Table({'x': np.random.randint(0, 4096, 100),\n",
" 'y': np.random.randint(0, 2048, 100)})\n",
"t_xy = Table({'x': np.random.randint(0, 14000, 50),\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might have to take the markers stuff out or delete the subsets first before doing this. It is taking too long on my machine with these new images.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduced this to 10 markers but still too slow on my machine.

"acs_47tuc_1 = download_file('https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/jbqf03gjq_flc.fits', cache=True)\n",
"acs_47tuc_2 = download_file('https://mast.stsci.edu/api/v0.1/Download/file?uri=mast:HST/product/jbqf03h1q_flc.fits', cache=True)"
"files = ['jw02731-o001_t017_nircam_clear-f090w_i2d.fits',\n",
" #'jw02731-o001_t017_nircam_clear-f187n_i2d.fits',\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we even give so many in the list? Someone is going to try to uncomment everything and their machine is going to 🔥

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, I only see small performance issues when loading and viewing all of these files. And hopefully those will be improved soon. I think it's useful to have them for anyone who wants to try, but maybe a warning saying they're large files and performance may be affected depending on system specs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm updating to use the Cartwheel images instead, which are ~600 MB instead of 3.2 GB.

@rosteen rosteen force-pushed the update-example-data branch from 0682eda to 44abc78 Compare October 12, 2022 20:03
"# Imviz Demonstration Notebook\n",
"\n",
"\n",
"This notebook demonstrates the Imviz API in the Notebook setting. UI equivalents for these actions, as well as additional documentation about Imviz, can be found here: https://jdaviz.readthedocs.io/en/latest/imviz/"
Copy link
Collaborator

@duytnguyendtn duytnguyendtn Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we want to provide a further description of why this notebook is different than the other ImvizExample/why we provide both?

@@ -395,7 +406,7 @@
"outputs": [],
"source": [
"# Center the image on given pixel position.\n",
"viewer.center_on((1173, 1013)) # X, Y (0-indexed)"
"viewer.center_on((1900, 2500)) # X, Y (0-indexed)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just running through the notebook, I'm getting an error on Imviz saying these coords are "Out of Bounds". Anyone else?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh good call, this will fail if you've blinked and have data B active because it's smaller. I'll add a note about that.

Copy link
Collaborator

@duytnguyendtn duytnguyendtn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these values need to be updated as well? At least it doesn't show anything on my screen (Sorry I couldn't get GitHub Review to link to the lines directly): https://github.com/spacetelescope/jdaviz/pull/1680/files#diff-233b3446c206e78d8c837f1c0116c834df2d3193951f6e06d9e56b1a489fd575R169-R171

@rosteen
Copy link
Collaborator Author

rosteen commented Oct 12, 2022

I think these values need to be updated as well? At least it doesn't show anything on my screen (Sorry I couldn't get GitHub Review to link to the lines directly): https://github.com/spacetelescope/jdaviz/pull/1680/files#diff-233b3446c206e78d8c837f1c0116c834df2d3193951f6e06d9e56b1a489fd575R169-R171

I can't see what you were talking about here, but I just pushed an update to a couple things to do with the second viewer - is that what you meant?

@duytnguyendtn
Copy link
Collaborator

Huh, that link didn't work. My apologies. I was trying to reference the zoom values in the SpecvizExample notebook. Currently, they zoom to a blank part of the spectra

@rosteen
Copy link
Collaborator Author

rosteen commented Oct 12, 2022

Ahh good point! Updated.

Copy link
Collaborator

@duytnguyendtn duytnguyendtn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I think this is my last review comment; thanks for your patience! Are the markers showing up for you in ImvizExample? The cell is executing, but I can't find the marks. I can't tell if Im just looking in the wrong spot, or if they're not showing at all

"import pathlib\n",
"\n",
"example_data = 'https://stsci.box.com/shared/static/9lkf5zha6zkf8ujnairy6krobbh038wt.zip'\n",
"example_data = 'https://stsci.box.com/shared/static/k94suj9yv9vdbusmcdrcodhxt9urwevr.zip'\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr -- Why not MAST download here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because the full observation has a ton of objects in the catalog and is way too slow to load. This is a version of the data made by @camipacifici that cuts down the number of objects to something manageable.

Comment on lines +513 to +514
"t_xy = Table({'x': np.random.randint(0, 4000, 20),\n",
" 'y': np.random.randint(0, 4000, 20)})\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less impressive but works better for me. Otherwise, it is still laggy and gives me IOPub errors even with glue-viz/glue-jupyter#325 installed.

Suggested change
"t_xy = Table({'x': np.random.randint(0, 4000, 20),\n",
" 'y': np.random.randint(0, 4000, 20)})\n",
"t_xy = Table({'x': np.random.randint(0, 1000, 10),\n",
" 'y': np.random.randint(0, 1000, 10)})\n",

Copy link
Contributor

@pllim pllim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let you decide if you want to accept my suggestion for add_markers or not. LGTM otherwise. Thanks!

Not a blocker but I did notice data label overrun in case we want to open new ticket about it.

Screenshot 2022-10-12 170700

Thanks for keeping this:

Screenshot 2022-10-12 170947

@rosteen
Copy link
Collaborator Author

rosteen commented Oct 12, 2022

Thanks for keeping this:

It's the most important part of the notebook!

Copy link
Member

@kecnry kecnry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this in so we can start using them and tweak if necessary later. Thanks!

@pllim
Copy link
Contributor

pllim commented Oct 13, 2022

Wait, wait, do we want a note about cache=False or not?

@rosteen
Copy link
Collaborator Author

rosteen commented Oct 13, 2022

Wait, wait, do we want a note about cache=False or not?

I'll add a note about it in the relevant places.

@rosteen
Copy link
Collaborator Author

rosteen commented Oct 13, 2022

Test failure unrelated, merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Explanation of code and concepts
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants