
JP-2096: Cube build stage3 #6093

Merged
merged 25 commits into master from cube_build_stage3
Jul 29, 2021

Conversation

jemorrison
Collaborator

@jemorrison jemorrison commented Jun 2, 2021

Description

Speeding up cube_build using numba. Numba was installed using 'pip'.
The purpose of this PR is to speed up cube_build using numba. Some of the modules in ifu_cube.py were moved out
of the class and made independent routines. Some of these routines were broken up into simpler routines. In this process the weighting by the MIRI psf option (miripsf) was removed. It is not being used, and it was getting cumbersome to support it as an option. Removing MIRI psf weighting as an option also means that the resolution reference file is no longer needed.

Including this PR in the JWST pipeline requires numba to be added as a dependency. But so far there seems to be little downside to including numba: it is fast, stable, and easy to use.

Closes #6064
Fixes JP-2096

@jemorrison
Collaborator Author

PR ready for review.

@jemorrison jemorrison requested review from jdavies-st and nden June 8, 2021 16:21
@jemorrison
Collaborator Author

See JP-2096 for information on numba and the speed tests D. Law has run with the code.

@jemorrison
Collaborator Author

Once it is approved that numba can be used in the JWST pipeline, I can make further speed improvements in the blotting routines and have them use numba as well. I am going to hold off on that until it is confirmed that numba is fine to use in the JWST pipeline.

@hbushouse hbushouse added this to the Build 7.9 milestone Jun 8, 2021
@codecov

codecov bot commented Jun 11, 2021

Codecov Report

Merging #6093 (b173565) into master (d123c28) will decrease coverage by 1.13%.
The diff coverage is 39.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #6093      +/-   ##
==========================================
- Coverage   77.69%   76.56%   -1.14%     
==========================================
  Files         402      404       +2     
  Lines       34412    35259     +847     
==========================================
+ Hits        26736    26995     +259     
- Misses       7676     8264     +588     
Flag      Coverage Δ            *Carryforward flag
nightly   77.69% <71.26%> (ø)   Carriedforward from d123c28
unit      56.07% <15.47%> (?)

*This pull request uses carry forward flags.

Impacted Files                            Coverage Δ
jwst/cube_build/cube_build.py             84.06% <ø> (+6.29%) ⬆️
setup.py                                   0.00% <ø> (ø)
jwst/cube_build/cube_internal_cal.py       7.69% <7.69%> (ø)
jwst/cube_build/blot_cube.py              22.72% <22.72%> (ø)
jwst/cube_build/ifu_cube.py               60.36% <44.80%> (-12.40%) ⬇️
jwst/cube_build/cube_build_wcs_util.py    28.39% <77.77%> (-3.49%) ⬇️
jwst/cube_build/blot_cube_build.py        76.10% <83.33%> (-23.90%) ⬇️
jwst/cube_build/cube_build_step.py        59.21% <100.00%> (-7.60%) ⬇️
... and 13 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d123c28...b173565. Read the comment docs.

@hbushouse
Collaborator

@jdavies-st or @eslavich any ideas what's killing the CI test "CI/Installed package with --pyargs"?

@eslavich
Collaborator

It looks like jwst/assign_wcs/tests/test_niriss.py::test_niriss_wfss_available_frames failed due to a reference file that didn't download correctly, which in turn caused the dreaded "cascade of open files errors". I suspect if we rerun the CI jobs they'll pass; I'll do that now.

@jdavies-st
Collaborator

If you search the log for "FAILURES" you can see it is having trouble with asdf.open() on a downloaded CRDS NIRISS WFSS reference file.

@jdavies-st
Collaborator

We should be opening those reffiles in a cleaner way, i.e. with a with context manager so that if they fail to load, we don't get a cascade of errors.

@jemorrison
Collaborator Author

@jdavies-st @nden if you have time now, could you look this over or suggest someone else to look at it? It is using numba to speed up cube_build.

@jemorrison jemorrison requested review from mcara, drlaw1558 and hbushouse and removed request for jdavies-st July 19, 2021 19:34
@jemorrison
Collaborator Author

@hbushouse @nden @mcara
This PR is ready for review. It pulled out the computational Python modules and converted them to C routines. There are now two C extensions: 1. creating IFU cubes on the sky; 2. creating IFU cubes in the detector plane.
I made a common set of C functions in cube_utils.c that both C extensions call.
As noted above in the comments, an unused weighting function - miripsf - was removed as an option.
The unit tests run and the regression tests also run (with a few differences due to improvements I made when creating the C routines).
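
For context, each new extension is a standard CPython extension module. A minimal sketch of the wiring, assuming the names from the import shown below (from .cube_match_sky import cube_wrapper); everything else here is illustrative - the real code parses the input numpy arrays and calls the shared helpers in cube_utils.c:

#include <Python.h>

static PyObject *cube_wrapper(PyObject *self, PyObject *args)
{
    /* parse the input numpy arrays from args, call the shared
       workers in cube_utils.c, and return the spaxel arrays */
    Py_RETURN_NONE;
}

static PyMethodDef cube_methods[] = {
    {"cube_wrapper", cube_wrapper, METH_VARARGS,
     "Match detector point-cloud members to IFU cube spaxels."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef cube_module = {
    PyModuleDef_HEAD_INIT, "cube_match_sky", NULL, -1, cube_methods
};

PyMODINIT_FUNC PyInit_cube_match_sky(void)
{
    return PyModule_Create(&cube_module);
}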

@drlaw1558
Collaborator

Following the usual means of checking out a PR, I'm getting errors about being unable to find the C modules. E.g.,
from .cube_match_sky import cube_wrapper
returns
No module named jwst.cube_build.cube_match_sky

I'm probably just doing something wrong - do I need to do anything special to compile the code inside the src/ directory first, or should Python do that automatically?

@jemorrison
Collaborator Author

To compile the C code: in the top jwst directory - the directory with setup.py -
type pip install -e .
That compiles the C code that is defined in setup.py.
Let me know if you have problems with that.

Comment on lines +32 to +33
index_x = np.where(xdistance <= roi_det)
index_y = np.where(ydistance <= roi_det)
Member

I am not familiar with the algorithm, but it seems like index_x and index_y, in principle, could have different lengths or point to different "pixels". Would this be an issue? Especially the different lengths in the code just below.

Collaborator Author

I am going to hold off making changes to blot_cube.py because I have another JP ticket to work just on blot_cube after I get the C extensions in this PR committed. I will come back to these suggested changes later this week.

Comment on lines +36 to +43

d1pix = x_cube[ipt] - xcenter[index_x]
d2pix = y_cube[ipt] - ycenter[index_y]

dxy = [(dx * dx + dy * dy) for dy in d2pix for dx in d1pix]
dxy = np.array(dxy)
dxy = np.sqrt(dxy)
weight_distance = np.exp(-dxy)
Member

Suggested change
d1pix = x_cube[ipt] - xcenter[index_x]
d2pix = y_cube[ipt] - ycenter[index_y]
dxy = [(dx * dx + dy * dy) for dy in d2pix for dx in d1pix]
dxy = np.array(dxy)
dxy = np.sqrt(dxy)
weight_distance = np.exp(-dxy)
weight_distance = np.exp(-np.sqrt(np.add.outer(
np.square(x_cube[ipt] - xcenter[index_x]),
np.square(y_cube[ipt] - ycenter[index_y])
).ravel()))

or, alternatively:

Suggested change
d1pix = x_cube[ipt] - xcenter[index_x]
d2pix = y_cube[ipt] - ycenter[index_y]
dxy = [(dx * dx + dy * dy) for dy in d2pix for dx in d1pix]
dxy = np.array(dxy)
dxy = np.sqrt(dxy)
weight_distance = np.exp(-dxy)
weight_distance = np.exp(-np.linalg.norm(np.meshgrid(
y_cube[ipt] - ycenter[index_y],
x_cube[ipt] - xcenter[index_x]
), axis=0).ravel())

Comment on lines +46 to +47
index2d = [iy * blot_xsize + ix for iy in index_y[0] for ix in (index_x[0] + xstart)]
index2d = np.array(index2d)
Member

Suggested change
index2d = [iy * blot_xsize + ix for iy in index_y[0] for ix in (index_x[0] + xstart)]
index2d = np.array(index2d)
index2d = np.add.outer(index_y[0] * blot_xsize, index_x[0] + xstart).ravel()

ts1 = time.time()
log.debug(f"Time to map 1 slice = {ts1-ts0:.1f}")
log.debug(f"Time to blot 1 slice on NIRspec = {ts1-ts0:.1f}")
Member

Is timing for one slice relevant even for debugging purposes?

Collaborator Author

It is for NIRSpec. It can take several seconds per slice. Once we get it faster I will remove the debug timing.

index2d = [iy * blot_xsize + ix for iy in index_y[0] for ix in (index_x[0])]
blot_flux[index2d] = blot_flux[index2d] + weighted_flux
blot_weight[index2d] = blot_weight[index2d] + weight_distance
blot_cube.blot_overlap(ipt, xstart,
Member

Maybe just set xstart to 0?

Collaborator Author

holding off on blot changes - I have opened a separate ticket on blotting
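
For reference, the accumulation that blot_cube.blot_overlap moves into C follows the same flattening as the Python above. A hedged sketch - the argument names here are illustrative, not the actual signature:

/* Accumulate one detector point into the blotted image.
   k is the flattened 2-D index iy * blot_xsize + ix, exactly as in
   the Python list comprehension; m walks the matching weight/flux
   vectors in the same dy-outer, dx-inner order. */
void blot_overlap_sketch(long nx_match, long ny_match,
                         const long *index_x, const long *index_y,
                         long xstart, long blot_xsize,
                         const double *weight_distance,
                         const double *weighted_flux,
                         double *blot_flux, double *blot_weight)
{
    long ix, iy, k, m;
    for (iy = 0; iy < ny_match; iy++) {
        for (ix = 0; ix < nx_match; ix++) {
            k = index_y[iy] * blot_xsize + index_x[ix] + xstart;
            m = iy * nx_match + ix;
            blot_flux[k]   += weighted_flux[m];
            blot_weight[k] += weight_distance[m];
        }
    }
}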

Comment on lines 650 to 657
int *idqv = NULL; // int vector for spaxel

if (mem_alloc_dq(ncube, &idqv)) return 1;

// Set all data to zero
for (long i = 0; i < ncube; i++){
idqv[i] = 0;
}
Member

Suggested change
int *idqv = NULL; // int vector for spaxel
if (mem_alloc_dq(ncube, &idqv)) return 1;
// Set all data to zero
for (long i = 0; i < ncube; i++){
idqv[i] = 0;
}
int *idqv; // int vector for spaxel
if (mem_alloc_dq(ncube, &idqv)) return 1;

Collaborator Author

done
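
The suggested simplification assumes mem_alloc_dq hands back zero-initialized memory. A minimal sketch of such an allocator - hypothetical, the real one lives in cube_utils.c - built on calloc, which zero-fills the block and makes the explicit loop redundant:

#include <stdlib.h>

static int mem_alloc_dq(long ncube, int **idqv)
{
    /* calloc zero-fills, so callers need no initialization loop */
    *idqv = (int *)calloc(ncube, sizeof(int));
    return (*idqv == NULL) ? 1 : 0;   /* non-zero signals failure, as the callers expect */
}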

Comment on lines 748 to 755
int *idqv = NULL; // int vector for spaxel

if (mem_alloc_dq(ncube, &idqv)) return 1;

// Set all data to zero
for (long i = 0; i < ncube; i++){
idqv[i] = 0;
}
Member

Suggested change
int *idqv = NULL; // int vector for spaxel
if (mem_alloc_dq(ncube, &idqv)) return 1;
// Set all data to zero
for (long i = 0; i < ncube; i++){
idqv[i] = 0;
}
int *idqv; // int vector for spaxel
if (mem_alloc_dq(ncube, &idqv)) return 1;

Collaborator Author

done

double c2_min;
double c1_max;
double c2_max;
int status = 0;
Member

Suggested change
int status = 0;
int status;

Comment on lines 820 to 833
double *fluxv = NULL, *weightv=NULL, *varv=NULL ; // vectors for spaxel
double *ifluxv = NULL; // vector for spaxel

// allocate memory to hold output
if (mem_alloc(ncube, &fluxv, &weightv, &varv, &ifluxv)) return 1;

double set_zero=0.0;
// Set all data to zero
for (int i = 0; i < ncube; i++){
varv[i] = set_zero;
fluxv[i] = set_zero;
ifluxv[i] = set_zero;
weightv[i] = set_zero;
}
Member

This can be simplified as in cube_match_internal.c if using alloc_flux_arrays.

Collaborator Author

good suggestion. I moved alloc_flux_arrays to cube_utils.c

Comment on lines 991 to 1000
if (mem_alloc(ncube, &fluxv, &weightv, &varv, &ifluxv)) return 1;

double set_zero=0.0;
// Set all data to zero
for (int i = 0; i < ncube; i++){
varv[i] = set_zero;
fluxv[i] = set_zero;
ifluxv[i] = set_zero;
weightv[i] = set_zero;
}
Member

Again this can be simplified as in cube_match_internal.c if using alloc_flux_arrays.

Collaborator Author

done

@@ -0,0 +1,484 @@
/*
The detector pixels are represented by a 'point could' on the sky. The IFU cube is
Collaborator

typo: "could" -> "cloud"

Collaborator Author

fixed

/*
The detector pixels are represented by a 'point could' on the sky. The IFU cube is
represented by a 3-D regular grid. This module finds the point cloud members contained
in a region centered on the center of the cube spaxel. The size of the spaxel is spatial
Collaborator

typo? Do you mean "size of the spaxel in spatial coords"?

Collaborator Author

fixed

The detector pixels are represented by a 'point could' on the sky. The IFU cube is
represented by a 3-D regular grid. This module finds the point cloud members contained
in a region centered on the center of the cube spaxel. The size of the spaxel is spatial
coordinates is cdetl1 and cdelt2, while the wavelength size is zcdelt3.
Collaborator

typo? should "zcdelt3" be just "cdelt3"?

Collaborator

Oh, and "cdetl1" should be "cdelt1"

Collaborator Author

fixed

represented by a 3-D regular grid. This module finds the point cloud members contained
in a region centered on the center of the cube spaxel. The size of the spaxel is spatial
coordinates is cdetl1 and cdelt2, while the wavelength size is zcdelt3.
This module uses the e modified shephard weighting method to determine how to weight each point clold member
Collaborator

do you really mean "the e modified shepard weighting" or is the "e" extraneous? And another instance of "clold" that should be "cloud".
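
For reference, the "e" evidently stands for the exponential variant of the modified Shepard method (the commit log below calls it emsm): each point-cloud member inside the region of interest is weighted by its distance from the spaxel center, and the spaxel value is the normalized weighted sum. A sketch of the weighting, matching the exp(-distance) form in the blot snippet earlier in this conversation (the cube-building code may fold in a scale factor):

d_i = \sqrt{\Delta x_i^2 + \Delta y_i^2}, \qquad w_i = e^{-d_i}, \qquad
f_{\mathrm{spaxel}} = \frac{\sum_i w_i\, f_i}{\sum_i w_i}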

@@ -0,0 +1,1441 @@
/*
The detector pixels are represented by a 'point could' on the sky. The IFU cube is
Collaborator

"could" -> "cloud"

The detector pixels are represented by a 'point could' on the sky. The IFU cube is
represented by a 3-D regular grid. This module finds the point cloud members contained
in a region centered on the center of the cube spaxel. The size of the spaxel is spatial
coordinates is cdetl1 and cdelt2, while the wavelength size is zcdelt3.
Collaborator

All the same typos as the same paragraph in the previous module (cdetl1, zcdelt3, ...)

Collaborator Author

fixed typos

setup.cfg Outdated
@@ -35,6 +35,7 @@ install_requires =
gwcs>=0.16.1
jsonschema>=3.0.2
numpy>=1.17
numba>=0.50.0
Collaborator

Now that we're using C extensions instead of numba, can this be removed?

Collaborator Author

Yes, I forgot I put that in.

@jemorrison
Collaborator Author

@jdavies-st @eslavich
The PR failed on "CI/Installed package with --pyargs".
What does that mean? I looked at the details but I am a bit lost on how to fix it.

@jdavies-st
Collaborator

It looks like it was getting a corrupted CRDS reference file. I kicked off the CI again. Hopefully it won't be corrupted this time.

@jemorrison
Collaborator Author

I made the changes suggested in the review. I reran the regression tests and they look good. There are some expected differences because I improved the DQ flags for edge cases and I made a few other little improvements.
I think this is ready to be merged.

@jdavies-st
Collaborator

FYI the doc build is failing because the new C module cannot be imported for the doc build. @eslavich do you recall what the issue was for C extensions and doc builds?

@hbushouse
Collaborator

We made a slight change in docs/conf.py or something like that to remove a line that was causing it to look for things in a parent directory. Look at https://github.com/spacetelescope/jwst/pull/6207/files

@drlaw1558
Collaborator

I'm seeing some larger changes in the data cubes than I would have expected, given that this just changes the language, not the algorithm; let me investigate further and get back to you. Did anything change that should have affected the total cube FOV?

@jemorrison
Collaborator Author

jemorrison commented Jul 22, 2021

@drlaw1558 I tweaked the DQ flagging, hopefully improving it near boundaries. I also tweaked - for MIRI - how the wavelength range used to build the cube is determined. There was a small bug in the old code.

@eslavich
Collaborator

We merged a fix for the doc build failure in #6230, so a rebase ought to get that unstuck.

@drlaw1558
Collaborator

Ok, disregard my last comment - the changes that I was seeing were due to something changing earlier in the pipeline (that I'll have to run down elsewhere), not cube build. Performance looks good to me; when the test is set up properly, SCI results from running spec3 in multiple different modes look identical before/after this change, but with vastly improved runtimes.

Collaborator

@drlaw1558 drlaw1558 left a comment

Focusing entirely on results, the performance of this PR looks good to me. Running some test cases with dithered exposures in multiple bands through a variety of different kinds of cube building, I get identical SCI results before/after this change, but with vastly improved runtimes. The DQ arrays look improved.

@jemorrison
Collaborator Author

@hbushouse can we merge this PR?

@hbushouse hbushouse merged commit 7a8738b into spacetelescope:master Jul 29, 2021

// loop over each valid point on detector and find match to IFU cube based
// on along slice coordinate and wavelength
for (int ipixel= 0; ipixel< npt; ipixel++){
Collaborator

@jemorrison I believe you need to declare the variables at the beginning of the function, as certain compilers won't accept a definition within the for loop.
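
For older C dialects (C89/C90, which some compilers still apply to C sources), declarations must sit at the top of a block rather than inside the for statement. A minimal sketch of the portable form for the loop above:

int ipixel;   /* declared at the top of the enclosing block */

/* loop over each valid point on detector and find match to IFU cube
   based on along-slice coordinate and wavelength */
for (ipixel = 0; ipixel < npt; ipixel++) {
    /* loop body unchanged */
}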

double wave_min = 10000;
double along_max = -10000;
double wave_max = -10000;
for (int j = 0; j< 4; j++){
Collaborator

Same comment as above


int nplane = naxis1 * naxis2;
// loop over possible overlapping cube pixels
for(int zz =iz1; zz < iz2+1; zz++){
Collaborator

and here

for(int zz =iz1; zz < iz2+1; zz++){
double zcenter = zcoord[zz];
int istart = zz * nplane;
for (int aa= ia1; aa< ia2 + 1; aa++){
Collaborator

and here

@jemorrison
Collaborator Author

@nden
As I was moving all the definitions to the top of the code, I saw that I am doing this:

nxy = nx * ny
int wave_slice_dq[nxy]

The code works fine on my Mac, but I am now thinking this may not be allowed - dynamically allocating the array on the stack. Should I change how I do this?
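
A note on the snippet above: int wave_slice_dq[nxy] declares a C99 variable-length array, which some compilers (notably MSVC) do not support. Heap allocation is the portable alternative; a minimal sketch using the names from the comment above:

#include <stdlib.h>

int *wave_slice_dq = (int *)calloc(nxy, sizeof(int));  /* zero-filled */
if (wave_slice_dq == NULL) return 1;   /* allocation failure, as in mem_alloc_dq */
/* ... fill and use wave_slice_dq ... */
free(wave_slice_dq);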

@jemorrison jemorrison deleted the cube_build_stage3 branch August 2, 2021 17:55
@jdavies-st jdavies-st modified the milestones: Build 7.9, Build 7.8.2 Sep 2, 2021
@jdavies-st jdavies-st changed the title Cube build stage3 - JP-2096 JP-2096: Cube build stage3 Sep 2, 2021
jdavies-st pushed a commit that referenced this pull request Sep 3, 2021
* updates using numba jit

* flake 8 fixes

* a few numba updates

* updates to support internal_cal and numba

* fix test - removing unused resolution file

* improved blotting speed using numba

* added c code for emsm

* fixed setup.py  to compile match_det_cube

* updates to c code

* some changes to c python interface

* more fixes to c code

* added cube_match_internal and pulled common c routines to cube_utils.c

* Clean up

* remove cube_cloud.py

* added weighting=msm as possibility for c extension cube weighting

* removed declaration of numba from routine

* fix typo

* flake8 fix

* remove printf from c code

* remove print in ifu_cube.py

* typo in cube_match_sky.c

* changes after review

* fix alloc arrays def

* Updated change log

* remove print statement

(cherry picked from commit 7a8738b)
loicalbert pushed a commit to talensgj/jwst that referenced this pull request Nov 5, 2021

Successfully merging this pull request may close these issues.

Move computationally intensive portion of cube build to C or use numba (stage 2 of cube_build improvements)