BUG: Fix/test SparseSeries/SparseDataFrame stack/unstack #16616

kernc · 2017-06-06T19:26:36Z

fixes stack() method of SparseDataFrame should return a SparseSeries and optimize memory usage #15045, closes SparseSeries.unstack() raises error #16614
tests added / passed
passes git diff upstream/master --name-only -- '*.py' | flake8 --diff
whatsnew entry

codecov · 2017-06-07T11:05:30Z

Codecov Report

Merging #16616 into master will decrease coverage by 0.01%.
The diff coverage is 94.44%.

@@            Coverage Diff             @@
##           master   #16616      +/-   ##
==========================================
- Coverage   91.26%   91.24%   -0.02%     
==========================================
  Files         163      163              
  Lines       49776    49761      -15     
==========================================
- Hits        45426    45405      -21     
- Misses       4350     4356       +6

Flag	Coverage Δ
#multiple	`89.04% <94.44%> (ø)`	⬆️
#single	`40.33% <42.59%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/categorical.py	`95.29% <100%> (+0.02%)`	⬆️
pandas/core/reshape/reshape.py	`99.27% <100%> (-0.02%)`	⬇️
pandas/core/internals.py	`94.37% <92.3%> (+0.08%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/accessor.py	`93.65% <0%> (-6.35%)`	⬇️
pandas/core/tools/datetimes.py	`83.79% <0%> (-1.42%)`	⬇️
pandas/core/indexes/interval.py	`92.85% <0%> (-0.72%)`	⬇️
pandas/core/frame.py	`97.73% <0%> (-0.15%)`	⬇️
pandas/core/indexes/multi.py	`96.9% <0%> (ø)`	⬆️
pandas/io/excel.py	`80.37% <0%> (ø)`	⬆️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d43aba8...cef5f47. Read the comment docs.

jreback

logic looks fine. see if you can move this to internals to follow existing style.

jreback · 2017-06-08T10:57:14Z

doc/source/whatsnew/v0.20.2.txt

@@ -52,6 +52,9 @@ Bug Fixes
 - Bug in :func:`cut` when ``labels`` are set, resulting in incorrect label ordering (:issue:`16459`)
 - Fixed a compatibility issue with IPython 6.0's tab completion showing deprecation warnings on ``Categoricals`` (:issue:`16409`)

+- Bug in ``SparseSeries.unstack()`` and ``SparseDataFrame.stack()`` (:issue:`16614`, :issue:`15045`)


move to 0.21.0

can you change to use :func:SparseSeries.unstack (etc)

jreback · 2017-06-08T11:32:21Z

pandas/core/reshape/reshape.py

-        result = DataFrame(BlockManager(new_blocks, new_axes))
-        mask_frame = DataFrame(BlockManager(mask_blocks, new_axes))
-        return result.loc[:, mask_frame.sum(0) > 0]
+            # BlockManager can't handle SparseBlocks with multiple items,


I would like to move this logic to a function in internals. e.g. you define BlockManager.unstack, then you have an unstack on Block. This way the override logic is pretty straightforward as its done on a block basis, and this is setup so that you can return multiple blocks (in case of sparse) from these routines.

you can pass the _Unstacker as a partial function, and so the entire loop looks something like

obj._data.unstack(_Unstacker)

then call the _Unstacker inside the block.unstack() method.

jreback · 2017-06-30T20:42:47Z

doc/source/whatsnew/v0.20.2.txt

@@ -52,6 +52,9 @@ Bug Fixes
 - Bug in :func:`cut` when ``labels`` are set, resulting in incorrect label ordering (:issue:`16459`)
 - Fixed a compatibility issue with IPython 6.0's tab completion showing deprecation warnings on ``Categoricals`` (:issue:`16409`)

+- Bug in ``SparseSeries.unstack()`` and ``SparseDataFrame.stack()`` (:issue:`16614`, :issue:`15045`)


can you change to use :func:SparseSeries.unstack (etc)

jreback · 2017-06-30T20:43:12Z

pandas/core/reshape/reshape.py

        if values.ndim == 1:
            if isinstance(values, Categorical):
                self.is_categorical = values
                values = np.array(values)
+            elif self.is_sparse:
+                # XXX: Makes SparseArray *dense*, but it's supposedly


is this a TODO? or a comment?

A comment with a mild TODO hint at whomever will be refactoring the whole thing eventually to take the note into consideration.

jreback · 2017-06-30T20:44:18Z

pandas/tests/test_multilevel.py

@@ -2381,6 +2381,29 @@ def test_iloc_mi(self):
        tm.assert_frame_equal(result, expected)


+class TestSparse(object):


can you use more pytest style here

I have. But with pytest style promoting flat, module-level functions only, the module namespace will get polluted fast and often, so the tests modules will likely benefit from becoming more fine grained.

jreback · 2017-07-12T23:45:03Z

pandas/core/categorical.py

@@ -127,6 +127,8 @@ def maybe_to_categorical(array):
    """ coerce to a categorical if a series is given """
    if isinstance(array, (ABCSeries, ABCCategoricalIndex)):
        return array._values
+    elif isinstance(array, np.ndarray):


how is this hit?

in NonConsolidatingBlock._unstack() calling .make_block_same_class(), making a CategoricalBlock from unstacked values (an ndarray). Otherwise fails frame.test_reshape.TestDataFrameReshape.test_unstack_preserve_dtypes.
Since the function is named maybe_to_categorical and accepts argument array, the change seems like making perfect sense.

can you add a comment to the doc-string that this is only an internal method.

May I prefix it with an underscore? It's used in a single place only.

jreback · 2017-07-12T23:47:50Z

pandas/tests/test_multilevel.py

@@ -2381,6 +2381,39 @@ def test_iloc_mi(self):
        tm.assert_frame_equal(result, expected)


+@pytest.fixture
+def sparse_df():


move these to pandas/tests/sparse/test_reshape.py (new file); you can move other tests if appropriate (e.g. reshaping ones as well)

Any particular sparse reshaping tests you had in mind? There's one test_stack_sparse_frame in sparse.test_frame.

jreback · 2017-07-12T23:48:58Z

pandas/core/internals.py

@@ -4066,6 +4076,27 @@ def canonicalize(block):
        return all(block.equals(oblock)
                   for block, oblock in zip(self_blocks, other_blocks))

+    def unstack(self, unstacker):
+        """Return blockmanager with all blocks unstacked"""


can you expand the doc string a bit (IOW document what unstacker is supposed to be)

jreback · 2017-07-12T23:49:54Z

pandas/core/internals.py

+            new_blocks.extend(blk._unstack(new_values.T, new_placement))
+
+        bm = BlockManager(new_blocks, [new_columns, new_index])
+        bm = bm.take(mask_columns.nonzero()[0], axis=0)


what is this take for? should be inside the loop on bocks (IOW this is on each block)

jreback · 2017-07-12T23:52:03Z

pandas/core/internals.py

+        mask_columns = np.zeros_like(new_columns, dtype=bool)
+
+        for blk in self.blocks:
+            bunstacker = unstacker(


I would push almost all of the logic inside this loop into the _unstack(...) call on the block itself.

I think I managed something, but I wouldn't say it's much better now. It's still couples block unstacking with unstacked BlockManager's new_columns, and I'm not really sure how to get rid of that.

jreback · 2017-07-19T10:33:16Z

can you rebase / update according to comments

jreback

looks good!. just some comments on doc-strings and such. the more can document the better :> ping when pushed / green.

jreback · 2017-07-22T19:05:09Z

pandas/core/internals.py

@@ -1463,6 +1464,20 @@ def equals(self, other):
            return False
        return array_equivalent(self.values, other.values)

+    def _unstack(self, unstacker_t, new_columns):
+        """Return a list of unstacked blocks of self"""


can you add a doc-string here.
also name unstacker_t -> unstacker_func

jreback · 2017-07-22T19:05:31Z

pandas/core/internals.py

@@ -1706,6 +1721,22 @@ def _slice(self, slicer):
    def _try_cast_result(self, result, dtype=None):
        return result

+    def _unstack(self, unstacker_t, new_columns):
+        # NonConsolidatable blocks can have a single item only, so we return


add doc-string

jreback · 2017-07-22T19:05:54Z

pandas/core/internals.py

@@ -4161,6 +4192,34 @@ def canonicalize(block):
        return all(block.equals(oblock)
                   for block, oblock in zip(self_blocks, other_blocks))

+    def unstack(self, unstacker_t):
+        """Return a blockmanager with all blocks unstacked.


change unstacker_t as above

jreback · 2017-07-22T19:08:47Z

is this comment addressed: #15045 (comment) (ok to raise / push off to another issue as well).

jreback · 2017-07-22T19:09:52Z

actually can you add a mini reshaping section to sparse.rst and put a pointer in the whatsnew?

kernc · 2017-08-11T13:30:25Z

is this comment addressed: #15045 (comment)

It is not addressed (nor touched). Stacking Series acts as if the fill value is NaN.

actually can you add a mini reshaping section to sparse.rst

Added. What would you have me put into it?

jreback · 2017-09-23T20:28:29Z

this generally looks ok, can you rebase and i'll give a once over again.

kernc · 2017-09-23T20:32:21Z

You sure the sparse.reshaping docs section is OK? I was leaning towards removing it as methods on sparse objects should generally produce equivalent results to methods on their super counterparts, with any exceptions in fact being bugs.

jreback · 2017-09-23T20:48:09Z

You sure the sparse.reshaping docs section is OK? I was leaning towards removing it as methods on sparse objects should generally produce equivalent results to methods on their super counterparts, with any exceptions in fact being bugs.

uh, ok to remove. I hadn't really reviewed this much recently.

kernc · 2017-09-25T13:10:59Z

Rebased.

TomAugspurger · 2017-09-25T19:49:30Z

Hmm, the appveyor failure looks entirely unrelated https://ci.appveyor.com/project/pandas-dev/pandas/build/1.0.4825/job/4kpjb91d5bhe0ier#L1215

@jreback have you seen that before?

kernc · 2017-09-25T20:12:11Z

entirely unrelated

My god, that code! Thank god it broke!

jreback · 2017-09-26T13:20:43Z

@TomAugspurger yeah let's make an issue for that 'unrelated code'. no idea what's going on there.

jreback

ok just a small comment needed. ping on green.

jreback · 2017-09-26T13:24:21Z

pandas/core/categorical.py

@@ -127,6 +127,8 @@ def maybe_to_categorical(array):
    """ coerce to a categorical if a series is given """
    if isinstance(array, (ABCSeries, ABCCategoricalIndex)):
        return array._values
+    elif isinstance(array, np.ndarray):


can you add a comment to the doc-string that this is only an internal method.

jreback · 2017-09-26T19:01:48Z

thanks @kernc always appreciate the nice PRs!

…16616)

kernc force-pushed the sparse-stack-unstack branch 4 times, most recently from 88656a8 to 05aefcb Compare June 7, 2017 10:32

kernc mentioned this pull request Jun 7, 2017

pandas MultiIndex series "unstack" to scipy sparse array functionality #8048

Closed

jreback requested changes Jun 8, 2017

View reviewed changes

jreback added Bug Sparse Sparse Data Type labels Jun 8, 2017

TomAugspurger added this to the 0.21.0 milestone Jun 30, 2017

jreback requested changes Jun 30, 2017

View reviewed changes

jreback removed this from the 0.21.0 milestone Jun 30, 2017

kernc force-pushed the sparse-stack-unstack branch from 05aefcb to b7b76fd Compare July 11, 2017 16:03

jreback reviewed Jul 12, 2017

View reviewed changes

jreback requested changes Jul 12, 2017

View reviewed changes

kernc force-pushed the sparse-stack-unstack branch from 45cf34d to fcba171 Compare July 22, 2017 13:55

jreback approved these changes Jul 22, 2017

View reviewed changes

jreback added this to the 0.21.0 milestone Jul 22, 2017

kernc force-pushed the sparse-stack-unstack branch from fcba171 to d235492 Compare August 11, 2017 13:29

kernc mentioned this pull request Aug 11, 2017

stack() method of SparseDataFrame should return a SparseSeries and optimize memory usage #15045

Closed

kernc added 2 commits September 23, 2017 22:55

BUG: Fix/test SparseSeries/SparseDataFrame stack/unstack

d4618d3

fix test_reshape.TestDataFrameReshape.test_unstack_preserve_dtypes

9ae1c10

kernc added 5 commits September 23, 2017 22:55

Move .unstack() logic onto BlockManager and Block

f621ea6

replace tests with pytest style

369e315

TST: move sparse stack/unstacking tests to sparse/test_reshape

a27c047

Move more code from BlockManager.unstack to Block._unstack

0acb6f3

Extend docstrings

8aad74c

kernc force-pushed the sparse-stack-unstack branch from a36c602 to 8aad74c Compare September 23, 2017 20:56

jreback approved these changes Sep 26, 2017

View reviewed changes

Mark _maybe_to_categorical() as internal

cef5f47

jreback merged commit d3be81a into pandas-dev:master Sep 26, 2017

alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017

BUG: Fix/test SparseSeries/SparseDataFrame stack/unstack (pandas-dev#…

183fd52

…16616)

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

BUG: Fix/test SparseSeries/SparseDataFrame stack/unstack (pandas-dev#…

a4b838b

…16616)

		@@ -2381,6 +2381,29 @@ def test_iloc_mi(self):
		tm.assert_frame_equal(result, expected)


		class TestSparse(object):

BUG: Fix/test SparseSeries/SparseDataFrame stack/unstack #16616

BUG: Fix/test SparseSeries/SparseDataFrame stack/unstack #16616

Conversation

kernc commented Jun 6, 2017

codecov bot commented Jun 7, 2017 • edited Loading

Codecov Report

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kernc Jul 11, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Jul 19, 2017

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Jul 22, 2017

jreback commented Jul 22, 2017

kernc commented Aug 11, 2017

jreback commented Sep 23, 2017

kernc commented Sep 23, 2017 • edited Loading

jreback commented Sep 23, 2017

kernc commented Sep 25, 2017

TomAugspurger commented Sep 25, 2017

kernc commented Sep 25, 2017

jreback commented Sep 26, 2017

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Sep 26, 2017

codecov bot commented Jun 7, 2017 •

edited

Loading

kernc Jul 11, 2017 •

edited

Loading

kernc commented Sep 23, 2017 •

edited

Loading