ENH: revamp null count suppression for large frames in df.info() #5974

Merged: 1 commit merged into pandas-dev:master on Jan 16, 2014

ghost commented Jan 16, 2014

#5550 deprecated options.display.max_info_rows, but df.info() is still there
for the user to invoke, and the null count can be very slow on large frames.

Un-deprecate the option and revamp df.info() to do the right thing.

Now that @cpcloud has added per-column dtypes, df.info() will always show them
and just suppress the counts if needed; previously, if max_info_rows was
exceeded, it didn't even print the column names.

In [4]: df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1000000 entries, C00410118 to C00431445
Data columns (total 18 columns):
cmte_id              1000000 non-null object
cand_id              1000000 non-null object
cand_nm              1000000 non-null object
contbr_nm            999975 non-null object
contbr_city          1000000 non-null object
contbr_st            999850 non-null object
contbr_zip           992087 non-null object
contbr_employer      994533 non-null object
contbr_occupation    1000000 non-null float64
contb_receipt_amt    1000000 non-null object
contb_receipt_dt     18038 non-null object
receipt_desc         383960 non-null object
memo_cd              391290 non-null object
memo_text            1000000 non-null object
form_tp              1000000 non-null int64
file_num             1000000 non-null object
tran_id              999998 non-null object
election_tp          0 non-null float64
dtypes: float64(2), int64(1), object(15)

In [5]: pd.options.display.max_info_rows
Out[5]: 1690785

In [6]: pd.options.display.max_info_rows=999999

In [7]: df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1000000 entries, C00410118 to C00431445
Data columns (total 18 columns):
cmte_id              object
cand_id              object
cand_nm              object
contbr_nm            object
contbr_city          object
contbr_st            object
contbr_zip           object
contbr_employer      object
contbr_occupation    float64
contb_receipt_amt    object
contb_receipt_dt     object
receipt_desc         object
memo_cd              object
memo_text            object
form_tp              int64
file_num             object
tran_id              object
election_tp          float64
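The rule shown in the two sessions above can be sketched in plain Python. This is a hypothetical helper, not the pandas implementation: info_column_lines and the (name, dtype, non_null) tuples are assumptions made for illustration, and it assumes counts are dropped once the row count exceeds max_info_rows, which is consistent with the sessions (1,000,000 rows with a limit of 1,690,785 shows counts; a limit of 999,999 hides them).

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-column summary: (column name, dtype name, non-null count).
Column = Tuple[str, str, int]


def info_column_lines(
    columns: Sequence[Column], n_rows: int, max_info_rows: Optional[int]
) -> List[str]:
    """Render one info()-style line per column (illustrative sketch only).

    Names and dtypes are always listed; the non-null counts, which are the
    expensive part on a large frame, are dropped when n_rows exceeds
    max_info_rows. None is treated as "no limit" (an assumption).
    In a real implementation the point is to skip computing the counts
    entirely; here they are passed in precomputed to keep the sketch small.
    """
    show_counts = max_info_rows is None or n_rows <= max_info_rows
    # Pad names to the longest one plus four spaces, like the output above.
    width = max((len(name) for name, _, _ in columns), default=0) + 4
    lines = []
    for name, dtype, non_null in columns:
        if show_counts:
            lines.append(f"{name:<{width}}{non_null} non-null {dtype}")
        else:
            lines.append(f"{name:<{width}}{dtype}")
    return lines


if __name__ == "__main__":
    cols = [("cmte_id", "object", 1000000), ("election_tp", "float64", 0)]
    for limit in (1690785, 999999):
        print(f"max_info_rows={limit}")
        print("\n".join(info_column_lines(cols, 1000000, limit)))
```

Passing None for the limit always shows the counts, which mirrors raising the option far above any frame size.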

ghost (Author) commented Jan 16, 2014

Can I get a sanity check from someone?

jreback (Contributor) commented Jan 16, 2014

I actually like the non-count

Maybe as a percentage after the dtype?

object (0% not null)
float64 (99.5% not null)
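A minimal sketch of the proposed percentage format, assuming the share is rounded to one decimal place. dtype_with_percentage is a hypothetical helper, not a pandas function, and the "(no rows)" wording for an empty frame is likewise an assumption. Note that one-decimal rounding shows a column with 150 nulls out of 1,000,000 (contbr_st in the session above) as 100.0%, which is one argument for keeping the exact count too.

```python
def dtype_with_percentage(dtype: str, non_null: int, n_rows: int) -> str:
    """Format a dtype with its non-null share, e.g. 'float64 (99.5% not null)'.

    Hypothetical helper following the format proposed above; the share is
    rounded to one decimal place.
    """
    if n_rows == 0:
        # Assumption: an empty frame has no meaningful share, so say so.
        return f"{dtype} (no rows)"
    pct = 100.0 * non_null / n_rows
    return f"{dtype} ({pct:.1f}% not null)"


if __name__ == "__main__":
    # Counts taken from the 1,000,000-row session earlier in this thread.
    print(dtype_with_percentage("object", 18038, 1000000))  # object (1.8% not null)
    print(dtype_with_percentage("float64", 0, 1000000))  # float64 (0.0% not null)
```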

ghost (Author) commented Jan 16, 2014

Unrelated, but sure. We should probably have both: a percentage makes sense for large frames,
while the exact count is meaningful in other situations. I'll merge, and you're welcome to adapt it further.

ghost pushed a commit that referenced this pull request Jan 16, 2014:
ENH: revamp null count suppression for large frames in df.info()
ghost merged commit 63ca307 into pandas-dev:master Jan 16, 2014
ghost deleted the PR_info_max_info_rows branch January 16, 2014 18:06

jreback (Contributor) commented Jan 16, 2014

@y-p there's a test error in:
pandas/tests/test_frame.py(6047)test_info_wide()

ghost (Author) commented Jan 16, 2014

working on it. hasty me.

jreback (Contributor) commented Jan 16, 2014

There is an error with dups on py3... I'll fix it in a PR I am working on.

jreback (Contributor) commented Jan 16, 2014

jreback@ba0061c

jreback (Contributor) commented Jan 16, 2014

@y-p didn't realize you already fixed this!

ghost (Author) commented Jan 16, 2014

I broke it, I fixed it.
Feel free to tell me off whenever I break the build. Bad habit.

jreback (Contributor) commented Jan 16, 2014

haha!

jreback (Contributor) commented Jan 30, 2014

Maybe add a little blurb in whatsnew about this, and how to turn the null counts back on?
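A minimal sketch of turning the counts back on, assuming a pandas version in which display.max_info_rows still decides whether df.info() prints them (as in the session at the top of this thread): keep the option above the frame's row count. The toy frame and the render_info helper are illustrative only; pd.option_context scopes the change to the with block instead of setting it globally as in In [6].

```python
import io

import pandas as pd

# Toy frame for illustration: 3 rows, one null in each column.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})


def render_info(frame: pd.DataFrame) -> str:
    """Capture df.info() output as a string instead of printing it."""
    buf = io.StringIO()
    frame.info(buf=buf)
    return buf.getvalue()


# Row count (3) exceeds the limit (2): info() lists columns and dtypes only.
with pd.option_context("display.max_info_rows", 2):
    without_counts = render_info(df)

# Limit above the row count: the non-null counts come back.
with pd.option_context("display.max_info_rows", len(df) + 1):
    with_counts = render_info(df)

print(with_counts)
```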

ghost (Author) commented Jan 30, 2014

I did include an entry in release.rst. 08770c1 adds it to whatsnew as well.

Anything else needed?

jreback (Contributor) commented Jan 30, 2014

Looks good. It's just a change that users will see up front, so I want it to be visible.

ghost (Author) commented Jan 30, 2014

good thinking. You're right.

This pull request was closed.