BUG: to_latex: Problems with MultiIndex / Mixed Index #16718

wlter · 2017-06-18T14:45:41Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np


df = pd.DataFrame()


for val0 in [3,2,1]:
  for val1 in range(val0):
    r0,r1 = np.random.uniform(size = 2)
    df = df.append({"val0" : val0,\
                          "val1":val1, 
                          "r0":r0,\
                          "r1":r1}, ignore_index=True)

print(df.to_latex())
print(df.to_latex(columns = ["r0","r1"],header = ["R0","R1"]))
  
df.set_index(['val0','val1'], inplace=True)

print(df.to_latex())
print(df.to_latex(columns = ["r0","r1"],header = ["R0","R1"]))

produces:

\begin{tabular}{lrrrr}
	\toprule
	{} &        r0 &        r1 &  val0 &  val1 \\
	\midrule
	0 &  0.631918 &  0.445245 &   3.0 &   0.0 \\
	1 &  0.418763 &  0.011685 &   3.0 &   1.0 \\
	2 &  0.742899 &  0.384768 &   3.0 &   2.0 \\
	3 &  0.275884 &  0.849828 &   2.0 &   0.0 \\
	4 &  0.371853 &  0.782338 &   2.0 &   1.0 \\
	5 &  0.763312 &  0.358247 &   1.0 &   0.0 \\
	\bottomrule
\end{tabular}

\begin{tabular}{lrr}
	\toprule
	{} &        R0 &        R1 \\
	\midrule
	0 &  0.631918 &  0.445245 \\
	1 &  0.418763 &  0.011685 \\
	2 &  0.742899 &  0.384768 \\
	3 &  0.275884 &  0.849828 \\
	4 &  0.371853 &  0.782338 \\
	5 &  0.763312 &  0.358247 \\
	\bottomrule
\end{tabular}

\begin{tabular}{llrr}
	\toprule
	&     &        r0 &        r1 \\
	val0 & val1 &           &           \\
	\midrule
	3.0 & 0.0 &  0.631918 &  0.445245 \\
	& 1.0 &  0.418763 &  0.011685 \\
	& 2.0 &  0.742899 &  0.384768 \\
	2.0 & 0.0 &  0.275884 &  0.849828 \\
	& 1.0 &  0.371853 &  0.782338 \\
	1.0 & 0.0 &  0.763312 &  0.358247 \\
	\bottomrule
\end{tabular}

\begin{tabular}{llrr}
	\toprule
	&     &        R0 &        R1 \\
	val0 & val1 &  0.631918 &  0.445245 \\
	\midrule
	3.0 & 0.0 &  0.418763 &  0.011685 \\
	& 1.0 &  0.742899 &  0.384768 \\
	& 2.0 &  0.275884 &  0.849828 \\
	2.0 & 0.0 &  0.371853 &  0.782338 \\
	& 1.0 &  0.763312 &  0.358247 \\
	\bottomrule
\end{tabular}

Problem description

Hi,

I just found out about the to_latex functionality. Thumbs up for that! Unfortunately I seem to have stumbled upon a bug, or I'm not getting MultiIndexes right.

I'd like to use to_latex with a MultiIndex DataFrame, such that the hierarchical parameters don't repeat in the rows. The example code produces four tabular blocks of LaTeX code. The first two, without the MultiIndex, seem okay. The 3rd, with "multiindex = True" starts to introduce an unnecessary shift in the header names. The multi-index headers are a level below the others. When using the "header" option with this (4th block), the data gets shifted and the last row of the multiindex is missing completely.

It actually seems to be a problem with index mixing. If the index is created over all columns (df.set_index(['val0','val1',"r0","r1"], inplace=True)), then the table is complete again. There is, however, one more blank row above all header names.

So, this might be a bug... but I'd also accept that incomplete multi-indices are just not the way to go...

Expected Output

all header names on the same level
multiindex and non-multiindex results not shifted
all parameters of the multiindex visible

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 4 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: en
LOCALE: None.None

pandas: 0.20.2
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

toobaz · 2018-03-06T00:21:19Z

> The 3rd, with "multiindex = True" starts to introduce an unnecessary shift in the header names. The multi-index headers are a level below the others.

I think this is a feature, not a bug. Not just because it looks clearer (otherwise you wouldn't distinguish index levels from columns), but also for coherence with the case in which columns also have a name.
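The coherence argument can be illustrated with a minimal sketch. It assumes a recent pandas and uses to_string rather than to_latex only to keep the example free of extra dependencies; that both lay out the header rows the same way is an assumption here. The data values and the column name "metric" are made up for illustration.

```python
import pandas as pd

# Small deterministic frame with a two-level row index (values are made up).
df = pd.DataFrame(
    {
        "val0": [3, 3, 3, 2, 2, 1],
        "val1": [0, 1, 2, 0, 1, 0],
        "r0": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        "r1": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    }
).set_index(["val0", "val1"])

# Without a name on the columns, the header cell above the index is empty.
unnamed_lines = df.to_string().splitlines()

# With a name on the columns, that name occupies the first header row,
# so the index level names need a row of their own.
named = df.copy()
named.columns.name = "metric"  # hypothetical name, for illustration only
named_lines = named.to_string().splitlines()

print("\n".join(unnamed_lines[:2]))
print("\n".join(named_lines[:2]))
```

In both cases the index level names sit on the second header row, which is why putting them on the same row as the column labels would collide with a columns name.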

> When using the "header" option with this (4th block), the data gets shifted and the last row of the multiindex is missing completely.

This clearly looks like a bug.
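On affected versions, one possible workaround (a sketch that nobody in this thread proposed, so treat it as an assumption) is to rename the columns before formatting instead of passing header=[...]. It is shown with to_string to avoid extra dependencies; the data values are made up.

```python
import pandas as pd

# Small deterministic frame with a two-level row index (values are made up).
df = pd.DataFrame(
    {
        "val0": [3, 3, 3, 2, 2, 1],
        "val1": [0, 1, 2, 0, 1, 0],
        "r0": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        "r1": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    }
).set_index(["val0", "val1"])

# Rename the columns up front instead of passing header=[...] to the formatter,
# so the formatter only ever sees ordinary column labels.
renamed = df.rename(columns={"r0": "R0", "r1": "R1"})

lines = renamed.to_string().splitlines()
print(renamed.to_string())
```

The same renamed frame can then be passed to to_latex without the header argument.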

sgsaenger · 2018-12-03T10:59:06Z

@toobaz correctly explained the third case.

The 4th case is indeed a bug, but it is caused in DataFrameFormatter._to_str_columns, which is the source of the initial string representation. Compare .to_string, which also uses the _to_str_columns method and shows similar behavior:

print(df.to_string(header=['R0','R1']))

                 R0        R1
val0 val1  0.872907  0.932874
3.0  0.0   0.729201  0.343559
     1.0   0.819604  0.447002
     2.0   0.419192  0.076606
2.0  0.0   0.955415  0.317518
     1.0   0.141231  0.875158
1.0  0.0
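The shifted output above can be turned into a small check. This is a minimal sketch assuming a pandas release that already contains the fix (the whatsnew entry was added for v0.25.0); on an affected version the line count and the last row would differ. The data values are made up so the result is deterministic.

```python
import pandas as pd

# Small deterministic frame with a two-level row index (values are made up).
df = pd.DataFrame(
    {
        "val0": [3, 3, 3, 2, 2, 1],
        "val1": [0, 1, 2, 0, 1, 0],
        "r0": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        "r1": [1.1, 1.2, 1.3, 1.4, 1.5, 1.6],
    }
).set_index(["val0", "val1"])

# Format with header aliases, which is the call that triggered the shift.
aliased_lines = df.to_string(header=["R0", "R1"]).splitlines()

# A correct layout has two header rows (aliases, then index names)
# followed by exactly one line per data row.
has_expected_shape = len(aliased_lines) == len(df) + 2

print("\n".join(aliased_lines))
print("layout ok:", has_expected_shape)
```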


rdturnermtl mentioned this issue Nov 16, 2017

to_latex bug for midrule location #18326

Closed

jbrockmendel added the IO LaTeX to_latex label Jul 25, 2018

dickreuter mentioned this issue Jul 27, 2018

fix for TypeError: unorderable types" in when using set_index with multiple column names #22072

Merged

gfyoung added Bug MultiIndex labels Jul 27, 2018

tomneep mentioned this issue Mar 8, 2019

BUG: Fix to_string output when using header keyword arg (#16718) #25602

Merged

TomAugspurger closed this as completed in #25602 Mar 12, 2019

TomAugspurger pushed a commit that referenced this issue Mar 12, 2019

BUG: Fix to_string output when using header (#16718) (#25602)

5c341dc

Also affects to_latex midrule position. Tests added for both to_string and to_latex. Whatsnew added for v0.25.0.
