BUG: Series.replace not working if there are duplicated columns #45316

arnau126 · 2022-01-11T16:50:12Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
I have confirmed this bug exists on 1.4.0rc0.

Reproducible Example

import pandas as pd

df = pd.DataFrame([['a', 'x', 'o'], ['b', 'y', 'p'], ['c', 'z', 'q']], columns=['F', 'F', 'G'])
df['G'].replace('p', 'NEW', inplace=True)
print(df)

Issue Description

When the DataFrame has duplicate column names, calling Series.replace with inplace=True on a non-duplicated column does not perform the replacement and raises a warning:

/home/arnau/.virtualenvs/lr/lib/python3.10/site-packages/pandas/core/generic.py:6619: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return self._update_inplace(result)

Some examples:

# doesn't work
df = pd.DataFrame([['a', 'x', 'o'], ['b', 'y', 'p'], ['c', 'z', 'q']], columns=['F', 'F', 'G'])
df['G'].replace('p', 'NEW', inplace=True)

# works (because it is not inplace)
df = pd.DataFrame([['a', 'x', 'o'], ['b', 'y', 'p'], ['c', 'z', 'q']], columns=['F', 'F', 'G'])
df['G'] = df['G'].replace('p', 'NEW')

# works (because there are no duplicated columns)
df = pd.DataFrame([['a', 'x', 'o'], ['b', 'y', 'p'], ['c', 'z', 'q']], columns=['F', 'H', 'G'])
df['G'].replace('p', 'NEW', inplace=True)

Expected Behavior

The replacement is performed, without any warning.

Installed Versions

INSTALLED VERSIONS

commit : 66e3805
python : 3.10.0.final.0
python-bits : 64
OS : Linux
OS-release : 5.11.0-43-generic
Version : #47~20.04.2-Ubuntu SMP Mon Dec 13 11:06:56 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.5
numpy : 1.21.2
pytz : 2021.1
dateutil : 2.8.2
pip : 21.3.1
setuptools : 44.1.1
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.7.1
html5lib : None
pymysql : 0.9.3
psycopg2 : 2.9.1 (dt dec pq3 ext lo64)
jinja2 : 3.0.3
IPython : 7.28.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.2
sqlalchemy : 1.2.17
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.55.0rc1


phofl · 2022-01-12T01:02:19Z

This is unrelated to replace; it is an indexing issue.

The following should modify the df:

df = pd.DataFrame([['a', 'x', 'o'], ['b', 'y', 'p'], ['c', 'z', 'q']], columns=['F', 'F', 'G'])
x = df["G"]
x.loc[:] = "aaaa"

But it does not if the original df has duplicated columns.
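As a minimal sketch of a write that does not rely on whether df["G"] is a view or a copy, the assignment can go through the frame itself with df.loc. This is offered as a possible workaround for the snippet above, not as the fix for the underlying indexing behavior, and it assumes 'G' appears only once in the columns.

```python
import pandas as pd

# Same frame as in the report: 'F' is duplicated, 'G' is unique.
df = pd.DataFrame(
    [["a", "x", "o"], ["b", "y", "p"], ["c", "z", "q"]],
    columns=["F", "F", "G"],
)

# Write through the DataFrame instead of through an intermediate Series,
# so the result does not depend on df["G"] returning a view.
df.loc[:, "G"] = "aaaa"

print(df)
```

Only the 'G' column is targeted here; the two 'F' columns are left as they were.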

arnau126 · 2022-01-12T08:26:38Z

You're right, it's unrelated to replace.

If the original df has duplicated columns, when you select a column you're getting a copy of it. This may be the actual issue.

df = pd.DataFrame([['a', 'x', 'o'], ['b', 'y', 'p'], ['c', 'z', 'q']], columns=['F', 'F', 'G'])
print(df['G'] is df['G'])  # output: False

df = pd.DataFrame([['a', 'x', 'o'], ['b', 'y', 'p'], ['c', 'z', 'q']], columns=['F', 'H', 'G'])
print(df['G'] is df['G'])  # output: True
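The identity check above shows whether the same Series object is handed back, which is not quite the same as asking whether a write to the selected column reaches the frame. A rough way to probe the latter is sketched below. The helper name write_reaches_frame and the sentinel value are made up for this illustration, and the result depends on the pandas version and on whether Copy-on-Write is enabled, so no expected output is given.

```python
import warnings

import pandas as pd


def write_reaches_frame(df: pd.DataFrame, col: str, sentinel: str = "__probe__") -> bool:
    """Return True if writing to df[col] is visible in the frame.

    Hypothetical helper: it works on a deep copy, so the caller's frame
    is never modified. It assumes `col` is a unique column label.
    """
    probe = df.copy(deep=True)
    selected = probe[col]
    with warnings.catch_warnings():
        # Silence SettingWithCopyWarning (or similar) raised by the probe write.
        warnings.simplefilter("ignore")
        selected.loc[:] = sentinel
    # If the selection was a view, the frame now holds the sentinel everywhere.
    return bool((probe[col] == sentinel).all())


rows = [["a", "x", "o"], ["b", "y", "p"], ["c", "z", "q"]]
dup = pd.DataFrame(rows, columns=["F", "F", "G"])
uniq = pd.DataFrame(rows, columns=["F", "H", "G"])

print("duplicated columns:", write_reaches_frame(dup, "G"))
print("unique columns:    ", write_reaches_frame(uniq, "G"))
```

On a version affected by this issue, the two lines would be expected to differ; where they agree, column selection behaves the same with and without duplicated columns.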

arnau126 added the Bug and Needs Triage labels on Jan 11, 2022

phofl added the Indexing label and removed the Needs Triage label on Jan 12, 2022

jbrockmendel added the replace label on Jan 12, 2022

phofl mentioned this issue on Jan 21, 2022 in #45526 (merged): BUG: df.getitem returning copy instead of view for unique column in dup index

jreback added this to the 1.5 milestone on Jan 28, 2022

jreback closed this as completed in #45526 on Jun 22, 2022
