Fixturize Test Excel #26543

Merged
merged 14 commits into from
May 30, 2019

Conversation

WillAyd
Member

@WillAyd WillAyd commented May 28, 2019

Continued simplification of this module by moving towards pytest idiom. Here I have eliminated any test instance methods and replaced them with fixtures.

@WillAyd WillAyd added Testing pandas testing functions or related to the test suite IO Excel read_excel, to_excel Clean labels May 28, 2019
@pep8speaks

pep8speaks commented May 28, 2019

Hello @WillAyd! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-29 16:39:12 UTC

@codecov

codecov bot commented May 28, 2019

Codecov Report

Merging #26543 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26543      +/-   ##
==========================================
- Coverage   91.77%   91.76%   -0.01%     
==========================================
  Files         174      174              
  Lines       50638    50638              
==========================================
- Hits        46471    46467       -4     
- Misses       4167     4171       +4
Flag          Coverage Δ
#multiple     90.3% <ø> (ø) ⬆️
#single       41.67% <ø> (-0.09%) ⬇️

Impacted Files          Coverage Δ
pandas/io/gbq.py        78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py    97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 998a0de...fb79318. Read the comment docs.

@codecov

codecov bot commented May 28, 2019

Codecov Report

Merging #26543 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26543      +/-   ##
==========================================
- Coverage   91.77%   91.76%   -0.01%     
==========================================
  Files         174      174              
  Lines       50649    50649              
==========================================
- Hits        46483    46479       -4     
- Misses       4166     4170       +4
Flag          Coverage Δ
#multiple     90.3% <ø> (ø) ⬆️
#single       41.69% <ø> (-0.06%) ⬇️

Impacted Files          Coverage Δ
pandas/io/gbq.py        78.94% <0%> (-10.53%) ⬇️
pandas/core/frame.py    97% <0%> (-0.12%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a91da0c...3a5e5bb. Read the comment docs.

@WillAyd
Member Author

WillAyd commented May 29, 2019

@simonjayhawkins I think this is looking a lot better after the monkeypatch. Let me know what you think.


# FILE
- localtable = os.path.join(self.dirpath, 'test1' + ext)
+ localtable = os.path.join(datapath("io", "data"), 'test1' + ext)
Member Author

This was actually failing with the fixturized approach because the URL requires an absolute path, or else it raises a URLError and gets skipped.

Quite a few ways to do this but I figured just reusing the datapath fixture was easiest, especially since this is pending deprecation
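
As an illustration of why an absolute path matters for a file URL, here is a minimal sketch using only the standard library. `pathlib` refuses to build a `file://` URI from a relative path; the helper name `to_file_url` is hypothetical and is not taken from the pandas test suite.

```python
from pathlib import Path


def to_file_url(path):
    """Return a file:// URL for ``path``, resolving it to an absolute path first."""
    # Path.as_uri() raises ValueError for relative paths, so resolve first.
    return Path(path).resolve().as_uri()


relative = Path("test1.xls")
try:
    relative.as_uri()
    relative_ok = True
except ValueError:
    relative_ok = False

url = to_file_url(relative)
```

Reusing a fixture that already returns an absolute data directory, as the comment above describes, sidesteps the problem in the same way.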

expected = read_excel(str_path, 'Sheet1', index_col=0)

- abs_dir = os.path.abspath(self.dirpath)
- path_obj = LocalPath(abs_dir).join('test1' + ext)
+ path_obj = LocalPath().join('test1' + ext)
Member Author

Given that the fixture now changes into the data directory, calling this without an argument resolves to that directory automatically, so there is no need to build an absolute path.
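
The behaviour described here, a path resolving against the current working directory once the fixture has changed into the data directory, can be sketched with the standard library alone. This assumes `pathlib` as a stand-in for `py.path.local`, and the `working_directory` helper is hypothetical (inside a test, pytest's `monkeypatch.chdir` does this job).

```python
import contextlib
import os
import tempfile
from pathlib import Path


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


with tempfile.TemporaryDirectory() as data_dir:
    with working_directory(data_dir):
        # A relative name now resolves inside data_dir without extra joins.
        resolved = Path("test1.xls").resolve()
        inside_data_dir = resolved.parent == Path(data_dir).resolve()
```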

Member

@simonjayhawkins simonjayhawkins left a comment

much cleaner without the visual noise of the filepaths.

is the engine parameter of read_excel ignored when using ExcelFile? There seem to be a few tests passing if i use a bad engine in the fixture. I would have expected all the TestXlrdReader tests to fail (except perhaps test_bad_engine_raises and similar).

it also appears to be an issue on master, so not introduced here.

can you check this out?

@WillAyd
Member Author

WillAyd commented May 29, 2019

is the engine parameter of read_excel ignored when using ExcelFile?

Yes, ExcelFile has its own engine parameter. If that's passed to read_excel along with another engine argument it should probably raise or warn. I'll open that as a follow up.
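
A rough sketch of the kind of check proposed here, using hypothetical stand-ins (`FakeExcelFile`, `read_excel_sketch`) rather than the real pandas objects. How pandas should actually handle the conflict is left to the follow-up issue; raising is just one of the two options mentioned.

```python
class FakeExcelFile:
    """Hypothetical stand-in for an object that already carries an engine."""

    def __init__(self, path, engine="xlrd"):
        self.path = path
        self.engine = engine


def read_excel_sketch(io, engine=None):
    """Hypothetical reader that rejects a conflicting ``engine`` argument."""
    if isinstance(io, FakeExcelFile):
        if engine is not None and engine != io.engine:
            raise ValueError(
                "engine=%r conflicts with the engine %r already set on the file object"
                % (engine, io.engine)
            )
        return io.engine
    return engine


book = FakeExcelFile("test1.xls", engine="xlrd")
same_engine = read_excel_sketch(book, engine="xlrd")

try:
    read_excel_sketch(book, engine="foo")
    conflict_raised = False
except ValueError:
    conflict_raised = True
```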

@WillAyd
Member Author

WillAyd commented May 29, 2019

See #26566

@simonjayhawkins
Member

Yes, ExcelFile has its own engine parameter.

so perhaps ExcelFile should be monkeypatched as well, so that the engine parameterisation applies to all tests?

@simonjayhawkins
Member

there are basically two changes here, the working directory and the engine parameterisation. maybe they should be kept separate.

the autouse fixture for the working directory is fine.

maybe have the engine parameterisation as a separate fixture, and not make it autouse. then

param_engine = pytest.mark.usefixtures('<fixturename>') near the top of the module

and then decorate using @param_engine on just the tests to be parametrised ( excluding test_bad_engine_raises etc)
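
A sketch of the pattern suggested above, assuming pytest is installed. The fixture name `engine_param`, the module-level `CALLS` list, and the test names are hypothetical, and the fixture only records the engine instead of patching anything.

```python
import pytest

CALLS = []

# A mark only stores the fixture name as a string, so it can be
# defined before the fixture itself.
param_engine = pytest.mark.usefixtures("engine_param")


@pytest.fixture(params=["xlrd"])
def engine_param(request):
    """Opt-in fixture: runs only for tests decorated with @param_engine."""
    CALLS.append(request.param)
    return request.param


@param_engine
def test_reads_with_engine():
    assert CALLS  # the fixture ran before this test


def test_bad_engine_raises():
    # Not decorated, so the engine fixture is never applied here.
    pass
```

The trade-off is that every parametrised test must be decorated explicitly, which is more typing than autouse but makes the exceptions visible.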

@WillAyd
Member Author

WillAyd commented May 29, 2019

A module level fixture would be great, though it's just going to take quite a few more PRs to get there, especially since a lot of the Writer test cases are strangely intertwined with reading. Left yet to go is:

  • Clean up unnecessary WriterBase subclassing (already done in a separate PR)
  • Replace instance variables with fixtures
  • Move tests that require both reading and writing out of WriterBase and into TestRoundTrip
  • Eliminate SharedItems altogether
  • Break tests into a subdirectory

Amongst potentially other things. What you described makes sense but it's going to be rather difficult to move that to the module level while the WriterBase is still subclassing SharedItems as things are convoluted and heavily intertwined where they don't need to be right now :-(

The scope it's at now reflects current state so trying to minimize movement

@simonjayhawkins
Member

A module level fixture would be great,

i wasn't suggesting moving the fixture.

param_engine decorator defined at module level for clarity. this can be defined before the fixture since it's a pytest mark only. the fixture name is a string. but yes it could be defined at class level.

in the past, where i have had only a few exceptions to an autouse fixture, i created another fixture to undo the monkeypatch and applied it to the exceptions. that's a bit more complicated though.

don't we now, as the PR stands, have parameterisation applied to tests that is ignored? it wasn't applied to them before, and the test output descriptions imply parametrisation.

@simonjayhawkins
Member

Left yet to go is

i think the tests that pass that shouldn't are different from master. so there is a change in test behavior here. i'll go back and thoroughly double check this.

that's my concern. otherwise the changes are great.

@WillAyd
Member Author

WillAyd commented May 29, 2019

param_engine decorator defined at module level for clarity

OK I think I follow. In the end we probably want separate objects for read_engine and write_engine which could be used for the various classes, and sure ultimately the combination of them applied in the TestRoundTrip class that was introduced.

don't we now, as the PR stands, have parameterisation applied to tests that is ignored?

I might not completely follow but generally I'd say no. The majority of parametrization here is for the engine, which most of the tests are using. The exceptions are tests in the WriterBase class that don't need to read in a file, but it's a mixed bag there; they don't belong in that class in any case so if we can decouple those from pure writing tests we can clean this up further, so I think that's one of the follow ups.

Let me know if I misunderstood anything though; this module definitely warrants close review!

@WillAyd
Member Author

WillAyd commented May 29, 2019

i think the tests that pass that shouldn't are different from master. so there is a change in test behavior here. i'll go back and thoroughly double check this.

Let me know what you see. I was getting this on master and this branch:

======== 966 passed, 12 skipped, 3 xfailed, 2 warnings in 30.45 seconds ========

@simonjayhawkins
Member

simonjayhawkins commented May 29, 2019

if i change the fixture

func = partial(pd.read_excel, engine=request.param)

to

func = partial(pd.read_excel, engine='foo')

i get..

$ pytest pandas/tests/io/test_excel.py::TestXlrdReader --tb=no
============================= test session starts =============================
platform win32 -- Python 3.7.3, pytest-4.5.0, py-1.7.0, pluggy-0.11.0
hypothesis profile 'ci' -> timeout=unlimited, deadline=timedelta(milliseconds=500.0), suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase('C:\\Users\\simon\\OneDrive\\code\\pandas-simonjayhawkins\\.hypothesis\\examples')
rootdir: C:\Users\simon\OneDrive\code\pandas-simonjayhawkins, inifile: setup.cfg
plugins: xdist-1.28.0, mock-1.10.4, forked-0.2, cov-2.6.1, hypothesis-4.17.2
collected 360 items

pandas\tests\io\test_excel.py FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 11%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 31%]
FFFFFFFFFFFFFFFFFF..............................FFFFFFFFFFFFFFFFFFFFFFFF [ 51%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF......FFFFFF [ 71%]
ssssssFFFFFFFFFFFFFFFFFF......FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 91%]
FFFFFFFFFFFFFFFFFFFFFFFF......                                           [100%]

============================== warnings summary ===============================
pandas/tests/io/test_excel.py::TestXlrdReader::test_usecols_int[xlrd-.xls]
pandas/tests/io/test_excel.py::TestXlrdReader::test_usecols_int[xlrd-.xls]
  C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\botocore\vendored\requests\packages\urllib3\_collections.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Mapping, MutableMapping

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======== 306 failed, 48 passed, 6 skipped, 2 warnings in 27.33 seconds ========

48 tests (only 8 actual tests) passing. shouldn't they fail if the parametrisation is being applied?

@simonjayhawkins
Member

i think the tests that pass that shouldn't are different from master. so there is a change in test behavior here. i'll go back and thoroughly double check this.

Let me know what you see. I was getting this on master and this branch:

======== 966 passed, 12 skipped, 3 xfailed, 2 warnings in 30.45 seconds ========

i should have said the tests that pass, after intentionally breaking the fixture.

@WillAyd
Member Author

WillAyd commented May 29, 2019

48 tests (only 8 actual tests) passing. shouldn't they fail if the parametrisation is being applied?

Ha I might have this backwards but note that that particular parametrization only applies to the reading tests. There are other classes like _WriterBase and even top level tests that won't be affected by changing the partial

@simonjayhawkins
Member

on master..

changing

new_func = partial(old_func, engine=request.param)

to

new_func = partial(old_func, engine='foo')

gives

$ pytest pandas/tests/io/test_excel.py::TestXlrdReader --tb=no
============================= test session starts =============================
platform win32 -- Python 3.7.3, pytest-4.5.0, py-1.7.0, pluggy-0.11.0
hypothesis profile 'ci' -> timeout=unlimited, deadline=timedelta(milliseconds=500.0), suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase('C:\\Users\\simon\\OneDrive\\code\\pandas-simonjayhawkins\\.hypothesis\\examples')
rootdir: C:\Users\simon\OneDrive\code\pandas-simonjayhawkins, inifile: setup.cfg
plugins: xdist-1.28.0, mock-1.10.4, forked-0.2, cov-2.6.1, hypothesis-4.17.2
collected 360 items

pandas\tests\io\test_excel.py FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 11%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 31%]
FFFFFFFFFFFFFFFFFF..............................FFFFFFFFFFFFFFFFFFFFFFFF [ 51%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF............FFFFFF [ 71%]
ssssss........................FFFFFF.................................... [ 91%]
..............................                                           [100%]

============================== warnings summary ===============================
pandas/tests/io/test_excel.py::TestXlrdReader::test_usecols_int[xlrd-.xls]
pandas/tests/io/test_excel.py::TestXlrdReader::test_usecols_int[xlrd-.xls]
  C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\botocore\vendored\requests\packages\urllib3\_collections.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Mapping, MutableMapping

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======= 222 failed, 132 passed, 6 skipped, 2 warnings in 35.83 seconds ========

so the test behavior is different.

@WillAyd
Member Author

WillAyd commented May 29, 2019

Great catch! But if anything I think it's indicative of an issue on master, where read_excel and pd.read_excel are used rather interchangeably but with different results.

Can you run verbose output and see what the difference in passing tests is? My guess is that it's a function in ReadingTestsBase which previously used pd.read_excel and may inadvertently not have gotten parametrized.
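
One way `read_excel` and `pd.read_excel` can diverge, sketched with a stand-in namespace instead of pandas: a name imported directly keeps pointing at the original function after the attribute on the module is patched. Whether this is exactly what happened on master is an assumption; `fake_pd` and the toy `_read_excel` are hypothetical.

```python
from functools import partial
from types import SimpleNamespace


def _read_excel(path, engine=None):
    """Toy reader that reports which engine it was called with."""
    return engine


fake_pd = SimpleNamespace(read_excel=_read_excel)

# Equivalent of ``from pandas import read_excel`` at the top of a test module.
read_excel = fake_pd.read_excel

# Equivalent of monkeypatch.setattr(pd, "read_excel", partial(...)).
fake_pd.read_excel = partial(_read_excel, engine="xlrd")

via_attribute = fake_pd.read_excel("test1.xls")  # sees the patch
via_local_name = read_excel("test1.xls")         # still the original function
```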

@simonjayhawkins
Member

the tests that are not observing the parametrisation in this PR are

test_excel_passes_na

test_unexpected_kwargs_raises

test_excel_table_sheet_by_index

test_bad_engine_raises

test_reader_closes_file

test_read_xlrd_book

these tests are also not failing (after intentionally breaking parameterisation) on master, so i've satisfied myself that the problem is not a new one.

@simonjayhawkins
Member

test_bad_engine_raises shouldn't be parametrised.

the other tests all use ExcelFile.

there are some other tests that use ExcelFile that aren't appearing in the list because they are tests that will need to be split (future PR). they have a read_excel call preceding the ExcelFile, so they fail early and don't show up in this testing of the tests.

I'm happy to approve this PR on two grounds:

firstly the problems are not introduced by this PR and secondly we are only testing one engine.

before i do, do you want to see if you can easily monkeypatch ExcelFile?

@WillAyd
Member Author

WillAyd commented May 29, 2019 via email

@simonjayhawkins
Member

i'll give it a go in a short while. can partial be used on the class constructor?

@WillAyd
Member Author

WillAyd commented May 29, 2019

I believe partial is only for functions not classes.

Is your question to convert get_csv_refdf into a fixture?
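
On the question of `partial` and classes: `functools.partial` accepts any callable, including a class, but the result is a `partial` object rather than a class, so swapping it in for the class would break `isinstance` checks against the patched name. A small sketch with a hypothetical `Reader` class:

```python
from functools import partial


class Reader:
    """Hypothetical stand-in for a class that takes an ``engine`` argument."""

    def __init__(self, path, engine=None):
        self.path = path
        self.engine = engine


XlrdReader = partial(Reader, engine="xlrd")

reader = XlrdReader("test1.xls")
is_reader = isinstance(reader, Reader)           # instances are still Reader objects
factory_is_class = isinstance(XlrdReader, type)  # but the partial itself is not a class
```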

@simonjayhawkins
Member

Is your question to convert get_csv_refdf into a fixture?

i was thinking we could add another monkeypatch.setattr to the current cd_and_set_engine fixture

I believe partial is only for functions not classes.

not on the init method then?

@simonjayhawkins
Member

Great catch! But if anything I think it's indicative of an issue on master, where read_excel and pd.read_excel are used rather interchangeably but with different results.

Can you run verbose output and see what the difference in passing tests is? My guess is that it's a function in ReadingTestsBase which previously used pd.read_excel and may inadvertently not have gotten parametrized.

on master the following tests are all using read_excel directly and not self.get_exceldf where the parameterisation was being applied. hence not being parametrised as intended.

test_excel_read_buffer
test_read_from_file_url
test_read_from_pathlib_path
test_read_from_py_localpath
test_read_excel_multiindex
test_read_excel_multiindex_header_only
test_excel_old_index_format
test_read_excel_bool_header_arg
test_read_excel_chunksize
test_read_excel_skiprows_list
test_read_excel_nrows
test_read_excel_nrows_greater_than_nrows_in_file
test_read_excel_nrows_non_integer_parameter
test_read_excel_squeeze

So this PR has fixed those issues.

@simonjayhawkins
Member

        new_init = partialmethod(ExcelFile.__init__, engine='foo')
        monkeypatch.setattr(ExcelFile, '__init__', new_init)

is working on all the tests apart from test_read_xlrd_book (and test_bad_engine_raises), i'll keep digging

@simonjayhawkins
Member

the non-foo version!

        new_init = partialmethod(ExcelFile.__init__, engine=request.param)
        monkeypatch.setattr(ExcelFile, '__init__', new_init)
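
The `partialmethod` approach shown here can be tried in isolation. This sketch assumes a hypothetical `Reader` class in place of `ExcelFile` and uses plain attribute assignment where the fixture would use `monkeypatch.setattr`, which also takes care of restoring the original afterwards.

```python
from functools import partialmethod


class Reader:
    """Hypothetical stand-in for a class that takes an ``engine`` argument."""

    def __init__(self, path, engine=None):
        self.path = path
        self.engine = engine


original_init = Reader.__init__

# Freeze the engine on the constructor; Reader stays a real class.
Reader.__init__ = partialmethod(original_init, engine="xlrd")
try:
    patched = Reader("test1.xls")
    patched_engine = patched.engine
    still_a_class = isinstance(patched, Reader)
finally:
    # monkeypatch would undo the patch automatically at teardown.
    Reader.__init__ = original_init

unpatched_engine = Reader("test1.xls").engine
```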

@simonjayhawkins
Member

is working on all the tests apart from test_read_xlrd_book

so this test is included in the parametrisation of the class, but specifies the engine explicitly as engine = "xlrd"

removing that works.
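
Why removing the explicit `engine` makes the test observe the parametrisation: keyword arguments given at call time take precedence over those frozen by `partial` or `partialmethod`. A minimal sketch with a hypothetical function:

```python
from functools import partial


def open_book(path, engine=None):
    """Hypothetical reader that simply reports which engine it was given."""
    return engine


patched = partial(open_book, engine="foo")

default_engine = patched("test1.xls")                  # frozen keyword is used
explicit_engine = patched("test1.xls", engine="xlrd")  # call-time keyword wins
```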

@WillAyd
Member Author

WillAyd commented May 29, 2019

The only reader right now is xlrd. If the monkey patching you are looking at is only for the readers I think it should wait until another PR or until we actually get another reading engine implemented.

Member

@simonjayhawkins simonjayhawkins left a comment

ok. nothing's broken that wasn't already broken. in fact this has fixed a few issues.

I do like the way partial has been used for the parametrisation. but maybe it introduces another layer of magic. I think that the current status shows that this is error prone.

so maybe at some point (not in this PR), this module should be changed to the tried and tested parameterisation method, include engine in the test function signature and just use engine=engine where required.
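
A sketch of the explicit style described above, assuming pytest is installed. The test name and the toy `read_excel` are hypothetical; the real tests would call `pd.read_excel` with a file from the data directory.

```python
import pytest


def read_excel(path, engine=None):
    """Toy stand-in that reports the engine it was called with."""
    return engine


@pytest.mark.parametrize("engine", ["xlrd"])
def test_read_uses_engine(engine):
    # The engine is visible in the signature and passed explicitly.
    assert read_excel("test1.xls", engine=engine) == engine
```

With this style a test that forgets `engine=engine` is easy to spot in review, at the cost of repeating the argument in every call.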

@jreback jreback added this to the 0.25.0 milestone May 30, 2019
@jreback
Contributor

jreback commented May 30, 2019

lgtm. merge when ready @simonjayhawkins and @WillAyd

@simonjayhawkins simonjayhawkins merged commit 5488636 into pandas-dev:master May 30, 2019
@simonjayhawkins
Member

thanks @WillAyd

@WillAyd WillAyd deleted the fixturize-excel-test branch May 30, 2019 01:48
@WillAyd
Member Author

WillAyd commented May 30, 2019

Thanks for the review @simonjayhawkins !

Labels
Clean IO Excel read_excel, to_excel Testing pandas testing functions or related to the test suite
4 participants