Fixturize Test Excel #26543

WillAyd · 2019-05-28T06:55:14Z

Continued simplification of this module by moving towards pytest idiom. Here I have eliminated any test instance methods and replaced them with fixtures.

pep8speaks · 2019-05-28T06:55:18Z

Hello @WillAyd! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-29 16:39:12 UTC

codecov · 2019-05-28T07:33:51Z

Codecov Report

Merging #26543 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26543      +/-   ##
==========================================
- Coverage   91.77%   91.76%   -0.01%     
==========================================
  Files         174      174              
  Lines       50638    50638              
==========================================
- Hits        46471    46467       -4     
- Misses       4167     4171       +4

Flag	Coverage Δ
#multiple	`90.3% <ø> (ø)`	⬆️
#single	`41.67% <ø> (-0.09%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97% <0%> (-0.12%)`	⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 998a0de...fb79318.

codecov · 2019-05-28T07:33:53Z

Codecov Report

Merging #26543 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #26543      +/-   ##
==========================================
- Coverage   91.77%   91.76%   -0.01%     
==========================================
  Files         174      174              
  Lines       50649    50649              
==========================================
- Hits        46483    46479       -4     
- Misses       4166     4170       +4

Flag	Coverage Δ
#multiple	`90.3% <ø> (ø)`	⬆️
#single	`41.69% <ø> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97% <0%> (-0.12%)`	⬇️

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a91da0c...3a5e5bb.

pandas/tests/io/test_excel.py

WillAyd · 2019-05-29T03:19:56Z

@simonjayhawkins I think this is looking a lot better after the monkeypatch. Let me know what you think

WillAyd · 2019-05-29T03:21:39Z

pandas/tests/io/test_excel.py


        # FILE
-        localtable = os.path.join(self.dirpath, 'test1' + ext)
+        localtable = os.path.join(datapath("io", "data"), 'test1' + ext)


This was actually failing with the fixturized approach because the URL requires an absolute path, or else it raises a URLError and gets skipped.

Quite a few ways to do this but I figured just reusing the datapath fixture was easiest, especially since this is pending deprecation
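
As a rough illustration of why the absolute path matters for a file URL, here is a minimal standard-library sketch. It uses `pathlib.Path.as_uri()` rather than whatever string handling the test itself performs, so treat it as an analogy and not as the test's code; the file name is simply the one from the diff above.

```python
from pathlib import Path

relative = Path("test1.xls")

# A relative path cannot be expressed as a file URI.
try:
    relative.as_uri()
    relative_ok = True
except ValueError:
    relative_ok = False

# Resolving against the current working directory makes the path absolute,
# after which a file URI can be built.
absolute_uri = relative.resolve().as_uri()
```

With pathlib the relative case fails loudly; in the test discussed here the symptom was a URLError followed by a skip.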

WillAyd · 2019-05-29T03:22:25Z

pandas/tests/io/test_excel.py

        expected = read_excel(str_path, 'Sheet1', index_col=0)

-        abs_dir = os.path.abspath(self.dirpath)
-        path_obj = LocalPath(abs_dir).join('test1' + ext)
+        path_obj = LocalPath().join('test1' + ext)


Given the fixture's change to the data directory, calling LocalPath() without an argument resolves there automatically, so there is no need to build an absolute path
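
The behaviour described here, where an argument-less path picks up the directory the fixture changed into, can be sketched with the standard library alone. This uses `pathlib` and `os.chdir` as stand-ins for `py.path.local` and the fixture, so it is an analogy under that assumption and not the code in the PR.

```python
import os
import tempfile
from pathlib import Path

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as data_dir:
    os.chdir(data_dir)  # stands in for the fixture's change of directory
    try:
        # A bare relative name is resolved against the new working directory.
        resolved = Path("test1.xls").resolve()
        inside_data_dir = resolved.parent == Path(data_dir).resolve()
    finally:
        os.chdir(original_cwd)  # restore before the directory is removed
```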

pandas/tests/io/test_excel.py

simonjayhawkins

much cleaner without the visual noise of the filepaths.

is the engine parameter of read_excel ignored when using ExcelFile? There seem to be a few tests passing if i use a bad engine in the fixture. I would have expected all the TestXlrdReader tests to fail. (except perhaps test_bad_engine_raises and similar)

it also appears to be an issue on master, so not introduced here.

can you check this out?

WillAyd · 2019-05-29T19:45:17Z

is the engine parameter of read_excel ignored when using ExcelFile?

Yes ExcelFile has its own engine parameter. If that's passed to read_excel along with another engine argument it should probably raise or warn. I'll open that as a follow-up

WillAyd · 2019-05-29T19:52:06Z

See #26566

simonjayhawkins · 2019-05-29T19:59:17Z

Yes ExcelFile has its own engine parameter.

so perhaps ExcelFile should be monkeypatched as well, so that the engine parameterisation applies to all tests?

simonjayhawkins · 2019-05-29T20:22:20Z

there are basically two changes here, the working directory and the engine parameterisation; maybe they should be kept separate.

the autouse fixture for the working directory is fine.

maybe have the engine parameterisation as a separate fixture, and not make it autouse. then

param_engine = pytest.mark.usefixtures('<fixturename>') near the top of the module

and then decorate using @param_engine on just the tests to be parametrised ( excluding test_bad_engine_raises etc)
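
A minimal sketch of this suggestion, assuming a hypothetical fixture name `set_engine` and placeholder test bodies (neither is taken from the PR):

```python
import pytest

# Reusable mark. The fixture is referenced by name, so this line can
# appear before the fixture itself is defined.
param_engine = pytest.mark.usefixtures("set_engine")


class TestReader:
    @pytest.fixture(params=["xlrd"])
    def set_engine(self, request):
        # Hypothetical body: a real fixture would patch read_excel here.
        return request.param

    @param_engine
    def test_reads_with_engine(self):
        pass  # runs once per engine in params

    def test_bad_engine_raises(self):
        pass  # not decorated, so it runs once without the fixture
```

Because the fixture is not autouse, only the decorated tests pick up the engine ids in their test names.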

WillAyd · 2019-05-29T20:37:55Z

A module level fixture would be great, though it's just going to take quite a few more PRs to get there, especially since a lot of the Writer test cases are strangely intertwined with reading. Left yet to go is:

Clean up unnecessary WriterBase subclassing (already done in a separate PR)
Replace instance variables with fixtures
Move tests that require both reading and writing out of WriterBase and into TestRoundTrip
Eliminate SharedItems altogether
Break tests into a subdirectory

Amongst potentially other things. What you described makes sense but it's going to be rather difficult to move that to the module level while the WriterBase is still subclassing SharedItems as things are convoluted and heavily intertwined where they don't need to be right now :-(

The scope it's at now reflects current state so trying to minimize movement

simonjayhawkins · 2019-05-29T20:45:14Z

A module level fixture would be great,

i wasn't suggesting moving the fixture.

param_engine decorator defined at module level for clarity. this can be defined before the fixture since it's a pytest mark only. the fixture name is a string. but yes it could be defined at class level.

in the past where i have only a few exceptions to an autouse fixture, i created another fixture to undo the monkeypatch and apply to the exceptions. that's a bit more complicated though.

don't we now, as the PR stands, have parameterisation applied to tests where it is ignored? it wasn't before, and the test output descriptions imply parametrisation.

simonjayhawkins · 2019-05-29T20:50:52Z

Left yet to go is

i think the tests that pass that shouldn't are different from master. so there is a change in test behavior here. i'll go back and thoroughly double check this.

that's my concern. otherwise the changes are great.

WillAyd · 2019-05-29T20:55:54Z

param_engine decorator defined at module level for clarity

OK I think I follow. In the end we probably want separate objects for read_engine and write_engine which could be used for the various classes, and sure ultimately the combination of them applied in the TestRoundTrip class that was introduced.

don't we now, as the PR stands, have parameterisation applied to tests that is ignored?

I might not completely follow but generally I'd say no. The majority of parametrization here is for the engine, which most of the tests are using. The exceptions are tests in the WriterBase class that don't need to read in a file, but it's a mixed bag there; they don't belong in that class in any case so if we can decouple those from pure writing tests we can clean this up further, so I think that's one of the follow ups.

Let me know if I misunderstood anything though; this module definitely warrants close review!

WillAyd · 2019-05-29T21:01:08Z

i think the tests that pass that shouldn't are different from master. so there is a change in test behavior here. i'll go back and thoroughly double check this.

Let me know what you see. I was getting this on master and this branch:

======== 966 passed, 12 skipped, 3 xfailed, 2 warnings in 30.45 seconds ========

simonjayhawkins · 2019-05-29T21:07:27Z

if i change the fixture

func = partial(pd.read_excel, engine=request.param)

to

func = partial(pd.read_excel, engine='foo')

i get..

$ pytest pandas/tests/io/test_excel.py::TestXlrdReader --tb=no
============================= test session starts =============================
platform win32 -- Python 3.7.3, pytest-4.5.0, py-1.7.0, pluggy-0.11.0
hypothesis profile 'ci' -> timeout=unlimited, deadline=timedelta(milliseconds=500.0), suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase('C:\\Users\\simon\\OneDrive\\code\\pandas-simonjayhawkins\\.hypothesis\\examples')
rootdir: C:\Users\simon\OneDrive\code\pandas-simonjayhawkins, inifile: setup.cfg
plugins: xdist-1.28.0, mock-1.10.4, forked-0.2, cov-2.6.1, hypothesis-4.17.2
collected 360 items

pandas\tests\io\test_excel.py FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 11%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 31%]
FFFFFFFFFFFFFFFFFF..............................FFFFFFFFFFFFFFFFFFFFFFFF [ 51%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF......FFFFFF [ 71%]
ssssssFFFFFFFFFFFFFFFFFF......FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 91%]
FFFFFFFFFFFFFFFFFFFFFFFF......                                           [100%]

============================== warnings summary ===============================
pandas/tests/io/test_excel.py::TestXlrdReader::test_usecols_int[xlrd-.xls]
pandas/tests/io/test_excel.py::TestXlrdReader::test_usecols_int[xlrd-.xls]
  C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\botocore\vendored\requests\packages\urllib3\_collections.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Mapping, MutableMapping

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======== 306 failed, 48 passed, 6 skipped, 2 warnings in 27.33 seconds ========

48 tests (only 8 actual tests) passing. shouldn't they fail if the parametrisation is being applied?
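
One way to read this result, given the earlier note that ExcelFile has its own engine parameter, is sketched below. `read_excel` and `ExcelFile` here are hypothetical stand-ins written for the illustration, not the pandas objects, and the validation logic is an assumption.

```python
from functools import partial


def read_excel(path, engine=None):
    """Hypothetical stand-in for pd.read_excel."""
    if engine not in (None, "xlrd"):
        raise ValueError(f"Unknown engine: {engine}")
    return f"{path} via {engine or 'xlrd'}"


class ExcelFile:
    """Hypothetical stand-in with its own, separate engine parameter."""

    def __init__(self, path, engine=None):
        if engine not in (None, "xlrd"):
            raise ValueError(f"Unknown engine: {engine}")
        self.path = path
        self.engine = engine or "xlrd"


# Deliberately break the patched function, as in the experiment above.
broken_read_excel = partial(read_excel, engine="foo")


def outcome(test):
    """Run a zero-argument callable and report it like a test result."""
    try:
        test()
        return "passed"
    except ValueError:
        return "failed"


results = {
    "uses read_excel": outcome(lambda: broken_read_excel("test1.xls")),
    "uses ExcelFile": outcome(lambda: ExcelFile("test1.xls")),
}
```

The code path that goes through the patched function fails, while the one that constructs ExcelFile directly never sees the bad engine and keeps passing.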

simonjayhawkins · 2019-05-29T21:13:25Z

i think the tests that pass that shouldn't are different from master. so there is a change in test behavior here. i'll go back and thoroughly double check this.

Let me know what you see. I was getting this on master and this branch:
======== 966 passed, 12 skipped, 3 xfailed, 2 warnings in 30.45 seconds ========

i should have said the tests that pass, after intentionally breaking the fixture.

WillAyd · 2019-05-29T21:18:41Z

48 tests (only 8 actual tests) passing. shouldn't they fail if the parametrisation is being applied?

Ha I might have this backwards but note that that particular parametrization only applies to the reading tests. There are other classes like _WriterBase and even top level tests that won't be affected by changing the partial

simonjayhawkins · 2019-05-29T21:28:15Z

on master..

changing

new_func = partial(old_func, engine=request.param)

to

new_func = partial(old_func, engine='foo')

gives

$ pytest pandas/tests/io/test_excel.py::TestXlrdReader --tb=no
============================= test session starts =============================
platform win32 -- Python 3.7.3, pytest-4.5.0, py-1.7.0, pluggy-0.11.0
hypothesis profile 'ci' -> timeout=unlimited, deadline=timedelta(milliseconds=500.0), suppress_health_check=[HealthCheck.too_slow], database=DirectoryBasedExampleDatabase('C:\\Users\\simon\\OneDrive\\code\\pandas-simonjayhawkins\\.hypothesis\\examples')
rootdir: C:\Users\simon\OneDrive\code\pandas-simonjayhawkins, inifile: setup.cfg
plugins: xdist-1.28.0, mock-1.10.4, forked-0.2, cov-2.6.1, hypothesis-4.17.2
collected 360 items

pandas\tests\io\test_excel.py FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 11%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 31%]
FFFFFFFFFFFFFFFFFF..............................FFFFFFFFFFFFFFFFFFFFFFFF [ 51%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF............FFFFFF [ 71%]
ssssss........................FFFFFF.................................... [ 91%]
..............................                                           [100%]

============================== warnings summary ===============================
pandas/tests/io/test_excel.py::TestXlrdReader::test_usecols_int[xlrd-.xls]
pandas/tests/io/test_excel.py::TestXlrdReader::test_usecols_int[xlrd-.xls]
  C:\Users\simon\Anaconda3\envs\pandas-dev\lib\site-packages\botocore\vendored\requests\packages\urllib3\_collections.py:1: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
    from collections import Mapping, MutableMapping

-- Docs: https://docs.pytest.org/en/latest/warnings.html
======= 222 failed, 132 passed, 6 skipped, 2 warnings in 35.83 seconds ========

so the test behavior is different.

WillAyd · 2019-05-29T21:41:42Z

Great catch! But if anything I think it's indicative of an issue on master, where read_excel and pd.read_excel are used rather interchangeably but with different results.

Can you run verbose output and see what the difference in passing tests is? My guess is that it's a function in ReadingTestsBase which previously used pd.read_excel and may inadvertently not have gotten parametrized
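
As a general illustration of how two spellings of the same function can behave differently once a namespace attribute is patched, here is a standard-library sketch. The `pd` object is a hypothetical stand-in namespace, and this shows one possible mechanism only; it is not a diagnosis of what happens on master.

```python
import types
from functools import partial

# Hypothetical stand-in for the pandas namespace.
pd = types.SimpleNamespace()


def _read_excel(path, engine=None):
    return engine


pd.read_excel = _read_excel

# Equivalent of `from pandas import read_excel` at the top of a test module.
read_excel = pd.read_excel

# Equivalent of a fixture patching the attribute on the namespace.
pd.read_excel = partial(_read_excel, engine="xlrd")

via_namespace = pd.read_excel("test1.xls")  # sees the patch
via_direct_name = read_excel("test1.xls")   # still the original function
```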

simonjayhawkins · 2019-05-29T21:48:58Z

the tests that are not observing the parametrisation in this PR are

test_excel_passes_na

test_unexpected_kwargs_raises

test_excel_table_sheet_by_index

test_bad_engine_raises

test_reader_closes_file

test_read_xlrd_book

these tests are also not failing (after intentionally breaking parameterisation) on master, so i've satisfied myself that the problem is not a new one.

simonjayhawkins · 2019-05-29T22:07:13Z

test_bad_engine_raises shouldn't be parametrised.

the other tests all use ExcelFile.

there are some other tests that use ExcelFile that aren't appearing in the list because they are tests that will need to be split (future PR). they have a read_excel preceding the ExcelFile and so are not showing with this testing of the tests.

I'm happy to approve this PR on two grounds:

firstly the problems are not introduced by this PR and secondly we are only testing one engine.

before i do, do you want to see if you can easily monkeypatch ExcelFile?

WillAyd · 2019-05-29T22:23:05Z

I don’t quite understand what you are asking for with the monkeypatch of ExcelFile - is that trying to solve a problem or expand coverage?

simonjayhawkins · 2019-05-29T22:27:42Z

i'll give it a go in a short while, can partial be used on the Class constructor?

WillAyd · 2019-05-29T22:30:09Z

I believe partial is only for functions not classes.

Is your question to convert get_csv_refdf into a fixture?

simonjayhawkins · 2019-05-29T22:33:01Z

Is your question to convert get_csv_refdf into a fixture?

i was thinking we could add another monkeypatch.setattr to the current cd_and_set_engine fixture

I believe partial is only for functions not classes.

not on the init method then?
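
For reference, `functools.partial` accepts any callable, and a class is a callable. A minimal sketch with a hypothetical stand-in class (not the pandas ExcelFile):

```python
from functools import partial


class ExcelFile:
    """Hypothetical stand-in, not the pandas class."""

    def __init__(self, path, engine=None):
        self.path = path
        self.engine = engine


# partial wraps the class itself; calling it still constructs an instance.
XlrdExcelFile = partial(ExcelFile, engine="xlrd")
book = XlrdExcelFile("test1.xls")
```

The result is a partial object rather than a class, so it cannot be subclassed or passed to isinstance, which may matter if a test relies on either.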

simonjayhawkins · 2019-05-29T22:59:10Z

Great catch! But if anything I think it's indicative of an issue on master, where read_excel and pd.read_excel are used rather interchangeably but with different results.

Can you run verbose output and see what the difference in passing tests is? My guess is that it's a function in ReadingTestsBase which previously used pd.read_excel and may inadvertently not have gotten parametrized

on master the following tests are all using read_excel directly and not self.get_exceldf where the parameterisation was being applied. hence not being parametrised as intended.

test_excel_read_buffer
test_read_from_file_url
test_read_from_pathlib_path
test_read_from_py_localpath
test_read_excel_multiindex
test_read_excel_multiindex_header_only
test_excel_old_index_format
test_read_excel_bool_header_arg
test_read_excel_chunksize
test_read_excel_skiprows_list
test_read_excel_nrows
test_read_excel_nrows_greater_than_nrows_in_file
test_read_excel_nrows_non_integer_parameter
test_read_excel_squeeze

So this PR has fixed those issues.

simonjayhawkins · 2019-05-29T23:22:59Z

        new_init = partialmethod(ExcelFile.__init__, engine='foo')
        monkeypatch.setattr(ExcelFile, '__init__', new_init)

is working on all the tests apart from test_read_xlrd_book (and test_bad_engine_raises), i'll keep digging

simonjayhawkins · 2019-05-29T23:24:24Z

the non-foo version!

        new_init = partialmethod(ExcelFile.__init__, engine=request.param)
        monkeypatch.setattr(ExcelFile, '__init__', new_init)

simonjayhawkins · 2019-05-29T23:43:01Z

is working on all the tests apart from test_read_xlrd_book

so this test is included in the parametrisation of the class, but specifies the engine explicitly as engine = "xlrd"

removing that works.
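
The interaction described here, where an explicit engine argument wins over the patched default, can be reproduced with a hypothetical stand-in class. The sketch assigns the attribute directly instead of going through pytest's monkeypatch, and restores it by hand.

```python
from functools import partialmethod


class ExcelFile:
    """Hypothetical stand-in, not the pandas class."""

    def __init__(self, path, engine=None):
        self.path = path
        self.engine = engine


original_init = ExcelFile.__init__
# Same shape as the patch above, applied directly for the illustration.
ExcelFile.__init__ = partialmethod(original_init, engine="foo")
try:
    patched_default = ExcelFile("test1.xls").engine
    # Keyword arguments given at call time override those bound by partialmethod.
    explicit = ExcelFile("test1.xls", engine="xlrd").engine
finally:
    ExcelFile.__init__ = original_init  # undo, as monkeypatch would at teardown

restored = ExcelFile("test1.xls").engine
```

A test that passes its own engine therefore never observes the patched value, which matches the behaviour reported for test_read_xlrd_book.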

WillAyd · 2019-05-29T23:55:20Z

The only reader right now is xlrd. If the monkey patching you are looking at is only for the readers I think it should wait until another PR or until we actually get another reading engine implemented.

simonjayhawkins

ok. nothing's broken that wasn't already broken. in fact this has fixed a few issues.

I do like the way partial has been used for the parametrisation. but maybe it introduces another layer of magic. I think that the current status shows that this is error prone.

so maybe at some point (not in this PR), this module should be changed to the tried and tested parameterisation method: include engine in the test function signature and just use engine=engine where required.
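
A minimal sketch of that more explicit style, using a hypothetical stand-in for read_excel and an invented test name:

```python
import pytest


def read_excel(path, engine=None):
    """Hypothetical stand-in for pd.read_excel."""
    return {"path": path, "engine": engine}


@pytest.mark.parametrize("engine", ["xlrd"])
def test_read_excel_uses_engine(engine):
    # The engine is visible in the signature and passed explicitly,
    # so nothing depends on a patched namespace.
    result = read_excel("test1.xls", engine=engine)
    assert result["engine"] == engine
```

The trade-off is more repetition in each test in exchange for no hidden patching.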

jreback · 2019-05-30T00:57:49Z

lgtm. merge when ready @simonjayhawkins and @WillAyd

simonjayhawkins · 2019-05-30T01:02:38Z

thanks @WillAyd

WillAyd · 2019-05-30T01:48:43Z

Thanks for the review @simonjayhawkins !

WillAyd added 4 commits May 27, 2019 23:23

Removed datapath fixture

b1fdb28

Removed get_excelfile method

bb67002

Removed get_exceldf

91b889d

Replaced get_csv_defref with fixture

afcf08b

WillAyd added the Testing, IO Excel, and Clean labels May 28, 2019

lint fixup

fb79318

WillAyd added 2 commits May 28, 2019 07:54

lint fix

4b61fc3

Merge remote-tracking branch 'upstream/master' into fixturize-excel-test

77cfdcc

simonjayhawkins reviewed May 28, 2019

simonjayhawkins reviewed May 28, 2019

WillAyd added 3 commits May 28, 2019 20:06

Simplified fixtures with monkeypatch

7c0ce3a

Reverted skipped file_url test

489dd1d

Changed docstring

c86b881

WillAyd commented May 29, 2019

simonjayhawkins reviewed May 29, 2019

WillAyd added 4 commits May 29, 2019 09:25

Removed monkeypatch context manager

32b3751

Merge remote-tracking branch 'upstream/master' into fixturize-excel-test

60f9c65

Monkeypatched read_excel in pd namespace

da5a147

lint fixup

3a5e5bb

simonjayhawkins requested changes May 29, 2019

simonjayhawkins approved these changes May 30, 2019

jreback added this to the 0.25.0 milestone May 30, 2019

simonjayhawkins merged commit 5488636 into pandas-dev:master May 30, 2019

WillAyd deleted the fixturize-excel-test branch May 30, 2019 01:48

simonjayhawkins mentioned this pull request Jun 11, 2019

Add more pytest idiom to io.excel testing #26784

Closed

13 tasks

ladyyvii mentioned this pull request Oct 2, 2019

get rid of the None engine tests and just have a separate test that ensures how that gets bound to a particular engine #26662 (comment) #28749

Closed

5 tasks
