ENH: implement timeszones support for read_json(orient='table') and astype() from 'object' #35973

Merged: 37 commits, Nov 4, 2020

Conversation

Contributor

@attack68 attack68 commented Aug 29, 2020

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Currently timezones raise a NotImplementedError when using the read_json(orient='table') method.

This PR aims to fix what I believe is a fairly common request (numerous workarounds and questions exist on StackOverflow).

The PR aims to reconstitute DataFrames via json columns with timezones, Index with timezones or MultiIndex with timezones and/or combinations.
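
A minimal sketch of the round trip described above, assuming a pandas version that already includes this change. The column name `ts` and the use of `io.StringIO` to pass the payload are illustrative choices, not taken from the PR.

```python
import io
import json

import pandas as pd

# A frame with one timezone-aware column (the column name is hypothetical).
df = pd.DataFrame(
    {"ts": pd.date_range("2020-08-30", freq="D", periods=2, tz="Europe/London")}
)

# orient="table" writes a Table Schema alongside the data; the schema
# entry for a tz-aware column carries a "tz" key with the zone name.
payload = df.to_json(orient="table")
ts_field = next(
    field
    for field in json.loads(payload)["schema"]["fields"]
    if field["name"] == "ts"
)

# Reading the payload back restores the timezone-aware column instead of
# raising NotImplementedError.
result = pd.read_json(io.StringIO(payload), orient="table")
print(ts_field)
print(result["ts"].dt.tz)
```

The same call pattern should apply when the timezone-aware values sit in the index rather than in a column.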

@pep8speaks

pep8speaks commented Aug 29, 2020

Hello @attack68! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-10-31 19:15:33 UTC

@attack68 attack68 changed the title from "ENH: implement timeszones support for DataFrame.to_json(orient='table')" to "ENH: implement timeszones support for to_json(orient='table') read_json(orient='table')" Aug 29, 2020
@attack68 attack68 changed the title from "ENH: implement timeszones support for to_json(orient='table') read_json(orient='table')" to "ENH: implement timeszones support for read_json(orient='table') and astype() from 'object'" Sep 2, 2020
@attack68 attack68 requested a review from mroeschke September 2, 2020 07:11
@attack68 attack68 requested a review from mroeschke September 2, 2020 20:16
Member

@mroeschke mroeschke left a comment

Looks good to me. Could you merge master in one more time? I think the build failures are unrelated.

@attack68 attack68 requested a review from mroeschke September 6, 2020 20:42
Member

@mroeschke mroeschke left a comment

LGTM. Maybe merge master one more time. I've tried restarting the failing job, but the failure is unrelated to your change

@attack68
Contributor Author

attack68 commented Sep 7, 2020

I have retried the merge, but I agree my changes do not seem to be the cause.

@mroeschke mroeschke added this to the 1.2 milestone Sep 12, 2020
@mroeschke mroeschke added the "Timezones" (Timezone data dtype) and "IO JSON" (read_json, to_json, json_normalize) labels Sep 12, 2020
@mroeschke
Member

@attack68 could you merge master one more time? I've tried to restart that failing build multiple times but for some reason it's still segfaulting

@attack68
Contributor Author

There was a comment referencing a GitHub issue that discussed a strange segfault in Python 3.7.0. The solution there was to patch out the test. I have tried the same solution this time...

@jreback
Contributor

jreback commented Sep 13, 2020

cc @jbrockmendel

@jreback jreback removed this from the 1.2 milestone Sep 13, 2020
@jreback
Contributor

jreback commented Sep 13, 2020

the tz change needs review itself, this needs to be a separate PR. don't couple changes like this directly in the same PR, makes for harder review. I guess since you did it already it's ok. Though this may block the json changes (which I am not sure actually need this tz change). So splitting might be a good option.

@attack68
Contributor Author

> the tz change needs review itself, this needs to be a separate PR. don't couple changes like this directly in the same PR, makes for harder review. I guess since you did it already it's ok. Though this may block the json changes (which I am not sure actually need this tz change). So splitting might be a good option.

This PR originally started as just a json fix but @mroeschke suggested the better and more general fix was via astype(), so it just evolved and was re-written. The JSON part was left in since it was a descendant and the tests would have needed to be corrected anyway.

I'll look at the other comments
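
A small sketch of the more general `astype()` path mentioned above, assuming a pandas version that includes this change. It converts an object-dtype Series holding timezone-aware `Timestamp` objects straight to a timezone-aware dtype; the variable names are illustrative only.

```python
import pandas as pd

# Start from a timezone-aware Series, then drop to object dtype so that
# each element is a plain tz-aware Timestamp object.
aware = pd.Series(
    pd.date_range("2020-08-30", freq="D", periods=2, tz="Europe/London")
)
as_objects = aware.astype(object)

# Convert directly from object dtype to a timezone-aware dtype.
restored = as_objects.astype("datetime64[ns, Europe/London]")

print(as_objects.dtype)
print(restored.dt.tz)
```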

@attack68 attack68 requested a review from jreback October 1, 2020 12:57
Contributor

@jreback jreback left a comment

couple of comments, pls merge master and ping on green.

@@ -120,6 +120,10 @@ Other enhancements
- ``Styler`` now allows direct CSS class name addition to individual data cells (:issue:`36159`)
- :meth:`Rolling.mean()` and :meth:`Rolling.sum()` use Kahan summation to calculate the mean to avoid numerical problems (:issue:`10319`, :issue:`11645`, :issue:`13254`, :issue:`32761`, :issue:`36031`)
- :meth:`DatetimeIndex.searchsorted`, :meth:`TimedeltaIndex.searchsorted`, :meth:`PeriodIndex.searchsorted`, and :meth:`Series.searchsorted` with datetimelike dtypes will now try to cast string arguments (listlike and scalar) to the matching datetimelike type (:issue:`36346`)
- :meth:`to_json` now implements timezones parsing when orient structure is 'table'.
- :meth:`read_json` now implements timezones parsing when orient structure is 'table'.
- :meth:`astype` now attempts to convert to 'datetime64[ns, tz]' directly from 'object' with inferred timezone from string (:issue:`35973`).
Contributor

move to datetimelike bug fix

expected = DataFrame(val)

result = expected.astype({"tz": "datetime64[ns, Europe/Berlin]"})
expected["tz"] = expected["tz"].dt.tz_convert("Europe/Berlin")
Contributor

can you move this expected change to L595 as it goes with it
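
For context, the equivalence that the test lines under review rely on can be sketched as follows: casting a timezone-aware column to a dtype with a different timezone should give the same instants as `dt.tz_convert`. This is an illustrative standalone snippet, not the test from the PR.

```python
import pandas as pd

london = pd.Series(
    pd.date_range("2020-08-30", freq="D", periods=2, tz="Europe/London")
)

# Casting to a dtype with another timezone converts the timezone...
via_astype = london.astype("datetime64[ns, Europe/Berlin]")
# ...which should match an explicit tz_convert on the same data.
via_convert = london.dt.tz_convert("Europe/Berlin")

print(via_astype.dt.tz)
print(list(via_astype) == list(via_convert))
```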

def test_astype_tz_conversion(self):
    # GH 35973
    val = {"tz": date_range("2020-08-30", freq="d", periods=2, tz="Europe/London")}
    expected = DataFrame(val)
Contributor

call this df

@jreback
Contributor

jreback commented Oct 2, 2020

cc @jbrockmendel @WillAyd ok here?

@@ -120,6 +120,10 @@ Other enhancements
- ``Styler`` now allows direct CSS class name addition to individual data cells (:issue:`36159`)
- :meth:`Rolling.mean()` and :meth:`Rolling.sum()` use Kahan summation to calculate the mean to avoid numerical problems (:issue:`10319`, :issue:`11645`, :issue:`13254`, :issue:`32761`, :issue:`36031`)
- :meth:`DatetimeIndex.searchsorted`, :meth:`TimedeltaIndex.searchsorted`, :meth:`PeriodIndex.searchsorted`, and :meth:`Series.searchsorted` with datetimelike dtypes will now try to cast string arguments (listlike and scalar) to the matching datetimelike type (:issue:`36346`)
- :meth:`to_json` now implements timezones parsing when orient structure is 'table'.
- :meth:`read_json` now implements timezones parsing when orient structure is 'table'.
Contributor

duplicate note now.

@attack68
Contributor Author

attack68 commented Oct 9, 2020

@jreback pinging. This is green except for an unrelated test failure in 'pandas/tests/plotting/test_frame.py' on py38_np18.

@attack68 attack68 requested a review from jreback October 19, 2020 07:37
@attack68
Contributor Author

merged again and all green now

@attack68
Contributor Author

@jreback @jbrockmendel I believe this is in a state where it can go in now.

Contributor

@jreback jreback left a comment

this lgtm.

@jreback jreback merged commit d378852 into pandas-dev:master Nov 4, 2020
@jreback
Contributor

jreback commented Nov 4, 2020

thanks @attack68

jreback added a commit that referenced this pull request Nov 13, 2020
… (#37655)

* Moving the file test_frame.py to a new directory

* Сreated file test_frame_color.py

* Transfer tests
of test_frame.py
to test_frame_color.py

* PEP 8 fixes

* Transfer tests

of test_frame.py
to test_frame_groupby.py and test_frame_subplots.py

* Removing unnecessary imports

* PEP 8 fixes

* Fixed class name

* Transfer tests

of test_frame.py
to test_frame_subplots.py

* Transfer tests

of test_frame.py
to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py

* Changed class names

* Removed unnecessary imports

* Removed import

* catch FutureWarnings (#37587)

* TST/REF: collect indexing tests by method (#37590)

* REF: prelims for single-path setitem_with_indexer (#37588)

* ENH: __repr__ for 2D DTA/TDA (#37164)

* CLN: de-duplicate _validate_where_value with _validate_setitem_value (#37595)

* TST/REF: collect tests by method (#37589)

* TST/REF: move remaining setitem tests from test_timeseries

* TST/REF: rehome test_timezones test

* move misplaced arithmetic test

* collect tests by method

* move misplaced file

* REF: Categorical.is_dtype_equal -> categories_match_up_to_permutation (#37545)

* CLN refactor non-core (#37580)

* refactor core/computation (#37585)

* TST/REF: share method tests between DataFrame and Series (#37596)

* BUG: Index.where casting ints to str (#37591)

* REF: IntervalArray comparisons (#37124)

* regression fix for merging DF with datetime index with empty DF (#36897)

* ERR: fix error message in Period for invalid frequency (#37602)

* CLN: remove rebox_native (#37608)

* TST/REF: tests.generic (#37618)

* TST: collect tests by method (#37617)

* TST/REF: collect test_timeseries tests by method

* misplaced DataFrame.values tst

* misplaced dataframe.values test

* collect test by method

* TST/REF: share tests across Series/DataFrame (#37616)

* Gh 36562 typeerror comparison not supported between float and str (#37096)

* docs: fix punctuation (#37612)

* REGR: pd.to_hdf(..., dropna=True) not dropping missing rows (#37564)

* parametrize set_axis tests (#37619)

* CLN: clean color selection in _matplotlib/style (#37203)

* DEPR: DataFrame/Series.slice_shift (#37601)

* REF: re-use validate_setitem_value in Categorical.fillna (#37597)

* PERF: release gil for ewma_time (#37389)

* BUG: Groupy dropped nan groups from result when grouping over single column (#36842)

* ENH: implement timeszones support for read_json(orient='table') and astype() from 'object' (#35973)

* REF/BUG/TYP: read_csv shouldn't close user-provided file handles (#36997)

* BUG/REF: read_csv shouldn't close user-provided file handles

* get_handle: typing, returns is_wrapped, use dataclass, and make sure that all created handlers are returned

* remove unused imports

* added IOHandleArgs.close

* added IOArgs.close

* mostly comments

* move memory_map from TextReader to CParserWrapper

* moved IOArgs and IOHandles

* more comments

Co-authored-by: Jeff Reback <[email protected]>

* more typing checks to pre-commit (#37539)

* TST: 32bit dtype compat test_groupby_dropna (#37623)

* BUG: Metadata propagation for groupby iterator (#37461)

* BUG: read-only values in cython funcs (#37613)

* CLN refactor core/arrays (#37581)

* Fixed Metadata Propogation in DataFrame (#37381)

* TYP: add Shape alias to pandas._typing (#37128)

* DOC: Fix typo (#37630)

* CLN: parametrize test_nat_comparisons (#37195)

* dataframe dataclass docstring updated (#37632)

* refactor core/groupby (#37583)

* BUG: set index of DataFrame.apply(f) when f returns dict (#37544) (#37606)

* BUG: to_dict should return a native datetime object for NumPy backed dataframes (#37571)

* ENH: memory_map for compressed files (#37621)

* DOC: add example & prose of slicing with labels when index has duplicate labels  (#36814)

* DOC: add example & prose of slicing with labels when index has duplicate labels #36251

* DOC: proofread the sentence.

Co-authored-by: Jun Kudo <[email protected]>

* DOC: Fix typo (#37636)

"columns(s)" sounded odd, I believe it was supposed to be just "column(s)".

* CI: troubleshoot win py38 builds (#37652)

* TST/REF: collect indexing tests by method (#37638)

* TST/REF: collect tests for get_numeric_data (#37634)

* misplaced loc test

* TST/REF: collect get_numeric_data tests

* REF: de-duplicate _validate_insert_value with _validate_scalar (#37640)

* CI: catch windows py38 OSError (#37659)

* share test (#37679)

* TST: match matplotlib warning message (#37666)

* TST: match matplotlib warning message

* TST: match full message

* pd.Series.loc.__getitem__ promotes to float64 instead of raising KeyError (#37687)

* REF/TST: misplaced Categorical tests (#37678)

* REF/TST: collect indexing tests by method (#37677)

* CLN: only call _wrap_results one place in nanmedian (#37673)

* TYP: Index._concat (#37671)

* BUG: CategoricalIndex.equals casting non-categories to np.nan (#37667)

* CLN: _replace_single (#37683)

* REF: simplify _replace_single by noting regex kwarg is bool

* Annotate

* CLN: remove never-False convert kwarg

* TYP: make more internal funcs keyword-only (#37688)

* REF: make Series._replace_single a regular method (#37691)

* REF: simplify cycling through colors (#37664)

* REF: implement _wrap_reduction_result (#37660)

* BUG: preserve fold in Timestamp.replace (#37644)

* CLN: Clean indexing tests (#37689)

* TST: fix warning for pie chart (#37669)

* PERF: reverted change from commit 7d257c6 to solve issue #37081 (#37426)

* DataFrameGroupby.boxplot fails when subplots=False (#28102)

* ENH: Improve error reporting for wrong merge cols (#37547)

* Transfer tests
of test_frame.py
to test_frame_color.py

* PEP8

* Fixes for linter

* Сhange pd.DateFrame to DateFrame

* Move inconsistent namespace check to pre-commit, fixup more files (#37662)

* check for inconsistent namespace usage

* doc

* typos

* verbose regex

* use verbose flag

* use verbose flag

* match both directions

* add test

* don't import annotations from future

* update extra couple of cases

* 🚚 rename

* typing

* don't use subprocess

* don't type tests

* use pathlib

* REF: simplify NDFrame.replace, ObjectBlock.replace (#37704)

* REF: implement Categorical.encode_with_my_categories (#37650)

* REF: implement Categorical.encode_with_my_categories

* privatize

* BUG: unpickling modifies Block.ndim (#37657)

* REF: dont support dt64tz in nanmean (#37658)

* CLN: Simplify groupby head/tail tests (#37702)

* Bug in loc raised for numeric label even when label is in Index (#37675)

* REF: implement replace_regex, remove unreachable branch in ObjectBlock.replace (#37696)

* TYP: Check untyped defs (except vendored) (#37556)

* REF: remove ObjectBlock._replace_single (#37710)

* Transfer tests
of test_frame.py
to test_frame_color.py

* TST/REF: collect indexing tests by method (#37590)

* PEP8

* Сhange DateFrame to pd.DateFrame

* Сhange pd.DateFrame to DateFrame

* Removing imports

* Bug fixes

* Bug fixes

* Fix incorrect merge

* test_frame_color.py edit

* Transfer tests

of test_frame.py
to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py

* Removing unnecessary imports

* PEP8

* # Conflicts:
#	pandas/tests/plotting/frame/test_frame.py
#	pandas/tests/plotting/frame/test_frame_color.py
#	pandas/tests/plotting/frame/test_frame_subplots.py

* Moving the file test_frame.py to a new directory

* Transfer tests

of test_frame.py
to test_frame_color.py, test_frame_groupby.py and test_frame_subplots.py

* Removing unnecessary imports

* PEP8

* CLN: clean categorical indexes tests (#37721)

* Fix merge error

* PEP 8 fixes

* Fix merge error

* Removing unnecessary imports

* PEP 8 fixes

* Fixed class name

* Transfer tests

of test_frame.py
to test_frame_subplots.py

* Transfer tests

of test_frame.py
to test_frame_groupby.py, test_frame_subplots.py, test_frame_color.py

* Changed class names

* Removed unnecessary imports

* Removed import

* TST/REF: collect indexing tests by method (#37590)

* TST: match matplotlib warning message (#37666)

* TST: match matplotlib warning message

* TST: match full message

* TST: fix warning for pie chart (#37669)

* Transfer tests
of test_frame.py
to test_frame_color.py

* PEP8

* Fixes for linter

* Сhange pd.DateFrame to DateFrame

* Transfer tests
of test_frame.py
to test_frame_color.py

* PEP8

* Сhange DateFrame to pd.DateFrame

* Сhange pd.DateFrame to DateFrame

* Removing imports

* Bug fixes

* Bug fixes

* Fix incorrect merge

* test_frame_color.py edit

* Fix merge error

* Fix merge error

* Removing unnecessary features

* Resolving Commit Conflicts daf999f 365d843

* black fix

Co-authored-by: jbrockmendel <[email protected]>
Co-authored-by: Marco Gorelli <[email protected]>
Co-authored-by: Philip Cerles <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Sven <[email protected]>
Co-authored-by: Micael Jarniac <[email protected]>
Co-authored-by: Andrew Wieteska <[email protected]>
Co-authored-by: Maxim Ivanov <[email protected]>
Co-authored-by: Erfan Nariman <[email protected]>
Co-authored-by: Fangchen Li <[email protected]>
Co-authored-by: patrick <[email protected]>
Co-authored-by: attack68 <[email protected]>
Co-authored-by: Torsten Wörtwein <[email protected]>
Co-authored-by: Jeff Reback <[email protected]>
Co-authored-by: Janus <[email protected]>
Co-authored-by: Joel Whittier <[email protected]>
Co-authored-by: taytzehao <[email protected]>
Co-authored-by: ma3da <[email protected]>
Co-authored-by: junk <[email protected]>
Co-authored-by: Jun Kudo <[email protected]>
Co-authored-by: Alex Kirko <[email protected]>
Co-authored-by: Yassir Karroum <[email protected]>
Co-authored-by: Kaiqi Dong <[email protected]>
Co-authored-by: Richard Shadrach <[email protected]>
Co-authored-by: Simon Hawkins <[email protected]>
@attack68 attack68 deleted the enh_timezones_json branch November 22, 2020 08:35
Labels
IO JSON (read_json, to_json, json_normalize), Timezones (Timezone data dtype)
5 participants