numpy.datetime64 cast to Timestamp when added to DataFrame #17183
Comments
@strazdinsg: Thanks for reporting this! Given the unusual nature of the frequency, I'm hesitant to classify this as a bug, though I think we should be able to handle such an input better than silently "breaking" like this.
There is an open issue about this already.
It is much more idiomatic to simply do:
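(The original snippet did not survive in this copy; the following is a hedged sketch of the idiomatic approach being suggested, parsing the strings with `pd.to_datetime` and rounding with `Series.dt.round`. The sample data and column names are illustrative, not from the original.)

```python
import pandas as pd

# Millisecond-precision timestamps encoded as strings (illustrative data)
raw = pd.Series(['2017-08-04T10:00:00.123', '2017-08-04T10:00:00.128'])

# Parse to pandas' native datetime64[ns] and round to 10 ms in one step;
# the result stays a pandas datetime column, so nothing gets mangled on assign
rounded = pd.to_datetime(raw).dt.round('10ms')

df = pd.DataFrame({'raw': raw}).assign(rounded=rounded)
```

This sidesteps the problem entirely: the rounding happens inside pandas' nanosecond representation, so no non-standard numpy unit is ever stored in the DataFrame.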
numpy in general is not very friendly to any kind of non-standard datetimes. That said, pandas should actually convert these non-ns dtypes.
Thanks for an elegant solution, @jreback!
Can we convert this to a general timestamp? I am not able to do that, and when I apply groupby on this timestamp it does not work. I am not even able to sort my columns, and I have millions of observations like: 2018-08-26T14:05:31.000Z. I would really appreciate your help, @jreback.
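For what it's worth, strings like `2018-08-26T14:05:31.000Z` are ISO 8601 with a UTC suffix, which `pd.to_datetime` parses directly. A hedged sketch (the column names and sample data are made up for illustration):

```python
import pandas as pd

# ISO 8601 strings with a 'Z' (UTC) suffix, as in the comment above
s = pd.Series(['2018-08-26T14:05:31.000Z', '2018-08-26T14:05:29.500Z'])

# utc=True makes the result an unambiguous tz-aware datetime64 column
when = pd.to_datetime(s, utc=True)

df = pd.DataFrame({'when': when, 'val': [1, 2]})
df = df.sort_values('when')                             # sorting now works
per_day = df.groupby(df['when'].dt.date)['val'].sum()   # so does groupby
```

Once the strings are converted to a proper datetime column, both `sort_values` and `groupby` behave as expected even on millions of rows.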
This issue is not limited to exotic types: it applies to e.g. datetime64[D], and it has nothing to do with formatting. The issue appears to be that the resolution changes to ns when a numpy array gets stored in a DataFrame or Series. Here is an example:

```python
# make a [D] array
In [194]: x = np.array(['2018-01-03'], dtype='datetime64[D]')

# put it into pd.Series and extract the values
In [195]: y = pd.Series(x).values

# note the dtype has changed to [ns]
In [196]: y.dtype

# turns out x and y arrays are not interchangeable;
# this creates subtle bugs in cython code
In [197]: x.view('int') == y.view('int')
```
@jeybrahms pandas only supports nanosecond-precision datetimes currently. |
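Given that round-tripping through pandas changes the unit, one safe pattern is to compare at a common resolution rather than via raw integer views. A minimal sketch, assuming the `x`/`y` setup from the example above:

```python
import numpy as np
import pandas as pd

x = np.array(['2018-01-03'], dtype='datetime64[D]')
y = pd.Series(x).values  # round-trip through pandas changes the unit

# Cast x to whatever unit pandas produced before comparing;
# the integer views differ, but the datetimes themselves are equal
same = x.astype(y.dtype) == y
```

Casting to `y.dtype` (rather than hard-coding `[ns]`) keeps the comparison correct across pandas versions that may round-trip to different supported units.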
Problem description
I have a list of timestamps with millisecond accuracy, encoded as strings. I round them to 10 ms resolution, which goes well. The bug comes when I add the rounded timestamps to a DataFrame as a new column: the values of the datetime64 objects get completely destroyed. My suspicion is that the numpy.datetime64 is converted to some other datatype in the DataFrame.assign() method. It should maintain the same type.
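The numpy side of this can be sketched as follows. numpy accepts multiplied units such as datetime64[10ms], and arrays with such units are what pandas mishandled when assigned to a DataFrame; the sketch below (with illustrative data) shows only the numpy behavior, which is well defined:

```python
import numpy as np

# numpy accepts multiplied units like [10ms]; parsing a finer-grained
# string into this dtype truncates to the 10 ms tick
x = np.array(['2017-08-04T10:00:00.123'], dtype='datetime64[10ms]')

# Viewed at a standard unit, the value is 10:00:00.120
as_ms = x.astype('datetime64[ms]')
```

Internally `x` stores integer counts of 10 ms ticks since the epoch, so any consumer that reinterprets those integers as nanoseconds, as pandas did here, produces garbage values.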
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.20.1
pytest: 3.0.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
xarray: None
IPython: 5.3.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.2.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.3
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.9
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None