Skip to content

fix: Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 #22180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

MarcoGorelli
Copy link
Collaborator

@MarcoGorelli MarcoGorelli commented Apr 8, 2025

closes #22167

Exploring this further, as this nominally fixes the issue but isn't the end of the story done, much happier with the current fix

@github-actions github-actions bot added fix Bug fix python Related to Python Polars rust Related to Rust Polars labels Apr 8, 2025
@MarcoGorelli MarcoGorelli force-pushed the fix-date-parse-8-digits-leading-zero branch from 7321a8f to 211bc05 Compare April 8, 2025 16:38
Copy link

codecov bot commented Apr 8, 2025

Codecov Report

Attention: Patch coverage is 71.42857% with 4 lines in your changes missing coverage. Please review.

Project coverage is 81.15%. Comparing base (19ae9ec) to head (4f51c2b).
Report is 28 commits behind head on main.

Files with missing lines Patch % Lines
...es/polars-time/src/chunkedarray/string/strptime.rs 71.42% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #22180      +/-   ##
==========================================
- Coverage   81.16%   81.15%   -0.01%     
==========================================
  Files        1638     1638              
  Lines      235247   235253       +6     
  Branches     2714     2714              
==========================================
- Hits       190931   190924       -7     
- Misses      43676    43688      +12     
- Partials      640      641       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kdn36
Copy link
Contributor

kdn36 commented Apr 9, 2025

FWIW - I am not familiar with the history or detail, so consider the following observation with caution.

Observation: the proposed change (from .%9f to %.9f or even %.f) will make the subsequent .%6f and .%3f inaccessible for any exhaustive search that looks for the first match as it will match for any number of digits. To be confirmed. If true, dropping the 3f/6f patterns may present an optimization improvement, as long as some conditions are met (search logic, parsing only).

This is from looking into the performance overhead of datetime schema inference not too long ago, which is a very expensive effort in the current implementation. Something worth considering.

@MarcoGorelli
Copy link
Collaborator Author

Yup, thanks

The underlying issue is here: #22167 (comment)

looking into it to see what we can do

@MarcoGorelli MarcoGorelli force-pushed the fix-date-parse-8-digits-leading-zero branch from 211bc05 to c4e4e06 Compare April 10, 2025 18:46
@MarcoGorelli MarcoGorelli changed the title fix: Datetime parsing was not consistent with python datetime.strptime when input had 8 digits and leading zero fix: Datetime parser was incorrectly parsing 8-digit fractional seconds when format specified to expect 9 Apr 10, 2025
Comment on lines -26 to +30
if parsed == 0 {
None
if parsed != incr {
panic!(
"Invariant when calling StrpTimeState.parse was not upheld. Expected {} parsed digits, got {}.",
incr, parsed
);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it ok to panic here? if we get here, it's a sign that something is seriously wrong, and it would've managed to avoid the bug pointed out in #22167 (comment)

In [21]: pl.Series(['2025-04-06T18:57:42.77756192Z']).str.to_datetime('%Y-%m-%dT%H:%M:%S.%9fcabbagebananapotato')
Out[21]:
shape: (1,)
Series: '' [datetime[ns]]
[
        2025-04-06 18:57:42.077756192
]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is an implementation error on our end, we can panic.

@MarcoGorelli MarcoGorelli marked this pull request as ready for review April 10, 2025 19:52
@MarcoGorelli
Copy link
Collaborator Author

looking into it to see what we can do

OK, resolved:

  • StrpTimeState.parse was being reached when its preconditions didn't hold
  • fixed fmt_len so that those StrpTimeState.parse wouldn't be reached in such an invalid state
  • made StrpTimeState.parse panic if it's executed in an invalid state

@ritchie46 ritchie46 merged commit 4329e14 into pola-rs:main Apr 15, 2025
26 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFC3339 Datetime(s) get mangled by CsvReader
3 participants