CsvReader cannot parse datetime 2016/1/1 0:00 in csv file. #16092
Comments
'2016/1/1 0:00' isn't a valid format: chrono expects month and day to be 2 digits, i.e. 2016/01/01. The same is true for the hour. It might also require seconds for a datetime, but I'm not sure. @MarcoGorelli, do "we" want to make the parser robust to these kinds of discrepancies?
2016/1/1 0:00 is the default datetime format when saving a CSV from the Chinese version of Microsoft Excel on Windows. Chrono can parse "2016/1/1 0:00" when given the format string "%Y/%m/%d %H:%M". In polars, the alternative is to read the column as a string and then parse it into a datetime in a LazyFrame with the format "%Y/%m/%d %H:%M":

use polars::{lazy::dsl::StrptimeOptions, prelude::*};

fn main() {
    let csv_file = "example.csv";
    // With date inference switched off, the "datetime" column is read as a string.
    let df = CsvReader::from_path(csv_file)
        .unwrap()
        .has_header(true)
        .with_try_parse_dates(false)
        .finish()
        .unwrap();

    // Parse with an explicit format instead of relying on inference.
    let options = StrptimeOptions {
        format: Some("%Y/%m/%d %H:%M".to_string()),
        strict: true,
        ..StrptimeOptions::default()
    };
    let df = df
        .lazy()
        .with_columns([col("datetime")
            .str()
            // Arguments: time_unit, time_zone, options, ambiguous.
            // "ambiguous" only matters when a time zone is set; "raise" is the usual value.
            .to_datetime(None, None, options, lit("raise"))])
        .collect()
        .unwrap();
    println!("{}", df);
}
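The read-as-string idea can also be applied before the data reaches polars, for example when the CSV is preprocessed anyway. Below is a minimal std-only sketch, assuming the Excel layout Y/M/D H:M with unpadded month, day, hour, and minute. `normalize_excel_datetime` and `parse_field` are hypothetical helpers, not part of polars or chrono. They accept one or two digits per field and re-emit a zero-padded `%Y-%m-%d %H:%M:%S` value.

```rust
/// Hypothetical helper: parses "Y/M/D H:M" where month, day, hour and minute
/// may have one or two digits (e.g. "2016/1/1 0:00") and returns a
/// zero-padded "YYYY-MM-DD HH:MM:00" string, or None if the input does not fit.
fn normalize_excel_datetime(s: &str) -> Option<String> {
    let (date, time) = s.trim().split_once(' ')?;
    let d: Vec<&str> = date.split('/').collect();
    let t: Vec<&str> = time.split(':').collect();
    if d.len() != 3 || t.len() != 2 {
        return None;
    }
    // Year must be exactly four digits; the other fields one or two digits.
    let year = parse_field(d[0], 4, 4)?;
    let month = parse_field(d[1], 1, 2)?;
    let day = parse_field(d[2], 1, 2)?;
    let hour = parse_field(t[0], 1, 2)?;
    let minute = parse_field(t[1], 1, 2)?;
    // Basic range checks only: the day is not checked against the month length.
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) || hour > 23 || minute > 59 {
        return None;
    }
    Some(format!("{year:04}-{month:02}-{day:02} {hour:02}:{minute:02}:00"))
}

/// Hypothetical helper: accepts only ASCII digits with a length in `min..=max`.
fn parse_field(s: &str, min: usize, max: usize) -> Option<u32> {
    if s.len() < min || s.len() > max || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    s.parse().ok()
}

fn main() {
    for raw in ["2016/1/1 0:00", "2016/12/31 23:59", "2016/1/1 0:0:00"] {
        println!("{raw:?} -> {:?}", normalize_excel_datetime(raw));
    }
}
```

The explicit range checks keep the sketch honest about what it validates; for calendar-correct validation (e.g. rejecting 2016/2/30) a date library such as chrono is the better tool.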
I think it is impossible to infer every date/datetime format out there, but IMO it should be possible to pass in a date/datetime format to … This feature is accepted but needs implementation 😉 #9550
Actually, polars-time has a very complete date/datetime pattern list; see polars-time-0.39.2\src\chunkedarray\string\infer.rs:

pub(super) static DATETIME_Y_M_D: &[&str] = &[
    // ---
    // ISO8601-like, generated via the `iso8601_format_datetime` test fixture
    // ---
    "%Y/%m/%dT%H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y/%m/%dT%H%M%S",
    "%Y-%m-%dT%H%M%S",
    "%Y/%m/%dT%H:%M",
    "%Y-%m-%dT%H:%M",
    "%Y/%m/%dT%H%M",
    "%Y-%m-%dT%H%M",
    "%Y/%m/%dT%H:%M:%S.%9f",
    "%Y-%m-%dT%H:%M:%S.%9f",
    "%Y/%m/%dT%H:%M:%S.%6f",
    "%Y-%m-%dT%H:%M:%S.%6f",
    "%Y/%m/%dT%H:%M:%S.%3f",
    "%Y-%m-%dT%H:%M:%S.%3f",
    "%Y/%m/%dT%H%M%S.%9f",
    "%Y-%m-%dT%H%M%S.%9f",
    "%Y/%m/%dT%H%M%S.%6f",
    "%Y-%m-%dT%H%M%S.%6f",
    "%Y/%m/%dT%H%M%S.%3f",
    "%Y-%m-%dT%H%M%S.%3f",
    "%Y/%m/%d",
    "%Y-%m-%d",
    "%Y/%m/%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S",
    "%Y/%m/%d %H%M%S",
    "%Y-%m-%d %H%M%S",
    "%Y/%m/%d %H:%M",
    "%Y-%m-%d %H:%M",
    "%Y/%m/%d %H%M",
    "%Y-%m-%d %H%M",
    "%Y/%m/%d %H:%M:%S.%9f",
    "%Y-%m-%d %H:%M:%S.%9f",
    "%Y/%m/%d %H:%M:%S.%6f",
    "%Y-%m-%d %H:%M:%S.%6f",
    "%Y/%m/%d %H:%M:%S.%3f",
    "%Y-%m-%d %H:%M:%S.%3f",
    "%Y/%m/%d %H%M%S.%9f",
    "%Y-%m-%d %H%M%S.%9f",
    "%Y/%m/%d %H%M%S.%6f",
    "%Y-%m-%d %H%M%S.%6f",
    "%Y/%m/%d %H%M%S.%3f",
    "%Y-%m-%d %H%M%S.%3f",
    // ---
    // other
    // ---
    // we cannot know this one, because polars needs to know
    // the length of the parsed fmt
    // ---
    "%FT%H:%M:%S%.f",
];
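A complete list does not by itself explain the failure. Here is a simplified, hypothetical model of list-based inference, assuming candidates are tried in order and the first one that consumes the whole value wins. It is not polars' implementation: `matches_format` and `infer_format` are illustrative names, `%Y` is treated as exactly four digits, the other fields accept one or two digits (assuming, as the workaround comment indicates, that unpadded fields are fine once the format is known), and ranges are not validated.

```rust
/// Hypothetical matcher for a tiny strftime subset: %Y (exactly four digits),
/// %m, %d, %H, %M, %S (one or two digits each) and literal ASCII characters.
/// The whole input must be consumed. Value ranges are not validated.
fn matches_format(input: &str, fmt: &str) -> bool {
    let inp = input.as_bytes();
    let mut pos = 0;
    let mut spec = fmt.chars();
    while let Some(c) = spec.next() {
        if c == '%' {
            let (min, max) = match spec.next() {
                Some('Y') => (4, 4),
                Some('m' | 'd' | 'H' | 'M' | 'S') => (1, 2),
                _ => return false, // specifier not supported by this sketch
            };
            // Greedily take up to `max` ASCII digits.
            let mut n = 0;
            while n < max && pos + n < inp.len() && inp[pos + n].is_ascii_digit() {
                n += 1;
            }
            if n < min {
                return false;
            }
            pos += n;
        } else if !c.is_ascii() || inp.get(pos) != Some(&(c as u8)) {
            // Literal characters must match exactly.
            return false;
        } else {
            pos += 1;
        }
    }
    pos == inp.len()
}

/// Returns the first candidate format that matches the whole sample.
fn infer_format<'a>(sample: &str, candidates: &[&'a str]) -> Option<&'a str> {
    candidates.iter().copied().find(|fmt| matches_format(sample, fmt))
}

fn main() {
    // A small subset of DATETIME_Y_M_D, kept in the same relative order.
    let candidates = [
        "%Y/%m/%d", "%Y-%m-%d",
        "%Y/%m/%d %H:%M:%S", "%Y-%m-%d %H:%M:%S",
        "%Y/%m/%d %H%M%S", "%Y-%m-%d %H%M%S",
        "%Y/%m/%d %H:%M", "%Y-%m-%d %H:%M",
    ];
    for s in ["2016/1/1 0:00", "2016-01-01 12:30:45", "2016.1.1"] {
        println!("{s:?} -> {:?}", infer_format(s, &candidates));
    }
}
```

Under these assumptions `2016/1/1 0:00` is matched by `%Y/%m/%d %H:%M`, which suggests the value is rejected by whatever screens it before this list is consulted rather than by a missing entry.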
I think the issue is here: polars/crates/polars-time/src/chunkedarray/string/infer.rs, lines 40 to 64 (at commit 1195f85).
The pattern could be adjusted so that the hour, minute, and second may be a single digit when there's a separator.
You are right. There are three regular expressions: DATETIME_DMY_PATTERN, DATETIME_YMD_PATTERN, and DATETIME_YMDZ_PATTERN. Changing the hour (lines 24, 50, 74), minute (lines 26, 52, 76), and second (lines 29, 55, 79) groups to allow 1-2 digits fixed the problem. Month and day are already allowed to have 1-2 digits.
(?:\d{1,2}) # hour
(?:\d{1,2}) # minute
(?:\d{1,2}) # second
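The effect of that change can be modelled without a regex engine. A minimal std-only sketch: `time_part_ok` is a hypothetical helper (not polars code) that splits the time part on `:` and counts the digits in each field. `min_digits = 2` stands in for the current two-digit-only groups, `min_digits = 1` for the proposed `\d{1,2}` groups. Because the fields are split on the separator, a single digit is unambiguous, which is why the relaxation is tied to a separator being present.

```rust
/// Hypothetical check of an "H:M" or "H:M:S" time part: every field must be
/// ASCII digits, at most two of them, and at least `min_digits` of them.
/// min_digits = 2 models two-digit-only groups; min_digits = 1 models \d{1,2}.
fn time_part_ok(time: &str, min_digits: usize) -> bool {
    let fields: Vec<&str> = time.split(':').collect();
    (2..=3).contains(&fields.len())
        && fields.iter().all(|f| {
            (min_digits..=2).contains(&f.len()) && f.bytes().all(|b| b.is_ascii_digit())
        })
}

fn main() {
    for t in ["0:00", "00:00", "0:0:5", "000:00"] {
        println!("{t:>7}: old={} new={}", time_part_ok(t, 2), time_part_ok(t, 1));
    }
}
```

`0:00` is rejected under the two-digit rule and accepted under the relaxed one, while `000:00` is rejected by both.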
So is there any remaining issue here?
I think someone would just need to make a PR. I'm busy with some other things right now, and given the workaround here of reading as a string and then calling …
Reproducible example
Contents of example.csv:
datetime
2016/1/1 0:00
Log output
Issue description
2016/1/1 00:00 works, but 2016/1/1 0:00 fails.
Expected behavior
CsvReader should be able to parse the datetime 2016/1/1 0:00.
Installed versions
[dependencies]
polars = "0.39.2"