Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locales #453

Merged
merged 68 commits into from
Jul 31, 2020
Merged

Locales #453

merged 68 commits into from
Jul 31, 2020

Conversation

cecton
Copy link
Contributor

@cecton cecton commented Jul 11, 2020

Closes #199
Closes #153

I made a pure Rust lib without any dependency that has the locales from glibc. It does not depend on glibc. It's using the locale data files from glibc.

% This file is part of the GNU C Library and contains locale data.
% The Free Software Foundation does not claim any copyright interest
% in the locale data contained in this file.  The foregoing does not
% affect the license of the GNU C Library as a whole.  It does not
% exempt you from the conditions of the license if your use would
% otherwise be governed by that license.

I contacted the FSF and they seem to allow the use of these data:

In short, we believe these files aren't copyrightable; for more background,
please check https://sourceware.org/ml/libc-locales/2013-q1/msg00048.html

--
I am not a lawyer, this is not a legal advice.

The PR is a draft, I'm waiting for your feedback before finishing it.

Copy link
Contributor

@quodlibetor quodlibetor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think that this is great!

I'm not super familiar with localization in general. From looking at other programming languages it does seem standard to hook this directly into the datetime formatting, instead of providing an api that something like fluent could hook into. Honestly I have no idea what that would look like, do you have thoughts? Is it worth us pinging them to ask for input?

Cargo.toml Outdated
@@ -24,11 +24,12 @@ appveyor = { repository = "chronotope/chrono" }
name = "chrono"

[features]
default = ["clock", "std"]
default = ["clock", "std", "locales"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know that this should be a default feature, but I do think that it should be called out in the README as a major feature, there isn't a clear section for features right now, I'll add one and make sure that this is prominent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Cargo.toml Outdated
@@ -38,6 +39,7 @@ num-integer = { version = "0.1.36", default-features = false }
num-traits = { version = "0.2", default-features = false }
rustc-serialize = { version = "0.3.20", optional = true }
serde = { version = "1.0.99", default-features = false, optional = true }
pure-rust-locales = "0.2.1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be optional = true and locals = ["pure-rust-locals"] above in order for everything to work correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 440 to 450
locale: &str,
) -> FormatResult<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should create a new fn format_item_localized function to preserve backwards compatibility. I also think that for backcompat the default locale should be "en_US" or "en_GB".

That also allows us to return the improved FormatResult from that function, but not change the error type here (which is also backwards-incompatible)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes POSIX makes more sense! Aggreed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! More or less... I did duplicate some function but I broke the compatibility since I changed the result fmt::Result to FormatResult

@quodlibetor
Copy link
Contributor

Also, unrelated to my willingness to merge this (I am happy about it!), I glanced at pure-rust-locales and it depends on nom at build time, which will unfortunately, I believe, meaningfully contribute to compilation time even though it doesn't affect final library size. It is a little complicated whether that counts as "without any dependency".

@quodlibetor quodlibetor changed the base branch from master to main July 11, 2020 17:56
@cecton
Copy link
Contributor Author

cecton commented Jul 11, 2020

I glanced at pure-rust-locales and it depends on nom at build time, which will unfortunately, I believe, meaningfully contribute to compilation time even though it doesn't affect final library size. It is a little complicated whether that counts as "without any dependency".

Thanks! 😁 Yes I have a bit exaggerated on that. The data file for the locales come directly from glibc and I made pure-rust-locales for the sole purpose of offering these const with the locales. Therefore the data files are converted to a rust file during the build but it doesn't add nom as a dependency of the whole project (only as a build dependency).

The only faster+lighter alternative would be to publish the library with the file already generated. I'm not sure if there is an automatic way to do that...

I'm not super familiar with localization in general. From looking at other programming languages it does seem standard to hook this directly into the datetime formatting, instead of providing an api that something like fluent could hook into. Honestly I have no idea what that would look like, do you have thoughts? Is it worth us pinging them to ask for input?

I don't know if fluent is going to provide a date and time formatting function. Usually those libraries are more focused on the translations. But I'm not sure, it's worth to ask. I know for example that in Python the date and time formatting depends on libc. I actually made a Rust crate specialized for that: https://crates.io/crates/libc-strftime (doing the same for chrono would be annoying as it adds a dependency to libc and the behavior is not consistent across all platforms).

The approach I have here is very minimalist and will allow the user to either use a default locale implementation consistent across platforms or using fluent (or anything specialized) instead. I believe I can even let the user select at build time (using feature flags "en_US", "en_GB", etc...) what languages they want to support which should reduce the size of the binary.

@quodlibetor
Copy link
Contributor

I agree with your points! I'll be happy to merge this with my requests applied.

cecton added 5 commits July 12, 2020 12:42
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
Parent branch: chronotope/main
Forked at: aaee912
@cecton
Copy link
Contributor Author

cecton commented Jul 12, 2020

 error: expected identifier, found keyword `crate`
 --> /Users/runner/.cargo/registry/src/github.jparrowsec.cn-1ecc6299db9ec823/ryu-1.0.5/src/buffer/mod.rs:1:5
  |
1 | use crate::raw;
  |     ^^^^^

@quodlibetor the macos build is failing... do you have a requirement to support older versions of Rust?

@quodlibetor
Copy link
Contributor

do you have a requirement to support older versions of Rust?

Yes, but not with all features: it is expected to not try to build or test the locales feature on 1.13.0.

@quodlibetor
Copy link
Contributor

In terms of 1.13.0, if you fix this comment the tests should not try to build on 1.13.0 (there's an explicit allowlist of features that we test on 1.13.0 in ci/github.sh)

cecton added 3 commits July 12, 2020 18:08
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
src/format/mod.rs Outdated Show resolved Hide resolved
cecton added 4 commits July 12, 2020 19:01
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
@quodlibetor
Copy link
Contributor

Are you waiting on me for further action on this one @cecton , or is it still wip?

cecton added 3 commits July 17, 2020 18:17
Forked at: aaee912
Parent branch: chronotope/main
Forked at: aaee912
Parent branch: chronotope/main
@quodlibetor
Copy link
Contributor

This looks great! I've updated the feature name to unstable-locales until we have more experience with it. Hopefully this will make it easier to update the API in pure-rust-locales without forcing chrono to issue a breaking change.

Relatedly, how likely do you think it is that pure-rust-locales will require further breaking changes? It looks like it's had several, which makes sense, but do you have a sense on if it's converging on something that feels future-proof or is more experimentation expected? No real rush with the feature named unstable-locales, but releasing a 1.0 version will make me more comfortable renaming the feature to just locales.

@quodlibetor quodlibetor merged commit 8d6d83f into chronotope:main Jul 31, 2020
@quodlibetor
Copy link
Contributor

🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉 🎉

Thank you so much for this!

@ammgws
Copy link
Contributor

ammgws commented Jul 31, 2020

On phone at the moment but if I wanted to use the system's locale (i.e. LC_TIME from /etc/locale.conf), I would need to do the work to grab that info separately to use with the new _localized functions?

@quodlibetor
Copy link
Contributor

if I wanted to use the system's locale (i.e. LC_TIME from /etc/locale.conf), I would need to do the work to grab that info separately to use with the new _localized functions?

Correct. I'm open to PRs (especially cross-platform PRs) that make that easier, but probably the right place for that functionality would be in pure-rust-locales.

@cecton
Copy link
Contributor Author

cecton commented Jul 31, 2020

On phone at the moment but if I wanted to use the system's locale (i.e. LC_TIME from /etc/locale.conf), I would need to do the work to grab that info separately to use with the new _localized functions?

@ammgws Yes I'm afraid.

This is something that I could add in pure-rust-locales but I bet it will vary a lot per platform and environment. It would be better to do it in another crate.

The strong advantage of importing the locale data as we did is to not depend on libc, so you can have locales even with musl. (Also it's 100% consistent across platforms obviously.)

But the big inconvenient is that it is totally separate from existing mechanisms for localization. So yeah if you want to use the locales of the process user you need to get them first.

On the bright side, you can convert a &str like "fr_BE" to Locale:fr_BE (and all the others) easily.

@cecton
Copy link
Contributor Author

cecton commented Jul 31, 2020

Relatedly, how likely do you think it is that pure-rust-locales will require further breaking changes? It looks like it's had several, which makes sense, but do you have a sense on if it's converging on something that feels future-proof or is more experimentation expected? No real rush with the feature named unstable-locales, but releasing a 1.0 version will make me more comfortable renaming the feature to just locales.

@quodlibetor Initially it should have been a single commit. Most of the changes come from its first use here in chrono. I imagine it is possible that others could ask new features but the format of the API shouldn't change. The data will also evolve a bit (I see that the last change is from 2019-05-08, so every year I guess).

@cecton cecton deleted the locales branch July 31, 2020 18:40
@6A61736F6E206E61646572
Copy link

@cecton Do you have an example of deserialising a string to a Locale? The enum had try_from but not FromStr

@cecton
Copy link
Contributor Author

cecton commented Aug 12, 2020

@ohk2kt3t4 hmmm 🤔 I should probably have use FromStr instead of TryFrom. I'm not sure what's the difference tbh. Do you have an issue with TryFrom?

The basic usage is:

use std::convert::TryInto; // make sure this trait is in scope

let locale: Locale = "fr_BE".try_into()?;

If you have a String:

let locale_string: String = "fr_BE".to_string();
let locale: Locale = locale_string.as_str().try_into()?;

Correction: TryInto must be in scope.

@cecton
Copy link
Contributor Author

cecton commented Aug 12, 2020

If I understand correctly: FromStr is just the old one. I don't think I will make a release for such a minor issue. I'm not even sure it is an issue at all. The API of from_str is not more convenient (you still need FromStr in scope).

The feature locales of chrono is not even meant to be used on older versions of Rust. It requires stable.

If I make a release of pure-rust-locales I might just add it but I have really no reason to touch it for now. I'm open to any argument that would convince me otherwise.

ghost pushed a commit to Dirout/dokkoo that referenced this pull request Nov 9, 2020
Support for locales beyond en_US is now supported (locales pulled from glibc, see: https://sources.debian.org/src/glibc/2.31-4/localedata/locales/) experimentally (see also: chronotope/chrono#453)
emmyoh pushed a commit to Dirout/dokkoo that referenced this pull request Jan 7, 2022
Support for locales beyond en_US is now supported (locales pulled from glibc, see: https://sources.debian.org/src/glibc/2.31-4/localedata/locales/) experimentally (see also: chronotope/chrono#453)
ghost pushed a commit to Dirout/dokkoo that referenced this pull request Jan 8, 2022
Support for locales beyond en_US is now supported (locales pulled from glibc, see: https://sources.debian.org/src/glibc/2.31-4/localedata/locales/) experimentally (see also: chronotope/chrono#453)
pickfire pushed a commit to pickfire/chrono that referenced this pull request Jan 22, 2022
@leontoeides
Copy link

leontoeides commented Apr 24, 2022

I was really excited to discover this format_localized feature. I didn't realize it existed. It appears only to be implemented for DateTime and not for NaiveDateTime, NaiveDate, NaiveTime. I'm not familiar with this source to chrono so I'm not qualified to make any changes, but it looks like a matter of copying and pasting with a few tweaks (?)

@cecton
Copy link
Contributor Author

cecton commented Apr 25, 2022

I was really excited to discover this format_localized feature. I didn't realize it existed. It appears only to be implemented for DateTime and not for NaiveDateTime, NaiveDate, NaiveTime. I'm not familiar with this source to chrono so I'm not qualified to make any changes, but it looks like a matter of copying and pasting with a few tweaks (?)

strftime does have notion of timezone. You can display it with %Z/%z.

If you do have a timezone unaware object (Naive*) and want to use strftime, you need to attach a timezone onto it even if you don't use it.

I'm not a library maintainer but I don't personally see much value into adding strftime to Naive* types because: 1. it's kinda breaking (what do you do when the user provides %z?) and 2. it can be easily workedaround by using the not naive types.

@djc
Copy link
Member

djc commented Apr 25, 2022

I'm not familiar with this source to chrono so I'm not qualified to make any changes, but it looks like a matter of copying and pasting with a few tweaks (?)

This isn't quite how chrono development works. I'm happy to provide you with some pointers if you have a hard time making sense of the source code, but it is unlikely that I'll be motivated to go and make these changes myself.

(But @cecton makes a good point that it might not make sense anyway.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Localized date/time Add support for locales in format
6 participants