Implement exponential year precision #12

Closed
cogat opened this issue May 22, 2017 · 12 comments

@cogat
Contributor

cogat commented May 22, 2017

EDTFs of the form y17101e4p3 mean "Some year between 171000000 and 171999999, estimated to be 171010000 ('p3' indicates a precision of 3 significant digits.)"

At the moment, the p value is ignored, and lower_ and upper_ values are identical, being just the base times 10 to the exponent. The lower and upper bounds should vary by the indicated precision.
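The arithmetic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the python-edtf implementation: the function name `significant_digit_bounds` is hypothetical, and it assumes that "precision" means keeping the leading digits of the expanded year and letting the remaining digits range from all zeros to all nines.

```python
from typing import Tuple


def significant_digit_bounds(year: int, significant_digits: int) -> Tuple[int, int]:
    """Return the (lowest, highest) year sharing the leading significant digits.

    Hypothetical helper for illustration; not part of python-edtf.
    """
    if significant_digits < 1:
        raise ValueError("significant_digits must be a positive integer")
    magnitude = abs(year)
    num_digits = len(str(magnitude))
    # Digits to the right of the significant ones are free to vary.
    free_digits = max(num_digits - significant_digits, 0)
    scale = 10 ** free_digits
    lower = (magnitude // scale) * scale
    upper = lower + scale - 1
    if year < 0:
        # For negative years the numerically larger magnitude is the earlier year.
        return -upper, -lower
    return lower, upper


base, exponent, precision = 17101, 4, 3
estimated = base * 10 ** exponent  # what the issue says both bounds currently return
print(estimated)  # 171010000
print(significant_digit_bounds(estimated, precision))  # (171000000, 171999999)
```

With this reading, a different precision value changes the bounds while the estimated year stays the same.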

@cogat changed the title from "Implement exponential year behaviour" to "Implement exponential year precision" on May 22, 2017
@ColeDCrawford
Contributor

ColeDCrawford commented May 13, 2024

@aweakley just wanted to check on this test:

# Some year between 171010000 and 171999999, estimated to be 171010000 ('S3' indicates a precision of 3 significant digits.)
# TODO Not yet implemented, see https://github.com/ixc/python-edtf/issues/12
# ('Y17101E4S3', ('171010000-01-01', '171999999-12-31')),

The year should be 'Y17101E4S3' in EDTF, or 171010000. Should the bounds be '171000000-01-01' (just 3 digits) to '171999999-12-31', given that the precision is only 3 significant digits? The EDTF docs say:

Example 2 ‘Y171010000S3’
some year between 171000000 and 171999999 estimated to be 171010000

I think that this should be the same whether E is used to shorten the date or not?

@aweakley
Member

I agree, I think it should be the same with or without the E.

@ColeDCrawford
Contributor

Great. And those bounds also make sense?

@aweakley
Member

I think so, but I'm just wondering about this bit "..estimated to be 171010000"

@ColeDCrawford
Contributor

ExponentialYear.year should definitely return that. I'm just not sure how the "estimated" part should be expressed in the bounds. lower_fuzzy() should probably return 171000000-01-01, upper_fuzzy() should return 171999999-12-31. I guess the question is whether lower/upper_strict should take the full year 171010000 into consideration, or just the significant digits (171....).

I don't see any exponential, long or significant digit examples using date qualifiers at least ...

@aweakley
Member

I think this implies that lower/upper_strict should take account of the full year: https://en.wikipedia.org/wiki/Significant_figures

For instance, if a length measurement yields 114.8 mm, using a ruler with the smallest interval between marks at 1 mm, the first three digits (1, 1, and 4, representing 114 mm) are certain and constitute significant figures. Further, digits that are uncertain yet meaningful are also included in the significant figures. In this example, the last digit (8, contributing 0.8 mm) is likewise considered significant despite its uncertainty.[1] Therefore, this measurement contains four significant figures.

@ColeDCrawford
Contributor

That makes sense, but the S in EDTF directly specifies the number of digits to treat as significant, right? If we don't make use of that information, then there is no difference between the lower bound for 'Y17101E4S3', 'Y17101E4S4', or 'Y17101E4S5'.

Some of the other EDTF examples:

  • '1950S2', "some year between 1900 and 1999, estimated to be 1950" - I would assume the lower bound should be 1900 and the upper bound should be 1999, but if we use the generic significant figures definition it would be 1950 and 1959.
  • 'Y3388E2S3', "some year between 338000 and 338999, estimated to be 338800" - the lower bound seems like it should be 338000 and the upper bound 338999, but the generic definition would give 338800 to 338899.
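For comparison, here is one way to reproduce the "generic definition" numbers quoted in the two examples above. It is a sketch under an assumption: that the generic reading treats trailing zeros of the written year as not significant and lets only those positions vary. The name `trailing_zero_bounds` is hypothetical, and the function only handles positive years.

```python
from typing import Tuple


def trailing_zero_bounds(year: int) -> Tuple[int, int]:
    """Bounds when only the trailing zeros of a positive year may vary.

    Hypothetical helper; it ignores any EDTF 'S' suffix entirely.
    """
    if year <= 0:
        raise ValueError("this sketch only handles positive years")
    digits = str(year)
    trailing_zeros = len(digits) - len(digits.rstrip("0"))
    span = 10 ** trailing_zeros
    return year, year + span - 1


print(trailing_zero_bounds(1950))    # (1950, 1959)
print(trailing_zero_bounds(338800))  # (338800, 338899)
```

Because the 'S' value never enters the calculation, this reading cannot distinguish 'Y17101E4S3' from 'Y17101E4S4', which is the objection raised above.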

The definition for significant digits is: "A year (expressed in any of the three allowable forms: four-digit, 'Y' prefix, or exponential) may be followed by 'S', followed by a positive integer indicating the number of significant digits."

That means it's not just ExponentialYear that needs to support significant digits, but also LongYear and Date ...

@aweakley
Member

This is really clear to me: '1950S2', "some year between 1900 and 1999". I just don't know what we're supposed to do with "estimated to be 1950".

I was a bit surprised by the Wikipedia article really and I wonder how far we're supposed to go if we follow that logic? What about the year 123456789S1 - surely all those digits can't be assumed to be significant when the S part tells us they're not?

Reading the article's reference here: https://chem.libretexts.org/Bookshelves/General_Chemistry/Chem1_(Lower)/04%3A_The_Basics_of_Chemistry/4.06%3A_Significant_Figures_and_Rounding it gets more confusing, because they say something different to what the EDTF standard says:

So, what is a significant digit? According to the usual definition, it is all the numerals in a measured quantity (counting from the left) whose values are considered as known exactly, plus one more whose value could be one more or one less:

In “157900” (four significant digits), the left most three digits are known exactly, but the fourth digit, “9” could well be “8” if the “true value” is within the implied range of 157850 to 157950.
In “158000” (three significant digits), the left most two digits are known exactly, while the third digit could be either “7” or “8” if the true value is within the implied range of 157500 to 158500.
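The two implied ranges quoted from the article follow a "plus or minus half a unit in the last significant place" rule. A minimal sketch of that rule follows; the function name `implied_rounding_range` is made up for illustration.

```python
from typing import Tuple


def implied_rounding_range(value: int, significant_digits: int) -> Tuple[int, int]:
    """Range of true values that would round to `value` at the given precision.

    Hypothetical helper following the rounding convention in the quoted
    article, which is not the convention the EDTF standard describes.
    """
    if significant_digits < 1:
        raise ValueError("significant_digits must be a positive integer")
    num_digits = len(str(abs(value)))
    # Size of one step in the last significant place.
    unit = 10 ** max(num_digits - significant_digits, 0)
    half = unit // 2
    return value - half, value + half


print(implied_rounding_range(157900, 4))  # (157850, 157950)
print(implied_rounding_range(158000, 3))  # (157500, 158500)
```

This range is centred on the written value, whereas the EDTF examples keep the leading digits fixed (1900 to 1999 for '1950S2'), so the two conventions give different bounds for the same input.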

What do you think about adding a new estimated() method to dates that have a significant-digits indicator? That way we could implement what the EDTF standard says.

@ColeDCrawford
Contributor

What would estimated() return for each of these examples? Just want to see how it would differ from the ExponentialYear.year property

@aweakley
Member

aweakley commented May 16, 2024

I think it would be the year but without the significance notation: 171010000, 1950 or 338800, so that would match the text description in the standard. So ExponentialYear._precise_year()?
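To make "the year but without the significance notation" concrete, here is a standalone sketch that strips the 'S' suffix from the three allowable year forms and expands any exponent. It deliberately does not use python-edtf: the regular expression and the `estimated_year` function are assumptions for illustration, not the library's grammar or API.

```python
import re

# Hypothetical pattern covering the three year forms named in the standard:
# four-digit ('1950'), 'Y' prefix ('Y171010000') and exponential ('Y3388E2'),
# each optionally followed by 'S' and a significant-digit count.
YEAR_PATTERN = re.compile(
    r"^(?:Y(?P<base>-?\d+)(?:E(?P<exponent>\d+))?|(?P<year>-?\d{4}))"
    r"(?:S(?P<significant>\d+))?$"
)


def estimated_year(edtf_year: str) -> int:
    """Return the full year value, ignoring any significant-digits suffix."""
    match = YEAR_PATTERN.match(edtf_year)
    if match is None:
        raise ValueError(f"not a recognised year form: {edtf_year!r}")
    if match.group("year") is not None:
        return int(match.group("year"))
    exponent = int(match.group("exponent") or 0)
    return int(match.group("base")) * 10 ** exponent


for text in ("1950S2", "Y171010000S3", "Y3388E2S3"):
    print(text, estimated_year(text))
```

The loop prints 1950, 171010000 and 338800 for the three inputs, matching the values listed in the comment above.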

@ColeDCrawford
Contributor

I have some WIP on this that I'll post soon. Just to confirm so I finish updating the tests - this is what we're looking for?

>>> from edtf.parser.grammar import parse_edtf as parse
>>> normal_year = parse("1950S2")
>>> normal_year
Date: '1950S2'
>>> normal_year.estimated()
1950
>>> normal_year.lower_fuzzy()
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> normal_year.upper_fuzzy()
time.struct_time(tm_year=1999, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> normal_year.lower_strict()
time.struct_time(tm_year=1950, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> normal_year.upper_strict()
time.struct_time(tm_year=1950, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> long_year = parse("Y171010000S3")
>>> long_year
LongYear: 'Y171010000S3'
>>> long_year.estimated()
171010000
>>> long_year.lower_strict()
time.struct_time(tm_year=171010000, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> long_year.upper_strict()
time.struct_time(tm_year=171010000, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> long_year.upper_fuzzy()[:3]
(171999999, 12, 31)
>>> long_year.lower_fuzzy()[:3]
(171000000, 1, 1)
>>> exp_year = parse("Y3388E2S3")
>>> exp_year
ExponentialYear: 'Y3388E2S3'
>>> exp_year.estimated()
338800
>>> exp_year.upper_strict()
time.struct_time(tm_year=338800, tm_mon=12, tm_mday=31, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=0, tm_yday=0, tm_isdst=-1)
>>> exp_year.lower_strict()[:3]
(338800, 1, 1)
>>> exp_year.upper_fuzzy()[:3]
(338999, 12, 31)
>>> exp_year.lower_fuzzy()[:3]
(338000, 1, 1)

This was referenced May 24, 2024
@aweakley
Member

This is resolved by #56
