Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

a time_length method #288

Closed
wants to merge 5 commits into from
Closed

a time_length method #288

wants to merge 5 commits into from

Conversation

larmarange
Copy link
Contributor

Following #283, a generic timespan_length method working with period, duration, difftime and interval objects.

I was not sure about the name between timespan_length time_length or span_length. The later was not intuitive for me.

I didn't change int_length yet. Do you want to keep it for backward compatibility (in that case it could be just a wrapper to timespan_length) or do you want to remove it?

Note: currently a S3 method. Should it be changed to a S4 method?

@vspinu
Copy link
Member

vspinu commented Dec 14, 2014

Great!

The name is a bit too long. Let's ponder a bit more on it. I will review
the code tomorrow.
On Dec 14, 2014 1:35 AM, "Joseph" [email protected] wrote:


You can merge this Pull Request by running

git pull https://github.com/larmarange/lubridate master

Or view, comment on, or merge it at:

#288
Commit Summary

  • a timespan_length method

File Changes

Patch Links:


Reply to this email directly or view it on GitHub
#288.

@vspinu
Copy link
Member

vspinu commented Dec 14, 2014

Overall the code looks good. Please address my comments and squash all commits into one.

Could you please add a couple of tests? I have a feeling that it might break with negative intervals because of #285.

Does the default method work for the difftime objects?

Thanks.

@larmarange
Copy link
Contributor Author

I will work on it tomorrow.

Do you have a specific way to manage tests?

@larmarange
Copy link
Contributor Author

The default method works on difftime.

Regards

@vspinu
Copy link
Member

vspinu commented Dec 14, 2014

Do you have a specific way to manage tests?

Yes. Simply add a couple of test_that statement to inst/tests/test-timespans.R.

@larmarange larmarange changed the title a timespan_length method a time_length method Dec 15, 2014
@larmarange
Copy link
Contributor Author

OK

so I changed the name to time_length. It's now a S4 method. I addressed your comments. And I have added some tests.

So far, I have one failing test, in the specific cases of individuals born a 29 Feb.

Please consider the following example:

> time_length(interval(ymd('1992-02-29'), ymd('1999-02-28')), "years")
[1] 7
> time_length(interval(ymd('1992-02-29'), ymd('1999-03-01')), "years")
[1] 7.002732

In fact, the 28th Feb 1999, this individual didn't turn 7 yet. The result should be something like 6.99. Computing the period works well:

> x <- interval(ymd('1992-02-29'), ymd('1999-02-28'))
> as.period(x)
[1] "6y 11m 30d 0H 0M 0S"
> # NB
> as.period(interval(ymd('1992-02-29'), ymd('1999-03-01')))
[1] "7y 0m 0d 0H 0M 0S"

The issue comes from computing next anniversary:

> next_anniversary <- int_start(x) %m+% period(7, units = "year")
> next_anniversary
[1] "1999-02-28 UTC"

In fact, we should consider that next anniversary is 1999-03-01 (in this specific case).

The issue comes from %m+% operator which restrict to the last day of the nth month. Which is only one way to deal with months computation. In other situations (like age computation), you need to consider that the result should be the first day of the (n+1)th month.

Therefore, should we think about an alternative operator like %m++% (and then people will have two options when dealing computation with calendar time) or should this specific case be handled directly in time_length?

Note: epptools package propose a age_calc function (see https://github.com/jknowles/eeptools/blob/master/R/age_calc.R).

> age_calc(as.Date('1992-02-29'), as.Date('1999-02-28'), units="years")
[1] 6.997702
> age_calc(as.Date('1992-02-29'), as.Date('1999-03-30'), units="years")
[1] 7.079452

Note 2: I will squash all commits once the pull request will be completely ready.

@vspinu
Copy link
Member

vspinu commented Dec 15, 2014

Yeah. This is a problem indeed. I think %m++% is actually a good idea. If
29th is missing it should be between 28th and 1st.

I wonder if we can use some shortcuts here. Wouldn't it be equivalent to
directly use standard length of the year minus one day irrespective of the
leap year?
On Dec 15, 2014 2:17 AM, "Joseph" [email protected] wrote:

OK

so I changed the name to time_length. It's now a S4 method. I addressed
your comments. And I have added some tests.

So far, I have one failing test, in the specific cases of individuals born
a 29 Feb.

Please consider the following example:

time_length(interval(ymd('1992-02-29'), ymd('1999-02-28')), "years")
[1] 7> time_length(interval(ymd('1992-02-29'), ymd('1999-03-01')), "years")
[1] 7.002732

In fact, the 28th Feb 1999, this individual didn't turn 7. The result
should be something like 6.99. Computing the period works well:

x <- interval(ymd('1992-02-29'), ymd('1999-02-28'))> as.period(x)
[1] "6y 11m 30d 0H 0M 0S"

The issue comes from computing next anniversary:

next_anniversary <- int_start(x) %m+% period(7, units = "year")> next_anniversary
[1] "1999-02-28 UTC"

In fact, we should consider that next anniversary is 1999-03-01 (in this
specific case).

The issue comes from %m+% operator which restrict to the last day of the
nth month. Which is only one way to deal with months computation. In other
situations (like age computation), you need to consider that the result
should be the first day of the (n+1)th month.

Therefore, should we think about an alternative operator like %m++% (and
then people will have two options when dealing computation with calendar
time) or should this specific case handled directly in time_length?

Note: epptools package propose a age_calc function (see
https://github.com/jknowles/eeptools/blob/master/R/age_calc.R).

age_calc(as.Date('1992-02-29'), as.Date('1999-02-28'), units="years")
[1] 6.997702> age_calc(as.Date('1992-02-29'), as.Date('1999-03-30'), units="years")
[1] 7.079452

Note 2: I will squash all commits once the pull request will be completely
ready.


Reply to this email directly or view it on GitHub
#288 (comment).

@larmarange
Copy link
Contributor Author

I wonder if we can use some shortcuts here. Wouldn't it be equivalent to
directly use standard length of the year minus one day irrespective of the
leap year?

I'm not sure to understand what you mean here.

We should also not forget that similar issue exists when dealing with a duration/age in months.

@larmarange
Copy link
Contributor Author

I was just doing some additional tests, in particular this one:

> time_length(interval(ymd('1999-02-28'), ymd('1992-02-29')), "years")
[1] -6.997268
> time_length(interval(ymd('1999-03-01'), ymd('1992-02-29')), "years")
[1] -7.002732

So, with negative intervals, it's exactly the expected result.

My first feeling (and I probably need more time to see things more clearly) is about %m-% (or using %m+% with a negative period).

So far, %m+% and %m-% always rollback to the day before, which means a day closer to origin (start date) for an addition and a day more distant from origin for a subtraction. My feeling is that there is something wrong here and that we should have a sort of symmetry for the same operator.

Probably something to explore.

@vspinu
Copy link
Member

vspinu commented Dec 28, 2014

I wonder if we can use some shortcuts here. Wouldn't it be equivalent to directly use standard length of the year minus one day irrespective of the leap year?

I'm not sure to understand what you mean here.

I meant that maybe we can avoid using %m+% and just check if the year is a leap year. This still leaves out the leap second but that's a minute imprecision.

So far, %m+% and %m-% always rollback to the day before, which means a day closer to origin (start date) for an addition and a day more distant from origin for a subtraction. My feeling is that there is something wrong here and that we should have a sort of symmetry for the same operator.

The main task of %m+/-% is to keep arithmetics of months right. That is, yy-mm-dd - x months will give a date with month being mm-x. I think this is not clear from the docs. I will fix that up. So, having a round semantics like ymd(100330) %m-% months(1) == 2010-03-01 is not really an option. So I am afraidtime_length` should recur to direct computation.

I am still thinking about %m++ and I cannot really think of real applications at the moment. Can you think of any real user patterns? If it's not really useful then there is no point adding it. It also violates month arithmetic and should probably be named differently, not with m in it.

@larmarange
Copy link
Contributor Author

Your shortcut will work.for years, but with months it would be more complex.

I don't see right now other use of %m++% except than anniversary computation. Therefore, several options available:

1 Make %m++% an internal function
2 Create a function called anniversary with 3 arguments: date of birth, integer, unit (month or years) This function could be internal or exported.
3 add an option rollback_begin_of_month to the internal function .month_plus

@vspinu
Copy link
Member

vspinu commented Dec 29, 2014

time_length(interval(ymd('1992-02-29'), ymd('1999-02-28')), "years")
[1] 7
In fact, the 28th Feb 1999, this individual didn't turn 7 yet. The result should be something like 6.99.

Let's take a simpler example:

time_length(interval(ymd('1992-02-29'), ymd('1993-02-28')), "years")
[1] 1

1993 is not a leap year and it has only 365 days. Adding 365 days to 1992-02-29 lands you into 1993-02-28. This is one way of looking at it. The other way is to consider that next cycle has 366 days. Then you land on 1994-02-28 24:00:00, and the next cycle of 365 days will land you again on 1995-02-28 24:00:00. The interesting part is that next cycle of 365 days will lend you on 1996-02-29 00:00:00 which is exactly right.

So it all comes to conventions about the computation of the "start" of the leap year. The second convention is in line with your comment above and I think it's less confusing. Your code adders to the first convention.

I have spent quite some time today into bypassing %m+% and I almost got to something, but it's still not working right. You were right, it's far from trivial.

@larmarange
Copy link
Contributor Author

The issue is also to be consistent with period computation.

> as.period(interval(ymd('1992-02-29'), ymd('1993-02-28')))
[1] "11m 30d 0H 0M 0S"

We obtain 0 year + 11m 30d so we expect time_length to be < 1. (i.e. the trunc part of time_length should be equal to @year of the period object.

%m++% (or its equivalent) will solve the problem.

@larmarange
Copy link
Contributor Author

I have updated time_length to use %m++% (not exported) for testing purpose. The function pass now all implemented tests.

We still need to decide how to implement %m++% or its equivalent.

@vspinu
Copy link
Member

vspinu commented Dec 29, 2014

I have already squashed and merged your old version into 8c08adc.

And I have implemented a version that completely bypasses the m+ operator by computing the length from first principles. Unfortunately it's quite complex, and if there are no major performance differences I would be happy to roll it into simpler m++ implementation. We need some more corner cases tests.

@vspinu
Copy link
Member

vspinu commented Dec 29, 2014

Btw, you are working on a badly broken old version of lubridate. Would be nice if you could add m++ m-- on top of master.

@larmarange
Copy link
Contributor Author

Alright. I will update my version.

Regarding %m++%, do you want to export it (i.e. will be available for everyone) or do you prefer to keep it as an internal function?

I'm not sure about implementing a %m--% operator as it's not intuitive (i.e. we roll back to the first day of the next month). However, it could be done directly by adding a negative period with %m++%.

Regards

@vspinu
Copy link
Member

vspinu commented Dec 29, 2014

Regarding %m++%, do you want to export it (i.e. will be available for everyone) or do you prefer to keep it as an internal function?

Let's start first with non-exported version and make sure it works as expected. As I said before, my main concern is that this operator breaks the months arithmetic which is the main point of %m+%.. So I would prefer not to export it unless it's proven useful outside this limited situation. If we have %m++% we will need %m--% to match %m-%.
.

@larmarange
Copy link
Contributor Author

See new pull request #292

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants