Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix copy-on-write in TimetableSnapshot #5941

Conversation

vpaturet
Copy link
Contributor

@vpaturet vpaturet commented Jul 1, 2024

Summary

As detailed in #5933, there is at least one code path that leads to the concurrent update of Timetable objects while they are being accessed by reader threads.
This PR fixes one such code path where a real-time added trip is dissociated from a trip pattern, without using copy-on-write.

Issue

Potentially close #5933

Unit tests

The concurrent access bug itself is not unit-tested.
The modified implementation details are tested by the unit tests from the original PR #5726

Documentation

No

Copy link

codecov bot commented Jul 1, 2024

Codecov Report

Attention: Patch coverage is 95.45455% with 1 line in your changes missing coverage. Please review.

Project coverage is 69.53%. Comparing base (05b9f38) to head (9f3fc97).

Files Patch % Lines
...a/org/opentripplanner/model/TimetableSnapshot.java 95.45% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##             dev-2.x    #5941      +/-   ##
=============================================
- Coverage      69.53%   69.53%   -0.01%     
+ Complexity     17113    17112       -1     
=============================================
  Files           1938     1938              
  Lines          73773    73775       +2     
  Branches        7548     7547       -1     
=============================================
- Hits           51298    51297       -1     
- Misses         19838    19840       +2     
- Partials        2637     2638       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@vpaturet vpaturet force-pushed the fix_copy_on_write_when_reverting_trip_pattern branch from 42cc0fa to 940a45e Compare July 1, 2024 08:47
@vpaturet vpaturet marked this pull request as ready for review July 1, 2024 08:50
@vpaturet vpaturet requested a review from a team as a code owner July 1, 2024 08:50
@vpaturet vpaturet marked this pull request as draft July 5, 2024 09:21
@vpaturet
Copy link
Contributor Author

vpaturet commented Jul 5, 2024

This fixes the concurrent access on the Timetable object, but the SortedSet that holds the Timetable is also shared between snapshot: it should also be copied for the fix to be complete. I will rework this.

@vpaturet vpaturet force-pushed the fix_copy_on_write_when_reverting_trip_pattern branch from 940a45e to 66dedbb Compare July 5, 2024 11:57
@vpaturet
Copy link
Contributor Author

vpaturet commented Jul 5, 2024

Complete fix:

  • The code performing the copy-on-write is extracted to a separate method and reused.
  • The copy-on-write algorithm is detailed in the Javadoc.
  • The collection holding the Timetables sorted by service date is made immutable to prevent change after snapshot publication (this comes in addition to the improvements made in Enforce immutability of published timetable snapshot #5934)

@vpaturet vpaturet marked this pull request as ready for review July 5, 2024 12:06
abyrd
abyrd previously approved these changes Jul 9, 2024
Copy link
Member

@abyrd abyrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like another good move in the intended direction for the realtime update system.

I feel like the copyTimetable() method could somehow be refactored for clarity, but that is beyond the scope of this PR, which simply moves existing code into this new method body to facilitate reuse.

The name of that copyTimetable() method is slightly deceptive, as it doesn't just make a copy. New L319-320 look superficially like they make a copy and then release the reference to that copy without registering/retaining it anywhere. Maybe the method should be named something like createOrFindProtectiveCopy() or just protectiveCopyOf(). This still doesn't reveal that the copied timetable is "registered" and retained by the snapshot so the caller doesn't need to take any extra step to do so. I don't have a good name to suggest so I'll leave it as-is.

In R5 we have the distinct concept of a scenarioCopy that is similar to this. Maybe we should introduce a new term snapshotCopy that means the same thing everywhere: "protective copy for the purpose of building up a new snapshot, which will be reused as long as we're still building up that same snapshot, and which will be immediately registered/reained in the new snapshot in place of its source collection". A fully copy-on-write snapshot system might be expected to have many of these snapshotCopy() methods with similar semantics.

I also think the name of the boolean variable isDirty on new L317 is less than ideal as at this point the TripTimes has not been modified yet. This variable has a very short scope and is used only in the conditional test on the following line - should it just be inlined into the conditional test with no variable?

If you make any changes in response to the above (minor) suggestions I would just approve them. But it's also fine to merge as-is if you don't think changes are necessary.

The initial PR comment mentions "the original PR #5726". Does "original" imply that the problem was introduced there? If so it would be disappointing that none of us noticed this problem in review. However, it looks like the problematic code was already present on L169 before that PR just moved it around.

@vpaturet vpaturet dismissed stale reviews from abyrd and leonardehrenfried via d283fdf July 10, 2024 06:48
…ng_trip_pattern

# Conflicts:
#	src/main/java/org/opentripplanner/model/TimetableSnapshot.java
@vpaturet
Copy link
Contributor Author

vpaturet commented Jul 10, 2024

The bug with copy-on-write predates the PR #5726 and I have been seeing symptoms of this concurrency issue in the logs since we started to monitor them systematically more than a year ago.
#5726 is just the most recent PR that changed the structure of the class and added unit tests for proper coverage of the reverting method.

@leonardehrenfried
Copy link
Member

Since we are seeing quite a few concurrency problems with the current TimetableSnapshot, perhaps it would be a good idea to start with something that I personally had a bit lower on the list of priorities: introducing separate classes for a mutable snapshot buffer and an immutable snapshot.

@abyrd
Copy link
Member

abyrd commented Jul 11, 2024

introducing separate classes for a mutable snapshot buffer and an immutable snapshot.

One thing that makes this tricky / more work is that the damaging edits are carried out farther down the object tree. So an 'immutable snapshot' actually depends most heavily on immutable leaf nodes like TripTimes. Even if objects higher in the tree are superficially immutable, the objects they let you read/retrieve are still mutable.

@leonardehrenfried
Copy link
Member

That is true but aren't the exceptions mostly about concurrent access of the collections?

@leonardehrenfried
Copy link
Member

But I understand that calling it a immutable snapshot would give a false sense of security.

@vpaturet vpaturet merged commit e51f4c5 into opentripplanner:dev-2.x Jul 11, 2024
5 checks passed
@vpaturet vpaturet deleted the fix_copy_on_write_when_reverting_trip_pattern branch July 11, 2024 09:05
t2gran pushed a commit that referenced this pull request Jul 11, 2024
@t2gran t2gran added this to the 2.6 (next release) milestone Jul 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concurrent modification exception in Timetable
4 participants