Fix copy-on-write in TimetableSnapshot #5941
This fixes the concurrent access on the Timetable object, but the SortedSet that holds the Timetable is also shared between snapshots: it should also be copied for the fix to be complete. I will rework this.
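To illustrate the point about the shared SortedSet, here is a minimal sketch of copy-on-write applied to the containing set as well as to its elements. All class and method names (SketchTimetable, CowBufferSketch, replace, demo) are hypothetical and are not the OTP implementation; the sketch only assumes that published snapshots and the write buffer start out holding the same set instance.

```java
import java.util.Comparator;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical, immutable stand-in for a timetable; not the OTP Timetable class. */
final class SketchTimetable {
  final int serviceDay;
  final String label;

  SketchTimetable(int serviceDay, String label) {
    this.serviceDay = serviceDay;
    this.label = label;
  }
}

/** Hypothetical snapshot buffer illustrating copy-on-write of the containing set. */
final class CowBufferSketch {
  // Initially the same instance that published snapshots (and reader threads) hold.
  private SortedSet<SketchTimetable> timetables;
  // True once this buffer owns a private copy of the set.
  private boolean setCopied = false;

  CowBufferSketch(SortedSet<SketchTimetable> publishedSet) {
    this.timetables = publishedSet;
  }

  /** Replace the timetable for one service day without touching shared state. */
  void replace(int serviceDay, String newLabel) {
    if (!setCopied) {
      // TreeSet(SortedSet) keeps the source comparator, so ordering is unchanged.
      timetables = new TreeSet<SketchTimetable>(timetables);
      setCopied = true;
    }
    // Mutations now hit only the private copy; elements are swapped, never edited.
    timetables.removeIf(t -> t.serviceDay == serviceDay);
    timetables.add(new SketchTimetable(serviceDay, newLabel));
  }

  SortedSet<SketchTimetable> timetables() {
    return timetables;
  }

  /** Returns "<label readers see>/<label the buffer sees>" after one update. */
  static String demo() {
    SortedSet<SketchTimetable> published =
      new TreeSet<>(Comparator.comparingInt((SketchTimetable t) -> t.serviceDay));
    published.add(new SketchTimetable(1, "scheduled"));
    CowBufferSketch buffer = new CowBufferSketch(published);
    buffer.replace(1, "updated");
    return published.first().label + "/" + buffer.timetables().first().label;
  }
}
```

Copying only the element would still leave the add/remove calls operating on a set that reader threads may be iterating, which is why the set itself needs its own copy.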
Complete fix:
Looks like another good move in the intended direction for the realtime update system.
I feel like the copyTimetable() method could somehow be refactored for clarity, but that is beyond the scope of this PR, which simply moves existing code into this new method body to facilitate reuse.
The name of that copyTimetable() method is slightly deceptive, as it doesn't just make a copy. New L319-320 look superficially like they make a copy and then release the reference to that copy without registering/retaining it anywhere. Maybe the method should be named something like createOrFindProtectiveCopy() or just protectiveCopyOf(). This still doesn't reveal that the copied timetable is "registered" and retained by the snapshot so the caller doesn't need to take any extra step to do so. I don't have a good name to suggest so I'll leave it as-is.
In R5 we have the distinct concept of a scenarioCopy that is similar to this. Maybe we should introduce a new term snapshotCopy that means the same thing everywhere: "protective copy for the purpose of building up a new snapshot, which will be reused as long as we're still building up that same snapshot, and which will be immediately registered/retained in the new snapshot in place of its source collection". A fully copy-on-write snapshot system might be expected to have many of these snapshotCopy() methods with similar semantics.
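As a rough sketch of the snapshotCopy semantics described above (copy once per buffer, reuse while the same snapshot is being built, register in place of the source), something along these lines could work. SnapshotCopySketch, its fields, and the use of a plain list of integers as a stand-in for a timetable are all assumptions made for illustration; none of this is existing OTP or R5 code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical buffer showing "copy once, register, then reuse" semantics. */
final class SnapshotCopySketch {
  // Shallow copy of the published map; the values are still shared with readers.
  private final Map<String, List<Integer>> timesByPattern;
  // Identity-based set of the copies created while building this snapshot.
  private final Set<List<Integer>> copiedInThisBuffer =
    Collections.newSetFromMap(new IdentityHashMap<List<Integer>, Boolean>());

  SnapshotCopySketch(Map<String, List<Integer>> published) {
    this.timesByPattern = new HashMap<>(published);
  }

  /** Returns a list that is safe to modify, copying the shared one at most once. */
  List<Integer> snapshotCopy(String patternId) {
    List<Integer> current = timesByPattern.get(patternId);
    if (current != null && copiedInThisBuffer.contains(current)) {
      return current; // already a private copy made for this buffer: reuse it
    }
    List<Integer> copy = (current == null) ? new ArrayList<>() : new ArrayList<>(current);
    copiedInThisBuffer.add(copy);
    // Register the copy in place of its source, so the caller has nothing else to do.
    timesByPattern.put(patternId, copy);
    return copy;
  }

  /** True if the copy is reused within the buffer and the published list is unchanged. */
  static boolean demo() {
    Map<String, List<Integer>> published = new HashMap<>();
    published.put("pattern-1", List.of(10, 20)); // immutable, as a reader would see it
    SnapshotCopySketch buffer = new SnapshotCopySketch(published);
    List<Integer> first = buffer.snapshotCopy("pattern-1");
    first.add(30);
    List<Integer> second = buffer.snapshotCopy("pattern-1");
    return first == second && second.size() == 3 && published.get("pattern-1").size() == 2;
  }
}
```

The identity-based set matters here: two timetables that happen to be equal in content must still be treated as distinct objects when deciding whether a copy already belongs to the buffer.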
I also think the name of the boolean variable isDirty on new L317 is less than ideal as at this point the TripTimes has not been modified yet. This variable has a very short scope and is used only in the conditional test on the following line - should it just be inlined into the conditional test with no variable?
If you make any changes in response to the above (minor) suggestions I would just approve them. But it's also fine to merge as-is if you don't think changes are necessary.
The initial PR comment mentions "the original PR #5726". Does "original" imply that the problem was introduced there? If so it would be disappointing that none of us noticed this problem in review. However, it looks like the problematic code was already present on L169 before that PR just moved it around.
The copy-on-write bug predates PR #5726, and I have been seeing symptoms of this concurrency issue in the logs since we started to monitor them systematically more than a year ago.
Since we are seeing quite a few concurrency problems with the current TimetableSnapshot, perhaps it would be a good idea to start with something that I personally had a bit lower on the list of priorities: introducing separate classes for a mutable snapshot buffer and an immutable snapshot.
One thing that makes this tricky / more work is that the damaging edits are carried out farther down the object tree. So an 'immutable snapshot' actually depends most heavily on immutable leaf nodes like TripTimes. Even if objects higher in the tree are superficially immutable, the objects they let you read/retrieve are still mutable.
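A small sketch of that distinction, using hypothetical stand-in classes (MutableTimesSketch, ImmutableTimesSketch) rather than the real TripTimes: an unmodifiable collection only prevents adding or removing elements, so a mutable leaf retrieved from it can still be edited under a reader, whereas an immutable leaf forces every update to produce a new object.

```java
import java.util.List;

/** Hypothetical mutable leaf; editing it is visible to everyone holding a reference. */
final class MutableTimesSketch {
  int delaySeconds;
}

/** Hypothetical immutable leaf; an update yields a new instance instead. */
final class ImmutableTimesSketch {
  private final int delaySeconds;

  ImmutableTimesSketch(int delaySeconds) {
    this.delaySeconds = delaySeconds;
  }

  int delaySeconds() {
    return delaySeconds;
  }

  ImmutableTimesSketch withDelay(int newDelaySeconds) {
    return new ImmutableTimesSketch(newDelaySeconds);
  }
}

final class ShallowImmutabilitySketch {
  /** Returns "<mutable leaf as seen by a reader>/<immutable original>/<updated copy>". */
  static String demo() {
    // List.of is unmodifiable, but it cannot protect the object it contains.
    MutableTimesSketch leaf = new MutableTimesSketch();
    List<MutableTimesSketch> superficiallyImmutable = List.of(leaf);
    superficiallyImmutable.get(0).delaySeconds = 120; // compiles and changes shared state

    // With an immutable leaf, the original stays as published.
    List<ImmutableTimesSketch> deeplyImmutable = List.of(new ImmutableTimesSketch(0));
    ImmutableTimesSketch updated = deeplyImmutable.get(0).withDelay(120);

    return leaf.delaySeconds + "/" + deeplyImmutable.get(0).delaySeconds() + "/" + updated.delaySeconds();
  }
}
```

In this sketch demo() returns "120/0/120": the reader of the first list observes the edit, while the reader of the second list does not.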
That is true, but aren't the exceptions mostly about concurrent access to the collections?
But I understand that calling it an immutable snapshot would give a false sense of security.
Summary
As detailed in #5933, there is at least one code path that leads to the concurrent update of Timetable objects while they are being accessed by reader threads.
This PR fixes one such code path where a real-time added trip is dissociated from a trip pattern, without using copy-on-write.
Issue
Potentially close #5933
Unit tests
The concurrent access bug itself is not unit-tested.
The modified implementation details are tested by the unit tests from the original PR #5726.
Documentation
No