-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix AdaptiveSelectionStats serialization bug #28718
Fix AdaptiveSelectionStats serialization bug #28718
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks problematic still? At least, it’s not obvious to me why it’s not problematic.
@@ -96,7 +96,8 @@ public SearchTransportService(Settings settings, TransportService transportServi | |||
} | |||
|
|||
public Map<String, Long> getClientConnections() { | |||
return Collections.unmodifiableMap(clientConnections); | |||
// create a (weakly consistent) snapshot of the map so that it does not get modified when serializing it for stats reporting | |||
return Collections.unmodifiableMap(new HashMap<>(clientConnections)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is subject to the same race, concurrent modification during the copy? It won’t manifest as an inconsistent snapshot of size and number of entries when iterating for serialization, but instead a concurrent modification exception thrown during the copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html
Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is I did not realize the backing map was a concurrent hash map (the perils of reviewing on mobile).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
The AdaptiveSelectionStats object serializes the clientOutgoingConnections map that's concurrently updated in SearchTransportService. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though. Closes #28713
The AdaptiveSelectionStats object serializes the clientOutgoingConnections map that's concurrently updated in SearchTransportService. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though. Closes #28713
* master: Enable selecting adaptive selection stats Remove leftover mention of file-based scripts Fix threading issue on listener notification (elastic#28730) Revisit deletion policy after release the last snapshot (elastic#28627) Remove unused method Track deletes only in the tombstone map instead of maintaining as copy (elastic#27868) [Docs] Correct typo in README.textile (elastic#28716) Fix AdaptiveSelectionStats serialization bug (elastic#28718) TEST: Fix InternalEngine#testAcquireIndexCommit Add note on temporary directory for Windows service Added coming annotation and breaking changes link to release notes script Remove leftover PR link for previously disabled bwc tests Separate acquiring safe commit and last commit (elastic#28271) Fix BWC issue of the translog last modified age stats
Today, we serialize collections prefixed by their lengths. If the serialized length is inconsistent with the number of objects in the collection then the serialization succeeds but any subsequent deserialization will (almost certainly) fail, reporting something unexpected at some later point in the stream. This can happen, for instance, because of a concurrent modification (e.g. elastic#28718) to the collection in between getting its length and iterating through its objects. This is a royal pain to debug, because the failure is reported a long way from where it actually occurs. See also e.g. elastic#25323. This change adds checks to StreamOutput to verify that it wrote as many objects as it said it would, in order to catch any similar problems in future closer to their sources.
The
AdaptiveSelectionStats
object serializes theclientOutgoingConnections
map that's concurrently updated inSearchTransportService
. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though.Closes #28713