Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix AdaptiveSelectionStats serialization bug #28718

Merged

Conversation

ywelsch
Copy link
Contributor

@ywelsch ywelsch commented Feb 18, 2018

The AdaptiveSelectionStats object serializes the clientOutgoingConnections map that's concurrently updated in SearchTransportService. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though.

Closes #28713

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks problematic still? At least, it’s not obvious to me why it’s not problematic.

@@ -96,7 +96,8 @@ public SearchTransportService(Settings settings, TransportService transportServi
}

public Map<String, Long> getClientConnections() {
return Collections.unmodifiableMap(clientConnections);
// create a (weakly consistent) snapshot of the map so that it does not get modified when serializing it for stats reporting
return Collections.unmodifiableMap(new HashMap<>(clientConnections));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is subject to the same race, concurrent modification during the copy? It won’t manifest as an inconsistent snapshot of size and number of entries when iterating for serialization, but instead a concurrent modification exception thrown during the copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ConcurrentHashMap.html

Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is I did not realize the backing map was a concurrent hash map (the perils of reviewing on mobile).

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ywelsch ywelsch added v6.2.3 and removed v6.2.2 labels Feb 19, 2018
@ywelsch ywelsch merged commit c002664 into elastic:master Feb 19, 2018
ywelsch added a commit that referenced this pull request Feb 19, 2018
The AdaptiveSelectionStats object serializes the clientOutgoingConnections map that's concurrently updated in SearchTransportService. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though.

Closes #28713
ywelsch added a commit that referenced this pull request Feb 19, 2018
The AdaptiveSelectionStats object serializes the clientOutgoingConnections map that's concurrently updated in SearchTransportService. Serializing the map consists of first writing the size of the map and then serializing the entries. If the number of entries changes while the map is being serialized, the size and number of entries go out of sync. The deserialization routine expects those to be in sync though.

Closes #28713
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Feb 19, 2018
* master:
  Enable selecting adaptive selection stats
  Remove leftover mention of file-based scripts
  Fix threading issue on listener notification (elastic#28730)
  Revisit deletion policy after release the last snapshot (elastic#28627)
  Remove unused method
  Track deletes only in the tombstone map instead of maintaining as copy (elastic#27868)
  [Docs] Correct typo in README.textile (elastic#28716)
  Fix AdaptiveSelectionStats serialization bug (elastic#28718)
  TEST: Fix InternalEngine#testAcquireIndexCommit
  Add note on temporary directory for Windows service
  Added coming annotation and breaking changes link to release notes script
  Remove leftover PR link for previously disabled bwc tests
  Separate acquiring safe commit and last commit (elastic#28271)
  Fix BWC issue of the translog last modified age stats
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Feb 21, 2018
Today, we serialize collections prefixed by their lengths. If the serialized
length is inconsistent with the number of objects in the collection then the
serialization succeeds but any subsequent deserialization will (almost
certainly) fail, reporting something unexpected at some later point in the
stream. This can happen, for instance, because of a concurrent modification
(e.g. elastic#28718) to the collection in between getting its length and iterating through
its objects.

This is a royal pain to debug, because the failure is reported a long way from
where it actually occurs. See also e.g. elastic#25323.

This change adds checks to StreamOutput to verify that it wrote as many objects
as it said it would, in order to catch any similar problems in future closer to
their sources.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants