Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java] Precedence checks are false negative when decoding a older version message #988

Closed
zyulyaev opened this issue Apr 10, 2024 · 9 comments

Comments

@zyulyaev
Copy link
Contributor

Decoders generated with sbe.generate.precedence.checks=true throw IllegalStateException when message from a previous version, that is not mentioned in the message schema, is decoded in the correct order.

For example the following schema:

     <sbe:message id="1000" name="BugShowcase">
        <field id="1" name="field" type="int64"/>
        <group id="2" name="group" sinceVersion="3">
        </group>
        <data id="3" name="data" type="varStringEncoding"/>
    </sbe:message>

results in the follow state machine:

graph TD
    NOT_WRAPPED-->|"wrap(version=0)"|V0_BLOCK
    V0_BLOCK -->|"field(?)"|V0_BLOCK
    V3_BLOCK-->|"field(?)"|V3_BLOCK
    V0_BLOCK-->|"dataLength()"|V0_BLOCK
    V3_GROUP_1_BLOCK-->|"dataLength()"|V3_GROUP_1_BLOCK
    V3_GROUP_DONE-->|"dataLength()"|V3_GROUP_DONE
    V0_BLOCK-->|"data(?)"|V0_DATA_DONE
    V3_GROUP_1_BLOCK-->|"data(?)"|V3_DATA_DONE
    V3_GROUP_DONE-->|"data(?)"|V3_DATA_DONE
    NOT_WRAPPED-->|"wrap(version=3)"|V3_BLOCK
    V3_BLOCK-->|"groupCount(0)"|V3_GROUP_DONE
    V3_BLOCK-->|"groupCount(>0)"|V3_GROUP_N
    V3_GROUP_N_BLOCK-->|"group.next()\n  where count - newIndex > 1"|V3_GROUP_N_BLOCK
    V3_GROUP_N-->|"group.next()\n  where count - newIndex > 1"|V3_GROUP_N_BLOCK
    V3_GROUP_N_BLOCK-->|"group.next()\n  where count - newIndex == 1"|V3_GROUP_1_BLOCK
    V3_GROUP_N-->|"group.next()\n  where count - newIndex == 1"|V3_GROUP_1_BLOCK
    V3_GROUP_N_BLOCK-->|"group.resetCountToIndex()"|V3_GROUP_DONE
    V3_GROUP_1_BLOCK-->|"group.resetCountToIndex()"|V3_GROUP_DONE
    V3_GROUP_DONE-->|"group.resetCountToIndex()"|V3_GROUP_DONE
    V3_GROUP_N-->|"group.resetCountToIndex()"|V3_GROUP_DONE
Loading

and the following onWrap and group methods generated:

    private void onWrap(final int actingVersion) {
        switch(actingVersion) {
            case 3:
                codecState(CodecStates.V3_BLOCK);
                break;
            case 0:
                codecState(CodecStates.V0_BLOCK);
                break;
            default:
                codecState(CodecStates.V3_BLOCK);
                break;
        }
    }

    public GroupDecoder group() {
        if (parentMessage.actingVersion < 3) {
            group.count = 0;
            group.index = 0;
            return group;
        }
        group.wrap(buffer);
        if (SBE_ENABLE_PRECEDENCE_CHECKS) {
            onGroupAccessed(group.count);
        }
        return group;
    }

So if one tries to decode a message with version 2, they will inevitably get an IllegalStateException once try to decode the data field, since there is no way to transition from V3_BLOCK to V3_GROUP_DONE.

From my understanding, the fix should be to generate the following onWrap method instead:

    private void onWrap(final int actingVersion) {
        if (actingVersion < 3) {
            codecState(CodecStates.V0_BLOCK);
        } else {
            codecState(CodecStates.V3_BLOCK);
        }
    }

Here is the reproduction:

import org.agrona.ExpandableArrayBuffer;
import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class BugShowcaseTest {
    @Test
    void showcase() {
        final var headerEncoder = new MessageHeaderEncoder();
        final var encoder = new BugShowcaseEncoder();
        final var buffer = new ExpandableArrayBuffer(32);

        encoder.wrapAndApplyHeader(buffer, 0, headerEncoder);
        headerEncoder.version(2);
        buffer.putLong(8, 123);
        buffer.putShort(16, (short) 5);
        buffer.putStringWithoutLengthUtf8(18, "Hello");

        final var headerDecoder = new MessageHeaderDecoder();
        final var decoder = new BugShowcaseDecoder();

        decoder.wrapAndApplyHeader(buffer, 0, headerDecoder);
        for (final var group : decoder.group()) {
            group.sbeSkip();
        }

        assertThat(decoder.field()).isEqualTo(123);
        assertThat(decoder.data()).isEqualTo("Hello");
    }
}

And here is the error:

java.lang.IllegalStateException: Illegal field access order. Cannot access field "data" in state: V3_BLOCK. Expected one of these transitions: ["field(?)", "groupCount(0)", "groupCount(>0)"]. Please see the diagram in the Javadoc of the class BugShowcaseDecoder#CodecStates.
        at BugShowcaseDecoder.onDataAccessed(BugShowcaseDecoder.java:476)
	at BugShowcaseDecoder.data(BugShowcaseDecoder.java:539)
	at BugShowcaseTest.showcase(BugShowcaseTest.java:32)

@vyazelenko
Copy link
Contributor

@ZachBray any thoughts?

@ZachBray
Copy link
Contributor

Thank you for the report, @zyulyaev.
I'll add some tests and a fix.

@ZachBray
Copy link
Contributor

@zyulyaev, it is worth noting that adding a group when there is already variable data isn't a permitted SBE schema evolution in version 1 of the specification. To decode, after such evolutions, an implementation requires the new header field, numGroups, defined in version 2 of the SBE specification. This project currently implements version 1 of the specification.

However, I think the issue you have raised is still valid.

@zyulyaev
Copy link
Contributor Author

@ZachBray I was under impression that groups may be added without breaking backward compatibility. It does break forward-compatibility for sure though.

@ZachBray
Copy link
Contributor

ZachBray commented Apr 11, 2024

Groups can be added without breaking compatibility as long as there are no variable-length data fields following.

The SBE header has fields for:

  • blockLength: allows older decoders to skip over newly added fields (that it doesn't know about) in the block
  • templateId
  • schemaId
  • version: allows newer decoders to determine field presence

In situations where you have a newer encoder and an older decoder, adding new groups after there is already variable-length data fails because:

  • to find the first group, the old decoder skips over the blockLength from the offset of the message (limit = offset + blockLength)
  • it will then decode each repeating group it knows about (keeping track of the limit position reached in the buffer)
  • it will not skip over repeating groups it does not know about
  • it will assume that the first variable-length data field it knows about sits at the limit position (having read the repeating groups it knows about)
  • but when a new group is added before the variable-length data, limit might actually point to some repeating group data rather than the variable-length field the decoder expects

SBE v2 introduces a numGroups in the header, which facilitates skipping over these "unknown" repeating groups in older decoders.

@zyulyaev
Copy link
Contributor Author

Sorry, I must have mixed backward and forward compatibility then. What I meant was that with a newer decoder you should be able to decode a message encoded according to an older schema version.

@ZachBray
Copy link
Contributor

I've updated my response to remove that terminology for clarity, as I think it is context-dependent.

ZachBray added a commit that referenced this issue Apr 11, 2024
Relates to issue #988.

An SBE message might not change in every version of a schema. For
example, another message might change.

Previously, the field precedence checking model would expect an exact
match for `actingVersion` in `wrap` to enter one of its initial states.
However, it is only aware of the versions in which a message schema has
changed. Therefore, it was possible for `actingVersion` not to match any
of these versions, in which case the initial state that represented a
codec wrapped around the latest known version of the format was picked.

Now, we do _not_ expect an exact match in `wrap`. Instead, we select the
"best" match. For example, if a codec was changed in v1 and v3, and the
acting version is v2, we will decode using v1. However, if the acting
version is v4, we would decode as v3.
ZachBray added a commit that referenced this issue Apr 11, 2024
…ersions.

Relates to issue #988.

An SBE message might not change in every version of a schema. For
example, another message might change.

Previously, the field precedence checking model would expect an exact
match for `actingVersion` in `wrap` to enter one of its initial states.
However, it is only aware of the versions in which a message schema has
changed. Therefore, it was possible for `actingVersion` not to match any
of these versions, in which case the initial state that represented a
codec wrapped around the latest known version of the format was picked.

Now, we do _not_ expect an exact match in `wrap`. Instead, we select the
"best" match. For example, if a codec was changed in v1 and v3, and the
acting version is v2, we will decode using v1. However, if the acting
version is v4, we would decode as v3.
vyazelenko pushed a commit that referenced this issue Apr 11, 2024
…ersions. (#989)

Relates to issue #988.

An SBE message might not change in every version of a schema. For
example, another message might change.

Previously, the field precedence checking model would expect an exact
match for `actingVersion` in `wrap` to enter one of its initial states.
However, it is only aware of the versions in which a message schema has
changed. Therefore, it was possible for `actingVersion` not to match any
of these versions, in which case the initial state that represented a
codec wrapped around the latest known version of the format was picked.

Now, we do _not_ expect an exact match in `wrap`. Instead, we select the
"best" match. For example, if a codec was changed in v1 and v3, and the
acting version is v2, we will decode using v1. However, if the acting
version is v4, we would decode as v3.
@vyazelenko
Copy link
Contributor

Fixed in #989.

@zyulyaev
Copy link
Contributor Author

Amazing, thanks a lot for addressing this so quickly, and for developing this awesome feature in the first place!

DarrylGamroth pushed a commit to New-Earth-Lab/simple-binary-encoding that referenced this issue Nov 19, 2024
…ersions. (aeron-io#989)

Relates to issue aeron-io#988.

An SBE message might not change in every version of a schema. For
example, another message might change.

Previously, the field precedence checking model would expect an exact
match for `actingVersion` in `wrap` to enter one of its initial states.
However, it is only aware of the versions in which a message schema has
changed. Therefore, it was possible for `actingVersion` not to match any
of these versions, in which case the initial state that represented a
codec wrapped around the latest known version of the format was picked.

Now, we do _not_ expect an exact match in `wrap`. Instead, we select the
"best" match. For example, if a codec was changed in v1 and v3, and the
acting version is v2, we will decode using v1. However, if the acting
version is v4, we would decode as v3.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants