[Java] Precedence checks are false negative when decoding a older version message #988

zyulyaev · 2024-04-10T19:43:34Z

Decoders generated with sbe.generate.precedence.checks=true throw IllegalStateException when message from a previous version, that is not mentioned in the message schema, is decoded in the correct order.

For example the following schema:

     <sbe:message id="1000" name="BugShowcase">
        <field id="1" name="field" type="int64"/>
        <group id="2" name="group" sinceVersion="3">
        </group>
        <data id="3" name="data" type="varStringEncoding"/>
    </sbe:message>

results in the follow state machine:

graph TD
    NOT_WRAPPED-->|"wrap(version=0)"|V0_BLOCK
    V0_BLOCK -->|"field(?)"|V0_BLOCK
    V3_BLOCK-->|"field(?)"|V3_BLOCK
    V0_BLOCK-->|"dataLength()"|V0_BLOCK
    V3_GROUP_1_BLOCK-->|"dataLength()"|V3_GROUP_1_BLOCK
    V3_GROUP_DONE-->|"dataLength()"|V3_GROUP_DONE
    V0_BLOCK-->|"data(?)"|V0_DATA_DONE
    V3_GROUP_1_BLOCK-->|"data(?)"|V3_DATA_DONE
    V3_GROUP_DONE-->|"data(?)"|V3_DATA_DONE
    NOT_WRAPPED-->|"wrap(version=3)"|V3_BLOCK
    V3_BLOCK-->|"groupCount(0)"|V3_GROUP_DONE
    V3_BLOCK-->|"groupCount(>0)"|V3_GROUP_N
    V3_GROUP_N_BLOCK-->|"group.next()\n  where count - newIndex > 1"|V3_GROUP_N_BLOCK
    V3_GROUP_N-->|"group.next()\n  where count - newIndex > 1"|V3_GROUP_N_BLOCK
    V3_GROUP_N_BLOCK-->|"group.next()\n  where count - newIndex == 1"|V3_GROUP_1_BLOCK
    V3_GROUP_N-->|"group.next()\n  where count - newIndex == 1"|V3_GROUP_1_BLOCK
    V3_GROUP_N_BLOCK-->|"group.resetCountToIndex()"|V3_GROUP_DONE
    V3_GROUP_1_BLOCK-->|"group.resetCountToIndex()"|V3_GROUP_DONE
    V3_GROUP_DONE-->|"group.resetCountToIndex()"|V3_GROUP_DONE
    V3_GROUP_N-->|"group.resetCountToIndex()"|V3_GROUP_DONE

and the following onWrap and group methods generated:

    private void onWrap(final int actingVersion) {
        switch(actingVersion) {
            case 3:
                codecState(CodecStates.V3_BLOCK);
                break;
            case 0:
                codecState(CodecStates.V0_BLOCK);
                break;
            default:
                codecState(CodecStates.V3_BLOCK);
                break;
        }
    }

    public GroupDecoder group() {
        if (parentMessage.actingVersion < 3) {
            group.count = 0;
            group.index = 0;
            return group;
        }
        group.wrap(buffer);
        if (SBE_ENABLE_PRECEDENCE_CHECKS) {
            onGroupAccessed(group.count);
        }
        return group;
    }

So if one tries to decode a message with version 2, they will inevitably get an IllegalStateException once try to decode the data field, since there is no way to transition from V3_BLOCK to V3_GROUP_DONE.

From my understanding, the fix should be to generate the following onWrap method instead:

    private void onWrap(final int actingVersion) {
        if (actingVersion < 3) {
            codecState(CodecStates.V0_BLOCK);
        } else {
            codecState(CodecStates.V3_BLOCK);
        }
    }

Here is the reproduction:

import org.agrona.ExpandableArrayBuffer;
import org.junit.jupiter.api.Test;

import static org.assertj.core.api.Assertions.assertThat;

public class BugShowcaseTest {
    @Test
    void showcase() {
        final var headerEncoder = new MessageHeaderEncoder();
        final var encoder = new BugShowcaseEncoder();
        final var buffer = new ExpandableArrayBuffer(32);

        encoder.wrapAndApplyHeader(buffer, 0, headerEncoder);
        headerEncoder.version(2);
        buffer.putLong(8, 123);
        buffer.putShort(16, (short) 5);
        buffer.putStringWithoutLengthUtf8(18, "Hello");

        final var headerDecoder = new MessageHeaderDecoder();
        final var decoder = new BugShowcaseDecoder();

        decoder.wrapAndApplyHeader(buffer, 0, headerDecoder);
        for (final var group : decoder.group()) {
            group.sbeSkip();
        }

        assertThat(decoder.field()).isEqualTo(123);
        assertThat(decoder.data()).isEqualTo("Hello");
    }
}

And here is the error:

java.lang.IllegalStateException: Illegal field access order. Cannot access field "data" in state: V3_BLOCK. Expected one of these transitions: ["field(?)", "groupCount(0)", "groupCount(>0)"]. Please see the diagram in the Javadoc of the class BugShowcaseDecoder#CodecStates.
        at BugShowcaseDecoder.onDataAccessed(BugShowcaseDecoder.java:476)
	at BugShowcaseDecoder.data(BugShowcaseDecoder.java:539)
	at BugShowcaseTest.showcase(BugShowcaseTest.java:32)

The text was updated successfully, but these errors were encountered:

vyazelenko · 2024-04-10T21:13:03Z

@ZachBray any thoughts?

ZachBray · 2024-04-11T08:13:31Z

Thank you for the report, @zyulyaev.
I'll add some tests and a fix.

ZachBray · 2024-04-11T14:53:32Z

@zyulyaev, it is worth noting that adding a group when there is already variable data isn't a permitted SBE schema evolution in version 1 of the specification. To decode, after such evolutions, an implementation requires the new header field, numGroups, defined in version 2 of the SBE specification. This project currently implements version 1 of the specification.

However, I think the issue you have raised is still valid.

zyulyaev · 2024-04-11T15:26:34Z

@ZachBray I was under impression that groups may be added without breaking backward compatibility. It does break forward-compatibility for sure though.

ZachBray · 2024-04-11T16:02:38Z

Groups can be added without breaking compatibility as long as there are no variable-length data fields following.

The SBE header has fields for:

blockLength: allows older decoders to skip over newly added fields (that it doesn't know about) in the block
templateId
schemaId
version: allows newer decoders to determine field presence

In situations where you have a newer encoder and an older decoder, adding new groups after there is already variable-length data fails because:

to find the first group, the old decoder skips over the blockLength from the offset of the message (limit = offset + blockLength)
it will then decode each repeating group it knows about (keeping track of the limit position reached in the buffer)
it will not skip over repeating groups it does not know about
it will assume that the first variable-length data field it knows about sits at the limit position (having read the repeating groups it knows about)
but when a new group is added before the variable-length data, limit might actually point to some repeating group data rather than the variable-length field the decoder expects

SBE v2 introduces a numGroups in the header, which facilitates skipping over these "unknown" repeating groups in older decoders.

zyulyaev · 2024-04-11T16:05:09Z

Sorry, I must have mixed backward and forward compatibility then. What I meant was that with a newer decoder you should be able to decode a message encoded according to an older schema version.

ZachBray · 2024-04-11T16:18:13Z

I've updated my response to remove that terminology for clarity, as I think it is context-dependent.

Relates to issue #988. An SBE message might not change in every version of a schema. For example, another message might change. Previously, the field precedence checking model would expect an exact match for `actingVersion` in `wrap` to enter one of its initial states. However, it is only aware of the versions in which a message schema has changed. Therefore, it was possible for `actingVersion` not to match any of these versions, in which case the initial state that represented a codec wrapped around the latest known version of the format was picked. Now, we do _not_ expect an exact match in `wrap`. Instead, we select the "best" match. For example, if a codec was changed in v1 and v3, and the acting version is v2, we will decode using v1. However, if the acting version is v4, we would decode as v3.

…ersions. Relates to issue #988. An SBE message might not change in every version of a schema. For example, another message might change. Previously, the field precedence checking model would expect an exact match for `actingVersion` in `wrap` to enter one of its initial states. However, it is only aware of the versions in which a message schema has changed. Therefore, it was possible for `actingVersion` not to match any of these versions, in which case the initial state that represented a codec wrapped around the latest known version of the format was picked. Now, we do _not_ expect an exact match in `wrap`. Instead, we select the "best" match. For example, if a codec was changed in v1 and v3, and the acting version is v2, we will decode using v1. However, if the acting version is v4, we would decode as v3.

…ersions. (#989) Relates to issue #988. An SBE message might not change in every version of a schema. For example, another message might change. Previously, the field precedence checking model would expect an exact match for `actingVersion` in `wrap` to enter one of its initial states. However, it is only aware of the versions in which a message schema has changed. Therefore, it was possible for `actingVersion` not to match any of these versions, in which case the initial state that represented a codec wrapped around the latest known version of the format was picked. Now, we do _not_ expect an exact match in `wrap`. Instead, we select the "best" match. For example, if a codec was changed in v1 and v3, and the acting version is v2, we will decode using v1. However, if the acting version is v4, we would decode as v3.

vyazelenko · 2024-04-11T21:51:16Z

Fixed in #989.

zyulyaev · 2024-04-12T18:06:39Z

Amazing, thanks a lot for addressing this so quickly, and for developing this awesome feature in the first place!

…ersions. (aeron-io#989) Relates to issue aeron-io#988. An SBE message might not change in every version of a schema. For example, another message might change. Previously, the field precedence checking model would expect an exact match for `actingVersion` in `wrap` to enter one of its initial states. However, it is only aware of the versions in which a message schema has changed. Therefore, it was possible for `actingVersion` not to match any of these versions, in which case the initial state that represented a codec wrapped around the latest known version of the format was picked. Now, we do _not_ expect an exact match in `wrap`. Instead, we select the "best" match. For example, if a codec was changed in v1 and v3, and the acting version is v2, we will decode using v1. However, if the acting version is v4, we would decode as v3.

ZachBray mentioned this issue Apr 11, 2024

[Java, C#, C++] Fix precedence checks to account for non-contiguous versions. #989

Merged

vyazelenko closed this as completed Apr 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Java] Precedence checks are false negative when decoding a older version message #988

[Java] Precedence checks are false negative when decoding a older version message #988

zyulyaev commented Apr 10, 2024

vyazelenko commented Apr 10, 2024

ZachBray commented Apr 11, 2024

ZachBray commented Apr 11, 2024

zyulyaev commented Apr 11, 2024

ZachBray commented Apr 11, 2024 •

edited

Loading

zyulyaev commented Apr 11, 2024

ZachBray commented Apr 11, 2024

vyazelenko commented Apr 11, 2024

zyulyaev commented Apr 12, 2024

[Java] Precedence checks are false negative when decoding a older version message #988

[Java] Precedence checks are false negative when decoding a older version message #988

Comments

zyulyaev commented Apr 10, 2024

vyazelenko commented Apr 10, 2024

ZachBray commented Apr 11, 2024

ZachBray commented Apr 11, 2024

zyulyaev commented Apr 11, 2024

ZachBray commented Apr 11, 2024 • edited Loading

zyulyaev commented Apr 11, 2024

ZachBray commented Apr 11, 2024

vyazelenko commented Apr 11, 2024

zyulyaev commented Apr 12, 2024

ZachBray commented Apr 11, 2024 •

edited

Loading