Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inmemory store does not support values more than 127 bytes in size for properties with Cardinality.SET (and doesn't support key sizes > 127 bytes) #2273

Closed
mad opened this issue Dec 9, 2020 · 6 comments

Comments

@mad
Copy link
Contributor

mad commented Dec 9, 2020

For confirmed bugs, please report:

  • Version: master
  • Storage Backend: inmemory
  • Mixed Index Backend:
  • Mailing list Thread URL:
  • Steps to Reproduce:

For berkeleyje storage works as expected

graph = JanusGraphFactory.open("berkeleyje:/tmp/graph")

mgmt = graph.openManagement()
p1 = mgmt.makePropertyKey('p1').cardinality(Cardinality.SET).dataType(String.class).make()
mgmt.buildIndex("idx", Vertex.class).addKey(p1).buildCompositeIndex();
mgmt.commit();

v1=graph.traversal().addV("Portrait").property("p1", '-'*125).next()
graph.tx().commit()

But for inmemory store failed on commit

graph = JanusGraphFactory.open("inmemory")

mgmt = graph.openManagement()
p1 = mgmt.makePropertyKey('p1').cardinality(Cardinality.SET).dataType(String.class).make()
mgmt.buildIndex("idx", Vertex.class).addKey(p1).buildCompositeIndex();
mgmt.commit();

v1=graph.traversal().addV("Portrait").property("p1", '-'*125).next()
graph.tx().commit()

with exception

...
Caused by: java.lang.IllegalArgumentException
	at com.google.common.base.Preconditions.checkArgument(Preconditions.java:108)
	at org.janusgraph.diskstorage.inmemory.BufferPageUtils.buildFromEntryArray(BufferPageUtils.java:110)
	at org.janusgraph.diskstorage.inmemory.BufferPage.merge(BufferPage.java:236)
	at org.janusgraph.diskstorage.inmemory.SinglePageEntryBuffer.merge(SinglePageEntryBuffer.java:36)
	at org.janusgraph.diskstorage.inmemory.SinglePageEntryBuffer.mutate(SinglePageEntryBuffer.java:83)
	at org.janusgraph.diskstorage.inmemory.InMemoryColumnValueStore.mutate(InMemoryColumnValueStore.java:128)
	at org.janusgraph.diskstorage.inmemory.InMemoryKeyColumnValueStore.mutate(InMemoryKeyColumnValueStore.java:106)
	at org.janusgraph.diskstorage.inmemory.InMemoryStoreManager.mutateMany(InMemoryStoreManager.java:127)
	at org.janusgraph.diskstorage.locking.consistentkey.ExpectedValueCheckingStoreManager.mutateMany(ExpectedValueCheckingStoreManager.java:79)
	at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:91)
	at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction$1.call(CacheTransaction.java:88)
	at org.janusgraph.diskstorage.util.BackendOperation.executeDirect(BackendOperation.java:68)

@li-boxuan
Copy link
Member

I can reproduce it on 0.5.2.

@porunov
Copy link
Member

porunov commented Dec 17, 2020

@dk-github Do you know why it may happen?

@dk-github dk-github changed the title Inmemory store does not support long property values Inmemory store does not support long values for properties with Cardinality.SET Dec 19, 2020
@dk-github
Copy link
Contributor

dk-github commented Dec 19, 2020

We routinely store strings hundreds of MBs in size, so this looks to be a "special" case which wasn't covered in any of the tests or use cases so far and so went unnoticed.

I had a quick look, and the above code works fine if we remove the following: cardinality(Cardinality.SET)

Also, the comment above the failing precondition line says that it assumes the length of the key (not value) in an Entry is always less than 127, and uses that assumption to reduce the overhead (1 byte to store the position where value starts instead of 4 bytes).

So my hypothesis (not based on actual code dive so far) is that normally JG stores property as value under a key equal to property name, but in case of Cardinality.SET it switches to a structure which effectively stores values as keys (and thus achieves uniqueness).

Summary (NOTE that it is actually premature because it would be great to verify by diving in the code - will try doing this later next week and update this post if not correct):

  • 127-byte limit to key length is an undocumented limitation of current in-memory plugin implementation, introduced to save storage memory overhead, which I unfortunately forgot to document upfront (because it never had any significance in our use case, and didn't break any existing JG tests to date)
  • it seems (or rather seemed) reasonable for most use cases, because property name is unlikely to reach this limit (at least if it is human-named), and it does provide a significant overhead reduction, so if not for this issue - I would propose to just document it and leave as is
  • but now that we (potentially) discovered that Cardinality.SET uses keys to store values, it is clear that it is not so universally acceptable, and is a bigger problem

The options I can see here are:
a) remove this optimization and always use full 4-byte integers for indexing - this will resolve the issue, but would be sad, because 99.99% of Entries is going to be 3 bytes longer than they actually need to be, and with big graphs this can be many hundreds of MB, possibly a few GBs

b) invest some time in an "adaptive" storage strategy, so that in 99.99% cases it uses 1-byte indexes, and switches to 4-byte only if required, on per-page or per-entry basis. For example, use sign bit in first byte to signify if it is a 1-byte or 4-byte length.
We already do a similar thing where 99% of buffers are single-page and avoid the overhead of embedded page list etc, and we only switch to multi-page if the size of the data dictates it, which is relatively seldom

c) check if there is a practical limit to the key size in other backends, and if so - should we document these limits as limitations of Cardinality.SET, or look for a more involved but less limited implementation of SET on frontend side

I will try to set aside some time to verify the hypothesis, and look at how difficult would option b be. In the meantime, any thoughts are welcome.

@dk-github dk-github changed the title Inmemory store does not support long values for properties with Cardinality.SET Inmemory store does not support values more than 127 bytes in size for properties with Cardinality.SET (and doesn't support key sizes > 127 bytes) Dec 20, 2020
dk-github added a commit to dk-github/janusgraph that referenced this issue Dec 21, 2020
…page storage, to allow for Entry keys of arbitrary length, while not incurring unnecessary overhead (as 99% of keys are typically quite short)
dk-github added a commit to dk-github/janusgraph that referenced this issue Dec 21, 2020
…page storage, to allow for Entry keys of arbitrary length, while not incurring unnecessary overhead (as 99% of keys are typically quite short)

Signed-off-by: Dmitry Kovalev <[email protected]>
@dk-github
Copy link
Contributor

dk-github commented Dec 21, 2020

I opened a PR #2294 which implements option b) - i.e. it uses 1 byte to store key length unless it exceeds 127, in which case it switches to full 4-byte integer and stores it negated. SO that by reading 1st byte and checking if it is negative it can then detect if it needs to read 3 more. It relies on BigEndian order being default in java.nio cross-platform.

I haven't made any performance or memory footprint comparisons yet, but I expect the impact to be negligible to both.

Please review.

I have also converted the repro code above to Java, not sure if we want to make it into a test specifically for storing big values with Cardinality.SET - and where to put that test if we do - so just posting it here for now:

package org.janusgraph.diskstorage.inmemory;

import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.Cardinality;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.PropertyKey;
import org.janusgraph.core.schema.JanusGraphManagement;
import org.junit.jupiter.api.Test;

import java.util.ArrayList;
import java.util.Map;

import static org.junit.jupiter.api.Assertions.*;

class issue2273 {
    @Test
    void repro2273CardinalitySetLongValue() {
        JanusGraph graph = JanusGraphFactory.open("inmemory");

        StringBuffer longStringValue = new StringBuffer("-");
        for(int i=0; i < 130; i++)
        {
            longStringValue.append('-');
        }

        assertTrue(longStringValue.length() > 127); //original issue reproduces when value is 125 symbols or more

        JanusGraphManagement mgmt = graph.openManagement();
        PropertyKey p1 = mgmt.makePropertyKey("p1").cardinality(Cardinality.SET).dataType(String.class).make();
        PropertyKey p2 = mgmt.makePropertyKey("p2").dataType(String.class).make();
        PropertyKey p3 = mgmt.makePropertyKey(longStringValue.toString()).dataType(String.class).make();
        mgmt.buildIndex("idx", Vertex.class).addKey(p1).buildCompositeIndex();
        mgmt.commit();

        Vertex v1 = graph.traversal().addV("Portrait")
            .property("p1", longStringValue.toString()).property("p1", longStringValue.toString())
            .property("p2", longStringValue.toString())
            .property(longStringValue.toString(), longStringValue.toString())
            .next();
        graph.tx().commit();

        Map<Object, Object> vals = graph.traversal().V().valueMap().next();
        assertEquals(longStringValue.toString(), ((ArrayList)(vals.get("p2"))).get(0));
        assertEquals(longStringValue.toString(), ((ArrayList)(vals.get("p1"))).get(0));
        assertEquals(longStringValue.toString(), ((ArrayList)(vals.get(longStringValue.toString()))).get(0));
    }

}

@li-boxuan
Copy link
Member

li-boxuan commented Dec 23, 2020

I have also converted the repro code above to Java, not sure if we want to make it into a test specifically for storing big values with Cardinality.SET - and where to put that test if we do - so just posting it here for now

It would be nice if you can put it in JanusGraphTest.java

dk-github added a commit to dk-github/janusgraph that referenced this issue Dec 27, 2020
…page storage, to allow for Entry keys of arbitrary length, while not incurring unnecessary overhead (as 99% of keys are typically quite short)

Signed-off-by: Dmitry Kovalev <[email protected]>
@dk-github dk-github self-assigned this Dec 27, 2020
dk-github added a commit to dk-github/janusgraph that referenced this issue Dec 29, 2020
…page storage, to allow for Entry keys of arbitrary length, while not incurring unnecessary overhead (as 99% of keys are typically quite short)

Signed-off-by: Dmitry Kovalev <[email protected]>
@li-boxuan li-boxuan added this to the Release v0.6.0 milestone Jan 3, 2021
dk-github added a commit to dk-github/janusgraph that referenced this issue Jan 7, 2021
…page storage, to allow for Entry keys of arbitrary length, while not incurring unnecessary overhead (as 99% of keys are typically quite short)

Signed-off-by: Dmitry Kovalev <[email protected]>
dk-github added a commit to dk-github/janusgraph that referenced this issue Jan 8, 2021
…page storage, to allow for Entry keys of arbitrary length, while not incurring unnecessary overhead (as 99% of keys are typically quite short)

Signed-off-by: Dmitry Kovalev <[email protected]>
dk-github added a commit that referenced this issue Jan 8, 2021
…ge, to allow for Entry keys of arbitrary length, while not incurring unnecessary overhead (as 99% of keys are typically quite short) (#2294)

Signed-off-by: Dmitry Kovalev <[email protected]>
@porunov
Copy link
Member

porunov commented Jan 8, 2021

Fixed in #2294

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants