Change DataCache.Add() to never allow overwriting existing values #1177

ixje · 2019-10-22T10:31:38Z

neo/IO/Caching/DataCache.cs

codecov-io · 2019-10-22T12:47:05Z

Codecov Report

❗ No coverage uploaded for pull request base (master@0f75741). Click here to learn what that means.
The diff coverage is 92.3%.

@@            Coverage Diff            @@
##             master    #1177   +/-   ##
=========================================
  Coverage          ?   65.15%           
=========================================
  Files             ?      199           
  Lines             ?    13605           
  Branches          ?        0           
=========================================
  Hits              ?     8864           
  Misses            ?     4741           
  Partials          ?        0

Impacted Files	Coverage Δ
neo/SmartContract/Native/PolicyContract.cs	`94.23% <100%> (ø)`
neo/IO/Caching/DataCache.cs	`99.53% <88.88%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f75741...105f7e9. Read the comment docs.

erikzhang · 2019-10-22T13:34:57Z

neo/SmartContract/Native/PolicyContract.cs

@@ -46,19 +46,19 @@ private bool CheckValidators(ApplicationEngine engine)
        internal override bool Initialize(ApplicationEngine engine)
        {
            if (!base.Initialize(engine)) return false;
-            engine.Snapshot.Storages.Add(CreateStorageKey(Prefix_MaxBlockSize), new StorageItem
+            engine.Snapshot.Storages.GetAndChange(CreateStorageKey(Prefix_MaxBlockSize), () => new StorageItem


Why change to GetAndChange()?

the original code relied on the overwriting behaviour of the old Add() implementation. This now causes unit test failures because the value already exists and an exception is thrown. By using GetAndChange we basically reproduce the old Add() behaviour.

Alternatively I can insert some code like the following in all the affected tests to clear the backend data.

foreach (KeyValuePair<StorageKey, StorageItem> pair in snapshot.Storages.Find()) { snapshot.Storages.DeleteInternal(pair.Key); }

I prefer to fix the unit test

ixje · 2019-10-22T16:43:17Z

I need help in fixing the following test

neo/neo.UnitTests/UT_Culture.cs

Line 17 in e53f533

public void All_Tests_Cultures()

It fails on e.g. UT_GasToken, but if I run that test separately it passes fine so I don't know what this specific test is doing to break it.

shargon · 2019-10-22T17:50:18Z

It's a problem of isolation, if some test alter the storages, and it doesn't clean this changes, it could affect the other test.. until you solve the other test, you can skip this class from Cultures like this:

Add the NotReRunnable attribute

ixje · 2019-10-23T07:11:16Z

It's a problem of isolation, if some test alter the storages, and it doesn't clean this changes, it could affect the other test.. until you solve the other test, you can skip this class from Cultures like this:

If I read something like

[TestInitialize]
public void TestSetup()
{
   TestBlockchain.InitializeMockNeoSystem();
   Store = TestBlockchain.GetStore();
}

Then I expect Store to be a fresh copy not affected by any previous test. Unfortunately this is not the case as concluded from running All_Tests_Cultures.

I clearly do not know enough about C# unit testing because I can't determine what test to fix if all tests pass when running individually (see my previous screenshot). I tried adding NotReRunnable to all classes that it initially flags as failures when I do "Run All" (tests) but no luck.

I'm thinking somebody needs to fix the global isolation problem of the test store.

shargon · 2019-10-23T09:19:00Z

I'm thinking somebody needs to fix the global isolation problem of the test store.

Yes, It is a big problem

ixje · 2019-11-26T17:37:47Z

Why is this closed? I believe the fix for Add() is still valid. There just wasn't any activity because there was a global issue with the TestBlockchain. If that's solved I can revisit the tests.

erikzhang · 2019-12-06T07:15:26Z

This fix is good. But it make Add() becomes slow.

vncoelho · 2019-12-07T00:31:14Z

@ixje, can you give a title to this FIX? aheuahuea, something to remember the idea behind it.

Another thing, could you take a look at #1318? During that tests we noticed some things strange (but I am also not an expert as @erikzhang to use all tools), however, it was noticed that:

After modifying storage with GetAndChange it stays modified;
Even after Snapshot.Commit()
The idea was to Dispose and Get the updated Snapshot from Blockchain.Singleton.GetSnapshot()
The idea worked nice, however, when the Snapshot would be modified locally and cleaned for the next state?

Thus, It mad me fell that some improvements on Cache might still be possible.
In this sense, I would like to understand better what you are fixing and also thank you for your efforts in doing that!

@shargon, thank you for the tips and examples during that tests on Consensus, helping with the manipulation of Cache.

ixje · 2019-12-09T08:11:57Z

Another thing, could you take a look at #1318?

@vncoelho is this question still relevant as I see it's already merged?

During that tests we noticed some things strange (but I am also not an expert as @erikzhang to use all tools), however, it was noticed that:

After modifying storage with GetAndChange it stays modified;

Even after Snapshot.Commit()

I think that's expected. You create a snapshot, you get a value, you modify it. Now for as long as you keep requesting data from that snapshot it should get your modified state. If you create a second snapshot before calling commit on the first, then the value shouldn't be changed in the second snapshot, also not after committing the first snapshot. Note that this behaviour is specific to LevelDB. Committing with say a Postgresql backend would update snapshot 2 as there a snapshot is connection based.

Snapshot.Commit() writes to the real backend (which does a LevelDB.WriteBatch.write()) and makes it permanent.

The idea worked nice, however, when the Snapshot would be modified locally and cleaned for the next state?

Each call to GetSnapshot() creates a new snapshot with new local DataCaches. At least that is how it worked before #1249 , I haven't analyzed those changes yet.

vncoelho · 2019-12-09T21:28:37Z

@vncoelho is this question still relevant as I see it's already merged?

I think it is relevant, @ixje, because I did a workaround there.

Thanks for your explanation:

"(...) then the value shouldn't be changed in the second snapshot, also not after committing the first snapshot. (...)"

Thanks is strange, neither after committing the first?
In the case of that aforementioned UTs, we are:

Using Add to add Next_Validators (specifying a specific Multi-Sig for the Consensus) using a Snapshot obtained from Blockchain.Singleton
After commit even Blockchain.Singleton is being modified, thus, we need to revert, otherwise, the remaining tests will behave differently then expected, because we set random Validators for each test.
Thus, we update the consensus.context Snapshot by Disposing and seting it to Blockchain.Singleton.GetSnapshot() again.
Then, we delete the Next_Validators field and commit again.
We are ready for the other standard tests

In summary, I think, as you said, this is the expected, right? Thanks for your attention.

vncoelho · 2019-12-09T21:29:27Z

It fails on e.g. UT_GasToken, but if I run that test separately it passes fine so I don't know what this specific test is doing to break it.

Do you still suffer from this problem? I possible known how to fix this.

vncoelho · 2019-12-09T21:32:08Z

I'm thinking somebody needs to fix the global isolation problem of the test store.

It is possible to use like this @ixje and @shargon, we just need to ensure that every tests is moving the state back to the original standard.

The template of #1318 is good to be followed.

vncoelho · 2019-12-09T22:03:20Z

I rebase with master for fixing, @ixje.

If it is possible, it would be better if you open direct as a branch here.

vncoelho · 2019-12-09T22:15:21Z

@shargon @ixje @erikzhang

Perhaps we need to apply the same logic of Ut_Consensus to all these other UTs that use snapshot.commit(), reverting the Storage back at the end of the tests.

Check #1343

vncoelho · 2019-12-09T22:22:33Z

@ixje, the last error was here:

  X TestUpdatePoolForBlockPersisted [2ms]
  Error Message:
   Test method Neo.UnitTests.Ledger.UT_MemoryPool.TestUpdatePoolForBlockPersisted threw exception: 
System.ArgumentException: Value does not fall within the expected range.
  Stack Trace:
      at Neo.IO.Caching.DataCache`2.Add(TKey key, TValue value) in /home/runner/work/neo/neo/src/neo/IO/Caching/DataCache.cs:line 58
   at Neo.UnitTests.Ledger.UT_MemoryPool.TestUpdatePoolForBlockPersisted() in /home/runner/work/neo/neo/tests/neo.UnitTests/Ledger/UT_MemoryPool.cs:line 490

        public void TestUpdatePoolForBlockPersisted()
        {
            var snapshot = Blockchain.Singleton.GetSnapshot();
            byte[] transactionsPerBlock = { 0x18, 0x00, 0x00, 0x00 }; // 24
            byte[] feePerByte = { 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, 0x00, 0x00 }; // 1048576
            StorageItem item1 = new StorageItem
            {
                Value = transactionsPerBlock
            };
            StorageItem item2 = new StorageItem
            {
                Value = feePerByte
            };
            var key1 = CreateStorageKey(Prefix_MaxTransactionsPerBlock);
            var key2 = CreateStorageKey(Prefix_FeePerByte);
            key1.ScriptHash = NativeContract.Policy.Hash;
            key2.ScriptHash = NativeContract.Policy.Hash;
            snapshot.Storages.Add(key1, item1);
            snapshot.Storages.Add(key2, item2);

Perhaps, since the Key already exists we need to use GetAndChange now, yes?

Check 29cebaf

vncoelho · 2019-12-09T22:35:33Z

Another error is on CheckPolicy.

In fact, the value really already exists, thus, it needs GetAndChange instead of Add as well

        public void TestCheckPolicy()
        {
            Transaction tx = Blockchain.GenesisBlock.Transactions[0];
            var snapshot = Blockchain.Singleton.GetSnapshot();

            StorageKey storageKey = new StorageKey
            {
                ScriptHash = NativeContract.Policy.Hash,
                Key = new byte[sizeof(byte)]
            };
            storageKey.Key[0] = 15;
            var sITest = snapshot.Storages.TryGet(storageKey);
            Console.WriteLine($"sITest {sITest.Value.ToScriptHash()}");

            snapshot.Storages.GetAndChange(storageKey, () => new StorageItem
            {
                Value = new UInt160[] { tx.Sender }.ToByteArray()
            });

    /*
            snapshot.Storages.Add(storageKey, new StorageItem
            {
                Value = new UInt160[] { tx.Sender }.ToByteArray()
            });
*/
            NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeFalse();

            snapshot = Blockchain.Singleton.GetSnapshot();
            NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeTrue();
        }

However, the first assert is returning True

    Expected NativeContract.Policy.CheckPolicy(tx, snapshot) to be false, but found True.

vncoelho · 2019-12-09T22:42:04Z

In fact, it looks like that this test was wrong, I do not know why it would return False since it just checks BlockedAccounts?

Any idea, @erikzhang @shargon ?

vncoelho · 2019-12-09T22:53:04Z

tests/neo.UnitTests/SmartContract/Native/UT_PolicyContract.cs

@@ -231,21 +235,6 @@ public void TestCheckPolicy()
        {
            Transaction tx = Blockchain.GenesisBlock.Transactions[0];
            var snapshot = Blockchain.Singleton.GetSnapshot();
-


@erikzhang @eryeer @shargon, double check this because perhaps this test was wrong.

How about change it like this.

[TestMethod] public void TestCheckPolicy() { Transaction tx = Blockchain.GenesisBlock.Transactions[0]; var snapshot = Blockchain.Singleton.GetSnapshot(); StorageKey storageKey = new StorageKey { ScriptHash = NativeContract.Policy.Hash, Key = new byte[sizeof(byte)] }; storageKey.Key[0] = 15; var item = snapshot.Storages.GetAndChange(storageKey); item.Value = new UInt160[] { tx.Sender }.ToByteArray(); NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeFalse(); snapshot = Blockchain.Singleton.GetSnapshot(); NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeTrue(); }

I tried, it says True instead of False, that is why I believe the test was wrong. Why was it False in any situation?

Using this code, the test will pass. We need to change the return value of snapshot.Storages.GetAndChange(storageKey) instead of passing a StorageItem creating function into it. Because the initalization will add a storageItem into storage, trackable.Item won't be null.

Anhhh, I see, but it would be better to DeleteKey, no?

storageKey.Key[0] = 15; this key is the blockedAccount key

neo/src/neo/SmartContract/Native/PolicyContract.cs

Line 24 in 0f75741

private const byte Prefix_BlockedAccounts = 15;

Because it is private, we can only use the 15 instead of Prefix_BlockedAccounts

I partially understood! Thanks for the explanation, @eryeer.

However, I still need to figure out this part:

We need to change the return value of snapshot.Storages.GetAndChange(storageKey) instead of passing a StorageItem creating function into it. Because the initalization will add a storageItem into storage, trackable.Item won't be null.

😄 Here is the initalization.

neo/src/neo/SmartContract/Native/PolicyContract.cs

Lines 62 to 65 in 0f75741

engine.Snapshot.Storages.Add(CreateStorageKey(Prefix_BlockedAccounts), new StorageItem

{

Value = new UInt160[0].ToByteArray()

});

So whenever we get the BlockedAccounts from Internal storage, the value will be new UInt160[0].ToByteArray() by default. We need to change the defult value instead of create a new one to add to storage.

I think that I understood.

This would also work:

[TestMethod] public void TestCheckPolicy() { Transaction tx = Blockchain.GenesisBlock.Transactions[0]; var snapshot = Blockchain.Singleton.GetSnapshot(); StorageKey storageKey = new StorageKey { ScriptHash = NativeContract.Policy.Hash, Key = new byte[sizeof(byte)] }; storageKey.Key[0] = 15; //var item = snapshot.Storages.GetAndChange(storageKey); //item.Value = new UInt160[] { tx.Sender }.ToByteArray(); snapshot.Storages.Delete(storageKey); snapshot.Storages.GetAndChange(storageKey, () => new StorageItem { Value = new UInt160[] { tx.Sender }.ToByteArray() }); snapshot.Commit(); NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeFalse(); snapshot = Blockchain.Singleton.GetSnapshot(); snapshot.Storages.Delete(storageKey); snapshot.Storages.GetAndChange(storageKey, () => new StorageItem { Value = new UInt160[0].ToByteArray() }); snapshot.Commit(); NativeContract.Policy.CheckPolicy(tx, snapshot).Should().BeTrue(); }

Oh yes, congratulations!

vncoelho · 2019-12-09T22:54:22Z

tests/neo.UnitTests/Ledger/UT_MemoryPool.cs

@@ -487,8 +487,14 @@ public void TestUpdatePoolForBlockPersisted()
            var key2 = CreateStorageKey(Prefix_FeePerByte);
            key1.ScriptHash = NativeContract.Policy.Hash;
            key2.ScriptHash = NativeContract.Policy.Hash;
-            snapshot.Storages.Add(key1, item1);
-            snapshot.Storages.Add(key2, item2);
+            snapshot.Storages.GetAndChange(key1, () => new StorageItem


@ixje, this change is expected, right? Since the value was previously existing.

@vncoelho I don't remember, but it is most likely caused by the TestBlockchain isolation problem. I think the best way forward is to have that fixed first, reset all changes to the unit tests and try again. Some of the changes (like here) have likely been applied because of the isolation problem and might not be needed anymore once that is fixed

AHeauehuaea, @ixje, this change was done by me, please check the last commits.

The question was to be sure this was consistent.

All UTs were fixed, the PR is kind of ready to merge.

Changed already done. Re-review requested.

erikzhang · 2019-12-10T06:43:56Z

src/neo/IO/Caching/DataCache.cs

+                if (dictionary.TryGetValue(key, out Trackable trackable))
+                {
+                    if (trackable.State != TrackState.Deleted)
+                        throw new ArgumentException();
+                }
+                else
+                {
+                    if (TryGetInternal(key) != null)
+                        throw new ArgumentException();
+                }


When we are using Add(), we assume that the data does not exist. If we can't assume that, we should use GetAndChange(). So, since we are using Add(), there is no need to add another data read operation.

The idea behind this PR (and the above code change) is to create consistent behaviour of the cache by not just validating that the value exists in cache, but to also validate that it does not exist in the real backend. You previously choose this exact solution (#1173 (comment)) instead of always overriding the data (described in the linked comment as option 1).

If we don't choose either option 1 or 2 we will have inconsistent behaviour as described in #1173

I don't mind which of the 2 we choose, but right now the feedback looks like you don't want to change anything?! 😕 I think it is wrong to assume that data does not exist and I think you agreed with that when you wrote the code initially, otherwise this check would not exists

if (trackable.State != TrackState.Deleted) throw new ArgumentException();

The overhead of this check is very small, so just do it by the way.

I did not understand the point as well, @erikzhang.

The implementation here seems to be option 2 of that aforementioned issue.
Do you want to leave the code as in the original implementation?

I don't want to change it. The purpose of Add() is to add records when they do not exist, and save the cost of a read operation. If you are not sure that the record exists or not, you should use GetAndChange().

I see, @erikzhang.
It looks like to be more efficient in the original code, yes? In the sense of avoiding a second check with if (TryGetInternal(key) != null).

@ixje, if that is the case, perhaps it is better to close PR and issue and open another one for commenting the code properly in order to avoid future misinterpretations.

So everybody basically wasted their time in creating the PR, reviewing it and fixing unit tests (still thanks for that @vncoelho) because 1.5 months after opening the PR and the initial review we're suddenly trying to save performance on a db operation that takes < 10 microseconds (according to leveldb benchmarks for random reads) while elsewhere in the code base millisecond delays are accepted.

Don't get me wrong here. I'm all cool with keeping performance in mind for reaching the highest TPS, but how are we supposed to make a sound decision what makes sense to spend time on when performance criteria seems randomly applied?

@ixje, I really apologize for that.

I really did not have enough knowledge for checking it before.
In fact, I would like to thank you for your dedication. In addition, your insights helped me to investigate further.

Perhaps, there are methods that are not being used from the DataCache, such as GetOrAdd.
We need to solve these questions and improve what needs to be improved.

ixje · 2019-12-10T07:39:18Z

"(...) then the value shouldn't be changed in the second snapshot, also not after committing the first snapshot. (...)"

Thanks is strange, neither after committing the first?

For LevelDB not. LevelDB makes a 100% separate copy and this copy will not listen anymore to new changes happening to the real DB. Here's what it looks like for LevelDB

and this is for Postgresql

Note how for postgresql there is a syncing between the real backend and any outstanding snapshots (=a network connection/session in their case).

I think the big problem with the tests is that this TestBlockchain is using the same Blockchain.singleton everytime. It needs to be a new, clean one for each call.

vncoelho · 2019-12-10T08:48:40Z

Such a nice explanation and figure, @ixje. Did you create that?

ixje · 2019-12-10T10:34:45Z

Such a nice explanation and figure, @ixje. Did you create that?

Yes, with the free online https://www.draw.io/ platform

shargon · 2019-12-10T11:17:28Z

src/neo/SmartContract/Native/PolicyContract.cs

@@ -47,19 +47,19 @@ private bool CheckValidators(ApplicationEngine engine)
        internal override bool Initialize(ApplicationEngine engine)
        {
            if (!base.Initialize(engine)) return false;
-            engine.Snapshot.Storages.Add(CreateStorageKey(Prefix_MaxBlockSize), new StorageItem
+            engine.Snapshot.Storages.GetAndChange(CreateStorageKey(Prefix_MaxBlockSize), () => new StorageItem


This should be the same, if we already know that the item doesn't exists, why we need to call GetAndChange?

ixje · 2019-12-10T11:20:05Z

closing according to #1177 (comment)

ixje changed the title ~~Change Add() to never allow overwriting existing values~~ Fix 1173 Oct 22, 2019

erikzhang previously requested changes Oct 22, 2019

View reviewed changes

neo/IO/Caching/DataCache.cs Outdated Show resolved Hide resolved

erikzhang reviewed Oct 22, 2019

View reviewed changes

erikzhang closed this Nov 25, 2019

erikzhang reopened this Dec 6, 2019

ixje changed the title ~~Fix 1173~~ Change DataCache.Add() to never allow overwriting existing values Dec 9, 2019

ixje added 4 commits December 9, 2019 19:00

Change Add() to never allow overwriting existing values

6ee22a8

fix lint error

11bd6a7

take Deleted -> Added scenario into account

be31a2a

fixes for unittests

20c6e40

Fixing mempool UT to GetAndChange

29cebaf

Removing wrong test on TestCheckPolicy of Blocked Accounts

115372a

vncoelho reviewed Dec 9, 2019

View reviewed changes

vncoelho approved these changes Dec 9, 2019

View reviewed changes

vncoelho requested review from erikzhang, shargon and eryeer December 9, 2019 22:54

erikzhang requested changes Dec 10, 2019

View reviewed changes

Merge branch 'master' into fix1173

34a8ae0

ixje mentioned this pull request Dec 10, 2019

Adding some comments to datacache #1347

Merged

Merge branch 'master' into fix1173

105f7e9

shargon reviewed Dec 10, 2019

View reviewed changes

ixje closed this Dec 10, 2019

	engine.Snapshot.Storages.Add(CreateStorageKey(Prefix_BlockedAccounts), new StorageItem
	{
	Value = new UInt160[0].ToByteArray()
	});

Change DataCache.Add() to never allow overwriting existing values #1177

Change DataCache.Add() to never allow overwriting existing values #1177

Conversation

ixje commented Oct 22, 2019

codecov-io commented Oct 22, 2019 • edited by codecov bot Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ixje commented Oct 22, 2019

shargon commented Oct 22, 2019

ixje commented Oct 23, 2019

shargon commented Oct 23, 2019

ixje commented Nov 26, 2019

erikzhang commented Dec 6, 2019

vncoelho commented Dec 7, 2019 • edited Loading

ixje commented Dec 9, 2019 • edited Loading

vncoelho commented Dec 9, 2019 • edited Loading

vncoelho commented Dec 9, 2019

vncoelho commented Dec 9, 2019

vncoelho commented Dec 9, 2019

vncoelho commented Dec 9, 2019 • edited Loading

vncoelho commented Dec 9, 2019 • edited Loading

vncoelho commented Dec 9, 2019

vncoelho commented Dec 9, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eryeer Dec 10, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

eryeer Dec 10, 2019 • edited Loading

Choose a reason for hiding this comment

vncoelho Dec 10, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ixje Dec 10, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ixje Dec 10, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ixje commented Dec 10, 2019 • edited Loading

vncoelho commented Dec 10, 2019

ixje commented Dec 10, 2019

Choose a reason for hiding this comment

ixje commented Dec 10, 2019

codecov-io commented Oct 22, 2019 •

edited by codecov bot

Loading

vncoelho commented Dec 7, 2019 •

edited

Loading

ixje commented Dec 9, 2019 •

edited

Loading

vncoelho commented Dec 9, 2019 •

edited

Loading

vncoelho commented Dec 9, 2019 •

edited

Loading

vncoelho commented Dec 9, 2019 •

edited

Loading

eryeer Dec 10, 2019 •

edited

Loading

eryeer Dec 10, 2019 •

edited

Loading

vncoelho Dec 10, 2019 •

edited

Loading

ixje Dec 10, 2019 •

edited

Loading

ixje Dec 10, 2019 •

edited

Loading

ixje commented Dec 10, 2019 •

edited

Loading