Trustchain scalability and stress testing experiment #4140

Closed
synctext opened this issue Jan 10, 2019 · 12 comments

@synctext
Member

synctext commented Jan 10, 2019

Having a Trustchain database with infinite growth is a problem.

task: build a Trustchain block collector and automatic generator

Trustchain is becoming a key performance bottleneck as our network grows to 20k concurrent users; we are hitting the limits of its performance. Overlap with prior issue: #3861

Solution: a dedicated experiment with 1 real, full Tribler instance and an emulated ("mocked") network of thousands or even over a million peers. All communication with the network will be answered by a single generator that produces IPv8-based, Trustchain-compliant return traffic. This generator of an unbounded number of fake Trustchain records is then used for performance analysis. The records have valid binary content, but the data is fake. This lets us test the Tribler Trustchain community and conduct a performance analysis.
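A minimal sketch of such a generator in Python follows. It assumes a simplified, hypothetical half-block layout (the field sizes, `HEADER`, `fake_chain`, `generate` and `verify_chain` are illustrative, not Tribler's actual wire format or API): each fake peer gets a random public key and a chain whose blocks link to their predecessor through a SHA-256 hash, while keys, transactions and signatures are random filler.

```python
import hashlib
import random
import struct
from typing import Iterator, List

# Hypothetical, simplified half-block header (NOT the real Trustchain format):
# public key, sequence number, link public key, link sequence number,
# previous hash, signature, transaction length.
HEADER = struct.Struct(">74sI74sI32s64sH")
GENESIS_HASH = b"\x00" * 32


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Return n pseudo-random bytes from a seeded generator (reproducible runs)."""
    return rng.getrandbits(8 * n).to_bytes(n, "big")


def fake_chain(rng: random.Random, length: int, tx_size: int = 32) -> Iterator[bytes]:
    """Yield `length` serialized fake half-blocks for one fake peer."""
    public_key = random_bytes(rng, 74)
    previous_hash = GENESIS_HASH
    for seq in range(1, length + 1):
        transaction = random_bytes(rng, tx_size)
        signature = random_bytes(rng, 64)  # filler bytes, not a real signature
        header = HEADER.pack(public_key, seq, random_bytes(rng, 74), 0,
                             previous_hash, signature, len(transaction))
        block = header + transaction
        previous_hash = hashlib.sha256(block).digest()  # next block links to this one
        yield block


def generate(num_peers: int, blocks_per_peer: int, seed: int = 42) -> Iterator[bytes]:
    """Stream fake blocks for many peers without keeping them all in memory."""
    rng = random.Random(seed)
    for _ in range(num_peers):
        yield from fake_chain(rng, blocks_per_peer)


def verify_chain(blocks: List[bytes]) -> bool:
    """Check that blocks form one hash-linked chain with sequence numbers 1, 2, 3, ..."""
    previous_hash = GENESIS_HASH
    first_key = None
    for expected_seq, block in enumerate(blocks, start=1):
        key, seq, _, _, prev, _, tx_len = HEADER.unpack_from(block)
        if first_key is None:
            first_key = key
        if (key != first_key or seq != expected_seq or prev != previous_hash
                or len(block) != HEADER.size + tx_len):
            return False
        previous_hash = hashlib.sha256(block).digest()
    return True
```

Because `generate` is a Python generator, it can feed a practically unbounded stream of records into the system under test. With these illustrative field sizes each block is 286 bytes.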

Desired outcome: repeatable performance numbers for a nightly job and performance regression analysis. For instance: after collecting 1 million Trustchain records, a random Trustchain lookup takes 500 ms and writing a newly discovered Trustchain record takes 300 ms.
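One way to obtain such numbers, sketched here under assumptions: fill an SQLite database with N fake records, then time random lookups and single-record writes and report the median latency. The table layout and the `benchmark` function are hypothetical, not Tribler's schema; running it for increasing N in a nightly job would give the kind of regression curve described above.

```python
import os
import random
import sqlite3
import statistics
import tempfile
import time


def benchmark(num_records: int, samples: int = 100, seed: int = 1) -> dict:
    """Median read/write latency (ms) of a hypothetical block table with num_records rows."""
    rng = random.Random(seed)
    block = b"\x00" * 290  # fixed-size placeholder payload
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "bench.db"))
        conn.execute("CREATE TABLE blocks (public_key BLOB, sequence_number INTEGER, "
                     "block BLOB, PRIMARY KEY (public_key, sequence_number))")
        keys = [rng.getrandbits(256).to_bytes(32, "big") for _ in range(num_records)]
        conn.executemany("INSERT INTO blocks VALUES (?, 1, ?)", ((k, block) for k in keys))
        conn.commit()

        read_ms = []
        for _ in range(samples):  # random point lookups on the primary key
            key = rng.choice(keys)
            start = time.perf_counter()
            conn.execute("SELECT block FROM blocks WHERE public_key = ? "
                         "AND sequence_number = 1", (key,)).fetchone()
            read_ms.append((time.perf_counter() - start) * 1000)

        write_ms = []
        for _ in range(samples):  # one committed transaction per newly discovered record
            key = rng.getrandbits(256).to_bytes(32, "big")
            start = time.perf_counter()
            conn.execute("INSERT INTO blocks VALUES (?, 1, ?)", (key, block))
            conn.commit()
            write_ms.append((time.perf_counter() - start) * 1000)

        count = conn.execute("SELECT COUNT(*) FROM blocks").fetchone()[0]
        conn.close()
    return {"records": count,
            "read_ms": statistics.median(read_ms),
            "write_ms": statistics.median(write_ms)}


print(benchmark(10_000))
```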

Background reading: thesis using the same technique plus .PDF file
Source code

@devos50 devos50 added this to the Backlog milestone Jan 12, 2019
@grimadas
Contributor

grimadas commented Feb 5, 2019

Please assign me to this task


@grimadas
Contributor

grimadas commented Feb 8, 2019

Progress so far:

  • I ran the Twisted-based crawler
  • Got the code running locally


Plans for the next week:

  • Single mocked interface with generated Trustchain blocks

@synctext
Member Author

synctext commented Feb 13, 2019

Latest numbers: 1 million peers, 10 Trustchain records per peer and 21 KByte per Trustchain record = 210 GByte. Possibly move this to Gumby+Jenkins; see the existing Trustchain crawler?

@grimadas
Contributor

Update:

The number I mentioned before (21 KB) is the size of a block as a Python object.
The half-block in serialised format takes only 290 bytes.
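The difference between the two sizes changes the earlier storage estimate dramatically. A quick back-of-the-envelope check for the 1 million peers x 10 records scenario, assuming decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes):

```python
PEERS = 1_000_000
RECORDS_PER_PEER = 10


def total_gb(bytes_per_record: int) -> float:
    """Total storage in decimal GB for PEERS * RECORDS_PER_PEER records."""
    return PEERS * RECORDS_PER_PEER * bytes_per_record / 1e9


print(f"as Python objects: {total_gb(21_000):.1f} GB")  # 210.0 GB
print(f"serialised:        {total_gb(290):.1f} GB")     # 2.9 GB
```

So the serialised half-blocks for this scenario take roughly 2.9 GB rather than 210 GB, before any database overhead.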

Database testing

I tested the scalability of the database (SQLite).
The whole database is stored in two files: the WAL and the db file.
All transactions first go through the write-ahead log (WAL), which grows linearly by 60 KB per transaction (see figure).
[Figure: growth of the WAL file per transaction]

After the WAL reaches 9 MB, SQLite runs a checkpoint and flushes its content into the db file.
The db file grows linearly, with each block adding 500 bytes.
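The WAL behaviour can be reproduced in isolation with Python's built-in `sqlite3` module. This sketch (the `wal_growth` helper and the table are hypothetical) switches the database to WAL mode, disables automatic checkpoints so the measurement is deterministic, commits one block per transaction, and then forces a `TRUNCATE` checkpoint, which copies the WAL content into the db file and truncates the WAL to zero bytes. Exact sizes depend on page size and SQLite version, so they need not match the numbers above.

```python
import os
import sqlite3
import tempfile


def wal_growth(num_blocks: int = 200, block_size: int = 290) -> dict:
    """Measure WAL and db file sizes before and after a forced checkpoint."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "trustchain.db")
        conn = sqlite3.connect(path)
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("PRAGMA wal_autocheckpoint=0")  # no automatic checkpoints
        conn.execute("CREATE TABLE blocks (id INTEGER PRIMARY KEY, block BLOB)")
        for _ in range(num_blocks):
            conn.execute("INSERT INTO blocks (block) VALUES (?)", (os.urandom(block_size),))
            conn.commit()  # one transaction per block, each appends frames to the WAL
        wal_before = os.path.getsize(path + "-wal")
        db_before = os.path.getsize(path)
        # Copy WAL pages into the db file, then truncate the WAL to zero bytes.
        busy, _, _ = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
        wal_after = os.path.getsize(path + "-wal")
        db_after = os.path.getsize(path)
        conn.close()
    return {"journal_mode": mode, "checkpoint_busy": busy,
            "wal_before": wal_before, "db_before": db_before,
            "wal_after": wal_after, "db_after": db_after}


print(wal_growth())
```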


@grimadas
Contributor

We can now run our stress and scalability tests on DAS5 and Jenkins.

For example, 9 peers sending 100 blocks per second to one peer (let's say the leader node).
https://jenkins-ci.tribler.org/job/pers/job/validation_experiment_trustchain_bulat/44/
See the attached histogram of blocks that arrived at the leader node.
[Figure: histogram of blocks arrived at the leader node]
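The histogram itself is just a per-second count of arrival timestamps. The sketch below simulates, rather than measures, this setup; it assumes each of the 9 senders sends 100 evenly paced blocks per second and that network latency is exponentially distributed (the `arrival_histogram` helper and all parameters are hypothetical, not taken from the Jenkins job).

```python
import random
from collections import Counter


def arrival_histogram(senders: int = 9, rate: int = 100, duration: int = 10,
                      mean_latency: float = 0.05, seed: int = 7) -> Counter:
    """Count blocks arriving at the leader per 1-second bucket (simulated)."""
    rng = random.Random(seed)
    histogram = Counter()
    for _ in range(senders):
        for i in range(rate * duration):
            sent_at = i / rate                           # evenly paced sends
            latency = rng.expovariate(1 / mean_latency)  # random network delay
            histogram[int(sent_at + latency)] += 1       # 1-second bucket
    return histogram


hist = arrival_histogram()
for second in sorted(hist):
    print(f"{second:3d}s {hist[second]:5d} " + "#" * (hist[second] // 50))
```

Under these assumptions the leader sees roughly 900 blocks per second in steady state.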

@devos50
Contributor

devos50 commented Feb 21, 2019

Cool!

@synctext
Member Author

Overlap with #4471.
Possible new 1st-year thesis direction: heavy benchmarking of Trustchain and comparison with Holochain, Blockbench, and the seminal PeerReview publication. Input for a Usenix submission. We have freeriding detection that scales.

@synctext
Member Author

Storing Trustchain records is rational and incentive-compatible: it prevents being defrauded by malicious actors.

@synctext
Member Author

synctext commented Sep 16, 2019

Update: we need an understanding of performance, "fault attribution", and "fault resolution". Fault is a neutral term that covers both malicious and non-malicious (e.g. Byzantine failure) violations of the protocol, possibly resulting in double spending.
Scalability of fault detection, fault attribution and fault resolution is a worthy thesis chapter. Heavy benchmarking of Trustchain before X-Mas 2019 is required for Usenix.

Fault attribution: presenting evidence that could be used to unambiguously convince any observer of which actor caused the protocol fault.
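As a simplified illustration of attribution for the double-spending case: a Trustchain block is signed by its creator and carries a sequence number, so two different blocks from the same public key with the same sequence number form a fork, and the pair itself is the evidence. The sketch below assumes a hypothetical `HalfBlock` type and leaves out signature verification, which a real implementation would need for the evidence to actually prove authorship.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class HalfBlock:
    public_key: bytes
    sequence_number: int
    transaction: bytes
    # A real half-block is also signed with public_key; omitted in this sketch.


def find_forks(blocks: Iterable[HalfBlock]) -> List[Tuple[HalfBlock, HalfBlock]]:
    """Return pairs of conflicting blocks: same creator and sequence number, different content."""
    first_seen: Dict[Tuple[bytes, int], HalfBlock] = {}
    evidence = []
    for block in blocks:
        key = (block.public_key, block.sequence_number)
        original = first_seen.setdefault(key, block)
        if original != block:  # identical copies from different crawls are not a fault
            evidence.append((original, block))
    return evidence


crawled = [
    HalfBlock(b"alice", 1, b"pay bob 10"),
    HalfBlock(b"bob", 1, b"pay carol 5"),
    HalfBlock(b"alice", 1, b"pay bob 10"),    # duplicate copy, harmless
    HalfBlock(b"alice", 1, b"pay carol 10"),  # same sequence number, different content
]
for original, conflicting in find_forks(crawled):
    print(original.public_key, original.sequence_number, "forked")
```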

@devos50
Contributor

devos50 commented Sep 16, 2019

I already have some experiments on extensive double-spend detection, which show that if everyone crawls the network and requests the chains of others, even a single double spend can be detected within seconds. I included this graph in the (rejected) workshop submission of the market paper. The experiment should still be around on GitHub.
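Why crawling detects a double spend quickly can be illustrated with a toy pull-gossip model (an assumption for illustration, not the original experiment): the two counterparties of a double spend each hold one of the conflicting blocks, every round each peer crawls one random peer and merges what it learns, and detection happens as soon as any peer holds both blocks. Because knowledge of each block spreads roughly exponentially, the number of rounds grows only roughly logarithmically with network size. The `rounds_until_detection` helper is hypothetical.

```python
import random
from typing import Optional


def rounds_until_detection(num_peers: int = 1000, seed: int = 3,
                           max_rounds: int = 100) -> Optional[int]:
    """Rounds of random crawling until some peer has seen both conflicting blocks."""
    rng = random.Random(seed)
    # Bitmask per peer: bit 1 = saw conflicting block A, bit 2 = saw block B.
    knowledge = [0] * num_peers
    knowledge[0] = 1  # counterparty of the first spend
    knowledge[1] = 2  # counterparty of the second spend
    for rnd in range(1, max_rounds + 1):
        snapshot = knowledge[:]  # crawls in this round see last round's state
        for i in range(num_peers):
            j = rng.randrange(num_peers)
            knowledge[i] |= snapshot[j]  # crawl peer j and merge its knowledge
        if any(k == 3 for k in knowledge):
            return rnd
    return None


print(rounds_until_detection(1000), rounds_until_detection(100_000))
```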

@qstokkink
Contributor

Trustchain has been removed and, therefore, this issue is no longer relevant.
