[Windows10 + WSL2 Docker Desktop] booting storaged with 300MB of data takes more than 20min #5836

Open
johnny-smitherson opened this issue Mar 12, 2024 · 7 comments
Labels
type/question Type: question about the product

Comments

@johnny-smitherson

johnny-smitherson commented Mar 12, 2024

Describe the bug (required)

I'm seeing very slow restarts of the storaged containers after loading the cluster with all the data from the Studio examples, 500k vertices and edges in total.

The data folder of each storaged container is about 100MB.

I am using WSL2 on Windows 10 with Docker Desktop to run the 3-container docker-compose setup from here. The health checks take more than 30min to turn green for only 300MB of total data.

The storage is hosted on a single NVMe SSD.

I did try BALANCE LEADER; the load time stays at 30min.

Throughout the reload, CPU usage stays under 20% per container, and I/O usage is minimal.

2024-03-12 19:34:09 I20240312 17:34:09.054407     1 NebulaStore.cpp:117] Load space 1 from disk
2024-03-12 19:34:09 I20240312 17:34:09.054708     1 NebulaStore.cpp:117] Load space 2 from disk
.... 30min later
2024-03-12 19:55:48 I20240312 17:55:48.176484     1 NebulaStore.cpp:286] Load space 26 complete
...
2024-03-12 19:57:48 I20240312 17:57:48.083451     1 NebulaStore.cpp:451] [Space: 1, Part: 1] has existed!
...
2024-03-12 19:57:48 I20240312 17:57:48.577541   289 MetaClient.cpp:3269] Load leader ok

Your Environments (required)

  • OS: Linux hostname 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu Oct 5 21:02:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Compiler: ??? (from docker nebula v3.6.0)
  • CPU: 12th Gen Intel(R) Core(TM) i5
  • Commit id - docker nebula v3.6.0

How To Reproduce (required)

Steps to reproduce the behavior:

  1. Install Windows 10 or 11 + WSL2 + Docker Desktop
  2. Install nebula-docker-compose with 3x (storaged, metad, graphd) containers, and Nebula Studio
  3. Download all the sample spaces from Studio
  4. Run BALANCE LEADER on all the spaces
  5. Shut down all docker containers.
  6. Restart all docker containers.
  7. Measure the waiting time from the first "Load space 1 from disk" message to the last "Load leader ok" message in the logs; for me it is >30min
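Step 7 can be scripted instead of eyeballed. The sketch below is a hypothetical helper, not part of NebulaGraph. It assumes the storaged log lines keep the glog "I<yyyymmdd> <hh:mm:ss.uuuuuu>" header seen in the excerpt above, and measures from the first "Load space 1 from disk" line to the last "Load leader ok" line.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable

# Matches the glog header of an INFO line, e.g. "I20240312 17:34:09.054407".
GLOG_TS = re.compile(r"I(\d{8}) (\d{2}:\d{2}:\d{2}\.\d{6})")


def glog_time(line: str) -> datetime:
    """Extract the glog timestamp from one storaged log line."""
    m = GLOG_TS.search(line)
    if m is None:
        raise ValueError(f"no glog timestamp in: {line!r}")
    return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y%m%d %H:%M:%S.%f")


def boot_duration(lines: Iterable[str],
                  start_marker: str = "Load space 1 from disk",
                  end_marker: str = "Load leader ok") -> timedelta:
    """Time from the first start_marker line to the last end_marker line."""
    start = end = None
    for line in lines:
        if start is None and start_marker in line:
            start = glog_time(line)
        if end_marker in line:
            end = glog_time(line)  # keep overwriting, so the last match wins
    if start is None or end is None:
        raise ValueError("start or end marker not found in the log")
    return end - start
```

Fed the three relevant lines from the excerpt above, this gives about 23 min 40 s. Any iterable of lines works, e.g. an open file holding the copied container log.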

Expected behavior

Loading should be faster than 300MB / 30min ≈ 170kB/s. Still, is this to be expected?
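The 170kB/s figure is simply data size divided by boot time. A minimal sketch of that arithmetic, assuming decimal units (1 MB = 1000 kB):

```python
def load_rate_kb_s(size_mb: float, seconds: float) -> float:
    """Effective load rate in kB/s, using decimal units (1 MB = 1000 kB)."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return size_mb * 1000 / seconds


# Figures quoted in the issue: ~300 MB of data, ~30 min until healthy.
wsl2_rate = load_rate_kb_s(300, 30 * 60)
print(f"{wsl2_rate:.0f} kB/s")  # prints "167 kB/s", i.e. roughly 170 kB/s
```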

Additional context

I will install Linux on the host and redo the experiment.

Other issues:

Data usage report:

$ du -hd1 lib_docker/nebula/data/
22M     lib_docker/nebula/data/meta0
38M     lib_docker/nebula/data/meta1
13M     lib_docker/nebula/data/meta2
95M     lib_docker/nebula/data/storage0
94M     lib_docker/nebula/data/storage1
100M    lib_docker/nebula/data/storage2
359M    lib_docker/nebula/data/

File count report:

$ find  lib_docker/nebula/data/ -type f | wc -l
649
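Where du and find are not at hand (for example on the Windows side), a rough Python equivalent of the two reports above could look like this sketch. It is an illustration, not a project tool. It sums apparent file sizes (st_size), so totals can differ slightly from du, which counts allocated disk blocks, and it skips symlinks to mirror "find -type f".

```python
import os
from pathlib import Path
from typing import Dict, Tuple


def data_dir_report(root: str) -> Tuple[Dict[str, int], int]:
    """Return ({top-level entry: bytes}, regular-file count) under root.

    Sizes are apparent sizes (st_size), not allocated blocks as in `du`.
    Files sitting directly inside root are grouped under ".".
    """
    base = Path(root)
    sizes: Dict[str, int] = {}
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(base):
        rel = Path(dirpath).relative_to(base)
        top = rel.parts[0] if rel.parts else "."
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                continue  # mirror `find -type f`: regular files only
            sizes[top] = sizes.get(top, 0) + path.stat().st_size
            file_count += 1
    return sizes, file_count
```

Calling data_dir_report("lib_docker/nebula/data") should give per-directory byte counts comparable to the du output; divide by 2**20 to match du's "M" (MiB) units.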

Downloaded spaces description

CREATE SPACE `demo_sns` (partition_num = 3, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = FIXED_STRING(32)) comment = 'https://www.siwei.io/nebulagraph-sns/'

CREATE SPACE `demo_movie_recommendation` (partition_num = 3, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = FIXED_STRING(32)) comment = 'https://www.siwei.io/recommendation-system-with-graphdb/'

CREATE SPACE `demo_football_2022` (partition_num = 3, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = FIXED_STRING(32)) comment = 'https://www.siwei.io/chatgpt-and-nebulagraph-predict-fifa-world-cup/'

Biggest space stats

    Type         Name   Count
0    Tag        genre    1650
1    Tag        movie   20701
2    Tag       person   24831
3    Tag         user     584
4   Edge     acted_by   43399
5   Edge  directed_by   32645
6   Edge      watched   20251
7   Edge   with_genre   63705
8  Space     vertices   47766
9  Space        edges  160000
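A quick consistency check on the table above: the per-tag counts add up exactly to the space's vertex total, and the per-edge-type counts add up to its edge total. The sketch below (a hypothetical helper that parses the printed table as plain text) does that sum:

```python
from typing import Dict, Tuple

STATS = """\
    Type         Name   Count
0    Tag        genre    1650
1    Tag        movie   20701
2    Tag       person   24831
3    Tag         user     584
4   Edge     acted_by   43399
5   Edge  directed_by   32645
6   Edge      watched   20251
7   Edge   with_genre   63705
8  Space     vertices   47766
9  Space        edges  160000
"""


def tally(table: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Sum Tag/Edge rows per type and collect the Space totals separately."""
    per_type: Dict[str, int] = {}
    space: Dict[str, int] = {}
    for line in table.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        _index, kind, name, count = line.split()
        if kind == "Space":
            space[name] = int(count)
        else:
            per_type[kind] = per_type.get(kind, 0) + int(count)
    return per_type, space


per_type, space = tally(STATS)
print(per_type)  # {'Tag': 47766, 'Edge': 160000}
print(space)     # {'vertices': 47766, 'edges': 160000}
```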

Questions

  • Could the initial load be configured to happen in parallel, on 5-10 threads per container, for example?
  • Can I make some configuration changes (starting from the default docker compose configs) to speed this up?
  • Could this problem be specific to the Windows 10 + WSL2 + Docker Desktop platform?
@johnny-smitherson johnny-smitherson added the type/bug Type: something is unexpected label Mar 12, 2024
@github-actions github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Mar 12, 2024
@QingZ11 QingZ11 added type/question Type: question about the product and removed type/bug Type: something is unexpected severity/none Severity of bug affects/none PR/issue: this bug affects none version. labels Mar 13, 2024
@QingZ11
Contributor

QingZ11 commented Mar 13, 2024

Let me start with my setup: an M1 Mac with 16GB of RAM and, as you know, an SSD. My dataset is quite small, only a few hundred nodes and over ten thousand edges, and startup takes just a few seconds (less than five).

As for why your startup time is so long, let's see what others have to say about it.

@johnny-smitherson johnny-smitherson changed the title booting storaged with 300MB of data takes more than 20min [Windows10 + WSL2 Docker Desktop] booting storaged with 300MB of data takes more than 20min Mar 13, 2024
@johnny-smitherson
Author

I can confirm what was said above. On Debian Linux (ext4 filesystem, single NVMe disk), with exactly the same data volumes, partition loading takes <1s for all sample data spaces.

Next I will try running a Hyper-V VM to see whether that works or the same issue exists.

@QingZ11
Contributor

QingZ11 commented Mar 14, 2024

Thank you, Johnny, for the testing. Looking forward to your results. Appreciate your sharing this!

@johnny-smitherson
Author

johnny-smitherson commented Mar 18, 2024

  • Tried disabling Windows Defender antivirus: no fix
  • Tried a fresh install of Windows 11 + WSL 2 + Docker Desktop: no fix
  • Could not get Hyper-V Manager VMs running
  • A Linux VM under VirtualBox works GREAT: I see the expected startup time (under 1s) and the expected query performance (MATCH (n) limit 1000 takes 1s on WSL2 but under 50ms on VirtualBox)
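To make latency comparisons like the 1s vs 50ms above repeatable, one option is to run the same query several times and take the median. The sketch below is a generic, hypothetical timing helper; run_query stands for whatever client call sends the query.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(run_query: Callable[[], object],
                      runs: int = 10, warmup: int = 1) -> float:
    """Median wall-clock time of run_query() in milliseconds.

    Warm-up calls are executed but not measured, so one-off costs
    (connection setup, cold caches) do not skew the result.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        run_query()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With the nebula3-python client, for instance, run_query could be lambda: session.execute("MATCH (n) RETURN n LIMIT 1000") inside a ConnectionPool.session_context(...) block; that client and the exact query text are assumptions, not what was used in this thread.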

Without a deeper understanding of the cause, my advice on Windows is to just use VirtualBox and set up VM port forwarding to reach graphd1, graphd2 and graphd3.
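For the VirtualBox workaround, port forwarding on the VM's NAT adapter can be scripted with "VBoxManage controlvm <vm> natpf1 <rule>", where a rule has the form name,protocol,host-ip,host-port,guest-ip,guest-port. The sketch below only builds the commands. The VM name and port numbers are hypothetical (9669 is graphd's default port, but check which ports docker-compose actually publishes inside the VM).

```python
from typing import Dict, List, Tuple


def natpf_rule(name: str, host_port: int, guest_port: int,
               host_ip: str = "127.0.0.1", protocol: str = "tcp") -> str:
    """VirtualBox NAT rule: name,protocol,host-ip,host-port,guest-ip,guest-port.

    The guest IP is left empty so VirtualBox uses the VM's NAT address;
    host_ip defaults to loopback so the ports are only reachable locally.
    """
    return f"{name},{protocol},{host_ip},{host_port},,{guest_port}"


def natpf_commands(vm: str, forwards: Dict[str, Tuple[int, int]]) -> List[List[str]]:
    """One `VBoxManage controlvm <vm> natpf1 <rule>` argv per forwarded port.

    `controlvm` applies to a running VM; for a powered-off VM the
    equivalent is `VBoxManage modifyvm <vm> --natpf1 <rule>`.
    """
    return [
        ["VBoxManage", "controlvm", vm, "natpf1", natpf_rule(name, host, guest)]
        for name, (host, guest) in forwards.items()
    ]


# Hypothetical VM name and mapping: rule name -> (host port, guest port).
commands = natpf_commands("nebula-vm", {
    "graphd1": (9669, 9669),
    "graphd2": (9670, 9670),
    "graphd3": (9671, 9671),
})
for argv in commands:
    print(" ".join(argv))
```

Once the mapping matches your setup, each argv list can be passed to subprocess.run(argv, check=True).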

@QingZ11, maybe some project devs have access to Windows 10/11 + WSL2 + Docker and could profile/trace storaged while it boots and loads the larger demo samples?

@wey-gu
Contributor

wey-gu commented Mar 22, 2024

Sorry, @johnny-smitherson, for the bad I/O performance on WSL2 Docker Desktop.

This seems to be a long-open issue for the stack 😢.

docker/for-win#12401

@johnny-smitherson
Author

johnny-smitherson commented Mar 22, 2024

Ah, I see. Thanks for the link!

It might be worth putting this situation into the FAQ under "resources" (https://docs.nebula-graph.io/3.0.2/4.deployment-and-installation/1.resource-preparations/). While it's obvious that only Linux is supported, people will still try to use WSL2 as a replacement for Linux.

I had no other functional errors using the platform, just the 20x-100x performance degradation, so you could mention WSL2 as an unsupported testing platform (usable as long as you don't have a significant amount of data).

@QingZ11
Contributor

QingZ11 commented Mar 25, 2024

Thank you for the suggestion, Johnny. I will raise it with the documentation team to see whether an FAQ entry should be added. Once again, thank you for your testing work; it's very beneficial to us.
