[Windows10 + WSL2 Docker Desktop] booting storaged with 300MB of data takes more than 20min #5836

johnny-smitherson · 2024-03-12T18:12:42Z

Describe the bug (required)

I'm seeing very slow restarts of the storaged containers after loading the cluster with all the data from the Studio examples, about 500k vertices and edges in total.

The data folder of each storaged container is about 100MB.

I am using WSL2 on Windows 10 platform with Docker Desktop to run the 3-container docker-compose setup from here. The health checks take more than 30min to become green for only 300MB of total data.

The storage is hosted on a single NVMe SSD.

I did try BALANCE LEADER - load time stays at 30min.

Throughout the reload, CPU use stays under 20% per container, and I/O use is minimal.

2024-03-12 19:34:09 I20240312 17:34:09.054407     1 NebulaStore.cpp:117] Load space 1 from disk
2024-03-12 19:34:09 I20240312 17:34:09.054708     1 NebulaStore.cpp:117] Load space 2 from disk
.... 30min later
2024-03-12 19:55:48 I20240312 17:55:48.176484     1 NebulaStore.cpp:286] Load space 26 complete
...
2024-03-12 19:57:48 I20240312 17:57:48.083451     1 NebulaStore.cpp:451] [Space: 1, Part: 1] has existed!
...
2024-03-12 19:57:48 I20240312 17:57:48.577541   289 MetaClient.cpp:3269] Load leader ok

Your Environments (required)

OS: Linux hostname 5.15.133.1-microsoft-standard-WSL2 #1 SMP Thu Oct 5 21:02:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Compiler: ??? (from docker nebula v3.6.0)
CPU: 12th Gen Intel(R) Core(TM) i5
Commit id - docker nebula v3.6.0

How To Reproduce(required)

Steps to reproduce the behavior:

Install Windows 10 or 11 + WSL2 + Docker Desktop
Install nebula-docker-compose with 3x (storaged, metad, graphd) containers, and Nebula Studio
Download all the sample spaces from Studio
Run BALANCE LEADER on all the spaces
Shut down all docker containers.
Restart all docker containers.
Measure the waiting time from the first "Load space 1 from disk" message to the last "Load leader ok" message in the logs; for me it's >30min
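The measurement in the last step can be scripted instead of read off by eye. The following is a minimal sketch, assuming the storaged log lines carry the glog-style prefix seen in the excerpt earlier in this report (for example `I20240312 17:34:09.054407`). The helper names `parse_glog_time` and `load_duration` are hypothetical and not part of NebulaGraph.

```python
import re
from datetime import datetime, timedelta

# Matches the glog prefix, e.g. "I20240312 17:34:09.054407"
# (severity letter, then yyyymmdd hh:mm:ss.microseconds).
GLOG_TS = re.compile(r"[IWEF](\d{8} \d{2}:\d{2}:\d{2}\.\d{6})")


def parse_glog_time(line: str) -> datetime:
    """Extract the glog timestamp from a single log line."""
    match = GLOG_TS.search(line)
    if match is None:
        raise ValueError(f"no glog timestamp in: {line!r}")
    return datetime.strptime(match.group(1), "%Y%m%d %H:%M:%S.%f")


def load_duration(lines,
                  start_marker="Load space 1 from disk",
                  end_marker="Load leader ok") -> timedelta:
    """Time from the first start_marker line to the last end_marker line."""
    start = end = None
    for line in lines:
        if start is None and start_marker in line:
            start = parse_glog_time(line)
        if end_marker in line:
            end = parse_glog_time(line)  # keep overwriting: last one wins
    if start is None or end is None:
        raise ValueError("start or end marker not found in log")
    return end - start


# Lines copied from the log excerpt in this report.
sample = [
    "2024-03-12 19:34:09 I20240312 17:34:09.054407     1 NebulaStore.cpp:117] Load space 1 from disk",
    "2024-03-12 19:34:09 I20240312 17:34:09.054708     1 NebulaStore.cpp:117] Load space 2 from disk",
    "2024-03-12 19:57:48 I20240312 17:57:48.577541   289 MetaClient.cpp:3269] Load leader ok",
]
print(load_duration(sample))  # prints 0:23:39.523134
```

To run it against a real log, pass an open file object (or the output of `docker logs` saved to a file) as `lines`.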

Expected behavior

Loading should be faster than 300MB/30min ≈ 170KB/s. Still, is this to be expected?
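The rate quoted here is simply total data size divided by load time. As a quick check, the sketch below reproduces the arithmetic; the function name `throughput_kb_per_s` is a hypothetical helper, and decimal units (1 MB = 1000 kB) are an assumption. With 1 MB taken as 1024 kB the result is about 171 kB/s, so the rounded figure of 170 holds either way.

```python
def throughput_kb_per_s(total_mb: float, minutes: float) -> float:
    """Average load rate in kB/s, using decimal units (1 MB = 1000 kB)."""
    return total_mb * 1000 / (minutes * 60)


# 300 MB loaded in 30 minutes, as reported above.
print(round(throughput_kb_per_s(300, 30)))  # prints 167
```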

Additional context

I will install Linux on the host and redo the experiment.

Other issues:

complains about 3h of loading parts: All storages is offline after restart nebula services #5398 (comment)

Data usage report:

$ du -hd1 lib_docker/nebula/data/
22M     lib_docker/nebula/data/meta0
38M     lib_docker/nebula/data/meta1
13M     lib_docker/nebula/data/meta2
95M     lib_docker/nebula/data/storage0
94M     lib_docker/nebula/data/storage1
100M    lib_docker/nebula/data/storage2
359M    lib_docker/nebula/data/

File count report:

$ find  lib_docker/nebula/data/ -type f | wc -l
649

Downloaded spaces description

CREATE SPACE `demo_sns` (partition_num = 3, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = FIXED_STRING(32)) comment = 'https://www.siwei.io/nebulagraph-sns/'

CREATE SPACE `demo_movie_recommendation` (partition_num = 3, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = FIXED_STRING(32)) comment = 'https://www.siwei.io/recommendation-system-with-graphdb/'

CREATE SPACE `demo_football_2022` (partition_num = 3, replica_factor = 1, charset = utf8, collate = utf8_bin, vid_type = FIXED_STRING(32)) comment = 'https://www.siwei.io/chatgpt-and-nebulagraph-predict-fifa-world-cup/'

Biggest space stats

    Type         Name   Count
0    Tag        genre    1650
1    Tag        movie   20701
2    Tag       person   24831
3    Tag         user     584
4   Edge     acted_by   43399
5   Edge  directed_by   32645
6   Edge      watched   20251
7   Edge   with_genre   63705
8  Space     vertices   47766
9  Space        edges  160000

Questions

Could the initial load be configured to happen in parallel, on 5-10 threads per container, for example?
Can I make some configuration changes (starting from the default docker compose configs) to speed this up?
Could this be a problem that only happens on the Windows10 + WSL2 + Docker Desktop platform?

QingZ11 · 2024-03-13T02:32:17Z

Let me start with my situation: I'm using an M1 macOS machine with 16GB of RAM, and as you know, it's an SSD device. My dataset is quite small, with only a few hundred nodes and over ten thousand edges. As for the startup time, it takes just a few seconds, less than five seconds.

As for why your startup time is so long, you might want to see what others have to say about it.

johnny-smitherson · 2024-03-13T13:11:00Z

I can confirm what is said above. On Debian Linux, with an ext4 filesystem on a single NVMe disk and exactly the same data volumes, partition loading takes <1s for all sample data spaces.

Next I will try to run a Hyper-V VM to see whether that works OK or the same issue exists.

QingZ11 · 2024-03-14T05:45:56Z

Thank you, Johnny, for the testing. Looking forward to your results. Appreciate your sharing this!

johnny-smitherson · 2024-03-18T14:00:49Z

Tried disabling Windows Defender antivirus: no fix
Tried a fresh install of Windows 11 + WSL2 + Docker Desktop: no fix
Could not get Hyper-V Manager VMs running
A Linux VM under VirtualBox works GREAT: I see the correct startup time (under 1s) and correct query performance (on WSL2, MATCH (n) limit 1000 takes 1s; on VirtualBox it's under 50ms)

Lacking further understanding, my advice when on Windows is to just use VirtualBox and set up VM port forwarding to reach graphd1,2,3

@QingZ11 maybe some project devs have access to Windows10/11+WSL2+Docker and can run profiling/tracing on the storaged booting up and loading parts for the larger demo samples?

wey-gu · 2024-03-22T04:56:25Z

Sorry @johnny-smitherson for the bad IO perf on WSL2 Docker Desktop.

This seems to be a long-open issue for the stack 😢.

docker/for-win#12401
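To check whether a particular setup shows the slow file I/O described above without involving NebulaGraph at all, a rough micro-benchmark can be run on each platform and the numbers compared. The sketch below is only an illustration: it assumes the slowdown shows up as latency on many small synced writes, which this thread does not confirm, and the function name `time_small_synced_writes` is hypothetical.

```python
import os
import tempfile
import time


def time_small_synced_writes(directory: str, count: int = 200, size: int = 4096) -> float:
    """Return the seconds taken to write and fsync `count` files of `size` bytes."""
    payload = os.urandom(size)
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"probe_{i}.bin")
        with open(path, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data to the underlying storage
        os.remove(path)  # clean up so the directory is left empty
    return time.perf_counter() - start


if __name__ == "__main__":
    # Point this at the directory backing the storaged data volume to test it;
    # a temporary directory is used here only so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        elapsed = time_small_synced_writes(tmp)
        print(f"200 synced 4 KiB writes took {elapsed:.3f}s")
```

Running the same script on native Linux, in a VirtualBox VM, and inside WSL2 gives a like-for-like comparison; a large gap would be consistent with the linked Docker issue, though it would not prove that it is the cause of the slow storaged startup.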

johnny-smitherson · 2024-03-22T14:45:57Z

ah i see, thanks for the link!

might be worth putting this situation into the FAQ under "resources" https://docs.nebula-graph.io/3.0.2/4.deployment-and-installation/1.resource-preparations/ - while it's obvious that only Linux is supported, people will still try to use WSL2 as a replacement for Linux

I had no other functional errors using the platform, just the 20x-100x performance degradation - so you could mention WSL2 as an unsupported testing platform (usable as long as you don't have any significant amount of data)

QingZ11 · 2024-03-25T07:02:14Z

Thank you for the suggestion, Johnny. I will raise it with the documentation team to see whether we need to add a FAQ section. Once again, thank you for your testing work; it's very beneficial to us.

johnny-smitherson added the type/bug Type: something is unexpected label Mar 12, 2024

github-actions bot added affects/none PR/issue: this bug affects none version. severity/none Severity of bug labels Mar 12, 2024

QingZ11 added type/question Type: question about the product and removed type/bug Type: something is unexpected severity/none Severity of bug affects/none PR/issue: this bug affects none version. labels Mar 13, 2024

johnny-smitherson changed the title ~~booting storaged with 300MB of data takes more than 20min~~ [Windows10 + WSL2 Docker Desktop] booting storaged with 300MB of data takes more than 20min Mar 13, 2024

abby-cyber mentioned this issue Mar 26, 2024

feedback on docker desktop vesoft-inc/nebula-docs#2518

Merged
