Memory Dump for very large RDB files (> 30 GBs) is Slow #23

Open
jsrawan-mobo opened this issue Mar 19, 2013 · 4 comments

@jsrawan-mobo

For very large RDB files, the memory dump can take upwards of 30 minutes. The "key" feature is even slower, since it requires a sequential scan over the whole file.

Finally, I wanted to further introspect a data structure like a hash, list, or set to find out which field is taking up the most memory. In my case I use Celery as a worker queue, and some tasks can be gigantic.

So I've made some enhancements, such as the following:
i) Reduce the dump time to about 5 minutes in quick mode
ii) Allow re-seeking to a key's contents in seconds, plus a limit mode (a rough sketch of the re-seek idea follows below)
iii) Allow verbose dumping of a hash/list/set to a file
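A rough sketch of the re-seek idea in (ii), assuming the offset and the compressed/uncompressed sizes were recorded during an earlier quick pass; the helper below is hypothetical and not code from the pull request:

    import os
    import lzf   # python-lzf; assumed to be installed

    def reseek_value(rdb_path, pos, compressed_len, uncompressed_len):
        # Jump straight to a previously recorded offset and decompress only
        # that single value, instead of re-scanning the whole RDB file.
        with open(rdb_path, 'rb') as f:
            f.seek(pos, os.SEEK_SET)
            payload = f.read(compressed_len)
        return lzf.decompress(payload, uncompressed_len)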

@sripathikrishnan
Owner

@jsrawan-mobo Thanks for taking the time to investigate this!

I am painfully aware of the sub-optimal performance. I have been tracking it under issue #1, but haven't really found the motivation to fix it yet.

It seems you have made some fixes/enhancements. Did I miss a pull request? Can you point me to where you have made these fixes?

@jsrawan-mobo
Author

See Pull Request #24.

It's not completely done, but you can try it and see the performance improvement from skipping past lzf_decompress() and storing the index so a deep dump can be done later.

If you like where it's headed, I can clean it up and do a proper pull request.
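To make the approach concrete, here is a minimal sketch of the skip-decompression idea, not the actual code in the pull request; read_length is a simplified stand-in for the real RDB length encoding, and python-lzf is assumed for the non-quick path:

    import os
    import struct

    def read_length(f):
        # Simplified assumption: real RDB files use a variable-length
        # encoding; here lengths are treated as 4-byte little-endian ints.
        return struct.unpack('<I', f.read(4))[0]

    def skim_compressed_string(f, quick_mode=True):
        # Returns (offset, compressed_size, value) for one LZF-compressed
        # string. In quick mode the payload is skipped rather than
        # decompressed, which is where the time savings come from.
        compressed_len = read_length(f)
        uncompressed_len = read_length(f)
        offset = f.tell()
        if quick_mode:
            f.seek(compressed_len, os.SEEK_CUR)   # skip the payload entirely
            return offset, compressed_len, None
        import lzf                                # python-lzf
        payload = f.read(compressed_len)
        return offset, compressed_len, lzf.decompress(payload, uncompressed_len)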

@amarlot commented Jul 19, 2016

Have you been able to improve it? Would it be possible to release it?
For a huge DB (about 50 GB / 1 million keys) on a very fast server it takes about half a day, since it's single-threaded.

Thanks,
Alex

@jsrawan-mobo
Author

I hadn't looked at this in a few years; it seems like this project went stale. The pull request I put up does work in quick mode; you can use it like this if you want to give it a try:

  1. Generate a quick memory dump and index. In quick mode, only compressed_size is valid.
    rdb.py -c memory -q --file redis_memory_quick.csv redis.rdb

  2. After viewing the results, dump a hash/list to view the contents of an offending key
    rdb.py -c memory --max 1 --pos 3568796958 -v --key mongow --file redis_memory_mongow.csv redis.rdb

I'd be willing to fix this up if someone finds a use for it, or feel free to fork the repo.
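As a hypothetical follow-up to step 1, a few lines of Python could pull the heaviest entry out of the quick-mode CSV and print the matching step-2 command. The column names (key, pos, compressed_size) are assumptions about what the patched rdb.py writes, not a documented format:

    import csv

    def biggest_key(csv_path):
        # Assumed columns: key, pos, compressed_size (from the quick dump).
        with open(csv_path, newline='') as fh:
            rows = list(csv.DictReader(fh))
        worst = max(rows, key=lambda r: int(r['compressed_size']))
        return worst['key'], worst['pos']

    if __name__ == '__main__':
        key, pos = biggest_key('redis_memory_quick.csv')
        print('rdb.py -c memory --max 1 --pos %s -v --key %s '
              '--file redis_memory_%s.csv redis.rdb' % (pos, key, key))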
