InitRoot/rustdedup


rustdedup


Deduplicate files fast! Written in Rust.

Memory optimizations made:

  • Well, Rust.
  • Input lines are streamed directly to the processing threads instead of being collected first.
  • The hash space is partitioned to reduce lock contention.
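
The last two points can be illustrated with a minimal sketch. This is not the actual rustdedup source: the names `ShardedSet` and `dedup_lines`, the channel capacity, and the round-robin distribution are all assumptions made for the example. Lines are handed to worker threads as they arrive, and each line's hash selects one of several independently locked sets, so threads only contend when they hit the same shard.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sharded set: every shard has its own lock.
struct ShardedSet {
    shards: Vec<Mutex<HashSet<String>>>,
}

impl ShardedSet {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        let shards = (0..shard_count)
            .map(|_| Mutex::new(HashSet::new()))
            .collect();
        ShardedSet { shards }
    }

    /// Returns true if `line` had not been seen before.
    fn insert(&self, line: &str) -> bool {
        let mut hasher = DefaultHasher::new();
        line.hash(&mut hasher);
        // The hash picks the shard, so only that shard's lock is taken.
        let idx = (hasher.finish() as usize) % self.shards.len();
        self.shards[idx].lock().unwrap().insert(line.to_owned())
    }
}

/// Streams `lines` to `workers` threads and returns the unique lines.
/// Output order is not preserved in this sketch.
fn dedup_lines<I>(lines: I, workers: usize, shard_count: usize) -> Vec<String>
where
    I: Iterator<Item = String>,
{
    assert!(workers > 0, "need at least one worker");
    let set = Arc::new(ShardedSet::new(shard_count));
    let (out_tx, out_rx) = mpsc::channel::<String>();
    let mut senders = Vec::new();
    let mut handles = Vec::new();

    for _ in 0..workers {
        // Bounded channel: the reader cannot run far ahead of the workers.
        let (tx, rx) = mpsc::sync_channel::<String>(1024);
        let set = Arc::clone(&set);
        let out_tx = out_tx.clone();
        senders.push(tx);
        handles.push(thread::spawn(move || {
            for line in rx {
                if set.insert(&line) {
                    out_tx.send(line).unwrap();
                }
            }
        }));
    }
    drop(out_tx); // only the workers hold output senders now

    // Hand each line to a worker as soon as it is read (round-robin).
    for (i, line) in lines.enumerate() {
        senders[i % workers].send(line).unwrap();
    }
    drop(senders); // closing the channels lets the workers finish

    let unique: Vec<String> = out_rx.into_iter().collect();
    for handle in handles {
        handle.join().unwrap();
    }
    unique
}
```

To deduplicate standard input with this sketch, one could pass `std::io::stdin().lines().map_while(Result::ok)` as the `lines` argument. Storing a fixed-size hash per line instead of the full string would cut memory further, at the cost of possible false positives on hash collisions.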

Some stats

The test below uses a small 75 MB file (anything larger takes too long under hyperfine) with 1,595,966 lines of data. [image]

When we up the ante a little and move to a large 2.3 GB file, we see some improvements. [image]

Compared with the likes of duplicut (https://github.com/nil0x42/duplicut), some significant improvements can be seen; however, I'm not sure whether this comes down to using Rust rather than C. [image]

Usage

cat file.txt | rustdedup

rustdedup -i /diska9.txtextra.csvmodded.csv -o output2.txt
