Deduplicate files at high speed! Written in RUST.
- Well, Rust.
- Input lines are streamed directly to the processing threads instead of being collected up front.
- Partitions the hash space to reduce lock contention.
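rustdedup's actual implementation isn't shown here, but the two ideas above can be sketched roughly like this. The shard count, thread count, and the `dedup` helper are illustrative assumptions, not the real API: lines are streamed to workers over a channel one at a time, and the seen-set is split into independently locked shards keyed by hash, so threads working on different shards never contend on the same mutex.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

const SHARDS: usize = 8; // illustrative partition count

fn shard_of(line: &str) -> usize {
    let mut h = DefaultHasher::new();
    line.hash(&mut h);
    (h.finish() as usize) % SHARDS
}

fn dedup(lines: impl Iterator<Item = String>) -> Vec<String> {
    // One lock per hash-space partition: threads touching different
    // shards never block each other.
    let shards: Arc<Vec<Mutex<HashSet<String>>>> =
        Arc::new((0..SHARDS).map(|_| Mutex::new(HashSet::new())).collect());

    let (tx, rx) = mpsc::channel::<String>();
    let rx = Arc::new(Mutex::new(rx));
    let out = Arc::new(Mutex::new(Vec::new()));

    let mut handles = Vec::new();
    for _ in 0..4 {
        let (rx, shards, out) = (Arc::clone(&rx), Arc::clone(&shards), Arc::clone(&out));
        handles.push(thread::spawn(move || loop {
            // Pull the next line as soon as it arrives -- no up-front collect.
            let line = match rx.lock().unwrap().recv() {
                Ok(l) => l,
                Err(_) => break, // channel closed, producer is done
            };
            let mut set = shards[shard_of(&line)].lock().unwrap();
            if set.insert(line.clone()) {
                out.lock().unwrap().push(line); // first sighting: keep it
            }
        }));
    }

    // Producer side: stream lines into the channel one at a time.
    for line in lines {
        tx.send(line).unwrap();
    }
    drop(tx); // close the channel so workers exit their loops

    for h in handles {
        h.join().unwrap();
    }
    Arc::try_unwrap(out).unwrap().into_inner().unwrap()
}

fn main() {
    let input = vec!["a", "b", "a", "c", "b"].into_iter().map(String::from);
    let mut unique = dedup(input);
    unique.sort(); // worker interleaving makes output order nondeterministic
    assert_eq!(unique, vec!["a", "b", "c"]);
    println!("{:?}", unique);
}
```

Note the output order is nondeterministic because any worker may win the race for a given shard; a real tool that must preserve input order would need extra bookkeeping.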
In the test below we use a small 75 MB file (otherwise the hyperfine runs take too long) with 1,595,966 lines of data.
When we up the ante a little and move to a large 2.3 GB file, we see further improvements.
Compared with the likes of duplicut (https://github.com/nil0x42/duplicut), significant improvements can be seen; however, I'm not sure whether this boils down to the use of Rust over C.
Usage:

```sh
# stream from stdin
cat file.txt | rustdedup

# or read input files and write the deduplicated result to a file
rustdedup -i /diska9.txt extra.csv modded.csv -o output2.txt
```