[Feature] Fuzzy deduplication on Spark #152

Open
1 of 2 tasks
pzerfos opened this issue May 20, 2024 · 1 comment
Assignees
Labels
enhancement New feature or request

Comments

@pzerfos

pzerfos commented May 20, 2024

Search before asking

  • I searched the issues and found no similar issues.

Component

Transforms/universal/fdedup

Feature

A Spark version of fuzzy deduplication that works on both code and natural-language data. It should also provide:

  • Incremental logging and progress indicators
  • Checkpointing
  • Resource utilization estimation for network/compute/memory
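For readers unfamiliar with the technique, here is a minimal single-machine sketch of fuzzy deduplication. The issue does not say which algorithm the Spark version would use; MinHash signatures with LSH banding are assumed here because they are a common choice. All names and parameter values (`NUM_HASHES`, `BANDS`, `SHINGLE`, `fuzzy_duplicates`, the 0.7 threshold) are hypothetical and are not taken from the fdedup transform. In a Spark version, signature computation and band bucketing would presumably become distributed map and group-by stages.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

# Illustrative parameters only; none of these values come from the issue.
NUM_HASHES = 64   # signature length
BANDS = 16        # LSH bands; rows per band = NUM_HASHES // BANDS
SHINGLE = 5       # word-level shingle size


def shingles(text, k=SHINGLE):
    """Return the set of k-word shingles of a document."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash(text):
    """MinHash signature: for each seed, the minimum hash over all shingles."""
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(NUM_HASHES)]


def candidate_pairs(signatures):
    """Pair up documents whose signatures agree on at least one full band."""
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for b in range(BANDS):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def fuzzy_duplicates(docs, threshold=0.7):
    """Return pairs of document ids with estimated similarity >= threshold."""
    sigs = {doc_id: minhash(text) for doc_id, text in docs.items()}
    return {
        (a, b) for a, b in candidate_pairs(sigs)
        if estimated_jaccard(sigs[a], sigs[b]) >= threshold
    }
```

Calling `fuzzy_duplicates({"a": text_a, "b": text_b})` returns the set of id pairs judged to be near-duplicates. Banding keeps the comparison step from being quadratic in the number of documents, which is the property that makes the approach attractive at Spark scale.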

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
pzerfos added the enhancement label on May 20, 2024
@cmadam
Collaborator

cmadam commented Jul 11, 2024

Started implementation work in the fuzzy-dedup-spark branch.

Kibnelson mentioned this issue on Oct 11, 2024

2 participants