This repository has been archived by the owner on Dec 3, 2024. It is now read-only.
Duplicates

General Ledger. Accounting use-case

{% embed url="https://owl-analytics.com/general-ledger" %}

Whether you're looking for a fuzzy-match percentage or a single-client cleanup, Owl's duplicate detection can help you sort and rank the likelihood of duplicate data.

-f file:///home/ec2-user/single_customer.csv \
-d "," \
-ds customers \
-rd 2018-01-08 \
-dupe \
-dupenocase \
-depth 4

User Table has duplicate user entry

Carrisa Rimmer vs Carrissa Rimer

ATM customer data with only an 88% match

As you can see below, a match of less than 90% is a false positive in most cases. Each dataset is a bit different, but in many cases you should tune your duplicate detection to roughly a 90% or higher match to surface interesting findings.
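To make the idea of a match percentage and a tuning threshold concrete, here is a minimal Python sketch. It is not Owl's scoring algorithm: it assumes a generic string similarity from the standard library's `difflib.SequenceMatcher`, and the helper names `match_percent` and `likely_duplicates` are hypothetical. The `ignore_case` option illustrates a case-insensitive comparison, which is presumably what the `-dupenocase` flag requests.

```python
from difflib import SequenceMatcher
from itertools import combinations


def match_percent(a: str, b: str, ignore_case: bool = True) -> float:
    """Return a 0-100 similarity score (difflib ratio, NOT Owl's own scoring)."""
    if ignore_case:
        a, b = a.lower(), b.lower()
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def likely_duplicates(values, threshold=90.0):
    """Return (score, a, b) for every pair at or above threshold, best match first."""
    scored = [(match_percent(a, b), a, b) for a, b in combinations(values, 2)]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


# Illustrative data only; the first two names come from the example above.
names = ["Carrisa Rimmer", "Carrissa Rimer", "John Smith"]
for score, a, b in likely_duplicates(names):
    print(f"{score:.1f}%  {a}  vs  {b}")
```

Lowering `threshold` will surface more candidate pairs but, as noted above, tends to add false positives, so it is usually worth starting near 90 and adjusting per dataset.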

Simple DataFrame Example