Knowledge Fusion, Data Fusion, and Truth Discovery

📝 Surveys

A Survey on Truth Discovery [Paper] 🌟
Truth Discovery Algorithms: An Experimental Evaluation [Paper]
A survey on data fusion: what for? in what form? what is next? (Journal of Intelligent Information Systems, 2020) [Paper]

📝 General Papers

Data Fusion

I think this is a relatively old topic; since 2018, people have been moving to knowledge fusion. There are still many interesting subtopics, e.g., single-truth vs. multi-truth, copy detection, and source reliability. I will classify the following papers later. That said, I think data fusion/knowledge fusion will play an essential role in processing the pre-training datasets of LLMs/LMs.
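To make the "single truth" and "source reliability" ideas concrete, here is a minimal sketch of iterative weighted voting. It is not the algorithm of any specific paper listed here. The function name `discover_truth`, the 0.5 starting reliability, and the add-one smoothing are all assumptions chosen for illustration.

```python
from collections import defaultdict


def discover_truth(claims, iterations=10):
    """Toy single-truth discovery over (source, item, value) triples.

    Alternates between (1) picking, per item, the value with the highest
    reliability-weighted vote and (2) re-estimating each source's
    reliability as the smoothed fraction of its claims that match the
    currently selected values. Illustrative sketch only.
    """
    sources = {s for s, _, _ in claims}
    reliability = {s: 0.5 for s in sources}  # assumed uninformed start
    truths = {}
    for _ in range(iterations):
        # Step 1: reliability-weighted vote for each item.
        scores = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            scores[item][value] += reliability[source]
        truths = {item: max(votes, key=votes.get) for item, votes in scores.items()}

        # Step 2: update reliability with add-one smoothing (assumption).
        hits = defaultdict(int)
        totals = defaultdict(int)
        for source, item, value in claims:
            totals[source] += 1
            hits[source] += int(truths[item] == value)
        reliability = {s: (hits[s] + 1) / (totals[s] + 2) for s in sources}
    return truths, reliability


# Hypothetical toy data: three sources claiming capitals for three countries.
claims = [
    ("A", "fr", "Paris"), ("A", "au", "Canberra"), ("A", "ca", "Ottawa"),
    ("B", "fr", "Paris"), ("B", "au", "Canberra"), ("B", "ca", "Ottawa"),
    ("C", "fr", "Lyon"), ("C", "au", "Sydney"), ("C", "ca", "Ottawa"),
]
truths, reliability = discover_truth(claims)
```

On this toy data, sources A and B end up with a higher reliability score than C, which disagrees with them on two of three items. Copy detection and multi-truth settings need more than this sketch covers; see the papers below.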

Truth Discovery with Multiple Conflicting Information Providers on the Web (TKDE 2008), one of the classic papers. 🌟
Integrating conflicting data: the role of source dependence (VLDB 2009), one of the classic papers. 🌟
Fusing data with correlations (SIGMOD 2014) 🌟
Truth discovery and copying detection in a dynamic world (VLDB 2009) 🌟
Global detection of complex copying relationships between sources (VLDB 2010) [Paper] 🌟
Online data fusion (VLDB 2011) 🌟
Compact explanation of data fusion decisions (WWW 2013)
Truth finding on the Deep Web: Is the problem solved? (VLDB 2013) 🌟
A Confidence-Aware Approach for Truth Discovery on Long-Tail Data (VLDB 2014) 🌟
Dynamic Truth Discovery on Numerical Data (ICDM 2018) 🌟
Scaling up Copy Detection (ICDE 2015) 🌟

Knowledge Fusion, Cleaning and Evaluation

Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion (KDD 2014) [Paper] 🌟
From data fusion to knowledge fusion (VLDB 2014) [Paper] [Slides] 🌟
Data X-Ray: A diagnostic tool for data errors (SIGMOD 2015) [Paper] [Slides] [Demo] 🌟
Knowledge-based trust: estimating the trustworthiness of web sources [Paper] [Slides] 🌟
Knowledge verification for long tail verticals (VLDB 2017) 🌟
Efficient knowledge graph accuracy evaluation (VLDB 2019) [Link] 🌟
MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps (ICDE 2019) 🌟
Distilling relations using knowledge bases (VLDBJ 2018) 🌟

Given a relational table, we study the problem of detecting and repairing erroneous data, as well as marking correct data, using well curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rules that can make actionable decisions on relational data, by building connections between a relation and a KB.

HoloDetect: Few-Shot Learning for Error Detection [PDF] (SIGMOD 2019), from the same team as HoloClean 🌟
Unsupervised String Transformation Learning for Entity Consolidation [PDF] (ICDE 2019) 🌟
Normalization of Duplicate Records from Multiple Sources (TKDE 2019) 🌟
Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise (VLDB 2020) 🌟
Learning Over Dirty Data Without Cleaning [Paper] (SIGMOD 2020) 🌟
CoClean: Collaborative Data Cleaning [Paper] (SIGMOD 2020, demo) 🌟
T-REx: Table Repair Explanations [Paper] (SIGMOD 2020, demo) 🌟
Triple Trustworthiness Measurement for Knowledge Graph (WWW 2019)
Tracy: Tracing Facts over Knowledge Graphs and Text (WWW 2019, short)
Few-Shot Knowledge Validation using Rules (WWW 2021) [Paper]
Two Heads are Better than One: Zero-shot Cognitive Reasoning via Multi-LLM Knowledge Fusion (CIKM 2024) [Paper] 🔥

Vandalism Detection

Debiasing Vandalism Detection Models at Wikidata (WWW 2019)

Malicious Participant Detection

Truth discovery for spatio-temporal events from crowdsourced data (VLDB 2017) [Paper] 🌟
Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation (SIGMOD 2014) [Paper] 🌟 (mentions malicious sources in only one sentence)
Reputation-Aware Data Fusion and Malicious Participant Detection in Mobile Crowdsensing (IEEE Big Data 2018) [Paper]

📊 Datasets

Fusion Datasets [Link]

💬 Notes

Data Fusion – Resolving Data Conflicts for Integration [Tutorial Proposal]
Data Integration and Machine Learning: A Natural Synergy
