- A Survey on Truth Discovery [Paper] 🌟
- Truth Discovery Algorithms: An Experimental Evaluation [Paper]
- A survey on data fusion: what for? in what form? what is next? (Journal of Intelligent Information Systems, 2020) [Paper]
Data Fusion
I think this is a relatively old topic; people have been moving to knowledge fusion since 2018. There are still many interesting subtopics, e.g., single-truth vs. multi-truth discovery, copy detection, and source reliability. I will classify the following papers later. That said, I think data fusion/knowledge fusion will play an essential role in processing the pre-training datasets of LLMs/LMs.
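To make the "source reliability" and "single truth" ideas a bit more concrete, here is a minimal sketch of an iterative weighted-voting loop. It is a simplified illustration, not the algorithm of any paper listed here; the function name `discover_truths` and the toy claims are made up for this example, and it assumes each item has exactly one true value.

```python
from collections import defaultdict

def discover_truths(claims, iterations=10):
    """claims: iterable of (source, item, value). Returns (truths, reliability)."""
    claims = list(claims)
    # Start by trusting every source equally.
    reliability = {source: 1.0 for source, _, _ in claims}
    truths = {}
    for _ in range(iterations):
        # Step 1: each item takes the value with the highest reliability-weighted vote.
        votes = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            votes[item][value] += reliability[source]
        truths = {item: max(tally, key=tally.get) for item, tally in votes.items()}
        # Step 2: a source's reliability is the fraction of its claims
        # that agree with the current truth estimates.
        hits, total = defaultdict(int), defaultdict(int)
        for source, item, value in claims:
            total[source] += 1
            hits[source] += int(truths[item] == value)
        reliability = {s: hits[s] / total[s] for s in total}
    return truths, reliability

# Toy data (hypothetical): A, B, E are accurate sources; C and D are not.
claims = []
for item in ("item1", "item2", "item4"):
    claims += [(s, item, "good") for s in ("A", "B", "E")]
    claims += [(s, item, "bad") for s in ("C", "D")]
# On item3 the inaccurate sources form the majority (2 vs. 1).
claims += [("A", "item3", "good"), ("C", "item3", "bad"), ("D", "item3", "bad")]

truths, reliability = discover_truths(claims)
print(truths["item3"], reliability)
```

With plain majority voting, `item3` would resolve to the value claimed by C and D. Once reliability is estimated from the other items, A's single vote outweighs theirs. Note that this sketch ignores copying between sources, which several of the papers below address.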
- Truth Discovery with Multiple Conflicting Information Providers on the Web (TKDE 2008), one of the most classic papers. 🌟
- Integrating conflicting data: the role of source dependence (VLDB 2009), one of the most classic papers. 🌟
- Fusing data with correlations (SIGMOD 2014) 🌟
- Truth discovery and copying detection in a dynamic world (VLDB 2009) 🌟
- Global detection of complex copying relationships between sources (VLDB 2010) [Paper] 🌟
- Online data fusion (VLDB 2011) 🌟
- Compact explanation of data fusion decisions (WWW 2013)
- Truth finding on the Deep Web: Is the problem solved? (VLDB 2013) 🌟
- A Confidence-Aware Approach for Truth Discovery on Long-Tail Data (VLDB 2014) 🌟
- Dynamic Truth Discovery on Numerical Data (ICDM 2018) 🌟
- Scaling up Copy Detection (ICDE 2015) 🌟
Knowledge Fusion, Cleaning and Evaluation
- Knowledge Vault: A Web-Scale Approach to Probabilistic Knowledge Fusion (KDD 2014) [Paper] 🌟
- From data fusion to knowledge fusion (VLDB 2014) [Paper] [Slides] 🌟
- Data X-Ray: A diagnostic tool for data errors (SIGMOD 2015) [Paper] [Slides] [Demo] 🌟
- Knowledge-based trust: estimating the trustworthiness of web sources [Paper] [Slides] 🌟
- Knowledge verification for long tail verticals (VLDB 2017) 🌟
- Efficient knowledge graph accuracy evaluation (VLDB 2019) [Link] 🌟
- MIDAS: Finding the Right Web Sources to Fill Knowledge Gaps (ICDE 2019) 🌟
- Distilling relations using knowledge bases (VLDBJ 2018) 🌟
Given a relational table, we study the problem of detecting and repairing erroneous data, as well as marking correct data, using well curated knowledge bases (KBs). We propose detective rules (DRs), a new type of data cleaning rules that can make actionable decisions on relational data, by building connections between a relation and a KB.
- HoloDetect: Few-Shot Learning for Error Detection [PDF] (SIGMOD 2019), from the same team as HoloClean 🌟
- Unsupervised String Transformation Learning for Entity Consolidation [PDF] (ICDE 2019) 🌟
- Normalization of Duplicate Records from Multiple Sources (TKDE 2019) 🌟
- Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise (VLDB 2020) 🌟
- Learning Over Dirty Data Without Cleaning [Paper] (SIGMOD 2020) 🌟
- CoClean: Collaborative Data Cleaning [Paper] (SIGMOD 2020, demo) 🌟
- T-REx: Table Repair Explanations [Paper] (SIGMOD 2020, demo) 🌟
- Triple Trustworthiness Measurement for Knowledge Graph (WWW 2019)
- Tracy: Tracing Facts over Knowledge Graphs and Text (WWW 2019, short)
- Few-Shot Knowledge Validation using Rules (WWW 2021) [Paper]
- Two Heads are Better than One: Zero-shot Cognitive Reasoning via Multi-LLM Knowledge Fusion (CIKM 2024) [Paper] 🔥
Vandalism Detection
- Debiasing Vandalism Detection Models at Wikidata (WWW 2019)
Malicious Participant Detection
- Truth discovery for spatio-temporal events from crowdsourced data (VLDB 2017) [Paper] 🌟
- Resolving conflicts in heterogeneous data by truth discovery and source reliability estimation (SIGMOD 2014) [Paper] 🌟 (mentions malicious sources in only one sentence)
- Reputation-Aware Data Fusion and Malicious Participant Detection in Mobile Crowdsensing (IEEE BigData 2018) [Paper]
- Fusion Datasets [Link]
- Data Fusion – Resolving Data Conflicts for Integration [Tutorial Proposal]
- Data Integration and Machine Learning: A Natural Synergy