GSoC 2017: Final Report
Wikipedia represents a comprehensive cross-domain source of knowledge with millions of contributors. The DBpedia project tries to extract structured information from Wikipedia and transform it into RDF.
The main classification system of DBpedia depends on human curation, which limits its coverage and leaves a large number of resources untyped. DBTax provides an unsupervised approach that automatically learns a taxonomy from the Wikipedia category system and extensively assigns types to DBpedia entities, through the combination of several NLP and interdisciplinary techniques. It provides a robust backbone for DBpedia knowledge and has the benefit of being easy to understand for end users.
The approach to unsupervised learning of the taxonomy was presented in the DBTax paper. The goal of this project was to streamline and improve the approach described in the paper and make it easy to run on a new DBpedia release.
Repository I contributed to: DBTax
Link to my Daily Progress page
- Fixed inconsistencies with the expected output and made improvements in Stages 3 and 4, resulting in a working version of the entire pipeline. PR7 (in review)
- Cycle removal code in Page type assignment PR6 (merged)
- Remaining steps in Stage 3 (T-Box generation) and Stage 4 (page type assignment). PR5 (closed)
- Hierarchy Generation Attempt, integrated Logger, ported to Stanford NLP PR4 (closed)
- Stage 2: Prominent node discovery step, Automated Threshold calculation approach PR3 (merged)
- Stage 1: Leaf Extraction step. PR2 (closed)
- Scripts to download Wikipedia dumps PR1 (merged)
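To illustrate the kind of cycle removal mentioned in the list above, here is a minimal sketch in Python. It is not the DBTax code: the function name `remove_cycles`, the `(child, parent)` edge representation, and the choice of dropping depth-first-search back edges are all assumptions made for illustration. The idea is that a category hierarchy can only be used as a taxonomy if it is acyclic, and discarding every edge that points back to a node still on the current DFS path is one simple way to get there.

```python
from collections import defaultdict

# Hypothetical helper, not taken from the DBTax repository.
def remove_cycles(edges):
    """Return the (child, parent) edges with DFS back edges dropped.

    Dropping every back edge of a depth-first traversal leaves a
    directed acyclic graph.
    """
    graph = defaultdict(list)
    for child, parent in edges:
        graph[child].append(parent)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = defaultdict(int)     # every node starts as WHITE (0)
    kept = []

    def visit(node):
        colour[node] = GREY
        for parent in graph[node]:
            if colour[parent] == GREY:
                # Back edge: parent is already on the current path,
                # so keeping this edge would close a cycle.
                continue
            kept.append((node, parent))
            if colour[parent] == WHITE:
                visit(parent)
        colour[node] = BLACK

    # list(...) because visiting may add leaf nodes to the defaultdict.
    for node in list(graph):
        if colour[node] == WHITE:
            visit(node)
    return kept


# Toy category links (invented for this example): A -> B -> C -> A is a cycle.
example = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
result = remove_cycles(example)
```

With the toy input, the edge `("C", "A")` is the one that gets dropped. Which edge of a cycle is removed depends on traversal order, and the recursive traversal would need to be made iterative for a graph as large as the full Wikipedia category system.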
The entire pipeline works well for English.
- To enable faster testing, we plan to integrate automated testing and CI. This will be done in the next few weeks.
- A few open challenges were encountered that may be worked on from a research perspective: Open Challenges
I would like to thank every member of the DBpedia community, especially my mentors, Marco Fossati and Dimitris Kontokostas, for being so nice and helpful. I have learnt a lot in the past 3 months and it has been a great experience to be a part of this wonderful community. I would also like to thank DBpedia and Google for giving me this opportunity.