Project: parseMEANTIME
Overview:
During my PhD at Edith Cowan University, I developed parseMEANTIME, a Python application that transforms the MEANTIME annotated corpus into a graph-based text representation stored in a Neo4j database. The project aims to make the rich semantic annotations of the MEANTIME corpus more accessible and easier to use in downstream analytical tasks.
MEANTIME Corpus:
MEANTIME is a gold-standard, manually annotated dataset of news stories that captures semantic elements such as entities, events, and relationships. Although the annotations are detailed, the dataset is stored in XML, which makes it difficult to link text spans across multiple semantic elements and documents. This limitation hinders the examination and processing of the annotated dataset.
parseMEANTIME Features:
Graph-Based Transformation: Converts the MEANTIME annotated corpus into nodes and edges, representing a large semantic network.
Semantic Property Graphs: Encodes semantic elements into property graphs, facilitating easier analysis and visualization.
Neo4j Integration: Instantiates the graph structure in a Neo4j database, leveraging its graph data management and querying capabilities.
Key Contributions:
Enhanced Accessibility: Simplifies the linking of semantic elements within and across documents, overcoming the limitations of the original XML format.
Advanced Analysis: Enables various applications such as human analysis, graph data analysis, machine learning, and pattern mining (e.g., subgraph pattern mining).
Python Implementation: Written in Python, ensuring ease of use and integration with other tools and frameworks.
Versatile Applications: The transformed dataset can be used for diverse purposes, including semantic analysis, data mining, and research in natural language processing (NLP) and knowledge graphs.
Impact:
parseMEANTIME transforms the MEANTIME corpus into a more usable and analyzable format, opening new possibilities for semantic analysis and research. By converting the annotated dataset into a Neo4j database, this project provides a robust platform for exploring complex semantic relationships and patterns in text.
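The XML-to-graph idea described above can be illustrated with a minimal sketch. It is not the actual parseMEANTIME code: the XML fragment is a simplified, hypothetical stand-in for an annotated document, and the element names, the node labels (Token, Mention), the ANCHORED_TO relationship, and the helper functions xml_to_graph and to_cypher are all assumptions made for illustration. The real MEANTIME schema and the graph model used by parseMEANTIME may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified annotation fragment (an assumption for this sketch,
# not the actual MEANTIME schema).
SAMPLE_XML = """
<Document doc_name="example_doc">
  <token t_id="1">Apple</token>
  <token t_id="2">bought</token>
  <token t_id="3">Beats</token>
  <Markables>
    <ENTITY_MENTION m_id="10"><token_anchor t_id="1"/></ENTITY_MENTION>
    <EVENT_MENTION m_id="11"><token_anchor t_id="2"/></EVENT_MENTION>
    <ENTITY_MENTION m_id="12"><token_anchor t_id="3"/></ENTITY_MENTION>
  </Markables>
</Document>
"""


def xml_to_graph(xml_text):
    """Turn tokens and mentions into node dicts and (source, type, target) edges."""
    root = ET.fromstring(xml_text)
    doc = root.get("doc_name")
    nodes, edges = [], []

    # One node per token; prefixing ids with the document name keeps them
    # unique when several documents are loaded into the same graph.
    for tok in root.findall("token"):
        nodes.append({"label": "Token",
                      "id": f"{doc}:t{tok.get('t_id')}",
                      "text": tok.text})

    # One node per mention, linked to every token it covers.
    markables = root.find("Markables")
    for mention in (markables if markables is not None else []):
        mention_id = f"{doc}:m{mention.get('m_id')}"
        nodes.append({"label": "Mention", "id": mention_id, "type": mention.tag})
        for anchor in mention.findall("token_anchor"):
            edges.append((mention_id, "ANCHORED_TO",
                          f"{doc}:t{anchor.get('t_id')}"))
    return nodes, edges


def to_cypher(nodes, edges):
    """Build (query, parameters) pairs; values are passed as parameters."""
    statements = []
    for node in nodes:
        props = {k: v for k, v in node.items() if k != "label"}
        # Labels cannot be parameterized in Cypher, so they are interpolated
        # from the fixed set of labels produced by xml_to_graph.
        query = f"MERGE (n:{node['label']} {{id: $id}}) SET n += $props"
        statements.append((query, {"id": node["id"], "props": props}))
    for src, rel, dst in edges:
        query = (f"MATCH (a {{id: $src}}), (b {{id: $dst}}) "
                 f"MERGE (a)-[:{rel}]->(b)")
        statements.append((query, {"src": src, "dst": dst}))
    return statements


if __name__ == "__main__":
    graph_nodes, graph_edges = xml_to_graph(SAMPLE_XML)
    for query, params in to_cypher(graph_nodes, graph_edges):
        print(query, params)
```

Each (query, parameters) pair could then be sent to a Neo4j instance, for example through a session's run method in the official Neo4j Python driver. Once mentions and tokens are nodes, linking a text span to every annotation that covers it becomes a graph traversal rather than a manual join over XML identifiers.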