Skip to content

Latest commit

 

History

History
36 lines (17 loc) · 2.18 KB

README.md

File metadata and controls

36 lines (17 loc) · 2.18 KB

David Winer IE task: DIRTAR: "Discovery of Inference Rules from Text for Action Recognition"

Summary: DIRT algorithm (Lin and Pantel, 2001) with modifications (lemmas, constituency parse, slot-sim, slot-types, hypernyms, semantic-discrimination)

CODE FILES:

moviescript_crawler.py - collects movie corpus (from local path) and each document is inserted into a single document called movie_combo.txt.

sentence_parser.py - reads moviescripts from movie_comb.txt, splits into sentences, and each sentence into clauses (using constituency parse), output is "movie_cluases.txt"

semantic_parser.py - used for one of the experimental conditions - hand written frame-net style rules for discriminating candidate nouns from slots

dirtar.py - runs the dirt algorithm with all experimental conditions, and includes which X and Y slot dependencies are included for some of the experimental conditions. Running this file dumps the databases as pickle files, which are loaded for analysis by "assign_labels_moviedirt.py"

assign_labels_moviedirt.py - reads triple databases and reads (and parses with stanford parser) the test sentences (duel corpus sentences), in "IE_sent_key.txt". Outputs text files for each experimental condition where each line is a verb in the duel corpus, the guess, etc.

score_labels_dirtar.py - reads the experimental labels and calculates fscore, etc, and spits out files which are in "scored_labels" folder

FOLDERS:

extract_duel_sentences.zip - includes original duel corpus and data structures used to construct "IE_sent_key.txt" from excel file.

experimental_labels - folder containing the text files spit by assign_labels_moviedirt.py

scored_labels - folder containing the evaluation of each experimental condition, includes "total" and 1 per action of interest

Other Files:

IE_sent_key.txt - each line has a sentence from duel corpus, followed by "-#-", followed by list of action classes for that sentence.

movie_combo.txt - not attached (too large) contains combined text file for movies from Walker's cleaned imsdb database (https://nlds.soe.ucsc.edu/fc2)

movie_clauses.txt - movie_comb.txt separated into clause triples, where each slot in the triple has some extra annotations from the parse