ErgTreebankingGuidelines
Contents
- Heuristics for efficient treebanking
- Technical choices
- Notes from Tomar meeting
- Choose the construction that spans the whole sentence
- Typically SUBJH
- Typically not one of the FRAG* rules
- Disambiguate lexical entries early, to reduce remaining ambiguity
- in general prefer the simpler choice
- e.g. for nominal |seating|, prefer NP over intransitive V rather than NP over transitive V with an optional complement.
- |Mr. Browne|
- Choose NP-TITLE-CMPND, not APPOS
- treat as parts of name, not ordinary words
- |Rolls-Royce Motor Cars Inc.|
- |Motor Cars|
- NP_NAME_CMPND, not NOUN_N_CMPND
- |Rolls-Royce|
- Choose multi-word entry when available
- |Rolls-Royce Motor Cars|
- NP_NAME_CMPND
- Attach |Inc.| with NADJ_RR
- treat as appositive
- |Howard Mosher, president and CEO|
- First combine |Howard Mosher|
- Then combine it with |president and CEO| using APPOS_NBAR
- Company names
- |Rolls-Royce|
- Choose n_-_pn_le, not NP_NAME_CMPND
- Country names
- |U.S.|
- Choose n_-_c-nm-pd_le, not n_-_pn-gen_le
- Unknown names
- |Elianti.|
- Choose PUNCT_PERIOD_ORULE (period is not part of name)
- Name abbreviations containing periods
- |U.S.|
- Choose PUNCT_PERIOD_ORULE if word is at end of sentence
- Choose highest attachment point consistent with meaning
- |remain steady at 1,200 cars|
- attach to VP, not to |steady|
- |reserve a room for Browne|
- attach to VP, not to |room|
- but disprefer modifier attachment to semantically vacuous heads
- e.g. attach modifiers to hiring ..., not be hiring ...
- In copula constructions (with forms of verb "be"), attach PP inside
- |be payable Feb. 15|
- First combine |payable| with |Feb. 15| using HADJ_I_UNS
- Complement vs. modifier - choose complement when available
- |based in Los Angeles|
- Choose HCOMP, not HADJ_I_UNS
- PP modifier inserted between verb and its complement NP
- |publish in statements the names of insiders|
- First combine |publish| with |in statements| using VMOD_I
- When a modifier precedes the VP, attach it to the subject NP
- |the maker last year sold cars|
- attach |last year| to |maker|
- Treat as modifiers, pumping temporal NP to a PP
- |last year|
- Choose NPADV, not ADJN
- |Feb. 15|
- Combine with HSPECHC, then choose NPADV
- Complex phrases
- |early next year|
- Combine |early| with |next year| using NADJ_RR
- Choose bracketing with intended sense
- |luxury auto maker|
- first combine |luxury| with |auto|
- When intended bracketing is not clear, group from right to left
- |airline ticket counter|
- first combine |ticket| with |counter|
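The two compound-noun rules above (follow the intended sense; otherwise group from right to left) can be illustrated with a small sketch. This is only an illustration of the default bracketing order, not part of any ERG or treebanking tool; the function names `bracket_right_to_left` and `show` are hypothetical.

```python
def bracket_right_to_left(tokens):
    """Group a compound from right to left, the default when the
    intended bracketing is unclear. Returns nested (left, right) tuples."""
    if not tokens:
        raise ValueError("need at least one token")
    tree = tokens[-1]
    # Walk leftwards, wrapping the structure built so far.
    for word in reversed(tokens[:-1]):
        tree = (word, tree)
    return tree


def show(tree):
    """Render a nested-tuple bracketing as a bracketed string."""
    if isinstance(tree, str):
        return tree
    left, right = tree
    return f"[{show(left)} {show(right)}]"


# Default case: no clear intended bracketing.
print(show(bracket_right_to_left(["airline", "ticket", "counter"])))
# [airline [ticket counter]]

# Intended-sense case: the annotator overrides the default by hand.
print(show((("luxury", "auto"), "maker")))
# [[luxury auto] maker]
```

The default is only a fallback: when the sense is clear, as in |luxury auto maker|, the annotator's bracketing takes priority.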
- If you have a choice between XP CCONJ XP and X CCONJ X, choose the XP (or S), that is, the highest constituent
- e.g., for |cats and dogs|, prefer NP coordination over N coordination with a bare NP rule on top
- Nominal phrases
- Choose N_COORD_TOP_2, not N_COORD_TOP_3 when given the choice
- Sentence-initial conjunction - treat as incomplete coordination of clauses
- |But Abrams arrived early.|
- Combine |But| with |Abrams arrived early.| using HMARK_CL
- Choose verb if the meaning is agentive; otherwise choose adjective
- |A date hasn't been set|
- For |set|, choose v_np*_le, not aj_-_i_le
- Attach punctuation to the preceding words
- except for some rare conjunctions
- Paired commas marking off a modifier: choose "paired" rule (-PR suffix)
- |Bell, based in Los Angeles|
- Choose NADJ_RC_PR to combine modifier phrase with |Bell|
- Negation - always attach |not| to preceding auxiliary if possible
- |did not meet|
- First combine |did| with |not| using HCOMP
- Other adverbs between auxiliary and main VP - attach adverb to following VP
- |can really sing|
- First combine |really| with |sing| using ADJH_S
- Sentence-initial - Prefer attachment without extraction when possible
- |Apparently the commission met|
- Choose ADJ_S, not FILLHEAD_NON_WH_IG
- Degree modifiers - combine with the number word
- |about 25 % of them|
- First combine |about| with |25| using HSPECHC
- Combine |%| with |of them| using HCOMP
- Dollar amounts - treat the symbol |$| as the head (the unit of measure)
- |$ 80 billion|
- Combine |$| with |80 billion| using MEAS_NP_SYMB
- treat as extraction from 'saying' verb
- |They arrived, Browne said.|
- Combine |They arrived,| with |Browne said.| using FILLHEAD_NON_WH
- First pump determiner to noun, and treat of-PP as complement
- |some of the books|
- Combine |some| with |of the books| using HCOMP
- For |all|, |not all|, |both|, and |half|, treat following NP as complement
- |not all those who wrote|
- For |not all|, choose native entry n_np_mc-neg_le
- Combine |not all| with |those who wrote| using HCOMP
- Modifiers to the right of the head noun are always attached _before_ any modifiers to the left
- |important changes by the SEC|
- First combine |changes| with |by the SEC| using NADJ_RR
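The ordering rule for noun modifiers (right-hand modifiers attach before left-hand ones) can be sketched as follows. This is a hedged illustration of the attachment order only; `attach_modifiers` is a hypothetical helper, and the bracketed strings stand in for the trees an annotator would select.

```python
def attach_modifiers(head, left_mods, right_mods):
    """Build a bracketing in which every right-hand modifier is attached
    to the head noun before any left-hand modifier.

    left_mods and right_mods are given in surface (left-to-right) order.
    """
    tree = head
    # Right-hand modifiers first, starting with the one nearest the head.
    for mod in right_mods:
        tree = f"[{tree} {mod}]"
    # Then left-hand modifiers, again starting with the nearest one.
    for mod in reversed(left_mods):
        tree = f"[{mod} {tree}]"
    return tree


print(attach_modifiers("changes", ["important"], ["by the SEC"]))
# [important [changes by the SEC]]
```

In the example, |changes| first combines with |by the SEC|, and only then does |important| attach to the result.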
- Where lexical ambiguity is hard to decide (e.g. even-deg vs even-conj), choose based on frequency in Redwoods/DeepBank
- Disprefer modifier attachment to semantically vacuous heads, e.g. attach adverbs to |hiring ...|, not |be hiring ...|
- For there-copula:
- Avoid double-object choice and avoid modification of there-cop
- Also prefer low attachment of modifier after obj NP
- Accept extraction of PP for there-cop as is
- When choosing between verb-particle and verb-mod, as in |go away|: if you can modify the 'particle', as in |go far away|, it is not verb-particle
- When there is a choice between spr-hd and mod-hd for Adv-Adj, choose mod-hd
- Avoid adv-add except for |not|
- When WH-Q of form NP-be-NP [EMB: guessing this is choose subj-head; Dan please confirm]
- For complement of saying, if there's a main clause option for the quoted material, choose it:
- |"Who did Kim hire" asked Mary| not |*Who Kim hired, asked Mary|
- No free relatives
- Attach three-dot punctuation as low as possible
- Reject ellipsis
- For an n-dash between clauses, use run-on
- For degree specifiers, when there's a choice, take the shortest lexent type name
- Attach subordinate clauses high [EMB: subordinate clauses are understood as clauses with all arguments overt; do not include in+order+to purposives, etc.]
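The degree-specifier tie-break in the notes above (take the shortest lexent type name) is mechanical enough to sketch in code. The type names below are hypothetical placeholders, not real ERG lexical types, and `pick_shortest_type` is an illustrative helper rather than a function of any treebanking tool.

```python
def pick_shortest_type(candidates):
    """Return the candidate lexical entry type with the shortest name.

    If several names share the minimum length, the first one listed wins;
    the guideline itself does not say how to break such ties.
    """
    if not candidates:
        raise ValueError("no candidate types to choose from")
    return min(candidates, key=len)


# Hypothetical type names, for illustration only.
options = ["deg_spec-long_le", "deg_spec_le", "deg_spec-other_le"]
print(pick_shortest_type(options))
# deg_spec_le
```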