Copula with clausal argument #824

Closed
aryamanarora opened this issue Dec 2, 2021 · 8 comments
Labels
dependencies Hindi-Urdu Indo-Iranian question UPOS Universal part-of-speech tags: definitions and examples

@aryamanarora
Member

A messy construction with clauses and ccomp:

  1. [ਉਸਦਾ ਕਹਿਣਾ]1 ਹੈv [ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ]2 lit. "[his saying]1 is [that you are correct]2"
  2. [ਉਸਦਾ ਕਹਿਣਾ]1 [ਇਹ]2 ਹੈv [ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ]3 lit. "[his saying]1 is [this]2: [that you are correct]3"

Following clause is ccomp to copula in first:

  • [ਉਸਦਾ ਕਹਿਣਾ]v:nsubj ਹੈroot [ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ]v:ccomp
    but second one has 3 args so having them all headed by the copula means we don't know what relation to assign to ਇਹ "this" (nsubj, ?, ccomp).

Note that first two NP constituents + verb form just a normal equational copula (ਉਸਦਾ ਕਹਿਣਾ ਇਹ ਹੈ "his saying is this"), so one option is this:

  • [ਉਸਦਾ ਕਹਿਣਾ]2:nsubj [ਇਹ]0:root ਹੈ2:cop [ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ]2:ccomp

So then why are 1 and 2 different? (Note: exact same issue in Hindi)


HDTB would treat these as:

  • [ਉਸਦਾ ਕਹਿਣਾ]2:nsubj [ਇਹ]0:root ਹੈ2:cop [ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ]2:acl

Only one instance in Hindi PUD but with this weird kind of thing (ends up non-projective):

  • [ਉਸਦਾ ਕਹਿਣਾ]2:nsubj [ਇਹ]0:root ਹੈ2:cop [ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ]1:acl

Sort of see the rationale, since 1 and 3 can form a single NP constituent:

  • [ਉਸਦਾ ਕਹਿਣਾ [ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ]acl] ਚੰਗਾ ਹੈ "[his saying [that you are correct]] is good"

Originally posted by @aryamanarora in https://github.com/UniversalDependencies/UD_Punjabi-PunTB/issues/3#issuecomment-984230106

@dan-zeman
Member

I suggest moving this issue to the main issue tracker at the docs repository since it is about annotation guidelines rather than about bugs in a treebank.

@dan-zeman dan-zeman transferred this issue from UniversalDependencies/UD_Punjabi-PunTB Dec 3, 2021
@dan-zeman dan-zeman added dependencies Hindi-Urdu Indo-Iranian UPOS Universal part-of-speech tags: definitions and examples question labels Dec 3, 2021
@dan-zeman dan-zeman added this to the v2.10 milestone Dec 3, 2021
@dan-zeman
Member

dan-zeman commented Dec 3, 2021

I think the HDTB analysis is correct and the one from Hindi PUD is wrong. The example "His saying that you are correct is good" should end up different because it is different: here, "being good" is the predicate. In the original two examples, we could debate whether "his saying" is the subject and the other part is the predicate, or vice versa, but if I understand the construction correctly, the clause "that you are correct" is still an elaboration of "this" when "this" is present. So I think the following are possible:

ਉਸਦਾ ਕਹਿਣਾ ਹੈ ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ
usadā kahiṇā hai ki tusīṁ sahī ho

nsubj(hai, kahiṇā)
xcomp(hai, sahī)

or

cop(kahiṇā, hai)
csubj(kahiṇā, sahī)

ਉਸਦਾ ਕਹਿਣਾ ਇਹ ਹੈ ਕਿ ਤੁਸੀਂ ਸਹੀ ਹੋ
usadā kahiṇā iha hai ki tusīṁ sahī ho

nsubj(iha, kahiṇā)
cop(iha, hai)
acl(iha, sahī)

or

nsubj(kahiṇā, iha)
cop(kahiṇā, hai)
acl(iha, sahī)

The main difference between the first and the second sentence is that the copula ਹੈ hai is the head when the non-verbal predicate is a clause (note that there is no issue if we say that "his saying" is the predicate and the clause is the subject instead). It is unfortunate that the resulting tree is different but it has been specified so in the guidelines for all languages. The reason is that we want to avoid having two subjects attached to the head of the predicate clause: one internal to the clause, and the other for the superordinate clause where this clause serves as a predicate.
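To make the two-subject concern concrete, here is a minimal Python sketch that counts subject dependents per head for the first sentence. The edge lists are illustrative: the `nsubj(hai, kahiṇā)` and `xcomp(hai, sahī)` edges come from the first option above, while the clause-internal edges (`mark`, inner `nsubj`, inner `cop`) and the whole `predicate_head` alternative are assumptions added for the comparison. The helper name `subjects_per_head` is hypothetical.

```python
from collections import defaultdict

SUBJECT_RELS = {"nsubj", "csubj"}

def subjects_per_head(edges):
    """Map each head to the list of its subject dependents."""
    found = defaultdict(list)
    for rel, head, dep in edges:
        # Compare on the universal relation, ignoring any subtype.
        if rel.split(":")[0] in SUBJECT_RELS:
            found[head].append(dep)
    return dict(found)

# Copula as head: the first option listed above for sentence 1.
copula_head = [
    ("nsubj", "hai", "kahiṇā"),
    ("xcomp", "hai", "sahī"),
    # Clause-internal edges below are assumed, not taken from the thread.
    ("mark", "sahī", "ki"),
    ("nsubj", "sahī", "tusīṁ"),
    ("cop", "sahī", "ho"),
]

# Hypothetical alternative: the predicate of the clause heads everything.
predicate_head = [
    ("nsubj", "sahī", "kahiṇā"),
    ("cop", "sahī", "hai"),
    ("mark", "sahī", "ki"),
    ("nsubj", "sahī", "tusīṁ"),
    ("cop", "sahī", "ho"),
]

print(subjects_per_head(copula_head))     # one subject on each of two heads
print(subjects_per_head(predicate_head))  # both subjects on the same head
```

With the copula as head, each head carries one subject; with the clause's predicate as head, both subjects land on `sahī`, which is the situation the rule is meant to avoid.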

@amir-zeldes
Contributor

amir-zeldes commented Dec 3, 2021

The reason is that we want to avoid having two subjects attached to the head of the predicate clause: one internal to the clause, and the other for the superordinate clause where this clause serves as a predicate.

I agree with most of the above, but I guess this is the point where I have to defend the two nsubj analysis for nested copulas... I've said this on the EWT tracker, and in a presentation at the recent Dagstuhl seminar, but this probably belongs in the docs repo as well:

In "the problem is that Kim is tired", if we label the matrix clause copula as root and make "tired" its ccomp dependent, then we are saying that:

  • "is" can be a transitive verb (which it definitely isn't IMO)
  • there is a different construction in "Kim is tired" vs. "the problem is that..." (it is the same syntactic structure as far as I can tell, they are both the copular A is B construction, and it sounds like Dan also feels this distinction would be unfortunate)
  • we are not using the usual phrasal analysis in UD, in which functional dependents of a phrase are attached to the normal root of that phrase, regardless of what is going on inside it. For example in "jump from under the bed", "bed" has two case dependents, one actually modifying the noun, and one modifying the PP. For nested PPs it doesn't seem to bother us that there are two case dependents, but for nsubj for some reason it triggers a different treatment. In my opinion the two are analogous:

In languages with zero copula constructions, the analysis of the copula as root and head of ccomp is not actually possible, leading to further inconsistency across UD languages. Hebrew example:

  • ha-be'aya she-kim ayefa
  • the-problem that-Kim tired "the problem is that Kim is tired"

In this case, there is no possibility of applying the exceptional copula-as-root analysis, and we get two nsubj relations on "tired" no matter what. I don't consider two nsubjs to be a problem though; I consider it to be the expected analysis from a UD perspective (lexico-centric; does not assume that we need verbs for predication; "A is B" is analyzed the same on both levels).

@dan-zeman
Member

but I guess this is the point where I have to defend the two nsubj analysis for nested copulas

Thanks, Amir. I am not saying that the guidelines cannot be modified in the future (and I am personally not a strong supporter of this particular rule) but in my previous comment I was trying to explain what the current guidelines (v2) say.

Copular/nonverbal clauses are difficult and every rule seems to have a lot of drawbacks, which is probably the reason why various copula-related issues resurface in every other thread in the GitHub issue tracker. Of the more recent ones, see e.g. #706; and #657 was about the problem that languages without an overt copula cannot make a copula the head, so they still end up with two subjects attached to the same predicate.

nschneid added a commit that referenced this issue May 9, 2022
#657, #824 (#868)

* changes page: Multiple Subjects amendment summary
* complex-syntax overview: Predicate Clauses
* add nsubj:outer and csubj:outer to both universal and English guidelines
* remove the old analysis from ccomp and cop pages
* en-dep-table: update to v2 (!) and add nsubj:outer and csubj:outer
* en/nsubj: document for-subjects of infinitivals (UniversalDependencies/UD_English-EWT#322)
@dan-zeman dan-zeman modified the milestones: v2.10, v2.11 Jun 13, 2022
@LarsAhrenberg
Contributor

@nschneid, @dan-zeman I am reannotating a sentence that did not pass the 2.11 validator, such as

The fact is that it is not a joke.

I notice in the annotated examples in the Predicate Clause amendment that two copulas ('cop') are allowed but not two subjects. The subjects need to be distinguished (by the subrelation :outer), so why are the copulas not treated the same way? With passives both the subject and the auxiliary are marked :pass.

It would be nice to have an example with two copulas in the guidelines for nsubj:outer on this page:

@nschneid
Contributor

Good idea, I have updated the documentation page.

We considered adding cop:outer and aux:outer for the predicate cluster, but decided against requiring it on practical grounds as it would add many new rare subtypes. In principle, any dependent of the outer clause could also be distinguished (advmod:outer, obl:outer), but this seemed like overkill (and again, these subtypes would be rare). See https://universaldependencies.org/changes.html#multiple-subjects
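As an illustration of what this means for the sentence quoted above ("The fact is that it is not a joke."), the sketch below loads one possible annotation and groups the dependents of the root by relation. The annotation is a reading of the amendment rather than an excerpt from any treebank, and the four-column row format (ID, FORM, HEAD, DEPREL) is a simplification of CoNLL-U, which has ten tab-separated columns. The names `parse` and `dependents_by_relation` are hypothetical.

```python
from collections import defaultdict

# Simplified rows: ID FORM HEAD DEPREL (assumed annotation, not from a treebank).
ROWS = """\
1 The 2 det
2 fact 9 nsubj:outer
3 is 9 cop
4 that 9 mark
5 it 9 nsubj
6 is 9 cop
7 not 9 advmod
8 a 9 det
9 joke 0 root
10 . 9 punct
"""

def parse(rows):
    """Turn the simplified rows into a list of token dicts."""
    tokens = []
    for line in rows.strip().splitlines():
        tid, form, head, deprel = line.split()
        tokens.append(
            {"id": int(tid), "form": form, "head": int(head), "deprel": deprel}
        )
    return tokens

def dependents_by_relation(tokens, head_id):
    """Group the forms attached to head_id by their relation label."""
    grouped = defaultdict(list)
    for tok in tokens:
        if tok["head"] == head_id:
            grouped[tok["deprel"]].append(tok["form"])
    return dict(grouped)

tokens = parse(ROWS)
root = next(t for t in tokens if t["head"] == 0)
deps = dependents_by_relation(tokens, root["id"])
print(root["form"], deps)
```

In this reading, "joke" carries two plain `cop` dependents (both instances of "is") but its two subjects are told apart as `nsubj` ("it") and `nsubj:outer` ("fact").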

Personally, I wouldn't mind changing the guidelines to simplify aux:pass to aux—not sure distinguishing them is in line with UD's principle that functional relations (aux, cop, det, ...) are of secondary importance. That an auxiliary functions as part of the passive construction might better be marked on features.

@Stormur
Contributor

Stormur commented Nov 21, 2022

In Latin we have introduced cop:outer since it seems necessary, in all cases actually, so as not to create "false copulas".

Could it be sensible to extend the outer subtype for all functional and core dependents?

As for :pass, I think in a wider perspective that it could in fact be eliminated, as I am not sure of its position in UD (see the case of deponent verbs in #713), and maintained only on the subject. But at some syntactic level it should be there. I am actually more and more convinced that since transitivity and passivisation are syntactic phenomena, the focus should be on them and not on the features.

@nschneid
Contributor

The UD position is that treebanks are free to implement additional subtypes. The validator has a special rule about :outer for subjects though, to avoid confusing inner and outer ones.
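The kind of check described here could look roughly like the following Python sketch. It is not the validator's actual code: the function name `check_subjects`, the message wording, and the exact condition (at most one subject per head that is not marked `:outer`) are assumptions made for illustration, and the real rule may include further exceptions.

```python
from collections import defaultdict

def check_subjects(edges):
    """Return one message per head with several subjects not marked :outer."""
    inner = defaultdict(list)
    for rel, head, dep in edges:
        base, _, subtype = rel.partition(":")
        # Subtypes other than "outer" (e.g. nsubj:pass) still count as inner.
        if base in {"nsubj", "csubj"} and subtype != "outer":
            inner[head].append(dep)
    return [
        f"{head}: {len(deps)} subjects not marked :outer ({', '.join(deps)})"
        for head, deps in inner.items()
        if len(deps) > 1
    ]

# Hypothetical edges for "The fact is that it is not a joke."
unmarked = [
    ("nsubj", "joke", "fact"),
    ("cop", "joke", "is"),
    ("nsubj", "joke", "it"),
    ("cop", "joke", "is"),
]
marked = [("nsubj:outer", "joke", "fact")] + unmarked[1:]

print(check_subjects(unmarked))  # one message for "joke"
print(check_subjects(marked))    # empty list
```

Note that the two `cop` edges pass untouched in both cases, so a treebank-specific subtype such as `cop:outer` would not interact with a check of this shape.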
