Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug(?) in string parsing #45

Open
flipz357 opened this issue Feb 8, 2024 · 3 comments
Open

Bug(?) in string parsing #45

flipz357 opened this issue Feb 8, 2024 · 3 comments

Comments

@flipz357
Copy link
Contributor

flipz357 commented Feb 8, 2024

I don't need this fixed, but maybe it's of interest:

I noticed that this smatch version returns a score of 1.00 for the different graphs:

(s / see-01
   :ARG0 (p / person
         :name (n / name
              :op1 "Hans")))

and

(s / see-01
   :ARG0 (p / person
         :name (n / name
              :op1 "Hans_")))

Wondering if there's a bug, or if there is some reason for this? It might be some sort of preprocessing that happens here which is not obvious. Since there are also https link etc. in AMR that may contain stuff like "_" I think it may not be sensible to remove characters.

edit

It can be even more severe. The score is also 1.00 for the very different graphs:

(s / see-01
   :ARG0 (p / person
         :name (n / name
              :op1 "Hans Meier")))

and

(s / see-01
   :ARG0 (p / person
         :name (n / name
              :op1 "Hans")))

While the first two graphs could be some pre-processing quirk, the second two ones clearly seem like bug.

@snowblink14
Copy link
Owner

Thanks for reporting this, @flipz357. I think this old piece of code has bug handling blank space and underscores when processing the values.

Should be an easy fix though -- I'll fix this soon.

@jzw2
Copy link

jzw2 commented Jul 3, 2024

Is this intended behaviour? I noticed the given test_input1 and test_input2 have a similar example, with "Wiilliam" differeing in capitalization and having underscores. It currently counts it as a match.

@flipz357
Copy link
Contributor Author

flipz357 commented Jul 3, 2024

Hi @jzw2 I guess from the view of AMR evaluation/similarity it can seem okay to not differ between upper- and lower-case and be considered a user choice (for very strict evaluation, we might want to differ though). The observed treatment of underscores (the bug you experience, or the example shown above), however, seems indeed like wrong/unintended behavior and should not count as a match.

Maybe you can check out my SMATCH++ that has a best-practice evaluation protocol with AMR normalization (and ILP solver).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants