Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comments on the PyDKPro examples notebook #14

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

reckart
Copy link
Collaborator

@reckart reckart commented Feb 2, 2020

This is not strictly speaking a PR - rather a way of providing my comments on the examples notebook. I hope this works for you.

@jcklie I also requested a review from you since I believe at least one comment may apply to Cassis - i.e. the one about cas.add_annotation().

@reckart
Copy link
Collaborator Author

reckart commented Feb 2, 2020

I made a few small changes to the descriptive texts in the notebook that are not marked in any particular way.

My comments are marked with REC.

"p = Pipeline(version=\"2.0.0\", language='en')\n",
"p.add(Component().clearNlpSegmenter())\n",
"\n",
"// REC: Is printTagSet not false by default?\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added printTagset to demonstrate "how to pass the parameters" But you are right, default value is false. I'll update it to True

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw. considering that ClearNLP is no longer maintained, maybe use other components for the examples - e.g. CoreNLP or OpenNLP?

Copy link
Collaborator

@aggarwalpiush aggarwalpiush Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok Sure!! Thanks for noticing this.

"p.add(Component().stanfordPosTagger(variant='fast-caseless', printTagSet='false'))\n",
"\n",
"// If it has a build method, maybe it should be a PipelineBuilder? Also, the build method should return something,\n",
"// i.e. the actual pipeline which maybe can later be used to shut the pipeline down?\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this comment. Could you please elaborate on it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cf. https://dzone.com/articles/design-patterns-the-builder-pattern

If something has a build() method, it looks like a builder pattern. A builder does not return itself in the build() method but rather the object it was used to construct. In your case, the code looks like you want to have a class PipelineBuilder whose build() method eventually produces the actual readily-configured Pipeline which then can be used to process stuff.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this pattern. Yes, as soon as the backend gets ready, we will update like this.

Repository owner deleted a comment from reckart Feb 3, 2020
"cas.select(dts().token).as_text() "
"cas.select(dts().token).as_text() \n",
"\n",
"// REC: I wonder which version of the DKProCoreTypeSystem is used here...\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is good point, here dts() is DKPro type system class object call which load DKPro Core type systems (similar to DKPro Core Integration in dkpro-cassis wrapper) with latest version and return as CAS object. I believe we should use it to automatically load the type systems. Version of typesystem can also passed while calling this object

from pydkpro import DKProCoreTypeSystem as dts

ts_token = dts(version="2.0.0").token

cas.select(ts_token).as_text() 

Please provide your feedback, if you like or dislike above way of representation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is an object, it should start with a Capital letter, no?

If it is a utility method, maybe load_dkpro_core_typesystem(version="2.0.0") would make it clearer.

Personally, I would find load_dkpro_core_typesystem(version="2.0.0").get_token_type() quite understandable.

ts = load_dkpro_core_typesystem(version="2.0.0");
cas.select(ts.get_token_type()).as_text();

or maybe

ts = load_dkpro_core_typesystem(version="2.0.0");
cas.select(ts.token_type).as_text();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it looks nicer. But here with load_dkpro_core_typesystem you mean replacing it with DKProCoreTypeSystem class name and not just use it as alias name. If yes, I'll update notebook like following:

ts = Load_dkpro_core_typesystem(version="2.0.0")
cas.select(ts.token_type).as_text()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me Load_dkpro_core_typesystem(...) would be a utility method, not a constructor - so load_dkpro_core_typesystem(...). If you wanted to use a class, I would think of something like DKProCoreTypesystemLoader().load(version="2.0.0") but that seems a bit clumsy to me - why use a class here if a utility method suffices?

"\n",
"// REC: I wonder which version of the DKProCoreTypeSystem is used here...\n",
"\n",
"// REC: Why is dts() a method? It would seem quite redundant to always have to call this method.\n",
Copy link
Collaborator

@aggarwalpiush aggarwalpiush Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explanation is provided in this comment

"Similar to [DKPro cassis](https://github.com/dkpro/dkpro-cassis), to add manual annotations to cas object, we need to defined it with `typesystem`. For the given text, annotations of Tokens that has an id and pos feature can be added in the following.\n"
"Similar to [DKPro cassis](https://github.com/dkpro/dkpro-cassis), to add manual annotations to cas object, we need to defined it with `typesystem`. For the given text, annotations of Tokens that has an id and pos feature can be added in the following.\n",
"\n",
"**REC**: The `CAS` type in Java UIMA is written with all upper-case letters. I think it would introduce an unnecessary difference to call it `Cas` here.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to pep8 python format. Class names should normally use the CapWords convention. The uppercase convention is used for static variables. So, I am not sure, it would be a great idea to adopt.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java has the same convention. I guess the reason why CAS is all capitals in Java is because it is an abbreviation the long form being "Common Analysis System".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the JCas, the Java implementation also departs from the all capitals. So there are pro and con reasons - I don't have a particularly strong opinion - my suggestion to use CAS is because I think it may make it easier for people if the naming wouldn't arbitrarily differ between implementations. @jcklie

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, I'll update it.

"\n",
"**REC**: The `CAS` type in Java UIMA is written with all upper-case letters. I think it would introduce an unnecessary difference to call it `Cas` here.\n",
"\n",
"**REC**: Why does `CAS(dts().typesystem)()` have two sets of parentheses? That looks very odd."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be replaced the whole line with:

dkpro_typesystem = dts().typesystem or dkpro_typesystem = dts(version="2.0.0").typesystem
Cas(dkpro_typesystem)()

We need parenthesis again to initialize Cas object.

Please provide your feedback on this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dkpro_typesystem = dts(version="2.0.0").typesystem looks better but is still strange. Maybe dkpro_core_typessystem = DKProCore(version="2.0.0").get_typesystem()?

I still don't get it. The CAS object can be constructed using Cas(typesystem=DKProCore(version="2.0.0").get_typesystem()), right? Why put another set of parentheses after it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point, For now, I am using pydkpro Cas class as a wrapper on the top of dkpro-cassis Cas along with some helper functions. So, double parenthesis is used to call pydkpro Cas class method to constructs the dkpro-cassis Cas object. I believe, in that way, we can avoid redundancy in the code. If you have any second opinion, please let me know. I should have mentioned about this in earlier comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would subclassing work? I.e. let the pydkpro CAS inherit from the cassis CAS and let the pydkpro CAS call the constructor from the superclass?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that is the correct way. I'll do the update. Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants