-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments on the PyDKPro examples notebook #14
base: master
Are you sure you want to change the base?
Conversation
I made a few small changes to the descriptive texts in the notebook that are not marked in any particular way. My comments are marked with |
"p = Pipeline(version=\"2.0.0\", language='en')\n", | ||
"p.add(Component().clearNlpSegmenter())\n", | ||
"\n", | ||
"// REC: Is printTagSet not false by default?\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added printTagset to demonstrate "how to pass the parameters" But you are right, default value is false
. I'll update it to True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw. considering that ClearNLP is no longer maintained, maybe use other components for the examples - e.g. CoreNLP or OpenNLP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok Sure!! Thanks for noticing this.
"p.add(Component().stanfordPosTagger(variant='fast-caseless', printTagSet='false'))\n", | ||
"\n", | ||
"// If it has a build method, maybe it should be a PipelineBuilder? Also, the build method should return something,\n", | ||
"// i.e. the actual pipeline which maybe can later be used to shut the pipeline down?\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure about this comment. Could you please elaborate on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cf. https://dzone.com/articles/design-patterns-the-builder-pattern
If something has a build()
method, it looks like a builder pattern. A builder does not return itself in the build()
method but rather the object it was used to construct. In your case, the code looks like you want to have a class PipelineBuilder
whose build()
method eventually produces the actual readily-configured Pipeline
which then can be used to process stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this pattern. Yes, as soon as the backend gets ready, we will update like this.
"cas.select(dts().token).as_text() " | ||
"cas.select(dts().token).as_text() \n", | ||
"\n", | ||
"// REC: I wonder which version of the DKProCoreTypeSystem is used here...\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is good point, here dts()
is DKPro type system class object call which load DKPro Core type systems (similar to DKPro Core Integration in dkpro-cassis
wrapper) with latest version and return as CAS object. I believe we should use it to automatically load the type systems. Version of typesystem can also passed while calling this object
from pydkpro import DKProCoreTypeSystem as dts
ts_token = dts(version="2.0.0").token
cas.select(ts_token).as_text()
Please provide your feedback, if you like or dislike above way of representation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is an object, it should start with a Capital letter, no?
If it is a utility method, maybe load_dkpro_core_typesystem(version="2.0.0")
would make it clearer.
Personally, I would find load_dkpro_core_typesystem(version="2.0.0").get_token_type()
quite understandable.
ts = load_dkpro_core_typesystem(version="2.0.0");
cas.select(ts.get_token_type()).as_text();
or maybe
ts = load_dkpro_core_typesystem(version="2.0.0");
cas.select(ts.token_type).as_text();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it looks nicer. But here with load_dkpro_core_typesystem
you mean replacing it with DKProCoreTypeSystem
class name and not just use it as alias name. If yes, I'll update notebook like following:
ts = Load_dkpro_core_typesystem(version="2.0.0")
cas.select(ts.token_type).as_text()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me Load_dkpro_core_typesystem(...)
would be a utility method, not a constructor - so load_dkpro_core_typesystem(...)
. If you wanted to use a class, I would think of something like DKProCoreTypesystemLoader().load(version="2.0.0")
but that seems a bit clumsy to me - why use a class here if a utility method suffices?
"\n", | ||
"// REC: I wonder which version of the DKProCoreTypeSystem is used here...\n", | ||
"\n", | ||
"// REC: Why is dts() a method? It would seem quite redundant to always have to call this method.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explanation is provided in this comment
"Similar to [DKPro cassis](https://github.com/dkpro/dkpro-cassis), to add manual annotations to cas object, we need to defined it with `typesystem`. For the given text, annotations of Tokens that has an id and pos feature can be added in the following.\n" | ||
"Similar to [DKPro cassis](https://github.com/dkpro/dkpro-cassis), to add manual annotations to cas object, we need to defined it with `typesystem`. For the given text, annotations of Tokens that has an id and pos feature can be added in the following.\n", | ||
"\n", | ||
"**REC**: The `CAS` type in Java UIMA is written with all upper-case letters. I think it would introduce an unnecessary difference to call it `Cas` here.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to pep8 python format. Class names should normally use the CapWords convention. The uppercase convention is used for static variables. So, I am not sure, it would be a great idea to adopt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java has the same convention. I guess the reason why CAS is all capitals in Java is because it is an abbreviation the long form being "Common Analysis System".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the JCas, the Java implementation also departs from the all capitals. So there are pro and con reasons - I don't have a particularly strong opinion - my suggestion to use CAS
is because I think it may make it easier for people if the naming wouldn't arbitrarily differ between implementations. @jcklie
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, I'll update it.
"\n", | ||
"**REC**: The `CAS` type in Java UIMA is written with all upper-case letters. I think it would introduce an unnecessary difference to call it `Cas` here.\n", | ||
"\n", | ||
"**REC**: Why does `CAS(dts().typesystem)()` have two sets of parentheses? That looks very odd." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be replaced the whole line with:
dkpro_typesystem = dts().typesystem
or dkpro_typesystem = dts(version="2.0.0").typesystem
Cas(dkpro_typesystem)()
We need parenthesis again to initialize Cas object.
Please provide your feedback on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dkpro_typesystem = dts(version="2.0.0").typesystem
looks better but is still strange. Maybe dkpro_core_typessystem = DKProCore(version="2.0.0").get_typesystem()
?
I still don't get it. The CAS object can be constructed using Cas(typesystem=DKProCore(version="2.0.0").get_typesystem())
, right? Why put another set of parentheses after it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point, For now, I am using pydkpro Cas class as a wrapper on the top of dkpro-cassis Cas along with some helper functions. So, double parenthesis is used to call pydkpro Cas class method to constructs the dkpro-cassis Cas object. I believe, in that way, we can avoid redundancy in the code. If you have any second opinion, please let me know. I should have mentioned about this in earlier comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would subclassing work? I.e. let the pydkpro CAS inherit from the cassis CAS and let the pydkpro CAS call the constructor from the superclass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that is the correct way. I'll do the update. Thanks!!
This is not strictly speaking a PR - rather a way of providing my comments on the examples notebook. I hope this works for you.
@jcklie I also requested a review from you since I believe at least one comment may apply to Cassis - i.e. the one about
cas.add_annotation()
.