Add the `EnumData` data plugin to easily store `Enum` instances #5225
@giovannipizzi here is an example implementation of what we discussed.
Codecov Report
@@ Coverage Diff @@
## develop #5225 +/- ##
===========================================
+ Coverage 75.20% 81.27% +6.07%
===========================================
Files 533 534 +1
Lines 37377 37423 +46
===========================================
+ Hits 28107 30410 +2303
+ Misses 9270 7013 -2257
Thanks!
Do I understand correctly that to get the actual value, one would need to do `node.value.value`? It's a bit convoluted (but I guess that, if we want to use this implementation, there's not much to improve here?)
One thing I don't like too much is that `.value` is a property that raises if the Python definition is not there (very common: people just import Data nodes and might not have the Python implementation that comes with them). I think in most cases, just having the underlying enum value is enough.
Suggestion:
- have a `raw_value` or `enum_value` property on the class that just returns `self.get_attribute(self.KEY_VALUE)` and never raises
- if we believe that `.value` should return the actual Enum, replace `.value` with an actual method `.get_enum()`. I know this contradicts the base classes, but e.g. this is not JSON-serializable (the value might be, though). So it's probably OK not to consider this a base class, and to have an explicit method if you really want to reconstruct a class that might not be there (like `StructureData.get_ase()`).
Actually, thinking about it more (and about my first point), maybe I would just have `.value` return the value of the Enum, i.e. this replaces what I was suggesting to call `.raw_value` above?
So you can do `node.value` and this would work like `Enum.value`, which is quite intuitive and also avoids having to do `node.value.value`. And if you really care about the Enum, you do `node.get_enum()`.
aiida/orm/nodes/data/enum.py
@@ -0,0 +1,64 @@
# -*- coding: utf-8 -*-
"""Data plugin that allows to easily wrap an :class:`enum.Enum` instance."""
import enum
Is there any remote risk that this tries to import itself? In this case, should we rename the module?
Not sure, but instead of renaming the module, I have changed the import to `from enum import Enum`. That should be fine, no?
Thanks for the feedback @giovannipizzi . I agree that |
9f89e31
to
680f065
Compare
Do you mind me asking what is the use case for this? Initially I would have thought that the natural thing is to store a whole |
As any other use case for an
This wouldn't make sense to me honestly. Sure, like this you can always load the node and deserialize the actual member, even if the original class was modified or not even available anymore. But the whole purpose of this plugin is to allow actually using it inside a process and in that case the enum has to be available in the environment or the code wouldn't work anyway. So serializing the entire class and storing that on the node won't help you there. |
Thanks! Just a minor thing to fix in the docstring. I'm pre-approving, but it would be great if you could fix it before merging.
aiida/orm/nodes/data/enum.py
The class ``Color`` is an enumeration (or enum). The attributes ``Color.RED`` and ``Color.GREEN`` are enumeration
members (or enum members) and are functionally constants. The enum members have names and values: the name of
``Color.RED`` is ``RED`` and the value of ``Color.BLUE`` is ``3``.
`Color.BLUE` is not in the example above
An `Enum` member is represented by three attributes in the database:
* `name`: the member's name
* `value`: the member's value
* `identifier`: the string representation of the enum's identifier
The original enum member can be obtained from the node through the `get_member` method. This will except if the enum class can no longer be imported or the stored value is no longer a valid enum value. This can be the case if the implementation of the class changed since the original node was stored.
The plugin is named `EnumData`, and not `Enum`, because the latter would overlap with the Python built-in class `Enum` from the `enum` module. The downside is that this differs from the naming of other Python base type analogs, such as `Str` and `Float`, which simply use a capitalized version of the type for the data plugin class.
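A rough sketch of the round trip described in this commit message, using plain dictionaries instead of a real node. The helper names `serialize_member` and `load_member` and the `module:QualName` identifier format are assumptions for illustration only. The sketch shows the two failure modes: the class can no longer be imported, or the stored value is no longer a valid member value.

```python
import importlib
from enum import Enum
from http import HTTPStatus


def serialize_member(member: Enum) -> dict:
    """Return the three attributes that would be stored (hypothetical helper)."""
    cls = type(member)
    return {
        'name': member.name,
        'value': member.value,
        # Assumed identifier format: '<module>:<class name>'.
        'identifier': f'{cls.__module__}:{cls.__qualname__}',
    }


def load_member(attributes: dict) -> Enum:
    """Rebuild the member from stored attributes (hypothetical helper)."""
    module_name, _, class_name = attributes['identifier'].partition(':')
    try:
        enum_class = getattr(importlib.import_module(module_name), class_name)
    except (ImportError, AttributeError) as exc:
        raise ImportError(f"cannot import enum class {attributes['identifier']!r}") from exc
    # Raises ValueError if the stored value is no longer a valid member value.
    return enum_class(attributes['value'])


attributes = serialize_member(HTTPStatus.NOT_FOUND)
restored = load_member(attributes)

# Failure mode 1: the enum class can no longer be imported.
missing = dict(attributes, identifier='no_such_module_for_sketch:Gone')
try:
    load_member(missing)
    missing_error = None
except ImportError as exc:
    missing_error = exc

# Failure mode 2: the class exists but no longer defines the stored value.
stale = dict(attributes, value=-1)
try:
    load_member(stale)
    stale_error = None
except ValueError as exc:
    stale_error = exc
```

In both failure cases the stored `name` and `value` entries remain readable, since they do not depend on the class being importable.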
680f065
to
34cab28
Compare
Eurgh, @sphuber I very much agree with @ramirezfranciscof, it is fragile; I don't know how he feels about it, but I don't think you adequately answered his questions.
In [1]: from plumpy.loaders import get_object_loader
In [2]: import enum
In [3]: class DummyEnum(enum.Enum):
   ...:     """Dummy enum for testing."""
   ...:
   ...:     OPTION_A = 'a'
   ...:     OPTION_B = 'b'
   ...:
In [4]: get_object_loader().identify_object(DummyEnum)
Out[4]: '__main__:DummyEnum'
This is what will be stored under
A few more notes:
A possible more general approach for a data type that stores "validated" fields would be to store them in some way alongside a jsonschema (I already kind of do this in some of my plugins), and enum would be a subset of such a data type (https://json-schema.org/understanding-json-schema/reference/generic.html?highlight=enum#enumerated-values):
{
    "enum": ["red", "amber", "green"]
}
Naturally, you would want to deduplicate these schemas: I guess the easiest way currently would be to save them in the new repository.
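A rough illustration of that idea, under stated assumptions: the sketch below checks a value against the `enum` keyword of a JSON schema with a plain membership test (a real implementation would more likely use a full schema validator) and deduplicates schemas by hashing their canonical JSON form. `SchemaStore`, `store_value` and the record layout are hypothetical names, and the in-memory dictionary merely stands in for the repository.

```python
import hashlib
import json


class SchemaStore:
    """Hypothetical in-memory store that keeps each distinct schema once."""

    def __init__(self):
        self._schemas = {}

    def add(self, schema: dict) -> str:
        # Canonical JSON (sorted keys) so that equal schemas hash identically.
        canonical = json.dumps(schema, sort_keys=True, separators=(',', ':'))
        digest = hashlib.sha256(canonical.encode('utf-8')).hexdigest()
        self._schemas.setdefault(digest, schema)
        return digest

    def get(self, digest: str) -> dict:
        return self._schemas[digest]

    def __len__(self) -> int:
        return len(self._schemas)


def store_value(value, schema: dict, store: SchemaStore) -> dict:
    """Check ``value`` against the schema's ``enum`` keyword and build a record."""
    if 'enum' in schema and value not in schema['enum']:
        raise ValueError(f'{value!r} is not one of {schema["enum"]}')
    # The record references the schema by digest instead of embedding it.
    return {'value': value, 'schema': store.add(schema)}


store = SchemaStore()
first = store_value('red', {'enum': ['red', 'amber', 'green']}, store)
second = store_value('green', {'enum': ['red', 'amber', 'green']}, store)
```

Both records point at the same schema digest, so the schema itself is kept only once regardless of how many values reference it.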
I clearly recognized this problem and the class deals with this "properly", as in the best way possible, by saying it cannot import the class. Note that this does not necessarily stop you from still using the node, the original name and value are also stored and can be retrieved from the
But my main point is, even if you were to store the entire representation, how would you then actually use it in code? The way you would use this is something like the following:
from aiida.engine import Process
from aiida.orm import EnumData
from some.module import SomeEnum

class SomeProcess(Process):

    @classmethod
    def define(cls, spec):
        super().define(spec)
        spec.input('some_enum', valid_type=EnumData)

    def run(self):
        if self.inputs.some_enum == SomeEnum.VALUE_ONE:
            pass  # do something
        elif self.inputs.some_enum == SomeEnum.VALUE_TWO:
            pass  # do something else
So in this example, if you presume that the
My point is that the example you provide of declaring an enum in a shell and then storing a member with the
yes exactly, as I've already said, it prioritises transient use over long term storage, and is fragile to changes in module code. |
@sphuber perhaps we could set up a meeting to discuss this live? Then you can explain to us in a bit more detail the use case you had in mind when you discussed this with @giovannipizzi, and we might understand it better and/or brainstorm some ideas that could make the information stored here a bit more robust. We could carve out a couple of minutes during the coding week so as not to take more time from your regular schedule.
Fixes #5224
The `Enum` instance is represented by two attributes in the database:
* `value`: the enum value
* `identifier`: the string representation of the enum's class
The original enum instance can be obtained from the node through the `value` property. This will except if the enum class can no longer be imported or the stored value is no longer a valid enum value. This can be the case if the implementation of the class changed since the original node was stored.
The plugin is named `EnumData`, and not `Enum`, because the latter would overlap with the Python built-in class `Enum` from the `enum` module. The downside is that this differs from the naming of other Python base type analogs, such as `Str` and `Float`, which simply use a capitalized version of the type for the data plugin class.