Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for type arguments in pyclass #303

Closed
wants to merge 18 commits into from
Closed

Support for type arguments in pyclass #303

wants to merge 18 commits into from

Conversation

athre0z
Copy link
Contributor

@athre0z athre0z commented Dec 7, 2018

I did some experiments on support for generic structs. With generics being compiler time, we obviously can't leave them fully generic. However, we could allow for registering multiple variants of a struct with different type arguments under different names, e.g.:

#[pyclass]
struct GenericClass<T> {
    foo: T,
}

#[pymodule]
fn rust_py(py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<GenericClass<u32>>("GenericClassU32")?;
    m.add_class::<GenericClass<f32>>("GenericClassF32")?;
    Ok(())
}

We'd either need to make the breaking change of adding a name argument to add_class or add a second method add_generic_class. Also, this clashes with PyTypeInfo having a const NAME property. Judging from a quick grep trough the project, aside from initialize_type, it doesn't appear to be used widely, so it might be an option to just drop it from the trait.

I'm sure there are more issues that I didn't think of, yet. Please let me know if you like the general idea and in case you do, I'd be happy to contrib all changes required to make this work!

@kngwyu
Copy link
Member

kngwyu commented Dec 7, 2018

Great work, thanks 👍
I think the only problem is FromPyObject implementation.
Default implementation of FromPyObject checks only a given python value is an instance of T::type_object or not.
So we should do some dynamic type checking in FromPyObject::extract(like rust numpy does).
But... now I have no good idea to automatically implement this kind of routine 🤔

@athre0z
Copy link
Contributor Author

athre0z commented Dec 7, 2018

Hmm. As a previous C++ dev, it came as a surprise to me that the static variable in the generic trait impl is in fact shared between all instantiations of all types. I did a bit of googling, but it really does appear like there is no way to create per-generic-instantation statics at this point.

In this case, we might have to fall back to lazy allocation of the type object in the PyTypeInfo::type_object impl, using a HashMap<TypeId, PyTypeObject> when type arguments are detected by the proc macro. Do you see any issues with this approach? Also, someone more familiar with the code base might be able to give some insights on whether we'd need to wrap it into a mutex or if we are protected by the GIL here.

**Edit: Mh, that would require either a dep on lazy_static or some pretty unholy unsafe pointer casting trickery (since statics cannot have non-const initializers).

@athre0z
Copy link
Contributor Author

athre0z commented Dec 8, 2018

I now implemented the static HashMap approach discussed above. When generic arguments are detected by the proc macro, instead of using a simple static, this code is emit:

unsafe fn type_object() -> &'static mut ::pyo3::ffi::PyTypeObject {
    use std::collections::hash_map::Entry;
    let mut map = ::pyo3::typeob::PY_TYPE_OBJ_MAP.lock().unwrap();
    let obj = match map.entry(std::any::TypeId::of::<S<T>>()) {
        Entry::Occupied(o) => o.into_mut(),
        Entry::Vacant(v) => v.insert(::pyo3::ffi::PyTypeObject_INIT),
    };
    &mut *(obj as *mut _)
}

This way, every variant of the generic has it's own type object.

It required adding a dep on lazy_static and adding unsafe impl Send for PyTypeObject {}. Would this generally be acceptable? I choose to use a global, shared hash map to prevent the problems with generating code from the macro referring to lazy_static, requiring every user of the lib to depend on lazy_static directly.

Also, generics have to be bound to 'static because std::any::TypeId::of requires it. This requirement could be lifted once non_static_type_id becomes available since we actually don't care about lifetimes, however this cannot be expected to happen anytime soon. However, requiring 'static doesn't appear to be much of a restriction to me, just a little bit of extra typing.

I'll take a look at the required reworks to PyTypeInfo::NAME next and will keep you posted on my status!

Copy link
Member

@konstin konstin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this pull requests! In general I like the approach with the HashMap, just some notes:

lazy_static is fine, that crate is so common I expect it to go into the standard library at some point. For Send for PyTypeObject I'm not sure, I'll have to give that a second look later.

I think it would be better to have one HashMap for every #[pyclass]. It should also be documented somewhere, that using a generic causes a hashmap to be used in the background (maybe in the class chapter of the guide).

Eventually we'll also need a test that check that python considers the types of two instances of the same generic instantiation as equal and those of different generic instantiation as different

pyo3-derive-backend/src/py_class.rs Outdated Show resolved Hide resolved
@kngwyu
Copy link
Member

kngwyu commented Dec 8, 2018

I'd rather apply simpler approach.
How about just not implementing PyTryFrom for generic class?

@konstin
Copy link
Member

konstin commented Dec 8, 2018

That would only allow converting that rust object in to a generic python object, not to create a python of the type e.g. Foo<Bar>. It also wouldn't allow to have fields in rust that are not convertible to python types, but are made available to python through some abstraction (e.g. if you'd implement a database connector in rust, you'd need that).

pyo3-derive-backend/src/py_class.rs Outdated Show resolved Hide resolved
@kngwyu
Copy link
Member

kngwyu commented Dec 8, 2018

That would only allow converting that rust object in to a generic python object, not to create a python of the type e.g. Foo

Ah I didn't think about this case, thanks

It also wouldn't allow to have fields in rust that are not convertible to python types, but are made available to python through some abstraction

I can't imagine concrete use cases... 🤔

Now I think this approach is not bad, but I also think we allow users to override PyTryFrom.
Like numpy's ndarray, I think there's some cases users want to extract Struct<T> according to type flags in Python objects.
But, off course, it should be done in a separated PR.

@kngwyu
Copy link
Member

kngwyu commented Dec 8, 2018

Ah I noticed a very important point.
Type objects of T::PyTypeInfo are initialized here so...
Maybe we should override PyTypeCrete::init_type for generics.

Sorry I think it's not problematic.
Anyway I'm not sure this code works and want some tests

@kngwyu
Copy link
Member

kngwyu commented Dec 8, 2018

Looks like we can't change type object names displayed in python, because they're
set as <T as PyTypeInfo>::NAME> in https://github.com/PyO3/pyo3/blob/master/src/typeob.rs#L320 🤔
We should fix this in the future.

@athre0z
Copy link
Contributor Author

athre0z commented Dec 9, 2018

While starting to rework the code in order remove PyTypeInfo::NAME, I discovered PyTypeCreate and realized that add_class is not the only way to register new types, so passing the name as an argument to add_class is not an option.

I think I might have a better idea that gets rid of both the hash map and the naming issue at the same time. What if, for generic pyclasses, we require users to pass a parameter generic_instantiations/type_instantiations/variants/type_variants (ideas for better names welcome), defining supported type instantiations and a corresponding name and then have the proc macro emit a specialized version of PyTypeInfo for each variant? This kind of specialization is supported in stable rust by the way and doesn't require #![feature(specialization)] which, I believe, you guys are currently working on getting rid of.

Possible syntax:

#[pyclass(variants="MyStructU32<u32>,MyStructF32<f32>")]
struct MyStruct<T> where T: 'static {
    _a: T,
}

Or, using repeated arguments (would probably require some rather substantial reworks in parse_attributes):

#[pyclass(variant="MyStructU32<u32>", variant="MyStructF32<f32>")]
struct MyStruct<T> where T: 'static {
    _a: T,
}

Edit: This syntax is possible as well -- I think this is my preferred one:

#[pyclass(variants("MyStructU32<u32>", "MyStructF32<f32>"))]
struct MyStruct<T> where T: 'static {
    _a: T,
}

@kngwyu
Copy link
Member

kngwyu commented Dec 10, 2018

  • Since the type objects are not freed till the program exits, I think &'static mut PyTypeObject via Box::leak is better than Box
  • How about using Vec<(TypeId, TypeObj)>, instead of HashMap? I think it's faster, because in most cases we don't register so many instances.
  • I don't want to make PyTypeObject sendable. How about creating a sole use wrapper struct? Like
#[doc(hidden)]
pub struct PyTypeObSendable(PyTypeObject)
  • As @konstin commented, we need tests that check our new generic type-objects really work.

@athre0z
Copy link
Contributor Author

athre0z commented Dec 10, 2018

Did you read my previous comment? Using this approach, we'll no longer need a dynamic type object lookup, a Send impl for PyTypeObject or lazy_static at all. In fact, I might not have to change anything at all outside of py_class.rs this way. I'm currently working on an implementation (might take until tomorrow, I'm pretty packed today).

As indicated by the title, this is all rather work in progress and I never expected it to be merged as-is. Instead, the PR serves as a means of communicating with you guys and figuring out how to design this. I'll write some decent tests as the final step.

@kngwyu
Copy link
Member

kngwyu commented Dec 10, 2018

then have the proc macro emit a specialized version of PyTypeInfo for each variant?

You mean in proc macro, we generate a wrapper structs like

struct MyStructU32(MyStructU32)

and implement traits for each one?
If so, I think it's good.

@athre0z
Copy link
Contributor Author

athre0z commented Dec 10, 2018

Nah, something like this:

juabuqvhjdtcgy0x

@kngwyu
Copy link
Member

kngwyu commented Dec 10, 2018

Ah I understand, thanks.
But not sure how to integrate with pymethods

@athre0z
Copy link
Contributor Author

athre0z commented Jan 12, 2019

Alright, the usual end of year / new years madness is over and I found some more time to work in this again. The variants-annotation based approach seems to be fitting the current project design rather well and I got it to work without a lot of deep refactoring. It also doesn't have any run-time overhead compared to just duplicating the class definition with different types manually or using macros.

I implemented pymethods using an approach very similar to what I do with PyTypeInfo for pyclass -- I simply emit one PyMethodsProtocolImpl per variant.

Compared to my initial approach using the HashMap, it has the drawback that new variants can only be defined when the pyclass is defined. However, I don't think the initial approach would be viable with the current overall design of the project -- it would require very deep refactoring and design changes towards a more "centralized" design where every type must be registered (and named) into a registry or something like this and would come at the cost of runtime overhead.

All in all, I think we're better of with the variant based design.

The test file for generic classes gives a general idea on how usage would look like with the current design.

I refactored py_class.rs to use a struct for storing the attributes instead of using a HashMap<String, syn::Expr> due to the requirements of storing class variants that aren't syn::Exprs. IMO, it makes the code more readable as well.

In case you are wondering about https://github.com/athre0z/pyo3/commit/22e4a2225adfecd2754b0252bbe29c41740d1f52: I originally expected that I would have to change substantial amounts of code in this file and started by refactoring out most of the code duplication that made it hard to comprehend ("is the definition for func type X different than type Y" etc) and unified it into a macro. It later turned out that I wouldn't have to change this stuff after all, but I didn't want to revert it because I think it improves code quality anyways. If you want me to extract this commit into a dedicated PR, that's no problem.

Currently, pyclass and pymethods use the same syntax for their variants definition, requiring for unnecessary duplication of the variant name on the pymethods block. I can change the pymethods signature to be type only, if you'd prefer that.

I think we're slowly moving towards something mergable with this. Please let me know what you guys think!

Edit: If you want me to rebase this on master, just say the word.
Edit 2: pyproto will require some reworks as well to support generics. However, since this PR is already pretty big, I'd suggest we leave that out for now and implement & review that in a separate PR.

@athre0z athre0z changed the title Support for type arguments in pyclass (don't merge, work in progress) Support for type arguments in pyclass Jan 12, 2019
@milesgranger
Copy link
Contributor

Nothing constructive to say other than this will be a great new feature to have, thank you!! 👍

) -> TokenStream {
check_generic(name, sig);

let doc = utils::get_doc(&meth_attrs, true);
let spec = FnSpec::parse(name, sig, meth_attrs);

macro_rules! make_py_method_def {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this also be a normal function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but it would require 2 more arguments passed to each invocation of the macro/function (the local macro has access to the locals doc and name without passing them explicitly) and wrapping the fourth argument into a lambda. Do you think this is preferable? I don't really have a preference here.

@konstin
Copy link
Member

konstin commented Feb 1, 2019

The code looks good in general, though there are some merge conflicts due to the rust 2018 migration and I fear I'll cause some more breakages with #332. I also like that we don't need a hashmap any more.

Regarding the syntax of the generics: Do we need to declare all variants in the macro attributes, or would it be possible to have a separate macro to add variants? In both cases, do we need the parentheses around the specific types? (e.g. variants("SimpleGenericU32<u32>", "SimpleGenericF32<f32>") vs variants(SimpleGenericU32<u32>, SimpleGenericF32<f32>))

You should also consider using syn::parse for the macro arguments. It's not used in pyo3 because that part of syn is newer than the macros, but it could make new parsing code much more idiomatic.

Rebasing onto master would be good, but given the current churn I think it's better to just merge master. That should also unblock travis.

@athre0z
Copy link
Contributor Author

athre0z commented Feb 1, 2019

This parsing API looks a lot cleaner indeed. However, I don't think that it would make a lot of sense to just use it for the new variants stuff. I'd rather go ahead and refactor the whole pymethod argument parsing to use this approach, if that is fine with you.

Anyways -- I'll start by merging master now.

Edit: I originally missed this part of your reply:

Regarding the syntax of the generics: Do we need to declare all variants in the macro attributes, or would it be possible to have a separate macro to add variants?

Hm yeah, I thought about that as well. I think that could be possible by having pyclass and pymethods declare just the generic version of all structs and moving all the stuff that that is per-variant (PyTypeInfo impl, inventory::collect, inventory::submit and probably some more things that I didn't think of) into a register_variant macro (which could just be auto-invoked for non-generic types).

@athre0z
Copy link
Contributor Author

athre0z commented Feb 4, 2019

I looked through the required additional changes needed to allow for defining variants outside of the initial definitions and concluded that only a small fraction of what is currently implemented in this PR would fit the new design. Since this PR has way too many commits with discarded changes and discussion regarding old versions already, I'm going to close it here.

Unfortunately, the amount of work and thus time that is still required to bring this to a level where it would be an actually useful addition for the user rather than a fancy way of doing what 5 lines of macros can do is significantly more than I can allocate right now.

I'll give a brief (and likely incomplete) list of things that need to be tackled in order for this feature to land, in case anyone wants to pick up where I left:

  • Forward the class' type arguments & bounds to any helper type generated by PyO3
  • Separate generated code that can remain generic from code that needs to be concrete / per variant (essentially everything directly registered to Python, e.g. methods and the class name)
  • For each generic pyclass and pymethods block, generate a macro_rules macro that can be used to define new variants. This macro would then define the previously separated per-variant things.
    • A second macro would likely be required for generating & registering the variant methods, although it might be possible to generate a third one that just invokes both.
    • Alternate lazy approach: don't separate these things and generate everything per-variant for the concrete type. Likely a lot less work.
  • Look into pyproto and check what reworks are required to allow generics here
    • Generic pyclass and pymethods are limited in usefulness without having generic pyproto as well
  • Write some decent tests and documentation on how to use generic types
    • Essentially, all the features like freelist, setters and getters, etc. would have to be tested again in their generic counterpart

Anyone who is willing to look into this and is unfamiliar with the code base should likely allocate at least two weeks of more or less full time work. This might sound like a lot, but debugging proc macro code is extremely ugly and the majority of PyO3's macro input parsing & code-gen is undocumented and a little chaotic. Perhaps, at some point, it might be a good idea to rewrite it with generics in mind and split the whole stuff into two phases. One for parsing the inputs, storing all requirements in a nice struct and the second one for generating the actual code. IMHO, the mixture of both makes extending the code a rather unpleasant experience.

Perhaps, my previous work here at least gives an idea on how to not design type argument support. :D Feel free to cherry-pick anything you feel is worth keeping!

@athre0z athre0z closed this Feb 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants