Add performance suggestions to docs #2278
Comments
I am surprised by this. Shouldn't both …? Is it possible to show that performance difference using a self-contained reproducer?
I tried the following benchmark:

```diff
diff --git a/benches/bench_frompyobject.rs b/benches/bench_frompyobject.rs
index 8cf8a4da7..ca4b80bb2 100644
--- a/benches/bench_frompyobject.rs
+++ b/benches/bench_frompyobject.rs
@@ -1,6 +1,6 @@
-use criterion::{criterion_group, criterion_main, Bencher, Criterion};
+use criterion::{black_box, criterion_group, criterion_main, Bencher, Criterion};
 
-use pyo3::{prelude::*, types::PyString};
+use pyo3::{prelude::*, types::{PyList, PyString}};
 
 #[derive(FromPyObject)]
 enum ManyTypes {
@@ -18,8 +18,30 @@ fn enum_from_pyobject(b: &mut Bencher<'_>) {
     })
 }
 
+fn list_via_cast_as(b: &mut Bencher<'_>) {
+    Python::with_gil(|py| {
+        let any: &PyAny = PyList::empty(py).into();
+
+        b.iter(|| {
+            let _list: &PyList = black_box(any).cast_as().unwrap();
+        });
+    })
+}
+
+fn list_via_extract(b: &mut Bencher<'_>) {
+    Python::with_gil(|py| {
+        let any: &PyAny = PyList::empty(py).into();
+
+        b.iter(|| {
+            let _list: &PyList = black_box(any).extract().unwrap();
+        });
+    })
+}
+
 fn criterion_benchmark(c: &mut Criterion) {
     c.bench_function("enum_from_pyobject", enum_from_pyobject);
+    c.bench_function("list_via_cast_as", list_via_cast_as);
+    c.bench_function("list_via_extract", list_via_extract);
 }
 
 criterion_group!(benches, criterion_benchmark);
```

and found no measurable difference.
Could it be that something besides using …?
The code is currently closed source to avoid confusion, but if I can make it work it will end up being open source. I'll invite you as a collaborator so you can see what I mean; feel free to ignore the invitation if you prefer.
Since the commits always change multiple things (like …) …

EDIT: Just noticed the PR you opened doing exactly that.
Regardless of the outcome of this issue, I think this is a great idea 👍
So I think the problem is the following: the benchmarked code is basically

```rust
if let Ok(list) = any.extract::<&PyList>() {
    ...
} else {
    ...
}
```

where the `else` branch is always taken. The difference is indeed significant, as shown by the results of the benchmarks given by this diff:

```diff
diff --git a/benches/bench_frompyobject.rs b/benches/bench_frompyobject.rs
index 8cf8a4da7..86d2047df 100644
--- a/benches/bench_frompyobject.rs
+++ b/benches/bench_frompyobject.rs
@@ -1,6 +1,6 @@
-use criterion::{criterion_group, criterion_main, Bencher, Criterion};
+use criterion::{black_box, criterion_group, criterion_main, Bencher, Criterion};
 
-use pyo3::{prelude::*, types::PyString};
+use pyo3::{prelude::*, types::{PyList, PyString}};
 
 #[derive(FromPyObject)]
 enum ManyTypes {
@@ -18,8 +18,52 @@ fn enum_from_pyobject(b: &mut Bencher<'_>) {
     })
 }
 
+fn list_via_cast_as(b: &mut Bencher<'_>) {
+    Python::with_gil(|py| {
+        let any: &PyAny = PyList::empty(py).into();
+
+        b.iter(|| {
+            let _list: &PyList = black_box(any).cast_as().unwrap();
+        });
+    })
+}
+
+fn list_via_extract(b: &mut Bencher<'_>) {
+    Python::with_gil(|py| {
+        let any: &PyAny = PyList::empty(py).into();
+
+        b.iter(|| {
+            let _list: &PyList = black_box(any).extract().unwrap();
+        });
+    })
+}
+
+fn not_a_list_via_cast_as(b: &mut Bencher<'_>) {
+    Python::with_gil(|py| {
+        let any: &PyAny = PyString::new(py, "foobar").into();
+
+        b.iter(|| {
+            black_box(any).cast_as::<PyList>().unwrap_err();
+        });
+    })
+}
+
+fn not_a_list_via_extract(b: &mut Bencher<'_>) {
+    Python::with_gil(|py| {
+        let any: &PyAny = PyString::new(py, "foobar").into();
+
+        b.iter(|| {
+            black_box(any).extract::<&PyList>().unwrap_err();
+        });
+    })
+}
+
 fn criterion_benchmark(c: &mut Criterion) {
     c.bench_function("enum_from_pyobject", enum_from_pyobject);
+    c.bench_function("list_via_cast_as", list_via_cast_as);
+    c.bench_function("list_via_extract", list_via_extract);
+    c.bench_function("not_a_list_via_cast_as", not_a_list_via_cast_as);
+    c.bench_function("not_a_list_via_extract", not_a_list_via_extract);
 }
 
 criterion_group!(benches, criterion_benchmark);
```

This only affects the …

EDIT: I also tried replacing …
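The cost asymmetry can be illustrated without Python. The sketch below is a pure-Rust analogy, not PyO3 code, and every name in it is hypothetical: `&dyn Any` plays the role of `&PyAny`, `downcast_ref` (which returns `None` without allocating) plays `cast_as`, and `extract_vec` plays `extract` by building a formatted error message on every failed conversion. It assumes, in line with the discussion above, that the extra cost of `extract` lies in constructing an error that the `else` branch then discards.

```rust
use std::any::{type_name, Any};

/// Analogy for `cast_as`: a plain type check. On a mismatch it returns
/// `None` and allocates nothing.
fn cast_vec(value: &dyn Any) -> Option<&Vec<i64>> {
    value.downcast_ref::<Vec<i64>>()
}

/// Hypothetical analogy for `extract`: the same type check, but every
/// mismatch builds an owned, formatted error message on the heap.
fn extract_vec(value: &dyn Any) -> Result<&Vec<i64>, String> {
    value
        .downcast_ref::<Vec<i64>>()
        .ok_or_else(|| format!("object cannot be converted to '{}'", type_name::<Vec<i64>>()))
}

/// The benchmarked `if let ... else` pattern with the cheap check: taking
/// the `else` branch allocates nothing for the failed conversion.
fn list_len_cheap(value: &dyn Any) -> Option<usize> {
    if let Some(list) = cast_vec(value) {
        Some(list.len())
    } else {
        None
    }
}

/// The same pattern with the costly check: the error message is built and
/// then immediately thrown away every time the `else` branch is taken.
fn list_len_costly(value: &dyn Any) -> Option<usize> {
    if let Ok(list) = extract_vec(value) {
        Some(list.len())
    } else {
        None
    }
}
```

Both helpers give identical answers; they differ only in the work done when the conversion fails, which would be consistent with a benchmark that always succeeds showing no difference.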
Using something like

```rust
#[derive(FromPyObject)]
enum ListOrDictOrAny<'a> {
    List(&'a PyList),
    Dict(&'a PyDict),
    Any(&'a PyAny),
}
```

also does not help, as the generated implementation is probably also an if-else-if chain of calls to …
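If the derived implementation is indeed such a chain, one workaround is to write the chain by hand with cheap type checks (in PyO3 terms, presumably `cast_as::<PyList>()` and `cast_as::<PyDict>()` on the `&PyAny`). The sketch below shows only the shape of that chain, again as a pure-Rust analogy with hypothetical types: `Vec<i64>` stands in for `PyList`, `HashMap<String, i64>` for `PyDict`, and `&dyn Any` for `&PyAny`.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical analogy of `ListOrDictOrAny`: `Vec<i64>` plays `PyList`,
/// `HashMap<String, i64>` plays `PyDict`, and `&dyn Any` plays `&PyAny`.
enum ListOrDictOrAny<'a> {
    List(&'a Vec<i64>),
    Dict(&'a HashMap<String, i64>),
    Any(&'a dyn Any),
}

/// Hand-written replacement for a derived chain: each arm is a cheap
/// `downcast_ref` check, so falling through to `Any` builds no error values.
fn classify(value: &dyn Any) -> ListOrDictOrAny<'_> {
    if let Some(list) = value.downcast_ref::<Vec<i64>>() {
        ListOrDictOrAny::List(list)
    } else if let Some(dict) = value.downcast_ref::<HashMap<String, i64>>() {
        ListOrDictOrAny::Dict(dict)
    } else {
        ListOrDictOrAny::Any(value)
    }
}

/// Small helper that reports which arm matched, for easy inspection.
fn kind(value: &dyn Any) -> &'static str {
    match classify(value) {
        ListOrDictOrAny::List(_) => "list",
        ListOrDictOrAny::Dict(_) => "dict",
        ListOrDictOrAny::Any(_) => "any",
    }
}
```

The order of the arms matters: the catch-all `Any` arm must come last, just as in the derived enum.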
This is really interesting, thank you so much. Stuff like this scares me a little; the explanation is beyond my knowledge. If I hadn't happened upon this fix, I would probably have abandoned the entire project, assuming the performance gains I had hoped for weren't possible.
Agreed, I've wanted to write some notes along these lines for a long time. I think the main thing holding me back has been lack of time, and I'd kind of been hoping I could make progress in #1308 first (which would reduce a few known PyO3 overheads and likely change a number of performance best practices).
Another point for the performance section: as per pydantic/pydantic-core#501,

```rust
PyErr::new::<CustomError, _>(([args]))
```

is significantly slower than

```rust
let custom_error = CustomError::new([args]);
match Py::new(py, custom_error) {
    Ok(err) => PyErr::from_value(err.into_ref(py)),
    Err(err) => err,
}
```

For some errors, I think because … That change (found by chance) improved some of our benchmarks by 30%.
Superseded by #3310
Hi, thanks so much for pyo3 - it's wonderful, I'm using it a lot (see rtoml and watchfiles; I might be using it in pydantic soon too).
I wonder if it would be worth adding a "Performance Suggestions" or similar section to the docs, perhaps the FAQs?
Examples:
`cast_as` vs `extract`
I spent much of the morning very worried that PyO3 Rust code making heavy use of Python types was actually slower than Python, until by chance I tried `cast_as` instead of `extract`. In my case this change gave a ~4x performance improvement.
Optimised compilation
While investigating the above, I noticed that this SO question referenced in #1470 didn't mention that performance is massively affected by debug vs release compilation. I've added an answer to the question with suggestions.
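One way to make an accidental debug build obvious is to check `cfg!(debug_assertions)`, which Cargo enables by default for the dev profile and disables for `--release` builds. This is a minimal sketch assuming default Cargo profiles; `debug_assertions` is only a proxy for an unoptimised build (a custom profile can change either setting independently), and `debug_build_warning` is a hypothetical helper, not part of PyO3.

```rust
/// Reports which kind of build this code was compiled as.
/// Assumption: default Cargo profiles, where `debug_assertions` is on for
/// `cargo build` and off for `cargo build --release`.
fn build_profile() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Hypothetical helper an extension could call once (for example at import
/// time) to warn that its performance will not be representative.
fn debug_build_warning() -> Option<String> {
    if build_profile() == "debug" {
        Some("compiled without optimisations; benchmark a --release build instead".to_string())
    } else {
        None
    }
}
```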
These are both no doubt very obvious to the maintainers of this library and seasoned users, but to those new to Rust and PyO3 they might be less obvious.
I'm sure there are other things which can have a dramatic effect on performance that I'm not aware of; I'd love to hear any other general suggestions.
Update:
Other things to suggest when this section is written:
`PyString`s rather than Rust strings