-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added optimization for copying arrays of simple types from python to C using buffer protocol #129
Conversation
966901a
to
841dfdb
Compare
PyObject * item = PySequence_Fast_GET_ITEM(seq_field, i); | ||
if (!item) { | ||
Py_DECREF(seq_field); | ||
@[ if isinstance(member.type, AbstractSequence) and isinstance(member.type.value_type, BasicType)]@ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 251-280 is a new code. Lines 281-430 is the old code with an additional level of indentation, used as a backup if PyObject_CheckBuffer call fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a note for interested parties - github diff view has "Ignore whitespace changes" under the "settings gear" which makes the diff a lot easier to look at in this case
Hi, I can confirm that this improves the publishing performance and also fixes ros2/rclpy#763 (tested by applying the changes to the foxy branch). Thank you @ksuszka this is really great! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left a couple of comments for consideration.
PyErr_SetString(PyExc_RuntimeError, "unable to get buffer"); | ||
Py_DECREF(field); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I'm wondering whether we should fall back to the "slow" method if PyObject_GetBuffer
fails. It's not clear to me in which situations it would fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quote from reference (https://docs.python.org/3/c-api/buffer.html#c.PyObject_GetBuffer): "Send a request to exporter to fill in view as specified by flags. If the exporter cannot provide a buffer of the exact type, it MUST raise PyExc_BufferError, set view->obj to NULL and return -1.".
So, if I understand correctly, in theory it could happen if python structure we try to access is some complex, multidimensional array and we ask for some other shape. But we ask for PyBUF_SIMPLE (https://docs.python.org/3/c-api/buffer.html#shape-strides-suboffsets), so for the simplest possible structure. So it probably will never fail.
However, I'm not a python expert so I'm guessing here.
EDIT: Also, we already check on template level that we are handling AbstractSequence of BasicTypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: Also, we already check on template level that we are handling AbstractSequence of BasicTypes.
This part doesn't matter, I think. At runtime, if the __convert_from_py
function was handed a message that had an attribute for data that we thought should be a buffer, but wasn't, we still might fail here.
But in that case the message would have been not what we were expecting anyway, so any decoding would probably eventually fail. I'm OK with not falling back in this case.
PyErr_SetString(PyExc_RuntimeError, "unable to copy buffer"); | ||
PyBuffer_Release(&view); | ||
Py_DECREF(field); | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here about whether we should fall back to the slow method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quote from reference (https://docs.python.org/3/c-api/buffer.html#c.PyBuffer_ToContiguous): "This function fails if len != src->len.".
As I undrestand it - as we just literally took the value from the src it should never happen. However, just for any future possible change in the bahaviour I would left this check, just in case.
0f1d815
to
930c672
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, and should be a great speedup on Python. I'd like to get another review from @sloretz (and CI) before we move forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great improvement! I just a couple nitpicks in the error handling.
Also, mind rebasing this on top of master? That will help with running CI.
This is a fantastic contribution! It really makes a big difference for arrays of numeric types. I checked that the non-buffer API case works, but it isn't that important because the setters for array fields set the field to an Test script#!/usr/bin/python3
import array
import builtin_interfaces.msg
import collections
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile
from test_msgs.msg import UnboundedSequences
import sys
class NoBuffer(collections.Sequence):
def __init__(self, seq):
self._seq = seq
def __getitem__(self, *args, **kwargs):
return self._seq.__getitem__(*args, **kwargs)
def __len__(self):
return len(self._seq)
class SelfCycler(Node):
"""Publish large messages to itself endlessly."""
def __init__(self):
super().__init__("self_cycle_py")
pub_qos = QoSProfile(depth=10)
sub_qos = QoSProfile(depth=10)
self._sub = self.create_subscription(UnboundedSequences, topic="seq_py", callback=self._handle_msg, qos_profile=sub_qos)
self._pub = self.create_publisher(UnboundedSequences, topic="seq_py", qos_profile=pub_qos)
self._timer = self.create_timer(0.1, self.publish_msg)
def publish_msg(self):
num_items = 327680
# BROKEN :(
# self._publish_msg_attr("byte_values", b'\xaa' * num_items)
self._publish_msg_attr("bool_values", NoBuffer((True,) * num_items))
self._publish_msg_attr("byte_values", NoBuffer((b'\xaa',) * num_items))
self._publish_msg_attr("char_values", NoBuffer((234,) * num_items))
self._publish_msg_attr("float32_values", NoBuffer((3.14,) * num_items))
self._publish_msg_attr("float64_values", NoBuffer((6.28,) * num_items))
self._publish_msg_attr("int8_values", NoBuffer((127,) * num_items))
self._publish_msg_attr("uint8_values", NoBuffer((235,) * num_items))
self._publish_msg_attr("int16_values", NoBuffer((1234,) * num_items))
self._publish_msg_attr("uint16_values", NoBuffer((17000,) * num_items))
self._publish_msg_attr("int32_values", NoBuffer((100123,) * num_items))
self._publish_msg_attr("uint32_values", NoBuffer((3000123,) * num_items))
self._publish_msg_attr("int64_values", NoBuffer((42,) * num_items))
self._publish_msg_attr("uint64_values", NoBuffer((84,) * num_items))
self._publish_msg_attr("string_values", NoBuffer(("hi",) * num_items))
self._publish_msg_attr("bool_values", (True,) * num_items)
self._publish_msg_attr("byte_values", (b'\xaa',) * num_items)
self._publish_msg_attr("char_values", (234,) * num_items)
self._publish_msg_attr("float32_values", (3.14,) * num_items)
self._publish_msg_attr("float64_values", (6.28,) * num_items)
self._publish_msg_attr("int8_values", (127,) * num_items)
self._publish_msg_attr("uint8_values", (235,) * num_items)
self._publish_msg_attr("int16_values", (1234,) * num_items)
self._publish_msg_attr("uint16_values", (17000,) * num_items)
self._publish_msg_attr("int32_values", (100123,) * num_items)
self._publish_msg_attr("uint32_values", (3000123,) * num_items)
self._publish_msg_attr("int64_values", (42,) * num_items)
self._publish_msg_attr("uint64_values", (84,) * num_items)
self._publish_msg_attr("string_values", ("hi",) * num_items)
self._publish_msg_attr("bool_values", [True,] * num_items)
self._publish_msg_attr("byte_values", [b'\xaa',] * num_items)
self._publish_msg_attr("char_values", [234,] * num_items)
self._publish_msg_attr("float32_values", [3.14,] * num_items)
self._publish_msg_attr("float64_values", [6.28,] * num_items)
self._publish_msg_attr("int8_values", [127,] * num_items)
self._publish_msg_attr("uint8_values", [235,] * num_items)
self._publish_msg_attr("int16_values", [1234,] * num_items)
self._publish_msg_attr("uint16_values", [17000,] * num_items)
self._publish_msg_attr("int32_values", [100123,] * num_items)
self._publish_msg_attr("uint32_values", [3000123,] * num_items)
self._publish_msg_attr("int64_values", [42,] * num_items)
self._publish_msg_attr("uint64_values", [84,] * num_items)
self._publish_msg_attr("string_values", ["hi",] * num_items)
self._publish_msg_attr("char_values", array.array('B', [234,] * num_items))
self._publish_msg_attr("float32_values", array.array('f', [3.14,] * num_items))
self._publish_msg_attr("float64_values", array.array('d', [6.28,] * num_items))
self._publish_msg_attr("int8_values", array.array('b', [127,] * num_items))
self._publish_msg_attr("uint8_values", array.array('B', [235,] * num_items))
self._publish_msg_attr("int16_values", array.array('h', [1234,] * num_items))
self._publish_msg_attr("uint16_values", array.array('H', [17000,] * num_items))
# BROKEN :(
# self._publish_msg_attr("int32_values", array.array('l', [100123,] * num_items))
# self._publish_msg_attr("uint32_values", array.array('L', [3000123,] * num_items))
# self._publish_msg_attr("int64_values", array.array('q', [42,] * num_items))
# self._publish_msg_attr("uint64_values", array.array('Q', [84,] * num_items))
sys.exit()
def _publish_msg_attr(self, msg_attr, value):
msg = UnboundedSequences()
setattr(msg, msg_attr, value)
t1 = self.get_clock().now().to_msg()
self._pub.publish(msg)
t2 = self.get_clock().now().to_msg()
self.get_logger().info(f"{msg_attr} {type(value)} took {self.format_dt(t1, t2)}")
def format_dt(self, t1: builtin_interfaces.msg.Time, t2: builtin_interfaces.msg.Time):
""" Helper for formatting the difference between two stamps in microseconds """
us1 = t1.sec * 1e6 + t1.nanosec // 1e3
us2 = t2.sec * 1e6 + t2.nanosec // 1e3
return f"{int(us2 - us1):5} [us]"
def _handle_msg(self, msg):
pass # Nothing to do here
def main(args=None):
rclpy.init()
node = SelfCycler()
try:
rclpy.spin(node)
finally:
node.destroy_node()
if __name__ == '__main__':
main() Results on master
Results with this PR
|
Signed-off-by: ksuszka <[email protected]>
Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Krzysztof Suszka <[email protected]>
930c672
to
c727729
Compare
Signed-off-by: Krzysztof Suszka <[email protected]>
d637f31
to
0510135
Compare
CI LGTM, the warnings on windows are also in the nightly https://ci.ros2.org/view/nightly/job/nightly_win_rel/1961/msbuild/new/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the PR!
@Mergifyio backport galactic |
…C using buffer protocol (#129) * Added optimization for copying arrays of simple types from python to C Signed-off-by: ksuszka <[email protected]> * Apply suggestions from code review Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Krzysztof Suszka <[email protected]> * Remove setting error message twice Signed-off-by: Krzysztof Suszka <[email protected]> Co-authored-by: Chris Lalancette <[email protected]> (cherry picked from commit 1796dfd)
Command
|
…C using buffer protocol (#129) (#145) * Added optimization for copying arrays of simple types from python to C Signed-off-by: ksuszka <[email protected]> * Apply suggestions from code review Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Krzysztof Suszka <[email protected]> * Remove setting error message twice Signed-off-by: Krzysztof Suszka <[email protected]> Co-authored-by: Chris Lalancette <[email protected]> (cherry picked from commit 1796dfd) Co-authored-by: ksuszka <[email protected]>
…C using buffer protocol (ros2#129) * Added optimization for copying arrays of simple types from python to C Signed-off-by: ksuszka <[email protected]> * Apply suggestions from code review Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Krzysztof Suszka <[email protected]> * Remove setting error message twice Signed-off-by: Krzysztof Suszka <[email protected]> Co-authored-by: Chris Lalancette <[email protected]>
…C using buffer protocol (ros2#129) * Added optimization for copying arrays of simple types from python to C Signed-off-by: ksuszka <[email protected]> * Apply suggestions from code review Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Krzysztof Suszka <[email protected]> * Remove setting error message twice Signed-off-by: Krzysztof Suszka <[email protected]> Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Aleksandr Rozhdestvenskii <[email protected]>
@Mergifyio backport foxy |
…C using buffer protocol (#129) * Added optimization for copying arrays of simple types from python to C Signed-off-by: ksuszka <[email protected]> * Apply suggestions from code review Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Krzysztof Suszka <[email protected]> * Remove setting error message twice Signed-off-by: Krzysztof Suszka <[email protected]> Co-authored-by: Chris Lalancette <[email protected]> (cherry picked from commit 1796dfd)
✅ Backports have been created
|
…C using buffer protocol (#129) (#146) * Added optimization for copying arrays of simple types from python to C Signed-off-by: ksuszka <[email protected]> * Apply suggestions from code review Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Krzysztof Suszka <[email protected]> * Remove setting error message twice Signed-off-by: Krzysztof Suszka <[email protected]> Co-authored-by: Chris Lalancette <[email protected]> Signed-off-by: Aleksandr Rozhdestvenskii <[email protected]> Co-authored-by: ksuszka <[email protected]> Co-authored-by: Chris Lalancette <[email protected]>
Does this also fix potential issues with CPP to python conversion? I have a CPP service that sends data in the form of |
Possibly, are you on Foxy? This patch hasn't been released in Foxy yet, but you could try to compare against Galactic and Rolling if possible. |
I am trying on Galactic (built from source) |
This is rather naive approach to improve perfomance of copying large amount of data from python to C.
I didn't remove old code, but instead I've added a check if data can be accessed using buffer protocol, and if the answer is yes then a sequence is copied in single step, if the answer is no, then fallback to previous method is used.
It seems to work for ROS2 version of carla_ros_bridge where we discovered this issues. It seems to be the same issue as described in ros2/rclpy#763