Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make serde pickle version configurable #160

Closed
wants to merge 22 commits into from
Closed
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
170 changes: 170 additions & 0 deletions pymemcache/codecs.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
# Copyright 2012 Pinterest.com
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why rename the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To provide backwards compatibility

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Without polluting the primary namespace)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also a better name.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine.

#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import abc
import logging
from io import BytesIO
import six
from six.moves import cPickle as pickle

log = logging.getLogger(__name__)

try:
long_type = long # noqa
except NameError:
long_type = None


@six.add_metaclass(abc.ABCMeta)
class ICodec(object):
"""
Interface for serializers.
"""

@abc.abstractmethod
def from_python(self, key, value):
"""
Serialize a python object.

:param str|unicode key: Key
:param str|unicode value: Value
:return tuple[str, int]: tuple(value, flags)
"""
raise NotImplementedError()

@abc.abstractmethod
def to_python(self, key, value, flags):
"""
Deserialize a value into a python object.

:param str|unicode key: Key
:param str|unicode value: Value
:param int flags: Bitflag containing flags used to specify how to
deserialize this object.
:return object: Deserialized python object.
"""
raise NotImplementedError()


class Serde(ICodec):
"""
Serialization handler.

Meant to be compatible with `python-memcached`.
"""

FLAG_BYTES = 0
FLAG_PICKLE = 1 << 0
FLAG_INTEGER = 1 << 1
FLAG_LONG = 1 << 2
FLAG_COMPRESSED = 1 << 3 # unused, to main compatibility with python-memcached
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

./pymemcache/codecs.py:71:81: E501 line too long (83 > 80 characters)
    FLAG_COMPRESSED = 1 << 3  # unused, to main compatibility with python-memcached

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll conform to VT100 in an update

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

FLAG_TEXT = 1 << 4

pickle_version = 0

def __init__(self, pickle_version=0):
"""
Init

:param int pickle_version: Pickle version to use (from python).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not the pickle version but rather the pickle protocol version

https://docs.python.org/3/library/pickle.html#data-stream-format

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Use `-1` to use the highest supported at runtime.
Deserialization is not affected by this parameter.

A forewarning with `0` (the default): If somewhere in your value lies
a slotted object, ie defines `__slots__`, even if you do not include
it in your pickleable state via `__getstate__`, python will raise:
```
TypeError: a class that defines __slots__ without defining
__getstate__ cannot be pickled
```
"""
if pickle_version is not None:
self.pickle_version = pickle_version

def from_python(self, key, value):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about call this function serialize() (here and in metaclass)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it because in my experience serialize and deserialize are not the best names. Using a common anchor it seems to stick with more juniors easier and avoids headache. Still, I can update if you want, but I'd recommend not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"""
Serialize a python object.

:param str|unicode key: Key
:param str|unicode value: Value
:return tuple[str, int]: tuple(value, flags)
"""
flags = 0
value_type = type(value)

# Check against exact types so that subclasses of native types will be
# restored as their native type
if value_type is bytes:
pass

elif value_type is six.text_type:
flags |= self.FLAG_TEXT
value = value.encode('utf8')

elif value_type is int:
flags |= self.FLAG_INTEGER
value = "%d" % value

elif six.PY2 and value_type is long_type:
flags |= self.FLAG_LONG
value = "%d" % value

else:
flags |= self.FLAG_PICKLE

output = BytesIO()
pickler = pickle.Pickler(output, self.pickle_version)
pickler.dump(value)
value = output.getvalue()

return value, flags

def to_python(self, key, value, flags):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here with deserialize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

"""
Deserialize a value into a python object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the doc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicochar, anytime for you bud.


:param str|unicode key: Key
:param str|unicode value: Value
:param int flags: Bitflag containing flags used to specify how to
deserialize this object.
:return object: Deserialized python object.
"""
if flags == 0:
return value

elif flags & self.FLAG_TEXT:
return value.decode('utf8')

elif flags & self.FLAG_INTEGER:
return int(value)

elif flags & self.FLAG_LONG:
if six.PY3:
return int(value)
else:
return long_type(value)

elif flags & self.FLAG_PICKLE:
try:
buf = BytesIO(value)
unpickler = pickle.Unpickler(buf)
return unpickler.load()
except Exception as exc:
# This includes exc as a string for troubleshooting as well as
# providing a trace.
log.exception('Could not depickle value (len=%d): %s',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear why this is better:

import logging
try:
    raise Exception("raised")
except Exception as exc:
    logging.exception("TEST: %s", exc)


try:
    raise Exception("raised")
except Exception:
    logging.exception("TEST")
ERROR:root:TEST: raised
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    raise Exception("raised")
Exception: raised
ERROR:root:TEST
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    raise Exception("raised")
Exception: raised

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because from the initial line, you know what the exc was that caused it. This helps when an exception storm hits and you have N of them in your log interspersed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging entries are usually configured with a better format than the default that you used, which contain things such as the process and thread ID.

len(value), exc)
return None

return value
104 changes: 8 additions & 96 deletions pymemcache/serde.py
Original file line number Diff line number Diff line change
@@ -1,98 +1,10 @@
# Copyright 2012 Pinterest.com
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Backwards compatibility with the older serialization api previously
provided by this module.
"""
from . import codecs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to use absolute imports here.

https://www.python.org/dev/peps/pep-0008/#imports

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 allows relative imports. Is there a reason to not use them and opt for the less featureful alternative?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep it consistent with the rest of the project, which uses absolute imports. Both positions are defendable but I think there's a strong argument for consistency

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, consistency > opinions


import logging
from io import BytesIO
import six
from six.moves import cPickle as pickle
_SERDE = codecs.Serde()

try:
long_type = long # noqa
except NameError:
long_type = None


FLAG_BYTES = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although I don't think removing these from this file will be a problem it is technically a backwards incompatible change and don't want to break backwards compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO These should have been private to begin with, plus can't you just bump the major? I'll import them into serde.py if you want though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand why this is backward incompatible? I think this is OK. Am I missing something @jogo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already imported the API methods, and I don't think anyone should really be using constants in an arbitrary location in a library.

FLAG_PICKLE = 1 << 0
FLAG_INTEGER = 1 << 1
FLAG_LONG = 1 << 2
FLAG_COMPRESSED = 1 << 3 # unused, to main compatability with python-memcached
FLAG_TEXT = 1 << 4

# Pickle protocol version (-1 for highest available to runtime)
# Warning with `0`: If somewhere in your value lies a slotted object,
# ie defines `__slots__`, even if you do not include it in your pickleable
# state via `__getstate__`, python will complain with something like:
# TypeError: a class that defines __slots__ without defining __getstate__
# cannot be pickled
PICKLE_VERSION = -1


def python_memcache_serializer(key, value):
flags = 0
value_type = type(value)

# Check against exact types so that subclasses of native types will be
# restored as their native type
if value_type is bytes:
pass

elif value_type is six.text_type:
flags |= FLAG_TEXT
value = value.encode('utf8')

elif value_type is int:
flags |= FLAG_INTEGER
value = "%d" % value

elif six.PY2 and value_type is long_type:
flags |= FLAG_LONG
value = "%d" % value

else:
flags |= FLAG_PICKLE
output = BytesIO()
pickler = pickle.Pickler(output, PICKLE_VERSION)
pickler.dump(value)
value = output.getvalue()

return value, flags


def python_memcache_deserializer(key, value, flags):
if flags == 0:
return value

elif flags & FLAG_TEXT:
return value.decode('utf8')

elif flags & FLAG_INTEGER:
return int(value)

elif flags & FLAG_LONG:
if six.PY3:
return int(value)
else:
return long_type(value)

elif flags & FLAG_PICKLE:
try:
buf = BytesIO(value)
unpickler = pickle.Unpickler(buf)
return unpickler.load()
except Exception:
logging.info('Pickle error', exc_info=True)
return None

return value
python_memcache_serializer = _SERDE.from_python
python_memcache_deserializer = _SERDE.to_python
75 changes: 75 additions & 0 deletions pymemcache/test/test_codecs.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,75 @@
# -*- coding: utf-8 -*-
from unittest import TestCase

from pymemcache import codecs
import pytest
import six

try:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should import this the same way we do elsewhere in this code: from six.moves import cPickle as pickle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I can do that, it breaks tab completion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

import cPickle as pickle
except ImportError:
import pickle


class CustomInt(int):
"""
Custom integer type for testing.

Entirely useless, but used to show that built in types get serialized and
deserialized back as the same type of object.
"""
pass


@pytest.mark.unit()
class TestSerde(TestCase):
Serde = codecs.Serde

def check(self, value, expected_flags, pickle_version=None):
s = self.Serde(pickle_version=pickle_version)

serialized, flags = s.from_python(b'key', value)
assert flags == expected_flags

# pymemcache stores values as byte strings, so we immediately the value
# if needed so deserialized works as it would with a real server
if not isinstance(serialized, six.binary_type):
serialized = six.text_type(serialized).encode('ascii')

deserialized = s.to_python(b'key', serialized, flags)
assert deserialized == value

def test_bytes(self):
self.check(b'value', self.Serde.FLAG_BYTES)
self.check(b'\xc2\xa3 $ \xe2\x82\xac', self.Serde.FLAG_BYTES) # £ $ €

def test_unicode(self):
self.check(u'value', self.Serde.FLAG_TEXT)
self.check(u'£ $ €', self.Serde.FLAG_TEXT)

def test_int(self):
self.check(1, self.Serde.FLAG_INTEGER)

def test_long(self):
# long only exists with Python 2, so we're just testing for another
# integer with Python 3
if six.PY2:
expected_flags = self.Serde.FLAG_LONG
else:
expected_flags = self.Serde.FLAG_INTEGER
self.check(123123123123123123123, expected_flags)

def test_pickleable(self):
self.check({'a': 'dict'}, self.Serde.FLAG_PICKLE)

def test_subtype(self):
# Subclass of a native type will be restored as the same type
self.check(CustomInt(123123), self.Serde.FLAG_PICKLE)

def test_pickle_version(self):
for pickle_version in range(-1, pickle.HIGHEST_PROTOCOL):
self.check(
dict(whoa='nelly', humans=u'horrid', answer=42),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to use a different key value pair for humans.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assist me with what you would like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you mean humans aren't horrid? But we are! Lol, I'll update it!

self.Serde.FLAG_PICKLE,
pickle_version=pickle_version,
)
Loading