
cached_function and cached_method for unhashable elements #16316

Closed
saraedum opened this issue May 9, 2014 · 61 comments
Comments

@saraedum
Member

saraedum commented May 9, 2014

Caching currently does not work for unhashable elements:

sage: @cached_function
....: def f(x):
....:     return x
....:
sage: K.<a> = Qq(9)
sage: f(a)
TypeError: unhashable type: 'sage.rings.padics.padic_ZZ_pX_CR_element.pAdicZZpXCRElement'

In this case a should not be hashable, since its current == operator can
never be made consistent with a non-trivial hash value (cf. #11895).
However, such elements should still be cacheable.

This ticket adds a _cache_key() method for such elements which can be used to
hash and compare otherwise unhashable elements. The method should return a
tuple (or anything else that is hashable) which uniquely identifies the
element, e.g. for p-adics the precision and a lift of the element to the
integers.
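
For illustration, here is a minimal sketch of the idea; the class and attribute names are hypothetical and not part of this ticket, the point is only the contract of _cache_key(): return something hashable that uniquely identifies the element.

class TruncatedElement(object):
    # Hypothetical stand-in for a p-adic element: == is "lenient" (it only
    # compares up to the smaller precision), so no consistent hash exists.
    def __init__(self, p, lift, prec):
        self._p = p          # the prime
        self._lift = lift    # a lift of the element to the integers
        self._prec = prec    # the (absolute) precision

    def __eq__(self, other):
        prec = min(self._prec, other._prec)
        return (self._lift - other._lift) % self._p**prec == 0

    def __ne__(self, other):
        return not self == other

    __hash__ = None  # unhashable: no hash can be consistent with this ==

    def _cache_key(self):
        # hashable and uniquely identifying: the lift and the precision
        return (self._lift, self._prec)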

Depends on #16251

CC: @simon-king-jena

Component: misc

Author: Julian Rueth

Branch/Commit: aa037de

Reviewer: Peter Bruin, Travis Scrimshaw

Issue created by migration from https://trac.sagemath.org/ticket/16316

@saraedum saraedum added this to the sage-6.2 milestone May 9, 2014
@saraedum
Member Author

saraedum commented May 9, 2014

Branch: u/saraedum/ticket/16316

@saraedum
Member Author

saraedum commented May 9, 2014

Dependencies: #16251

@saraedum
Member Author

saraedum commented May 9, 2014

Author: Julian Rueth

@saraedum
Member Author

saraedum commented May 9, 2014

New commits:

7a0f094 Introduced _cache_key for sage objects
62c9681 made polynomials with unhashable coefficients unhashable
fa16bc7 Merge branch 'u/saraedum/ticket/16251' of git://trac.sagemath.org/sage into ticket/16316
798aaf8 Implemented _cache_key() for polynomials
877302e Enable caching for non-hashable objects

@saraedum
Member Author

saraedum commented May 9, 2014

Commit: 877302e

@tscrim
Collaborator

tscrim commented May 10, 2014

comment:5

Your proposal is not nearly as scary as your description would indicate. We do not want to cache unhashable things (in fact, we can't). Instead, what you're doing is creating a generic method _cache_key, which provides the default key when an object is passed to a @cached_function. For example, this would allow us to have matrices/graphs automatically return immutable copies of themselves (something I think I could leverage and find useful for passing through UniqueRepresentation.__classcall__).

I'm thinking we should also do automatic conversions for mutable Python objects such as list -> tuple and set -> frozenset. However, we should make really sure that we're not slowing things down by doing this, and likewise that we're not introducing memory leaks.
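
For illustration, the kind of conversion meant here could look like the following sketch (not code from this branch; as noted, a real version would need benchmarking):

def _normalize_argument(arg):
    # Recursively turn mutable built-in containers into hashable ones
    # before they become part of a cache key.
    if isinstance(arg, list):
        return tuple(_normalize_argument(x) for x in arg)
    if isinstance(arg, set):
        return frozenset(_normalize_argument(x) for x in arg)
    return arg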

Simon, do you have any thoughts on this?

@nexttime nexttime mannequin modified the milestones: sage-6.2, sage-6.3 May 11, 2014
@sagetrac-git
Mannequin

sagetrac-git mannequin commented May 13, 2014

Changed commit from 877302e to a65ac85

@sagetrac-git
Mannequin

sagetrac-git mannequin commented May 13, 2014

Branch pushed to git repo; I updated commit sha1. New commits:

caa380b Do not unpack hashable entries of tuples when looking for a cache key for unhashable elements in sage.misc.cachefunc
a65ac85 improved docstring of SageObject._cache_key

@pjbruin
Contributor

pjbruin commented Jun 23, 2014

comment:8

It seems that the problem really has to do with key comparison in dictionaries, not with hashing or caching specifically. Given (1) Python's use of == for key comparison and (2) Sage's lenient implementation of == for elements, we basically cannot use Sage elements as dictionary keys.

One would like a third comparison operator, say equal¹, that regards two objects as equal if and only if all their "mathematically relevant" properties (e.g. value and precision for real or p-adic numbers, but not memory location) are the same. Our caching problems would then disappear if keys were compared using equal. In this hypothetical case, the correct property for hash functions would be X equal Y => hash(X) == hash(Y).

  • Do I understand correctly that the purpose of cache_key can be viewed as emulating the "missing" equal by assigning to each object X a (hashable) object cache_key(X) satisfying the condition cache_key(X) == cache_key(Y) <=> X equal Y ?

  • If so, does the implementation in this ticket satisfy this condition? (I am mainly worried that the parent of an element x does not seem to be encoded in cache_key(x).)

¹ I just looked up the different comparison functions (=, eq, eql, equal and equalp) in Lisp, and equal seems to be the closest to what we would need, hence I am using that notation. However, Lisp uses eql by default for hash tables.

@saraedum
Member Author

comment:9

Replying to @pjbruin:

It seems that the problem really has to do with key comparison in dictionaries, not with hashing or caching specifically. Given (1) Python's use of == for key comparison and (2) Sage's lenient implementation of == for elements, we basically cannot use Sage elements as dictionary keys.

This is not really the topic of the ticket, but I do not agree. It depends on what you want to do. If you have sufficient control over which elements are put into the dictionary, then you can use Sage objects. Of course you are right that our == causes a lot of trouble.

One would like a third comparison operator, say equal¹, that regards two objects as equal if and only if all their "mathematically relevant" properties (e.g. value and precision for real or p-adic numbers, but not memory location) are the same. Our caching problems would then disappear if keys were compared using equal. In this hypothetical case, the correct property for hash functions would be X equal Y => hash(X) == hash(Y).

Such an operator might be helpful and would really fix trouble with caching. Then again, when are two objects equal? Do their parents have to be identical? Or just equal? It is not clear to me whether there can be a generic implementation that makes everybody happy. If you are aware of the current limitations, then you can work around the problems.

  • Do I understand correctly that the purpose of cache_key can be viewed as emulating the "missing" equal by assigning to each object X a (hashable) object cache_key(X) satisfying the condition cache_key(X) == cache_key(Y) <=> X equal Y ?

Yes and no. The idea is really to give a hash to elements which normally don't have one. The intention was to do something similar to what you propose above: two unhashable elements in the same parent with equal cache keys should behave identically in any reasonable computation. (There is of course room for interpretation in this 'definition'.)

  • If so, does the implementation in this ticket satisfy this condition? (I am mainly worried that the parent of an element x does not seem to be encoded in cache_key(x).)

I did this on purpose; it replicates the current behaviour in Sage, i.e., if you mix elements with different parents in one dictionary, then you really need to know what you are doing. Why not include the parent? Well, are two parents equal if they are ==-equal, or only if they are identical? And what if they cannot be hashed (which will be the case for p-adic extension fields)?

I have no strong opinion on whether or not to include the parent, but I think it can be more confusing than helpful. Also, keys might become unnecessarily large: if your application only ever gets elements from a single parent, but you store all the data needed to create the extension field with every element in the cache, that is really a waste. We could of course only store id(parent); but I'm not sure whether that is too extreme.
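
To illustrate the trade-off (both variants below are hypothetical, with made-up attribute names; neither is claimed to be what this branch does): embedding the parent itself keeps a reference to it alive in every key, while embedding only id(parent) keeps keys small but ties them to object identity.

def _cache_key_including_parent(self):
    # large key: the parent object itself is part of every cached key
    return (self.parent(), self._lift, self._prec)

def _cache_key_including_parent_id(self):
    # small key: only the parent's memory address; this assumes the parent
    # stays alive, otherwise the id could be reused by an unrelated object
    return (id(self.parent()), self._lift, self._prec)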

@pjbruin
Contributor

pjbruin commented Jun 23, 2014

comment:10

Replying to @saraedum:

Replying to @pjbruin:

It seems that the problem really has to do with key comparison in dictionaries, not with hashing or caching specifically. Given (1) Python's use of == for key comparison and (2) Sage's lenient implementation of == for elements, we basically cannot use Sage elements as dictionary keys.

This is not really the topic of the ticket, but I do not agree.

Sorry if this sounded off-topic; I was thinking of the dictionaries used in caching, but didn't focus on that because I thought my "objections" were applicable to dictionaries in Sage in general.

I also wanted to avoid focussing on hashing because actually I don't fully understand what the criteria are for objects to be hashable. The only thing that is clear to me is that if the "mathematical meaning" can change over time (e.g. if the object is a list whose length can change, or a matrix on which you can perform row/column operations), then the object should not be hashable. But what about p-adic elements? These are supposedly fixed in time. Why exactly do we want elements of Zmod(n) and RR to be hashable but not elements of p-adic fields? Are p-adic fields treated differently because different elements of the same parent can satisfy a == b without being equal (e.g. if they have different precision)? In that case, shouldn't power series also be unhashable?

It depends on what you want to do. If you have sufficient control over which elements are put in the dictionary, then you can use Sage objects.

Sure, but @cached_function should also work for user functions, and we have no control over what kind of arguments the user is going to feed those. It would not be strange for an unsuspecting user to have some complicated function of one variable accepting elements of arbitrary p-adic fields, for example. Then the easiest way to cache them would be to stick @cached_function on the function, but that could easily lead to problems when you don't store the parent, couldn't it?

Such an operator might be helpful and would really fix trouble with caching. Then again, when are two objects equal? Do their parents have to be identical? Or just equal? It is not clear to me whether there can be a generic implementation that makes everybody happy.

I would say the only option (for caching purposes) is to make two elements equal only if their parents are identical, otherwise you will get cases where the cached function returns values in wrong (equal but not identical) parents. Hence I would translate this hypothetical equal (for elements) in terms of cache_key as x equal y <=> parent(x) is parent(y) and cache_key(x) == cache_key(y).
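
Spelled out as a sketch (illustrative only), that proposed equivalence would read:

def cache_equal(x, y):
    # the hypothetical "equal" for caching purposes: identical parents
    # and equal cache keys
    return x.parent() is y.parent() and x._cache_key() == y._cache_key()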

I have no strong opinion on whether or not to include the parent, but I think it can be more confusing than helpful. Also, keys might become unnecessarily large: if your application only ever gets elements from a single parent, but you store all the data needed to create the extension field with every element in the cache, that is really a waste. We could of course only store id(parent); but I'm not sure whether that is too extreme.

Actually that was the option I was thinking of; it should be fast and won't waste too much memory. Alternatively, there could be both an "internal" cache_key (for use with @cached_method, where the parent is known) that does not include the parent id, and an "external" one (for use with @cached_function) that does include the parent id.

@saraedum
Member Author

comment:11

Replying to @pjbruin:

Replying to @saraedum:

Replying to @pjbruin:

I also wanted to avoid focussing on hashing because actually I don't fully understand what the criteria are for objects to be hashable. The only thing that is clear to me is that if the "mathematical meaning" can change over time (e.g. if the object is a list whose length can change, or a matrix on which you can perform row/column operations), then the object should not be hashable. But what about p-adic elements? These are supposedly fixed in time. Why exactly do we want elements of Zmod(n) and RR to be hashable but not elements of p-adic fields? Are p-adic fields treated differently because different elements of the same parent can satisfy a == b without being equal (e.g. if they have different precision)?

Right, there is no mathematical justification for p-adics not being hashable. It is an implementation detail, because their == cannot be made compatible with hash.

In that case, shouldn't power series also be unhashable?

Absolutely.

It depends on what you want to do. If you have sufficient control over which elements are put in the dictionary, then you can use Sage objects.

Sure, but @cached_function should also work for user functions, and we have no control over what kind of arguments the user is going to feed those. It would not be strange for an unsuspecting user to have some complicated function of one variable accepting elements of arbitrary p-adic fields, for example. Then the easiest way to cache them would be to stick @cached_function on the function, but that could easily lead to problems when you don't store the parent, couldn't it?

I see. I guess this means that we should at least ensure that if a != b, then some_cached_function(a) has a chance of being != some_cached_function(b).

Such an operator might be helpful and would really fix trouble with caching. Then again, when are two objects equal? Do their parents have to be identical? Or just equal? It is not clear to me whether there can be a generic implementation that makes everybody happy.

I would say the only option (for caching purposes) is to make two elements equal only if their parents are identical, otherwise you will get cases where the cached function returns values in wrong (equal but not identical) parents. Hence I would translate this hypothetical equal (for elements) in terms of cache_key as x equal y <=> parent(x) is parent(y) and cache_key(x) == cache_key(y).

Ok.

I have no strong opinion on whether or not to include the parent, but I think it can be more confusing than helpful. Also, keys might become unnecessarily large: if your application only ever gets elements from a single parent, but you store all the data needed to create the extension field with every element in the cache, that is really a waste. We could of course only store id(parent); but I'm not sure whether that is too extreme.

Actually that was the option I was thinking of; it should be fast and won't waste too much memory. Alternatively, there could be both an "internal" cache_key (for use with @cached_method, where the parent is known) that does not include the parent id, and an "external" one (for use with @cached_function) that does include the parent id.

Imho, @cached_method and @cached_function should not differ in such a way. Storing the parent's id is not a waste of memory and should hardly lead to any surprises.

I'll push a new commit which implements the inclusion of the parent. Thanks for joining this discussion btw.

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Jun 24, 2014

Changed commit from a65ac85 to 87cf9da

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Jun 24, 2014

Branch pushed to git repo; I updated commit sha1. New commits:

6e09cab Merge branch 'develop' into ticket/16316
87cf9da Include an element's parent in _cache_key()

@pjbruin
Contributor

pjbruin commented Jun 24, 2014

Changed commit from 87cf9da to c0a7fb8

@pjbruin
Contributor

pjbruin commented Jun 24, 2014

Reviewer: Peter Bruin

@pjbruin
Contributor

pjbruin commented Jun 24, 2014

Changed branch from u/saraedum/ticket/16316 to u/pbruin/16316-cache_key

@pjbruin
Contributor

pjbruin commented Jun 24, 2014

comment:13

Looks good, passes doctests and (together with #16317) resolves a doctest failure in my upcoming branch on #11474. The reviewer patch only contains a few typographical and spelling fixes.

@tscrim
Collaborator

tscrim commented Jun 25, 2014

comment:14

You must change the wording of "caching elements which are unhashable"; this is not correct as you cannot cache (i.e. set as the key in a dict) anything without a (well-defined) hash. Instead say something like

This element is normally unhashable, so we use the function ``_cache_key``
to create a hashable key for the object for the cache.

@saraedum
Member Author

comment:15

Replying to @tscrim:

You must change the wording of "caching elements which are unhashable"; this is not correct as you cannot cache (i.e. set as the key in a dict) anything without a (well-defined) hash. Instead say something like

This element is normally unhashable, so we use the function ``_cache_key``
to create a hashable key for the object for the cache.

I don't think that the word "cache" is necessarily connected to "hashing". A hash table is one way of caching, but there are others. (I just checked: https://en.wikipedia.org/wiki/Cache_%28computing%29 does not contain the word 'hash'.)

Is it acceptable for you if we leave the current wording?

@tscrim
Collaborator

tscrim commented Jun 27, 2014

Changed branch from u/tscrim/ticket/16316 to u/saraedum/ticket/16316

@tscrim
Collaborator

tscrim commented Jun 27, 2014

Changed commit from 7a9a6b2 to 83944fa

@tscrim
Collaborator

tscrim commented Jun 27, 2014

comment:36

Replying to @saraedum:

It's a very good idea to include the description in the summary of cachefunc.pyx. But I also think it should be in SageObject._cache_key; if somebody is wondering what it is good for, that is the first place where they will look for it. It is a duplication of docstrings but I hope it is acceptable.

It's a private method only used in one place, so the chances of someone finding it randomly are very slim, but okay, it doesn't really matter.

I made a few changes to your commit:

  • I removed some debug statements.

Whoops, forgot about those.

  • Your changes to _cache_key change the current behaviour. Say you have a tuple of a polynomial over the rationals and a p-adic number. The old version would only call _cache_key of the p-adic number and leave the polynomial. Your version unpacks both which is unnecessary. I restored my version of the code.
  • The default implementation of _cache_key should raise an error. If somebody calls it (misunderstanding what it is good for), then it makes sense to warn people that they did something wrong: the code in SageObject._cache_key() will never be executed in a legitimate context. I do not mind raising a TypeError instead of a NotImplementedError.

Restoring your branch so I can look again.


New commits:

b41c455 restore previous version of _cache_key() in cachefunc.pyx
37aa276 reverted changes which treated _cache_key like an attribute
5586f56 removed debug output
83944fa Restored old version of SageObject._cache_key() but raising a TypeError instead of a NotImplementedError.

@tscrim
Collaborator

tscrim commented Jun 27, 2014

comment:37

Ah, now I see why you're trying to hash every time inside of the function _cache_key, which is why SageObject._cache_key should be the thing raising the error. Although what I had before, with proper placement in the class hierarchy, would not have unpacked the polynomial. Defining _cache_key for all polynomials led to some of my confusion before.

I'm also not in favor of raising an error in SageObject._cache_key because an object is unhashable. What I had before allowed more flexibility in use; say I wanted to do my own custom cache which uses the _cache_key method.

So overall I feel like this is a heavy-handed approach. However I don't have a strong enough use case to oppose this getting in as is. So positive review.
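
For illustration, a custom cache of the kind alluded to here could fall back to _cache_key() only for arguments that fail to hash (a sketch, not code from this ticket; single-argument for brevity):

class MyCache(object):
    def __init__(self, f):
        self._f = f
        self._cache = {}

    def _key(self, arg):
        # use hashable arguments directly, otherwise derive a key
        try:
            hash(arg)
            return arg
        except TypeError:
            return arg._cache_key()

    def __call__(self, arg):
        key = self._key(arg)
        if key not in self._cache:
            self._cache[key] = self._f(arg)
        return self._cache[key]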

@tscrim
Collaborator

tscrim commented Jun 27, 2014

comment:38

Replying to @pjbruin:

Hmm, I see. But if we store the parent itself, then I think we have to make sure that all unhashable elements belong to parents satisfying unique representation. Otherwise, calling a cached function on an element whose parent is equal but not identical to the parent of an element that was used before could return a cached result that lives in this equal-but-not-identical parent.

No, this isn't a problem because equal parents have equal hashes.

@tscrim
Collaborator

tscrim commented Jun 27, 2014

comment:39

Replying to @simon-king-jena:

That's not correct. At least when you have a Python class, a lazy attribute will be a plain python attribute after the first access. Indeed, a lazy attribute L defined for some class C has a __get__ method, so that L.__get__(O, C) assigns the output value to O.L, for an instance O of C. In other words, it overrides itself by the return value and thus is exactly as fast as a usual python attribute, after the first access.

Ah right. Then perhaps I was thinking a @cached_method with no args was faster than a Python attribute access? shrugs

However, if C is a Cython class, attribute assignment does not always work. Hence, in this case, L.__get__(O,C) will not be able to assign the return value to O.L. By consequence, the __get__ method will be called upon each single access. Hence, if I see that correctly, the performance will be worse than calling a method.

Hmm...is that an issue with the extension class not necessarily having a __dict__ and related to the @lazy_attribute not always working with (Cython) inheritance?
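
For context, a stripped-down sketch of the descriptor mechanism Simon describes (Sage's actual sage.misc.lazy_attribute is more general; this is only meant to show why instance attribute assignment matters):

class lazy_attribute(object):
    # Simplified: compute once, then store the value as a plain instance
    # attribute, so that later lookups bypass __get__ entirely.
    def __init__(self, f):
        self.f = f

    def __get__(self, obj, cls):
        if obj is None:
            return self
        value = self.f(obj)
        # This assignment is what fails for Cython classes without a __dict__,
        # in which case __get__ is called on every single access.
        setattr(obj, self.f.__name__, value)
        return value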

@pjbruin
Contributor

pjbruin commented Jun 27, 2014

comment:40

Replying to @tscrim:

Replying to @pjbruin:

Hmm, I see. But if we store the parent itself, then I think we have to make sure that all unhashable elements belong to parents satisfying unique representation. Otherwise, calling a cached function on an element whose parent is equal but not identical to the parent of an element that was used before could return a cached result that lives in this equal-but-not-identical parent.

No, this isn't a problem because equal parents have equal hashes.

That is exactly what causes the problem. Consider the following example with a hypothetical parent class MyParent that has unhashable elements but does not have unique representation:

sage: @cached_function
....: def f(x):
....:     """
....:     Return `x^2 + 1` in the parent of `x`.
....:     """
....:     return x^2 + 1
....: 
sage: P = MyParent()
sage: Q = MyParent()  # Q is equal but not identical to P
sage: f(P(0)).parent() is P
True
sage: f(Q(0)).parent() is Q
False  # since the parents are equal, f(Q(0)) returns the cached f(P(0))
sage: f(P(0)) is f(Q(0))
True

This already happens before this patch with parents P and Q between which there exists a coercion map compatible with hashing (e.g. ZZ and QQ):

sage: @cached_function
....: def f(x):
....:     return x^2 + 1
....: 
sage: f(ZZ(0)).parent()
Integer Ring
sage: f(QQ(0)).parent()
Integer Ring
sage: f(ZZ(0)) is f(QQ(0))
True

However, I think we should avoid this situation whenever possible.

@tscrim
Collaborator

tscrim commented Jun 27, 2014

comment:41

Oh that's subtle. However this usually (almost always?) won't be a problem because the coercion, which allowed the elements to compare as equal, will step in to make the arithmetic work. I would think equal-but-not-identical parents should have coercions between themselves too.

@vbraun
Member

vbraun commented Jun 27, 2014

comment:42

Documentation doesn't build

build/pipestatus "./sage --docbuild --no-pdf-links all html -j  2>&1" "tee -a logs/dochtml.log"
Traceback (most recent call last):
  File "/mnt/SSD1/mod_buildslave/sage_git/build/src/doc/common/builder.py", line 27, in <module>
    from sage.misc.cachefunc import cached_method
ImportError: dynamic module does not define init function (initcachefunc)
make: *** [doc-html-mathjax] Error 1

@vbraun
Member

vbraun commented Jun 27, 2014

comment:43

Note: seems to only fail on UW mod

@vbraun
Member

vbraun commented Jun 27, 2014

comment:44

Doctests fail everywhere

File "src/sage/misc/cachefunc.pyx", line 416, in sage.misc.cachefunc
Failed example:
    b = a + O(3)
Exception raised:
    Traceback (most recent call last):
      File "/Users/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 480, in _run
        self.execute(example, compiled, test.globs)
      File "/Users/buildslave-sage/slave/sage_git/build/local/lib/python2.7/site-packages/sage/doctest/forker.py", line 839, in execute
        exec compiled in globs
      File "<doctest sage.misc.cachefunc[98]>", line 1, in <module>
        b = a + O(Integer(3))
      File "parent.pyx", line 1064, in sage.structure.parent.Parent.__call__ (build/cythonized/sage/structure/parent.c:8683)
    NotImplementedError

possibly due to the dependent ticket #16317

@vbraun
Member

vbraun commented Jun 27, 2014

comment:45

So actual objects and cache keys go into the same dictionary? Isn't that dangerous, one person's key could easily be another one's object? Shouldn't it be different dicts or some special object to hold the key?

@saraedum
Member Author

comment:46

Replying to @pjbruin:

Replying to @tscrim:

Replying to @pjbruin:
No, this isn't a problem because equal parents have equal hashes.

That is exactly what causes the problem. Consider the following example with a hypothetical parent class MyParent that has unhashable elements but does not have unique representation:

As you wrote, this is not a new problem. It does not seem to be related to unhashable elements. Mixing non-unique parents and caching will simply lead to trouble.

@saraedum
Member Author

comment:47

Replying to @vbraun:

So actual objects and cache keys go into the same dictionary? Isn't that dangerous, one person's key could easily be another one's object? Shouldn't it be different dicts or some special object to hold the key?

Though this is very unlikely it is a valid point. I will take care of it.
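
One way to rule out such collisions (a sketch; whether the fix in the commits below works exactly like this is an assumption) is to wrap derived keys in a dedicated marker type, so that a key obtained from _cache_key() can never compare equal to an ordinary hashable argument:

class _DerivedKey(object):
    # Hashes like the underlying key, but only compares equal to other
    # derived keys, never to a plain argument that happens to equal it.
    def __init__(self, key):
        self._key = key

    def __hash__(self):
        return hash(self._key)

    def __eq__(self, other):
        return isinstance(other, _DerivedKey) and self._key == other._key

    def __ne__(self, other):
        return not self == other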

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Jun 27, 2014

Branch pushed to git repo; I updated commit sha1. New commits:

c987fc7 _cache_key is not an attribute anymore
a4f17d1 fixed doctests
aa037de Fix collisions between unhashable and hashable elements in @cached_method

@sagetrac-git
Mannequin

sagetrac-git mannequin commented Jun 27, 2014

Changed commit from 83944fa to aa037de

@vbraun
Member

vbraun commented Jul 14, 2014

Changed branch from u/saraedum/ticket/16316 to aa037de
