Write a WeakValueDictionary with safer key removal #13394
Comments
comment:2
The standard Python `WeakValueDictionary` works roughly as follows. There is an underlying data dict, and each value is stored wrapped in a weak reference with a removal callback. The wrapper is a weak reference to the value v, so v can still get collected. If it does, the callback ensures that the deletion of the corresponding entry really gets executed. This locates the entry by computing the hash of the key, which may run arbitrary hashing and comparison code. Then the callback can instead be defined so that it removes the entry without looking up the key (normal precautions to put the dict reference in a closure or a class should apply here when defining the callback). Or one could just immediately integrate the removal into the data structure. In either case, the main effect is that removal does not cause any hashing or comparison methods to be called on the key.
Attachment: weak_value_dictionary.pyx.gz (proof of concept)
comment:5
I have attached a proof of concept. I think the class is both faster and safer than `weakref.WeakValueDictionary`.

Performance
Hence, the WVD is faster than `weakref.WeakValueDictionary`.

Safety
Hence, the WVD is safe (or at least: safer...).

I suppose I should add tests and documentation to the proof of concept, put it into sage.misc, and rename the class.
comment:6
Excellent work! I like how you found a way to reuse Python dictionaries. I think there is one fairly easy optimization that you can make, similar to how we ended up implementing the buckets for MonoDict and TripleDict: presently, your bucket is a list of tuples. That provides an extra layer of indirection, meaning both slower access and more memory use and allocation. If instead you "inline" the tuples by making the bucket a single flat list, you avoid that indirection. It may be worth browsing through MonoDict anyway. There may be more little tricks that we used there that apply here and that I don't remember right now.
comment:7
Replying to @nbruin:
Good point. In addition, I think that

Branch: u/SimonKing/ticket/13394
comment:9
I have posted an initial version that still lacks documentation and tests (and with the next commit I will also fix some trailing whitespace). I am afraid that the current version of a weak value dictionary is not faster than what I have described in the proof of concept, although I used the improved bucket layout and am using C-API functions rather thoroughly. Anyway, I guess part of the documentation will be benchmarks for each separate task. Then we will see what should be further improved.
Changed branch from u/SimonKing/ticket/13394 to none
New commits:
Commit:
Branch: u/SimonKing/ticket/13394
comment:11
I wonder: is it really a good idea to have a dictionary storing one list for each hash value? Granted, it does solve the problem of defunct keys in the callback of weak values. However, storing the hash buckets similarly to what we do for MonoDict might be an alternative. Well, I guess one should first have a version that works and is faster and safer than `weakref.WeakValueDictionary`.
comment:12
Replying to @simon-king-jena:
Well, it's not entirely optimal of course. Python's own dict already has a mechanism to deal with hash collisions, so having the lists stored is, strictly speaking, an unnecessary indirection. However, breaking open Python's dict implementation to access the hash buckets there is going to be a very hard to maintain solution (you'd basically be patching Python and implementing an extra "delete by hash and value id" routine. Entirely doable, but we'd be stuck with a patched Python for eternity). There are two solutions:

We followed the latter for MonoDict and TripleDict because the key semantics there are so different that borrowing Python's dict wasn't really an option. For WeakValueDictionary it is an option to borrow from dict. I suspect you'd have to work pretty hard to come close to that performance. Note that almost all of your lists are just going to be pairs. Python internally is making such things all the time: arguments tend to be packaged and copied as tuples. I guess that brings us to a third possibility: finish the proof of concept, show the Python community what the problem is with WeakValueDictionary, and suggest the extra hook we need to make it safe. Then we might end up with a safe and fast WeakValueDictionary in Python proper. That would only be in Python 3+, though, so we'd have to backport to Python 2.7. There are already some issues reported: http://bugs.python.org/issue7105 and http://bugs.python.org/issue17816. You might want to check if your implementation fixes those.
comment:13
Replying to @nbruin:
I did not address this one: I do not switch garbage collection off during iteration. I could, of course. Later, perhaps.
This is fixed in my implementation. A slight variation of the example proposed in issue 17816 shows that the weakref version of `WeakValueDictionary` still exhibits the problem; my implementation does not.
comment:14
PS: In a yet-to-be-pushed commit, I am replacing all
comment:15
Here is an example concerning "garbage collection during iteration". Features of the example:

First, with Python's `weakref.WeakValueDictionary`:

And with the new implementation:

So, there is an error, too. Perhaps one could improve the error message.
Branch pushed to git repo; I updated commit sha1. New commits:
Author: Simon King
Upstream: None of the above - read trac for reasoning.
comment:17
With the current commits, the implementation works for me. Hence, I make it "needs review", and we will see whether we will report upstream. Next, I'll try to construct finer-grained benchmarks, to see if there are aspects in which the implementation in `weakref` is still better.
New commits:
comment:102
Replying to @simon-king-jena:
I'm pretty sure that's why it was originally designed. It seems that subsequently it fell out of use.
It's an elegant idea, and probably how tp_print would be implemented if iterators had been around earlier, but I'm pretty sure one wouldn't want to slow down the common case for it.
comment:103
Please check that the documentation builds fine. I get:
(but I'm not entirely sure this was with the latest version of the patch)
comment:104
Replying to @jdemeyer:
Well, there is only one version of the patch...
Work Issues: Docbuild
Attachment: trac13394-weak_value_dictionary.patch.gz (Mercurial patch in old folder layout)
Branch pushed to git repo; I updated commit sha1. New commits:
comment:106
I have updated both the hg patch and the git branch (it was only needed to remove one ":"). To me, the documentation looks good.
Apply trac13394-weak_value_dictionary.patch
Changed work issues from Docbuild to none
comment:107
Thanks everybody for your work on this ticket. Since #10963 now depends on it, I hope it will get in soon!
comment:108
Patch applies to 5.12 and results in a functional `WeakValueDictionary`.
comment:109
Removed git branch to reduce confusion. |
Changed commit from
Changed branch from u/SimonKing/ticket/13394 to none |
Merged: sage-5.13.beta3 |
comment:111
The `WeakValueDictionary` can swallow errors that are raised while looking up a key.
This is because http://docs.python.org/2/c-api/dict.html#PyDict_GetItem does not throw a `KeyError`: it returns NULL without setting an exception, and exceptions raised while hashing and comparing keys during the lookup are suppressed.
comment:112
This is now #15956. Replying to @saraedum:
On ticket #12313 we found that the use of `WeakValueDictionaries` as caches can cause removal callbacks in rather harsh environments. Normal `WeakValueDictionaries` remove keys with dead values by looking up the key. This involves Python equality testing on the key, which can cause any kind of operation in Sage. We need a dictionary where we can delete entries without key comparisons. See below for possible strategies.

To the release manager:
Apply attachment: trac13394-weak_value_dictionary.patch
Upstream: None of the above - read trac for reasoning.
CC: @simon-king-jena
Component: memleak
Author: Simon King, Nils Bruin
Reviewer: Nils Bruin, Simon King
Merged: sage-5.13.beta3
Issue created by migration from https://trac.sagemath.org/ticket/13394