RFC: Context variables #35833
snapshot_context() -> snapshot::ContextSnapshot

Get a snapshot of a context that can be passed to [`reset_context`](@ref) to
rewind all changes in the context variables.
It is called `contextvars.copy_context` in Python and `ExecutionContext.CreateCopy` in .NET. But I was not sure if "copy" is the right word when the underlying data is (treated as) immutable.
@@ -94,6 +94,8 @@ function uuid4(rng::AbstractRNG=Random.default_rng())
    UUID(u)
end

Base._uuid4(rng::AbstractRNG=Random.default_rng()) = uuid4(rng)
I couldn't find a proper way to "import" stdlib to Base. Is this an OK approach?
Why not just `rand(UInt64)` or something involving `objectid`?
It looks to me like most of the uses of `key` could just be replaced with `objectid(::ContextVar)`.
Actually, the main reason I wanted to use UUID was `uuid5`, because I can then use it to get a deterministic and namespaced key for "global const" context variables (while `uuid4` is used only for local and "global non-const" context variables):
julia/base/contextvariables.jl
Line 181 in ea0e4a7
return uuid5(pkgid.uuid, join(fullpath, '.'))
(where `join(fullpath, '.')` is something like `"PackageName.SubModuleName.varname"`)
Making the key for a "global const" context variable deterministic is important for Distributed (otherwise different processes can't agree on the key). So no, we can't use `objectid`.
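To make the determinism argument concrete, here is a minimal sketch using the `UUIDs` stdlib. The package UUID and the helper name `contextvar_key` are made up for illustration; only the `uuid5(namespace, name)` call mirrors the line quoted above.

```julia
using UUIDs

# Hypothetical package UUID standing in for `pkgid.uuid`; not a real package.
pkg_uuid = UUID("01234567-89ab-cdef-0123-456789abcdef")

# Hypothetical helper: derive a key from the namespace UUID and the dotted path.
contextvar_key(ns::UUID, fullpath) = uuid5(ns, join(fullpath, '.'))

k1 = contextvar_key(pkg_uuid, ["PackageName", "SubModuleName", "varname"])
k2 = contextvar_key(pkg_uuid, ["PackageName", "SubModuleName", "varname"])
k3 = contextvar_key(pkg_uuid, ["PackageName", "SubModuleName", "othervar"])

# Same namespace and path give the same key, so separate processes can agree on it.
same_key = (k1 == k2)
# A different variable name gives a different key.
distinct_key = (k1 != k3)
```

A `uuid4`-based key, by contrast, is drawn fresh in every process, which is why it only suits local and "global non-const" variables.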
> I couldn't find a proper way to "import" stdlib to Base. Is this an OK approach?
Yeah it's a tricky problem: you'd like a `uuid4` to use a non-terrible RNG. But we can't pull all of `Random` into Base.
Another workaround for this would be to use `root_module` to look up the `UUIDs` module without loading it. There's precedent for this in the way LibGit2 is looked up in one place:
Line 316 in 97e3fe8
LibGit2 = root_module(libgit2_id)
In either case, it's not great to have `Base` missing some functionality when compiled without a stdlib present.
I see. Thanks for the explanation. How about vendoring libc-based uuid4 in Base?
function _uuid4()
    u = reinterpret(UInt128, [Libc.rand() for _ in 1:(sizeof(UInt128) ÷ sizeof(Cint))])[1]
    u &= 0xffffffffffff0fff3fffffffffffffff
    u |= 0x00000000000040008000000000000000
    UUID(u)
end
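For readers wondering what the two masks do: they overwrite the version nibble with 4 and the variant bits with binary `10`, leaving the other 122 bits as supplied. A small sketch, where `stamp_v4` is a hypothetical name and fixed bit patterns stand in for the random input:

```julia
using UUIDs

# Hypothetical helper applying the same two masks as `_uuid4` above.
function stamp_v4(raw::UInt128)
    raw &= 0xffffffffffff0fff3fffffffffffffff  # clear version nibble and top two variant bits
    raw |= 0x00000000000040008000000000000000  # set version = 4, variant = binary 10
    return UUID(raw)
end

all_ones = stamp_v4(typemax(UInt128))
all_zeros = stamp_v4(zero(UInt128))
```

With all input bits set, only the version and variant positions change, so the result prints as `ffffffff-ffff-4fff-bfff-ffffffffffff`.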
What quality of random numbers do we need for this? I guess a large period and good seeding are desired, as the UUIDs may end up in a distributed system? Actually this would suggest that the global RNG from `Random` isn't a great choice for `UUIDs` anyway because the user may control the seeding for reasons unrelated to UUIDs.
> Actually this would suggest that the global RNG from `Random` isn't a great choice for `UUIDs` anyway because the user may control the seeding for reasons unrelated to UUIDs.
Oooh, that's a good point. Maybe we should use `Random.RandomDevice()` instead of `Random.default_rng()` in UUIDs? See also #32954 (comment)
But I think `_uuid4` above may be OK as the initial implementation since:
- It is used only for `@contextvar local var` and `@contextvar global var` (the latter is available mainly for exploration in REPL and testing). But the "main entrypoint" is `@contextvar var`, which uses `uuid5`.
- The UUIDs are generated during macro expansion time. So, messing with it requires you to call `Libc.srand(my_seed)` at the top-level of your module. It's totally possible but somewhat unlikely.
In the long run, I think we'd need something like `Base.CoreRandom.RandomDevice`.
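The seeding concern can be illustrated with the existing `uuid4(rng)` method from the `UUIDs` stdlib. This is only a sketch of the failure mode, using explicitly constructed RNGs rather than the global one:

```julia
using Random, UUIDs

# Two generators seeded identically produce the same "random" UUID.
a = uuid4(MersenneTwister(1234))
b = uuid4(MersenneTwister(1234))
seeded_collide = (a == b)

# RandomDevice draws from the OS entropy source and ignores user seeding.
c = uuid4(RandomDevice())
d = uuid4(RandomDevice())
device_differ = (c != d)
```

Two processes that happen to seed a shared generator the same way would hand out identical keys, which is the situation `RandomDevice` avoids.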
I don't see why you've "scoped" ContextVars to modules the way you have. Why not just scope them using normal objects (i.e., why do the vars themselves need to know about what module they were declared in?)? This is the way it works in Python at least.
As I said in #35833 (comment), that's a requirement for this to work with Distributed across different machines. Consider
module MyPackage
const KEY = uuid4()
end
        "Modules and variable names must not contain a dot:\n" * join(fullpath, "\n"),
    ))
end
return uuid5(pkgid.uuid, join(fullpath, '.'))
Ah very nice chaining these UUIDs. I was just coming here to suggest it should be done this way. But you've already done it :-)
Good to know that we independently arrived at the same solution :) It's an indication that this solution makes sense.
Well perhaps I shouldn't be so definitive as to say "should" ... but I think it makes sense :-)
Speaking of UUIDs, where does `uuid5` come from? I guess it will need to be moved into `base/uuid.jl` from the UUIDs stdlib?
There is a fake `uuid5` definition for loading:
Lines 103 to 116 in 8f512f3
# fake uuid5 function (for self-assigned UUIDs)
# TODO: delete and use real uuid5 once it's in stdlib
function uuid5(namespace::UUID, key::String)
    u::UInt128 = 0
    h = hash(namespace)
    for _ = 1:sizeof(u)÷sizeof(h)
        u <<= sizeof(h) << 3
        u |= (h = hash(key, h))
    end
    u &= 0xffffffffffff0fff3fffffffffffffff
    u |= 0x00000000000050008000000000000000
    return UUID(u)
end
Ah I see. Not your fault at all, but ick!
It seems some of the hashing stuff should move back into Base (at least some minimal parts of the implementation of SHA1 and RandomDevice, not the whole stdlib). Good hashing and randomness is core functionality which isn't really optional in some of these cases.
Here is a PR to move `RandomDevice` to `Base`: #35894
Example usecase: circular reference detection
As a short somewhat-real usecase, here is how you can use the context variable to detect circular references. Consider this innocent-looking
struct WAT
    value
end
Base.show(io::IO, x::WAT) = print(io, "WAT(", repr(x.value), ")")
However, this causes a stack overflow with
x = Any[WAT(Any[1])]
x[1].value[1] = x
@show x  # throws StackOverflowError
This is because current circular reference detection relies on
We can use a context variable instead of
function nocircular(f, io, @nospecialize(x))
    @contextvar local shown::Cons
    history = get(shown)
    d = 1
    for y in something(history, ())
        if x === y
            print(io, "#= circular reference @-$d =#")
            return
        end
        d += 1
    end
    with_context(f, shown => Some(Cons(x, something(history, Some(nothing)))))
end
struct Cons
    car
    cdr
end
Base.iterate(list::Cons) = iterate(list, list)
Base.iterate(::Cons, list::Cons) = (list.car, list.cdr)
Base.iterate(::Cons, ::Nothing) = nothing
We can use this to implement our
mutable struct Mutable
    value
end
function Base.show(io::IO, x::Mutable)
    nocircular(io, x) do
        print(io, "Mutable(")
        show(io, x.value)
        print(io, ")")
    end
end
With this approach, circular references can be detected without using
x = Mutable(WAT(Mutable(1)))
x.value.value.value = x
@show x  # => Mutable(WAT(Mutable(#= circular reference @-2 =#)))
Of course, it works even if you switch the task inside
struct WAAAAAT
    value
end
Base.show(io::IO, x::WAAAAAT) = print(io, "WAAAAAT(", fetch(@async repr(x.value)), ")")
x = Mutable(WAAAAAT(Mutable(1)))
x.value.value.value = x
@show x  # => Mutable(WAAAAAT(Mutable(#= circular reference @-2 =#)))
(Edit: Actually,
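For comparison, the same bookkeeping can be sketched with plain `task_local_storage`, which exists in Base today. This is not the proposed API: the names `Node` and `Box` and the storage key are made up here, and because task-local storage is not inherited by child tasks, the sketch would not handle the `@async` variant above.

```julia
# Hypothetical immutable linked list of objects currently being shown.
struct Node
    value::Any
    parent::Union{Node,Nothing}
end

function nocircular(f, io::IO, @nospecialize(x))
    # The storage key is an arbitrary symbol chosen for this sketch.
    history = get(task_local_storage(), :nocircular_shown, nothing)
    d = 1
    node = history
    while node !== nothing
        if node.value === x
            print(io, "#= circular reference @-$d =#")
            return
        end
        node = node.parent
        d += 1
    end
    # Push `x` for the duration of `f()`; the old value is restored afterwards.
    task_local_storage(f, :nocircular_shown, Node(x, history))
end

mutable struct Box
    value::Any
end

function Base.show(io::IO, b::Box)
    nocircular(io, b) do
        print(io, "Box(")
        show(io, b.value)
        print(io, ")")
    end
end

b = Box(Box(1))
b.value.value = b          # make the structure circular
shown = sprint(show, b)
```

The missing piece is exactly what context variables add: the history would also need to follow the computation into child tasks.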
I just found a rather nice post on the use and abuse of Go's equivalent https://medium.com/@cep21/how-to-correctly-use-context-context-in-go-1-7-8f2c0fafdf39 They argue quite persuasively against I'm not arguing against |
A bit tangential, but I find the section "Does Context.Value even belong?" interesting. I hadn't realized that the cancellation context (= nursery) and the context variable are so tightly coupled in Go. Though given the focus on server-side application in the Anyway, I agree dynamically scoped variables in any form can be used to write bad programs. It's more or less just a "less bad" form of global variables, in the end. For example, you can put a nursery in the context variable to easily create dynamically-scoped nursery. We already have
Yes I thought this was interesting, personally I feel it's a weird mixture of abstractions :-) But perhaps it's the best which can be done with the desire to explicitly pass no more than one context, and with the limitations of Go interfaces. A couple more half-formed thoughts about the Julia interface:
I also think it worth considering what API we're able to offer if we want to replace |
Yeah, I think that's what you need most of the time. The The I think we can special-case Another aspect is that, even if we only support the
Do you mean the
Maybe we should recommend using uppercase for context variables, too?
Yeah, I'd hope so. Context variable lookup is just a dictionary lookup in the end. So, if the compiler can assume it's effect-free, I suppose it can hoist it out? We might need a better dictionary implementation, though. For now, I'm just using
I got a chance to chat to @JeffBezanson about this. Jeff had some great questions which I was ill-prepared for (and I know @tkf could have answered better) but it was productive nonetheless. Here's a summary:
I think the most important point discussed was ability to implement this efficiently in the future: exposing |
I think the best way forward might be to just expose the Anyway, here is how they work in the reference implementation:
It'd be possible to merge
I guess we can swap it with something better once we find out a better way to do it? For example, maybe we can have a global "key broker" that maps an UUID to a small key
Yeah, I just want to get API correct and make it easy to experiment with better backend storage. HAMT is an obvious candidate. |
Ah yes, I see the wikipedia page mentions persistent variants of HAMTs. This seems like the thing we want 👍 |
Would that allow code to access context via |
How
module A
    module B
        @contextvar x
        get_x() = x[]
        function shadowing()
            x = 1  # shadows context variable `x`
            sum = 1  # shadows `Base.sum`
            return get_x()
        end
    end
    get_B_x() = B.x[]
end
# These work and are equivalent:
A.B.x[]
A.B.get_x()
A.get_B_x()
A.B.shadowing()
# These do not work:
x[]
A.x[]
Oh, of course - very nice! How would you handle the case where modified context is to be passed to tasks/workers? I am in a certain context, but I want to modify part of that context in a different way for each worker (without changing my current context). E.g. to partition resources listed in the context among tasks/workers, etc. How should we express that, API-wise?
I think API changes from ContextVariablesX still need porting in here:
|
Is this something that we want to also support context in the sense that |
I don't think it's only useful for concurrency, it's about propagating argument-like "settings" that can't be well handled via function arguments. Things like logger, progress monitor, choice of computational resources, etc. This is of course very important for thread- and process-level parallelism, but I think it would benefit serial chains of function calls as well. |
Just to add another potential motivation for this sort of thing, the way that It'd be pretty cool if we could have context variables be lightweight enough that things like flagging that it's okay to turn off error checks could use this mechanism. |
I don't think |
If we had an index set and could say that "in the context of this collection it is inbounds" then limiting to one level of propagation wouldn't be necessary (and ensure no mutations too). Unless we need to further process indices we just create a view, so it's not really a critical example to get working. It's more of a proof of concept for propagating other relationships.
I want to ask about another related usage of
https://pkg.go.dev/context#WithCancel The Go Context object can also contain an indication of whether a request has been cancelled by the client, and/or whether a timeout/duration has been exceeded. This requires the server code to thread the Context through all of the functions on the server, across module boundaries, and check for cancellation at various places. Could we also cover that logic with a ContextVariable as proposed here / in https://github.com/tkf/ContextVariablesX.jl?
I'm thinking maybe this could be done by having a function like The main place I see where this might not mesh well with the current design is:
It seems like supporting multiple separate cancellation contexts is a bit at-odds with the feature this package seems to provide, which is avoiding having to thread the variable through all your code by instead making it a global variable. Maybe the approach is to have these cancellation contexts also be named context variables? Like @contextvar request_cancelled = CancellationContext() And then every new incoming server request could set its own Has anyone else thought about this already? We're working on implementing user-requested Transaction Cancellation in our async web server at RAI now, and it occurred to me that these go-style Context objects seem like the right design. |
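As a point of reference for the cancellation question, here is a minimal sketch of the kind of object such a context variable might hold. `CancellationToken`, `cancel!`, `is_cancelled`, and `work` are hypothetical names, not part of this PR or of ContextVariablesX; the token is passed explicitly here, which is the threading a context variable would remove.

```julia
# Hypothetical cancellation token: a shared atomic flag.
struct CancellationToken
    cancelled::Threads.Atomic{Bool}
end
CancellationToken() = CancellationToken(Threads.Atomic{Bool}(false))

cancel!(token::CancellationToken) = (token.cancelled[] = true; nothing)
is_cancelled(token::CancellationToken) = token.cancelled[]

# A cooperative worker: it checks the token at each step and stops early if asked.
function work(token::CancellationToken; max_steps = 1_000_000)
    steps = 0
    while steps < max_steps
        is_cancelled(token) && return (:cancelled, steps)
        steps += 1
        yield()  # give other tasks (including the canceller) a chance to run
    end
    return (:finished, steps)
end

token = CancellationToken()
task = @async work(token)
yield()            # let the worker make some progress
cancel!(token)     # e.g. the client went away
status, steps = fetch(task)
```

With one token per incoming request, each request's task tree could observe its own flag without affecting the others.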
I think you can use an implementation of dynamic scope which is inherited across However I don't think there's a need to bake anything like that into the context variables system itself. Safe and convenient cancellation is a very deep topic and it's not clear to me that any language has fully solved this problem. But I still think the way that cancellation is handled in structured concurrency systems is the best option seen so far. On that note, you should definitely check out https://github.com/JuliaConcurrent/Julio.jl if you haven't seen it already :-) |
One comment from triage was that the |
I totally agree that anything that can be implemented as a macro must be implemented without a macro. However, I couldn't find any other ways to provide some desired properties without a macro. (This was precisely what was discussed in #35833 (comment)) Distributed computing is the main reason why the primary construction API is provided through a macro. With macro-based construction, it supports the stability of the context variable key across precompilations. This key is used for identifying the value assigned to the context variable within a context (say, a task-local So, it would be nice if triage can discuss: (1) Do we want to support sending values assigned to context variables across distributed processes? (2) If so, is it reasonable to use what is implemented in ContextVariablesX? Even if the answer to the second question is "maybe?", I think it would be better to use the current macro-based API even if we end up using different implementation strategies, because it is constrained enough to allow the implementation used by ContextVariablesX. Footnotes
|
Yes please! |
To elaborate: I think distributed use cases would profit from context variables very much. And if the machinery doesn't forward context variables through remote calls and the like it will be very hard for the user to do so. And that will mean that code that relies on context variables would malfunction when used distributed - so lots of special handling would need to be added manually for the distributed case, making code much less generic.
I broadly like the overall design here, but I have a minor suggestion: provide some mechanism for specifying how context variables should be passed when spawning subtasks on local and remote processes. This was motivated by #48121: the idea is that when you spawn a task on a remote process, you could wrap the logger in a |
Another implementation is |
ScopedVariables are containers whose observed value depends on the current dynamic scope. This implementation is inspired by https://openjdk.org/jeps/446 A scope is introduced with the `scoped` function that takes a lambda to execute within the new scope. The value of a `ScopedVariable` is constant within that scope and can only be set upon introduction of a new scope. Scopes are propagated across task boundaries. In contrast to #35833 the storage of the per-scope data is associated with the ScopedVariables object and does not require copies upon scope entry. This also means that libraries can use scoped variables without paying for scoped variables introduced in other libraries. Finding the current value of a ScopedVariable involves walking the scope chain upwards and checking if the scoped variable has a value for the current or one of its parent scopes. This means the cost of a lookup scales with the depth of the dynamic scoping. This could be amortized by using a task-local cache.
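The lookup strategy described in that commit message can be sketched in a few lines. This is an illustration under assumptions, not the linked implementation: `Scope`, `ScopedVar`, `lookup`, and the storage key are invented names, the per-variable storage is a plain `IdDict`, and the current scope lives in `task_local_storage`, so this sketch does not propagate to child tasks.

```julia
# Hypothetical scope node; mutable so that each scope has its own identity.
mutable struct Scope
    parent::Union{Scope,Nothing}
end

# Per-scope values are stored on the variable itself, keyed by scope identity.
struct ScopedVar{T}
    default::T
    values::IdDict{Scope,T}
end
ScopedVar(default::T) where {T} = ScopedVar{T}(default, IdDict{Scope,T}())

current_scope() = get(task_local_storage(), :sketch_current_scope, nothing)

# Walk the scope chain upwards until some scope has a value for `var`.
function lookup(var::ScopedVar)
    scope = current_scope()
    while scope !== nothing
        haskey(var.values, scope) && return var.values[scope]
        scope = scope.parent
    end
    return var.default
end

# Run `f` in a fresh child scope in which `var` is bound to `value`.
function scoped(f, binding::Pair{<:ScopedVar})
    var, value = binding
    scope = Scope(current_scope())
    var.values[scope] = value
    try
        return task_local_storage(f, :sketch_current_scope, scope)
    finally
        delete!(var.values, scope)
    end
end

x = ScopedVar(1)
y = ScopedVar("default")
result = scoped(x => 2) do
    nested = scoped(x => 3) do
        (lookup(x), lookup(y))   # `y` is unset, so its lookup walks the whole chain
    end
    (lookup(x), nested)
end
after = lookup(x)
```

Entering a scope touches only the variable being set, while a lookup for an unset variable pays for the full chain depth, which is the trade-off the message describes.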
Replaced by #50958
This PR proposes a context variables API that can be used to propagate context-dependent information across task boundaries. It is conceptually similar to `task_local_storage`, with the main difference that it "copies" all key-value pairs to child `Task`s. The context variables are conceptually similar to dynamically scoped variables.
Motivations
There are several places context variables can be useful or required.
`ENV`
It was proposed to use `task_local_storage` to fix thread-safety of `withenv` #34726 (comment) by maintaining a task-local copy (or overlay) of `ENV`. However, it would mean that the environment variables cannot cross task boundaries. Context variables can fix this shortcoming.
`@testset`
`@testset` uses `task_local_storage` to track the current active test set. However, the following example does not work (prints `No tests`) because the information of the current test does not propagate to the child task:
Logging
`Task` has the `logstate` field that propagates to child tasks; i.e., it works as a hard-coded context variable. Once context variable handling is sufficiently matured, it may be possible to eliminate the special handling from `Task` and use a context variable for `Task`. Furthermore, context variables allow users to develop logging-like interfaces.
Custom worker pool abstraction
In #35757 and Propagation of available/assigned worker-IDs in hierarchical computations? - Domains / Julia at Scale - JuliaLang, it was brought up that propagating "computation resources" (thread/process pools, etc.) across tasks and processes is required for implementing custom worker pool interfaces.
Misc.
It would be useful for implementing a better nestable progress information handling JuliaLogging/ProgressLogging.jl#13 (comment).
Other languages
Implicit context
- `contextvars`. See PEP 567 and PEP 550 for discussion.
- `ExecutionContext`.
- `CoroutineContext`.
Explicit context
- `context`.
Proposed design
I propose an API with the following basic usages.
- `@contextvar x`
- `x[]`
- `x[] = value`
For a tutorial and the full reference API of the proposed design, see https://tkf.github.io/ContextVariablesX.jl/dev/
Internally, `@contextvar` creates an instance of `ContextVar` which is defined as
Then `x[]` and `x[] = value` invoke `current_task().ctxvars[x.key]` and an "immutable version" of `current_task().ctxvars[x.key] = value`, respectively (roughly speaking).
This has a couple of nice properties:
- `x[]` can be inferred
- `x` is forced to be namespaced (i.e., it has to exist in some module name space.)
- `x` can be backed up by an efficient concrete key type (e.g., UUID)
- `x` allows small-size optimization when the value type can be inlined into the context storage (in principle)
See also #35757 which already contains some discussion on this API.
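A rough sketch of the "immutable version" of assignment, under stated assumptions: `ContextVarSketch` and `context` are hypothetical names, the context lives in `task_local_storage` instead of a `ctxvars` field on `Task`, and `Base.ImmutableDict` stands in for whatever persistent map the real storage would use. It is not the definition from this PR.

```julia
using UUIDs

# Hypothetical stand-in for `ContextVar`: a typed default plus a UUID key.
struct ContextVarSketch{T}
    name::Symbol
    key::UUID
    default::T
end

# The current context is an immutable map; an empty one if nothing was set yet.
context() = get(task_local_storage(), :ctxvars_sketch, Base.ImmutableDict{UUID,Any}())

# `x[]`: look up by key; the type assertion keeps the result inferable as `T`.
Base.getindex(var::ContextVarSketch{T}) where {T} =
    get(context(), var.key, var.default)::T

# `x[] = value`: build a new map on top of the old one and install it.
function Base.setindex!(var::ContextVarSketch{T}, value) where {T}
    ctx = Base.ImmutableDict(context(), var.key => convert(T, value))
    task_local_storage(:ctxvars_sketch, ctx)
    return var
end

x = ContextVarSketch(:x, uuid4(), 1)
snapshot = context()        # cheap: just a reference to the immutable map
x[] = 2
current = x[]
in_snapshot = get(snapshot, x.key, x.default)
```

Because assignment never mutates the old map, a snapshot taken earlier still sees the old value, which is what makes handing the context to a child task cheap.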