-
Notifications
You must be signed in to change notification settings - Fork 10.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpc : resource management rework #7562
Conversation
ggml-rpc.cpp
Outdated
delete rpc_ctx->buft; | ||
delete rpc_ctx; | ||
delete backend; | ||
instances.erase(endpoint); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we lock the instances
here? will we access it from different thread simultaneously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as I said in the description this implementation is not thread-safe; I'd like to get some feedback on the reference count approach and then I will add thread-safety
ggml-rpc.cpp
Outdated
@@ -96,27 +96,37 @@ static ggml_guid_t ggml_backend_rpc_guid() { | |||
return &guid; | |||
} | |||
|
|||
struct ggml_backend_rpc_buffer_type_context { | |||
struct rpc_backend { | |||
int ref_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks we have 2 reference counts for same object: 1. rpc_backend.ref_count
, 2, inside the rpc_backend_ptr
.
IMO, better to merge then together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can try something link intrusive shared pointer: std::enable_shared_from_this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that the correct way to do this would be to change the pointers in instances
to weak_ptr
, then the backend will be automatically freed by shared_ptr
on the last instance. create_rpc_backend
would need to check if the weak_ptr
in instances is still alive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that could be the best, inside the instances
we maintain a weak_ptr<rpc_backend>
, without increase the reference count, and then at the create_rpc_backend
we obtain shared_ptr
from weak_ptr
.
this also save the erase
call at free_rpc_backend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately it is not that simple. We keep a backend reference in ggml_backend_rpc_buffer_type_context
and we use this reference when allocating new buffers. If we free this reference in ggml_backend_rpc_free()
then we won't be able to allocate new buffers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, we have several structure have a reference to rpc_backend
here:
ggml_backend_rpc_buffer_type_context
ggml_backend_rpc_buffer_context
ggml_backend_rpc_context
instances
.
first 1-3
should have a strong reference to let them allocate buffer after ggml_backend_rpc_free
call explicitly - that's the case you mention.
and for the 4
, i think we can be make it as weak_ptr
since the life-span of ggml_backend_rpc_context
is expected longer than the rpc_backend
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do we free the reference which is kept in ggml_backend_rpc_buffer_type_context
? If you free this reference, then you can no longer allocate buffers. If you don't free this reference, then the backend will never be freed because of this reference.
I don't see how to solve this without explicit reference count. I also don't see how weak_ptr
can help here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Buffer types are never freed, I think it would complicate the user code too much to require buffer types to be freed, for no real benefit. In the RPC backend, buffer types only need a connection during initialization and during buffer allocation. You can make a connection during initialization to obtain the alignment and max size, drop it, and only open it again once a buffer is allocated, and then rely on the shared_ptr
to keep it alive until all buffers and backend instances have been freed.
This is probably complicated because the RPC backend uses a ggml_backend
instance to represent a connection, but connections and ggml_backend
backend instances should probably be different concepts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, let's say that the connection is not tied to the backend instance and we have a separate entity (e.g. rpc_connection
) which is being referenced with shared_ptr
. How do we obtain a shared_ptr
to the connection when a new buffer is allocated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think something like this should work:
std::shared_ptr<connection_t> get_connection(const std::string & endpoint) {
static std::map<std::string, std::weak_ptr<connection_t>> connections;
auto it = connections.find(endpoint);
if (it != connections.end()) {
if (auto connection = it->second.lock()) {
return connection;
}
}
auto connection = std::make_shared<connection_t>(endpoint);
connections[endpoint] = connection;
return connection;
}
Use this function when you need to create a new buffer or backend instance. You can also use it during initialization of the buffer type, just use a local shared_ptr
in the function only, but don't keep the instance.
I have decoupled the backend and its corresponding socket. Sockets are cached and shared between RPC buffers. |
Nice work! |
draw a picture for the backend objects here, for better understanding:
|
This patch tries to address the concerns raised in PR #7435. We track how many times an RPC backend is referenced and we deallocate its resources when this count becomes 0. The reference count is increased when a new RPC buffer is allocated or
ggml_backend_rpc_init()
is called. Respectively it is decreased when an RPC buffer is freed orggml_backend_rpc_free()
is called.The implementation is not thread-safe. I will address thread-safety in a follow-up patch.