[x/group] weird deadlock when querying group members #12111
Comments
I can replicate. Looks like it's coming from the ORM, something to do with an iterator. Will investigate further 🕵️‍♂️

Looks pretty concerning. This needs to be fixed for 0.46.
Findings so far:
- cosmos-sdk/x/group/internal/orm/iterator.go, lines 178 to 181 in 7feae9c
- That calls cosmos-sdk/x/group/internal/orm/types.go, lines 106 to 118 in 7feae9c
- See prefix/store: cosmos-sdk/store/prefix/store.go, lines 91 to 103 in 7feae9c
- Then gaskv (I skipped a bunch of places), which ends up calling cachekv.

EDIT 1: Not sure if relevant, but if you set this key (which is the one for member number 67), then it fails even if you set the member count to something lower:
// Set implements DB.
func (db *MemDB) Set(key []byte, value []byte) error {
	if len(key) == 0 {
		return errKeyEmpty
	}
	if value == nil {
		return errValueNil
	}
	db.mtx.Lock() // <--------- the write path takes the exclusive lock
	defer db.mtx.Unlock()
	db.set(key, value)
	return nil
}
func newMemDBIteratorMtxChoice(db *MemDB, start []byte, end []byte, reverse bool, useMtx bool) *memDBIterator {
	ctx, cancel := context.WithCancel(context.Background())
	ch := make(chan *item, chBufferSize)
	iter := &memDBIterator{
		ch:     ch,
		cancel: cancel,
		start:  start,
		end:    end,
		useMtx: useMtx,
	}
	if useMtx {
		db.mtx.RLock()
	}
	go func() {
		if useMtx {
			defer db.mtx.RUnlock()
		}
		// Because we use [start, end) for reverse ranges, while btree uses (start, end], we need
		// the following variables to handle some reverse iteration conditions ourselves.
		var (
			skipEqual     []byte
			abortLessThan []byte
		)
		visitor := func(i btree.Item) bool {
			item := i.(item)
			if skipEqual != nil && bytes.Equal(item.key, skipEqual) {
				skipEqual = nil
				return true
			}
			if abortLessThan != nil && bytes.Compare(item.key, abortLessThan) == -1 {
				return false
			}
			select { // <-------------- this send blocks once ch is full and nobody reads
			case <-ctx.Done():
				return false
			case ch <- &item:
				return true
			}
		}

The immediate symptom is that `MemDB.Set` blocks on `db.mtx.Lock()` while the iterator goroutine still holds the read lock it took above and never releases it.
Yes, I totally agree here. Very poor API design in tm-db. It seems we're blocked here @facundomedica on redesigning memdb, or at least on addressing this concern. One thing I'm curious about is whether we can prevent writes while we iterate?

That's exactly what the RLock does. The problem is that whoever is iterating doesn't finish, meaning the RLock is never RUnlocked, meaning writes are prevented forever.
What I meant was to explicitly not iterate while we write in the ORM layer, rather than depending on the locking mechanism inside the memdb implementation.
Does group have an open iterator while writing? That does violate the iterator contract. But from looking at the code I don't see that happening |
You have the stack trace here:
Basically, the orm/iterator eventually calls store/cachekv.Store.dirtyItems, which calls clearUnsortedCacheSubset, which ends up writing into the MemDB-backed sorted cache while the iterator still holds its read lock — hence the deadlock.
Summary of Bug

When testing creating groups and querying for their members, a weird deadlock appears as soon as the group has more than 67 members. I've validated that it's fine with any number of members up to 66, and anything above this ends with a timeout.

Version

master @ ece3d0e

Steps to Reproduce

The following test case illustrates and reproduces the issue (placed in x/group/keeper/keeper_test.go). This produces the following output, where we can see the group with 67 members gets successfully created, but then fails to query its members. Note that the loop used in the above test does not matter; a single call to s.createGroupAndGetMembers(67) fails in the exact same way.