Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sentinel_tunnel stops working after several random host failures #7

Open
chukynax opened this issue Jun 24, 2019 · 2 comments
Open

Comments

@chukynax
Copy link

chukynax commented Jun 24, 2019

infrastructure:
redis-01
redis-sentinel-01

redis-02
redis-sentinel-02

redis-03
redis-sentinel-03

sentinel-tunnel

procedure:

  1. turning off first host
    sentinel-tunnel:
    changed to another host has become master
  2. returning back first host
    sentinel-tunnel:
    shows all nodes up and running
  3. turning off second host
    sentinel-tunnel:
    changed to third host has become master
  4. returning back second host
    sentinel-tunnel:
    shows all nodes up and running
  5. turning off third host
    sentinel-tunnel:
    doesn't reply anything and restart of sentinel_tunnel is needed, after works normal

Randomly sentinel_tunnel stops answering after 6th or 7th try of turning off hosts with redis.
logs are:

err: failed read line from client
dial tcp 10.1.10.160:26379: connect: connection refused

and hangs forever

@DivPro DivPro mentioned this issue Jun 26, 2019
@SpComb
Copy link

SpComb commented Nov 20, 2019

There's a clear bug in the st_sentinel_connection where it doesn't return any reply to the get_master_address_by_name request after successfully reconnecting to redis-sentinel:

	for db_name := range c.get_master_address_by_name {
		addr, err, is_client_closed := c.getMasterAddrByNameFromSentinel(db_name)
		if err != nil {
			if !is_client_closed {
				c.get_master_address_by_name_reply <- ...
			}
			if !c.reconnectToSentinel() {
				c.get_master_address_by_name_reply <- ...
			}
			continue
		}
		c.get_master_address_by_name_reply <-...
	}

No reply is sent in the err != nill -> c.reconnectToSentinel() == true -> continue case, it just skips on to the next read from c.get_master_address_by_name.

This will block the handleConnection until the next connection triggers a new lookup, and then either of the two lookups will get the new response. If the second lookup is for a different database, then the original handleConnection may even end up connecting to the wrong database.

I wouldn't use this in production, and definitely not with multiple databases configured, at the risk of applications connecting to / writing to the wrong database.

@SpComb
Copy link

SpComb commented Nov 20, 2019

The <-c.get_master_address_by_name_reply construct in GetAddressByDbName doesn't look very safe anyways, there's no ordering guarantees on the reply channel reads... if two goroutines do concurrent lookups for different databases, then they might randomly get either of the two lookup results, which might be for the wrong database:

reply := <-c.get_master_address_by_name_reply

func (c *Sentinel_connection) GetAddressByDbName(name string) (string, error) {
	c.get_master_address_by_name <- name
	reply := <-c.get_master_address_by_name_reply
	return reply.reply, reply.err
}

The only mitigating aspect here might be the use of unbuffered/blocking channels, which mean that the second c.get_master_address_by_name send might not proceed until the retrieveAddressByDbName has sent the reply... but the two goroutines calling GetAddressByDbName could still race on which gets to the get_master_address_by_name_reply first, although the race window will include the getMasterAddrByNameFromSentinel network call, so that's unlikely. the blocking c.get_master_address_by_name_reply send might preclude that race?

A better way to implement this kind of request-reply pattern is to have the client create and send a new reply chan, and have the handling goroutine close that chan.

sampathkolachana added a commit to sampathkolachana/sentinel_tunnel that referenced this issue Oct 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants