Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Mysql Server Resets When Connecting Immediately After Launch #8131

Closed
farhanhubble opened this issue Jul 17, 2024 · 5 comments
Closed
Labels
bug Something isn't working customer issue docker good repro Easily reproducible bugs sql server Issues related to the built in SQL server

Comments

@farhanhubble
Copy link

farhanhubble commented Jul 17, 2024

Scenario:

  • Launch Dolt Mysql server using the latest docker image (dolthub/dolt-sql-server:latest , 12b8a3ac4c1a)
  • Connect to server using a mysql library, like SqlAlchemy+Pymysql
  • Carry out any read operation.
  • Exception pymysql.err.OperationalError - Lost connection to MySQL server during query

MRE:

# No bytes received when not waiting before connection (time.sleep(0))
docker start DoltServer && python -c 'import socket; import time; time.sleep(0); sock=socket.create_connection(("localhost", 3307), timeout=1); sock.settimeout(1); d=sock.recv(1024); print(d)' ; docker stop DoltServer


# Valid bytes when waiting (sleep(2))
> docker start DoltServer && python -c 'import socket; import time; time.sleep(2); sock=socket.create_connection(("localhost", 3307), timeout=1); sock.settimeout(1); d=sock.recv(1024); print(d)' ; docker stop DoltServer
> b'J\x00\x00\x00\n8.0.33\x00\x01\x00\x00\x00\x0bzke)r4@\x00\x8f\xa2!\x00\x00;\x01\x15\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x7fDh>`ozb@Ri5\x00mysql_native_password\x00'

Additional Info:

  • Dolt server start command:
docker 'run', '--name', 'DoltServer', '--restart', 'unless-stopped', '--detach', '-p', '3307:3306', '-v', './.doltdbdir:/var/lib/dolt', '-v', './.doltworkdir/servercfg.d:/etc/dolt/servercfg.d', '-v', './.doltworkdir/doltcfg.d:/etc/dolt/doltcfg.d', 'dolthub/dolt-sql-server:latest'
  • Server Configuration:
log_level: info

behavior:
  read_only: false
  autocommit: true
  persistence_behavior: load
  disable_client_multi_statements: false
  dolt_transaction_commit: false
  event_scheduler: "ON"

user:
  name: ""
  password: ""

listener:
  host: 0.0.0.0
  port: 3306
  max_connections: 100
  read_timeout_millis: 28800000
  write_timeout_millis: 28800000
  tls_key: null
  tls_cert: null
  require_secure_transport: null
  allow_cleartext_passwords: null

performance:
  query_parallelism: null

data_dir: .

cfg_dir: .doltcfg

metrics:
  labels: {}
  host: null
  port: -1

remotesapi: {}

privilege_file: .doltcfg/privileges.db

branch_control_file: .doltcfg/branch_control.db

user_session_vars: []
@timsehn timsehn added bug Something isn't working sql server Issues related to the built in SQL server good repro Easily reproducible bugs docker labels Jul 17, 2024
@timsehn
Copy link
Contributor

timsehn commented Jul 17, 2024

Thanks. This seems like we're accepting the connection in a state where we can't service it and then never getting back to it. Good repro. We'll get started debugging this in the next day or so.

@fulghum
Copy link
Contributor

fulghum commented Jul 18, 2024

Thanks for providing such easy repro steps @farhanhubble! 🙏 I've been digging into this one this morning, and I can repro the behavior you're seeing.

I'm pretty sure this is a timing issue... when docker start starts up the stopped container, it isn't giving the container long enough to fully get back into service, so the python script connects to the mapped port, but there isn't anything inside the docker container ready to handle that request, hence the empty response. One of the clues supporting that is that when I take out the docker stop DoltServer command at the end of your repro, I don't see this behavior anymore.

I'm going to experiment with adding a HEALTHCHECK to our sql-server Dockerfile. I believe that can get docker start to pause until the server is ready to handle connections.

Thanks for helping us identify this issue. I'll have another update for you soon.

@fulghum
Copy link
Contributor

fulghum commented Jul 18, 2024

I dug into this one some more... I'm not a Docker expert, so this was fun for me to get deeper into what's going on with various container commands and container startup...

docker start will always return as soon as soon as the container is started. This means that the initialization script for the Dolt sql-server image (or for the MySQL server image) will still be running when docker start returns. Defining health checks in a Dockerfile does not change this – they expose a health status for the container, but they do not cause docker start to wait for the container to be healthy before they return.

docker run has a slightly different behavior. It does start a container, just like docker start, but by default, it also attaches to that container to show the STDOUT from the main process. This ends up giving the initialization scripts a little more time to run before the container is used by another process, which may be enough for container initialization to complete. (You can also pass --attach to docker start, but it doesn't help in your case, because your python script won't run until you detach from that docker container or stop the docker container).

I noticed that the MySQL Docker image (which we use an example for Dolt's sql-server image) does not define healthchecks and I also saw in the documentation that it has the same behavior with initialization taking a few seconds. (See the "No connections until MySQL init complete" section in the MySQL Docker image documentation). That's specifically around initializing a brand new MySQL install, but it illustrates that the same behavior is happening with docker start always returning immediately.

The standard approach for this with Docker containers seems to be to poll the container to check for readiness. We provide an example of doing this with Kubernetes in our docs in case that's helpful. For your example, I think a good approach would be to have a small loop in your Python code that checks to see if d has received enough bytes to indicate that the server is running. That loop could retry every 250ms or so, for a set number of times, and then break out of the loop when it detects a successful connection, so the rest of the application code can run. I've included an example python script that does this below in case that's helpful.

Let us know if any of that doesn't make sense, or if you just think I'm misunderstanding something that's going on here. We're open to adding more of this info to our docs if you think that might have been helpful for you, too.

Python check_connection.py Example

import socket
import time

# Parameters
host = "localhost"
port = 3307
timeout = 1
check_interval = 0.25  # 250 milliseconds
max_duration = 5  # 5 seconds

# Function to check for data
def check_connection():
    start_time = time.time()
    while True:
        sock = socket.create_connection((host, port), timeout=timeout)
        sock.settimeout(timeout)
        
        try:
            data = sock.recv(1024)
            if len(data) > 4:
                print("Data received:", data)
                return
        except socket.timeout:
            pass
        finally:
            sock.close()

        if time.time() - start_time > max_duration:
            print("Timeout: No sufficient data received within 5 seconds.")
            return
            
        time.sleep(check_interval)

# Run the data check function
check_connection()

This script can be used like this:
docker start DoltServer && python check_connection.py && docker stop DoltServer

@farhanhubble
Copy link
Author

farhanhubble commented Jul 18, 2024

Thank you so much @fulghum! I'm no docker expert myself. The interesting thing is that I noticed this issue because I had written a await_heartbeat() function to check that the server was ready and all it did was something like nc -z localhost 3037 and check for the return code. Because of the bug though I'd get a return code of 0 and then when I tried to read data I'd get the PyMySQL error. I modified the function to check liveness by actually trying to read data, which helped me get around the problem.

I wasn't aware of the subtleties of docker start vs run but in this case using docker run did not make a difference. I did not want to count on a process being started to decide if the process was ready for network connections and was relying on await_heartbeat(). In hindsight, I should have checked with a read operation earlier since there are a few ways for ports to show up open even when the process that opened them is gone.

@fulghum
Copy link
Contributor

fulghum commented Jul 18, 2024

My pleasure! This was a fun one to dig into. 😄 I'm glad you found a way to poll for the server readiness and verify that the server has started up successfully. For others who may read this issue... polling for a successful read is a much more resilient approach than a static sleep since startup times can change for a variety of reasons such as load on the host or the size of the database.

I'll go ahead and resolve since we got this figured out, but don't hesitate to let us know if there's anything else we can do to help you build with Dolt!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working customer issue docker good repro Easily reproducible bugs sql server Issues related to the built in SQL server
Projects
None yet
Development

No branches or pull requests

4 participants