
services/rpcsrv: Return a new server by pointer #3661

Merged
merged 1 commit into nspcc-dev:master from fix-panic-on-sighup on Nov 6, 2024

Conversation

@alexey-savchuk (Contributor) commented Nov 5, 2024

Problem

Previously, a new server was returned by value, which could cause a `sync: Unlock of unlocked RWMutex` panic during SIGHUP handling: assigning the new server value overwrites the locked mutex of the already existing server.

oct 22 13:51:15 node1 neo-go[1183338]: fatal error: sync: Unlock of unlocked RWMutex
oct 22 13:51:15 node1 neo-go[1183338]: goroutine 538 [running]:
oct 22 13:51:15 node1 neo-go[1183338]: sync.fatal({0xf83d64?, 0xc001085880?})
oct 22 13:51:15 node1 neo-go[1183338]:         runtime/panic.go:1007 +0x18
oct 22 13:51:15 node1 neo-go[1183338]: sync.(*RWMutex).Unlock(0xc00019a4c8)
oct 22 13:51:15 node1 neo-go[1183338]:         sync/rwmutex.go:208 +0x45
oct 22 13:51:15 node1 neo-go[1183338]: github.com/nspcc-dev/neo-go/pkg/services/rpcsrv.(*Server).dropSubscriber(0xc00019a2c8, 0xc000a77740)
oct 22 13:51:15 node1 neo-go[1183338]:         github.com/nspcc-dev/neo-go/pkg/services/rpcsrv/server.go:825 +0xce
oct 22 13:51:15 node1 neo-go[1183338]: github.com/nspcc-dev/neo-go/pkg/services/rpcsrv.(*Server).handleWsReads(0xc00019a2c8, 0xc0034478c0, 0xc000af5f80, 0xc000a77740)
oct 22 13:51:15 node1 neo-go[1183338]:         github.com/nspcc-dev/neo-go/pkg/services/rpcsrv/server.go:810 +0x266
oct 22 13:51:15 node1 neo-go[1183338]: github.com/nspcc-dev/neo-go/pkg/services/rpcsrv.(*Server).handleHTTPRequest(0xc00019a2c8, {0x11c3900, 0xc003437dc0}, 0xc0031945a0)
oct 22 13:51:15 node1 neo-go[1183338]:         github.com/nspcc-dev/neo-go/pkg/services/rpcsrv/server.go:582 +0x54a
oct 22 13:51:15 node1 neo-go[1183338]: net/http.HandlerFunc.ServeHTTP(0x471779?, {0x11c3900?, 0xc003437dc0?}, 0xc000943b68?)
oct 22 13:51:15 node1 neo-go[1183338]:         net/http/server.go:2171 +0x29
oct 22 13:51:15 node1 neo-go[1183338]: net/http.serverHandler.ServeHTTP({0xc000a77680?}, {0x11c3900?, 0xc003437dc0?}, 0x6?)
oct 22 13:51:15 node1 neo-go[1183338]:         net/http/server.go:3142 +0x8e
oct 22 13:51:15 node1 neo-go[1183338]: net/http.(*conn).serve(0xc0032030e0, {0x11c5220, 0xc000a76960})
oct 22 13:51:15 node1 neo-go[1183338]:         net/http/server.go:2044 +0x5e8
oct 22 13:51:15 node1 neo-go[1183338]: created by net/http.(*Server).Serve in goroutine 534
oct 22 13:51:15 node1 neo-go[1183338]:         net/http/server.go:3290 +0x4b4

Solution

Return a new server by pointer.
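
To illustrate, a minimal sketch of what the signature change amounts to (not the actual rpcsrv code, the field name is hypothetical): returning by value copies the whole struct, including its mutex, on every assignment, while returning a pointer leaves the single instance in place.

package rpcsrv

import "sync"

// Server is a simplified stand-in for rpcsrv.Server.
type Server struct {
	subsLock sync.RWMutex
	// ... other fields ...
}

// Before: returning by value means every `rpcServer = New(...)` copies a
// whole Server, including a fresh, unlocked RWMutex, over the old variable.
func NewByValue() Server { return Server{} }

// After (this PR): returning by pointer means reassignment only swaps the
// pointer; the old instance and its mutex stay untouched.
func New() *Server { return &Server{} }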

@alexey-savchuk (Contributor, Author) commented:

Additionally, it seems that rpcsrv.Server doesn't wait for all HTTP handlers to complete, which could cause an overlap in the lifetimes of the old and new servers and, as a consequence, the behavior described in this PR. Is this behavior of the server intended?

neo-go/cli/server/server.go

Lines 565 to 566 in 95098d4

rpcServer.Shutdown()
rpcServer = rpcsrv.New(chain, cfgnew.ApplicationConfiguration.RPC, serv, oracleSrv, log, errChan)
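
To make the failure mode concrete, here is a minimal, self-contained sketch of what happens when a value holding a locked mutex is overwritten while another goroutine still expects to unlock it. It is deliberately racy and unrelated to the rpcsrv code itself; the names are made up for illustration.

package main

import (
	"sync"
	"time"
)

// server is a hypothetical stand-in for a by-value rpcsrv.Server.
type server struct {
	mu sync.RWMutex
}

func newServer() server { return server{} } // constructor returns a value

func main() {
	srv := newServer()

	// A "handler" goroutine locks the mutex of the current srv value and
	// unlocks it later, roughly like dropSubscriber does on disconnect.
	go func() {
		srv.mu.Lock()
		time.Sleep(100 * time.Millisecond)
		srv.mu.Unlock() // fatal error: sync: Unlock of unlocked RWMutex
	}()
	time.Sleep(50 * time.Millisecond)

	// "SIGHUP handling": the variable is reassigned, overwriting the
	// locked mutex of the old server with a fresh, unlocked one.
	srv = newServer()

	time.Sleep(time.Second)
}

With a pointer, the same reassignment only changes which instance the variable points to; the old instance keeps its own mutex until the goroutines still using it are done.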


codecov bot commented Nov 5, 2024

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 83.09%. Comparing base (cee296e) to head (df9247c).
Report is 2 commits behind head on master.

Files with missing lines   Patch %   Lines
cli/server/server.go       33.33%    2 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #3661   +/-   ##
=======================================
  Coverage   83.09%   83.09%           
=======================================
  Files         334      334           
  Lines       46576    46576           
=======================================
+ Hits        38702    38704    +2     
+ Misses       6303     6301    -2     
  Partials     1571     1571           


@AnnaShaleva (Member) left a comment:

Good catch.

it seems that rpcsrv.Server doesn't wait for all HTTP handlers to complete

Why do you think so? rpcServer.Shutdown() must properly wait for all connections to be gracefully closed and all server routines to be finished:

for _, srv := range s.http {
	s.log.Info("shutting down RPC server", zap.String("endpoint", srv.Addr))
	err := srv.Shutdown(context.Background())

And http.Server's `Shutdown` documentation says that:

// Shutdown gracefully shuts down the server without interrupting any
// active connections. Shutdown works by first closing all open
// listeners, then closing all idle connections, and then waiting
// indefinitely for connections to return to idle and then shut down.

and subscription routines are also properly awaited:

// Wait for handleSubEvents to finish.
<-s.subEventsToExitCh

@AnnaShaleva AnnaShaleva added this to the v0.107.0 milestone Nov 5, 2024
@AnnaShaleva AnnaShaleva added the bug (Something isn't working) and rpc (RPC server and client) labels Nov 5, 2024
Before, a new server was returned by value which could cause
a panic `unlock of unlocked mutex` on SIGHUP handling. It's
because the new server overwrites a locked mutex of the already
existing server.


Signed-off-by: Alexey Savchuk <[email protected]>
@alexey-savchuk (Contributor, Author) commented:

That's a good point. I've thought about it as well.

And http.Server's `Shutdown` documentation says that:

// Shutdown gracefully shuts down the server without interrupting any
// active connections. Shutdown works by first closing all open
// listeners, then closing all idle connections, and then waiting
// indefinitely for connections to return to idle and then shut down.

My bad — I meant not the handler itself but the goroutines it might spawn. I've added an example below.

The stack trace I attached shows that the mutex unlock happens in the following call chain:
rpcsrv.(*Server).handleHTTPRequest ->
rpcsrv.(*Server).handleWsReads ->
rpcsrv.(*Server).dropSubscriber
As far as I can see, during shutdown we don't wait for the WebSocket connections to close. Correct me if I'm wrong.

package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"time"
)

func main() {
	// Create a server with a default handler
	srv := http.Server{
		Addr: ":8080",
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// Spawn a goroutine
			go func() {
				ticker := time.NewTicker(time.Second)
				defer ticker.Stop()

				for {
					<-ticker.C
					log.Println("Running")
				}
			}()
		}),
	}

	// Run the server
	go func() {
		err := srv.ListenAndServe()
		if !errors.Is(err, http.ErrServerClosed) {
			exitOnErr(err)
		}
		log.Println("Server has been closed")
	}()
	time.Sleep(time.Second)

	// Make a request
	go func() {
		c := http.Client{}

		req, err := http.NewRequest("GET", "http://localhost:8080", nil)
		exitOnErr(err)

		resp, err := c.Do(req)
		exitOnErr(err)
		resp.Body.Close()
		log.Println("Request has been handled")
	}()
	time.Sleep(5 * time.Second)

	err := srv.Shutdown(context.Background())
	exitOnErr(err)
	log.Println("Server shutdown is completed")

	// Block forever; the handler's goroutine keeps running and logging
	select {}
}

func exitOnErr(err error) {
	if err != nil {
		log.Fatalf("unexpected error: %s\n", err)
	}
}
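
If you run this, "Running" keeps being printed once a second even after "Server shutdown is completed" is logged: Shutdown waits for the handler itself to return, but not for the goroutine it spawned, which is the lifetime overlap described above.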

@AnnaShaleva (Member) commented:

As far as I can see, during shutdown we don't wait for the WebSocket connections to close.

Yes, you're right. I've created a separate issue for that, ref. #3664.
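
For context: http.Server's Shutdown is documented as not attempting to close nor wait for hijacked connections such as WebSockets, so a server that upgrades connections generally has to track those handlers itself. Below is a rough sketch of one possible pattern, purely illustrative and not necessarily what #3664 will end up with; the wsTracker helper is hypothetical.

package main

import (
	"context"
	"net/http"
	"sync"
)

// wsTracker is a hypothetical helper: every WebSocket handler registers
// itself, and shutdown waits until all of them have returned.
type wsTracker struct {
	wg sync.WaitGroup
}

func (t *wsTracker) handleWS(w http.ResponseWriter, r *http.Request) {
	t.wg.Add(1)
	defer t.wg.Done()
	// ... upgrade the connection and run the read loop here ...
}

func (t *wsTracker) shutdown(ctx context.Context, srv *http.Server) error {
	// Shutdown closes listeners and waits for non-hijacked connections only.
	err := srv.Shutdown(ctx)
	// Something must also make the read loops exit (e.g. closing the
	// underlying connections), otherwise Wait blocks forever; a complete
	// implementation would also stop accepting new upgrades at this point.
	t.wg.Wait()
	return err
}

func main() {
	t := &wsTracker{}
	srv := &http.Server{Addr: ":8080", Handler: http.HandlerFunc(t.handleWS)}
	go srv.ListenAndServe()
	_ = t.shutdown(context.Background(), srv)
}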

@AnnaShaleva AnnaShaleva merged commit b651658 into nspcc-dev:master Nov 6, 2024
31 checks passed
@alexey-savchuk alexey-savchuk deleted the fix-panic-on-sighup branch November 6, 2024 08:39
Labels
bug (Something isn't working), rpc (RPC server and client)
3 participants