Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move JSON-RPC server into an isolated plugin #1212

Merged

Conversation

cburgdorf
Copy link
Contributor

@cburgdorf cburgdorf commented Aug 23, 2018

What was wrong?

Currently the JSON-RPC server runs in the networking process together the syncing and other messaging. Now that isolated plugins are a thing, it's a good idea to move the JSON-RPC server because:

  • a separate process provides better isolation
    • DDOS attacks
    • crashes in the JSON-RPC can't bring the syncing down
  • better code organization

How was it fixed?

  • provide a LightPeerChainBridge plugin that brings an EventBusLightPeerChain to be consumed from other processes
  • enforce all RPC modules to be async
  • turn existing JSON-RPC server into an isolated plugin

In addition to the current functionality of the JSON-RPC server, the new plugin-based version can be disabled by passing --disable-rpc at startup

Cute Animal Picture

put a cute animal picture link inside the parentheses

@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch 3 times, most recently from 7adbad7 to cc59287 Compare August 27, 2018 12:04
OWN = auto()


class PluginContext:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vaguely remember at least PluginContext(object) being recommended somewhere...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that since python3 the two are equivalent.

Copy link
Member

@pipermerriam pipermerriam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THis looks the same as something else I just reviewed....

@cburgdorf
Copy link
Contributor Author

THis looks the same as something else I just reviewed....

Yep, Will probably have something more interesting here tomorrow ;)

@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch 2 times, most recently from 2840d43 to c5ea88d Compare August 28, 2018 13:38
@cburgdorf cburgdorf changed the title Allow plugins to run in a separate process Move JSON-RPC server into an isolated plugin Aug 28, 2018
@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch 2 times, most recently from 3d598fe to baf1e40 Compare August 28, 2018 14:07
@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch 2 times, most recently from 76551d4 to a7cbf0f Compare August 28, 2018 21:16
@cburgdorf
Copy link
Contributor Author

cburgdorf commented Aug 29, 2018

I moved this to the Ada Lovelace milestone because I think we shouldn't rush it. There's a lot of code to review (not only in this PR but also in the new lahja event bus library) and async problems can be very subtle.

@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch 3 times, most recently from 714080a to 272e12e Compare August 29, 2018 13:03
@cburgdorf cburgdorf closed this Aug 29, 2018
@cburgdorf cburgdorf deleted the christoph/feat/plugins-own-process branch August 29, 2018 13:07
@cburgdorf cburgdorf reopened this Aug 29, 2018
@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch 2 times, most recently from 714080a to b86f3e3 Compare August 29, 2018 13:39
@cburgdorf
Copy link
Contributor Author

cburgdorf commented Aug 29, 2018

Ok, I think I need a break and I'm reaching out for help. I've been trying whole day to work out the remaining issues but there are two things that stand out that I think are related.

  1. Trinity hangs on shutdown. This is also reproducible with a minimal test case that basically just runs event_bus.start(). This will prevent a graceful shutdown.

  2. The tests fail pretty much for the same reason. The event bus is preventing the tests from closing, so while they succeed in principle, the test runner hangs and is shut down after 10 minutes of inactivity.

Both issues indicate that there's a problem with the event bus in general. Even the simple test in the lahja repo shows that behaviour. It goes green but never closes.

I guess it has to do with the Queue but I tried pretty much every thing I could think of (and what I found in the docs and StackOverflow). Really happy for any help on this.

UPDATE: I received an answer on StackOverflow that should help with this.

https://stackoverflow.com/questions/52081033/how-to-shutdown-process-with-event-loop-and-executor#comment91115577_52081665

@cburgdorf
Copy link
Contributor Author

@pipermerriam @carver This is pretty much ready but tomorrow I'll write an alternative "less magical" version of the LightPeerChainBridgePlugin to have us look at the trade offs.

@cburgdorf
Copy link
Contributor Author

Ok I think to make a fair alternative that evaluates @pipermerriam point of letting the type system help us more (in contrast to metaprogramming magic) we first need to fix the 3rd party library typing issue. Because the thing is, the revealed type of foobar will always be Any for the time being which prevents mypy from being an actual support here.

        foobar = await self.event_bus.request(event)
        reveal_type(foobar)

I will dig into that now.

@cburgdorf
Copy link
Contributor Author

Alright, made the changes to Lahja to ensure the needed type safety guarantees. Will give it a shot now.

@cburgdorf
Copy link
Contributor Author

@pipermerriam I think the main point left open here is the proper handling of cancellations (also, I moved it to a BaseService and use run_daemon_task already). I feel that can be addressed as part of #1284 that I would soon start looking into.

How do you feel about taking it as is?

@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch 4 times, most recently from 33cc77e to 5c5dcc4 Compare September 13, 2018 21:19
@cburgdorf
Copy link
Contributor Author

@pipermerriam this looks weird. I just rebased this and now the integration test caught this:

"p2p.exceptions.DecryptionError: Invalid frame mac: expected b'\\xde\\xcag\\x0e\\x94\\xdbkC\\xa8\\xc4\\xb8\\x10\\xdb@\\xf8o', got b'\\xdd\\x16\\xcea]B\\xbd1\\x8a\\x0f\\xa7\\xe8B\\x87\\xf9\\xce'\n"

Looks totally random to me. https://circleci.com/gh/ethereum/py-evm/77685?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

I'm out of town tomorrow until evening. Guess I'll investigate that when I come home.

@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch from 5c5dcc4 to 00c0465 Compare September 13, 2018 21:40
@cburgdorf
Copy link
Contributor Author

Ok, turns out that DecryptionError was indeed random. After all, the integration test boots the node in an environment with internet access which means these things can happen. Forgot that for a moment.

@pipermerriam
Copy link
Member

two things, I never finished reviewing this (will try to do it tonight so you're clear to move forward tomorrow)

second, make an issue for that decryption error as thoseshouldn't escape the trinity process.

p2p/peer.py Outdated
@@ -794,6 +804,16 @@ def __init__(self,
self.max_peers = max_peers
self.connected_nodes: Dict[Node, BasePeer] = {}
self._subscribers: List[PeerSubscriber] = []
self.event_bus = event_bus
self.run_task(self.handle_peer_count_requests())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be using self.run_daemon_task

header = self._headerdb.get_block_header_by_hash(block_hash)
return self.get_block_by_header(header)
return await self.get_block_by_header(header)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What am missing? The very next function definition is for get_block_by_header and it is both not a coroutine and raises a NotImplementedError.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that was meant to go to coro_get_block_by_header

return async_formatted_func
else:
@functools.wraps(func)
def formatted_func(self: Any, *args: Any) -> Callable[..., Any]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still necessary, quick glance suggests that all of the wrapped methods are now coroutines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, will change that.

@cburgdorf cburgdorf force-pushed the christoph/feat/plugins-own-process branch 3 times, most recently from df83846 to 6ee8b1d Compare September 15, 2018 20:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants