Route core worker ERROR/FATAL logs to driver logs #18577

Merged (3 commits) on Sep 14, 2021

ericl (Contributor) commented on Sep 13, 2021

Why are these changes needed?

Currently, core worker ERROR and FATAL logs are sent to python-core-worker-* files in /tmp/ray. This makes it hard to detect error conditions and check failures, as the user only gets a message to "check logs".

This change configures spdlog to also send every message at level err or above to stderr unconditionally, so the normal log routing machinery picks it up and forwards it to the driver.
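The routing idea can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in this PR: it does not use spdlog, and `Level`, `Sink`, `TeeLogger`, and `RunDemo` are hypothetical names introduced here. Every message goes to a sink standing in for the per-worker log file, while a second sink standing in for stderr accepts only messages at error level or above.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Severity levels, ordered so that ">=" means "at least this severe".
enum class Level { kDebug, kInfo, kWarning, kError, kFatal };

// A sink pairs an output stream with the minimum level it accepts.
struct Sink {
  std::ostream *out;
  Level min_level;
};

// Hypothetical logger that fans each message out to every sink
// whose threshold the message meets.
class TeeLogger {
 public:
  void AddSink(std::ostream &out, Level min_level) {
    assert(out.good());  // the sink must be usable when registered
    sinks_.push_back({&out, min_level});
  }

  void Log(Level level, const std::string &msg) {
    for (const Sink &sink : sinks_) {
      if (level >= sink.min_level) {
        *sink.out << msg << '\n';
        sink.out->flush();  // flush so errors are visible immediately
      }
    }
  }

 private:
  std::vector<Sink> sinks_;
};

// Demo: string streams stand in for the worker log file and for stderr.
// Returns {contents of the file sink, contents of the stderr sink}.
std::pair<std::string, std::string> RunDemo() {
  std::ostringstream worker_file;  // stand-in for a python-core-worker-* file
  std::ostringstream err_stream;   // stand-in for stderr

  TeeLogger logger;
  logger.AddSink(worker_file, Level::kDebug);  // file receives everything
  logger.AddSink(err_stream, Level::kError);   // stderr receives err and above

  logger.Log(Level::kInfo, "routine message");
  logger.Log(Level::kError, "check failed");
  return {worker_file.str(), err_stream.str()};
}
```

In this sketch the informational message reaches only the file sink, while the error reaches both. That is the property the change relies on: anything written to stderr is already forwarded by the existing log routing.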

Related issue number

Closes #12893

@@ -570,13 +570,13 @@ CoreWorker::CoreWorker(const CoreWorkerOptions &options, const WorkerID &worker_
 // Retry after a delay to emulate the existing Raylet reconstruction
 // behaviour. TODO(ekl) backoff exponentially.
 uint32_t delay = RayConfig::instance().task_retry_delay_ms();
-RAY_LOG(ERROR) << "Will resubmit task after a " << delay
-<< "ms delay: " << spec.DebugString();
+RAY_LOG(INFO) << "Will resubmit task after a " << delay
ericl (Contributor, Author) commented on this line:

Avoid excess log spam since we print a message elsewhere on retry already.

ericl merged commit 3e0ae38 into ray-project:master on Sep 14, 2021
edoakes added a commit to edoakes/ray that referenced this pull request Sep 14, 2021
edoakes added a commit that referenced this pull request Sep 14, 2021
ericl added a commit to ericl/ray that referenced this pull request Sep 14, 2021
Linked issue: [Logging] Core worker ERRORs/check failures not streamed to drivers.