Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[draft] Example of catching panics in Rust #8757

Closed
wants to merge 3 commits into from

Conversation

jasonish
Copy link
Member

@jasonish jasonish commented Apr 24, 2023

@codecov
Copy link

codecov bot commented Apr 24, 2023

Codecov Report

Merging #8757 (9cbae4e) into master (9a4231d) will decrease coverage by 0.01%.
The diff coverage is 89.55%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8757      +/-   ##
==========================================
- Coverage   82.25%   82.24%   -0.01%     
==========================================
  Files         969      969              
  Lines      273176   273215      +39     
==========================================
+ Hits       224689   224714      +25     
- Misses      48487    48501      +14     
Flag Coverage Δ
fuzzcorpus 64.51% <80.00%> (+0.05%) ⬆️
suricata-verify 60.32% <82.22%> (-0.03%) ⬇️
unittests 62.98% <57.46%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

@suricata-qa
Copy link

WARNING:

field baseline test %
SURI_TLPW1_stats_chk
.tcp.rst 112875 145123 128.57%
.app_layer.error.dns_tcp.parser 1 2630 263000.0%
SURI_TLPR1_stats_chk
.app_layer.error.dns_tcp.parser 30 11551 38503.33%

Pipeline 13305

Comment on lines +773 to +776
let _ = catch_unwind(|| {
let state = cast_pointer!(state, DNSState);
state.parse_request_udp(flow, stream_slice);
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand why we're using it here. I'm reading Don't build your programs to unwind under normal circumstances. Ideally, you should only panic for programming errors or extreme problems. at https://doc.rust-lang.org/nomicon/unwinding.html where I also see You must absolutely catch any panics at the FFI boundary! so I kind of see why we want to do this. But,

  1. What points of panic/undefined behavior can happen on the enclosed code?
  2. How do we know that the panic caused here for whatever reason would be implemented with unwinding? Or, do we not care about it as it is not incurring any runtime cost in either case?

Thank you for this! :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand why we're using it here. I'm reading Don't build your programs to unwind under normal circumstances. Ideally, you should only panic for programming errors or extreme problems. at https://doc.rust-lang.org/nomicon/unwinding.html where I also see You must absolutely catch any panics at the FFI boundary! so I kind of see why we want to do this. But,

  1. What points of panic/undefined behavior can happen on the enclosed code?

Simple ones that we can catch would be programming errors like walking off the end of an array. In C this might create a segfault, but is a catchable error in Rust. These warnings are more to prevent people from using unwinding as a flow control mechanism, or treating them like exceptions when coming from C++, Python, etc.

I also disagree about the must on FFI boundaries. Currently, we crash if the Rust code panics at the FFI boundary, and that's a reasonable outcome.

  1. How do we know that the panic caused here for whatever reason would be implemented with unwinding? Or, do we not care about it as it is not incurring any runtime cost in either case?

We don't care. If the panic is unwindable we'll catch it and not crash. If it's not catchable, perhaps a deep-rooted segfault, or an illegal instruction the program should abort and crash.

All the Rust web frameworks like Axum, Rocket, Warp, implement such unwinding in their request handlers, so one bad request handler won't take down the server, this is a very similar approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation!
Do I get it right that if there is actually an error that can be unwinded, it'll affect the performance negatively?
Do we log it in stats somehow to indicate what's causing the possible slowdown?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I get it right that if there is actually an error that can be unwinded, it'll affect the performance negatively?

Yes, it should a little as it has to do some resetting of data. Plus that line that gets log that we can't control is slow. All prints to the console are slow though.

The alternative is crashing.

} else if !stream_slice.is_empty() {
return state.parse_response_tcp(flow, stream_slice);
}
if x[99] {}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I didn't mean to commit this introduce error, this is why QA is so far off.

@jasonish jasonish force-pushed the rust-panic-handler/v1 branch from c10b896 to dfa6101 Compare April 25, 2023 15:08
@suricata-qa
Copy link

WARNING:

field baseline test %
SURI_TLPW1_stats_chk
.tcp.rst 112875 145124 128.57%
SURI_TLPR1_stats_chk
.flow.memuse 528023824 938040360 177.65%

Pipeline 13347

@victorjulien
Copy link
Member

victorjulien commented May 5, 2023

I think this looks good for using with big expensive calls like going into the parser. What are your thoughts about the frequent much smaller calls like into json builder? Can we wrap these at minimal cost too, or should we rework those to not be able to panic (to a reasonable level of course)?

IIRC @catenacyber was seeing panics in jsonbuilder when experimenting with alloc failures.

@catenacyber
Copy link
Contributor

IIRC catenacyber was seeing panics in jsonbuilder when experimenting with alloc failures.

Indeed cf google/oss-fuzz#9902

@jasonish
Copy link
Member Author

jasonish commented May 5, 2023

I think this looks good for using with big expensive calls like going into the parser. What are your thoughts about the frequent much smaller calls like into json builder? Can we wrap these at minimal cost too, or should we rework those to not be able to panic (to a reasonable level of course)?

IIRC @catenacyber was seeing panics in jsonbuilder when experimenting with alloc failures.

I will have to do some tests around JsonBuilder but its worth doing. For something like JsonBuilder it makes sense to hand roll this into the API but for parsers that follow an interface for being called into, I'm hoping to abstract this out so each individual parser doesn't need to worry about it - this is something I'm currently looking at, but may be quite intrusive.

@catenacyber
Copy link
Contributor

For JsonBuilder, there should only be a few places with allocations

What is the solution ?
https://github.com/microsoft/rust_fallible_vec ?

@jasonish
Copy link
Member Author

jasonish commented May 5, 2023

For JsonBuilder, there should only be a few places with allocations

What is the solution ? https://github.com/microsoft/rust_fallible_vec ?

I think the idea is for more than just allocations, just any panic that can be unwound.

As all the jb functions currently return an error code, we'd just wrap them and return error if anything unwindable happened, which I assume would include allocation errors.

That crate could be interesting to look at, but this unwinding is a little more generic.

@catenacyber
Copy link
Contributor

Indeed I get it

@jasonish
Copy link
Member Author

jasonish commented May 8, 2023

Rust 1.57.0 as try_reserve which probably makes much more sense for something like JsonBuilder, I've created a ticket here: https://redmine.openinfosecfoundation.org/issues/6057

@jasonish jasonish force-pushed the rust-panic-handler/v1 branch from dfa6101 to a84a836 Compare May 8, 2023 22:01
jasonish added 2 commits May 8, 2023 16:01
Wrap DNS probing/parsing in catch_unwind as an example how to
gracefully handle panics from Rust.
Some very minor changes to formatting.
@jasonish jasonish force-pushed the rust-panic-handler/v1 branch from a84a836 to 66bd9c1 Compare May 8, 2023 22:01
@suricata-qa
Copy link

WARNING:

field baseline test %
SURI_TLPW1_stats_chk
.tcp.rst 113274 145130 128.12%

Pipeline 13683

@@ -604,7 +647,11 @@ impl JsonBuilder {
/// than building onto the buffer.
#[inline(always)]
fn encode_string(&mut self, val: &str) -> Result<(), JsonError> {
let mut buf = vec![0; val.len() * 2 + 2];
let size = val.len() * 2 + 2;
let mut buf = Vec::new();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this fail as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, flagged as a TODO in my new branch/pr specifically for JsonBuilder.

let boxed = Box::new(JsonBuilder::new_object());
Box::into_raw(boxed)
match JsonBuilder::try_new_object() {
Ok(js) => Box::into_raw(Box::new(js)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is Box::new() also doing an alloc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. Box::try_new is an experimental API that would be ideal here, but until then it should probably be wrapped in a panic handler. I need to actually look into if Rust memory allocations panic or abort, the latter not being catchable.

@suricata-qa
Copy link

Information: QA ran without warnings.

Pipeline 13684

Convert "new_object" and "new_array" functions that return a Result
and use "try_reserve" to allocate the amount of data requested. This
should allow memory allocation errors to be detected and handled in a
Rust-ful matter without resorting to catching a panic.
@jasonish jasonish force-pushed the rust-panic-handler/v1 branch from 35e2cd1 to 9cbae4e Compare May 9, 2023 15:52
@jasonish
Copy link
Member Author

jasonish commented May 9, 2023

I guess the issue of handling memory allocation errors is still in question. Rust has these try_reserve APIs in many places that do this in a sane way (see the new JsonBuilder), but outside of that I don't think memory allocation errors can be reliably caught, as Rust will abort, rather than panic. The reason being, if the system is truly out of memory, it can't do anything to return a value, or provide that state for a panic handler to unwind into a known state.

So when it comes to allocated memory for data, we should do our best with try_reserve and like methods, but unwinding panics would be more about programming errors, than memory errors.

@suricata-qa
Copy link

Information: QA ran without warnings.

Pipeline 13705

@victorjulien
Copy link
Member

@catenacyber are you able to test how this behaves in your alloc tests?

@jasonish
Copy link
Member Author

@catenacyber are you able to test how this behaves in your alloc tests?

This would probably be the better one to test: #8855

I'm going to close this one and and this branch of pull requests will become more for catching panics in parsers..

@jasonish jasonish closed this May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants