[draft] Example of catching panics in Rust #8757

jasonish · 2023-04-24T21:37:21Z

https://redmine.openinfosecfoundation.org/issues/3333

codecov · 2023-04-24T22:12:57Z

Codecov Report

Merging #8757 (9cbae4e) into master (9a4231d) will decrease coverage by 0.01%.
The diff coverage is 89.55%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #8757      +/-   ##
==========================================
- Coverage   82.25%   82.24%   -0.01%     
==========================================
  Files         969      969              
  Lines      273176   273215      +39     
==========================================
+ Hits       224689   224714      +25     
- Misses      48487    48501      +14

Flag	Coverage Δ
fuzzcorpus	`64.51% <80.00%> (+0.05%)`	⬆️
suricata-verify	`60.32% <82.22%> (-0.03%)`	⬇️
unittests	`62.98% <57.46%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

suricata-qa · 2023-04-25T03:30:02Z

WARNING:

field	baseline	test	%
	SURI_TLPW1_stats_chk
.tcp.rst	112875	145123	128.57%
.app_layer.error.dns_tcp.parser	1	2630	263000.0%
	SURI_TLPR1_stats_chk
.app_layer.error.dns_tcp.parser	30	11551	38503.33%

Pipeline 13305

inashivb · 2023-04-25T05:25:55Z

rust/src/dns/dns.rs

+    let _ = catch_unwind(|| {
+        let state = cast_pointer!(state, DNSState);
+        state.parse_request_udp(flow, stream_slice);
+    });


I'm trying to understand why we're using it here. I'm reading Don't build your programs to unwind under normal circumstances. Ideally, you should only panic for programming errors or extreme problems. at https://doc.rust-lang.org/nomicon/unwinding.html where I also see You must absolutely catch any panics at the FFI boundary! so I kind of see why we want to do this. But,

What points of panic/undefined behavior can happen on the enclosed code?

How do we know that the panic caused here for whatever reason would be implemented with unwinding? Or, do we not care about it as it is not incurring any runtime cost in either case?

Thank you for this! :)

For reference https://redmine.openinfosecfoundation.org/issues/3333

I'm trying to understand why we're using it here. I'm reading Don't build your programs to unwind under normal circumstances. Ideally, you should only panic for programming errors or extreme problems. at https://doc.rust-lang.org/nomicon/unwinding.html where I also see You must absolutely catch any panics at the FFI boundary! so I kind of see why we want to do this. But,

What points of panic/undefined behavior can happen on the enclosed code?

Simple ones that we can catch would be programming errors like walking off the end of an array. In C this might create a segfault, but is a catchable error in Rust. These warnings are more to prevent people from using unwinding as a flow control mechanism, or treating them like exceptions when coming from C++, Python, etc.

I also disagree about the must on FFI boundaries. Currently, we crash if the Rust code panics at the FFI boundary, and that's a reasonable outcome.

How do we know that the panic caused here for whatever reason would be implemented with unwinding? Or, do we not care about it as it is not incurring any runtime cost in either case?

We don't care. If the panic is unwindable we'll catch it and not crash. If it's not catchable, perhaps a deep-rooted segfault, or an illegal instruction the program should abort and crash.

All the Rust web frameworks like Axum, Rocket, Warp, implement such unwinding in their request handlers, so one bad request handler won't take down the server, this is a very similar approach.

Thank you for the explanation!
Do I get it right that if there is actually an error that can be unwinded, it'll affect the performance negatively?
Do we log it in stats somehow to indicate what's causing the possible slowdown?

Do I get it right that if there is actually an error that can be unwinded, it'll affect the performance negatively?

Yes, it should a little as it has to do some resetting of data. Plus that line that gets log that we can't control is slow. All prints to the console are slow though.

The alternative is crashing.

jasonish · 2023-04-25T15:07:32Z

rust/src/dns/dns.rs

+        } else if !stream_slice.is_empty() {
+            return state.parse_response_tcp(flow, stream_slice);
+        }
+	if x[99] {}


Oops, I didn't mean to commit this introduce error, this is why QA is so far off.

suricata-qa · 2023-04-26T01:45:03Z

WARNING:

field	baseline	test	%
	SURI_TLPW1_stats_chk
.tcp.rst	112875	145124	128.57%
	SURI_TLPR1_stats_chk
.flow.memuse	528023824	938040360	177.65%

Pipeline 13347

victorjulien · 2023-05-05T15:37:49Z

I think this looks good for using with big expensive calls like going into the parser. What are your thoughts about the frequent much smaller calls like into json builder? Can we wrap these at minimal cost too, or should we rework those to not be able to panic (to a reasonable level of course)?

IIRC @catenacyber was seeing panics in jsonbuilder when experimenting with alloc failures.

catenacyber · 2023-05-05T15:46:30Z

IIRC catenacyber was seeing panics in jsonbuilder when experimenting with alloc failures.

Indeed cf google/oss-fuzz#9902

jasonish · 2023-05-05T17:47:47Z

I think this looks good for using with big expensive calls like going into the parser. What are your thoughts about the frequent much smaller calls like into json builder? Can we wrap these at minimal cost too, or should we rework those to not be able to panic (to a reasonable level of course)?

IIRC @catenacyber was seeing panics in jsonbuilder when experimenting with alloc failures.

I will have to do some tests around JsonBuilder but its worth doing. For something like JsonBuilder it makes sense to hand roll this into the API but for parsers that follow an interface for being called into, I'm hoping to abstract this out so each individual parser doesn't need to worry about it - this is something I'm currently looking at, but may be quite intrusive.

catenacyber · 2023-05-05T18:32:10Z

For JsonBuilder, there should only be a few places with allocations

What is the solution ?
https://github.com/microsoft/rust_fallible_vec ?

jasonish · 2023-05-05T18:35:24Z

For JsonBuilder, there should only be a few places with allocations

What is the solution ? https://github.com/microsoft/rust_fallible_vec ?

I think the idea is for more than just allocations, just any panic that can be unwound.

As all the jb functions currently return an error code, we'd just wrap them and return error if anything unwindable happened, which I assume would include allocation errors.

That crate could be interesting to look at, but this unwinding is a little more generic.

catenacyber · 2023-05-05T18:37:27Z

Indeed I get it

jasonish · 2023-05-08T13:52:21Z

Rust 1.57.0 as try_reserve which probably makes much more sense for something like JsonBuilder, I've created a ticket here: https://redmine.openinfosecfoundation.org/issues/6057

Wrap DNS probing/parsing in catch_unwind as an example how to gracefully handle panics from Rust.

Some very minor changes to formatting.

suricata-qa · 2023-05-09T00:30:03Z

WARNING:

field	baseline	test	%
	SURI_TLPW1_stats_chk
.tcp.rst	113274	145130	128.12%

Pipeline 13683

victorjulien · 2023-05-09T04:27:08Z

rust/src/jsonbuilder.rs

@@ -604,7 +647,11 @@ impl JsonBuilder {
    /// than building onto the buffer.
    #[inline(always)]
    fn encode_string(&mut self, val: &str) -> Result<(), JsonError> {
-        let mut buf = vec![0; val.len() * 2 + 2];
+	let size = val.len() * 2 + 2;
+        let mut buf = Vec::new();


can this fail as well?

Yeah, flagged as a TODO in my new branch/pr specifically for JsonBuilder.

victorjulien · 2023-05-09T04:27:36Z

rust/src/jsonbuilder.rs

-    let boxed = Box::new(JsonBuilder::new_object());
-    Box::into_raw(boxed)
+    match JsonBuilder::try_new_object() {
+        Ok(js) => Box::into_raw(Box::new(js)),


is Box::new() also doing an alloc?

It is. Box::try_new is an experimental API that would be ideal here, but until then it should probably be wrapped in a panic handler. I need to actually look into if Rust memory allocations panic or abort, the latter not being catchable.

suricata-qa · 2023-05-09T04:30:03Z

Information: QA ran without warnings.

Pipeline 13684

Convert "new_object" and "new_array" functions that return a Result and use "try_reserve" to allocate the amount of data requested. This should allow memory allocation errors to be detected and handled in a Rust-ful matter without resorting to catching a panic.

jasonish · 2023-05-09T17:05:59Z

I guess the issue of handling memory allocation errors is still in question. Rust has these try_reserve APIs in many places that do this in a sane way (see the new JsonBuilder), but outside of that I don't think memory allocation errors can be reliably caught, as Rust will abort, rather than panic. The reason being, if the system is truly out of memory, it can't do anything to return a value, or provide that state for a panic handler to unwind into a known state.

So when it comes to allocated memory for data, we should do our best with try_reserve and like methods, but unwinding panics would be more about programming errors, than memory errors.

suricata-qa · 2023-05-09T21:55:03Z

Information: QA ran without warnings.

Pipeline 13705

victorjulien · 2023-05-10T04:48:45Z

@catenacyber are you able to test how this behaves in your alloc tests?

jasonish · 2023-05-10T15:32:04Z

@catenacyber are you able to test how this behaves in your alloc tests?

This would probably be the better one to test: #8855

I'm going to close this one and and this branch of pull requests will become more for catching panics in parsers..

inashivb reviewed Apr 25, 2023

View reviewed changes

jasonish commented Apr 25, 2023

View reviewed changes

jasonish force-pushed the rust-panic-handler/v1 branch from c10b896 to dfa6101 Compare April 25, 2023 15:08

jasonish force-pushed the rust-panic-handler/v1 branch from dfa6101 to a84a836 Compare May 8, 2023 22:01

jasonish added 2 commits May 8, 2023 16:01

rust/dns: example of catching rust panics

d904590

Wrap DNS probing/parsing in catch_unwind as an example how to gracefully handle panics from Rust.

jsonbuilder: rustfmt

66bd9c1

Some very minor changes to formatting.

jasonish force-pushed the rust-panic-handler/v1 branch from a84a836 to 66bd9c1 Compare May 8, 2023 22:01

victorjulien reviewed May 9, 2023

View reviewed changes

jasonish force-pushed the rust-panic-handler/v1 branch from 35e2cd1 to 9cbae4e Compare May 9, 2023 15:52

jasonish closed this May 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[draft] Example of catching panics in Rust #8757

[draft] Example of catching panics in Rust #8757

jasonish commented Apr 24, 2023 •

edited by victorjulien

Loading

codecov bot commented Apr 24, 2023 •

edited

Loading

suricata-qa commented Apr 25, 2023

inashivb Apr 25, 2023

victorjulien Apr 25, 2023

jasonish Apr 25, 2023

inashivb Apr 26, 2023

jasonish Apr 26, 2023

jasonish Apr 25, 2023

suricata-qa commented Apr 26, 2023

victorjulien commented May 5, 2023 •

edited

Loading

catenacyber commented May 5, 2023

jasonish commented May 5, 2023

catenacyber commented May 5, 2023

jasonish commented May 5, 2023

catenacyber commented May 5, 2023

jasonish commented May 8, 2023

suricata-qa commented May 9, 2023

victorjulien May 9, 2023

jasonish May 9, 2023

victorjulien May 9, 2023

jasonish May 9, 2023

suricata-qa commented May 9, 2023

jasonish commented May 9, 2023

suricata-qa commented May 9, 2023

victorjulien commented May 10, 2023

jasonish commented May 10, 2023

[draft] Example of catching panics in Rust #8757

[draft] Example of catching panics in Rust #8757

Conversation

jasonish commented Apr 24, 2023 • edited by victorjulien Loading

codecov bot commented Apr 24, 2023 • edited Loading

Codecov Report

suricata-qa commented Apr 25, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

suricata-qa commented Apr 26, 2023

victorjulien commented May 5, 2023 • edited Loading

catenacyber commented May 5, 2023

jasonish commented May 5, 2023

catenacyber commented May 5, 2023

jasonish commented May 5, 2023

catenacyber commented May 5, 2023

jasonish commented May 8, 2023

suricata-qa commented May 9, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

suricata-qa commented May 9, 2023

jasonish commented May 9, 2023

suricata-qa commented May 9, 2023

victorjulien commented May 10, 2023

jasonish commented May 10, 2023

jasonish commented Apr 24, 2023 •

edited by victorjulien

Loading

codecov bot commented Apr 24, 2023 •

edited

Loading

victorjulien commented May 5, 2023 •

edited

Loading