Better XDP error handling #391

enhaut · 2025-01-17T13:59:54Z

Description

XDP's test modules might raise an exception when running on the agent
machine. Currently these exceptions are not propagated to the controller, so it
treats incomplete or wrong results as valid ones, and as a result the controller
crashes on obscure issues.
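The following is a minimal sketch of the controller-side check. It assumes, as the traceback later in this thread suggests with `job.result["exception"]`, that a failed agent job returns a result dict carrying the exception under an `"exception"` key. `FakeJob`, `parse_generator_results`, the `"samples"` key and the local `TestModuleError` class are hypothetical stand-ins for illustration, not LNST's actual API.

```python
from dataclasses import dataclass, field
from typing import Any


class TestModuleError(Exception):
    """Hypothetical stand-in for lnst.Tests.BaseTestModule.TestModuleError."""


@dataclass
class FakeJob:
    """Hypothetical stand-in for an agent job; only its result dict matters here."""
    result: dict[str, Any] = field(default_factory=dict)


def parse_generator_results(job: FakeJob) -> list[float]:
    # If the agent-side test module failed, the result carries the exception
    # instead of measurement data, so re-raise it rather than parse garbage.
    if "exception" in job.result:
        raise job.result["exception"]
    # Otherwise treat the payload as valid samples (hypothetical "samples" key).
    return list(job.result["samples"])
```

With this check the controller fails with the agent's own error message instead of an unrelated error raised later while parsing.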

Tests

no crash test (just to be sure everything works as expected) J:10515937
crash test (to test added code) J:10515938

olichtne

So this is probably a good step, but it just raises a different exception, which means that the Controller will still crash, only with a different exception.

This can be seen in your crash test logs:

LNST Controller crashed with an exception:
Traceback (most recent call last):
  File "/mnt/tests/data.lnst.anl.eng.rdu2.dc.redhat.com/data-server-content/gitlab-tasks/beaker-lnst-tasks/master.tar.gz/lnst/test-runner/./do-my-test", line 35, in main
    ctl.run(recipe, multimatch=bool(params.get("MULTIMATCH", False)))
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/Controller/Controller.py", line 172, in run
    recipe.test()
  File "/root/rhextensions-lnst/lnst/RHExtensions/RHRecipeMixin.py", line 109, in test
    super(RHRecipeMixin, self).test()
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/Recipes/ENRT/BaseEnrtRecipe.py", line 210, in test
    self.do_tests(recipe_config)
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/Recipes/ENRT/BaseEnrtRecipe.py", line 332, in do_tests
    self.do_perf_tests(recipe_config)
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/Recipes/ENRT/BaseEnrtRecipe.py", line 357, in do_perf_tests
    result = self.perf_test(perf_config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/rhextensions-lnst/lnst/RHExtensions/RHRecipeMixin.py", line 425, in perf_test
    return super().perf_test(recipe_conf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/RecipeCommon/Perf/Recipe.py", line 162, in perf_test
    self.perf_test_iteration(recipe_conf, results)
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/RecipeCommon/Perf/Recipe.py", line 192, in perf_test_iteration
    measurement_results = measurement.collect_results()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/RecipeCommon/Perf/Measurements/XDPBenchMeasurement.py", line 154, in collect_results
    flow_results.generator_results = self._parse_generator_results(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/RecipeCommon/Perf/Measurements/XDPBenchMeasurement.py", line 169, in _parse_generator_results
    raise job.result["exception"]  # propagate exception from agent
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lnst.Tests.BaseTestModule.TestModuleError: pktgen module is not loaded

This is better than before, since it explains more clearly what went wrong, but our other Measurement modules don't reraise exceptions; instead, they return "fail results" with error messages to indicate issues.
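A sketch of the "fail result" alternative follows. It assumes that a measurement can return a result object carrying a failure message instead of raising. `MeasurementResult` and `collect_results` are hypothetical names chosen for illustration. The point is that one failed flow does not abort evaluation of the remaining ones.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MeasurementResult:
    """Hypothetical result container: holds either samples or an error message."""
    flow: str
    samples: Optional[list[float]] = None
    error: Optional[str] = None

    @property
    def valid(self) -> bool:
        return self.error is None


def collect_results(flow: str, job_result: dict[str, Any]) -> MeasurementResult:
    # Turn an agent-side exception into a failed result instead of raising,
    # so the caller can keep evaluating the other flows.
    if "exception" in job_result:
        return MeasurementResult(flow=flow, error=str(job_result["exception"]))
    return MeasurementResult(flow=flow, samples=list(job_result["samples"]))


if __name__ == "__main__":
    jobs = {
        "flow-a": {"samples": [9.5, 9.7]},
        "flow-b": {"exception": RuntimeError("pktgen module is not loaded")},
    }
    for name, job_result in jobs.items():
        result = collect_results(name, job_result)
        status = "PASS" if result.valid else f"FAIL: {result.error}"
        print(name, status)
```

Running the script reports `flow-a` as passing and `flow-b` as failing with the agent's message. Both flows are still evaluated.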

This may be important when running multiple tests where possibly some can fail due to errors but some can run just fine.

Or, potentially, to let the rerun mechanism rerun the test automatically in case of random errors, similar to what I did here: #382
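Fail results could feed an automatic rerun, as sketched below. The sketch assumes a measurement callable that returns an object with `valid` and `error` attributes. `run_with_retries`, `Result` and `max_attempts` are hypothetical. This is not the mechanism from #382, only an illustration of the idea.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    """Hypothetical measurement result: valid unless an error is recorded."""
    error: Optional[str] = None

    @property
    def valid(self) -> bool:
        return self.error is None


def run_with_retries(measure: Callable[[], Result], max_attempts: int = 3) -> tuple[Result, int]:
    """Rerun `measure` until it yields a valid result or attempts run out.

    Returns the last result and the number of attempts used.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    result = Result(error="not run")
    for attempt in range(1, max_attempts + 1):
        result = measure()
        if result.valid:
            return result, attempt
    # All attempts failed: report the last failure instead of crashing.
    return result, max_attempts
```

A random error on the first attempt is absorbed by the rerun. A persistent error is still reported as a failed result once the attempts are exhausted.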

In some cases it's likely good to raise an exception to indicate really serious issues, but I'm not sure this should always be the case. In what situations will this occur?


enhaut · 2025-01-23T12:12:40Z

Hmm, actually yeah, in the context of our tooling it makes sense to just report "invalid" results.

olichtne requested changes Jan 21, 2025


enhaut added 2 commits January 23, 2025 10:27

XDPBenchMeasurement: propagate exception from generator

16e328e

PktGen test module may raise an exception, which needs to be handled on the controller side as well, to prevent the controller from treating the exception as a result.

XDPBenchMeasurement: propagate exception from receiver

70fc317

Handle exceptions from the agent's test modules to prevent a controller crash, as the controller expects results to be present.

enhaut force-pushed the xdp_error_handling branch from 2a88663 to 70fc317 January 23, 2025 09:27

enhaut requested a review from olichtne January 23, 2025 12:12

olichtne approved these changes Jan 23, 2025


olichtne merged commit a26a22b into LNST-project:master Jan 23, 2025
3 checks passed
