Better XDP error handling #391

Merged: 2 commits, Jan 23, 2025

@enhaut (Member) commented Jan 17, 2025

Description

XDP's test modules might raise an exception when running on the agent
machine. Currently these exceptions are not propagated to the controller,
so it treats incomplete or wrong results as valid ones and then crashes
on seemingly unrelated issues.

Tests

  • no crash test (just to be sure everything works as expected) J:10515937
  • crash test (to test added code) J:10515938

@olichtne (Collaborator) left a comment

So this is probably a good step, but it only changes which exception is raised, which means that the Controller will still crash.

This can be seen in your crash test logs:

LNST Controller crashed with an exception:
Traceback (most recent call last):
  File "/mnt/tests/data.lnst.anl.eng.rdu2.dc.redhat.com/data-server-content/gitlab-tasks/beaker-lnst-tasks/master.tar.gz/lnst/test-runner/./do-my-test", line 35, in main
    ctl.run(recipe, multimatch=bool(params.get("MULTIMATCH", False)))
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/Controller/Controller.py", line 172, in run
    recipe.test()
  File "/root/rhextensions-lnst/lnst/RHExtensions/RHRecipeMixin.py", line 109, in test
    super(RHRecipeMixin, self).test()
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/Recipes/ENRT/BaseEnrtRecipe.py", line 210, in test
    self.do_tests(recipe_config)
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/Recipes/ENRT/BaseEnrtRecipe.py", line 332, in do_tests
    self.do_perf_tests(recipe_config)
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/Recipes/ENRT/BaseEnrtRecipe.py", line 357, in do_perf_tests
    result = self.perf_test(perf_config)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/rhextensions-lnst/lnst/RHExtensions/RHRecipeMixin.py", line 425, in perf_test
    return super().perf_test(recipe_conf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/RecipeCommon/Perf/Recipe.py", line 162, in perf_test
    self.perf_test_iteration(recipe_conf, results)
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/RecipeCommon/Perf/Recipe.py", line 192, in perf_test_iteration
    measurement_results = measurement.collect_results()
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/RecipeCommon/Perf/Measurements/XDPBenchMeasurement.py", line 154, in collect_results
    flow_results.generator_results = self._parse_generator_results(
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/virtualenvs/rhextensions-lnst-Xo1BSm3a-py3.12/lib/python3.12/site-packages/lnst/RecipeCommon/Perf/Measurements/XDPBenchMeasurement.py", line 169, in _parse_generator_results
    raise job.result["exception"]  # propagate exception from agent
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
lnst.Tests.BaseTestModule.TestModuleError: pktgen module is not loaded

This is better than before, since it explains more clearly what went wrong, but our other Measurement modules don't re-raise exceptions; instead they return "fail results" with error messages to indicate issues.

This may be important when running multiple tests, where some may fail due to errors while others run just fine.

Or potentially to let the rerun mechanics rerun the test automatically in case of random errors, similar to what I did here: #382

In some cases it's likely good to raise an exception to indicate really serious issues, but I'm not sure this should always be the case. In what situations will this occur?
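
To illustrate the "fail results" approach described above, here is a minimal sketch. It is not LNST's actual code: `FlowResult`, `parse_job_result`, and the `"pps"` key are hypothetical names made up for this example. Only the `"exception"` key in the job result is taken from the traceback above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowResult:
    """Hypothetical result container (not LNST's real class)."""
    name: str
    passed: bool
    value: Optional[float] = None
    error: Optional[str] = None


def parse_job_result(name: str, job_result: dict) -> FlowResult:
    """Convert an agent job result into a FlowResult without raising."""
    exc = job_result.get("exception")
    if exc is not None:
        # Record the agent-side error as a failed result instead of
        # re-raising it, so the remaining flows can still be processed.
        return FlowResult(name=name, passed=False,
                          error=f"{type(exc).__name__}: {exc}")
    # "pps" is an assumed key for the measured packet rate.
    return FlowResult(name=name, passed=True, value=job_result["pps"])


# One flow succeeds, one fails on the agent; both end up as results.
job_results = {
    "flow0": {"pps": 1_000_000.0},
    "flow1": {"exception": RuntimeError("pktgen module is not loaded")},
}
results = [parse_job_result(n, r) for n, r in job_results.items()]
for r in results:
    print(r)
```

With this shape, a caller can inspect `passed` per flow and decide whether to report the failure or rerun the test, rather than having the whole run abort on the first agent-side error.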

PktGen test module may raise an exception, which needs to be
handled on the controller side as well to prevent it from treating
the exception as a result.
Handle exceptions from the agent's test modules to prevent a
controller crash, as it expects results to be present.
@enhaut force-pushed the xdp_error_handling branch from 2a88663 to 70fc317 on January 23, 2025 09:27
@enhaut (Member, Author) commented Jan 23, 2025

Hmm, actually yeah, in the context of our tooling it makes sense to just report "invalid" results.

@enhaut requested a review from olichtne on January 23, 2025 12:12
@olichtne merged commit a26a22b into LNST-project:master on Jan 23, 2025
3 checks passed