Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix XML comparison, compare time within 1e-6 seconds #63

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

CervEdin
Copy link

There appears to be a difference in how 'time' is generated in the XML output, possibly different in different systems/configurations.

#62

Possible culprit is how the 'time' attribute is for 'testsuite' elements. https://github.com/kyrus/python-junit-xml/blob/4bd08a272f059998cedf9b7779f944d49eba13a6/junit_xml/__init__.py#L135

Test case attributes appear to be formatted using '%f' % time, which is 6 decimal places by default.

For the testsuite, it looks like it's just a str call and maybe this is causing different systems to generate time of different precision.

Due to this small differences in how time is encoded in the XML, the test suite may fail due to differences in the amount of precision is encoded on the system.

Add a new function to the test suite that compares the XML document recursively and assert that tags, text and attributes match.

For 'time' attributes, we compare using math.isclose() with a tolerance of 1e-6.

There appears to be a difference in how 'time' is generated in the XML
output, possibly different in different systems/configurations.

Possible culprit is how the 'time' attribute is for 'testsuite' elements.
https://github.com/kyrus/python-junit-xml/blob/4bd08a272f059998cedf9b7779f944d49eba13a6/junit_xml/__init__.py#L135

Test case attributes appear to be formatted using `'%f' % time`, which
is 6 decimal places by default.

For the testsuite, it looks like it's just a `str` call and maybe this
is causing different systems to generate time of different precision.

Due to this small differences in how time is encoded in the XML, the
test suite may fail due to differences in the amount of precision is
encoded on the system.

Add a new function to the test suite that compares the XML document
recursively and assert that tags, text and attributes match.

For 'time' attributes, we compare using `math.isclose()` with a
tolerance of 1e-6.
@cclauss cclauss requested a review from MoLow September 25, 2024 19:05
@nicola-lunghi
Copy link

nicola-lunghi commented Oct 16, 2024

Hi @CervEdin can you fix the ruff issue with the xml module?

Run ruff check --output-format=github .
Error: test/test_outputs.py:63:24: S314 Using `xml` to parse untrusted data is known to be vulnerable to XML attacks; use `defusedxml` equivalents
Error: test/test_outputs.py:64:22: S314 Using `xml` to parse untrusted data is known to be vulnerable to XML attacks; use `defusedxml` equivalents
Error: Process completed with exit code 1.

from defusedxml documentation:

Instead of:

>>> from xml.etree.ElementTree import parse
>>> et = parse(xmlfile)

alter code to:

>>> from defusedxml.ElementTree import parse
>>> et = parse(xmlfile)

@nicola-lunghi
Copy link

also I don't think this works:

  • the problem is the testsuites line:

<testsuites disabled="0" errors="1" failures="0" tests="2370" time="11.862548">
<testsuite disabled="0" errors="1" failures="0" name="test/fixtures/test" skipped="45" tests="2370" time="11.862548" hostname="{HOSTNAME}">

 <?xml version="1.0" encoding="utf-8"?>
-<testsuites disabled="0" errors="1" failures="0" tests="2367" time="1.1381459999999999">
-       <testsuite disabled="0" errors="1" failures="0" name="test/fixtures/test3" skipped="25" tests="2367" time="1.1381459999999999" hostname="{HOSTNAME}">
+<testsuites disabled="0" errors="1" failures="0" tests="2367" time="1.138146">
+       <testsuite disabled="0" errors="1" failures="0" name="test3" skipped="25" tests="2367" time="1.138146" hostname="{HOSTNAME}">
                <testcase name="test-async-await" time="0.000601" classname="async-hooks"/>

@cclauss
Copy link
Collaborator

cclauss commented Oct 16, 2024

% ruff rule S314

suspicious-xml-element-tree-usage (S314)

Derived from the flake8-bandit linter.

What it does

Checks for uses of insecure XML parsers.

Why is this bad?

Many XML parsers are vulnerable to XML attacks (such as entity expansion),
which cause excessive memory and CPU usage by exploiting recursion. An
attacker could use such methods to access unauthorized resources.

Consider using the defusedxml package when parsing untrusted XML data,
to protect against XML attacks.

Example

from xml.etree.ElementTree import parse

tree = parse("untrusted.xml")  # Vulnerable to XML attacks.

Use instead:

from defusedxml.ElementTree import parse

tree = parse("untrusted.xml")

References

@CervEdin
Copy link
Author

Hi @CervEdin can you fix the ruff issue with the xml module?

@nicola-lunghi I guess there are two options here

  1. Use another dependency which is not the xml parser in the stdlib
  2. Add comments to ignore this particular check in ruff

I personally don't see the security issue here since the XML is trusted but option 1 is not bad per se

also I don't think this works:

* the problem is the testsuites line:

Sorry but I didn't follow what you meant here. Is something failing? Could be that epsilon needs tweaking.
The choice I made for this was pretty arbitrary

@CervEdin
Copy link
Author

% ruff rule S314

suspicious-xml-element-tree-usage (S314)

Derived from the flake8-bandit linter.

What it does

Checks for uses of insecure XML parsers.

Why is this bad?

Many XML parsers are vulnerable to XML attacks (such as entity expansion), which cause excessive memory and CPU usage by exploiting recursion. An attacker could use such methods to access unauthorized resources.

Consider using the defusedxml package when parsing untrusted XML data, to protect against XML attacks.

Example

from xml.etree.ElementTree import parse

tree = parse("untrusted.xml")  # Vulnerable to XML attacks.

Use instead:

from defusedxml.ElementTree import parse

tree = parse("untrusted.xml")

References

* [Python documentation: `xml` — XML processing modules](https://docs.python.org/3/library/xml.html)

* [PyPI: `defusedxml`](https://pypi.org/project/defusedxml/)

* [Common Weakness Enumeration: CWE-400](https://cwe.mitre.org/data/definitions/400.html)

* [Common Weakness Enumeration: CWE-776](https://cwe.mitre.org/data/definitions/776.html)

@cclauss thanks for the references

I wonder if it applies in this case. The XML being parsed is 1) the xml files checked into the repo 2) the xml generated from tap files checked in to the repo and processed by tap2junit.
Spontaneously, neither seem to necessarily qualify as untrusted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants