
File Comparison Across Versions

Conrad Rosenbrock edited this page Apr 22, 2016 · 7 revisions

FORTPY: File Comparison


Part of the unit testing framework's task is to compare the output of the executable that it generates with "model" output files that are known to contain the right answer. A problem arises because code changes over time. As improvements or new features are added, the output files change, and a simple diff between them can't tell whether they are actually related. When new functionality is added, we still expect the methods to reproduce old results (provided those results were correct). With minor modifications to the Fortran code, the framework can compare output files from different file versions.

Output Templates

An output file template is an XML file that specifies the structure of output files and which parts of those files are common between versions of the same file. The naming convention is to append ".xml" to the name of the output file; e.g. structs.out becomes structs.out.xml. The framework assumes that all templates are stored in a folder called "templates" within the code folder. If that is not the case, specify a custom folder as an argument to the unit testing script.
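The naming convention can be sketched in a few lines of Python. This is only an illustration: the function name template_path and its parameters are hypothetical and are not part of Fortpy's API.

```python
from pathlib import Path


def template_path(output_file, code_dir, templates_dir=None):
    """Return where the template for `output_file` would be expected.

    `templates_dir` stands in for the custom folder argument; when it is
    omitted, the default "templates" folder inside the code folder is used.
    """
    folder = Path(templates_dir) if templates_dir else Path(code_dir) / "templates"
    # Append ".xml" to the full file name: structs.out -> structs.out.xml
    return folder / (Path(output_file).name + ".xml")


print(template_path("structs.out", "code").as_posix())
```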

An output template has four sections: <preamble>, <body>, <outcomes> and <comparisons>.

The preamble of an output file refers to those entries in the file that are not repeated for each element of an output set. For example, an output file may have a collection of structures with the first line in the file specifying how many there are. In that case, the line describing how many there are is part of the preamble. The individual structures would form the body. The body, then, is a template of a single element that is repeated many times within the output.

The outcomes section specifies which entries can be ignored because they are not relevant to the content (e.g. uncommented text entries that describe data).

The comparisons section defines how individual elements in the output file should be compared. For example, if there is inherent randomness in a result, a tolerance can be specified, and as long as the corresponding entries are "close enough", the comparison succeeds.

Each of these sections in the output file is covered in detail in the sections below. Here is an example of a complete output template for multiple versions of a standard output file.
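As a rough sketch of that overall shape, the Python snippet below embeds a small template and checks its structure with the standard library. Everything specific in it is an assumption made for illustration: the root element name template is a placeholder, and the ids, names, version numbers and tolerance value are hypothetical. Only the section, element and attribute names come from the descriptions on this page.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: ids, names, versions and the root tag are
# placeholders chosen for illustration only.
TEMPLATE = """
<template>
  <preamble versions="1,2">
    <line id="header" type="int" values="1" names="nstructs" versions="1,2"
          store="nstructs=$[0]" />
    <line id="natoms" type="int" values="1" names="n" versions="2" />
  </preamble>
  <body stop="EOF" key="label.index">
    <line id="label" type="int" values="1" names="index" versions="1,2" />
    <line id="energy" type="float" values="1" names="value" versions="1,2" />
  </body>
  <outcomes>
    <ignore id="header" />
  </outcomes>
  <comparisons>
    <compare id="energy.value" operator="finite" tolerance="1e-8" />
  </comparisons>
</template>
"""

root = ET.fromstring(TEMPLATE)
sections = [child.tag for child in root]
print(sections)
```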

<preamble> Section

versions is the only attribute on the <preamble> tag; it specifies which versions are represented by its contents. Although this could easily be ascertained by enumerating all the children and then looking for unique versions, it is just as easy to put it as an attribute.

The preamble section can have two types of children, <line> and <lines>. Most of their attributes are common. We will treat those first:

  • id a name to give the value extracted for this line in the output file.
  • names specifies a list of names to assign to elements in the line. If a list of consecutive elements should be treated as a vector, specify a name followed by "*n" where n is the number of consecutive elements to take into the list. The total number of elements specified in names (with *n) should equal the total number of values. If there are fewer names than values, all remaining values are saved in the last name.
  • type a comma-separated list of data types that appear in the line and the order in which they appear. For example, "1 3 5 0.0005 10 10" would be "int, float, int".
  • values the number of values of each type to read. In the preceding example, values would be "3, 1, 2", meaning read 3 ints, 1 float and then 2 ints. If the line has a variable number of items, use "*". When "*" is used, all the remaining items on the line are extracted using the type associated with "*" in the list. There needs to be a one-to-one correspondence between type and values.
  • versions specifies which output file versions have this line.
  • store instructs the framework to make value(s) from the line available globally to any <lines> that appear after it. The format is name=$ where $ represents the list of values extracted from the line. The expression on the RHS of "=" can be any valid python expression. It will be evaluated using eval() after replacing all $ with the name of the list containing the values; e.g. nclusters=$[0]-1 will take the first value of the line, subtract 1 and store it in a variable called "nclusters" for use by other <lines>. Since the = is used to name the variable-value pair, the equal operator from python can be accessed using operator.eq as in conc=(True if operator.eq($[0].lower(), 't') else False). For multiple stored values from a single line, use a ;-separated list.
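To make the interplay of type, values and names concrete, here is a minimal Python sketch of how a line could be parsed under the rules above. It is not Fortpy's implementation: the helper names parse_line and assign_names are hypothetical, only the int and float types from the example are handled, and a comma-separated names list is assumed.

```python
CASTS = {"int": int, "float": float}


def parse_line(text, types, values):
    """Split a line into typed values following `type` and `values`."""
    type_list = [t.strip() for t in types.split(",")]
    count_list = [v.strip() for v in values.split(",")]
    if len(type_list) != len(count_list):
        raise ValueError("type and values must correspond one-to-one")
    tokens = text.split()
    result, pos = [], 0
    for type_name, count in zip(type_list, count_list):
        cast = CASTS[type_name]
        # "*" consumes every remaining token on the line.
        n = len(tokens) - pos if count == "*" else int(count)
        result.extend(cast(tok) for tok in tokens[pos:pos + n])
        pos += n
    return result


def assign_names(parsed, names):
    """Map parsed values onto names, honoring the "name*n" vector form."""
    specs = [spec.strip() for spec in names.split(",")]
    named, pos = {}, 0
    for i, spec in enumerate(specs):
        name, _, width = spec.partition("*")
        if i == len(specs) - 1:
            n = len(parsed) - pos  # the last name takes all remaining values
        elif width:
            n = int(width)
        else:
            n = 1
        chunk = parsed[pos:pos + n]
        # Plain single names hold a scalar; vectors and leftovers hold a list.
        named[name] = chunk if (width or n != 1) else chunk[0]
        pos += n
    return named


parsed = parse_line("1 3 5 0.0005 10 10", "int, float, int", "3, 1, 2")
print(assign_names(parsed, "head*3, step, tail"))
```

Here the hypothetical names list "head\*3, step, tail" groups the first three ints into a vector, keeps the float as a scalar, and leaves the last two ints in the final name.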

When store appears in the <preamble> the values it stores are available to any <lines> in the preamble and <body>. When it appears on a <line> inside the body, its values are only available to lines within the same, current, body element iteration.
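The store mechanism can be illustrated with a short Python sketch. It follows the description above (split on ";", name on the left of the first "=", "$" replaced by the list of values, evaluation with eval()), but the function apply_store and the internal list name _values are hypothetical. The scoping rule is modeled simply by copying the global variables for each body block.

```python
import operator


def apply_store(store, values, variables):
    """Evaluate a `store` attribute and record the results in `variables`."""
    for item in store.split(";"):
        if not item.strip():
            continue
        # Split on the first "=" only; the expression itself uses operator.eq.
        name, expression = item.split("=", 1)
        # Replace "$" with the name of the list holding the line's values.
        code = expression.replace("$", "_values").strip()
        scope = {"operator": operator, "_values": values}
        scope.update(variables)
        # eval() runs arbitrary code, so templates must come from a trusted source.
        variables[name.strip()] = eval(code, scope)
    return variables


# Values stored in the preamble are visible everywhere after that line.
global_vars = apply_store("nclusters=$[0]-1", [5, 2], {})

# Values stored inside a body block live in a per-block copy only.
block_vars = apply_store(
    "conc=(True if operator.eq($[0].lower(), 't') else False)",
    ["T"],
    dict(global_vars),
)
print(global_vars, block_vars)
```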

The following attribute appears only in <lines>. The <lines> tag is almost identical to <line> except that it repeats count times. The alternative would be to have count identical <line> elements in a row in the template.

  • count specifies how many times to repeat the line template (can be zero in which case the line template is not used). The value can be the name of a variable, a constant integer or any valid python code, as in $(knary if conc else 0).
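The three forms that count can take could be resolved along the following lines. This is a sketch under assumptions: resolve_count is a hypothetical helper, and variables stands for the dictionary of values made available through store.

```python
def resolve_count(count, variables):
    """Turn a `count` attribute into an integer repeat count."""
    count = count.strip()
    if count.startswith("$(") and count.endswith(")"):
        # Python code wrapped in $(...), evaluated against the stored variables.
        return int(eval(count[2:-1], dict(variables)))
    if count in variables:
        # The name of a stored variable.
        return int(variables[count])
    # A constant integer.
    return int(count)


stored = {"knary": 7, "conc": False, "nclusters": 4}
print(resolve_count("3", stored))
print(resolve_count("nclusters", stored))
print(resolve_count("$(knary if conc else 0)", stored))
```

With conc set to False, the last call yields zero, so the line template would not be used at all.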

<body> Section

The body is also composed of <line> and <lines> elements that have identical functionality. The difference between <body> and <preamble> is that the body definitions are repeated until a stop condition is met. In other words, the preamble is stepped through exactly once and each line in the preamble is matched with the corresponding line in the output file. When the body is processed, its definitions are stepped through repeatedly until the stop condition is met. Each time the body template is processed, it creates a body "block".

The body has extra attributes to define how it repeats and how to match the sets of values that it parses:

  • count for a finite stop condition, specifies how many times to repeat the body template.
  • key if elements of the body appear in a different order between versions of files, a key can be specified on which to compare body blocks. The key is an id.name combination from a <line> whose value will be used to match up body blocks.
  • stop specifies when the body will stop iterating. Possible values are "finite" or "EOF". When finite is specified, the body will repeat exactly count times. When "EOF" is specified, the body will keep repeating until it runs out of lines to interpret.
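The two stop conditions and key-based matching can be sketched in Python as follows. The sketch assumes, for simplicity, that every body block spans a fixed number of lines and that each parsed block is a dictionary keyed by id.name strings; split_blocks and pair_blocks are hypothetical names, not Fortpy functions.

```python
def split_blocks(lines, lines_per_block, stop="EOF", count=None):
    """Cut the body lines into blocks until the stop condition is met."""
    blocks, pos = [], 0
    while pos + lines_per_block <= len(lines):
        if stop == "finite" and len(blocks) == count:
            break  # "finite": repeat exactly `count` times
        blocks.append(lines[pos:pos + lines_per_block])
        pos += lines_per_block  # "EOF": continue until the lines run out
    return blocks


def pair_blocks(model_blocks, test_blocks, key):
    """Pair blocks that share the same `key` value, regardless of order."""
    by_key = {block[key]: block for block in test_blocks}
    return [(block, by_key.get(block[key])) for block in model_blocks]


body_lines = ["1", "a", "2", "b", "3", "c"]
print(split_blocks(body_lines, 2))
print(split_blocks(body_lines, 2, stop="finite", count=2))

model = [{"label.index": 1, "energy.value": 0.5}, {"label.index": 2, "energy.value": 0.7}]
test = [{"label.index": 2, "energy.value": 0.7}, {"label.index": 1, "energy.value": 0.5}]
pairs = pair_blocks(model, test, "label.index")
```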

<outcomes> Section

The main purpose of an outcomes section is to describe how line values affect the outcome of a file comparison. Its children are <ignore> elements, each of which has a single attribute.

  • id specifies the id of a line whose values should not be used in computing the percentage match between files.
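The effect of <ignore> on the percentage match can be illustrated with a small Python sketch. How Fortpy actually computes the percentage is not described here, so the formula below (matching lines divided by compared lines) is an assumption, and percent_match is a hypothetical helper.

```python
def percent_match(model, test, ignored=()):
    """Percentage of line ids whose values agree, skipping ignored ids."""
    ids = [line_id for line_id in model if line_id not in ignored]
    if not ids:
        return 100.0  # assumed convention: nothing left to compare
    matches = sum(1 for line_id in ids if test.get(line_id) == model[line_id])
    return 100.0 * matches / len(ids)


model_values = {"header": "version 1", "n": 3, "energy": 1.5}
test_values = {"header": "version 2", "n": 3, "energy": 1.5}

print(percent_match(model_values, test_values))
print(percent_match(model_values, test_values, ignored=("header",)))
```

Ignoring the descriptive header line removes the only mismatch, so the two files are treated as a full match.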

<comparisons> Section

As mentioned above, the comparisons section defines how specific line values are compared. The children of <comparisons> are <compare> elements. Comparisons are always used by the framework: if no comparisons section is specified, it generates one with the default operator of "equals" for all lines and line.name combinations.

  • id specifies the line id or id.name pair that the comparison will be applied to.
  • operator can either be "equals" or "finite". If the operator is finite, a tolerance formula can be applied to the values when determining whether they should be considered equal.
  • tolerance a formula to use when comparing the elements specified by id. The $ represents the dictionary of named values compiled for the line if its names attribute was specified. Only named values can be compared with finite tolerance comparisons.
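A comparison with the two operators might look like the Python sketch below. The way the tolerance is applied is an assumption: the formula is evaluated with "$" replaced by the dictionary of named values from the model line, and the result is used as an absolute bound on the difference. The helper values_match and the internal name _named are hypothetical.

```python
def values_match(model, test, name, op="equals", tolerance=None):
    """Compare one named value from two parsed lines."""
    a, b = model[name], test[name]
    if op == "equals" or tolerance is None:
        return a == b
    # "$" stands for the dictionary of named values on the line.
    bound = eval(tolerance.replace("$", "_named"), {"_named": model})
    # Assumed meaning of "finite": absolute difference within the bound.
    return abs(a - b) <= bound


model_line = {"value": 1.0}
test_line = {"value": 1.0000001}

print(values_match(model_line, test_line, "value"))
print(values_match(model_line, test_line, "value", op="finite", tolerance="1e-6"))
print(values_match(model_line, test_line, "value", op="finite", tolerance='$["value"]*1e-9'))
```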

Built-in Comparison Templates

Fortpy ships with some default templates for comparing straightforward files.

  • integer.xml compares single integers, vectors and 2-D arrays.
  • float.xml compares single floats, vectors and 2-D arrays.