Merge pull request #98 from jjmccollum/97-positive-proportionsproportion-base

97 positive proportionsproportion base
jjmccollum authored Jan 15, 2025
2 parents dbdaaa5 + fa4e6b2 commit abfa61b
Showing 6 changed files with 323 additions and 21 deletions.
12 changes: 9 additions & 3 deletions docs/advanced.rst
@@ -626,17 +626,23 @@ Collations can also be converted to tabular formats.
Within Python, the ``collation`` class's ``to_numpy`` method can be invoked to convert a collation to a NumPy ``array`` with rows for variant readings, columns for witnesses, and frequency values in the cells.
Where a witness has missing data at a variation, its frequencies for different readings at this unit can be split evenly over 1 using the ``split_missing`` argument; otherwise, the witness will have frequencies of 0 for all readings at that unit.
The same class's ``to_distance_matrix`` method produces a NumPy ``array`` with rows and columns for witnesses, where each cell contains the number of units where the row witness and column witness both have unambiguous readings and these readings disagree.
The cells can instead be populated with the proportion of disagreements to units where the row and column witnesses have readings with the ``proportion`` argument.
The cells can instead be populated with the proportion of disagreements among units where the row and column witnesses have readings with the ``proportion`` argument.
If you specify the ``show_ext`` argument as True, then each cell will be populated by the number or proportion of disagreements followed by the number of units where both witnesses have unambiguous readings (e.g., 3/50 or 0.06/50).
The same class's ``to_similarity_matrix`` method produces a NumPy ``array`` with rows and columns for witnesses, where each cell contains the number of units where the row witness and column witness both have unambiguous readings and these readings agree.
The cells can instead be populated with the proportion of agreements among units where the row and column witnesses have readings with the ``proportion`` argument.
If you specify the ``show_ext`` argument as True, then each cell will be populated by the number or proportion of agreements followed by the number of units where both witnesses have unambiguous readings (e.g., 47/50 or 0.94/50).
The same class's ``to_nexus_table`` method produces a NumPy ``array`` with rows for witnesses, columns for variation unit IDs, and attested reading IDs in the cells, resembling a NEXUS sequence.
By default, cells corresponding to ambiguous readings are written as space-separated sequences of readings between braces, but they can be written as missing states with the ``ambiguous_as_missing`` argument.
The same class's ``to_long_table`` method produces a NumPy ``array`` with columns for witness ID, variation unit ID, reading index, and reading text and rows for all combinations of these values found in the collation.
The ``to_dataframe`` method invokes ``to_numpy`` by default, but if the ``table_type`` argument is ``distance``, ``nexus`` or ``long``, then it will invoke ``to_distance_matrix``, ``to_nexus_table`` or ``to_long_table``, respectively.
It returns a Pandas ``DataFrame`` augmented with row and column labels (or, in the case of a long table, just column labels).
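
The cell values described above can be illustrated with a minimal sketch in plain Python. It does not call ``teiphy`` and is not its implementation: the functions ``unit_frequencies`` and ``pairwise_cell``, the ``agree`` parameter, and the toy witnesses are hypothetical names introduced here, and ``None`` is assumed to stand for a missing or ambiguous reading. The ``split_missing``, ``proportion``, and ``show_ext`` parameters only mimic the behavior described for the arguments of the same names.

```python
# Hypothetical stand-ins for illustration only; this is not teiphy's code.
# A reading is a string ID, or None when the witness is missing or ambiguous.

def unit_frequencies(reading, unit_readings, split_missing=False):
    """Frequency cells for one witness at one variation unit."""
    if reading is None:
        # Missing data: spread a total of 1 evenly, or leave every cell at 0.
        share = 1 / len(unit_readings) if split_missing else 0
        return [share] * len(unit_readings)
    return [1 if r == reading else 0 for r in unit_readings]


def pairwise_cell(row, col, agree=False, proportion=False, show_ext=False):
    """One cell of a distance (agree=False) or similarity (agree=True) matrix."""
    # Only units where both witnesses have an unambiguous reading are compared.
    shared = [(r, c) for r, c in zip(row, col) if r is not None and c is not None]
    extant = len(shared)
    count = sum(1 for r, c in shared if (r == c) == agree)
    value = count / extant if (proportion and extant > 0) else count
    return f"{value}/{extant}" if show_ext else value


# Two toy witnesses that both have readings at 50 units and disagree at 3 of them.
w1 = ["1"] * 50
w2 = ["2"] * 3 + ["1"] * 47

print(unit_frequencies(None, ["1", "2"], split_missing=True))  # [0.5, 0.5]
print(pairwise_cell(w1, w2, show_ext=True))  # 3/50
print(pairwise_cell(w1, w2, proportion=True, show_ext=True))  # 0.06/50
print(pairwise_cell(w1, w2, agree=True, proportion=True, show_ext=True))  # 0.94/50
```

Under these assumptions, the distance and similarity proportions for a pair of witnesses sum to 1, because both are computed over the same set of units where both witnesses have unambiguous readings.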

From the command line, the standard reading-witness matrix or long table can be written to a specified CSV, TSV, or Excel (.xlsx) file.
If you specify the output filename with its extension, ``teiphy`` will infer which format to use.
If you specify the output filename with its extension, ``teiphy`` will infer which format to use.
If you want to write a distance matrix, a similarity matrix, a NEXUS-style table, or a long table to output instead of a reading-witness matrix, then you can do so by specifying the ``--table distance``, ``--table similarity``, ``--table nexus``, or ``--table long`` command-line argument, respectively.
If you are writing a reading-witness matrix to output, you can set the method's ``split_missing`` argument using the ``--split-missing`` command-line flag.
If you want to write a distance matrix, a NEXUS-style table, or a long table to output instead of a reading-witness matrix, then you can do so by specifying the ``--table distance``, ``--table nexus``, or ``--table long`` command-line argument, respectively.
If you are writing a distance or similarity matrix to output, then you can set the method's ``proportion`` and ``show_ext`` arguments using the ``--proportion`` and ``--show-ext`` command-line flags, respectively.
As with plain NEXUS outputs, if you are writing a NEXUS table to output, then you can set the method's ``ambiguous_as_missing`` argument using the ``--ambiguous-as-missing`` command-line flag.

Other Options
-------------
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "teiphy"
version = "0.1.18"
version = "0.1.19"
description = "Converts TEI XML collations to NEXUS and other formats"
authors = ["Joey McCollum and Robert Turnbull"]
license = "MIT"