-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add IGDWriter, IGDFile->IGDReader, and add IGDTransformer (#5)
* Add IGD writing capability, IGDTransformer * IGDFile is now IGDReader, and takes an IO buffer instead of a filename. This is more flexible, and removes the need to have a context manager. IGDFile remains, but will be removed soon and should be avoided. * IGDWriter is new: create IGD files in Python. * IGDTransformer lets you take an input IGD buffer and modify the sample information, producing a new output IGD buffer. * Documentation and examples * Make flake8 happy, fix test dependencies * Bump version
- Loading branch information
Showing
15 changed files
with
777 additions
and
111 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Minimal makefile for Sphinx documentation | ||
# | ||
|
||
# You can set these variables from the command line, and also | ||
# from the environment for the first two. | ||
SPHINXOPTS ?= | ||
SPHINXBUILD ?= sphinx-build | ||
SOURCEDIR = . | ||
BUILDDIR = _build | ||
|
||
# Put it first so that "make" without argument is like "make help". | ||
help: | ||
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) | ||
|
||
.PHONY: help Makefile | ||
|
||
# Catch-all target: route all unknown targets to Sphinx using the new | ||
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). | ||
%: Makefile | ||
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
# Configuration file for the Sphinx documentation builder. | ||
# | ||
# For the full list of built-in configuration values, see the documentation: | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html | ||
|
||
# -- Project information ----------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information | ||
|
||
project = 'pyigd' | ||
copyright = '2024, Drew DeHaas' | ||
author = 'Drew DeHaas' | ||
release = '0.2' | ||
|
||
# -- General configuration --------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration | ||
|
||
extensions = ["sphinx.ext.autodoc"] | ||
|
||
templates_path = ['_templates'] | ||
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] | ||
|
||
|
||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output | ||
|
||
html_theme = 'sphinx_rtd_theme' | ||
html_static_path = ['_static'] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
.. pyigd documentation master file, created by | ||
sphinx-quickstart on Wed Aug 21 14:20:22 2024. | ||
You can adapt this file completely to your liking, but it should at least | ||
contain the root `toctree` directive. | ||
pyigd Documentation | ||
=================== | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:caption: Contents: | ||
|
||
pyigd is a Python library for reading and writing Indexable Genotype Data | ||
(IGD) files. `IGD is a binary format <https://github.com/aprilweilab/picovcf/blob/main/IGD.FORMAT.md>`_ that is very simple and uses no | ||
compression, but tends to be quite small and fast. | ||
|
||
IGDReader example | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Example of how to read an IGD file. See :py:class:`pyigd.IGDReader` for the relevant API reference. | ||
|
||
:: | ||
|
||
import pyigd | ||
import sys | ||
|
||
if len(sys.argv) < 2: | ||
print("Pass in IGD filename") | ||
exit(1) | ||
|
||
with open(sys.argv[1], "rb") as f: | ||
igd_file = pyigd.IGDReader(f) | ||
print(f"Version: {igd_file.version}") | ||
print(f"Ploidy: {igd_file.ploidy}") | ||
print(f"Variants: {igd_file.num_variants}") | ||
print(f"Individuals: {igd_file.num_individuals}") | ||
print(f"Source: {igd_file.source}") | ||
print(f"Description: {igd_file.description}") | ||
for variant_index in range(igd_file.num_variants): | ||
print(f"REF: {igd_file.get_ref_allele(variant_index)}, ALT: {igd_file.get_alt_allele(variant_index)}") | ||
position, is_missing, sample_list = igd_file.get_samples(variant_index) | ||
print( (position, is_missing, len(sample_list)) ) | ||
|
||
IGDWriter example | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Example of how to write an IGD file. See :py:class:`pyigd.IGDWriter` for the relevant API reference. Some of the methods invoked have a required order, and in general all the variants (sample info) needs to be written immediately after the header. The IGDWriter maintains some state, and for large datasets can use a non-trivial amount of RAM to maintain indexes while writing variant data. | ||
|
||
:: | ||
|
||
num_individuals = 100 | ||
with open("output.igd", "wb") as f: | ||
w = IGDWriter(f, num_individuals) | ||
w.write_header() | ||
w.write_variant(100, "A", "G", [1, 22, 99, 101]) | ||
w.write_variant(101, "A", "C", list(range(200))) | ||
w.write_index() | ||
w.write_variant_info() | ||
w.write_individual_ids([]) # optional | ||
w.write_variant_ids([]) # optional | ||
# We write the header twice because the IGDWriter is updating fields | ||
# on the header as we call the other methods above. For example, it is | ||
# counting the number of variants being written and that gets updated | ||
# in the header the second time we write it. | ||
f.seek(0) | ||
w.write_header() | ||
|
||
|
||
Indices and tables | ||
================== | ||
|
||
* :ref:`genindex` | ||
* :ref:`modindex` | ||
* :ref:`search` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
pyigd | ||
===== | ||
|
||
.. toctree:: | ||
:maxdepth: 4 | ||
|
||
pyigd |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
pyigd package | ||
============= | ||
|
||
Module contents | ||
--------------- | ||
|
||
.. automodule:: pyigd | ||
:members: | ||
:undoc-members: | ||
:show-inheritance: |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
import pyigd | ||
import sys | ||
|
||
if len(sys.argv) < 3: | ||
print("Pass in IGD input/output filenames") | ||
exit(1) | ||
|
||
# Our own IGDTransformer. This just deletes the first sample from every variant, but | ||
# in general you can change the sample list any way you want. Return None to delete | ||
# the variant entirely. | ||
class MyXformer(pyigd.IGDTransformer): | ||
def modify_samples(self, position, is_missing, samples): | ||
return samples[1:] | ||
|
||
with open(sys.argv[1], "rb") as fin, open(sys.argv[2], "wb") as fout: | ||
xformer = MyXformer(fin, fout, use_bitvectors=False) | ||
xformer.transform() | ||
print(f"Copied {sys.argv[1]} to {sys.argv[2]} and deleted the first sample from every variant.") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
import pyigd | ||
import sys | ||
|
||
if len(sys.argv) < 3: | ||
print("Pass in IGD input/output filenames") | ||
exit(1) | ||
|
||
# Our own IGDTransformer. This just deletes the first sample from every variant, but | ||
# in general you can change the sample list any way you want. Return None to delete | ||
# the variant entirely. | ||
class MyXformer(pyigd.IGDTransformer): | ||
def modify_samples(self, position, is_missing, samples): | ||
# BitVector version. The ith element being 1 means the ith sample has the variant. | ||
for i in range(len(samples)): | ||
if samples[i]: | ||
samples[i] = 0 | ||
return samples | ||
|
||
with open(sys.argv[1], "rb") as fin, open(sys.argv[2], "wb") as fout: | ||
xformer = MyXformer(fin, fout, use_bitvectors=True) | ||
xformer.transform() | ||
print(f"Copied {sys.argv[1]} to {sys.argv[2]} and deleted the first sample from every variant.") |
Oops, something went wrong.