Description and Output

Pling runs via the Python script run_pling.py, which creates a config file and runs three Snakemake workflows in succession. It starts by calculating containment distances and transforming nucleotide sequences into integer sequences (integerisation, details below), then calculates DCJ-Indel distances, and finally uses these to build a network and cluster on it. The outputs are:

  • Containment communities: Pling defines broad plasmid communities by building a containment network. In this network, nodes represent plasmids, and an edge is drawn between two plasmids if their containment distance fulfils the threshold. If plasmid A is smaller than plasmid B, then the containment distance between the two plasmids is the percentage of plasmid A that is not contained in plasmid B. A plasmid community corresponds exactly to a connected component in this network. Distances and plasmid-to-community assignments can be found in the folder containment.
  • Hub plasmids: Pling identifies "hub plasmids", which are plasmids that are densely connected on the containment network but whose neighbours are not connected to each other. In practice, these are usually relatively small plasmids that consist mostly of a large mobile genetic element, which has spread across many diverse, unrelated plasmids. They are listed in dcj_thresh_4_graph/objects/hub_plasmids.csv.
  • Integer sequences: Pling outputs the integer sequences used to calculate DCJ-Indel distances. These are written in UniMoG format (see https://bibiserv.cebitec.uni-bielefeld.de/dcj?id=dcj_manual for a description), together with a mapping of integers to sequence coordinates or a mapping of integers to gene names. The integer sequences are calculated in batches, so there is one unimog file and one map file per batch, all found in the folder unimogs.
  • DCJ-Indel subcommunities: Pling identifies plasmid subcommunities by constructing a DCJ-Indel network and then clustering on this network. The DCJ-Indel network is a subnetwork of the containment network initially built. An edge is kept if the pair of plasmids fulfils the DCJ-Indel distance threshold. Hub plasmids are isolated in the DCJ-Indel network, with no edges connecting to them (even if they fulfil the DCJ-Indel threshold). On this network Pling clusters using the asynchronous label propagation community detection algorithm. Plasmid clusters are labelled by containment community and DCJ-Indel subcommunity, e.g. community_2_subcommunity_13, and plasmid-to-subcommunity assignments are found in dcj_thresh_4_graph/objects/typing.tsv. The DCJ-Indel distances are in the file all_plasmid_distances.tsv (note that the DCJ-Indel distance is only calculated between plasmids which fulfil the containment threshold, so not all pairs of plasmids will be in the file).
  • Visualisations: Alongside the distances and clustering, Pling outputs network visualisations to aid in further analysis. These can be useful to look at if you want to spot interesting relationships between plasmids, e.g. plasmid fusions. They include visualisations of the full containment network and of each containment community individually, found under containment/containment_communities/visualisations. All nodes are coloured black, and the edges are labelled by containment distance. Additionally, under dcj_thresh_4_graph/communities are visualisations of each plasmid community, where nodes are coloured by subcommunity assignment and edges are labelled with both containment distance and DCJ-Indel distance. Finally, under dcj_thresh_4_graph/subcommunities are visualisations of each plasmid subcommunity, where all nodes have the same colour and edges are labelled by containment distance and DCJ-Indel distance. The subcommunity visualisations omit edges that don't fulfil the DCJ-Indel threshold; the other visualisations include them. You can also optionally output JSON files of the networks, see Advanced Usage and General Advice for more information.
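
To make the containment community definition above concrete, here is a minimal Python sketch. It is not Pling's implementation: the plasmid names, lengths, shared-base counts and the threshold of 0.5 are made-up values, the distance is expressed as a fraction rather than a percentage, and "fulfilling the threshold" is assumed to mean "distance at or below the threshold".

```python
from collections import defaultdict


def containment_distance(len_a, len_b, shared_bases):
    """Fraction of the smaller plasmid that is NOT contained in the larger one."""
    smaller = min(len_a, len_b)
    return 1.0 - shared_bases / smaller


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as sets of nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from an unvisited node
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical input: plasmid lengths and bases of the smaller plasmid
# found in the larger one for each compared pair.
lengths = {"pA": 1000, "pB": 4000, "pC": 5000, "pD": 3000}
shared = {("pA", "pB"): 900, ("pB", "pC"): 3000, ("pA", "pD"): 100}
threshold = 0.5  # hypothetical value, not a Pling default

edges = [
    (a, b)
    for (a, b), n in shared.items()
    if containment_distance(lengths[a], lengths[b], n) <= threshold
]
communities = connected_components(lengths, edges)
```

In this toy input pA, pB and pC end up in one community, while pD forms a community of its own because only 10% of pA is contained in it.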
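
The hub plasmid description above (many neighbours that are themselves not interconnected) can be sketched as a degree check combined with a check on how many neighbour pairs are linked. The exact criteria Pling uses are not given here, so the function name `find_hubs` and both cut-offs (`min_degree`, `max_neighbour_density`) are assumptions chosen purely for illustration.

```python
from itertools import combinations


def find_hubs(adjacency, min_degree=4, max_neighbour_density=0.2):
    """Return nodes with many neighbours whose neighbours are rarely linked.

    adjacency maps each node to the set of its neighbours (undirected graph).
    Both cut-offs are hypothetical, not Pling's parameters.
    """
    hubs = []
    for node, neighbours in adjacency.items():
        if len(neighbours) < min_degree:
            continue
        pairs = list(combinations(sorted(neighbours), 2))
        linked = sum(1 for a, b in pairs if b in adjacency[a])
        # Fraction of neighbour pairs that share an edge.
        density = linked / len(pairs) if pairs else 0.0
        if density <= max_neighbour_density:
            hubs.append(node)
    return sorted(hubs)


# Toy network: "h" touches five plasmids, of which only p1 and p2 are linked.
adjacency = {
    "h": {"p1", "p2", "p3", "p4", "p5"},
    "p1": {"h", "p2"},
    "p2": {"h", "p1"},
    "p3": {"h"},
    "p4": {"h"},
    "p5": {"h"},
}
hubs = find_hubs(adjacency)
```

Here "h" has five neighbours and only one of the ten possible neighbour pairs is connected, so it is the only node reported.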
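
The construction of the DCJ-Indel network and the clustering step described above can be illustrated as follows. This is a plain-Python sketch under stated assumptions, not Pling's code: the plasmid names and distances are invented, the threshold of 4 simply mirrors the dcj_thresh_4_graph folder name, an edge is assumed to be kept when its distance is at or below the threshold, and the label propagation routine is a generic asynchronous variant with random tie-breaking.

```python
import random
from collections import Counter, defaultdict


def dcj_subnetwork(containment_edges, dcj_distance, dcj_threshold, hubs):
    """Keep containment edges within the DCJ-Indel threshold, isolating hubs."""
    kept = []
    for a, b in containment_edges:
        if a in hubs or b in hubs:
            continue  # hub plasmids get no edges, whatever their distance
        if dcj_distance[(a, b)] <= dcj_threshold:
            kept.append((a, b))
    return kept


def async_label_propagation(nodes, edges, seed=0):
    """Generic asynchronous label propagation; returns a node -> label dict."""
    rng = random.Random(seed)
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    labels = {node: i for i, node in enumerate(nodes)}  # unique start labels
    order = list(nodes)
    changed = True
    while changed:
        changed = False
        rng.shuffle(order)
        for node in order:
            if not adjacency[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[n] for n in adjacency[node])
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            if labels[node] in candidates:
                continue
            # Labels are updated in place, so later nodes see the change.
            labels[node] = rng.choice(candidates)
            changed = True
    return labels


# Toy community: two triangles joined by c-d, plus a hub "h".
nodes = ["a", "b", "c", "d", "e", "f", "h"]
dcj_distance = {
    ("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 0,
    ("d", "e"): 3, ("e", "f"): 1, ("d", "f"): 2,
    ("c", "d"): 10, ("h", "a"): 1, ("h", "d"): 1,
}
edges = dcj_subnetwork(list(dcj_distance), dcj_distance, dcj_threshold=4, hubs={"h"})
labels = async_label_propagation(nodes, edges)
```

In this example the c-d edge is dropped because its distance exceeds the threshold, and the edges to "h" are dropped because it is a hub, so the two triangles become separate subcommunities and "h" stays on its own.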
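
For downstream analysis it can be handy to split a subcommunity label such as community_2_subcommunity_13 into its two numbers. The helper below is a small sketch that assumes every label follows exactly the pattern of that example; `parse_label` is a hypothetical name, not part of Pling.

```python
import re

# Pattern taken from the example label community_2_subcommunity_13.
LABEL_PATTERN = re.compile(r"^community_(\d+)_subcommunity_(\d+)$")


def parse_label(label):
    """Return (community, subcommunity) as integers for a matching label."""
    match = LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return int(match.group(1)), int(match.group(2))


community, subcommunity = parse_label("community_2_subcommunity_13")
```

Labels that do not match the assumed pattern raise a ValueError rather than being silently mis-parsed.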