How to generate reference genomes in JSON format #43
Comments
Dear @yazhinia

Multi- vs single-sample benchmarking
As you might have read in the BinBencher.jl paper, BinBencher (BB) allows you to benchmark with multiple samples, without correctness issues. However, if you still want to benchmark once per sample, that'll work well, too.

Only for gold standard approach?
BB works only when you have the actual ground truth. For assembled data, you will need to somehow learn the ground truth for the contigs. Of course, since any such learned truth will be incomplete, the benchmarking will be slightly inaccurate. If you have actual non-simulated contigs, then I'm skeptical BB will be any good. You could try phylogenetically placing your contigs with something like GTDB-tk, but I doubt the results would be particularly good, and certainly not good enough to function as the ground truth for benchmarking.

How to actually create the
|
Dear author, Gold standard genomes are available but contigs binned are assembled by assembler Creating Reference |
Dear @yazhinia, BB handles zero-abundance genomes just fine. These will appear in the reference as normal genomes, but without any contigs assigned to them (hopefully!). |
Dear @jakobnissen, Thank you again. |
Dear @yazhinia I've now written up a first draft of the documentation for BinBencher: https://viralinstruction.com/BinBencherBackend.jl/dev/ |
Dear @jakobnissen For If these files are already generated for CAMI2 datasets and accessible for others, I can benefit from directly using them at the moment. |
The child name in |
Hello developers,
Thank you for developing a nice benchmarking tool. How do I generate reference genomes in JSON format for a BinBencher assessment? I couldn't work this out from the documentation.
I want to use it to assess bins generated by multi-split binning. Contigs were generated by assembling each sample separately, and binning was performed on the concatenated set, as you suggested in the VAMB paper. At the moment, I am considering sample-wise assessment, i.e., using only the genomes present in a sample and the bins obtained from that sample (split by sample after binning).
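To make the sample-wise split concrete, here is a minimal Python sketch of what I mean. It assumes contig names follow a `<sample>C<contig>` pattern such as `S1C42`, with `C` as the separator and no `C` inside the sample name; the function name `split_bins_by_sample` and the input layout are only illustrative and are not part of VAMB or BinBencher.

```python
from collections import defaultdict


def split_bins_by_sample(bins, separator="C"):
    """Split each bin into per-sample sub-bins.

    bins: dict mapping bin name -> list of contig names.
    Assumption: every contig name looks like '<sample><separator><rest>',
    and the separator does not occur inside the sample name.
    Returns: dict mapping sample -> {bin name -> list of contig names}.
    """
    per_sample = defaultdict(lambda: defaultdict(list))
    for bin_name, contigs in bins.items():
        for contig in contigs:
            # Everything before the first separator is taken as the sample.
            sample, sep, _rest = contig.partition(separator)
            if not sep:
                raise ValueError(f"No separator {separator!r} in {contig!r}")
            per_sample[sample][bin_name].append(contig)
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(b) for sample, b in per_sample.items()}


if __name__ == "__main__":
    example = {"bin1": ["S1C1", "S2C5", "S1C9"], "bin2": ["S2C7"]}
    for sample, sample_bins in split_bins_by_sample(example).items():
        print(sample, sample_bins)
```

Each per-sample dictionary could then be assessed against only the genomes present in that sample.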
Is it applicable only to datasets where contigs were obtained through the gold standard approach, or also to contigs obtained from any (meta-)genome assembler? If the latter, how do you get mapped positions for the contigs in the bins and derive genome-bin pairs? Would an aligner such as bowtie2, bwa mem, or minimap2 be the recommended approach (though time-consuming)?
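For illustration, the aligner-based idea could look roughly like the Python sketch below. It assumes the contigs were aligned to the gold standard genomes with minimap2 and written in PAF format (for example `minimap2 -x asm5 genomes.fna contigs.fna > aln.paf`, where the file names are placeholders). It keeps the best hit per contig by number of matching bases. The function name, the `min_query_cov` threshold, and the output dictionary layout are hypothetical; this is not BinBencher's reference JSON schema, which would still need to be built from these positions.

```python
import json


def best_hits_from_paf(paf_lines, min_query_cov=0.9):
    """Pick one best reference hit per contig from PAF alignment lines.

    PAF columns used (0-based index): 0 query name, 1 query length,
    2 query start, 3 query end, 5 target name, 7 target start,
    8 target end, 9 number of matching bases.
    PAF coordinates are 0-based with an exclusive end; they are kept as-is.
    """
    best = {}
    for line in paf_lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        qname, qlen = fields[0], int(fields[1])
        qstart, qend = int(fields[2]), int(fields[3])
        tname, tstart, tend = fields[5], int(fields[7]), int(fields[8])
        matches = int(fields[9])
        # Skip alignments covering too little of the contig (arbitrary cutoff).
        if (qend - qstart) / qlen < min_query_cov:
            continue
        previous = best.get(qname)
        if previous is None or matches > previous["matches"]:
            best[qname] = {
                "length": qlen,
                "target": tname,
                "start": tstart,
                "end": tend,
                "matches": matches,
            }
    return best


if __name__ == "__main__":
    demo = [
        "S1C1\t1000\t0\t1000\t+\tgenomeA\t50000\t100\t1100\t990\t1000\t60",
        "S1C1\t1000\t0\t950\t+\tgenomeB\t40000\t5\t955\t900\t950\t10",
        "S1C2\t500\t0\t200\t-\tgenomeA\t50000\t2000\t2200\t190\t200\t60",
    ]
    print(json.dumps(best_hits_from_paf(demo), indent=2))
```

Contigs that fail the coverage cutoff (like `S1C2` in the demo) end up with no assigned genome, so any truth learned this way would be incomplete.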
Thank you for your inputs.