This is a slightly modified version of the PacBio repository to allow more customisation, specifically to make it easier to run the FALCON assembler on the Wellcome Trust Sanger Institute's compute cluster.
git clone
cd FALCON-integrate
git submodule update --init
I suggest you install the Python code under your own Python installation: e.g.
~/programs/Python-2.7.12/python install --user
To run the FALCON code you will also need to compile the modified DALIGNER code and have the folder in your PATH:
cd ../DAZZ_DB
The FALCON_unzip pipeline here is the original PacBio version and needs some modification to run at Sanger. I have the pipeline running, but it is not public on GitHub; please contact me directly (see below).
This fork is maintained by Milan Malinsky. If you have any issues getting FALCON to work at Sanger, please let me know at [email protected]
Some more information is on the PacBio wiki.
In case you are unfamiliar with git-submodules, they are quite easy to use from the command-line:
git submodule update --init
If that fails, you can update your git, or try this:
git submodule init
git submodule update
which is almost the same thing.
These are just examples of parameters that worked reasonably well for one particular assembly; much more info on parameter tuning is here:
for i in {1..N}; do bsub -G yourGroup -o correct_o_%J -e correct_e_%J -R'select[mem>15000] rusage[mem=15000]' -M15000 sh -c 'LA4Falcon -H 5000 -fo DATABASE_patched.db DATABASE_patched.${1}.las | --output_multi --n_core 0 --min_cov 6 --max_cov_aln 60 --max_n_read 200 > DATABASE_patched.${1}.corrected_max_cov_aln60_multi.fasta' ec $i; done
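The submission loop above passes the chunk number into a single-quoted `sh -c` string through positional parameters: the first word after the string (`ec`) becomes `$0`, and the loop variable becomes `$1`. A minimal sketch of that idiom on its own, with no cluster submission involved (the function name `show_positional` is only for illustration):

```shell
# Show how `sh -c '...' NAME ARG` maps NAME to $0 and ARG to $1.
# The single quotes stop the outer shell from expanding ${1} too early.
show_positional() {
    sh -c 'echo "name=$0 chunk=DATABASE_patched.${1}.las"' ec "$1"
}

for i in 1 2 3; do
    show_positional "$i"
done
```

This is why `${1}` rather than `$i` appears inside the quoted command: the inner shell never sees the loop variable, only its own positional parameters.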
for i in {1..N}; do echo $i; awk 'BEGIN{ FS ="_"; printThis = 0;}{ if (substr($1,1,1) == ">") { if ($2 > 7000) { printThis = 1; print;} else {printThis = 0;}} else { if (printThis == 1) {print;}} }' DATABASE_patched.${i}.corrected_max_cov_aln60_multi.fasta > DATABASE_patched.${i}.corrected_max_cov_aln60_multi_min7kb.fasta
/path/to/FALCON-integrate/ DATABASE_patched.${i}.corrected_max_cov_aln60_multi_min7kb.fasta
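The length filter in the awk one-liner above can be written more compactly. The following is a sketch rather than a drop-in replacement: it assumes, as the original command does, that the second underscore-delimited field of each FASTA header holds the read length. The function name `filter_fasta_by_length` is hypothetical.

```shell
# Keep only FASTA records whose header length field exceeds a threshold.
# Assumption: headers look like ">id_LENGTH...", i.e. field 2 (split on "_")
# is the read length, exactly as in the awk command above.
filter_fasta_by_length() {
    # $1 = minimum length (exclusive); FASTA is read from stdin
    awk -v min_len="$1" '
        BEGIN { FS = "_"; keep = 0 }
        /^>/  { keep = ($2 + 0 > min_len + 0) }  # decide once per header
        keep  { print }                          # header and its sequence lines
    '
}
```

It could then be used as, for example, `filter_fasta_by_length 7000 < in.fasta > out.fasta`, where both file names are placeholders.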
Getting new .las files
for i in {1..N}; do echo DATABASE_patched.corrected_max_cov_aln60_multi_min7kb.${i}.las >> fofn.txt; done
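Because the loop above appends with `>>`, running it twice leaves duplicate entries in `fofn.txt`. A small sketch that truncates the file first might look like this; the function name `write_fofn` and its argument order are assumptions made for illustration.

```shell
# Write a file-of-file-names listing PREFIX.1.las ... PREFIX.N.las.
# The output file is emptied first so that reruns do not append duplicates.
write_fofn() {
    # $1 = file prefix, $2 = number of chunks (N), $3 = output file
    : > "$3"
    i=1
    while [ "$i" -le "$2" ]; do
        echo "$1.$i.las" >> "$3"
        i=$((i + 1))
    done
}
```

For the files above this would be called as `write_fofn DATABASE_patched.corrected_max_cov_aln60_multi_min7kb N fofn.txt`, with `N` replaced by the actual number of chunks.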
bsub -G yourGroup -o overlapFilter_o_%J -e overlapFilter_e_%J -R'select[mem>25000] rusage[mem=25000]' -M25000 sh -c ' --db DATABASE_patched.corrected_max_cov_aln60_multi_min7kb.db --fofn fofn.txt --n_core 0 --min_cov 10 --max_cov 120 --bestn 10 --max_diff 90 > filtered_overlaps_DATABASE_patched.corrected_max_cov_aln60_multi_min7kb.ovlp'
for minl in 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000; do bsub -G yourGroup -o graphBuild_o_%J -e graphBuild_e_%J -R'select[mem>16000] rusage[mem=16000]' -M16000 sh -c ' --min_len $1 --params_fn minl_${1} filtered_overlaps_DATABASE_patched.corrected_max_cov_aln60_multi_min7kb.ovlp; --run_name minl_${1} DATABASE_patched.corrected_max_cov_aln60_multi_min7kb_renumbered_onlyReadID.fasta' assemle $minl; done
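The explicit list of minimum lengths in the loop above is an evenly spaced range, so it can also be generated with `seq`, which makes it easier to change the start, step, or end. A sketch, with the actual job submission replaced by an `echo` (the function name `min_len_values` is hypothetical):

```shell
# Generate the same minimum-length values as the explicit list above:
# 6000 to 17000 in steps of 1000.
min_len_values() {
    seq 6000 1000 17000
}

for minl in $(min_len_values); do
    # Replace this echo with the bsub command from the loop above.
    echo "would submit graph-building job with min_len=$minl"
done
```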