Refactor code for finding ORFs and improve variable naming
camilogarciabotero committed Jul 8, 2024
1 parent db91d60 commit 6aeddf5
Showing 1 changed file with 11 additions and 5 deletions.
16 changes: 11 additions & 5 deletions docs/src/features.md
@@ -103,9 +103,17 @@ As mentioned above the `lors` calculates the log odds ratio of the ORF sequence
Now we can even analyse how the ORFs' scores are distributed as a function of their lengths, compared to random sequences.

```julia
-lambda = fasta2bioseq("test/data/NC_001416.1.fasta")[1]
+using FASTX, CairoMakie

-lambaorfs = findorfs(lambda, finder=NaiveFinder, minlen=100, scheme=lors)
+lambdafile = "test/data/NC_001416.1.fasta"
+
+# read the first record of the lambda genome as a `BioSequence`;
+# the value of the `do` block is assigned so `lambdaseq` is visible afterwards
+lambdaseq = open(FASTA.Reader, lambdafile) do reader
+    FASTX.sequence(LongDNA{4}, first(reader))
+end
+
+# find the ORFs in the lambda genome
+lambaorfs = findorfs(lambdaseq, finder=NaiveFinder, minlen=100, scheme=lors)

lambdascores = score.(lambaorfs)
lambdalengths = length.(lambaorfs)
@@ -121,8 +129,6 @@ randlengths = length.(vseqs)
randscores = lors.(vseqs)

## plot the scores as a function of the lengths
-using CairoMakie
-
f = Figure()
ax = Axis(f[1, 1], xlabel="Length", ylabel="Log-odds ratio (Bits)")

@@ -150,4 +156,4 @@ f

![](assets/lors-lambda.png)

-What this plot shows is that the ORFs in the lambda genome have higher scores than random sequences of the same length. The score measures how likely a sequence is under the coding model compared to the non-coding model. In other words, the higher the score, the more likely the sequence is coding. So, the plot shows that the ORFs in the lambda genome are more likely to be coding regions than random sequences. It also shows that the longer the ORF, the higher the score, which is expected since longer ORFs are more likely to be coding regions than shorter ones.
+What this plot shows is that the ORFs in the lambda genome have higher scores than random sequences of the same length. The score measures how likely a sequence is under the coding model compared to the non-coding model. In other words, the higher the score, the more likely the sequence is coding. So, the plot shows that the ORFs in the lambda genome are more likely to be coding regions than random sequences. It also shows that the longer the ORF, the higher the score, which is expected since longer ORFs are more likely to be coding regions than shorter ones.
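
The idea of a log-odds score described in the paragraph above can be illustrated with a small, self-contained sketch. This is not the package's `lors` implementation: `toy_logodds`, `CODING` and `NONCODING` are hypothetical names, and the two models are assumed to be simple independent-nucleotide models with made-up probabilities rather than the models the package actually uses.

```julia
# Toy log-odds score in bits: log2 P(seq | coding) - log2 P(seq | non-coding).
# Both models are hypothetical zero-order models with invented probabilities;
# they only illustrate the shape of the calculation.
const CODING    = Dict('A' => 0.20, 'C' => 0.30, 'G' => 0.30, 'T' => 0.20)
const NONCODING = Dict('A' => 0.25, 'C' => 0.25, 'G' => 0.25, 'T' => 0.25)

# Sum the per-nucleotide log2 likelihood ratios over the whole sequence.
toy_logodds(seq::AbstractString; coding=CODING, noncoding=NONCODING) =
    sum(log2(coding[nt] / noncoding[nt]) for nt in seq)

gc_rich = toy_logodds("GCGC")      # positive: favoured by the toy coding model
at_rich = toy_logodds("ATAT")      # negative: favoured by the toy non-coding model
longer  = toy_logodds("GCGCGCGC")  # twice the score of "GCGC"
```

Because the score is a sum over positions, a sequence that consistently matches the coding model accumulates a larger score as it gets longer, which is one way to read the upward trend with ORF length in the plot.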
