- Fixed issue with plotly rendering in R-shiny context, it
was throwing an error
"TypeError: postRenderHandlers is undefined"
. Upon deeper inspection, theuiOutput()
element being used to display both ggplot and ggplotly output, was somehow not providing the plotly javascript dependencies to include relevantpostRenderHandlers
. The workaround was to add an emptyplotlyOutput("plotly_blank")
to the UI, which appears to cause the required javascript dependencies to be loaded. Stackoverflow threads and Github issues suggest there exists a method to load dependencies when embedding htmlwidgets inside tagList context -- for example when embedding a custom htmlwidget inside a table cell. Perhaps the proper remedy is related.
- Fixed rare error in
annotateGRfromGR()
with numeric columns, which failed to initialize a new GRanges column with numeric(0) instead of using a numeric form of NA, which was accomplished withc(0, NA)[2]
.
- Changed
plotSashimi()
argument defaultylab="read depth"
, previously it was"score"
. In future, normalized data might useylab="normalized read depth"
but that would be a custom option for the analyst. plotSashimi()
arguments"xlabel"
and"xlabel_ref"
control the x-axis label, which by defaultxlabel_ref=TRUE
will use the reference (chromosome) as the x-axis label, which makes sense because the axis shows chromosome coordinates. The"xlabel"
argument is intended to allow a fully custom label.
- Fixed bug in
compressPolyM()
edge case, polygons with only two x values and y values all zero; threw an errorError in x[idx1, ] : only 0's may be mixed with negative subscripts
. New rule requires at least 5 x-axis values, otherwise compression doesn't seem necessary anyway.
- New argument
compress_introns
added toexoncov2polygon()
andprepareSashimi()
, which determines whether to callcompressPolygonM()
when creating polygon coordiantes for coverage data. Settingcompress_introns=FALSE
saves about 20% of the time it takes to prepare sashimi data. - Changed default
compressPolyM()
argument fromminRatio=3
tominRatio=5
, the threshold of compression above which polygon coordinates are adjusted. - Changed
sashimiAppServer()
to create the memoise version ofprepareSashimi()
only once, instead of re-creating it each time. Probably no noticeable effect, but it feels better.
- New README.Rmd file with a visual example and description
of a sashimi plot. New small data object
"demodata"
which contains the minimum required to produce the Gria1 plot. - Added splicejam hexsticker with an embedded sashimi plot. An R package isn't an R package without a hexsticker.
- Slight enhancement to
prepareSashimi()
to change the"name"
column to a factor, in order to force the drawing order of junction arcs, so junctions are drawn in order of most dominant, to least dominant, then shorter spans to longer spans. Smaller score, wider junctions are drawn last, since they are typically the most difficult to see otherwise. "Dominant"" refers to thejunction_rank
calculated by having the highest score among junctions that start or end at the same coordinate:junction_rank=3
means the junction has the highest score on both sides, and has the darkest color;junction_rank=2
has the highest score on one side but not both sides, and is lighter in color; andjunction_rank=1
does not have the highest score on either side of the junction, and has the lightest color. Thejunction_rank=1
entries are now drawn last, since they typically have the lowest scores, and are small and thus easily obscured.
splicejam-sashimi-server()
was updated to handle changes in the gene search field properly, avoiding edge cases with un-detected genes that have no sashimi data.prepareSashimi()
was updated to handle absence of splice junction data, in a more robust way.spliceGR2junctionDF()
was updated to handle missing junction data, ultimately returning NULL which is easier handle consistently by other downstream functions. Also made small update to handlesample_id
as a factor when input data is supplied only as a GRanges object.
bgaPlotly3d()
new argumentsampleGroups
allows data to be re-grouped to define custom group centroids. Main driver is when running BGA on un-grouped data, for example where each replicate is its own group. UsingsampleGroups
allows the proper sample group to be highlighted on the plot. This scenario is mainly only advised when needing to run standard PCA or COA, or when there are fewer than four groups, since BGA producesn-1
dimensions. Code was also slightly update to prepare for broader use outside of BGA.bgaPlotly3d()
was updated to handle the"textposition"
argument to plotly labels, allowing individual adjustment of each centroid and sample label.bgaPlotly3d()
new argumentgeneLabels
allows replacing gene identifier with a custom label, for example replacing assay ID with gene symbol.gene2gg()
updated to maintain a minimum y-axis range whenlabelExons=FALSE
. Previously the exons could be cropped.
bgaPlotly3d()
bug fixed which prevented display of scaled coordinates from thebgaInfo
object, mainly because the scaled coordinates are not supplied for individual samples. Instead the scaling factor (adjustment used to convert centroid coordinates from raw to scaled) is applied to sample coordinates to produce equivalent scaled coordinates. That said, I recommend using un-scaled coordinates, which keeps the x, y, and z axis scales proportional to their relative contribution, which visually reinforces the relative strength of separation in each dimension.
- Fixed regression in
stat_diagonal_wide_arc()
which callsggforce::StatBezier
. Version 3.1.0 of ggforce changed the required call to that function, breaking the dependency and causing junction arcs to be invisible as a result. The new interface mimics howggforce::stat_diagonal_wide()
callsggforce::StatBezier
. I hope the interface does not change in future. - Added package version dependency for
ggforce
version 0.3.1, which is the point where the API changed withggforce::StatBezier
. - Fixed vignettes which assumed optional R packages were installed,
for example
org.Mm.eg.db
. Now if the package is not available, it skips the corresponding sections.
- Updated the
"Guides"
tab of the R-shiny Sashimi app. It now also prints the R version, and the package versions of jam packages. - Updated
DESCRIPTION
to include higher specific version numbers for several packages, to force an update if needed. - Updated vignette to remove hard dependency on the
tximportData
package, making it optional. This change prevents the 250 MB package from being required only for the example vignette. In future, a random generated expression data matrix may be created.
- Fixed small bug in
getGRcoverageFromBw()
that used default memoise path"memoise_coverage"
instead of"coverage_memoise"
as with other functions. Only affected directly callinggetGRcoverageFromBw()
since other functions passed the directory as an argument, and thus probably only me.
- The default plotly does not enable highlighting. It is currently too slow and laggy, subject of future optimization.
- Font sizing is reworked, to start with 14 point font and
add/subtract from that font size. It converts to ggplot2
convention of
"mm"
units, since it appears to ignore grid"pt"
units. The bast plot font size can be adjusted, affecting everything including exon labels. The exon labels can separately be adjusted relative to the base font size. - The coordinate label on the R-shiny app includes the gene, to make it clear when a new gene search results in a new coordinate range.
- The
scale_factor
values are applied outside the memoise cache strategy, which should allow modifying thescale_factor
values without invalidating the cache. It seems correct to cache the unadjusted data, then separately apply any adjustment. - The
"farrisdata"
package was added to the"Guides"
tab which lists the versions of relevant packages. - The plot config side menu icon was changed from
"wrench"
to"gear"
, thanks to @DivadNojnarg of the"RinteRface/shinydashboardPlus"
package on Github! - Updated R package dependencies for minimum versions for
"shinydashboardPlus"
and"farrisdata"
.
- In near future,
sashimiAppConstants()
may become the standard way to prepare the dependencies for sashimi plots, including the gene-transcript-exon models, the filesDFdata.frame
, the flattened exons, and color substitutions. This function will be able to take a GTF file, and prepare all the tx2geneDF cross-references, the exon and CDS exons, the flattened exons, usingdetectedTx
if available for enhanced exon structures. - Package dependency on memoise, using Github version
1.1.0.9000
which allows invalidating a single problem cache file. It has been available since October 2018 but not released to CRAN yet. This changes is a step toward recovering from network outages, which currently create empty cache files which cannot be recovered without emptying the entire cache.
getGRcoverageFromBw()
now checks if memoise cached data isNULL
, which happens during network outage. If the data returned from the cache isNULL
it clears the cache for that key, then tries again.import_juncs_from_bed()
now checks memoise cached data forNULL
as described above.sashimiAppConstants()
was refactored, to define variables in a systematic way, by default updating global environment for convenience, but optionally creating a new environment inside which the required variables are populated. In future this technique may become the default here along with downstream functions, which may reference the environment rather than data objects themselves.
- Updated
groups2contrasts()
to detect when any factor level contains"-"
, and converts the"-"
to"."
prior to detecting contrasts. Previously the"-"
interfered with proper contrast names, and resulted in some contrasts not being returned by this function. Note that this function avoids usingbase::make.names()
because that function aggressively converts other valid characters to"."
. However, if downstream functions requirebase::make.names()
compliant names, run that function prior to callinggroups2contrasts()
. sortSamples()
was updated to match"wildtype"
as a control term.groups2contrasts()
was updated to check for specific scenarios where iFactors and iSamples may be missing, to cover a variety of common scenarios. Error messages were updated to be specific to the expected input.flattenExonsBy()
removed the...
dots argument, to try to appease the memoise caching logic, which is suspected to be comparing the...
data when invalidating the cached files. Ironically, this change itself will invalidate existing cached files.internal_junc_score()
was updated to handle data with nosampleColname
column, which fixed an issue with the example code.stackJunctions()
was updated to handle multiple sample_id in the same operation, keeping each sample_id independent during stacking. This update should fix the edge cases where junctions appeared to be improperly stacked.to_basic.GeomShape()
is now exported, since it was causing problems during the call toplotly::ggplotly()
when convertingggforce::geom_shape()
to plotly format. Somehow it worked for the Sashimi coverageggforce::geom_shape()
but not for gene exonggforce::geom_shape()
. I am not always here to understand fully, but to make things work.
prepareSashimi()
now returns list item"df"
which contains the merged coordinates for coverage, junctions, and labels. Thisdata.frame
overrides the need forplotSashimi()
to merge this data on the fly, and takes some extra logic away regarding junction rank, label positions, etc. It also returns"ref2c"
as an attribute to the"df"
, to make sure it is available even when requesting only the"df"
format. The"ref2c"
contains the functions used to compress the introns on the x-axis scale.plotSashimi()
now expects the inputsashimi
object to contain list element named"df"
to include the full coordinatedata.frame
used for plotting. Note thatplotSashimi()
will now require output fromprepareSashimi()
in version 46 or higher. Fortunately, memoise already invalidates the cache for even the slightest change in any molecule in the world, so caching should not be problematic.
- Made Sashimi plot the default tab. Fixed regression caused by page load with partially initialized input values.
- Added
aboutExtra
as optional text to describe the data used in the R-shiny app. - Added R package versions to the Guides tab.
- Added option to show legend in plotly interactive plots, which allows for some interesting filtering options, like hiding coverage, or junctions, or subsets of junctions based on predominance (.1 is minor, .2 is mixed, and .3 is major predominance.) Junction ranks are based upon having the highest score at each junction end, typically the predominant junction represents the predominant transcript isoform.
plotSashimi()
new argumentjunc_alpha
to control the alpha transparency of junctions, to allow transparency in cases where the intron coverage may be obscured by the junction arc.- Increased default minimum junction arc height from 100 to 200, the effect should be slightly higher junction arcs. In future, will consider inspecting intervening coverage max height as perhaps better estimate of a minimum junction arc, plus some buffer height.
- Fixed regression in R-shiny, updating the progress bar using gene in the caption, but the function did not need to know the gene.
- Improved overall handling of reactive gene, sample_id values.
plotSashimi()
andgene2gg()
have argumentlabel_coords
for optional x-axis range to subset labels before displaying withggrepel::geom_text_repel()
. This change solves the issue where zooming the x-axis range withcoord_cartesian()
kept all labels which were then arranged at the edges of the plot. This change also necessitates re-creating the ggplot object, since it has to change the label coordinate data.- Fixed some legacy code that assigned
gene
andsample_id
to the global environment. - Removed the
"Samples and Data"
tab from R-shiny for now. - Improved level of detail in shiny progress bars, now shows each file being loaded. Not sure it is actually better than having less detail.
- The order of
sample_id
items in R-shiny"Sample Selection"
is now maintained, allowing custom sorting of samples. - Fixed error when junctions had no intervening junctions to use when determining the minimum height for a junction arc.
- Fixed errors when part of the Sashimi plot data did not exist, for example no junction data or no coverage data.
prepareSashimiData()
argumentuse_memoise
enables memoise caching of individual bigWig files per gene, and individual junction files per gene. The file paths are defined bymemoise_coverage_path
andmemoise_junction_path
. These options should greatly enhance the success rate of caching, at the expense of creating more cache files.getGRcoverageFromBw()
now has argumentsuse_memoise
andmemoise_coverage_path
to cache at the level of each bigWig file and genomic range.- R-shiny by default caches sashimi data, so repeated calls for the
same
gene
and same set ofsample_id
will retrieve the full cached sashimi data. However, if any smaller argument changes, including changes to any bigWig coverage or junction BED file, each individual step is also cached to prevent retrieving the same coverage or junction data repeatedly. For practical R-shiny usage, wheresample_id
is frequently changed, this update should be a substantial improvement. - Changed default R-shiny to non-interactive. One day will switch it back, but need plotly ninja skills meanwhile.
- Changed
sashimiAppConstants()
to improve handling of memoise flatExonsByTx data. - Updated the Guides tab with a description of Sashimi plots, and a how-to for creating a Sashimi plot.
- Added
shinyjs
package dependency.
- Fixed regression in junction scale factors, not consistently applied
in
prepareSashimi()
.
- Dependency added for
shinyjqui
to useorderInput()
for drag-and-drop selection and ordering ofsample_id
values. - Dependency added for
shinycssloaders
.
- New tab "Sample Selection" which focuses on selecting and ordering
the unique sample_id entries found in
filesDF
. - "Samples and Data" tab allows editing columns, and will
include updated
"scale_factor"
values in normalizing coverage and junction scores. - "Samples and Data" tab uses
color_sub
to colorize the table, and will create colors for any undefined values in the"sample_id"
column. Other columns are colorized when all values matchnames(color_sub)
otherwise are not colorized. Columns are arranged so any extra columns (such as group, subgroup, batch, etc.) would appear besidesample_id
, and typically would be colorized also usingcolor_sub
. - Added spinning loader indicator to the plot panel, which covers the time after data is prepared, but before ggplot has created the actual visualization.
- The "Sample Selection" tab now allows setting the number of plot panel columns. However, it breaks the synchrony with the gene-exon model, since the gene panel is still full-width. Already thinking of alternatives.
- Plot settings like panel height, font relative sizing, and exon label size (for non-interactive) are available.
- Added panel height to R-shiny UI, controlling the allocated height for each facet panel.
- Numerous updates to the plotly highlighting methods.
- Reorganized the R-shiny UI, still in progress.
- Added R-shiny tab "Sample and Data" intended to customize and select sample_id entries to display or hide in the Sashimi plots. Purely aesthetic and non-functional, still testing out the many javascript table options available.
- Fixed regression in use of
color_sub
to pre-define categorical colors in sashimi plots. - Updated plotly highlighting, increasing the success rate in most test cases.
grl2df()
now returns seqnames, and adjusts yBaseline within each seqname (chromosome). To plot multi-chromosome GRangesList, use+facet_wrap(~seqnames)
- R-shiny enabled some plotly highlighting features, currently in testing phases.
- Major refactor of
plotSashimi()
to merge together the junction, coverage, and label coordinate data.frames to enable plotly and crosstalk to highlight features. The end result looks amazing.
- R-shiny new options: minimum junctions; show by strand.
- R-shiny now defines each to the global environment if they do not already exists: exonsByTx, cdsByTx, flatExonsByGene, flatExonsByTx, tx2geneDF, detectedTx, detectedGenes. This mechanism is used for now, both to help define custom input data, but also to help define the data values needed to produce plots manually outside R-shiny.
stackJunctions()
now also returns "junction_rank" scored from 1 to 3, where 1=minor, 2=partial, 3=major, based upon the rank of junctions entering and leaving exons.- Junctions are drawn in order from dominant to minor,
which has the effect of ensuring smaller junctions are
displayed. The
junc_fill
is converted to gradient so a darker color is used for dominant junctions, to try to highlight the major isoform splice junctions per sample. Unfortunately, plotly does not honor the polygon render order (yet).
- Added
flatExonsByTx
to R-shiny app, to be able to display transcript isoform exon structures alongside the flattened gene-exon model. - Added R-shiny option to display transcript isoforms alongside the gene-exon model. Also option to show all or detected transcripts.
- Splice junctions now require at least one junction end within the range being displayed, to avoid junctions that do not connect with the visible gene model.
- The x-axis range of multiple plots are more consistently controlled, to avoid one panel autoscaling to display a wider junction than other panels.
- Plotly views contain custom tooltip text, using the sample_id, the name of the exon or feature, and the track (referring to the name of the coverage).
- New ggplot2 geom
geom_diagonal_wide_arc()
, extendsggforce::geom_diagonal_wide()
by connecting the left and right halves of a diagonal into one continuous ribbon arc.
plotSashimi()
now usesgeom_diagonal_wide_arc()
to display junction arcs instead of using twoggforce::geom_diagonal_wide()
. For now,prepareSashimi()
still provides coordinates to create two polygons, which are combined in the geom. Still todo: figure out how to add a label; and make the middle x position immune to x-axis rescaling.
- R-shiny app now properly keeps interactive plot x-axis ranges in sync, when zooming the plot.
- Fixed bug in
prepareSashimi()
that occurred when no junctions overlapped another, resulting in error "invalid 'type' (S4) of argument".
defineDetectedTx()
has a new argumentzeroAsNA
to handle the special case where some expression values reported as zero should be treated asNA
(no data obtained) and therefore will not be included in group mean calculations. As transcript-exon models get more "comprehensive", kmer quantitation tools such as Salmon and Kallisto sometimes need to assign abundance to one of several nearly identical isoforms, and in low count scenarios all counts may be assigned to one or another isoform, leaving zero in the alternate position. Excluding zero ensures that group mean values represent only the assigned quantitation.
- Packages now imported for
launchSashimiApp()
:shiny
,shinydashboard
,shinydashboardPlus
,shinyWidgets
. getGRcoverageFromBw()
now returns NULL when a BigWig file is not accessible. Previously any error caused"Error in seqinfo(con) : UCSC library operation failed"
which could mean the file does not exist, or any number of other validation checks failed. SincegetGRcoverageFromBw()
used a vector of BigWig files, any error caused the entire set to fail.getFirstStrandedFromGRL()
added package prefix to the use ofIRanges::heads()
which was not imported directly.
- R-shiny plots now set the plot height for ggplot2 or plotly, depending upon the number of samples, and presence of gene-exon model. Future versions will allow R-shiny user customization.
bgaPlotly3d()
removed some unnecessary print functions.
- This version is an incremental update, while we evaluate some display options of Sashimi plots in the R-shiny app.
- Minor aesthetic changes to R-shiny app, including plotly interactive options, and evaluating some UI elements in sidebar.
- Added option to display gene-exon model in R-shiny.
makeTx2geneFromGtf()
help docs recommend installing theR.utils
package when.gz
files are required for import, since this process usesdata.table::fread()
.- Package dependencies were added for shiny, shinydashboard.
to_basic.GeomShape()
is a hidden function intended only to provide basic support ofggforce::geom_shape()
when converting ggplot objects to plotly.
- R-shiny Sashimi app now allows zooming into coordinate ranges, defined per gene.
- First working Sashimi R-shiny app, still has issues with
certain genes that needs debugging. Defaults to using
Farris et al data from
farrisdata
package, but can be overridden with global environment variables.
- Initial R-shiny
launchSashimiApp()
function. - Suggests the
farrisdata
package for example data used for the Farris et all RNA-seq mouse hippocampus manuscript.
- Fixed examples that did not explicitly call library() for required libraries.
- Fixed error in
codonUsage2df()
example data file path.
- Added specific TODO items.
- Added two function categories:
"jam ALE-specific RNA-seq functions"
, and"jam codon-usage RNA-seq functions"
to help organize the large list of accessory functions.
- Minor fixes to examples.
- Added data
test_exon_gr
,test_junc_gr
,test_cov_gr
for easy example data for exons, junctions, and coverage, respectively. - Added "wide" variants of the above test data, with introns about 100x larger than exons, consistent with mammalian gene structures.
- Added examples to each data, showing how one would use the raw data to generate different visualizations used in Sashimi plots.
- Added sample data to the vignettes and some examples.
- Renamed
flattenExonsByGene()
toflattenExonsBy()
to reflect that the function handlesby="gene"
andby="tx"
.
- Added
create-a-sashimi-plot.Rmd
which walks through a full example showing how to create a Sashimi plot. - Added examples to
stackJunctions()
with schematics, including example on plotting junctions by themselves.
- Removed
compressGRgaps()
for now, since the methods now try to keep GRanges intact, and instead transform the x-axis scale to compress visible gaps.
bgaPlotly3d()
now properly handles ellipsoid colors, previously the colors were assigned but not honored byplotly::add_trace()
.combineGRcoverage()
determines strandedness by requiring all values to be at or below zero, with at least one negative value. Otherwise, data can have position and negative values and will be considered positive stranded.plotSashimi()
replacesjamSashimi()
because it is just more intuitive... Ah well.- Fixed small typo in plotSashimi.Rd help that included an unmatched quote.
jamSashimi()
is used to plot Sashimi data prepared byprepareSashimi()
in order to separate the download and preparation of Sashimi data from the visualization.
prepareSashimi()
is refactored to remove the plot functions, moving them to the newjamSashimi()
.flattenExonsByGene()
can now handle Tx data, mainly useful to add CDS regions to existing exon models.gene2gg()
is more robust to edge input cases.
prepareSashimi()
is a workhorse function that prepares several types of Sashimi-like data output, including ggplot2-based Sashimi plots. This plot uses vanilla ggplot2 with custom x-axis scaling to compress genomic coordinates, while keeping data in proper genomic coordinate space.gene2gg()
is a lightweight function that creates gene and transcript exons models, and returns a ggplot2 object for plotting. It can optionally return the data.frame used by ggplot2.grl2df()
is similar tofortify()
from ggplot2, or thebroom::tidy()
functions, that take a custom object and return a data.frame for downstream use. Here the function currently works with "rectangle" data (like exons, introns, peaks, etc.), and "junction" (like junction regions to be represented byggforce::geom_diagonal_wide()
). In future it may also handle sequence coverage polygons.exoncov2polygon()
converts a GRanges object containing coverage in the form of NumericList for each exon or intron, into a data.frame suitable for use bygeom_polygon()
orggforce::geom_shape()
.getGRcoverageFromBw()
loads bigWig coverage data for a GRanges of exons and introns, returning a GRanges object with columns representing each sample_id. It callscombineGRcoverage()
to combine coverages by strand within eachsample_id
.flattenExonsByGene()
is intended to provide a set of unique exons per gene, using all or only detected transcript exon models. It numbers exons by contiguous segment, then labels subsections of each exon with a letter suffix, for example "exon1", "exon2a", "exon2b", "exon3". Lastly, it can annotate exons as CDS or non-CDS, if given a set ofcdsByTx
data.make_ref2compressed()
takes a GRanges object of exons (or any useful feature like ChIP-seq peaks, etc) and returns functions needed to compress x-axis coordinates, in order to shrink gaps/introns to a fixed width, suitable for use by ggplot2.spliceGR2junctionDF()
takes junctions GRanges data, and flatExonsByGene, and summarizes and annotates each junction by the gene_exon connected, and combines scores when multiple junctions are within "spliceBuffer" distance of an exon boundary, by default 3 bases. It can accept a GRanges object that was derived from multiple sample_id, and will return data summarized within each sample_id. It callsstackJunctions()
to combine junctions per replicate.simplifyXY()
takes a vector of coordinates, and simplifies them to the minimal set of points to represent the polygon. For sequence coverage data, that process can reduce individual points by 90%, but it works with any coordinate data. When two or more consecutive line segments have identical angle (with non-zero distance), they are combined using only the first and last points.getGRgaps()
,getGRLgaps()
,addGRgaps()
,addGRLgaps()
are functions that take GRanges input, and return either just the gaps between non-overlapping regions, or the original features with gaps inserted between them. Useful to convert a set of exons, to a set of exons and introns.
- Added package dependencies to ggplot2, ggforce, ggrepel
- Need a way of annotating GRangesList gaps by the parent feature name.
- In future, allow samples to have replicates, optionally allow each
replicate to be independently plotted, while being part of the
parent
sample_id
. - Allow alternative input for
prepareSashimi()
, namely BAM alignment files, from which coverage and junctions can be derived. - Consider making
ref2c
chromosome-wide, however it would require x-axis coordinates to know their context, in terms of chromosome/seqname. - Multiple vignettes to demonstrate workflows: Sashimi plots; preparing
exon GRangesList; using
annotateGRLfromGRL()
to add annotations, etc.
curateVtoDF()
,curateDFtoDF()
are data curation functions to curate a vector, or a data.frame, into a data.frame with consistent, usable nomenclature for downstream analysis. They use a flexible yaml format that should help automate analysis pipelines that start with raw data file import.exoncov2polygon()
andcompressPolygonM()
are basic functions for sashimi plots, efficiently converting exon/intron coverages to multi-polygons, then optionally compressing introns and reducing the information content to roughly similar resolution as uncompressed regions.spliceGR2junctionDF()
is the summary function to convert a set of splice junction read counts to gene-annotated junctions, grouped and summed where needed.closestExonToJunctions()
called byspliceGR2junctionDF()
is used to annotate junction ends near compatible exon boundaries, with some buffer distance allowed to "snap" junctions to the edge.
- Fixed issue where
numTxs
was not getting populated inrunDiffSplice()
, otherwise the stats summary is not changed.
- Added "openxlsx" to Suggests, for exporting Rmarkdown stats tables to Excel format.
runDiffSplice()
includes examples usinggroups2contrasts()
, also allowstxColname,geneColname
to be defined and therefore custom.- Changed all
verbose=TRUE
toverbose=FALSE
by default. - Changed
makeTx2geneFromGtf()
to usedata.table::fread()
ability to uncompress .gz files, hopefully making it cross-platform. groups2contrasts()
was updated to handle single-factor experiments, and be more robust to two-factor experiments with missing groups in the full design table.
- Added basic RNA-seq workflow to vignettes.
runDiffSplice()
is a wrapped aroundlimma::diffSplice()
intended to capture several steps of pre- and post-processing, optionally applyinglimma::voom()
.groups2contrasts()
takes a vector or data.frame of sample groups, and determines the relevant pairwise and two-way contrasts, returning a design matrix and contrast matrix.sortSamples()
sorts biological samples so that known patterns of control group terms are returned before non-control groups, in order to help provide useful defaults when setting up sample group contrasts.strsplitOrdered()
providesbase::strsplit()
but returns factor output whose factor levels are influenced either by factor input, or by other arguments.
limma
package was added to Imports.runDiffSplice()
has an argumentspliceTest
which definestest
when callinglimma::topSplice()
. Only the "t" (t-test) returns fold change, therefore hits are not filtered by fold change for "F" or "simes".
ale2violin()
takes output fromtx2ale()
with some arguments, and produces a ggplot2 violin plot object, as well as the underlying data. It allows a custom filtering function, which allows filtering gene lists for relevant regions of expression.
shrinkMatrix()
minor fix to remove the shadowx
in the resulting colnames.tx2ale()
modified to be tolerant of NA values in the expression matrix.
sortGRL()
sorts GRangesList objects by chromosome and position.getFirstStrandedFromGRL()
return the first GRanges feature per GRangesList, ordered properly by strand.annotateGRfromGR()
which annotates one GRanges object based upon overlaps with another GRanges object.annotateGRLfromGRL()
which annotates one GRangesList object based upon overlaps only with the same index entries in a second GRangesList object.findOverlapsGRL()
runsGenomicRanges::findOverlaps()
for the case of two GRangesList objects, matching at the GRanges level but restricting overlaps to those matching the same GRangesList index.assignGRLexonNames()
assigns exon names and numbers to a disjoint set of exons per gene model.
- Added "S4Vectors" to package imports, since some functions like
lengths()
have been moved there. Another reason using a package prefix is clunky at best. If methods have dispatch, and if they did not conflict between S3 and S4 methods, that would be the better strategy. It is not a solution to hardcode package names into function bodies, since every package maintainer has to stay updated on the source package of all other dependent functions. Those details should be irrelevant to other package maintainers.
- The "arules" package was moved to "Imports" since it is required
for the
list2im()
function. A slower workaround could be written, but ultimately the arules package is preferred. - Configured the package to use pkgdown for function references.
- Added "#' @imports jamba" to import all jamba R functions.
factor2label()
will convert a factor to a factor label, with the same order of levels as the input factor, but including summary stats like the number of items for each factor level. Useful for ggplot2 visualizations, to include counts in the color legend for example.
- Added reference file "Mouse_codon_usage.txt" to the "extdata"
folder, as example input both for farrisSeq.Rmd, and for other
species. Access the file path with this command:
system.file("extdata", "Mouse_codon_usage.txt", package="farrisdata")
codonUsage2df()
imports a text codon usage file, and returns a data.frame. Support functiondna2codon()
simply takes a character vector and returns vector whose elements all have three characters, e.g. all(nchar(x) == 3).- Two geometric mean functions:
jamGeomean()
is preferred by the Jam packages, butgeomean()
is provided for direct comparison to the classical approach. detectedTxInfo()
summarizes the data used to define detected transcripts for a given gene, or for a given set of transcripts.
- Minor fix to
makeTx2geneFromGtf()
to remove requirement forcol.names
during import. - Several minor documentation updates.
- Updated the DESCRIPTION file to include proper "Remotes" entries and depencies for jamba, colorjam, and jamma packages.
- Fixed issue where the
data.table
package required a specific#' @import data.table
entry in the roxygen2 entry for theshrinkMatrix
function. This issue preventeddefineDetectedTx()
from working within an R package.
defineDetectedTx()
is a core RNA-seq function to determine which transcript isoforms are considered "detected" based upon counts (or pseudocounts), TPM values if available, and the relative abundance of isoforms per gene. When "everything with over 10 counts" is not sufficient.tx2ale()
takes a set of transcript exon models, and returns a set of alternate 3'UTR regions, similar to ALE (alternative last exon.) It differs slightly from other methods in that it aggregates transcript quantities by shared ALE regions, producing a matrix based upon unique ALEs. It does not use counts in the ALE region itself, but the aggregate abundance of all isoforms that contain each ALE. This method allows tools like Salmon or Kallisto to quantify isoforms using the best available kmers per isoform, without being restricted to the ALE regions which have a very wide distribution of lengths.makeTx2geneFromGtf()
takes a GTF file, and produces adata.frame
with tx (transcript) to gene associations.
list2im()
converts a list to an incidence matrix. It uses thearules
package fast methods for creatingtransactions
objects, which are sparse binary matrices with linked data.frames used to describe rows and columns. Similar toSummarizedExperiment
but with optimization specifically for list-to-matrix and matrix-to-list conversion. Their implementation is memory efficient as well.bgaPlotly3d()
takes a"bga"
class, as produced by themade4::bga()
function, and produces a 3-D plotly visualization. It adds ability to group centroids into "supergroups" which are connected by a spline 3-D curve.spline3d()
is the supporting function to draw interpolated 3-D curves through a set of coordinates.shrinkMatrix()
shrinks a numeric matrix by groups of rows, essentially a wrapper function for thedata.table
package functions.