DGEobj is an S3 data class that provides a flexible container for differential gene expression (DGE) analysis results.
The DGEobj is designed to be extensible, allowing new data types to be defined as needed.
A set of accessory functions is provided to deposit, query and retrieve subsets of a data workflow. Attributes capture metadata such as species and gene model, along with reproducibility information, so that a third party can inspect a DGEobj's history to see how each data object was created.
Operationally, the DGEobj is styled after the RangedSummarizedExperiment (RSE). The DGEobj has data slots for row (gene), col (sample), assay (anything with n rows by m samples dimensions) and meta data (anything that can't be keyed to row, col or assay). The key motivation for creating the DGEobj data structure is that the RSE allows only one data item each in the row and col slots, which makes it unsuitable for capturing the many data objects created during a typical DGE workflow. The DGEobj, by contrast, can hold any number of row and col data objects, so it can capture multiple steps of a downstream analysis. For example, fit objects and topTable results are stored as row-type objects.
Certain object types, primarily the count matrix and its associated row and column info, are defined as unique, which means only one instance of that type may be added to a DGEobj.
When multiple objects of one type are included in a DGEobj (e.g. two different fits), the concept of parent attributes is used to associate downstream data objects (e.g. contrasts) with the appropriate data object they are derived from.
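The parent idea can be illustrated with plain R attributes. The following is a minimal sketch, not the package's implementation: the attribute name `parent`, the helpers `tag_parent` and `children_of`, and the toy fit and contrast objects are all assumptions made for illustration.

```r
# Hypothetical stand-ins for two fits stored in the same container.
fitA <- list(model = "~ treatment")
fitB <- list(model = "~ treatment + batch")

# Tag an item with the name of the item it was derived from.
# The attribute name "parent" is an assumption for this sketch.
tag_parent <- function(item, parent) {
  attr(item, "parent") <- parent
  item
}

items <- list(
  fitA = fitA,
  fitB = fitB,
  contrastA = tag_parent(data.frame(logFC = c(1.2, -0.4)), "fitA"),
  contrastB = tag_parent(data.frame(logFC = c(0.9, -0.1)), "fitB")
)

# Return the names of all items whose "parent" attribute matches.
children_of <- function(items, parent) {
  is_child <- vapply(
    items,
    function(x) identical(attr(x, "parent"), parent),
    logical(1)
  )
  names(items)[is_child]
}

children_of(items, "fitB")
```

Because each downstream item names its parent, two fits of the same type can coexist without ambiguity about which contrasts belong to which fit.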
A DGEobj is fundamentally a list of data objects. Each data object deposited in a DGEobj is accompanied by attributes, including a Type, a Basetype, a dateCreated and a funArgs text field. Data objects that are useful to capture as documentation of an analysis include design matrices, DGElists, fit objects and topTable output.
There are four fundamental and immutable basetypes (row, col, assay, meta). These are used under the hood to determine how each data element is subsetted.
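To make the role of the basetype concrete, here is a minimal base R sketch of basetype-driven subsetting. It is not the package's code: the attribute name `basetype`, the helpers `with_basetype` and `subset_by_basetype`, and the toy data are assumptions for illustration only.

```r
# Hypothetical helper: attach a "basetype" attribute to an item.
with_basetype <- function(item, basetype) {
  attr(item, "basetype") <- basetype
  item
}

counts <- matrix(1:12, nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

obj <- list(
  counts      = with_basetype(counts, "assay"),
  geneData    = with_basetype(
    data.frame(symbol = LETTERS[1:4], row.names = rownames(counts)), "row"),
  design      = with_basetype(
    data.frame(group = c("a", "a", "b"), row.names = colnames(counts)), "col"),
  counts_orig = with_basetype(counts, "meta")
)

# Subset every item according to its basetype:
#   row   -> keep the selected genes
#   col   -> keep the selected samples (samples are rows of a col item)
#   assay -> keep selected genes and samples
#   meta  -> leave untouched
subset_by_basetype <- function(obj, rows, cols) {
  lapply(obj, function(item) {
    bt <- attr(item, "basetype")
    out <- switch(bt,
      row   = item[rows, , drop = FALSE],
      col   = item[cols, , drop = FALSE],
      assay = item[rows, cols, drop = FALSE],
      meta  = item
    )
    attr(out, "basetype") <- bt  # matrix subsetting drops custom attributes
    out
  })
}

sub <- subset_by_basetype(obj, rows = 1:2, cols = c("s1", "s3"))
dim(sub$counts)
dim(sub$counts_orig)
```

In this sketch a single row and column selection is applied consistently to every item, while the meta item keeps its original dimensions.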
To provide flexibility, there are a number of predefined types (use showTypes() to list them), and the newType function provides extensibility by creating new data types as needed. Each type must be associated with a basetype. Except for the unique types described above, you can have multiple instances of any type as long as each instance is given a unique "itemName". A data structure defining a DGEobj's customized structure is stored as the "objDef" attribute on the DGEobj (myObjDef <- attr(dgeObj, "objDef")).
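The relationship between types and basetypes can be pictured as a lookup table. The sketch below is a base R illustration under stated assumptions, not the structure actually stored in "objDef": the registry layout, the example type names, and the function `register_type` are hypothetical.

```r
# The four basetypes named in this README.
basetypes <- c("row", "col", "assay", "meta")

# Hypothetical registry: names are types, values are their basetypes.
# The entries here are illustrative, not the package's predefined list.
type_registry <- c(counts = "assay", geneData = "row",
                   design = "col", topTable = "row")

# Add a new type, insisting that it maps to one of the four basetypes
# and that it does not redefine an existing type.
register_type <- function(registry, type, basetype) {
  if (!basetype %in% basetypes) {
    stop("basetype must be one of: ", paste(basetypes, collapse = ", "))
  }
  if (type %in% names(registry)) {
    stop("type '", type, "' is already defined")
  }
  registry[type] <- basetype
  registry
}

type_registry <- register_type(type_registry, "qcMetrics", "col")
type_registry
```

Keeping the set of basetypes fixed while letting the type list grow means new kinds of data can be added without changing how subsetting works.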
funArgs is a text field intended to hold details of how the object was created. Passing funArgs = match.call() is a convenient way to automatically capture the calling arguments of the current function when that best describes how an object was created. Alternatively, the user can supply a custom text comment for this purpose.
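As a minimal illustration of what match.call() captures, the base R sketch below records the call on the result as a text attribute. The function `normalize_counts` and its arguments are hypothetical, and storing the call with attr() is an assumption about one way to keep such a record, not a description of how the package does it.

```r
# Hypothetical analysis step that records how its result was produced.
normalize_counts <- function(counts, scale = 1e6) {
  call_record <- match.call()  # the call as the user wrote it, with argument names
  result <- sweep(counts, 2, colSums(counts), "/") * scale
  attr(result, "funArgs") <- paste(deparse(call_record), collapse = " ")
  attr(result, "dateCreated") <- Sys.time()
  result
}

m <- matrix(c(1, 3, 2, 2), nrow = 2)
cpm <- normalize_counts(m, scale = 100)
attr(cpm, "funArgs")
```

Here match.call() is evaluated inside the function whose call should be recorded, so the stored text names that function and the arguments it was given.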
- initDGEobj: Initialize a new DGEobj (requires counts, design and geneDat (or isoformDat or exonDat))
- addItem: Add a data item to a DGEobj
- rmItem: Remove a data item from a DGEobj
- newType: Define a new data type
- setAttributes: Allows setting multiple attributes (facilitates attribute templates)
- dim: Return the dimensions of the DGEobj (the assay dimensions)
- length: Returns the number of stored data items
- dimnames: Return the row (gene) and column (sample) names
- itemNames: Return a character vector of data item names
- showTypes: Show the type definitions of a DGEobj (all currently defined types)
- inventory: Print a summary of the contents of a DGEobj, including date created and optionally the funArgs history
- getItem: Return a data item by itemName
- getItems: Return data items by a list of itemnames
- getType: Return data item(s) by itemType
- getBaseType: Return data item(s) by baseType
- as.list: Unclass a DGEobj to a simple list
- convertDGEobj: Convert base data (counts, gene and sample annotation) from DGEobj to RangedSummarizedExperiment or ExpressionSet
- convertRSE: Convert base data (counts, gene and sample annotation) from RSE to DGEobj or ExpressionSet
If the geneData object (row annotation) contains chromosome position information (chr, start, end, strand), a GRanges object will also be created.
During initialization, a copy of the counts, gene annotation and sample annotation is stored in the meta slot with an "_orig" suffix on the itemName. This preserves the original data after you have subsetted the DGEobj.
It is best to run the install from a fresh R session before loading any packages because loaded packages cannot be updated.
require(devtools)
devtools::install_git("https://github.com/jrthompson54/DGEobj")