
Module : threshold filtering

Jaze8 edited this page Jun 7, 2018 · 2 revisions

Module : threshold-filtering

This module removes low-quality cells from the dataset according to user-defined thresholds.

  • Internal name : threshold-filtering

  • Available : local mode

  • Input Ports :

    • matrix : initial count matrix (tsv)
    • cells : initial cells metadata (tsv)
    • genes : genes metadata (tsv)
  • Output Ports :

    • completcellsoutput : initial cells metadata (tsv), completed with the computed quality metrics
    • matrixoutput : filtered count matrix (tsv)
    • cellsoutput : filtered cells metadata (tsv)
  • Optional parameters :

Parameter             Type     Description                                                      Default value
detection_threshold   integer  Minimal number of reads for a feature to be considered detected  10
expression_option     string   Type of features to consider: Endogenous, Nuclear or All         Endogenous
expression_threshold  integer  Minimal number of detected features to keep a cell               4000
reads_threshold       integer  Minimal number of mapped reads to keep a cell                    200000
prop_mt               float    Maximum proportion of reads mapping to mitochondrial features    0.1
prop_sp               float    Maximum proportion of reads mapping to exogenous features        0.5
nb_filters            integer  Minimal number of failed filters required to remove a cell       1
  • Configuration example :
<step id="QC" skip="false">
	<module>threshold-filtering</module>
	<parameters>
		<parameter>
			<name>expression_option</name>
			<value>Endogenous</value>
		</parameter>
		<parameter>
			<name>reads_threshold</name>
			<value>500000</value>
		</parameter>
		<parameter>
			<name>expression_threshold</name>
			<value>1500</value>
		</parameter>
	</parameters>
</step>
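A second configuration sketch, using the remaining optional parameters from the table, is given here. The values are illustrative only, not recommendations, and the step id "QC_strict" is hypothetical. The comment on nb_filters is based solely on the table's description: with a value of 2, a cell that fails only one threshold is kept.

```xml
<!-- Illustrative values only; the step id "QC_strict" is hypothetical. -->
<step id="QC_strict" skip="false">
	<module>threshold-filtering</module>
	<parameters>
		<!-- A feature counts as detected from 5 reads (default: 10) -->
		<parameter>
			<name>detection_threshold</name>
			<value>5</value>
		</parameter>
		<!-- At most 5% of a cell's reads on mitochondrial features (default: 0.1) -->
		<parameter>
			<name>prop_mt</name>
			<value>0.05</value>
		</parameter>
		<!-- At most 30% of a cell's reads on exogenous features (default: 0.5) -->
		<parameter>
			<name>prop_sp</name>
			<value>0.3</value>
		</parameter>
		<!-- Remove a cell only if it fails at least 2 filters (default: 1) -->
		<parameter>
			<name>nb_filters</name>
			<value>2</value>
		</parameter>
	</parameters>
</step>
```

Parameters that are not listed in a step keep the default values given in the table.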

Interpreting output files

Scatter plots

After filtering, the module produces two scatter plots. Each plots cells by number of reads (x-axis) against number of detected features (y-axis).

[Figure: Raw_Cellplot]

The first plot shows all cells together with the filtering thresholds. Cells shown in red are removed.

[Figure: Filtered_cellplot]

The second plot shows only the cells that remain after filtering. If the filtering worked well, these cells should look like a mixture of Gaussians, i.e. they should form compact groups, each of which can be enclosed in an ellipse.