Add basic_stats and max_pmito_s in Preprocessing #466

Ukyeon · 2023-03-22T16:11:37Z

Add basic_stats as an early step in preprocessing.
Add a new parameter, max_pmito_s, to the filter_cells_by_outliers function so that users can remove cells with too much mitochondrial gene expression.
Fix minor docstring issues.
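To illustrate what a per-cell mitochondrial filter such as max_pmito_s does, here is a minimal NumPy sketch. It is not dynamo's implementation. The helper name `mito_fraction_mask`, the "MT-" prefix used to identify mitochondrial genes, and the treatment of the threshold as a fraction between 0 and 1 are all assumptions made for illustration.

```python
import numpy as np


def mito_fraction_mask(counts, var_names, max_pmito, prefix="MT-"):
    """Hypothetical helper: return a boolean mask of cells to keep.

    counts: cells x genes count matrix (dense array).
    var_names: gene names, one per column of `counts`.
    max_pmito: assumed to be a fraction in [0, 1], not a percentage.
    """
    counts = np.asarray(counts, dtype=float)
    # Assumption: mitochondrial genes are those whose names start with "MT-" (case-insensitive).
    is_mito = np.array([name.upper().startswith(prefix.upper()) for name in var_names])
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with zero total counts get a fraction of 0 instead of NaN;
    # other filters are expected to remove such empty cells.
    frac = np.divide(mito, total, out=np.zeros_like(total), where=total > 0)
    return frac <= max_pmito


# Toy data: 3 cells x 3 genes, where the last gene is mitochondrial.
counts = np.array([[10, 5, 1], [2, 2, 6], [0, 0, 0]])
genes = ["Actb", "Gapdh", "mt-Co1"]
keep = mito_fraction_mask(counts, genes, max_pmito=0.2)
print(keep)  # [ True False  True]
```

Indexing the matrix with the mask (`counts[keep]`) then drops the second cell, whose mitochondrial fraction is 0.6.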

Xiaojieqiu · 2023-04-01T16:13:13Z

dynamo/preprocessing/Preprocessor.py

-        if self.collapse_species_adata:
-            main_info("applying collapse species adata...")
-            self.collapse_species_adata(adata)
-
-        if self.convert_gene_name:
-            main_info("applying convert_gene_name function...")
-            self.convert_gene_name(adata)
-            main_info("making adata observation index unique after gene name conversion...")
-            self.unique_var_obs_adata(adata)


We should actually keep this, because we don't always need to run collapse_species_adata and convert_gene_name. We only need to do so if, e.g., we have the four species uu, ul, su and sl and the gene names are Ensembl IDs rather than official gene names.

In fact, we need a function that automatically detects whether collapsing species or converting gene names is necessary.
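Such automatic detection could look roughly like the sketch below. The helper names are hypothetical, the code works on plain Python collections rather than a real AnnData object, and it assumes Ensembl IDs can be recognized by an "ENS" prefix.

```python
from typing import Iterable

# Layer names for the four splicing/labeling species mentioned above.
SPLICING_AND_LABELING = ("uu", "ul", "su", "sl")


def needs_species_collapse(layer_keys: Iterable[str]) -> bool:
    """Hypothetical check: collapse only when all four species layers exist."""
    keys = set(layer_keys)
    return all(name in keys for name in SPLICING_AND_LABELING)


def needs_gene_name_conversion(var_names: Iterable[str]) -> bool:
    """Hypothetical check: convert only when every gene name looks like an Ensembl ID.

    Unlike np.all on an empty array, an empty list of names returns False here.
    """
    names = list(var_names)
    return len(names) > 0 and all(name.startswith("ENS") for name in names)


print(needs_species_collapse(["uu", "ul", "su", "sl", "X"]))  # True
print(needs_gene_name_conversion(["ENSMUSG00000000001", "Actb"]))  # False
```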

I am not sure the current 'if statements' are necessary, since the functions being called already contain their own condition checks.
For example:
if np.all([name in adata.layers.keys() for name in splicing_and_labeling]):  # we have, e.g., the four species uu, ul, su, sl
if np.all(adata.var_names.str.startswith("ENS")) or scopes is not None:  # the gene names are Ensembl IDs, not official names

In my opinion, we can remove the if statements and call each function directly, including unique_var_obs_adata, to clarify and simplify the code. However, if you think those statements are necessary, or that a separate checking function is needed, I am open to adding them.
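The alternative proposed here, calling every step unconditionally and letting each function decide whether it applies, can be sketched as follows. `ToyData` and the three step functions are hypothetical stand-ins for the AnnData object and for collapse_species_adata, convert_gene_name and unique_var_obs_adata; the real functions have different signatures and do actual work.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyData:
    """Hypothetical stand-in for an AnnData object."""
    layers: Dict[str, list] = field(default_factory=dict)
    var_names: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def collapse_species(data: ToyData) -> None:
    # The guard lives inside the callee: return early when the layers are absent.
    if not all(k in data.layers for k in ("uu", "ul", "su", "sl")):
        return
    data.log.append("collapsed species")


def convert_gene_names(data: ToyData) -> None:
    # Only act when every gene name looks like an Ensembl ID.
    if not data.var_names or not all(n.startswith("ENS") for n in data.var_names):
        return
    data.log.append("converted gene names")


def make_names_unique(data: ToyData) -> None:
    # Suffix repeated names with -1, -2, ..., similar in spirit to var_names_make_unique.
    seen: Dict[str, int] = {}
    unique = []
    for name in data.var_names:
        count = seen.get(name, 0)
        unique.append(name if count == 0 else f"{name}-{count}")
        seen[name] = count + 1
    data.var_names = unique


def preprocess(data: ToyData) -> ToyData:
    # No if statements at the call site; each step is a no-op when it does not apply.
    for step in (collapse_species, convert_gene_names, make_names_unique):
        step(data)
    return data


result = preprocess(ToyData(layers={"spliced": []}, var_names=["Actb", "Actb"]))
print(result.log)        # []
print(result.var_names)  # ['Actb', 'Actb-1']
```

The trade-off discussed in this thread is visibility: guards inside the callee keep the call site short, while explicit checks at the call site make it obvious which steps may be skipped.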

dynamo/preprocessing/Preprocessor.py

dynamo/preprocessing/preprocessor_utils.py

Ukyeon · 2023-05-04T23:38:35Z

This fix has already been applied to the master branch.

Xiaojieqiu reviewed Apr 1, 2023


LoveLennone and others added 2 commits April 2, 2023 00:05

fix to the # of cells that are removed

6573c1a

Merge branch 'master' into Implement_callers

3416393

Ukyeon closed this May 4, 2023


Ukyeon commented Mar 22, 2023

Xiaojieqiu Apr 1, 2023

Xiaojieqiu Apr 1, 2023

Ukyeon Apr 2, 2023

Ukyeon commented May 4, 2023
