
Add basic_stats and max_pmito_s in Preprocessing #466

Closed
wants to merge 2 commits into from

Ukyeon (Contributor) commented Mar 22, 2023

  1. Add basic_stats as an early step in preprocessing.
  2. Add a new parameter, max_pmito_s, to the filter_cells_by_outliers function so that users can remove cells with too high a proportion of mitochondrial gene expression.
  3. Fix minor docstring issues.
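A minimal sketch of items 1 and 2, in plain Python so it runs without dynamo or AnnData. It assumes that basic_stats records a per-cell gene count, total count and mitochondrial fraction (named here n_genes, n_counts and p_mito, all hypothetical), that mitochondrial genes are the ones whose names start with "MT-"/"mt-", that the fraction is expressed in [0, 1], and that the "_s" in max_pmito_s refers to the spliced layer. None of these details are taken from the dynamo source.

    from typing import List, Sequence


    def basic_stats(counts: Sequence[Sequence[float]], gene_names: Sequence[str]) -> List[dict]:
        """Per-cell QC stats for a cells x genes count matrix (hypothetical helper)."""
        # Assumption: mitochondrial genes carry the usual human/mouse "MT-"/"mt-" prefix.
        is_mito = [name.upper().startswith("MT-") for name in gene_names]
        stats = []
        for row in counts:
            n_counts = sum(row)
            mito_counts = sum(c for c, m in zip(row, is_mito) if m)
            stats.append(
                {
                    "n_genes": sum(1 for c in row if c > 0),  # genes detected in this cell
                    "n_counts": n_counts,
                    # Fraction of this cell's counts that come from mitochondrial genes;
                    # empty cells are assigned 0.0 here (an arbitrary choice for the sketch).
                    "p_mito": mito_counts / n_counts if n_counts > 0 else 0.0,
                }
            )
        return stats


    def filter_cells_by_max_pmito(stats: List[dict], max_pmito_s: float) -> List[bool]:
        """Keep mask: True for cells whose mitochondrial fraction is at most max_pmito_s."""
        return [s["p_mito"] <= max_pmito_s for s in stats]


    # Toy spliced counts: 3 cells x 4 genes, the last two genes are mitochondrial.
    spliced = [
        [10, 5, 0, 5],  # 5 of 20 counts mitochondrial -> 0.25
        [2, 0, 6, 0],   # 6 of 8 counts mitochondrial -> 0.75
        [0, 0, 0, 0],   # empty cell
    ]
    genes = ["Actb", "Gapdh", "mt-Co1", "mt-Nd1"]

    stats = basic_stats(spliced, genes)
    keep = filter_cells_by_max_pmito(stats, max_pmito_s=0.5)
    print(keep)  # [True, False, True]

Computing the stats first and filtering on them afterwards mirrors the ordering in item 1: the QC metrics exist before any outlier filter needs them.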

Comment on lines 235 to 243
if self.collapse_species_adata:
    main_info("applying collapse species adata...")
    self.collapse_species_adata(adata)

if self.convert_gene_name:
    main_info("applying convert_gene_name function...")
    self.convert_gene_name(adata)
    main_info("making adata observation index unique after gene name conversion...")
    self.unique_var_obs_adata(adata)
Collaborator

We should actually keep this, because we don't always need to run collapse_species_adata and convert_gene_name. We only need collapse_species_adata when there are, e.g., the four species uu, ul, su and sl, and convert_gene_name when the gene names are Ensembl IDs rather than official gene symbols.

Collaborator

In fact, we need a function that automatically detects whether collapsing species or converting gene names is necessary.
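The suggestion could take the form of two predicates evaluated once, before the preprocessing steps run. The sketch below works on plain Python lists rather than an AnnData object; the names needs_collapse and needs_gene_name_conversion are hypothetical, and the conditions are copied from the checks quoted in the next comment (all of uu, ul, su and sl present; every gene name starting with "ENS", or scopes supplied).

    from typing import Iterable, Optional, Sequence

    # The four splicing/labeling layers named in this thread.
    SPLICING_AND_LABELING = ("uu", "ul", "su", "sl")


    def needs_collapse(layer_names: Iterable[str]) -> bool:
        """True if all four uu/ul/su/sl layers are present (hypothetical helper)."""
        present = set(layer_names)
        return all(name in present for name in SPLICING_AND_LABELING)


    def needs_gene_name_conversion(var_names: Sequence[str], scopes: Optional[str] = None) -> bool:
        """True if every gene name looks like an Ensembl ID or an explicit scope was given."""
        # Like np.all, all() returns True for an empty sequence.
        return all(name.startswith("ENS") for name in var_names) or scopes is not None


    print(needs_collapse(["uu", "ul", "su", "sl"]))                             # True
    print(needs_collapse(["spliced", "unspliced"]))                             # False
    print(needs_gene_name_conversion(["ENSG00000141510", "ENSG00000012048"]))   # True
    print(needs_gene_name_conversion(["TP53", "BRCA1"]))                        # False

Evaluating the predicates up front keeps the decision visible at the call site, so a log line can state why a step was skipped.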

Contributor (Author)

I am not sure the current if statements are necessary, since the functions being called already check these conditions themselves. For example:

    if np.all([name in adata.layers.keys() for name in splicing_and_labeling]):  # the uu, ul, su, sl four-species case
    if np.all(adata.var_names.str.startswith("ENS")) or scopes is not None:  # gene names are Ensembl IDs, not official symbols

In my opinion, we can remove the if statements and call each function, including unique_var_obs_adata, to clarify and simplify the code. However, if you believe those statements are necessary, or that a separate function to perform the check is needed, I am open to adding them.
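A minimal sketch of the alternative proposed here: each step carries its own guard, and the pipeline calls every step without caller-side if statements. Everything in it is hypothetical. A plain dict stands in for AnnData, and collapse_species, convert_gene_names, make_names_unique and preprocess are illustrative stand-ins for collapse_species_adata, convert_gene_name and unique_var_obs_adata, not dynamo's code. The conversion step only records that it would run, and de-duplication uses a simple "name-1", "name-2" suffix scheme.

    from typing import Dict, List


    def collapse_species(data: Dict) -> None:
        """Hypothetical stand-in for collapse_species_adata; guards itself."""
        layers = data["layers"]
        if not all(name in layers for name in ("uu", "ul", "su", "sl")):
            return  # nothing to collapse, so calling this unconditionally is safe
        # Illustrative only: merge labeled and unlabeled counts per splicing state.
        layers["unspliced"] = [a + b for a, b in zip(layers["uu"], layers["ul"])]
        layers["spliced"] = [a + b for a, b in zip(layers["su"], layers["sl"])]
        data["steps_run"].append("collapse_species")


    def convert_gene_names(data: Dict, scopes=None) -> None:
        """Hypothetical stand-in for convert_gene_name; guards itself."""
        if not (all(v.startswith("ENS") for v in data["var_names"]) or scopes is not None):
            return  # names already look like gene symbols
        # A real implementation would map IDs to symbols; this sketch only records the step.
        data["steps_run"].append("convert_gene_names")


    def make_names_unique(data: Dict) -> None:
        """Hypothetical stand-in for unique_var_obs_adata: suffix duplicate gene names."""
        seen: Dict[str, int] = {}
        unique: List[str] = []
        for name in data["var_names"]:
            if name in seen:
                seen[name] += 1
                unique.append(f"{name}-{seen[name]}")
            else:
                seen[name] = 0
                unique.append(name)
        data["var_names"] = unique
        data["steps_run"].append("make_names_unique")


    def preprocess(data: Dict) -> Dict:
        # No caller-side if statements: each step decides for itself whether to act.
        for step in (collapse_species, convert_gene_names, make_names_unique):
            step(data)
        return data


    data = {"layers": {"spliced": [1, 2]}, "var_names": ["Actb", "Actb", "Gapdh"], "steps_run": []}
    preprocess(data)
    print(data["steps_run"])  # ['make_names_unique']
    print(data["var_names"])  # ['Actb', 'Actb-1', 'Gapdh']

The trade-off in this thread: callee-side guards keep the pipeline short, while explicit detection at the call site makes it obvious which steps ran and why.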

dynamo/preprocessing/Preprocessor.py (outdated, resolved)
dynamo/preprocessing/preprocessor_utils.py (outdated, resolved)
Ukyeon closed this on May 4, 2023.

Ukyeon (Contributor, Author) commented May 4, 2023

This fix has already been applied to the master branch.


3 participants