Add basic_stats and max_pmito_s in Preprocessing #466
Ukyeon commented Mar 22, 2023
- Add basic_stats as an early step in preprocessing.
- Add a new parameter, max_pmito_s, to the filter_cells_by_outliers function so that users can remove cells that have too many mitochondrial genes expressed.
- Fix minor docstring issues.
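As a rough illustration of the kind of filter the second item describes, here is a minimal NumPy sketch. It does not reproduce dynamo's implementation: the helper names, the "MT-" gene-name prefix, and the reading of the threshold as a cap on the per-cell fraction of counts coming from mitochondrial genes are all assumptions made for this example.

```python
import numpy as np


def mito_fraction(counts, gene_names, prefix="MT-"):
    """Per-cell fraction of counts from genes whose name starts with `prefix`.

    Hypothetical helper: `counts` is a dense cells x genes array and the
    "MT-" prefix is an assumed naming convention for mitochondrial genes.
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array([g.upper().startswith(prefix) for g in gene_names])
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with zero total counts get a fraction of 0 instead of NaN.
    return np.divide(mito, total, out=np.zeros_like(total), where=total > 0)


def keep_cells_by_pmito(counts, gene_names, max_pmito=0.2):
    """Boolean mask of cells whose mitochondrial fraction is at most `max_pmito`."""
    return mito_fraction(counts, gene_names) <= max_pmito


# Toy data: 3 cells x 3 genes, with one mitochondrial gene.
counts = np.array([[8, 1, 1], [1, 5, 4], [0, 0, 0]])
genes = ["MT-CO1", "ACTB", "GAPDH"]
mask = keep_cells_by_pmito(counts, genes, max_pmito=0.5)
print(mask)  # the first cell (80% mitochondrial counts) is dropped
```

The mask can then be combined with other per-cell filters before subsetting the data.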
dynamo/preprocessing/Preprocessor.py
Outdated
if self.collapse_species_adata:
    main_info("applying collapse species adata...")
    self.collapse_species_adata(adata)

if self.convert_gene_name:
    main_info("applying convert_gene_name function...")
    self.convert_gene_name(adata)
    main_info("making adata observation index unique after gene name conversion...")
    self.unique_var_obs_adata(adata)
We should actually keep this, because we don't always need to run collapse_species_adata and convert_gene_name. We only need to do so if we have, e.g., the four species uu, ul, su, sl and the gene names are Ensembl IDs rather than official names.
In fact, we need a function that automatically detects whether it is necessary to collapse species or convert gene names.
I am not sure the current 'if statements' are necessary, since the functions being called already contain their own condition checks.
For example:
if np.all([name in adata.layers.keys() for name in splicing_and_labeling]):  # we have e.g. the four species uu, ul, su, sl
if np.all(adata.var_names.str.startswith("ENS")) or scopes is not None:  # the gene names are Ensembl IDs rather than official names
In my opinion, we can remove the if statements and call each function directly, including unique_var_obs_adata, to clarify and simplify the code. However, if you believe those statements are necessary, or that a separate function to perform the check is needed, I am open to creating them.
This fix has already been applied to the master branch.
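The two conditions quoted above could also be pulled out into standalone predicates, which is roughly what the request for automatic detection amounts to. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the inputs are plain Python collections rather than an AnnData object, and the layer tuple simply reuses the four names mentioned in the discussion.

```python
import numpy as np

# Layer names taken from the discussion; treated here as the full required set (assumption).
SPLICING_AND_LABELING = ("uu", "ul", "su", "sl")


def needs_species_collapse(layer_names):
    """True when all four splicing-and-labeling layers are present."""
    return all(name in layer_names for name in SPLICING_AND_LABELING)


def needs_gene_name_conversion(var_names, scopes=None):
    """True when every gene name starts with "ENS", or when `scopes` is given.

    Unlike a bare np.all, an empty list of names is treated as
    "no conversion needed" (a choice made for this sketch).
    """
    names = np.asarray(var_names, dtype=str)
    all_ensembl = names.size > 0 and bool(np.all(np.char.startswith(names, "ENS")))
    return all_ensembl or scopes is not None


# Example usage with toy inputs.
print(needs_species_collapse(["uu", "ul", "su", "sl"]))       # all four layers present
print(needs_species_collapse(["spliced", "unspliced"]))       # conventional layers only
print(needs_gene_name_conversion(["ENSG00000141510"]))        # Ensembl-style ID
print(needs_gene_name_conversion(["TP53"], scopes="symbol"))  # scopes supplied explicitly
```

With checks like these living inside (or next to) the functions they guard, the caller can invoke each step unconditionally, as proposed in the comment above.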