Better collapse workflow for duplicate rows #61

Open
davideby opened this issue Feb 3, 2023 · 0 comments

davideby commented Feb 3, 2023

Ticket to group a few related issues

  1. We are currently dealing with duplicate rows by choosing a single representative - effectively arbitrarily - and throwing out the rest. Instead, we should deal with them in the same way as with collapse, using the chosen operation (Max, etc).
  2. We should add a new Remove Duplicates mode for Collapse, which will work with no chip selected but detect and remove any duplicates in the fashion described in (1).
  3. Attempting to run with either No Collapse or Remap Only should be an error rather than a warning.
  4. We should flag duplicate rows in a dataset immediately on loading and issue a warning (right now we wait until processing). The warning should recommend using Collapse or Remove Duplicates, or fixing the issue outside of GSEA. We suspect the usual cause is that the user performed a mapping step on their own (going from, say, Ensembl IDs to gene symbols) that resulted in duplicates. The suggestion would be to just use the original IDs instead; this advice might need to be given via link-out to the Wiki.
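
To make items (1), (2), and (4) concrete, here is a rough sketch in Python. It is an illustration only, not GSEA's implementation: the function names `find_duplicates` and `collapse_duplicates`, the list-of-pairs row representation, and the contents of the `OPERATIONS` table are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical operation table; the issue itself only names "Max, etc".
OPERATIONS = {"max": max, "mean": mean, "median": median, "sum": sum}


def find_duplicates(rows):
    """Return identifiers that occur more than once, in first-seen order.

    `rows` is a list of (identifier, values) pairs, as might be read at load time.
    """
    counts = defaultdict(int)
    for identifier, _ in rows:
        counts[identifier] += 1
    return [identifier for identifier, n in counts.items() if n > 1]


def collapse_duplicates(rows, operation="max"):
    """Merge rows sharing an identifier by applying `operation` per sample column."""
    combine = OPERATIONS[operation]
    grouped = defaultdict(list)
    for identifier, values in rows:
        grouped[identifier].append(values)
    # zip(*group) transposes the group so each column is combined independently.
    return [
        (identifier, [combine(column) for column in zip(*group)])
        for identifier, group in grouped.items()
    ]


rows = [
    ("TP53", [1.0, 4.0]),
    ("BRCA1", [2.0, 2.0]),
    ("TP53", [3.0, 0.5]),
]

# Item (4): flag duplicates as soon as the data is loaded.
duplicates = find_duplicates(rows)
if duplicates:
    print(f"Warning: duplicate rows found for {duplicates}; "
          "consider Collapse or Remove Duplicates.")

# Items (1) and (2): resolve duplicates with the chosen operation, no chip needed.
collapsed = collapse_duplicates(rows, operation="max")
```

The point of the sketch is that no row is picked arbitrarily: every duplicate contributes to the merged row through the same operation a regular collapse would use.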

A semi-related issue is that turning off the "Omit unmapped symbols" setting could introduce duplicates in mixed-namespace datasets, e.g. Affy probes + Gene Symbols. In this case, collapsing would map all of the probes to genes, but those genes might then duplicate genes already present.

This last issue is hypothetical; it is unknown whether it actually happens. Either way, the collapse workflow needs to handle these pre-existing symbols together with those coming out of the collapse.
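
The mixed-namespace case can be sketched as follows, again as an assumption-laden illustration rather than actual GSEA code. The `CHIP` mapping, the probe names, and the `remap_and_collapse` helper are hypothetical; the idea is simply that unmapped identifiers pass through unchanged and are then merged with whatever the mapping produces.

```python
from collections import defaultdict

# Hypothetical probe-to-symbol chip mapping, for illustration only.
CHIP = {"probe_a": "TP53", "probe_b": "TP53"}


def remap_and_collapse(rows, chip, combine=max):
    """Map identifiers through `chip`, keep unmapped ones as-is, then merge by symbol."""
    grouped = defaultdict(list)
    for identifier, values in rows:
        # With "Omit unmapped symbols" off, an unmapped identifier is kept unchanged.
        symbol = chip.get(identifier, identifier)
        grouped[symbol].append(values)
    # Combine each sample column across all rows that ended up under the same symbol.
    return {
        symbol: [combine(column) for column in zip(*group)]
        for symbol, group in grouped.items()
    }


mixed = [
    ("probe_a", [1.0, 5.0]),
    ("TP53", [2.0, 3.0]),    # gene symbol already present in the dataset
    ("probe_b", [0.5, 6.0]),
    ("MYC", [4.0, 4.0]),
]
result = remap_and_collapse(mixed, CHIP)
```

Here the pre-existing `TP53` row is grouped with the rows produced by mapping the two probes, so the output contains a single `TP53` entry instead of a duplicate.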
