Better collapse workflow for duplicate rows #61

davideby · 2023-02-03T00:27:20Z

Ticket to group a few related issues

1. We currently deal with duplicate rows by choosing a single representative, effectively arbitrarily, and throwing out the rest. Instead, we should handle them the same way collapse does, using the chosen operation (Max, etc).
2. We should add a new Remove Duplicates mode for Collapse, which will work with no chip selected but detect and remove any duplicates in the fashion described in (1).
3. Attempting to run with either No Collapse or Remap Only when the dataset contains duplicate rows should be an error rather than a warning.
4. We should flag duplicate rows in a dataset immediately on loading and issue a warning (right now we wait until processing). The warning should recommend using Collapse or Remove Duplicates, or else fixing the issue outside of GSEA. The situation we suspect here is that the user performed a mapping step on their own (say, from Ensembl IDs to gene symbols) that resulted in duplicates. The suggestion would be to just use the original IDs instead; this advice might need to be given via a link-out to the Wiki.

A semi-related issue is that turning off the "Omit unmapped symbols" setting could introduce duplicates in mixed-namespace datasets, e.g. Affy probes + Gene Symbols. In this case, collapsing would map all of the probes to genes, but those genes might then duplicate gene symbols already present in the dataset.

This last issue is hypothetical; it is not known whether it actually happens. If it does, the collapse workflow needs to handle these pre-existing symbols together with those coming out of the collapse.
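
The mixed-namespace scenario can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the function `remap_and_collapse`, the `omit_unmapped` flag standing in for the "Omit unmapped symbols" setting, and the placeholder probe and gene names. It shows one possible behavior, in which pass-through symbols are grouped with the mapped ones before the operation is applied.

```python
from collections import defaultdict


def remap_and_collapse(rows, probe_to_gene, omit_unmapped=True, combine=max):
    """Map identifiers to gene symbols, then merge rows that share a symbol.

    Identifiers missing from `probe_to_gene` are dropped when `omit_unmapped`
    is True; otherwise they pass through unchanged and are grouped with any
    mapped rows that land on the same symbol.
    """
    grouped = defaultdict(list)
    for identifier, values in rows:
        symbol = probe_to_gene.get(identifier)
        if symbol is None:
            if omit_unmapped:
                continue
            symbol = identifier  # already a symbol (or unknown): keep as-is
        grouped[symbol].append(values)
    # Apply the operation per sample column within each group.
    return {
        symbol: [combine(column) for column in zip(*group)]
        for symbol, group in grouped.items()
    }


# Placeholder annotations and data, not taken from a real chip file.
chip = {"probe_1": "GENE_A", "probe_2": "GENE_B"}
rows = [
    ("probe_1", [1.0, 9.0]),
    ("GENE_A", [4.0, 2.0]),  # gene symbol already present in the dataset
    ("probe_2", [3.0, 3.0]),
]
print(remap_and_collapse(rows, chip, omit_unmapped=False))
# {'GENE_A': [4.0, 9.0], 'GENE_B': [3.0, 3.0]}
```

With `omit_unmapped=True` the pre-existing `GENE_A` row is simply dropped, so no duplicate arises; the collision only appears once unmapped symbols are retained.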
