Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

string value columns are converted to number vlues #37

Closed
logicvay2010 opened this issue Sep 25, 2023 · 7 comments
Closed

string value columns are converted to number vlues #37

logicvay2010 opened this issue Sep 25, 2023 · 7 comments

Comments

@logicvay2010
Copy link

Hi Alister,

I found the starfile program quite useful for my development! However, when read the "star" file, I found some columns supposed to be String values that can be converted to numbers in the Dataframe. In my example, the '01' for 'rlnTomoName' column is read as number 1 which could cause an issue when matching with the original tomogram. So could make an update on this? Thanks!

Best,
Hui

@alisterburt
Copy link
Collaborator

Hey Hui, I’m glad you’re finding the package useful!

I’m letting pandas decide which columns are numeric by running pd.to_numeric and seeing if it errors - columns which raise an error are left as strings.

How would you like to see the proposed feature exposed to users? We could maybe add a list of column names to not auto-cast in starfile.read

This isn’t high priority for me but I would be happy to review a PR if you’re keen to see it added!

@logicvay2010
Copy link
Author

Hi Alister,

Thanks for the response! I am writing a GUI program for placing back sub-tomograms into the original tomograms, so I need to specify certain tomo names that appear in the star file to be generated. Sure, it's not an urgent thing. I just feel that it makes more sense to at least add some guaranteed not-numbered columns to be skipped during the read. An easy way to do that might be pre-scanning in _parse_loop_block(self) -> pd.DataFrame, if the column name is in a list of save words then just keep them as they were. ["rlnTomoName"].

It's just happened when I want just simplify my tomoName, I agree this scenario is not common. Technically, the name can be named after a number, and adding that hard-coded feature would make this program "safer" to use. Of course, it can be written in some constant-only file and be imported into that parser function so it can be extended afterward.

By the way, I am using imodmodel too, excellent work!

Thanks!
Hui

@EuanPyle
Copy link
Contributor

EuanPyle commented Feb 6, 2024

Hey Alister,
I've actually just had a bug report from RELION repeating this issue (3dem/relion#1079) where the user had named their TS only numbers (e.g. '0035') which starfile converts to '35' so during TSA, RELION can't find the correct rlnTomoName. Maybe it's worth re-visitng this issue? I imagine there will be lots of people new to tomography using the new v5 workflow so this mistake might become more common!

@alisterburt
Copy link
Collaborator

Yeah we can definitely implement the solution outlined above, not hitting named columns with pd.to_numeric -> I'm not sure when I'll get to this but would happily accept a PR if you want to spend the time!

@EuanPyle
Copy link
Contributor

EuanPyle commented Feb 6, 2024

No probs, I'll shoot a PR when I get a second!

@alisterburt
Copy link
Collaborator

@logicvay2010 @EuanPyle's PR with this feature just got merged, available in starfile>=0.5.6

@logicvay2010
Copy link
Author

@logicvay2010 @EuanPyle's PR with this feature just got merged, available in starfile>=0.5.6

Awesome! Thanks, @EuanPyle !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants