Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

discard the selavy unit row before reading #473

Merged
merged 2 commits into from
Mar 17, 2021
Merged

Conversation

marxide
Copy link
Contributor

@marxide marxide commented Mar 16, 2021

I encountered a problem with the way the pipeline currently reads in a modified Selavy catalogue. It's quite tricky to replicate the Selavy ASCII output format exactly, so I took some short cuts, namely filling the row of column units with dashes. I did this because the pipeline will always discard the second row (i.e. the first row after the column headers) anyway.

When loading this Selavy-like ASCII catalogue, every row has a dtype of object (string). This is mostly fine as we convert the columns to the proper dtypes defined in the Selavy translator. However, it's not fine for the has_siblings column since converting a string to bool will always result in True.

This doesn't happen with the original Selavy catalogues since the has_siblings column has no unit, so when Pandas parses the catalogue the has_sibling column contains only numbers.

This PR discards the Selavy unit row before reading the rest of the catalogue. Previously, this row was discarded after reading. Discarding it beforehand allows Pandas to better determine an appropriate default column dtype.

Previously, this row was discarded after reading. Discarding it beforehand allows Pandas to better determine an appropriate default column dtype.
@marxide marxide added bug Something isn't working python Pull requests that update Python code labels Mar 16, 2021
@marxide marxide self-assigned this Mar 16, 2021
ajstewart
ajstewart previously approved these changes Mar 17, 2021
Copy link
Contributor

@ajstewart ajstewart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I had no idea it was like that! Completely agree with this change.

One day I'll remember to do this before review...
@marxide
Copy link
Contributor Author

marxide commented Mar 17, 2021

Thanks, Adam. I forgot to update the change log again so I just pushed that. Merging without re-review.

@marxide marxide merged commit c98d25f into master Mar 17, 2021
@marxide marxide deleted the fix-selavy-loading-dtype branch March 17, 2021 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants