discard the selavy unit row before reading #473

marxide · 2021-03-16T21:36:37Z

I encountered a problem with the way the pipeline currently reads in a modified Selavy catalogue. It's quite tricky to replicate the Selavy ASCII output format exactly, so I took some shortcuts, namely filling the row of column units with dashes. I did this because the pipeline always discards the second row (i.e. the first row after the column headers) anyway.

When loading this Selavy-like ASCII catalogue, every column has a dtype of object (string), because the dashes in the unit row are parsed as data. This is mostly fine, as we convert the columns to the proper dtypes defined in the Selavy translator. However, it's not fine for the has_siblings column, since converting any non-empty string (including "0") to bool always results in True.
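A minimal sketch of the bool pitfall, assuming the translator applies a plain `astype(bool)` to a string column (the pipeline's actual conversion code isn't shown here, and the values below are hypothetical):

```python
import pandas as pd

# Hypothetical has_siblings values as they look when the whole column was
# read as object (string) dtype because of the dashed unit row.
as_strings = pd.Series(["0", "1", "0"], dtype=object)

# Casting strings straight to bool: every non-empty string is truthy,
# so "0" becomes True as well.
naive = as_strings.astype(bool)

# Parsing the strings as numbers first gives the intended flags.
correct = pd.to_numeric(as_strings).astype(bool)

print(naive.tolist())    # [True, True, True]
print(correct.tolist())  # [False, True, False]
```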

This doesn't happen with the original Selavy catalogues, since the has_siblings column has no unit, so when Pandas parses the catalogue the has_siblings column contains only numbers and is inferred as a numeric dtype.

This PR discards the Selavy unit row before reading the rest of the catalogue. Previously, this row was discarded after reading. Discarding it beforehand allows Pandas to better determine an appropriate default column dtype.
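A rough sketch of the difference, assuming a whitespace-delimited read with `pd.read_csv` (the pipeline's real reader and options may differ) and a hypothetical, heavily truncated catalogue with a dashed unit row:

```python
import io

import pandas as pd

# Hypothetical Selavy-like catalogue: header row, a unit row filled with
# dashes (the shortcut described above), then the data rows.
CATALOGUE = """\
island_id  flux_peak  has_siblings
--         --         --
island_1   1.5        0
island_2   2.5        1
"""

# Old behaviour: read everything, then drop the unit row. The dashes were
# already parsed as data, so the columns stay non-numeric.
after = pd.read_csv(io.StringIO(CATALOGUE), sep=r"\s+").iloc[1:]

# New behaviour: skip file line 1 (0-indexed, the row after the header)
# while reading, so pandas infers dtypes from the real data only.
before = pd.read_csv(io.StringIO(CATALOGUE), sep=r"\s+", skiprows=[1])

print(after["has_siblings"].dtype)   # a string/object dtype
print(before["has_siblings"].dtype)  # int64
```

Skipping the row at read time matters because dtype inference happens while parsing; dropping the row afterwards leaves the already-inferred string dtype in place.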


ajstewart

Huh, I had no idea it was like that! Completely agree with this change.


marxide · 2021-03-17T16:03:30Z

Thanks, Adam. I forgot to update the change log again so I just pushed that. Merging without re-review.

discard the selavy unit row before reading

5b690ef

Previously, this row was discarded after reading. Discarding it beforehand allows Pandas to better determine an appropriate default column dtype.

marxide added bug Something isn't working python Pull requests that update Python code labels Mar 16, 2021

marxide self-assigned this Mar 16, 2021

ajstewart previously approved these changes Mar 17, 2021


updated CHANGELOG.md

67c39fe

One day I'll remember to do this before review...

marxide dismissed ajstewart’s stale review via 67c39fe March 17, 2021 16:01

marxide merged commit c98d25f into master Mar 17, 2021

marxide deleted the fix-selavy-loading-dtype branch March 17, 2021 16:03
