ENH: Binary ngc file parser #830

Merged
merged 13 commits into from
Aug 17, 2019


ksunden
Member

@ksunden ksunden commented Dec 11, 2018

Still need:

  • populate label for variables
  • replace pass with meaningful warning when header is unexpected
  • example files distributed (those I have are unpublished, currently, and larger than I'd like, only mild compression)
  • tests
  • Update doc rst files
  • ASCII text file parser (probably a separate method, possibly a separate PR entirely). The ASCII format seems untenable to me to parse in a particularly sane way, given the lack of metadata. It can be done, but it is not clear to me how to do so without either making lots of assumptions or adding so many parameters that it becomes difficult to use. A case-specific parser can be maintained out of tree more easily for now. I am open to discussion, or to being convinced that a user-friendly parser is possible, but I do feel that any such contribution is separable from the implementation of this particular parser.

@ksunden ksunden added this to the 3.2.1 milestone Dec 11, 2018
@ksunden ksunden self-assigned this Dec 11, 2018
@ksunden
Member Author

ksunden commented Dec 11, 2018

img_20181211_103609

My whiteboard notes from writing this parser:

The red circle in the middle is a block of n 32-bit values, where n is a 16-bit int read immediately prior.
I do not know what the values mean, but I use them such that a nonzero value means an array for that index follows, and zero means it does not. This is an assumption based on a small number of examples, but it held for those.
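
The flag block described above can be sketched roughly as follows. This is an illustration, not the parser's actual code: the helper names `read_array_flags` and `indices_with_arrays` are hypothetical, and little-endian byte order and unsigned integers are assumptions that the notes do not confirm.

```python
import io
import struct


def read_array_flags(stream, byteorder="<"):
    """Read a 16-bit count n, then n 32-bit values, and return the values.

    Assumptions (not confirmed by the notes): little-endian, unsigned.
    """
    (n,) = struct.unpack(byteorder + "H", stream.read(2))
    raw = stream.read(4 * n)
    if len(raw) != 4 * n:
        raise ValueError("file ended inside the flag block")
    return struct.unpack(f"{byteorder}{n}I", raw)


def indices_with_arrays(flags):
    """Return the indices whose flag is nonzero (an array is assumed to follow)."""
    return [i for i, value in enumerate(flags) if value != 0]


# Synthetic bytes for illustration only, not taken from a real ngc file.
demo = io.BytesIO(struct.pack("<H3I", 3, 7, 0, 1))
flags = read_array_flags(demo)
present = indices_with_arrays(flags)
```

Only the nonzero/zero distinction is used here, since the meaning of the values themselves is unknown.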

There is a table of metadata at the bottom, which may be useful someday, but is not parsed now, and likely won't be until it is specifically requested/implemented by the interested party.

@untzag
Member

untzag commented Dec 11, 2018

@ksunden please consider if there is a more meaningful name than from_ngc

a more explicit name that refers to the instrument or software that generates such files would be preferred

@ksunden
Member Author

ksunden commented Dec 12, 2018

It is unclear to me whether this file format is specific to a particular instrument. All the examples I have are from one instrument; all I know is that it is a Horiba Raman microscope, and I do not know much more. The program is something like LabSpec.

The header of the file is "NGSNextGen", which mostly leads me to "Next Generation Sequencing", though I have not seen any other files that share the format in the sequencing community, so I am not sure that is a real link.
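
As a rough sketch of how an unexpected header could produce a warning rather than being silently passed over: the function name `check_header` is hypothetical, and it assumes the "NGSNextGen" string sits at the very start of the file as ASCII bytes, which the discussion does not spell out.

```python
import io
import warnings

# Assumption: the header is stored as ASCII at offset 0 of the file.
EXPECTED_HEADER = b"NGSNextGen"


def check_header(stream):
    """Return True if the stream starts with the expected header, else warn."""
    header = stream.read(len(EXPECTED_HEADER))
    if header != EXPECTED_HEADER:
        warnings.warn(
            f"unexpected file header {header!r}; expected {EXPECTED_HEADER!r}",
            stacklevel=2,
        )
        return False
    return True
```

Warning instead of raising lets a caller still attempt to parse a file whose header differs slightly, at the cost of possibly reading garbage.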

@ksunden
Member Author

ksunden commented Jan 14, 2019

@darienmorrow Do you have any distributable datasets?

I can take care of most of the points fairly quickly, but need datasets to test against

@ksunden ksunden modified the milestones: 3.2.1, 3.2.2 Jan 15, 2019
@darienmorrow
Member

@ksunden The datasets I have are rather large. We should talk about whether we actually want to distribute them. The microscope these are taken from can also take 1D spectra, but I don't have any examples of those.
examples.zip

@ksunden ksunden modified the milestones: 3.2.2, 3.2.3 Mar 21, 2019
@ksunden
Member Author

ksunden commented Jul 17, 2019

Aramis_acquisition_information.pdf
Aramis_acquisitions.zip

These are the small test data files (and some additional information).

The same acquisition is stored in several formats (ASCII, ng[cs], t[vs]f). Only ngc is intended to be supported by this parser at this time; ngs may be supported soon as well, depending on how similar it is and what the headers give me to work with.

@ksunden ksunden changed the title [WIP] ENH: Binary ngc file parser ENH: Binary ngc file parser Jul 18, 2019
@ksunden
Member Author

ksunden commented Jul 18, 2019

For now, I have only put the example data in the tests directory. If we wish to include some of those files, or some other dataset, in datasets itself, I am open to doing so.

Member

@untzag untzag left a comment


🌵

@untzag untzag merged commit 9693647 into master Aug 17, 2019
@untzag untzag deleted the from_ngc branch August 17, 2019 22:50

3 participants