Inconsistency between karabo-bridge-serve-files and live data #330

Open
FilipeMaia opened this issue Jul 15, 2022 · 8 comments

Comments

@FilipeMaia

I'm using karabo-bridge-serve-files on recorded data to prepare for online analysis in an upcoming beamtime.
To be most useful it would be great if karabo-bridge-serve-files would be as close as possible to reading live data.
It seems that live data includes timestamps but karabo-bridge-serve-files does not.

Are there other differences between live data and that served by karabo-bridge-serve-files?

@FilipeMaia
Author

For example, previous code seems to suggest that live raw AGIPD data came in two arrays, image.data and image.gain. Is this still the case?
This never happens with karabo-bridge-serve-files.

@FilipeMaia
Author

Also, with live data it seems that the fastest-changing dimension in an AGIPD image.data array is the cell number, while it seems to be the slowest with data served by karabo-bridge-serve-files. Is there a better way to simulate live data?

@FilipeMaia
Author

I found there's a --dummy-timestamps parameter! But the other questions still remain.
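
A quick way to check whether a served stream carries timestamps is to inspect the per-source metadata dictionary that arrives alongside the data. The sketch below works on plain dictionaries, so it runs without a bridge connection. The `timestamp.tid` key name, the source name, and the helper `sources_missing_timestamps` are assumptions made for illustration, not something confirmed in this thread.

```python
def sources_missing_timestamps(meta, key="timestamp.tid"):
    """Return the sorted names of sources whose metadata lacks `key`.

    `meta` maps source name -> metadata dict, which is the layout assumed
    here for the metadata half of a (data, meta) pair from the bridge.
    The default key name is an assumption; adjust it to what you observe.
    """
    return sorted(name for name, info in meta.items() if key not in info)


# Hypothetical metadata, as it might look with and without timestamps.
live_like = {"DET/0CH0:xtdf": {"source": "DET/0CH0:xtdf", "timestamp.tid": 1234}}
file_like = {"DET/0CH0:xtdf": {"source": "DET/0CH0:xtdf"}}

print(sources_missing_timestamps(live_like))
print(sources_missing_timestamps(file_like))
```

Running the same check against a stream served with and without --dummy-timestamps should show whether the option closes this particular gap.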

@philsmt
Contributor

philsmt commented Jul 18, 2022

Hi Filipe,

Unfortunately it is possible that there are inconsistencies between live and recorded data. This mostly comes down to differences in structure between Karabo's Hash protocol and how the DAQ ends up laying it out in HDF files, as well as the functional distinctions between online and offline corrections, the latter creating an entirely new set of files. I'm sorry this is not universally in a good state yet. A complete answer to your question unfortunately depends on a lot of factors, like which kind of input files are used for karabo-bridge-serve-files and what the karabo-bridge was connected to. These are of course details you should not need to concern yourself with as a user.

Some initial observations I can probably make:

  • For the shape issues, I suspect your experiences come from using the legacy online correction software that indeed reversed the axis order to move cells from the slowest to fastest axis in C order. The new online correction software is configurable here and will reshape it for free to any order you wish in GPU memory.

  • The key question is a bit more tricky. In general, raw data coming from the detector is laid out as (cells, 2, x, y), with signal intensity and analog gain in the second axis. As you are probably referring to the legacy online correction software (see above), this means the karabo-bridge was probably connected to either the splitter stage or the thresholding stage and not the DAQ. The former creates two keys, image.data and image.gain, from raw data while still carrying the analog gain information, while the latter converts it to digital gain. The new correction software has only a single stage, in a format compatible with offline data, although it currently lacks gain information as there has been no need for it yet.
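
To make the raw layout concrete, here is a minimal sketch of how a (cells, 2, x, y) array could be split into separate signal and gain arrays, similar in spirit to the two-key layout described above. It uses synthetic data and is not the splitter stage's actual code; the function name `split_raw_agipd`, the 512 x 128 module size, and the assumption that index 0 is signal and index 1 is gain are all illustrative.

```python
import numpy as np


def split_raw_agipd(raw):
    """Split a raw array of shape (cells, 2, x, y) into signal and gain.

    Index 0 of the second axis is taken as signal intensity and index 1 as
    analog gain. That order is an assumption, so verify it on real data.
    """
    if raw.ndim != 4 or raw.shape[1] != 2:
        raise ValueError(f"expected (cells, 2, x, y), got {raw.shape}")
    return raw[:, 0], raw[:, 1]


# Synthetic stand-in for one module: 4 cells of 512 x 128 pixels.
raw = np.zeros((4, 2, 512, 128), dtype=np.uint16)
raw[:, 1] = 7  # mark the gain plane so the split is visible
data, gain = split_raw_agipd(raw)
print(data.shape, gain.shape)
```

Both results are views into `raw`, so no pixel data is copied by the split itself.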

The fundamental difference in data layout between raw and corrected data, both offline and online, is intentional: it homogenizes the corrected data format as much as possible between detectors, not all of which share details such as gain thresholding. In addition, even when the gain information is carried along, the change in data type motivates moving it to a different key.

Did you consider whether you want to work with either raw data or corrected AGIPD data exclusively, or possibly both? Note that with the new correction software there is no performance difference between the two.

@FilipeMaia
Author

Hi Philipp,

Thanks for the detailed reply!

I understand that there will be differences between the data saved in hdf5 and the streamed data but it would be very useful if karabo-bridge-serve-files would be able to translate between the hdf5 format and the streamed format so we could use it for testing online analysis codes.

I think you're correct in identifying the reasons for the discrepancies. I'm currently using https://github.com/European-XFEL/EXtra-foam/blob/dev/extra_foam/pipeline/processors/image_assembler.py as a guide for the differences between online and offline.

Given that there are multiple ways that the data can be streamed online (e.g. you mentioned being able to reshape in any way you wish) does the stream contain some information to tell us how the data is being shaped or if it comes from a file or is live? Even some version information about the streamer could be useful.

We'll try out the corrected data, but it would be useful if one could have the option to also access the raw data.

@FilipeMaia
Author

Also is there any documentation on the new correction software/zmq bridge and is the code available somewhere?

@philsmt
Contributor

philsmt commented Jul 19, 2022

> I understand that there will be differences between the data saved in hdf5 and the streamed data but it would be very useful if karabo-bridge-serve-files would be able to translate between the hdf5 format and the streamed format so we could use it for testing online analysis codes.

Definitely, I've raised it internally and we should aim to offer options to solve this automatically. That being said, there's also a somewhat less documented tool (which is being addressed right now) called karabo-bridge-recorder (you will need to log in, but it should be accessible) that records an actual data stream verbatim. Naturally it does not help after the fact, but it can be used to replay an authentic online stream at any later point in time.

> Given that there are multiple ways that the data can be streamed online (e.g. you mentioned being able to reshape in any way you wish) does the stream contain some information to tell us how the data is being shaped or if it comes from a file or is live? Even some version information about the streamer could be useful.

Not really, the karabo-bridge was so far seen only as a tool for online analysis and not as a universal data streaming format. I understand your motivation for versioning as the maintainer of a cross-facility tool. Our current plan is to actually converge on a standardized format in the first place, but we will keep versioning in mind for future changes.
Concerning the memory order example, while it is not contained in the stream right now, it could easily be added in its string representation (it's specified by a letter code, e.g. cxy puts y as the fastest axis and cell as the slowest). Ultimately it should be set to whatever works best for you, as memory order can have quite a drastic impact for the data sizes we're speaking of with full-rate detector data.
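
As an illustration of the letter code, the following sketch shows how a consumer could convert between two axis orders with NumPy if the code were available in the stream. This is only a sketch under assumptions: `reorder_axes` is a hypothetical helper and not part of the correction software, and the codes are read from slowest to fastest axis in C order as described above.

```python
import numpy as np


def reorder_axes(arr, current, target):
    """Transpose `arr` from axis order `current` to `target`.

    Orders are letter codes such as "cxy", read from the slowest to the
    fastest axis in C order. Name and validation are illustrative only.
    """
    if sorted(current) != sorted(target) or len(set(current)) != len(current):
        raise ValueError("orders must be permutations of the same letters")
    if arr.ndim != len(current):
        raise ValueError("order length must match array dimensions")
    perm = [current.index(letter) for letter in target]
    # ascontiguousarray ensures the new axis order is also the memory order.
    return np.ascontiguousarray(arr.transpose(perm))


cells_slowest = np.zeros((4, 512, 128), dtype=np.float32)  # "cxy"
cells_fastest = reorder_axes(cells_slowest, "cxy", "xyc")
print(cells_fastest.shape)
```

Making the result contiguous costs a full copy of the array, which is why choosing the order at the source matters for full-rate detector data.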

> Also is there any documentation on the new correction software/zmq bridge and is the code available somewhere?

We're preparing documentation for the new online correction software as we speak. You can find a build of the latest version here, but please keep in mind things are in flux and links may point to git.xfel.eu repositories. In most cases I would expect your account is able to access it anyway. It is still being expanded, but if you find anything particular missing, don't hesitate to tell us please!

@FilipeMaia
Author

> Definitely, I've raised it internally and we should aim to offer options to solve this automatically. That being said, there's also a somewhat less documented tool (which is being addressed right now) called karabo-bridge-recorder (you will need to login, but it should be accessible) that records an actual data stream verbatim.

Ah I didn't know about it. Do you guys have some sample stream recorded I could use?

> Not really, the karabo-bridge so far was seen only for online analysis and not a universal data streaming format. I understand your motivation as maintainer of a cross-facility tool for versioning. Our current plan is to actually coalesce on a standardized format in the first place, but we will keep versioning it in mind for future changes.

Standardising is good, but that does not remove the need for some version information (because standards evolve). You could even have version information from the different parts, for example from the bridge itself but also from the calibration pipeline (now calng). It should also contain information about memory order like you suggest. This would make it much simpler for downstream software to handle the data (at the moment I'm looking at the different dimensions and guessing which ones correspond to the x and y axis of the modules assuming they are 512x128, which is too fragile).
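
For reference, the shape-based guessing I'm describing looks roughly like the sketch below, which also shows where it breaks down. The function name `guess_axis_order` and the letter codes it returns are purely illustrative; only the 512x128 module size comes from what I wrote above.

```python
def guess_axis_order(shape, module_shape=(512, 128)):
    """Guess the axis order of a 3D module array from its shape alone.

    Returns a letter code such as "cxy" (slowest to fastest axis), or None
    when the guess is ambiguous, for example when the number of cells
    happens to equal one of the module dimensions.
    """
    shape = tuple(shape)
    nx, ny = module_shape
    if len(shape) != 3 or nx == ny:
        return None
    if shape.count(nx) != 1 or shape.count(ny) != 1:
        return None
    letters = {nx: "x", ny: "y"}
    return "".join(letters.get(n, "c") for n in shape)


print(guess_axis_order((4, 512, 128)))
print(guess_axis_order((128, 512, 4)))
print(guess_axis_order((128, 512, 128)))
```

The last call returns None because a train with 128 cells cannot be told apart from the 128-pixel module axis, which is exactly the case where explicit memory order information in the stream would help.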

> We're preparing documentation for the new online correction software as we speak. You can find a build of the latest version here, but please keep in mind things are in flux and links may point to git.xfel.eu repositories.

That's great! You should spread this information more widely. I think many people would like to know exactly how the calibration is being done to be able to trust it.


2 participants