Slow libecl reading with large ensembles #418

Closed
anders-kiaer opened this issue Aug 25, 2020 · 4 comments
Labels: Data input (This issue related to extracting/manipulating or organizing input data to Webviz), help wanted (Extra attention is needed)

Comments

@anders-kiaer
Collaborator

anders-kiaer commented Aug 25, 2020

In use cases like FlowNet, it takes a very long time to extract libecl data (through the fmu-ensemble calls?). It also takes some time in large classical ensembles.

The parquet format supports metadata at multiple levels, and there are ways to also use that with pandas:
pandas-dev/pandas#20534 (comment)

Could we add (or extend) a forward model in semeio that reads the time series and stores them, together with the necessary metadata, as parquet, and then have Webviz read that file by default if it exists?
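
To make the "read that by default if the file exists" part concrete, here is a minimal sketch of the lookup only. The relative path `share/results/tables/unsmry.parquet`, the function name `pick_summary_source`, and the returned labels are assumptions made up for illustration; they are not existing semeio or Webviz conventions.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical location (relative to the realization root) where a forward
# model would have stored the pre-extracted time series.
PARQUET_RELPATH = Path("share/results/tables/unsmry.parquet")


def pick_summary_source(realization_dir: Path) -> Tuple[str, Optional[Path]]:
    """Prefer a pre-dumped parquet file, otherwise fall back to *.UNSMRY."""
    parquet_file = realization_dir / PARQUET_RELPATH
    if parquet_file.is_file():
        return "parquet", parquet_file

    # Fallback: the slow path, reading the simulator output through libecl.
    unsmry_files = sorted(realization_dir.rglob("*.UNSMRY"))
    if unsmry_files:
        return "libecl", unsmry_files[0]

    return "missing", None
```

With a lookup like this, existing ensembles without the pre-dumped file would keep working through the slower libecl path.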

Opinions @asnyv @berland?

@anders-kiaer anders-kiaer added Data input This issue related to extracting/manipulating or organizing input data to Webviz help wanted Extra attention is needed labels Aug 25, 2020
@berland
Contributor

berland commented Aug 26, 2020

I assume you are talking about pre-processing on realization level to some data format that is easier to parse later on. equinor/fmu-tools#47 is an unfinished attempt at this: a standardized forward model that runs on each realization, from which dumping to parquet or uploading anywhere would be possible.

The ERT API could be another possibility.

@anders-kiaer
Collaborator Author

Related: equinor/webviz-config#382

@anders-kiaer
Collaborator Author

anders-kiaer commented May 31, 2021

After the work with @sigurdp on reducing memory footprint, the next major framework improvement would be to reduce Webviz build time (it is very slow for large ensembles/models). A brief offline chat with @jcrivenaes and @perolavsvendsen confirms that something like what is sketched below would be in line with SUMO and FMU metadata/Drogon.

Proposal: Basically a very simple job that reads .UNSMRY using libecl on realization level (while still on the cluster) and dumps it raw to an open, standardized, efficient binary format, e.g. compressed .arrow files or HDF5. Metadata (units, is_rate/is_total...) could be stored in the same file. Related issue wrt. metadata speed: #539

Another use case for such a job would be to optionally store certain selected vectors that are sparse by nature (WSTAT and grid cell completion status, e.g. OPEN, SHUT, STOP) in "sparse" form only. .UNSMRY stores WSTAT and grid cell completion status for every time step and COMPDAT grid cell, even though ~99.9% of it can easily be redundant data. @lindjoha
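
As an illustration of what "sparse" could mean here, one option is to keep a value only at the time steps where it changes. The sketch below does that for a made-up status series; the helper names `to_sparse` and `to_dense` are hypothetical, and nothing in it reflects how .UNSMRY or any existing job stores the data.

```python
from typing import Hashable, List, Sequence, Tuple

Change = Tuple[int, Hashable]


def to_sparse(values: Sequence[Hashable]) -> List[Change]:
    """Keep only (time step index, value) pairs where the value changes."""
    changes: List[Change] = []
    for index, value in enumerate(values):
        if not changes or changes[-1][1] != value:
            changes.append((index, value))
    return changes


def to_dense(changes: Sequence[Change], length: int) -> List[Hashable]:
    """Rebuild the full per-time-step series from the change points."""
    dense: List[Hashable] = []
    for position, (start, value) in enumerate(changes):
        # Each value holds until the next change point (or the end).
        stop = changes[position + 1][0] if position + 1 < len(changes) else length
        dense.extend([value] * (stop - start))
    return dense


# Made-up completion status over 10 time steps.
status = ["SHUT"] * 3 + ["OPEN"] * 5 + ["STOP"] * 2
sparse = to_sparse(status)
restored = to_dense(sparse, len(status))
```

In this example 10 stored values shrink to 3 change points, and the dense series can still be rebuilt exactly when a plugin needs it.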

@anders-kiaer
Collaborator Author

anders-kiaer commented Jun 28, 2021

After #673 there is now a forward model dumping .arrow files (https://arrow.apache.org/overview/), which have shown crazy good read performance on RHEL when testing on real models.

@anders-kiaer anders-kiaer moved this to Done 🏁 in Webviz Jan 9, 2023
3 participants