Slow libecl reading with large ensembles #418

anders-kiaer · 2020-08-25T18:55:54Z

In use cases such as FlowNet, it takes a very long time to extract libecl data (through the fmu-ensemble calls?). It also takes some time in large classical ensembles.

The parquet format supports metadata at multiple levels, and there are ways to also use that with pandas:
pandas-dev/pandas#20534 (comment)

Could we add (or extend) a forward model in semeio that reads the time series and stores them, together with the necessary metadata, as parquet, and then read that file by default in Webviz if it exists?

Opinions @asnyv @berland?


berland · 2020-08-26T07:34:16Z

I assume you are talking about pre-processing on realization level to some data format that is easier to parse later on. equinor/fmu-tools#47 is an unfinished attempt at this: a standardized forward model that is run on each realization, and where dumping to parquet or uploading anywhere would be possible.

The ERT api could be another possibility.

anders-kiaer · 2021-01-22T18:51:15Z

Related: equinor/webviz-config#382

anders-kiaer · 2021-05-31T10:40:33Z

After the work with @sigurdp on reducing memory footprint, the next major framework improvement would be to reduce Webviz build time (it is very slow for large ensembles/models). A brief offline chat with @jcrivenaes and @perolavsvendsen confirms that something like what is sketched below would be in line with SUMO and FMU metadata/Drogon.

Proposal: Basically a very simple job that reads .UNSMRY using libecl on realization level (while still on the cluster) and dumps it raw to an open, standardized, efficient binary format, e.g. compressed .arrow files or HDF5. Metadata (units, is_rate/is_total...) could be added as metadata to the same file. Related issue wrt. metadata speed: #539

Another use case for such a job would be to optionally store certain selected vectors that are sparse by nature (WSTAT and grid cell completion status, e.g. OPEN, SHUT, STOP) in "sparse" form only (.UNSMRY stores WSTAT and grid cell completion status for every time step and COMPDAT grid cell, even though ~99.9% of it can easily be redundant data). @lindjoha
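One possible reading of "sparse" storage, sketched in plain Python: keep only the time steps where the status changes, and forward-fill when a dense series is needed again. The helper names `to_sparse` and `to_dense` and the example status series are made up for illustration; this is not how any existing forward model stores the data.

```python
def to_sparse(values):
    """Keep only (step, value) pairs where the value differs from the previous step."""
    changes = []
    previous = object()  # sentinel that never equals a real status
    for step, value in enumerate(values):
        if value != previous:
            changes.append((step, value))
            previous = value
    return changes


def to_dense(changes, n_steps):
    """Forward-fill the change points back to one value per time step."""
    dense = []
    current = None
    index = 0
    for step in range(n_steps):
        if index < len(changes) and changes[index][0] == step:
            current = changes[index][1]
            index += 1
        dense.append(current)
    return dense


# A well that is open, then shut in, then reopened (illustrative data).
status = ["OPEN"] * 500 + ["SHUT"] * 300 + ["OPEN"] * 200
sparse = to_sparse(status)

print(len(status), "dense values ->", len(sparse), "stored change points")
print(sparse)
```

For this toy series, 1000 dense values reduce to 3 stored change points, and `to_dense(sparse, len(status))` reproduces the original list.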

anders-kiaer · 2021-06-28T19:53:09Z

After #673 there is now a forward model dumping .arrow files (https://arrow.apache.org/overview/), which have shown crazy good read performance on RHEL when testing on real models.

anders-kiaer added the "Data input" and "help wanted" labels Aug 25, 2020

anders-kiaer assigned sigurdp Jun 17, 2021

anders-kiaer closed this as completed Jun 28, 2021

anders-kiaer moved this to Done 🏁 in Webviz Jan 9, 2023

anders-kiaer added this to Webviz Jan 9, 2023
