Error in flownet_save_iteration_analytics in CI/CD build #61

Closed
wouterjdb opened this issue Jun 3, 2020 · 11 comments · Fixed by #90
Labels
bug Something isn't working

Comments

@wouterjdb
Collaborator

In a successful GitHub workflow run, one finds the following error in the logs:

The script 'ExternalErtScript' caused an error while running:

Simulations completed.
Traceback (most recent call last):
  File "/home/runner/work/flownet/flownet/flownet_venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'DATE'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/runner/work/flownet/flownet/flownet_venv/bin/flownet_save_iteration_analytics", line 8, in <module>
    sys.exit(save_iteration_analytics())
  File "/home/runner/work/flownet/flownet/flownet_venv/lib/python3.7/site-packages/flownet/ahm/_ahm_iteration_analytics.py", line 420, in save_iteration_analytics
    df_sim = df_sim[df_sim["DATE"].isin(df_obs["DATE"])]
  File "/home/runner/work/flownet/flownet/flownet_venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2995, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/runner/work/flownet/flownet/flownet_venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'DATE'

The DATE key is missing in a pandas DataFrame. Could this be a result of the recent changes to the time shifting/resampling?

@wouterjdb wouterjdb added the bug Something isn't working label Jun 3, 2020
@tayloris
Collaborator

tayloris commented Jun 18, 2020

I now get the same message for the Egg model. I noticed that ahm_config.ert has, by default, the following parameters in the ERT analysis section:

"analysis": {
            "metric": "[RMSE]",
            "quantity": "[WOPR:BR-P-]",
            "start": "2001-04-01",
            "end": "2006-01-01",
            "outfile": "analysis_metrics_iteration",

I'm not familiar with this analysis parameter. Anyhow, I manually added the corresponding parameters for the Egg model to the configuration file:

  analysis:
      metric:   "[RMSE]"
      quantity: "[WOPR:PROD]"
      start:    "2011-06-15"
      end:      "2018-12-05"
      outfile:  "analysis_metrics_iteration"

This makes the error message disappear. Nevertheless, I have another problem, and I'm not sure whether it is related to this error: the OPM models of the network have no solution. They run, but the solution is constant, as if the flow-rate or BHP conditions were never imposed.

@wouterjdb
Collaborator Author

Thanks for the tip @tayloris. I did as suggested and now I'm getting:

All 1 active jobs complete and data loaded.
The script 'ExternalErtScript' caused an error while running:
Traceback (most recent call last):
  File "/home/media-unix/flownet/venv/bin/flownet_save_iteration_analytics", line 11, in <module>
    load_entry_point('flownet', 'console_scripts', 'flownet_save_iteration_analytics')()
  File "/home/media-unix/flownet/src/flownet/ahm/_ahm_iteration_analytics.py", line 417, in save_iteration_analytics
    df_obs = make_observation_dataframe(obs, key_list_data)
  File "/home/media-unix/flownet/src/flownet/ahm/_ahm_iteration_analytics.py", line 207, in make_observation_dataframe
    for value in obs.get(key)["observations"]:
TypeError: 'NoneType' object is not subscriptable

Is that by any chance the same error you got?

@tayloris
Collaborator

I got another error related to a well status at that time.

However, in your case, it may be that the "start" and "end" dates fall outside your simulation period, or that nothing matching the "quantity" parameter is found.

For instance, all producers in the EGG model are named PROD plus a number, which is why I specified [WOPR:PROD] in my case.
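
Both possibilities could be checked before ERT is started. The sketch below is only an illustration: it assumes the observations are available as a mapping from vector key to a list of dates, and that a `quantity` entry such as `WOPR:PROD` is matched as a key prefix. The function name `check_analysis_settings` and the data layout are hypothetical and not part of FlowNet.

```python
from datetime import date


def check_analysis_settings(obs_dates, quantity, start, end):
    """Return a list of human-readable problems (empty if none are found).

    obs_dates: dict mapping a vector key (e.g. "WOPR:PROD1") to a list of dates.
    quantity:  key prefix from the config (prefix matching is an assumption).
    start/end: analysis window as datetime.date objects.
    """
    problems = []

    # Keys whose name starts with the requested quantity prefix.
    matching = [key for key in obs_dates if key.startswith(quantity)]
    if not matching:
        problems.append(f"no observation key starts with {quantity!r}")

    # Matching keys that have no observation inside [start, end].
    for key in matching:
        if not any(start <= day <= end for day in obs_dates[key]):
            problems.append(f"{key} has no observations between {start} and {end}")

    return problems


# Hypothetical Egg-like data: producers are named PROD1, PROD2, ...
obs_dates = {
    "WOPR:PROD1": [date(2011, 7, 1), date(2012, 1, 1)],
    "WOPR:PROD2": [date(2011, 7, 1)],
}

# Default settings from ahm_config.ert versus the Egg-specific ones.
print(check_analysis_settings(obs_dates, "WOPR:BR-P-", date(2001, 4, 1), date(2006, 1, 1)))
print(check_analysis_settings(obs_dates, "WOPR:PROD", date(2011, 6, 15), date(2018, 12, 5)))
```

With this made-up data, the default settings report that no key matches, while the Egg-specific settings report no problems.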

@tayloris
Collaborator

If "analysis" is not specified in the configuration file, then FlowNet should not add any default values for "analysis", and it should also not run an analysis workflow in ERT.

@wouterjdb
Collaborator Author

I tried specifying other options, but I ended up with errors nonetheless. I also tried some try-except statements, but I can't yet fool the workflow into just ignoring what it can't find (it just fails one step later).

> If "analysis" is not specified in the configuration file, then FlowNet should not add any default values for "analysis", and it should also not run an analysis workflow in ERT.

I agree. 👍
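
A minimal sketch of that behaviour, assuming the parsed configuration is a plain dict and that the "analysis" section sits under an "ert" key (both are assumptions). The function names and the workflow name `SAVE_ITERATION_ANALYTICS` are placeholders, not FlowNet's actual code.

```python
def analytics_arguments(config):
    """Return the analytics settings if the user configured them, else None.

    `config` is a plain dict standing in for the parsed configuration file;
    the nesting under "ert" is an assumption.
    """
    analysis = (config.get("ert") or {}).get("analysis")
    if not analysis:
        return None  # nothing configured: no defaults are injected

    required = ("metric", "quantity", "start", "end", "outfile")
    missing = [name for name in required if name not in analysis]
    if missing:
        raise ValueError(f"analysis section is missing: {', '.join(missing)}")
    return dict(analysis)


def plan_workflows(config):
    """List the workflow names to hook into the run (names are placeholders)."""
    workflows = []
    if analytics_arguments(config) is not None:
        workflows.append("SAVE_ITERATION_ANALYTICS")
    return workflows


print(plan_workflows({"ert": {}}))  # → []
```

The point of the design is that the absence of the section is a valid state rather than something to paper over with defaults that only fit one model.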

@wouterjdb
Collaborator Author

Any ideas @edubarrosTNO ?

@wouterjdb
Collaborator Author

wouterjdb commented Jun 24, 2020

I found two problems:

  1. If observations do not exist for all requested vectors, the algorithm fails.
  2. If a well does not run all the way to the end of the simulation, NaN values are introduced into the algorithm (and it therefore crashes).
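
One possible way to guard against both problems is sketched below. It is not the fix that was merged: the data layout (a list of floats per vector) is simplified, the function names are hypothetical, and skipping NaN pairs is just one of several reasonable policies.

```python
import math


def mean_absolute_error(obs, sim):
    """MAE over the pairs where both values are real numbers (NaN pairs are skipped)."""
    pairs = [
        (o, s)
        for o, s in zip(obs, sim)
        if not (math.isnan(o) or math.isnan(s))
    ]
    if not pairs:
        return None  # nothing to compare
    return sum(abs(o - s) for o, s in pairs) / len(pairs)


def metrics_per_vector(observations, simulated, requested_keys):
    """Compute the MAE per requested key, skipping keys without data."""
    results = {}
    for key in requested_keys:
        entry = observations.get(key)
        if entry is None or key not in simulated:
            continue  # problem 1: no observations (or no simulation) for this vector
        # Problem 2 is handled inside mean_absolute_error by dropping NaN pairs.
        results[key] = mean_absolute_error(entry["observations"], simulated[key])
    return results


# Made-up data: WOPR:E has no observations, and WOPR:D stops before the last step.
observations = {"WOPR:D": {"observations": [1.0, 2.0, 3.0]}}
simulated = {"WOPR:D": [1.5, 2.0, float("nan")], "WOPR:E": [0.0, 0.0, 0.0]}
print(metrics_per_vector(observations, simulated, ["WOPR:D", "WOPR:E"]))  # → {'WOPR:D': 0.25}
```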

However, when I now run the code locally, I get different results each time, even though I run the same code on exactly the same data:

quantity,iteration,MAE
WOPR:D,1,0.1434708635998197
WOPR:D,1,0.12671471885705474
WOPR:D,1,0.1267147188570547
WOPR:D,1,0.10091584544572182
WOPR:D,1,0.2281198618816107

I still need to identify where the random generator is located... 😆

@edubarrosTNO
Contributor

@wouterjdb and @tayloris, I have just seen the history of messages here. Yes, I like the idea of not running the analysis workflow on ERT if no analysis parameters are provided. I will probably start with this one.

Regarding the random behavior, I have also observed that before. I looked into it a bit a few weeks ago, and I think it might be related to this part of the code, which reads the observations from the observation YAML file (if I remember correctly, I got this bit of code from you, @wouterjdb, quite some time ago):

with open(args.yamlobs) as stream:
    obs = {
        item.pop("key"): item
        for item in yaml.safe_load(stream).get("smry", [dict()])
    }

I think this is loading the measurement data into a structure that doesn't preserve order (causing random order). I thought that the remainder of the code was carefully handling this, but apparently it is not. I'll try to find out more about it and address this.
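
If iteration order does turn out to matter, one defensive option is to loop over the keys in sorted order, which is the same on every run and every Python version. The sketch below works on an already-parsed structure (a literal dict standing in for the output of `yaml.safe_load`), and the helper name `index_observations` is hypothetical. It also uses an empty list as the default: with `[dict()]`, a file without a "smry" entry makes `item.pop("key")` raise a KeyError on the empty placeholder dict.

```python
def index_observations(parsed_yaml):
    """Map each summary key to its entry, similar to the comprehension above.

    Two deliberate differences (suggestions, not the project's code):
    - the default is an empty list, so a file without "smry" yields {};
    - entries are copied, so pop() does not mutate the parsed structure.
    """
    obs = {}
    for item in parsed_yaml.get("smry", []):
        entry = dict(item)
        obs[entry.pop("key")] = entry
    return obs


# Stand-in for the parsed observation file.
parsed = {
    "smry": [
        {"key": "WOPR:D", "observations": [{"value": 1.0}]},
        {"key": "WOPR:A", "observations": [{"value": 2.0}]},
    ]
}
obs = index_observations(parsed)

# Sorted keys give a deterministic order independent of how the dict was built.
for key in sorted(obs):
    print(key, obs[key]["observations"])
```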

@wouterjdb
Collaborator Author

Yes, a dict has an arbitrary order and looping over a dict is therefore not a good idea. The key-value pairs are however what they are, so you can use a dict as a look-up table.

There is also a collection which is called OrderedDict, which does preserve order.

@anders-kiaer
Collaborator

dict itself on Python3.6+ preserves insertion order (this was not the case on <= 3.5). If you want to be formal, preserved insertion order is not in the Python spec before 3.7, but it is already implemented in CPython 3.6 as an "implementation detail" (which most/all FlowNet users will be using).
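
A small self-contained check of this, using made-up keys. It also shows one remaining difference between the two types: plain dicts compare equal regardless of insertion order, whereas two OrderedDicts only compare equal if their order matches.

```python
from collections import OrderedDict

# Insert keys in a non-alphabetical order.
d = {}
for key in ["WOPR:D", "WOPR:A", "WOPR:C"]:
    d[key] = len(d)

# Iteration follows insertion order.
print(list(d))  # → ['WOPR:D', 'WOPR:A', 'WOPR:C'] on Python 3.7+

# Equality ignores order for dict, but not between two OrderedDicts.
print({"a": 1, "b": 2} == {"b": 2, "a": 1})  # → True
print(OrderedDict(a=1, b=2) == OrderedDict(b=2, a=1))  # → False
```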

@wouterjdb
Collaborator Author

> dict itself on Python3.6+ preserves insertion order (this was not the case on <= 3.5). If you want to be formal, preserved insertion order is not in the Python spec before 3.7, but it is already implemented in CPython 3.6 as an "implementation detail" (which most/all FlowNet users will be using).

Ah, good to know. Wasn't aware of that change.


4 participants