[IMPROVE] Can't complete openbb.build() when using Spark due to failure in renaming temporary cache file #6694

eram576 · 2024-09-25T03:49:10Z

After installing the multpl OpenBB extension and building the Python interface as recommended by the docs, the build command fails because the temporary cache file cannot be renamed. Pulling data from the multpl provider subsequently fails as well.

I only encounter this issue on a Spark cluster (specifically in Databricks). When I run the same code on my local desktop there are no issues: the build succeeds and I can pull data from the multpl provider as usual. I believe the failure on the Spark cluster is due to multiple processes running simultaneously through multiple ruff invocations, as described in this issue.

I am wondering whether renaming the cache file is critical to the build step, or whether a failure there can be ignored so that the build still succeeds and I can continue using obb to pull data from the provider.

To Reproduce

%pip install openbb openbb-multpl 

# To restart Python using Databricks utils cmd
dbutils.library.restartPython()

import openbb
openbb.build()

# To restart Python using Databricks utils cmd
dbutils.library.restartPython()

from openbb import obb

obb.index.sp500_multiples(series_name='pe_month', provider='multpl')

Screenshots

Desktop (see more details here):

Apache Spark 3.5.0, Scala 2.12
Operating System: Ubuntu 22.04.3 LTS
Java: Zulu 8.74.0.17-CA-linux64
Python: 3.10.12
R: 4.3.1
Delta Lake: 3.1.0

jmaslek · 2024-09-25T12:43:40Z

If you try openbb.build(lint=False) - does that work?

deeleeramone · 2024-09-25T16:34:53Z

I think this is likely a limitation related to the specific host service. Multiprocessing can turn into a nightmare pretty fast. It might help if you can disable that in the host service. Otherwise, dynamic command execution should get around it.

Ruff has apparently resolved the referenced issue, but this does not appear to be the same error. "No such file or directory" suggests it is a file system error.

Even if Ruff fails, the static assets will still have been generated and built.

The error message - "This portal is not running" - implies that async event loops are not being handled somewhere in the pipeline. This can be caused by $PATH issues that introduce system packages into the environment instead of isolating the environment completely and only calling packages from within it. This is often a problem with an incorrectly configured Anaconda Navigator installation.

See this for a comparable to Databricks. You may need to add --force-reinstall and --no-cache-dir when invoking pip install to specifically override any pre-existing wheels.
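
As a rough illustration of the reinstall suggested above, here is a minimal sketch that runs pip from the active interpreter with both flags. The helper names `pip_reinstall_command` and `reinstall` are hypothetical, and the package list is taken from the reproduction steps. In a Databricks notebook the equivalent would presumably be adding the same two flags to the existing `%pip install` line.

```python
import subprocess
import sys


def pip_reinstall_command(packages):
    """Build a pip command that ignores cached wheels and reinstalls the packages."""
    return [
        sys.executable, "-m", "pip", "install",
        "--force-reinstall",  # reinstall even if a version is already present
        "--no-cache-dir",     # do not reuse previously downloaded or built wheels
        *packages,
    ]


def reinstall(packages):
    """Run the command; raises CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_reinstall_command(packages))
```

Calling `reinstall(["openbb", "openbb-multpl"])` performs the install; Python still needs to be restarted afterwards, as in the reproduction steps.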

Potential Solution

In situations like this, it will be better to run functions using CommandRunner().run because it does not require the OpenBB app factory to run commands. https://docs.openbb.co/platform/user_guides/dynamic_command_execution

For Streamlit Cloud apps, you have to do it this way because they do not provide access to the file system site-packages where the static assets are stored.

This is async and you need to manage this carefully throughout the entire pipeline.

You might want to add this to the code after the import blocks:

import nest_asyncio

nest_asyncio.apply()
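
Putting the two pieces together, a minimal sketch of what this might look like for the call in this issue. The route string "/index/sp500_multiples" and the placement of series_name under standard_params are assumptions based on the linked guide and the obb.index.sp500_multiples call above; depending on the installed version, series_name may need to go in extra_params instead. The helper names `build_request`, `fetch_sp500_multiples` and `main` are hypothetical.

```python
import asyncio


def build_request(series_name: str, provider: str) -> dict:
    """Assemble the keyword arguments passed to CommandRunner().run."""
    return {
        "provider_choices": {"provider": provider},
        # Assumption: series_name is a standard parameter of this route.
        "standard_params": {"series_name": series_name},
        "extra_params": {},
    }


async def fetch_sp500_multiples(series_name: str = "pe_month", provider: str = "multpl"):
    # Imported lazily so the module loads even where openbb-core is missing.
    from openbb_core.app.command_runner import CommandRunner

    runner = CommandRunner()
    # Assumed route path, mirroring obb.index.sp500_multiples.
    return await runner.run(
        "/index/sp500_multiples", **build_request(series_name, provider)
    )


def main():
    import nest_asyncio  # third-party: pip install nest_asyncio

    # Allow asyncio.run() inside a notebook that already has a running loop.
    nest_asyncio.apply()
    output = asyncio.run(fetch_sp500_multiples())
    print(output.results)
```

Calling `main()` in a notebook cell after the install and restart steps should then pull the data without going through the generated static assets.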

eram576 · 2024-09-26T03:43:50Z

Unfortunately, openbb.build(lint=False) doesn't work either: it results in the same error, and obb.index.sp500_multiples(series_name='pe_month', provider='multpl') doesn't work afterwards.

eram576 · 2024-09-26T03:46:43Z

This works, thanks a lot! I am able to use CommandRunner().run and async functions to pull data from the provider.

eram576 changed the title ~~[IMPROVE] Can't complete openbb.build() when using Spark due to failure to rename temporary cache file~~ [IMPROVE] Can't complete openbb.build() when using Spark due to failure in renaming temporary cache file Sep 25, 2024

OpenBB-finance locked and limited conversation to collaborators Sep 26, 2024

deeleeramone converted this issue into discussion #6700 Sep 26, 2024
