This issue was moved to a discussion.

[IMPROVE] Can't complete openbb.build() when using Spark due to failure in renaming temporary cache file #6694

Closed
eram576 opened this issue Sep 25, 2024 · 4 comments

Comments

@eram576

eram576 commented Sep 25, 2024

After installing the multpl OpenBB extension, I rebuild the Python interface as the docs recommend. The build command fails because the temporary cache file cannot be renamed, and pulling data from the multpl provider then fails as well.

I only see this on a Spark cluster (specifically in Databricks). When I run the same code on my local desktop, the build succeeds and I can pull data from the multpl provider as usual. I believe the failure on the Spark cluster is caused by multiple processes running multiple ruff invocations simultaneously, as described in this issue.

Is renaming the cache file critical to the build step, or can a failed rename be ignored so that the build still succeeds and I can keep using obb to pull data from the provider?

To Reproduce

%pip install openbb openbb-multpl 

# To restart Python using Databricks utils cmd
dbutils.library.restartPython()

import openbb
openbb.build()

# To restart Python using Databricks utils cmd
dbutils.library.restartPython()

from openbb import obb

obb.index.sp500_multiples(series_name='pe_month', provider='multpl')

Screenshots
[Two screenshots; images not preserved.]

Desktop (see more details here):

  • Apache Spark 3.5.0, Scala 2.12
  • Operating System: Ubuntu 22.04.3 LTS
  • Java: Zulu 8.74.0.17-CA-linux64
  • Python: 3.10.12
  • R: 4.3.1
  • Delta Lake: 3.1.0
@eram576 eram576 changed the title [IMPROVE] Can't complete openbb.build() when using Spark due to failure to rename temporary cache file [IMPROVE] Can't complete openbb.build() when using Spark due to failure in renaming temporary cache file Sep 25, 2024
@jmaslek
Collaborator

jmaslek commented Sep 25, 2024

If you try openbb.build(lint=False) - does that work?

@deeleeramone
Contributor

deeleeramone commented Sep 25, 2024

I think this is likely a limitation related to the specific host service. Multiprocessing can turn into a nightmare pretty fast. It might help if you can disable that in the host service. Otherwise, dynamic command execution should get around it.

Ruff has apparently resolved the referenced issue, but this does not appear to be the same error. "No such file or directory" suggests it is a file system error.
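As a generic illustration of how a rename can end in "No such file or directory" (not a diagnosis of what Ruff or Databricks actually does here), the sketch below writes a temporary file, renames it once, and then attempts the same rename again, as a second worker sharing the same temporary path might. The file names are hypothetical.

```python
import errno
import os
import tempfile


def rename_twice(directory):
    """Rename a temp file, then try the same rename again.

    Returns the errno of the second attempt, or None if it unexpectedly succeeds.
    """
    src = os.path.join(directory, "cache.tmp")  # hypothetical temp cache name
    dst = os.path.join(directory, "cache.bin")  # hypothetical final cache name
    with open(src, "w") as handle:
        handle.write("cached data")

    os.replace(src, dst)  # the first worker moves the temp file into place
    try:
        os.replace(src, dst)  # the second worker finds the temp file gone
    except FileNotFoundError as exc:
        return exc.errno
    return None


def demo():
    # Run the scenario in a throwaway directory.
    with tempfile.TemporaryDirectory() as directory:
        return rename_twice(directory)


if __name__ == "__main__":
    print(demo() == errno.ENOENT)
```

The second rename fails because its source no longer exists, which is the same class of file system error regardless of which tool issued it.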

Even if Ruff fails, the static assets will still have been generated and built.

The error message "This portal is not running" implies that async event loops are not being handled somewhere in the pipeline. This can be caused by $PATH issues that bring system packages into the environment, instead of the environment being fully isolated and calling only packages from within it. This is often a problem with an incorrectly configured Anaconda Navigator installation.

See this for a comparable case to Databricks. You may need to add --force-reinstall and --no-cache-dir when invoking pip install to specifically override any pre-existing wheels.
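A minimal sketch of that reinstall from Python, assuming you want pip to run under the same interpreter as the notebook so the packages land in the active environment rather than a system one. The helper names are hypothetical; the two flags are the ones mentioned above.

```python
import subprocess
import sys


def pip_reinstall_command(*packages):
    """Build a pip command bound to the currently running interpreter."""
    return [
        sys.executable, "-m", "pip", "install",
        "--force-reinstall",  # reinstall even if a version is already present
        "--no-cache-dir",     # do not reuse previously cached wheels
        *packages,
    ]


def reinstall(*packages):
    """Run the reinstall and raise CalledProcessError if pip fails."""
    subprocess.run(pip_reinstall_command(*packages), check=True)


if __name__ == "__main__":
    # Only prints the command; call reinstall(...) to actually run it.
    print(" ".join(pip_reinstall_command("openbb", "openbb-multpl")))
```

In a Databricks notebook, passing the same two flags to the %pip install line from the reproduction steps should have the equivalent effect.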

Potential Solution

In situations like this, it is better to run functions with CommandRunner().run, because it does not need the OpenBB app factory to run commands. See https://docs.openbb.co/platform/user_guides/dynamic_command_execution
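A rough sketch of what that could look like for the call in this issue, based on the pattern in the linked guide. The import path, the route string "/index/sp500_multiples", and the placement of series_name under standard_params are assumptions inferred from obb.index.sp500_multiples, so check them against the guide for your installed version. The helper names are hypothetical.

```python
import asyncio


def build_kwargs(provider, **standard_params):
    """Assemble keyword arguments for CommandRunner().run (layout assumed from the guide)."""
    return {
        "provider_choices": {"provider": provider},
        "standard_params": dict(standard_params),
        "extra_params": {},
    }


async def fetch(route, provider, **standard_params):
    """Run one command without going through the generated `obb` interface."""
    # Imported lazily so this module still loads where OpenBB is not installed.
    from openbb_core.app.command_runner import CommandRunner  # assumed import path

    runner = CommandRunner()
    return await runner.run(route, **build_kwargs(provider, **standard_params))


if __name__ == "__main__":
    # Assumed route for obb.index.sp500_multiples; requires openbb and openbb-multpl.
    result = asyncio.run(
        fetch("/index/sp500_multiples", "multpl", series_name="pe_month")
    )
    print(result)
```

Inside a notebook cell, where an event loop is usually already running, you would await fetch(...) directly instead of calling asyncio.run.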

For Streamlit Cloud apps, you have to do it this way because they do not provide access to the file system site-packages where the static assets are stored.

CommandRunner().run is async, so you need to handle awaiting carefully throughout the entire pipeline.

You might want to add this to the code after the import blocks:

import nest_asyncio

nest_asyncio.apply()

@eram576
Author

eram576 commented Sep 26, 2024

> If you try openbb.build(lint=False) - does that work?

Unfortunately this doesn't work. It results in the same error, and obb.index.sp500_multiples(series_name='pe_month', provider='multpl') doesn't work afterwards either.

@eram576
Author

eram576 commented Sep 26, 2024

This works, thanks a lot! I am able to use CommandRunner().run and async functions to pull data from the provider.

@OpenBB-finance OpenBB-finance locked and limited conversation to collaborators Sep 26, 2024
@deeleeramone deeleeramone converted this issue into discussion #6700 Sep 26, 2024
