
Update _SparkBackend.fetch() to return iterator instead of list #62

Merged: 2 commits into main, Mar 21, 2024

Conversation

@qziyuan (Contributor) commented Mar 20, 2024

`_SparkBackend.fetch()` should return an iterator instead of a list.

github-actions bot commented Mar 20, 2024

✅ 18/18 passed, 1 skipped, 6m52s total

Running from acceptance #31

codecov bot commented Mar 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.95%. Comparing base (9b5450e) to head (5273264).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #62   +/-   ##
=======================================
  Coverage   82.95%   82.95%           
=======================================
  Files           7        7           
  Lines         534      534           
  Branches      105      105           
=======================================
  Hits          443      443           
  Misses         55       55           
  Partials       36       36           

☔ View full report in Codecov by Sentry.

```diff
@@ -200,7 +200,7 @@ def execute(self, sql: str) -> None:
     def fetch(self, sql: str) -> Iterator[Row]:
         logger.debug(f"[spark][fetch] {self._only_n_bytes(sql, self._debug_truncate_bytes)}")
         try:
-            return self._spark.sql(sql).collect()
+            return iter(self._spark.sql(sql).collect())
```
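For illustration, a minimal standalone sketch of why the `iter(...)` wrapper satisfies the `Iterator[Row]` annotation. The Spark session and `Row` type are replaced here with plain Python stand-ins; only the wrapping pattern is the same.

```python
from collections.abc import Iterator

def fetch(sql: str) -> Iterator[dict]:
    # Stand-in for self._spark.sql(sql).collect(), which returns a list.
    collected = [{"id": 1}, {"id": 2}]
    # A bare list would violate the Iterator annotation; iter() fixes that.
    return iter(collected)

rows = fetch("SELECT * FROM t")
assert isinstance(rows, Iterator)  # iter() yields a true Iterator
first = next(rows)                 # next() works only on an Iterator
print(first)                       # {'id': 1}
```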
A collaborator commented:

Why do we need to do this, by the way? Shouldn't we just change the result type to Iterable?

@qziyuan (Contributor, Author) commented Mar 21, 2024:

It just makes it consistent with the current return type of `fetch` as defined in `SqlBackend`, `StatementExecutionBackend`, and `MockBackend`.
Alternatively, we could change the result type to `Iterable` in all of these places, but would that break static analysis in downstream code that already assumes `fetch` returns an `Iterator` and calls methods like `next()`?
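The distinction under discussion can be shown in plain Python: a list satisfies `Iterable` but not `Iterator`, so downstream code that calls `next()` directly depends on the narrower type.

```python
from collections.abc import Iterable, Iterator

data = [1, 2, 3]
assert isinstance(data, Iterable)        # a list is Iterable...
assert not isinstance(data, Iterator)    # ...but it is not an Iterator
# next(data) would raise TypeError: 'list' object is not an iterator

it = iter(data)
assert isinstance(it, Iterator)          # every Iterator is also Iterable
assert next(it) == 1                     # next() is guaranteed only on Iterator
```

This is why widening the annotation to `Iterable` would be a breaking change for any caller that was already treating the result as an `Iterator`.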

@nfx nfx merged commit 20f3509 into main Mar 21, 2024
9 checks passed
@nfx nfx deleted the fix/return_iter_spark_backend_fetch branch March 21, 2024 09:06
nfx added a commit that referenced this pull request Mar 25, 2024
* Fixed `Builder` object is not callable error ([#67](#67)). In this release, we have made an enhancement to the `Backends` class in the `databricks/labs/lsql/backends.py` file. The `DatabricksSession.builder()` method call in the `__init__` method has been changed to `DatabricksSession.builder`. This update uses the `builder` attribute to create a new instance of `DatabricksSession` without calling it like a function. The `sdk_config` method is then used to configure the instance with the required settings. Finally, the `getOrCreate` method is utilized to obtain a `SparkSession` object, which is then passed as a parameter to the parent class constructor. This modification simplifies the code and eliminates the error caused by treating the `builder` attribute as a callable object. Software engineers may benefit from this change by having a more streamlined and error-free codebase when working with the open-source library.
* Prevent silencing of `pylint` ([#65](#65)). In this release, we have introduced a new job, "no-lint-disabled", to the GitHub Actions workflow for the repository. This job runs on the latest Ubuntu version and checks out the codebase with a full history. It verifies that no new instances of code suppressing `pylint` checks have been added, by filtering the differences between the current branch and the main branch for new lines of code, and then checking if any of those new lines contain a `pylint` disable comment. If any such lines are found, the job will fail and print a message indicating the offending lines of code, thereby ensuring that the codebase maintains a consistent level of quality by not allowing linting checks to be bypassed.
* Updated `_SparkBackend.fetch()` to return iterator instead of list ([#62](#62)). In this release, the `fetch()` method of the `_SparkBackend` class has been updated to return an iterator instead of a list, which can result in reduced memory usage and improved performance, as the results of the SQL query can now be processed one element at a time. A new exception has been introduced to wrap any exceptions that occur during query execution, providing better debugging and error handling capabilities. The `test_runtime_backend_fetch()` unit test has been updated to reflect this change, and users of the `fetch()` method should be aware that it now returns an iterator and must be consumed to obtain the desired data. Thorough testing is recommended to ensure that the updated method still meets the needs of the application.
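On the consumer side, the change means `fetch()` results are exhausted after a single pass. A hedged sketch of the new calling pattern, using a hypothetical stand-in for the real backend method:

```python
from collections.abc import Iterator

def fetch(sql: str) -> Iterator[tuple]:
    # Hypothetical stand-in for SqlBackend.fetch(): the real method runs
    # the query on Spark and returns the collected rows wrapped in iter().
    return iter([("a", 1), ("b", 2)])

rows = fetch("SELECT key, value FROM t")
materialized = list(rows)   # materialize once if repeated access is needed
assert list(rows) == []     # the iterator is exhausted after the first pass
print(materialized)         # [('a', 1), ('b', 2)]
```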
@nfx nfx mentioned this pull request Mar 25, 2024