
Update _SparkBackend.fetch() to return iterator instead of list #62

Merged: 2 commits into main, Mar 21, 2024

Conversation

@qziyuan (Contributor) commented Mar 20, 2024

`_SparkBackend.fetch()` should return an iterator instead of a list.

github-actions bot commented Mar 20, 2024

✅ 18/18 passed, 1 skipped, 6m52s total

Running from acceptance #31

codecov bot commented Mar 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.95%. Comparing base (9b5450e) to head (5273264).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #62   +/-   ##
=======================================
  Coverage   82.95%   82.95%           
=======================================
  Files           7        7           
  Lines         534      534           
  Branches      105      105           
=======================================
  Hits          443      443           
  Misses         55       55           
  Partials       36       36           

☔ View full report in Codecov by Sentry.

```diff
@@ -200,7 +200,7 @@ def execute(self, sql: str) -> None:
     def fetch(self, sql: str) -> Iterator[Row]:
         logger.debug(f"[spark][fetch] {self._only_n_bytes(sql, self._debug_truncate_bytes)}")
         try:
-            return self._spark.sql(sql).collect()
+            return iter(self._spark.sql(sql).collect())
```
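For illustration, a minimal standalone sketch of why the `iter(...)` wrapper satisfies the `Iterator[Row]` annotation. The Spark session and `Row` type are replaced here with plain Python stand-ins; only the wrapping pattern is the same.

```python
from collections.abc import Iterator

def fetch(sql: str) -> Iterator[dict]:
    # Stand-in for self._spark.sql(sql).collect(), which returns a list.
    collected = [{"id": 1}, {"id": 2}]
    # A bare list would violate the Iterator annotation; iter() fixes that.
    return iter(collected)

rows = fetch("SELECT * FROM t")
assert isinstance(rows, Iterator)  # iter() yields a true Iterator
first = next(rows)                 # next() works only on an Iterator
print(first)                       # {'id': 1}
```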
A collaborator commented:

Why do we need to do this, by the way? Shouldn't we just change the result type to Iterable?

@qziyuan (Contributor, Author) commented Mar 21, 2024:

It just makes it consistent with the current return type of `fetch` as defined in `SqlBackend`, `StatementExecutionBackend`, and `MockBackend`.
Alternatively, we could change the result type to `Iterable` in all of these places, but would that break static analysis in downstream code that already assumes `fetch` returns an `Iterator` and calls methods like `next()`?
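The distinction under discussion can be shown in plain Python: a list satisfies `Iterable` but not `Iterator`, so downstream code that calls `next()` directly depends on the narrower type.

```python
from collections.abc import Iterable, Iterator

data = [1, 2, 3]
assert isinstance(data, Iterable)        # a list is Iterable...
assert not isinstance(data, Iterator)    # ...but it is not an Iterator
# next(data) would raise TypeError: 'list' object is not an iterator

it = iter(data)
assert isinstance(it, Iterator)          # every Iterator is also Iterable
assert next(it) == 1                     # next() is guaranteed only on Iterator
```

This is why widening the annotation to `Iterable` would be a breaking change for any caller that was already treating the result as an `Iterator`.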

@nfx nfx merged commit 20f3509 into main Mar 21, 2024
9 checks passed
@nfx nfx deleted the fix/return_iter_spark_backend_fetch branch March 21, 2024 09:06
nfx added a commit that referenced this pull request Mar 25, 2024
* Fixed `Builder` object is not callable error ([#67](#67)). In this release, we have made an enhancement to the `Backends` class in the `databricks/labs/lsql/backends.py` file. The `DatabricksSession.builder()` method call in the `__init__` method has been changed to `DatabricksSession.builder`. This update uses the `builder` attribute to create a new instance of `DatabricksSession` without calling it like a function. The `sdk_config` method is then used to configure the instance with the required settings. Finally, the `getOrCreate` method is utilized to obtain a `SparkSession` object, which is then passed as a parameter to the parent class constructor. This modification simplifies the code and eliminates the error caused by treating the `builder` attribute as a callable object. Software engineers may benefit from this change by having a more streamlined and error-free codebase when working with the open-source library.
* Prevent silencing of `pylint` ([#65](#65)). In this release, we have introduced a new job, "no-lint-disabled", to the GitHub Actions workflow for the repository. This job runs on the latest Ubuntu version and checks out the codebase with a full history. It verifies that no new instances of code suppressing `pylint` checks have been added, by filtering the differences between the current branch and the main branch for new lines of code, and then checking if any of those new lines contain a `pylint` disable comment. If any such lines are found, the job will fail and print a message indicating the offending lines of code, thereby ensuring that the codebase maintains a consistent level of quality by not allowing linting checks to be bypassed.
* Updated `_SparkBackend.fetch()` to return iterator instead of list ([#62](#62)). In this release, the `fetch()` method of the `_SparkBackend` class has been updated to return an iterator instead of a list, which can result in reduced memory usage and improved performance, as the results of the SQL query can now be processed one element at a time. A new exception has been introduced to wrap any exceptions that occur during query execution, providing better debugging and error handling capabilities. The `test_runtime_backend_fetch()` unit test has been updated to reflect this change, and users of the `fetch()` method should be aware that it now returns an iterator and must be consumed to obtain the desired data. Thorough testing is recommended to ensure that the updated method still meets the needs of the application.
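On the consumer side, the change means `fetch()` results are exhausted after a single pass. A hedged sketch of the new calling pattern, using a hypothetical stand-in for the real backend method:

```python
from collections.abc import Iterator

def fetch(sql: str) -> Iterator[tuple]:
    # Hypothetical stand-in for SqlBackend.fetch(): the real method runs
    # the query on Spark and returns the collected rows wrapped in iter().
    return iter([("a", 1), ("b", 2)])

rows = fetch("SELECT key, value FROM t")
materialized = list(rows)   # materialize once if repeated access is needed
assert list(rows) == []     # the iterator is exhausted after the first pass
print(materialized)         # [('a', 1), ('b', 2)]
```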
@nfx nfx mentioned this pull request Mar 25, 2024