Commit

Merge pull request pandas-dev#6 from dimastbk/issue-50395
bump python-calamine to 0.1.0
kostyafarber authored Mar 29, 2023
2 parents a0d4193 + 0a431c5 commit 745cd09
Showing 18 changed files with 54 additions and 57 deletions.
2 changes: 1 addition & 1 deletion ci/deps/actions-310.yaml
@@ -55,5 +55,5 @@ dependencies:
- zstandard>=0.15.2

- pip:
- tzdata>=2022.1
- python-calamine
- tzdata>=2022.1
1 change: 1 addition & 0 deletions ci/deps/actions-311.yaml
@@ -55,4 +55,5 @@ dependencies:
- zstandard>=0.15.2

- pip:
- python-calamine>=0.1.0
- tzdata>=2022.1
2 changes: 1 addition & 1 deletion ci/deps/actions-38-minimum_versions.yaml
@@ -59,5 +59,5 @@ dependencies:

- pip:
- pyqt5==5.15.1
- python-calamine==0.0.8
- python-calamine==0.1.0
- tzdata==2022.1
2 changes: 1 addition & 1 deletion ci/deps/actions-38.yaml
@@ -55,5 +55,5 @@ dependencies:
- zstandard>=0.15.2

- pip:
- python-calamine
- tzdata>=2022.1
- python-calamine
2 changes: 1 addition & 1 deletion ci/deps/actions-39.yaml
@@ -55,5 +55,5 @@ dependencies:
- zstandard>=0.15.2

- pip:
- python-calamine
- tzdata>=2022.1
- python-calamine
6 changes: 3 additions & 3 deletions ci/deps/circle-38-arm64.yaml
@@ -54,6 +54,6 @@ dependencies:
- xlrd>=2.0.1
- xlsxwriter>=1.4.3
- zstandard>=0.15.2
- pip:
- python-calamine

- pip:
- python-calamine
1 change: 1 addition & 0 deletions doc/source/getting_started/install.rst
@@ -345,6 +345,7 @@ xlrd 2.0.1 excel Reading Excel
xlsxwriter 1.4.3 excel Writing Excel
openpyxl 3.0.7 excel Reading / writing for xlsx files
pyxlsb 1.0.8 excel Reading for xlsb files
python-calamine 0.1.0 excel Reading for xls/xlsx/xlsb/ods files
========================= ================== =============== =============================================================

HTML
3 changes: 2 additions & 1 deletion doc/source/user_guide/io.rst
@@ -3420,7 +3420,8 @@ Excel files
The :func:`~pandas.read_excel` method can read Excel 2007+ (``.xlsx``) files
using the ``openpyxl`` Python module. Excel 2003 (``.xls``) files
can be read using ``xlrd``. Binary Excel (``.xlsb``)
files can be read using ``pyxlsb``.
files can be read using ``pyxlsb``. All of these formats can also be read using ``python-calamine``,
but this library has some limitations; for example, it cannot detect dates in most formats.
The :meth:`~DataFrame.to_excel` instance method is used for
saving a ``DataFrame`` to Excel. Generally the semantics are
similar to working with :ref:`csv<io.read_csv_table>` data.
1 change: 0 additions & 1 deletion doc/source/whatsnew/v2.0.0.rst
@@ -275,7 +275,6 @@ Other enhancements
- Improved error message in :func:`to_datetime` for non-ISO8601 formats, informing users about the position of the first error (:issue:`50361`)
- Improved error message when trying to align :class:`DataFrame` objects (for example, in :func:`DataFrame.compare`) to clarify that "identically labelled" refers to both index and columns (:issue:`50083`)
- Performance improvement in :func:`to_datetime` when format is given or can be inferred (:issue:`50465`)
- Added ``calamine`` as an engine to ``read_excel`` (:issue: ``50395``)
- Added support for :meth:`Index.min` and :meth:`Index.max` for pyarrow string dtypes (:issue:`51397`)
- Added :meth:`DatetimeIndex.as_unit` and :meth:`TimedeltaIndex.as_unit` to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (:issue:`50616`)
- Added :meth:`Series.dt.unit` and :meth:`Series.dt.as_unit` to convert to different resolutions; supported resolutions are "s", "ms", "us", and "ns" (:issue:`51223`)
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.0.rst
@@ -37,6 +37,7 @@ Other enhancements
- Improve error message when having incompatible columns using :meth:`DataFrame.merge` (:issue:`51861`)
- Improved error message when creating a DataFrame with empty data (0 rows), no index and an incorrect number of columns. (:issue:`52084`)
- :meth:`arrays.SparseArray.map` now supports ``na_action`` (:issue:`52096`).
- Added ``calamine`` as an engine to ``read_excel`` (:issue:`50395`)

.. ---------------------------------------------------------------------------
.. _whatsnew_210.notable_bug_fixes:
2 changes: 1 addition & 1 deletion pandas/compat/_optional.py
@@ -37,7 +37,7 @@
"pyarrow": "7.0.0",
"pyreadstat": "1.1.2",
"pytest": "7.0.0",
"python-calamine": "0.0.8",
"python-calamine": "0.1.0",
"pyxlsb": "1.0.8",
"s3fs": "2021.08.0",
"scipy": "1.7.1",
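The bumped entry above records the lowest `python-calamine` version that is accepted. As a rough, standalone sketch of what such a minimum-version table implies (this is not the code in `pandas/compat/_optional.py`; `MINIMUM_VERSIONS`, `parse_version`, and `meets_minimum` are hypothetical names, and only plain dotted numeric versions are handled):

```python
from __future__ import annotations

# Hypothetical table mirroring the single entry bumped in this commit.
MINIMUM_VERSIONS = {"python-calamine": "0.1.0"}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '0.1.0' into (0, 1, 0)."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(package: str, installed: str) -> bool:
    """Return True if `installed` is at least the recorded minimum."""
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[package])


print(meets_minimum("python-calamine", "0.0.8"))  # False: the old pin is now too old
print(meets_minimum("python-calamine", "0.1.0"))  # True
```

Tuple comparison is used so that `0.1.0` sorts above `0.0.8`; pre-release or local version suffixes would need a real version parser.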
10 changes: 5 additions & 5 deletions pandas/core/config_init.py
@@ -503,11 +503,11 @@ def use_inf_as_na_cb(key) -> None:
auto, {others}.
"""

_xls_options = ["xlrd"]
_xlsm_options = ["xlrd", "openpyxl"]
_xlsx_options = ["xlrd", "openpyxl"]
_ods_options = ["odf"]
_xlsb_options = ["pyxlsb"]
_xls_options = ["xlrd", "calamine"]
_xlsm_options = ["xlrd", "openpyxl", "calamine"]
_xlsx_options = ["xlrd", "openpyxl", "calamine"]
_ods_options = ["odf", "calamine"]
_xlsb_options = ["pyxlsb", "calamine"]


with cf.config_prefix("io.excel.xls"):
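To make the effect of the new option lists concrete, here is a minimal sketch of checking a reader choice against the per-extension lists. The lists are copied from the hunk above; `READER_OPTIONS` and `validate_reader` are hypothetical names, and this is not how pandas registers or validates its options.

```python
# Per-extension reader options as they read after this commit.
READER_OPTIONS = {
    "xls": ["xlrd", "calamine"],
    "xlsm": ["xlrd", "openpyxl", "calamine"],
    "xlsx": ["xlrd", "openpyxl", "calamine"],
    "ods": ["odf", "calamine"],
    "xlsb": ["pyxlsb", "calamine"],
}


def validate_reader(ext: str, engine: str) -> str:
    """Return `engine` if it is 'auto' or listed for `ext`, else raise ValueError."""
    allowed = READER_OPTIONS[ext]
    if engine != "auto" and engine not in allowed:
        raise ValueError(
            f"{engine!r} is not a reader for .{ext}; expected 'auto' or one of {allowed}"
        )
    return engine


print(validate_reader("ods", "calamine"))  # calamine
print(validate_reader("xlsb", "auto"))     # auto
```

After this change `calamine` is the only engine name that appears in all five lists.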
14 changes: 9 additions & 5 deletions pandas/io/excel/_base.py
@@ -149,13 +149,15 @@
of dtype conversion.
engine : str, default None
If io is not a buffer or path, this must be set to identify io.
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb".
Supported engines: "xlrd", "openpyxl", "odf", "pyxlsb", "calamine".
Engine compatibility :
- "xlrd" supports old-style Excel files (.xls).
- "openpyxl" supports newer Excel file formats.
- "odf" supports OpenDocument file formats (.odf, .ods, .odt).
- "pyxlsb" supports Binary Excel files.
- "calamine" supports Excel (.xls, .xlsx, .xlsm, .xlsb)
and OpenDocument (.ods) file formats.
.. versionchanged:: 1.2.0
The engine `xlrd <https://xlrd.readthedocs.io/en/latest/>`_
@@ -375,7 +377,7 @@ def read_excel(
| Callable[[str], bool]
| None = ...,
dtype: DtypeArg | None = ...,
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb"] | None = ...,
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb", "calamine"] | None = ...,
converters: dict[str, Callable] | dict[int, Callable] | None = ...,
true_values: Iterable[Hashable] | None = ...,
false_values: Iterable[Hashable] | None = ...,
@@ -414,7 +416,7 @@ def read_excel(
| Callable[[str], bool]
| None = ...,
dtype: DtypeArg | None = ...,
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb"] | None = ...,
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb", "calamine"] | None = ...,
converters: dict[str, Callable] | dict[int, Callable] | None = ...,
true_values: Iterable[Hashable] | None = ...,
false_values: Iterable[Hashable] | None = ...,
@@ -453,7 +455,7 @@ def read_excel(
| Callable[[str], bool]
| None = None,
dtype: DtypeArg | None = None,
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb"] | None = None,
engine: Literal["xlrd", "openpyxl", "odf", "pyxlsb", "calamine"] | None = None,
converters: dict[str, Callable] | dict[int, Callable] | None = None,
true_values: Iterable[Hashable] | None = None,
false_values: Iterable[Hashable] | None = None,
@@ -1418,13 +1420,15 @@ class ExcelFile:
.xls, .xlsx, .xlsb, .xlsm, .odf, .ods, or .odt file.
engine : str, default None
If io is not a buffer or path, this must be set to identify io.
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``
Supported engines: ``xlrd``, ``openpyxl``, ``odf``, ``pyxlsb``, ``calamine``
Engine compatibility :
- ``xlrd`` supports old-style Excel files (.xls).
- ``openpyxl`` supports newer Excel file formats.
- ``odf`` supports OpenDocument file formats (.odf, .ods, .odt).
- ``pyxlsb`` supports Binary Excel files.
- ``calamine`` supports Excel (.xls, .xlsx, .xlsm, .xlsb)
and OpenDocument (.ods) file formats.
.. versionchanged:: 1.2.0
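The `Literal[...]` annotation on `engine` only helps static type checkers; nothing in this diff shows a runtime check derived from it. As an illustration of how the same list of names could be reused at runtime, the sketch below reads the allowed values back with `typing.get_args`. `ExcelReadEngine` and `check_engine` are hypothetical names that do not appear in the commit.

```python
from __future__ import annotations

from typing import Literal, get_args

# The engine names accepted by the updated signature.
ExcelReadEngine = Literal["xlrd", "openpyxl", "odf", "pyxlsb", "calamine"]


def check_engine(engine: str | None) -> str | None:
    """Accept None (let the reader be inferred) or one of the Literal values."""
    if engine is not None and engine not in get_args(ExcelReadEngine):
        raise ValueError(f"Unknown engine: {engine}")
    return engine


print(get_args(ExcelReadEngine))
print(check_engine("calamine"))
```

Keeping the names in one `Literal` alias avoids repeating the list in each overload, at the cost of an extra indirection in the signature.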
51 changes: 17 additions & 34 deletions pandas/io/excel/_calaminereader.py
@@ -5,36 +5,31 @@
datetime,
time,
)
from tempfile import NamedTemporaryFile
from typing import (
TYPE_CHECKING,
Union,
cast,
)

from pandas._typing import (
FilePath,
ReadBuffer,
Scalar,
StorageOptions,
)
from pandas.compat._optional import import_optional_dependency
from pandas.util._decorators import doc

import pandas as pd
from pandas.core.shared_docs import _shared_docs

from pandas.io.common import stringify_path
from pandas.io.excel._base import (
BaseExcelReader,
inspect_excel_format,
)
from pandas.io.excel._base import BaseExcelReader

ValueT = Union[int, float, str, bool, time, date, datetime]
if TYPE_CHECKING:
from pandas._typing import (
FilePath,
ReadBuffer,
Scalar,
StorageOptions,
)

_CellValueT = Union[int, float, str, bool, time, date, datetime]

class CalamineExcelReader(BaseExcelReader):
_sheet_names: list[str] | None = None

class CalamineExcelReader(BaseExcelReader):
@doc(storage_options=_shared_docs["storage_options"])
def __init__(
self,
@@ -55,26 +50,14 @@ def __init__(

@property
def _workbook_class(self):
from python_calamine import CalamineReader
from python_calamine import CalamineWorkbook

return CalamineReader
return CalamineWorkbook

def load_workbook(self, filepath_or_buffer: FilePath | ReadBuffer[bytes]):
if hasattr(filepath_or_buffer, "read") and hasattr(filepath_or_buffer, "seek"):
filepath_or_buffer = cast(ReadBuffer, filepath_or_buffer)
ext = inspect_excel_format(filepath_or_buffer)
with NamedTemporaryFile(suffix=f".{ext}", delete=False) as tmp_file:
filepath_or_buffer.seek(0)
tmp_file.write(filepath_or_buffer.read())
filepath_or_buffer = tmp_file.name
else:
filepath_or_buffer = stringify_path(filepath_or_buffer)

assert isinstance(filepath_or_buffer, str)

from python_calamine import CalamineReader
from python_calamine import load_workbook

return CalamineReader.from_path(filepath_or_buffer)
return load_workbook(filepath_or_buffer) # type: ignore[arg-type]

@property
def sheet_names(self) -> list[str]:
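The `NamedTemporaryFile` branch in this hunk appears to be the code being removed: it copied a readable buffer to a temporary file so that a path-only loader could open it, and the replacement hands the path or buffer straight to `load_workbook`. A standalone sketch of that older pattern, using only the standard library, is below. `materialize_to_path` is a hypothetical helper, and the extension is passed in by the caller rather than detected with `inspect_excel_format`.

```python
import io
import os
from tempfile import NamedTemporaryFile


def materialize_to_path(path_or_buffer, ext: str) -> str:
    """Return a filesystem path, copying a readable buffer to a temp file if needed."""
    if hasattr(path_or_buffer, "read") and hasattr(path_or_buffer, "seek"):
        # delete=False keeps the file on disk after the with-block closes it,
        # so the caller is responsible for removing it later.
        with NamedTemporaryFile(suffix=f".{ext}", delete=False) as tmp_file:
            path_or_buffer.seek(0)
            tmp_file.write(path_or_buffer.read())
        return tmp_file.name
    return os.fspath(path_or_buffer)


# Usage: a bytes buffer is written out, a plain path passes through unchanged.
tmp_path = materialize_to_path(io.BytesIO(b"not a real workbook"), "xlsx")
print(tmp_path.endswith(".xlsx"))
os.remove(tmp_path)
print(materialize_to_path("book.xls", "xls"))
```

The cost of this approach is an extra full copy of the data plus a leftover file to clean up, which is presumably why passing the buffer directly is preferable once the library supports it.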
@@ -91,7 +74,7 @@ def get_sheet_by_index(self, index: int):
def get_sheet_data(
self, sheet, file_rows_needed: int | None = None
) -> list[list[Scalar]]:
def _convert_cell(value: ValueT) -> Scalar:
def _convert_cell(value: _CellValueT) -> Scalar:
if isinstance(value, float):
val = int(value)
if val == value:
@@ -105,7 +88,7 @@ def _convert_cell(value: ValueT) -> Scalar:

return value

rows: list[list[ValueT]] = sheet.to_python(skip_empty_area=False)
rows: list[list[_CellValueT]] = sheet.to_python(skip_empty_area=False)
data: list[list[Scalar]] = []

for row in rows:
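The visible part of `_convert_cell` turns floats that hold whole numbers into `int` and leaves other floats alone; the lines elided from the diff are not reproduced here. A minimal sketch of just that float step, with `normalize_number` as a hypothetical stand-in name:

```python
def normalize_number(value):
    """Return an int for integral floats, otherwise the value unchanged."""
    if isinstance(value, float):
        # Assumes finite input: int() raises on NaN and infinity.
        as_int = int(value)
        if as_int == value:
            return as_int
    return value


print(normalize_number(3.0))     # 3
print(normalize_number(2.5))     # 2.5
print(normalize_number("text"))  # text
```

Spreadsheet formats typically store every number as a float, so without this step a column of whole numbers would come back as `3.0`, `4.0`, and so on.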
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -62,7 +62,7 @@ computation = ['scipy>=1.7.1', 'xarray>=0.21.0']
fss = ['fsspec>=2021.07.0']
aws = ['s3fs>=2021.08.0']
gcp = ['gcsfs>=2021.07.0', 'pandas-gbq>=0.15.0']
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.7', 'pyxlsb>=1.0.8', 'xlrd>=2.0.1', 'xlsxwriter>=1.4.3']
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.7', 'python-calamine>=0.1.0', 'pyxlsb>=1.0.8', 'xlrd>=2.0.1', 'xlsxwriter>=1.4.3']
parquet = ['pyarrow>=7.0.0']
feather = ['pyarrow>=7.0.0']
hdf5 = [# blosc only available on conda (https://github.com/Blosc/python-blosc/issues/297)
@@ -104,7 +104,7 @@ all = ['beautifulsoup4>=4.9.3',
'pytest>=7.0.0',
'pytest-xdist>=2.2.0',
'pytest-asyncio>=0.17.0',
'python-calamine>=0.0.8',
'python-calamine>=0.1.0',
'python-snappy>=0.6.0',
'pyxlsb>=1.0.8',
'qtpy>=2.2.0',
3 changes: 3 additions & 0 deletions scripts/tests/data/deps_expected_random.yaml
@@ -55,3 +55,6 @@ dependencies:
- xlrd>=2.0.1
- xlsxwriter>=1.4.3
- zstandard>=0.15.2

- pip:
- python-calamine>=0.1.0
3 changes: 2 additions & 1 deletion scripts/tests/data/deps_minimum.toml
@@ -62,7 +62,7 @@ computation = ['scipy>=1.7.1', 'xarray>=0.21.0']
fss = ['fsspec>=2021.07.0']
aws = ['s3fs>=2021.08.0']
gcp = ['gcsfs>=2021.07.0', 'pandas-gbq>=0.15.0']
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.7', 'pyxlsb>=1.0.8', 'xlrd>=2.0.1', 'xlsxwriter>=1.4.3']
excel = ['odfpy>=1.4.1', 'openpyxl>=3.0.7', 'python-calamine>=0.1.0', 'pyxlsb>=1.0.8', 'xlrd>=2.0.1', 'xlsxwriter>=1.4.3']
parquet = ['pyarrow>=7.0.0']
feather = ['pyarrow>=7.0.0']
hdf5 = [# blosc only available on conda (https://github.com/Blosc/python-blosc/issues/297)
@@ -104,6 +104,7 @@ all = ['beautifulsoup4>=5.9.3',
'pytest>=7.0.0',
'pytest-xdist>=2.2.0',
'pytest-asyncio>=0.17.0',
'python-calamine>=0.1.0',
'python-snappy>=0.6.0',
'pyxlsb>=1.0.8',
'qtpy>=2.2.0',
3 changes: 3 additions & 0 deletions scripts/tests/data/deps_unmodified_random.yaml
@@ -55,3 +55,6 @@ dependencies:
- xlrd>=2.0.1
- xlsxwriter>=1.4.3
- zstandard>=0.15.2

- pip:
- python-calamine>=0.1.0