
Timezone info is not present in the result when querying with columns_tzs or query_tz #210

Closed
pkit opened this issue Jun 27, 2023 · 8 comments · Fixed by #218
Labels
bug Something isn't working

Comments

@pkit

pkit commented Jun 27, 2023

Simple repro:

import clickhouse_connect

c = clickhouse_connect.create_client()
c.command("CREATE TABLE t1 (id UInt64, ts DateTime64(3, 'UTC') DEFAULT now64(3)) ENGINE = MergeTree() ORDER BY id")
c.insert("t1", data=[(1,)], column_names=["id"])
rows = c.query("SELECT id, ts FROM t1 ORDER BY ts", column_tzs={"ts": "UTC"}).result_set
print(rows)
rows = c.query("SELECT id, ts FROM t1 ORDER BY ts", query_tz="UTC").result_set
print(rows)

Produces "naive" datetime objects without any TZ info:

[(1, datetime.datetime(2023, 6, 27, 19, 47, 53, 536000))]
[(1, datetime.datetime(2023, 6, 27, 19, 47, 53, 536000))]

Version of clickhouse-connect: 0.6.4

@pkit added the bug label Jun 27, 2023
@genzgd
Collaborator

genzgd commented Jun 27, 2023

This is expected behavior. As stated in the changelog for version 0.5.17:

Note if the detected timezone according to the above precedence is UTC, clickhouse-connect will always return a naive datetime object with no timezone information

Applying a timezone is quite expensive in Python, and if the user really needs a UTC timezone applied, they should do so in their own application code.
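For reference, doing that conversion in application code is a one-liner. A minimal sketch, assuming the naive values returned by the driver are known to be UTC (the `rows` literal below mirrors the repro output above):

```python
from datetime import datetime, timezone

# Naive values as returned by the driver in the repro above (assumed UTC):
rows = [(1, datetime(2023, 6, 27, 19, 47, 53, 536000))]

# Attach the UTC tzinfo without shifting the wall-clock value:
aware_rows = [(id_, ts.replace(tzinfo=timezone.utc)) for id_, ts in rows]
```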

@genzgd closed this as not planned Jun 27, 2023
@pkit
Author

pkit commented Jun 27, 2023

This is not in line with the Python standard library's documented behavior, see: https://docs.python.org/3/library/datetime.html#datetime.datetime.astimezone

If self is naive, it is presumed to represent time in the system timezone.

@genzgd
Collaborator

genzgd commented Jun 27, 2023

In this case I think it's reasonable to prefer performance over a convention/presumption.

The most common use case for ClickHouse clients is in container applications using UTC timestamps. Converting ClickHouse integer epoch timestamps (which is how all times are stored in ClickHouse) to Python naive datetime objects is already annoyingly expensive (numpy datetime or Pandas objects are much faster and cleaner). Adding even more processing cost by making them UTC timezone aware when that conversion gains nothing in the majority of applications is something that I think the application developer should do consciously.

@pkit
Author

pkit commented Jun 27, 2023

Without the convention, the value cannot safely be passed to any other library, because that library will implicitly assume it represents system-local time, and that assumption can be buried deep in other people's code.
I.e. the safest bet is just to convert it to a tz-aware object anyway...

conversion gains nothing in the majority of applications

Unfortunately it gains quite a lot: naive datetime objects are pretty bad and lead to subtle, hard-to-find bugs, see here

I just expected that if the user was explicit enough to set query_tz or column_tzs, they do want tz-aware objects. Silently ignoring that is strange.
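A quick stdlib-only illustration of the kind of subtle bug naive datetimes cause when they meet aware ones:

```python
from datetime import datetime, timezone

naive = datetime(2023, 6, 27, 19, 47, 53)
aware = datetime(2023, 6, 27, 19, 47, 53, tzinfo=timezone.utc)

# Equality between naive and aware datetimes silently returns False,
# even though both were meant to represent the same UTC instant:
same_instant = (naive == aware)

# Ordering comparisons fail outright at runtime:
try:
    _ = naive < aware
    mixed_comparison_raised = False
except TypeError:
    mixed_comparison_raised = True

# And astimezone() presumes a naive value is in the *system* timezone,
# so on a non-UTC machine the converted instant is simply wrong.
```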

@genzgd
Collaborator

genzgd commented Jun 27, 2023

Fair point about query_tz or column_tzs if they differ from the system timezone. I'll think about how that might be implemented.

@genzgd genzgd reopened this Jun 27, 2023
@genzgd genzgd linked a pull request Jul 6, 2023 that will close this issue
@JakkuSakura

JakkuSakura commented Nov 20, 2024

FYI, I made this monkey patch to attach timezone info even for UTC (and to add whenever support)

from datetime import tzinfo
from typing import MutableSequence, Optional, Sequence, Union

import pandas as pd
import whenever
from clickhouse_connect.datatypes.temporal import DateTime64
from clickhouse_connect.driver import tzutil
from clickhouse_connect.driver.common import first_value, write_array
from clickhouse_connect.driver.ctypes import numpy_conv
from clickhouse_connect.driver.insert import InsertContext
from clickhouse_connect.driver.query import QueryContext
from clickhouse_connect.driver.types import ByteSource


def active_tz(self, datatype_tz: Optional[tzinfo]):
    if self.column_tz:
        active_tz = self.column_tz
    elif datatype_tz:
        active_tz = datatype_tz
    elif self.query_tz:
        active_tz = self.query_tz
    elif self.response_tz:
        active_tz = self.response_tz
    elif self.apply_server_tz:
        active_tz = self.server_tz
    else:
        active_tz = tzutil.local_tz
    # if active_tz == pytz.UTC:
    #     return None
    return active_tz


QueryContext.active_tz = active_tz

# optional
def _read_binary_naive(self, column: Sequence):
    new_col = []
    app = new_col.append
    dt_from = whenever.Instant.from_timestamp_nanos
    for ticks in column:
        app(dt_from(ticks))
    return new_col
# optional
def _read_binary_tz(self, column: Sequence, tz_info: tzinfo):
    new_col = []
    app = new_col.append
    dt_from = whenever.ZonedDateTime.from_timestamp_nanos
    for ticks in column:
        app(dt_from(ticks, tz=str(tz_info)))
    return new_col


def _read_column_binary(self, source: ByteSource, num_rows: int, ctx: QueryContext):
    if self.read_format(ctx) == 'int':
        return source.read_array('q', num_rows)
    active_tz = ctx.active_tz(self.tzinfo)
    if ctx.use_numpy:
        np_array = numpy_conv.read_numpy_array(source, self.np_type, num_rows)
        if ctx.as_pandas and active_tz:
            return pd.DatetimeIndex(np_array, tz='UTC').tz_convert(active_tz)
        return np_array
    column = source.read_array('q', num_rows)
    # if active_tz and active_tz != pytz.UTC:
    if active_tz:
        return self._read_binary_tz(column, active_tz)
    return self._read_binary_naive(column)


def _write_column_binary(self, column: Union[Sequence, MutableSequence], dest: bytearray, ctx: InsertContext):
    first = first_value(column, self.nullable)
    if isinstance(first, int) or self.write_format(ctx) == 'int':
        if self.nullable:
            column = [x if x else 0 for x in column]
    elif isinstance(first, (whenever.Instant, whenever.ZonedDateTime)):
        if self.nullable:
            column = [x.timestamp_nanos() if x else 0 for x in column]
        else:
            column = [x.timestamp_nanos() for x in column]
    else:
        prec = self.prec
        if self.nullable:
            column = [((int(x.timestamp()) * 1000000 + x.microsecond) * prec) // 1000000 if x else 0
                      for x in column]
        else:
            column = [((int(x.timestamp()) * 1000000 + x.microsecond) * prec) // 1000000 for x in column]
    write_array('q', column, dest, ctx.column_name)

# optional
DateTime64._read_binary_naive = _read_binary_naive
DateTime64._read_binary_tz = _read_binary_tz

DateTime64._read_column_binary = _read_column_binary
DateTime64._write_column_binary = _write_column_binary

@genzgd
Collaborator

genzgd commented Nov 20, 2024

@JakkuSakura Are you only complaining about UTC? That behavior is by design, since it is more performant and can simplify downstream applications.

@JakkuSakura

JakkuSakura commented Nov 20, 2024

@genzgd I use polars as the downstream library. However, without the correct UTC timezone, I have to do a timezone conversion for every datetime column, otherwise I always get the wrong time in the next step.
