
Timezone info is not present in the result when querying with columns_tzs or query_tz #210

Closed
pkit opened this issue Jun 27, 2023 · 8 comments · Fixed by #218
Labels
bug Something isn't working

Comments

@pkit

pkit commented Jun 27, 2023

Simple repro:

import clickhouse_connect

c = clickhouse_connect.create_client()
c.command("CREATE TABLE t1 (id UInt64, ts DateTime64(3, 'UTC') DEFAULT now64(3)) ENGINE = MergeTree() ORDER BY id")
c.insert("t1", data=[(1,)], column_names=["id"])
rows = c.query("SELECT id, ts FROM t1 ORDER BY ts", column_tzs={"ts": "UTC"}).result_set
print(rows)
rows = c.query("SELECT id, ts FROM t1 ORDER BY ts", query_tz="UTC").result_set
print(rows)

Produces "naive" datetime objects without any TZ info:

[(1, datetime.datetime(2023, 6, 27, 19, 47, 53, 536000))]
[(1, datetime.datetime(2023, 6, 27, 19, 47, 53, 536000))]

Version of clickhouse-connect: 0.6.4

@pkit added the bug label Jun 27, 2023
@genzgd
Collaborator

genzgd commented Jun 27, 2023

This is expected behavior. As stated in the changelog for version 0.5.17:

Note if the detected timezone according to the above precedence is UTC, clickhouse-connect will always return a naive datetime object with no timezone information

Applying a timezone is quite expensive in Python, and if the user really needs a UTC timezone applied, they should do so in their own application code.
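For reference, doing that conversion in application code is a one-liner. A minimal sketch, assuming the naive values returned by the driver are known to be UTC (the `rows` literal below mirrors the repro output above):

```python
from datetime import datetime, timezone

# Naive values as returned by the driver in the repro above (assumed UTC):
rows = [(1, datetime(2023, 6, 27, 19, 47, 53, 536000))]

# Attach the UTC tzinfo without shifting the wall-clock value:
aware_rows = [(id_, ts.replace(tzinfo=timezone.utc)) for id_, ts in rows]
```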

@genzgd closed this as not planned Jun 27, 2023
@pkit
Author

pkit commented Jun 27, 2023

This is not in line with the Python standard library's documented behavior, see: https://docs.python.org/3/library/datetime.html#datetime.datetime.astimezone

If self is naive, it is presumed to represent time in the system timezone.

@genzgd
Collaborator

genzgd commented Jun 27, 2023

In this case I think it's reasonable to prefer performance over a convention/presumption.

The most common use case for ClickHouse clients is in container applications using UTC timestamps. Converting ClickHouse integer epoch timestamps (which is how all times are stored in ClickHouse) to Python naive datetime objects is already annoyingly expensive (numpy datetime or Pandas objects are much faster and cleaner). Adding even more processing cost by making them UTC timezone aware when that conversion gains nothing in the majority of applications is something that I think the application developer should do consciously.

@pkit
Author

pkit commented Jun 27, 2023

Without the convention, the value cannot safely be passed to any other library, because that library will implicitly assume it represents system-local time, and that assumption can be buried deep in other people's code.
I.e. the safest bet is just to convert it to a tz-aware object anyway...

conversion gains nothing in the majority of applications

Unfortunately it gains quite a lot: naive datetime objects are pretty bad and lead to subtle, hard-to-find bugs, see here

I just expected that if the user was explicit enough to set query_tz or column_tzs, they do want tz-aware objects. Silently ignoring that is strange.
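A quick stdlib-only illustration of the kind of subtle bug naive datetimes cause when they meet aware ones:

```python
from datetime import datetime, timezone

naive = datetime(2023, 6, 27, 19, 47, 53)
aware = datetime(2023, 6, 27, 19, 47, 53, tzinfo=timezone.utc)

# Equality between naive and aware datetimes silently returns False,
# even though both were meant to represent the same UTC instant:
same_instant = (naive == aware)

# Ordering comparisons fail outright at runtime:
try:
    _ = naive < aware
    mixed_comparison_raised = False
except TypeError:
    mixed_comparison_raised = True

# And astimezone() presumes a naive value is in the *system* timezone,
# so on a non-UTC machine the converted instant is simply wrong.
```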

@genzgd
Collaborator

genzgd commented Jun 27, 2023

Fair point about query_tz or column_tzs if they differ from the system timezone. I'll think about how that might be implemented.

@genzgd genzgd reopened this Jun 27, 2023
@genzgd genzgd linked a pull request Jul 6, 2023 that will close this issue
@JakkuSakura

JakkuSakura commented Nov 20, 2024

FYI, I made this monkey patch to attach timezone info even for UTC (and to add whenever support)

from datetime import tzinfo
from typing import MutableSequence, Optional, Sequence, Union

import pandas as pd
import whenever
from clickhouse_connect.datatypes.temporal import DateTime64
from clickhouse_connect.driver import tzutil
from clickhouse_connect.driver.common import first_value, write_array
from clickhouse_connect.driver.ctypes import numpy_conv
from clickhouse_connect.driver.insert import InsertContext
from clickhouse_connect.driver.query import QueryContext
from clickhouse_connect.driver.types import ByteSource


def active_tz(self, datatype_tz: Optional[tzinfo]):
    if self.column_tz:
        active_tz = self.column_tz
    elif datatype_tz:
        active_tz = datatype_tz
    elif self.query_tz:
        active_tz = self.query_tz
    elif self.response_tz:
        active_tz = self.response_tz
    elif self.apply_server_tz:
        active_tz = self.server_tz
    else:
        active_tz = tzutil.local_tz
    # if active_tz == pytz.UTC:
    #     return None
    return active_tz


QueryContext.active_tz = active_tz

# optional
def _read_binary_naive(self, column: Sequence):
    new_col = []
    app = new_col.append
    dt_from = whenever.Instant.from_timestamp_nanos
    for ticks in column:
        app(dt_from(ticks))
    return new_col
# optional
def _read_binary_tz(self, column: Sequence, tz_info: tzinfo):
    new_col = []
    app = new_col.append
    dt_from = whenever.ZonedDateTime.from_timestamp_nanos
    for ticks in column:
        app(dt_from(ticks, tz=str(tz_info)))
    return new_col


def _read_column_binary(self, source: ByteSource, num_rows: int, ctx: QueryContext):
    if self.read_format(ctx) == 'int':
        return source.read_array('q', num_rows)
    active_tz = ctx.active_tz(self.tzinfo)
    if ctx.use_numpy:
        np_array = numpy_conv.read_numpy_array(source, self.np_type, num_rows)
        if ctx.as_pandas and active_tz:
            return pd.DatetimeIndex(np_array, tz='UTC').tz_convert(active_tz)
        return np_array
    column = source.read_array('q', num_rows)
    # if active_tz and active_tz != pytz.UTC:
    if active_tz:
        return self._read_binary_tz(column, active_tz)
    return self._read_binary_naive(column)


def _write_column_binary(self, column: Union[Sequence, MutableSequence], dest: bytearray, ctx: InsertContext):
    first = first_value(column, self.nullable)
    if isinstance(first, int) or self.write_format(ctx) == 'int':
        if self.nullable:
            column = [x if x else 0 for x in column]
    elif isinstance(first, (whenever.Instant, whenever.ZonedDateTime)):
        if self.nullable:
            column = [x.timestamp_nanos() if x else 0 for x in column]
        else:
            column = [x.timestamp_nanos() for x in column]
    else:
        prec = self.prec
        if self.nullable:
            column = [((int(x.timestamp()) * 1000000 + x.microsecond) * prec) // 1000000 if x else 0
                      for x in column]
        else:
            column = [((int(x.timestamp()) * 1000000 + x.microsecond) * prec) // 1000000 for x in column]
    write_array('q', column, dest, ctx.column_name)

# optional
DateTime64._read_binary_naive = _read_binary_naive
DateTime64._read_binary_tz = _read_binary_tz

DateTime64._read_column_binary = _read_column_binary
DateTime64._write_column_binary = _write_column_binary

@genzgd
Collaborator

genzgd commented Nov 20, 2024

@JakkuSakura Are you only complaining about UTC? That behavior is by design, since it is more performant and can simplify downstream applications.

@JakkuSakura

JakkuSakura commented Nov 20, 2024

@genzgd I use polars as the downstream library. However, without the correct UTC timezone, I have to do a timezone conversion for every datetime column, otherwise I always get the wrong time in the next step.
