-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timezone info is not present in the result when querying with columns_tzs
or query_tz
#210
Comments
This is expected behavior. As stated in the changelog for version 0.5.17:
Applying a timezone is quite expensive in Python and if the user really needs a UTC timezone applied, they should do in their own application code. |
This one is not according to the python standard, see: https://docs.python.org/3/library/datetime.html#datetime.datetime.astimezone
|
In this case I think it's reasonable to prefer performance over a convention/presumption. The most common use case for ClickHouse clients is in container applications using UTC timestamps. Converting ClickHouse integer epoch timestamps (which is how all times are stored in ClickHouse) to Python naive datetime objects is already annoyingly expensive (numpy datetime or Pandas objects are much faster and cleaner). Adding even more processing cost by making them UTC timezone aware when that conversion gains nothing in the majority of applications is something that I think the application developer should do consciously. |
Without convention it cannot be passed to any other library, as it will implicitly assume that internally it's a system time, and it can be pretty deep in other people code.
Unfortunately it gains quite a lot, as naive datetime objects are pretty bad and lead to subtle and hard to find bugs, see here I've just expected that if user was explicit enough to set |
Fair point about query_tz or column_tzs if they differ from the system timezone. I'll think about how that might be implemented. |
FYI, I made this monkey patch to attach timezone even for UTC (and add whenever support) from datetime import tzinfo
from typing import List, Any, Union, Sequence, MutableSequence, Optional
import clickhouse_connect
import pandas as pd
import whenever
from clickhouse_connect.datatypes.temporal import DateTime64
from clickhouse_connect.driver import tzutil
from clickhouse_connect.driver.client import Client
from clickhouse_connect.driver.common import first_value, write_array
from clickhouse_connect.driver.ctypes import numpy_conv
from clickhouse_connect.driver.insert import InsertContext
from clickhouse_connect.driver.query import QueryResult, QueryContext
from clickhouse_connect.driver.summary import QuerySummary
from clickhouse_connect.driver.types import ByteSource
def active_tz(self, datatype_tz: Optional[tzinfo]):
if self.column_tz:
active_tz = self.column_tz
elif datatype_tz:
active_tz = datatype_tz
elif self.query_tz:
active_tz = self.query_tz
elif self.response_tz:
active_tz = self.response_tz
elif self.apply_server_tz:
active_tz = self.server_tz
else:
active_tz = tzutil.local_tz
# if active_tz == pytz.UTC:
# return None
return active_tz
QueryContext.active_tz = active_tz
# optional
def _read_binary_naive(self, column: Sequence):
new_col = []
app = new_col.append
dt_from = whenever.Instant.from_timestamp_nanos
for ticks in column:
app(dt_from(ticks))
return new_col
# optional
def _read_binary_tz(self, column: Sequence, tz_info: tzinfo):
new_col = []
app = new_col.append
dt_from = whenever.ZonedDateTime.from_timestamp_nanos
for ticks in column:
app(dt_from(ticks, tz=str(tz_info)))
return new_col
def _read_column_binary(self, source: ByteSource, num_rows: int, ctx: QueryContext):
if self.read_format(ctx) == 'int':
return source.read_array('q', num_rows)
active_tz = ctx.active_tz(self.tzinfo)
if ctx.use_numpy:
np_array = numpy_conv.read_numpy_array(source, self.np_type, num_rows)
if ctx.as_pandas and active_tz:
return pd.DatetimeIndex(np_array, tz='UTC').tz_convert(active_tz)
return np_array
column = source.read_array('q', num_rows)
# if active_tz and active_tz != pytz.UTC:
if active_tz:
return self._read_binary_tz(column, active_tz)
return self._read_binary_naive(column)
def _write_column_binary(self, column: Union[Sequence, MutableSequence], dest: bytearray, ctx: InsertContext):
first = first_value(column, self.nullable)
if isinstance(first, int) or self.write_format(ctx) == 'int':
if self.nullable:
column = [x if x else 0 for x in column]
elif isinstance(first, (whenever.Instant, whenever.ZonedDateTime)):
if self.nullable:
column = [x.timestamp_nanos() if x else 0 for x in column]
else:
column = [x.timestamp_nanos() for x in column]
else:
prec = self.prec
if self.nullable:
column = [((int(x.timestamp()) * 1000000 + x.microsecond) * prec) // 1000000 if x else 0
for x in column]
else:
column = [((int(x.timestamp()) * 1000000 + x.microsecond) * prec) // 1000000 for x in column]
write_array('q', column, dest, ctx.column_name)
# optional
DateTime64._read_binary_naive = _read_binary_naive
DateTime64._read_binary_tz = _read_binary_tz
DateTime64._read_column_binary = _read_column_binary
DateTime64._write_column_binary = _write_column_binary |
@JakkuSakura Are you only complaining about |
@genzgd I use polars as the downstream library. however, without the correct UTC timezone, I have to do the timezone conversion for every datetime column, otherwise I always get wrong time in next step |
Simple repro:
Produces "naive" datetime objects without any TZ info:
Version of clickhouse-connect:
0.6.4
The text was updated successfully, but these errors were encountered: