-
Notifications
You must be signed in to change notification settings - Fork 794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't error when hashing data that can't be serialised to JSON #3161
Don't error when hashing data that can't be serialised to JSON #3161
Conversation
Converting non-serialisable data to a string seems fine when we're just generating a hash. This allows `to_dict` to be called when there's `datetime.datetime` etc. in the data
Thank you for the suggestion @btabram! You already provide a textual description how this can be of benefit for you and others, that is nice, but it would be great if you also could provide an minimal working example in code. That make this a bit easier to review. Would that be possible to provide? Thanks again! |
Thanks for the quick response @mattijn! Here's some code that works with my change but fails without it: import pandas as pd
from altair import Chart, TopLevelMixin
from datetime import time
from pydantic import BaseModel
class Result(BaseModel):
chart: TopLevelMixin
class Config:
arbitrary_types_allowed = True
extra = "forbid"
json_encoders = {
TopLevelMixin: TopLevelMixin.to_dict,
}
# Create a chart
df = pd.DataFrame({"t": [time(second=i) for i in range(10)], "value": range(10)})
chart = Chart(df).mark_point().encode(x="t", y="value")
# Now try to serialise the result so the vega chart can be rendered elsewhere
result = Result(chart=chart)
print(result.json()) In my real workflow the data is coming from an Excel spreadsheet (and |
Thanks for adding context. In your case I think this should work temporarily: with alt.data_transformers.enable(consolidate_datasets=False):
print(result.json(encoder=str)) Related is this issue: #2199. And the interesting thing is, if we use other, private, functions within Altair this hash can be computed correctly: import pandas as pd
import datetime
from altair.utils.data import _data_to_json_string, _compute_data_hash
data_date = pd.DataFrame({
"time": [datetime.date(2020, 5, 1), datetime.date(2020, 5, 3)],
"value": [3, 5],
})
data_time = pd.DataFrame({
"time": [datetime.time(second=1), datetime.time(second=2)],
"value": [3, 5],
})
json_data_date = _data_to_json_string(data_date)
json_data_time = _data_to_json_string(data_time)
_compute_data_hash(json_data_date), _compute_data_hash(json_data_time)
Regarding your proposed solution. I think it would be better to rework this A bit more debugging. Using the conventional sanitization of dataframes this works: from altair.utils.core import sanitize_dataframe
sanitize_dataframe(data_date) But through our new dataframe interchange protocol this errors: from altair.utils.core import sanitize_arrow_table
import pyarrow.interchange as pi
pa_table = sanitize_arrow_table(pi.from_dataframe(data_date))
Basically the dataframe cannot be changed into an arrow table. Another caveat is that this also means that this only can work if the type is specified in the encoding, since the automatic inference of dtypes is now done using the dataframe interchange protocol if it is available (if I remember correctly @jonmmease, see https://github.com/altair-viz/altair/blob/main/altair/utils/core.py#L589) |
That's correct, we use the DataFrame interchange protocol approach unless pyarrow is not available or pandas is too old. I'm not clear on how we would support |
Apologies for taking so long to reply. I can't get that workaround working for me @mattijn but it's okay because I can just convert or drop the data the doesn't serialise. @jonmmease I'm using a javascript library to render the chart and it looked fine but actually when I started investigating I realised it was just plotting as strings and I was lucky that my points were evenly spaced. So it's not a very useful chart and this request isn't so useful either. Sorry. However, I do still have a usecase where I'd appreciate this change, but I accept it's not a very important one so feel free to just close this. I'm often reading a DataFrame from Excel and then passing that DataFrame straight to Altair and sometimes there's a Thanks for taking the time to consder this 🙂 |
The intended behavior of OP might be fixed by this code change: #3377 (comment). @btabram, could you have a look if Altair version 5.3 provides the behavior you are looking for? |
This is a suggestion that would help a particular issue I've got but it might be a bit niche. Thanks for looking!
Converting non-serialisable data to a string instead of erroring seems fine when it's just for generating a hash. This allows
to_dict
to be called when there's data thatjson.dumps
can't serialise.In my case I've got
datetime.time
instances in the dataframe I'm plotting a chart from and am using Pydantic (which can convertdatetime.time
to JSON) to serialise the chart dictionary.