I have an ADX table with dynamic columns in which I want to store JSON objects (e.g. arrays and dicts, as per https://learn.microsoft.com/en-us/kusto/query/scalar-data-types/dynamic?view=microsoft-fabric). As these structures exist as objects in pandas dataframes, I was naively using `ingest_from_dataframe` and didn't notice at first that anything was wrong, since the data passes the data type validation in Kusto. However, arrays with length > 1 were actually being sent as an array containing a single string made up of single-quoted items.
Now that I've looked under the covers, it's clear that all this library does when ingesting from a dataframe is dump the dataframe to CSV and then ingest from that. Dumping to CSV ruins JSON formatting in pandas (strings end up single-quoted). I feel the library should instead dump to JSON, e.g. with `df.to_json(filename, orient="records", date_format="iso")` (which also keeps the date format compatible with ADX timestamps), and ingest from that file.
I'm now doing this instead as a workaround:
Old method (broken):
log.info("Ingesting data into Azure Data Cluster")
ingestion_properties = IngestionProperties(
    database=DATABASE,
    table=TABLE,
    data_format=DataFormat.JSON,
)
rsp = ingest_client.ingest_from_dataframe(
    df, ingestion_properties=ingestion_properties
)
New method:
df.to_json(filename, orient="records", date_format="iso")
log.info("Ingesting data into Azure Data Cluster")
ingestion_properties = IngestionProperties(
    database=DATABASE,
    table=TABLE,
    data_format=DataFormat.MULTIJSON,
)
rsp = ingest_client.ingest_from_file(
    filename, ingestion_properties=ingestion_properties
)
Output of `pip freeze`:
aiohappyeyeballs==2.4.4
aiohttp==3.11.11
aiosignal==1.3.2
anyio==4.8.0
appdirs==1.4.4
astroid==3.3.8
async-timeout==5.0.1
attrs==24.3.0
azure-core==1.32.0
azure-identity==1.19.0
azure-kusto-data==4.6.3
azure-kusto-ingest==4.6.3
azure-storage-blob==12.24.0
azure-storage-queue==12.12.0
bandit==1.8.2
black==24.10.0
certifi==2024.12.14
cffi==1.17.1
charset-normalizer==3.4.1
click==8.1.8
cryptography==43.0.3
Deprecated==1.2.15
dill==0.3.9
docformatter==1.7.5
dparse==0.6.4
exceptiongroup==1.2.2
flake8==7.1.1
flake8-bandit==4.1.1
flake8-black==0.3.6
flake8-bugbear==24.12.12
flake8-docstrings==1.7.0
flake8-isort==6.1.1
frozenlist==1.5.0
h11==0.14.0
h2==4.1.0
hpack==4.0.0
httpcore==1.0.7
httpx==0.28.1
hyperframe==6.0.1
idna==3.10
ijson==3.3.0
importlib_metadata==8.5.0
isodate==0.7.2
isort==5.13.2
markdown-it-py==3.0.0
mccabe==0.7.0
mdurl==0.1.2
microsoft-kiota-abstractions==1.9.0
microsoft-kiota-authentication-azure==1.9.0
microsoft-kiota-http==1.9.0
microsoft-kiota-serialization-form==1.9.0
microsoft-kiota-serialization-json==1.9.0
microsoft-kiota-serialization-multipart==1.9.0
microsoft-kiota-serialization-text==1.9.0
msal==1.31.1
msal-extensions==1.2.0
msgraph-core==1.2.0
msgraph-sdk==1.17.0
multidict==6.1.0
mypy==1.14.1
mypy-extensions==1.0.0
numpy==2.0.2
opentelemetry-api==1.29.0
opentelemetry-sdk==1.29.0
opentelemetry-semantic-conventions==0.50b0
ossaudit==0.5.0
packaging==24.2
pandas==2.2.3
pandas-stubs==2.2.2.240807
pathspec==0.12.1
pbr==6.1.0
platformdirs==4.3.6
portalocker==2.10.1
propcache==0.2.1
pycodestyle==2.12.1
pycparser==2.22
pydocstyle==6.3.0
pyflakes==3.2.0
Pygments==2.19.1
PyJWT==2.10.1
pylint==3.3.3
python-dateutil==2.9.0.post0
python-static-checks==4.0.2
pytz==2024.2
PyYAML==6.0.2
requests==2.32.3
rich==13.9.4
six==1.17.0
sniffio==1.3.1
snowballstemmer==2.2.0
std-uritemplate==2.0.1
stevedore==5.4.0
-e git+https://git.datcon.co.uk/tools.core/subscription-migration.git@a63e428f3f941348e7a43942f41a19ed6f278db8#egg=subscription_migration
tenacity==9.0.0
texttable==1.7.0
tomli==2.2.1
tomlkit==0.13.2
types-Deprecated==1.2.15.20241117
types-docutils==0.21.0.20241128
types-openpyxl==3.1.5.20241225
types-Pygments==2.19.0.20250107
types-python-dateutil==2.9.0.20241206
types-pytz==2024.2.0.20241221
types-requests==2.32.0.20241016
types-setuptools==75.8.0.20250110
typing_extensions==4.12.2
tzdata==2024.2
untokenize==0.1.1
urllib3==2.3.0
wrapt==1.17.2
yarl==1.18.3
zipp==3.21.0