We are running Spark + Delta on the CDH platform, and our Delta tables are stored on HDFS. We also use the delta-rs project to read Delta tables in some Python projects. Currently delta-rs does not support HDFS storage (delta-io/delta-rs#300), so we use MinIO over HDFS as a workaround.
When we try to read Delta table files through MinIO, it returns an error like the following:
Is it possible to disable the checksum check for MinIO over HDFS? In any case, the checksum file is not guaranteed to exist (e.g., a JSON file is created and then the system crashes), so you may still hit the same issue.
AFAIK, the HDFS checksum doesn't work well with file overwriting. E.g., say we have a file pair (A, A.crc). If we overwrite file A but A.crc is not updated due to a system crash, the table will be broken.
Unless HDFS can provide an API to update the file content and its crc atomically, we won't be able to enable checksums for Delta.
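The overwrite hazard described above can be sketched in a few lines. This is a toy model in plain Python, not real HDFS code: the names `write_with_crc` and `verify` are invented, and the `.crc` sidecar stands in for HDFS's checksum file.

```python
import os
import tempfile
import zlib

def write_with_crc(path: str, data: bytes) -> None:
    # Two separate steps: data file first, then the .crc sidecar.
    with open(path, "wb") as f:
        f.write(data)
    # Crash window: if the process dies here, A is new but A.crc is stale.
    with open(path + ".crc", "w") as f:
        f.write(f"{zlib.crc32(data):08x}")

def verify(path: str) -> bool:
    # Recompute the crc of the data and compare it to the sidecar.
    with open(path, "rb") as f:
        actual = zlib.crc32(f.read())
    with open(path + ".crc") as f:
        expected = int(f.read(), 16)
    return actual == expected

d = tempfile.mkdtemp()
a = os.path.join(d, "A")

write_with_crc(a, b"version 1")
print(verify(a))  # True: file and sidecar agree

# Simulated crash during overwrite: A is rewritten, but the process dies
# before the sidecar is refreshed -- the scenario described above.
with open(a, "wb") as f:
    f.write(b"version 2")
print(verify(a))  # False: the data is intact, yet the stale crc flags it as corrupt
```

Because the two writes cannot be made atomic from the client side, any crash in the window between them leaves a table file that checksum-aware readers will reject even though its contents are valid.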
We also verified with the `hadoop fs -checksum` command on the Delta log files; it returns `NONE` instead of a valid checksum. After investigating the code in the Delta project, we found that checksum creation is explicitly disabled when log files are written:
`delta/core/src/main/scala/org/apache/spark/sql/delta/storage/HDFSLogStore.scala`, line 103 at commit `2ddff5e`
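To make the observation concrete, here is a toy sketch of what "checksum creation disabled" means for tools that later ask for a checksum. This is plain Python with invented names (`write_log_file`, `checksum_of`, and the example file name), not the actual Scala code in `HDFSLogStore`; the missing `.crc` sidecar stands in for the disabled HDFS checksum.

```python
import os
import tempfile
import zlib

def write_log_file(path: str, data: bytes, write_checksum: bool = False) -> None:
    # Mirrors the behavior found above: by default, no .crc sidecar is
    # produced alongside the log file.
    with open(path, "wb") as f:
        f.write(data)
    if write_checksum:
        with open(path + ".crc", "w") as f:
            f.write(f"{zlib.crc32(data):08x}")

def checksum_of(path: str) -> str:
    # Rough analogue of `hadoop fs -checksum`: report NONE when there is
    # no checksum to report.
    sidecar = path + ".crc"
    if os.path.exists(sidecar):
        with open(sidecar) as f:
            return f.read()
    return "NONE"

d = tempfile.mkdtemp()
log = os.path.join(d, "00000000000000000000.json")  # hypothetical log file name
write_log_file(log, b'{"commitInfo":{}}')
print(checksum_of(log))  # NONE, matching what we observed on the real log files
```

This matches the `NONE` output we saw: the log files were written without checksums by design, so any layer that insists on verifying one (such as MinIO over HDFS in our setup) fails.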
We are wondering what the reason for this is, and whether there is a recommended workaround. Thanks in advance.