Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default checksum type to CRC32C #315

Closed
wants to merge 1 commit into from

Conversation

symious
Copy link

@symious symious commented Apr 27, 2023

Changing the default checksum type to CRC32.

We have some users using uploading files which has the checksum type of "CRC32", but the default type is "CRC32C" since the very begining.

This will cause a checksum mismatch issue when users are using distcp to copy the files. Although we can work around this issue on HDFS side by adding "-pc" to the distcp command, we think it's also good to change the default value from GO side.

@symious
Copy link
Author

symious commented May 4, 2023

@colinmarc Could you help to review the PR?

@colinmarc
Copy link
Owner

Hi @symious,

The default checksum per the source is CRC32: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto#L530

I'm not inclined to change it here (the protos have always been a straight copy from the source), but maybe I'm missing something? I'll close this for now, but please feel free to make the argument!

@colinmarc colinmarc closed this Jul 11, 2023
@symious
Copy link
Author

symious commented Jul 11, 2023

@colinmarc Thank you for the reply.

The default value Clients are using should be here: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java#L133

And CRC32C has been more efficient than the CRC32.

@colinmarc
Copy link
Owner

Huh, okay, thanks for clarifying that. I merged this in 325f0cf.

@symious
Copy link
Author

symious commented Jul 11, 2023

@colinmarc Thank you for the change.

@symious symious deleted the DEFAULT_CHECKSUM_TYPE branch July 11, 2023 14:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants