hash, crypto: add WriteByte, WriteString method to hash implementations #38776
Comments
Unfortunately, the proposed change would not be backward compatible. It would mean that existing types that satisfy the hash.Hash interface would no longer implement the interface. That could break working code, and violates the Go 1 compatibility guarantee (https://golang.org/doc/go1compat). So, in short, we can't do this. |
Got it. How about adding another interface, or just adding the WriteByte() method to the standard library's hash implementations? Hashing is sometimes part of performance-sensitive code paths, and it would be beneficial to avoid conversions to byte slices whose only purpose is to satisfy the API. |
I'm not sure we need another interface, since people can always do a type assertion to Do you want to repurpose this proposal for adding |
How does performance benefit? My experience with WriteByte is that it is slower than appending to a byte slice and calling the classic Write method every 256 or 512 bytes. |
The performance benefit isn't for hashing byte slices. It's for hashing everything else: primitives, structs, maps, arrays, and combinations thereof. |
Can you provide some example code helping me to understand your statement? |
Usually hashes can operate much faster on a block of data than a single byte at a time. What is the use case where WriteByte would be preferable over constructing a (presumably larger than one byte) slice and calling Write? |
To implement |
I have combined bufio.Writer and hash.Hash to create a buffered hash. Test here: https://play.golang.org/p/IHx5GcvLW1v

	package main

	import (
		"bufio"
		"crypto/sha256"
		"fmt"
		"hash"
	)

	// BufferedHash wraps a hash.Hash in a bufio.Writer. The embedded
	// writer supplies Write, WriteByte and WriteString.
	type BufferedHash struct {
		h hash.Hash
		*bufio.Writer
	}

	func NewBufferedHash(h hash.Hash) *BufferedHash {
		return &BufferedHash{
			h:      h,
			Writer: bufio.NewWriter(h),
		}
	}

	// Sum flushes any buffered data into the hash before summing.
	func (bh *BufferedHash) Sum(p []byte) []byte {
		if err := bh.Flush(); err != nil {
			panic(err)
		}
		return bh.h.Sum(p)
	}

	// Reset clears both the hash state and the buffer.
	func (bh *BufferedHash) Reset() {
		bh.h.Reset()
		bh.Writer.Reset(bh.h)
	}

	func (bh *BufferedHash) Size() int {
		return bh.h.Size()
	}

	func (bh *BufferedHash) BlockSize() int {
		return bh.h.BlockSize()
	}

	type T struct {
		A byte
		B string
		C byte
		D string
	}

	// HashT writes the fields of t into the buffered hash.
	func HashT(bh *BufferedHash, t T) {
		bh.WriteByte(t.A)
		bh.WriteString(t.B)
		bh.WriteByte(t.C)
		bh.WriteString(t.D)
	}

	func main() {
		bh := NewBufferedHash(sha256.New())
		t := T{A: 'A', B: "B", C: 'C', D: "D"}
		HashT(bh, t)
		fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
		bh.Reset()
		t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
		HashT(bh, t)
		fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
	}
|
The proverb "If I Had More Time, I Would Have Written a Shorter Letter" applies here. There is no need for creating an extra type: https://play.golang.org/p/Pp6GVhLpEx_9

	package main

	import (
		"crypto/sha256"
		"fmt"
		"io"
	)

	type T struct {
		A byte
		B string
		C byte
		D string
	}

	// SerializeT formats the fields of t directly into w.
	func SerializeT(w io.Writer, t T) {
		fmt.Fprintf(w, "%c%s%c%s", t.A, t.B, t.C, t.D)
	}

	func main() {
		h := sha256.New()
		t := T{A: 'A', B: "B", C: 'C', D: "D"}
		SerializeT(h, t)
		fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
		h.Reset()
		t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
		SerializeT(h, t)
		fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
	}
|
What's the context where you are hashing non-byte-slices with functions like sha256? I suppose the crypto/* hashes all buffer already and the hash/* functions all operate a byte at a time. But they all still run faster with large sequences. |
I'm building a relational database. I understand the reservations about changes / additions, but at high scale and high performance, it's important for APIs to not require avoidable overhead. |
The argument I was trying to make against adding WriteByte is precisely that it really can't be very high performance. Arranging for larger Writes is always going to beat a WriteByte loop. The reservation about providing WriteByte is exactly that it would tempt people toward a less efficient path. We may still want to add it for convenience, especially for cases that don't care about "high scale and high performance", but I don't think you'd want to use it in your relational database. |
All the hashes have buffers underneath, so they can all implement WriteByte efficiently - well, as efficiently as anyone can implement WriteByte. It's still more efficient to call Write with many bytes than to call WriteByte in a loop, but given that io.ByteWriter exists, it seems reasonable to make the hash.Hash implementations implement it. Earlier this year we declined #14757 because the implementation would have to use unsafe, but @bradfitz points out that the buffer that enables WriteByte would also enable a safe implementation of WriteString. So maybe we should add WriteString at the same time, using safe code. (If passed a long string, WriteString would have to copy into the buffer, process the buffer, and repeat. That would still be a bit of copying, but not more than converting to a []byte.) Will retitle this issue to be WriteByte and WriteString and leave open for another week, but this seems headed for likely accept. |
The premise that all hashes have buffers underneath is not correct. The non-cryptographic hashes in the adler32, crc32, crc64 and fnv packages in the hash directory of the standard library don't have buffers. It is of course possible to implement WriteByte and WriteString for those hashes based on the Write logic. The cost of the proposal, implementing WriteByte and WriteString for all hashes in the standard library, is increased code size and additional test code. Implementation will require the replication of the Write logic in The convenience argument for WriteByte still doesn't convince me. Why is it necessary to add a method to each hash function to do something that will result in slow code? Beginners will still struggle because they I wonder whether we should look at the more general problem: How can WriteByte and WriteString be supported for an io.Writer? One option is to use bufio.Writer as a wrapper. But it complicates the program logic by requiring calls to Flush to ensure all data is written to the underlying writer. For WriteString there is the io.WriteString function, which has the disadvantage that it allocates a new byte slice and copies the data from the string. The package unsafe is probably not used because the requirement For WriteByte an io.WriteByte convenience function would address the problem. Performance is not a concern here. Both proposals still allow the implementation of WriteByte and WriteString by hashes, but wouldn't make it mandatory. |
adler32, crc32, crc64, and fnv have no buffer because they are byte-at-a-time algorithms (the chunk size is 1 byte). An io.WriteByte convenience function would have to allocate on every byte in the fallback, like io.WriteString allocates on every call (but with many fewer calls in typical cases!). That's too expensive to hide in an innocuous-looking function. |
Thanks Ross for the response. I agree and I stated already it is possible to implement WriteByte and WriteString for adler32, etc. If a type supports the Write method it is always possible to implement While I understand the performance argument for WriteString, I'm still not convinced about WriteByte. What is the actual use case requiring the implementation for all hashes? The original proposal cited the direct marshaling or serialization of a struct value into the hash. But that doesn't convince me, because the struct may include other types like larger integers and hashes will not support those directly. There is also the question about consistency for writers in the standard library. After this proposal is implemented all hashes will support WriteByte and WriteString, but os.File supports WriteString but not WriteByte and net.TCPConn supports only Write. Shouldn't there be a general rule for supporting WriteByte and WriteString? |
The ByteWriter docs are not very helpful - there's nothing anywhere about what WriteByte means. These hashes can implement it efficiently enough and so it's probably worth doing. |
Based on the discussion above, this seems like a likely accept. |
Change https://golang.org/cl/301189 mentions this issue: |
The sha256 hash writer doesn't implement WriteString. (See golang/go#38776.) As a consequence, we end up converting many strings to []byte. Wrapping a bufio.Writer around the hash writer lets us avoid these conversions by using WriteString.

Using a bufio.Writer is, perhaps surprisingly, almost as cheap as using unsafe. The reason is that the sha256 writer does internal buffering, but doesn't do any when handed larger writes. Using a bufio.Writer merely shifts the data copying from one buffer to a different one.

Using a concrete type for Print and print cuts 10% off of the execution time.

name     old time/op    new time/op    delta
Hash-8   15.3µs ± 0%    11.5µs ± 0%    -24.84%  (p=0.000 n=10+10)

name     old alloc/op   new alloc/op   delta
Hash-8   2.82kB ± 0%    1.98kB ± 0%    -29.57%  (p=0.000 n=10+10)

name     old allocs/op  new allocs/op  delta
Hash-8   140 ± 0%       82 ± 0%        -41.43%  (p=0.000 n=10+10)

Signed-off-by: Josh Bleecher Snyder <[email protected]>
We can make a fast safe implementation of

	func write[T interface{ string | []byte }](d *digest, p T) (int, error) {
		// Write implementation...
	}

	func (d *digest) Write(p []byte) (int, error)       { return write[[]byte](d, p) }
	func (d *digest) WriteString(p string) (int, error) { return write[string](d, p) }
|
Any idea when this proposal is going to be implemented? I recently needed to work with sha256 and had to wrap it with a bufio.Writer to get decent performance (https://github.com/imacks/aws-sigv4/blob/master/signer.go#L267). I tried using unsafe |
Change https://go.dev/cl/481478 mentions this issue: |
@imacks It's not clear to me that this will help you, but https://go.dev/cl/481478 is a patch to add |
Change https://go.dev/cl/483815 mentions this issue: |
Change https://go.dev/cl/483816 mentions this issue: |
This can reduce allocations when hashing a string or byte rather than []byte. For #38776 Change-Id: I1c6dd1bc018220784a05939e92b47558c0562110 Reviewed-on: https://go-review.googlesource.com/c/go/+/481478 Reviewed-by: Joel Sing <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Reviewed-by: Bryan Mills <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]>
This can reduce allocations when hashing a string or byte rather than []byte. For #38776 Change-Id: I7c1fbdf15abf79d2faf360f75adf4bc550a607e9 Reviewed-on: https://go-review.googlesource.com/c/go/+/483815 TryBot-Result: Gopher Robot <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Bryan Mills <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Reviewed-by: Joel Sing <[email protected]>
This can reduce allocations when hashing a string or byte rather than []byte. For #38776 Change-Id: I4926ae2749f6b167edbebb73d8f68763ffb2f0c1 Reviewed-on: https://go-review.googlesource.com/c/go/+/483816 Reviewed-by: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Reviewed-by: Bryan Mills <[email protected]> Reviewed-by: Joel Sing <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]>
Change https://go.dev/cl/492356 mentions this issue: |
Change https://go.dev/cl/492355 mentions this issue: |
Change https://go.dev/cl/492375 mentions this issue: |
This reverts CL 483816 Reason for revert: can cause cgo errors when using boringcrypto. See #59954. For #38776 For #59954 Change-Id: I23a2a1f0aed2a08b73855b5038ccb24a4d0a02c0 Reviewed-on: https://go-review.googlesource.com/c/go/+/492355 Run-TryBot: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]> Reviewed-by: Bryan Mills <[email protected]> TryBot-Result: Gopher Robot <[email protected]>
This reverts CL 481478 Reason for revert: can cause cgo errors when using boringcrypto. See #59954. For #38776 For #59954 Change-Id: Ic520f9fede152d22ab69996ad84c44f3e0d783bc Reviewed-on: https://go-review.googlesource.com/c/go/+/492356 Reviewed-by: Ian Lance Taylor <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Bryan Mills <[email protected]> TryBot-Result: Gopher Robot <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Auto-Submit: Ian Lance Taylor <[email protected]>
This reverts CL 483815 Reason for revert: can cause cgo errors when using boringcrypto. See #59954. For #38776 For #59954 Change-Id: I1f7e0fb06b627971811623927e3d98c0fdbc730b Reviewed-on: https://go-review.googlesource.com/c/go/+/492375 Auto-Submit: Ian Lance Taylor <[email protected]> Reviewed-by: Bryan Mills <[email protected]> Run-TryBot: Ian Lance Taylor <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]> TryBot-Bypass: Ian Lance Taylor <[email protected]>
@ianlancetaylor I see in https://go-review.googlesource.com/c/go/+/492356 that your patch was reverted because of boringcrypto. As I understand it, boringcrypto is being replaced by FIPS in Go 1.24 (#69536), so does that mean we can re-apply the patch to master now? cc @FiloSottile |
Probably, for 1.25. |
This proposal was initially for embedding io.ByteWriter in hash.Hash, or adding a WriteByte() method with the same signature.
This method is already added in the new maphash.Hash. Adding it elsewhere will extend the benefits in performance and usability to the other Hash implementations.
Per feedback of @ianlancetaylor below, I'm instead proposing the addition of WriteByte() from io.ByteWriter to the standard library hash.Hash implementations, including:
adler32
crc32
crc64
fnv