hash, crypto: add WriteByte, WriteString method to hash implementations #38776

geraldss · 2020-04-30T18:56:35Z

This proposal was initially for embedding io.ByteWriter in hash.Hash, or adding a WriteByte() method with the same signature.

This method is already added in the new maphash.Hash. Adding it elsewhere will extend the benefits in performance and usability to the other Hash implementations.

Per feedback from @ianlancetaylor below, I'm instead proposing the addition of WriteByte() from io.ByteWriter to the standard library hash.Hash implementations, including:

adler32
crc32
crc64
fnv


ianlancetaylor · 2020-04-30T21:15:05Z

Unfortunately, the proposed change would not be backward compatible. It would mean that existing types that satisfy the hash.Hash interface would no longer implement the interface. That could break working code, and violates the Go 1 compatibility guarantee (https://golang.org/doc/go1compat).

So, in short, we can't do this.

geraldss · 2020-04-30T22:02:12Z

Got it. How about adding another interface, or just adding the WriteByte() method to the standard library's hash implementations?

Hashing is sometimes part of performance-sensitive code paths, and it would be beneficial to avoid conversions to byte slices whose only purpose is to satisfy the API.

ianlancetaylor · 2020-04-30T22:10:41Z

I'm not sure we need another interface, since people can always do a type assertion to io.ByteWriter.

Do you want to repurpose this proposal for adding WriteByte methods to various hash implementations?

ulikunitz · 2020-05-01T08:43:46Z

How does performance benefit? My experience with WriteByte is that it is slower than appending to a byte slice and using the classic Write method every 256 or 512 bytes.

geraldss · 2020-05-01T13:34:49Z

The performance benefit isn't for hashing byte slices. It's for hashing everything else: primitives, structs, maps, arrays, and combinations thereof.

ulikunitz · 2020-05-01T14:42:15Z

Can you provide some example code helping me to understand your statement?

rsc · 2020-06-10T19:57:16Z

Usually hashes can operate much faster on a block of data than a single byte at a time.
One potential problem with adding WriteByte is that using it would be inherently slower than passing in a larger slice of data.

What is the use case where WriteByte would be preferable over constructing a (presumably larger than one byte) slice and calling Write?

geraldss · 2020-06-10T22:01:49Z

type T struct {
    A byte
    B string
    C byte
    D string
}

func HashT(h hash.Hash, t *T) { ... }

To implement HashT(), it would be convenient if there were no conversions to byte slices. The current option is to use encoding/binary, but that API doesn't express the avoidance of byte slices when it calls a generic io.Writer. Ditto for supporting WriteString().

ulikunitz · 2020-06-11T08:21:08Z

I have combined bufio.Writer and hash.Hash to create a buffered hash

Test here: https://play.golang.org/p/IHx5GcvLW1v

package main

import (
	"bufio"
	"crypto/sha256"
	"fmt"
	"hash"
)

type BufferedHash struct {
	h hash.Hash
	*bufio.Writer
}

func NewBufferedHash(h hash.Hash) *BufferedHash {
	return &BufferedHash{
		h:      h,
		Writer: bufio.NewWriter(h),
	}
}

func (bh *BufferedHash) Sum(p []byte) []byte {
	if err := bh.Flush(); err != nil {
		panic(err)
	}
	return bh.h.Sum(p)
}

func (bh *BufferedHash) Reset() {
	bh.h.Reset()
	bh.Writer.Reset(bh.h)
}

func (bh *BufferedHash) Size() int {
	return bh.h.Size()
}

func (bh *BufferedHash) BlockSize() int {
	return bh.h.BlockSize()
}

type T struct {
	A byte
	B string
	C byte
	D string
}

func HashT(bh *BufferedHash, t T) {
	bh.WriteByte(t.A)
	bh.WriteString(t.B)
	bh.WriteByte(t.C)
	bh.WriteString(t.D)
}

func main() {
	bh := NewBufferedHash(sha256.New())

	t := T{A: 'A', B: "B", C: 'C', D: "D"}
	HashT(bh, t)

	fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
	bh.Reset()

	t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
	HashT(bh, t)
	fmt.Printf("hash(%+v): %x\n", t, bh.Sum(nil))
}

ulikunitz · 2020-06-11T08:46:48Z

The proverb "If I Had More Time, I Would Have Written a Shorter Letter" applies here. There is no need for creating an extra type: https://play.golang.org/p/Pp6GVhLpEx_9

package main

import (
	"crypto/sha256"
	"fmt"
	"io"
)

type T struct {
	A byte
	B string
	C byte
	D string
}

func SerializeT(w io.Writer, t T) {
	fmt.Fprintf(w, "%c%s%c%s", t.A, t.B, t.C, t.D)
}

func main() {
	h := sha256.New()

	t := T{A: 'A', B: "B", C: 'C', D: "D"}
	SerializeT(h, t)

	fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
	h.Reset()

	t = T{A: 'A', B: "B", C: 'C', D: "Dee"}
	SerializeT(h, t)
	fmt.Printf("hash(%+v): %x\n", t, h.Sum(nil))
}

rsc · 2020-06-24T18:04:32Z

What's the context where you are hashing non-byte-slices with functions like sha256?
If you are building a hash table, hash/maphash is the package to use, and maphash.Hash does have WriteByte.
If you need a well-defined fixed hash function, that's almost always for use with a specific byte sequence.

I suppose the crypto/* hashes all buffer already and the hash/* functions all operate byte at a time. But they all still run faster with large sequences.

geraldss · 2020-06-24T18:10:10Z

I'm building a relational database. I understand the reservations about changes / additions, but at high scale and high performance, it's important for APIs to not require avoidable overhead.

rsc · 2020-07-15T18:03:53Z

> but at high scale and high performance, it's important for APIs to not require avoidable overhead.

The argument I was trying to make against adding WriteByte is precisely that it really can't be very high performance. Arranging for larger Writes is always going to beat a WriteByte loop. The reservation about providing WriteByte is exactly that it would tempt people toward a less efficient path.

We may still want to add it for convenience, especially for cases that don't care about "high scale and high performance", but I don't think you'd want to use it in your relational database.

rsc · 2020-08-05T17:32:57Z

All the hashes have buffers underneath, so they can all implement WriteByte efficiently - well, as efficiently as anyone can implement WriteByte.

It's still more efficient to call Write with many bytes than to call WriteByte in a loop, but given that io.ByteWriter exists, it seems reasonable to make the hash.Hash implementations implement it.
(To be clear, we can't modify hash.Hash itself, as was originally proposed.)

Earlier this year we declined #14757 because the implementation would have to use unsafe, but @bradfitz points out that the buffer that enables WriteByte would also enable a safe implementation of WriteString. So maybe we should add WriteString at the same time, using safe code. (If passed a long string, WriteString would have to copy into the buffer, process the buffer, and repeat. That would still be a bit of copying, but not more than converting to a []byte.)

Will retitle this issue to be WriteByte and WriteString and leave open for another week, but this seems headed for likely accept.

ulikunitz · 2020-08-06T09:06:46Z

The premise that all hashes have buffers underneath is not correct. The non-cryptographic hashes in the adler32, crc32, crc64 and fnv packages in the hash directory of the standard library don't have buffers. It is of course possible to implement WriteByte and WriteString for those hashes based on the Write logic.

The cost of the proposal, implementing WriteByte and WriteString for all hashes in the standard library, is increased code size and additional test code. Implementation will require the replication of the Write logic in both new methods unless the methods are implemented as wrappers around Write.

The convenience argument for WriteByte still doesn't convince me. Why is it necessary to add a method to each hash function to do something that will result in slow code? Beginners will still struggle because they have to know that they need to convert the hash to a ByteWriter, and experienced developers will be able to use fmt.Fprintf(w, "%c", c) or write their own wrapper, since performance cannot be the concern in that case.

I wonder whether we should look at the more general problem: How can WriteByte and WriteString be supported for an io.Writer?

One option is to use bufio.Writer as a wrapper. But it complicates the program logic by requiring calls to Flush to ensure all data is written to the underlying writer.

For WriteString there is the io.WriteString function, which has the disadvantage that it allocates a new byte slice and copies the data from the string. The package unsafe is probably not used because the requirement that a Write method should not modify the slice cannot be enforced by the compiler. I suggest providing an unsafe.WriteString function that assumes that the Writer doesn't modify the slice. It may be used in cases where performance is critical.

For WriteByte an io.WriteByte convenience function would address the problem. Performance is not a concern here.

Both proposals still allow the implementation of WriteByte and WriteString by hashes, but wouldn't make it mandatory.

rsc · 2020-08-12T17:40:35Z

adler32, crc32, crc64, and fnv have no buffer because they are byte-at-a-time algorithms (the chunk size is 1 byte).
They can implement WriteByte and WriteString by calling the single-byte update function.
(That may involve refactoring Write, but inlining should be good enough now to keep from slowing down Write.)

An io.WriteByte convenience function would have to allocate on every byte in the fallback, like io.WriteString allocates on every call (but with many fewer calls in typical cases!). That's too expensive to hide in an innocuous-looking function.

ulikunitz · 2020-08-13T07:44:37Z

Thanks Russ for the response. I agree, and I stated already that it is possible to implement WriteByte and WriteString for adler32, etc. If a type supports the Write method, it is always possible to implement WriteByte and WriteString regardless of whether the Write operation is buffered or not.

While I understand the performance argument for WriteString, I'm still not convinced about WriteByte. What is the actual use case requiring the implementation for all hashes? The original proposal cited the direct marshaling or serialization of a struct value into the hash. But that doesn't convince me, because the struct may include other types like larger integers, and hashes will not support those directly.

There is also the question of consistency for writers in the standard library. After this proposal is implemented all hashes will support WriteByte and WriteString, but os.File supports WriteString but not WriteByte, and net.TCPConn supports only Write. Shouldn't there be a general rule for supporting WriteByte and WriteString?

rsc · 2020-08-26T17:48:29Z

% go doc io.ByteWriter
package io // import "io"

type ByteWriter interface {
	WriteByte(c byte) error
}
    ByteWriter is the interface that wraps the WriteByte method.

% go doc io.ByteReader
package io // import "io"

type ByteReader interface {
	ReadByte() (byte, error)
}
    ByteReader is the interface that wraps the ReadByte method.

    ReadByte reads and returns the next byte from the input or any error
    encountered. If ReadByte returns an error, no input byte was consumed, and
    the returned byte value is undefined.

    ReadByte provides an efficient interface for byte-at-time processing. A
    Reader that does not implement ByteReader can be wrapped using
    bufio.NewReader to add this method.

%

The ByteWriter docs are not very helpful - there's nothing anywhere about what WriteByte means.
It's possible we should say the same in ByteWriter as in ByteReader: if you can implement it efficiently, then it's okay to have one. If not, then not.

These hashes can implement it efficiently enough and so it's probably worth doing.

rsc · 2020-08-26T17:48:44Z

Based on the discussion above, this seems like a likely accept.

gopherbot · 2021-03-12T11:02:55Z

Change https://golang.org/cl/301189 mentions this issue: crypto/*, hash: add WriteString method to hash.Hash and all the algorithms

The sha256 hash writer doesn't implement WriteString. (See golang/go#38776.) As a consequence, we end up converting many strings to []byte.

Wrapping a bufio.Writer around the hash writer lets us avoid these conversions by using WriteString.

Using a bufio.Writer is, perhaps surprisingly, almost as cheap as using unsafe. The reason is that the sha256 writer does internal buffering, but doesn't do any when handed larger writes. Using a bufio.Writer merely shifts the data copying from one buffer to a different one.

Using a concrete type for Print and print cuts 10% off of the execution time.

name     old time/op    new time/op    delta
Hash-8   15.3µs ± 0%    11.5µs ± 0%    -24.84%  (p=0.000 n=10+10)

name     old alloc/op   new alloc/op   delta
Hash-8   2.82kB ± 0%    1.98kB ± 0%    -29.57%  (p=0.000 n=10+10)

name     old allocs/op  new allocs/op  delta
Hash-8   140 ± 0%       82 ± 0%        -41.43%  (p=0.000 n=10+10)

Signed-off-by: Josh Bleecher Snyder <[email protected]>

tdakkota · 2021-09-01T09:00:06Z

> Earlier this year we declined #14757 because the implementation would have to use unsafe, but @bradfitz points out that the buffer that enables WriteByte would also enable a safe implementation of WriteString.

We can make a fast safe implementation of WriteString using generics.

func write[T interface{ string | []byte }](d *digest, p T) (int, error) {
	// Write implementation...
}

func (d *digest) Write(p []byte) (int, error) { return write[[]byte](d, p) }
func (d *digest) WriteString(p string) (int, error) { return write[string](d, p) }

imacks · 2023-04-02T13:54:26Z

Any idea when this proposal is going to be implemented? I recently needed to work with sha256 and had to wrap it with a bufio.Writer to get decent performance (https://github.com/imacks/aws-sigv4/blob/master/signer.go#L267). I tried using unsafe s2b at first, and although it reduced allocs, benchmarks suggest things are slower.

gopherbot · 2023-04-03T04:22:35Z

Change https://go.dev/cl/481478 mentions this issue: crypto/sha256: add WriteString and WriteByte method

ianlancetaylor · 2023-04-03T04:24:00Z

@imacks It's not clear to me that this will help you, but https://go.dev/cl/481478 is a patch to add WriteString and WriteByte to crypto/sha256. Does it make a difference for your code?

gopherbot · 2023-04-11T22:29:33Z

Change https://go.dev/cl/483815 mentions this issue: crypto/sha1: add WriteString and WriteByte method

gopherbot · 2023-04-12T01:56:44Z

Change https://go.dev/cl/483816 mentions this issue: crypto/sha512: add WriteString and WriteByte method

This can reduce allocations when hashing a string or byte rather than []byte.

For #38776

Change-Id: I1c6dd1bc018220784a05939e92b47558c0562110
Reviewed-on: https://go-review.googlesource.com/c/go/+/481478
Reviewed-by: Joel Sing <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>

This can reduce allocations when hashing a string or byte rather than []byte.

For #38776

Change-Id: I7c1fbdf15abf79d2faf360f75adf4bc550a607e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/483815
TryBot-Result: Gopher Robot <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Joel Sing <[email protected]>

This can reduce allocations when hashing a string or byte rather than []byte.

For #38776

Change-Id: I4926ae2749f6b167edbebb73d8f68763ffb2f0c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/483816
Reviewed-by: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
Reviewed-by: Joel Sing <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>

gopherbot · 2023-05-03T21:08:43Z

Change https://go.dev/cl/492356 mentions this issue: Revert "crypto/sha256: add WriteString and WriteByte method"

gopherbot · 2023-05-03T21:08:44Z

Change https://go.dev/cl/492355 mentions this issue: Revert "crypto/sha512: add WriteString and WriteByte method"

gopherbot · 2023-05-03T21:08:45Z

Change https://go.dev/cl/492375 mentions this issue: Revert "crypto/sha1: add WriteString and WriteByte method"

This reverts CL 483816

Reason for revert: can cause cgo errors when using boringcrypto. See #59954.

For #38776
For #59954

Change-Id: I23a2a1f0aed2a08b73855b5038ccb24a4d0a02c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/492355
Run-TryBot: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>

This reverts CL 481478

Reason for revert: can cause cgo errors when using boringcrypto. See #59954.

For #38776
For #59954

Change-Id: Ic520f9fede152d22ab69996ad84c44f3e0d783bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/492356
Reviewed-by: Ian Lance Taylor <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
TryBot-Result: Gopher Robot <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Auto-Submit: Ian Lance Taylor <[email protected]>

This reverts CL 483815

Reason for revert: can cause cgo errors when using boringcrypto. See #59954.

For #38776
For #59954

Change-Id: I1f7e0fb06b627971811623927e3d98c0fdbc730b
Reviewed-on: https://go-review.googlesource.com/c/go/+/492375
Auto-Submit: Ian Lance Taylor <[email protected]>
Reviewed-by: Bryan Mills <[email protected]>
Run-TryBot: Ian Lance Taylor <[email protected]>
Reviewed-by: Ian Lance Taylor <[email protected]>
TryBot-Bypass: Ian Lance Taylor <[email protected]>

mvdan · 2024-11-24T19:30:28Z

@ianlancetaylor I see in https://go-review.googlesource.com/c/go/+/492356 that your patch got reverted due to boringcrypto. I seem to understand that with Go 1.24 boringcrypto is being replaced by FIPS (#69536), so does that mean that we can re-apply the patch to master now? cc @FiloSottile

ianlancetaylor · 2024-11-25T20:31:05Z

Probably, for 1.25.

gopherbot added this to the Proposal milestone Apr 30, 2020

gopherbot added the Proposal label Apr 30, 2020

geraldss changed the title ~~proposal: Add io.ByteWriter to hash.Hash~~ proposal: Add io.ByteWriter to hash.Hash implementations Apr 30, 2020

rsc changed the title ~~proposal: Add io.ByteWriter to hash.Hash implementations~~ proposal: hash, crypto: add WriteByte method to hash implementations Jun 10, 2020

rsc mentioned this issue Jul 8, 2020

proposal: review meeting minutes #33502

Open

rsc changed the title ~~proposal: hash, crypto: add WriteByte method to hash implementations~~ proposal: hash, crypto: add WriteByte, WriteString method to hash implementations Aug 5, 2020

rsc mentioned this issue Aug 5, 2020

proposal: crypto, hash: add WriteString support #14757

Closed

rsc added the Proposal-FinalCommentPeriod label Aug 26, 2020

odeke-em self-assigned this Mar 20, 2021

rsc moved this to Accepted in Proposals Aug 10, 2022

rsc added this to Proposals Aug 10, 2022

dsnet mentioned this issue Aug 10, 2022

proposal: hash: all implementations of Hash should implement io.StringWriter #54379

Closed

ianlancetaylor mentioned this issue Apr 13, 2023

cmd/compile: better optimization of type switches in instantiated generic functions #59591

Open

dmitshur modified the milestones: Backlog, Go1.21 Apr 25, 2023

odeke-em modified the milestones: Go1.21, Backlog Jul 6, 2023
