Skip to content

Commit

Permalink
Version 4.7. Made minor code rearrangements, improved comments.
Browse files Browse the repository at this point in the history
  • Loading branch information
avaneev committed Dec 8, 2022
1 parent 6abf877 commit 6dd7586
Show file tree
Hide file tree
Showing 2 changed files with 111 additions and 122 deletions.
18 changes: 11 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,9 +36,9 @@ This function passes all [SMHasher](https://github.com/rurban/smhasher) tests.

A correct way to hash an array of independent values, and which does not
require pre-buffering, is to pass previous hash value as a seed value. This
method may be as fast or faster than pre-buffering, especially if lengthes
of values in the array are not small. An additional 1-2 cycles/hash advantage
is obtained if fixed-size values are being hashed incrementally (due to
method may be as fast or faster than pre-buffering, especially if lengths of
values in the array are not small. An additional 1-2 cycles/hash advantage is
obtained if fixed-size values are being hashed incrementally (due to
compiler's branching optimization). In most cases, incremental hashing of even
a few 2-8-byte values may be faster than using pre-buffering if the overall
input length is not known in advance.
Expand All @@ -53,13 +53,13 @@ input length is not known in advance.
Note that this approach is not the same as "streamed" hashing since this
approach implicitly encodes the length of each independent value. Such kind of
hashing can be beneficial when a database record is being hashed, when it is
necessary to separate fields by means of encoding their lengthes.
necessary to separate fields by means of encoding their lengths.

## Streamed Hashing ##

The `komihash.h` also features a fast streamed (continuous) implementation
of the `komihash` hash function. Streamed hashing expects any number of
`update` calls inbetween the `init` and `final` calls:
The `komihash.h` file also features a fast continuously streamed
implementation of the `komihash` hash function. Streamed hashing expects any
number of `update` calls inbetween the `init` and `final` calls:

```
komihash_stream_t ctx;
Expand All @@ -73,6 +73,10 @@ of the `komihash` hash function. Streamed hashing expects any number of
uint64_t Hash = komihash_stream_final( &ctx );
```

Since the `final` function is non-destructive for the context structure, the
function can be used to obtain intermediate "incremental" hashes of the data
stream being hashed, and the hashing can then be resumed.

The hash value produced via streamed hashing can be used in the
discrete-incremental hashing outlined above (e.g., for files and blobs).

Expand Down
215 changes: 100 additions & 115 deletions komihash.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/**
* komihash.h version 4.6
* komihash.h version 4.7
*
* The inclusion file for the "komihash" hash function, "komirand" 64-bit
* PRNG, and streamed "komihash" implementation.
Expand Down Expand Up @@ -119,7 +119,7 @@

#endif // likelihood macros

// In-memory data prefetch macro (temporal locality=1, in case a collision
// Memory address prefetch macro (temporal locality=1, in case a collision
// resolution would be necessary).

#if defined( __GNUC__ ) || defined( __clang__ )
Expand Down Expand Up @@ -328,32 +328,26 @@ static inline uint64_t kh_lpu64ec_l4( const uint8_t* const Msg,

#endif // defined( _MSC_VER )

/**
* Macro for common hashing round with 16-byte input, using the "r1h"
* temporary variable.
*/
// Macro for common hashing round with 16-byte input, using the "r1h"
// temporary variable.

#define KOMIHASH_HASH16( m ) \
kh_m128( Seed1 ^ kh_lu64ec( m ), \
Seed5 ^ kh_lu64ec( m + 8 ), &Seed1, &r1h ); \
Seed5 += r1h; \
Seed1 ^= Seed5;

/**
* Macro for common hashing round without input, using the "r2h" temporary
* variable.
*/
// Macro for common hashing round without input, using the "r2h" temporary
// variable.

#define KOMIHASH_HASHROUND() \
kh_m128( Seed1, Seed5, &Seed1, &r2h ); \
Seed5 += r2h; \
Seed1 ^= Seed5;

/**
* Macro for common hashing finalization round, with the final hashing input
* expected in the "r1h" and "r2h" temporary variables. The macro inserts the
* function return instruction.
*/
// Macro for common hashing finalization round, with the final hashing input
// expected in the "r1h" and "r2h" temporary variables. The macro inserts the
// function return instruction.

#define KOMIHASH_HASHFIN() \
kh_m128( r1h, r2h, &Seed1, &r1h ); \
Expand All @@ -362,17 +356,14 @@ static inline uint64_t kh_lpu64ec_l4( const uint8_t* const Msg,
KOMIHASH_HASHROUND(); \
return( Seed1 );

/**
* Macro for a common 64-byte full-performance hashing loop. Expects Msg and
* MsgLen values (greater than 63), requires initialized Seed1-8 values, uses
* r1h-r4h temporary variables.
*
* The "shifting" arrangement of Seed1-4 (below) does not increase individual
* SeedN's PRNG period beyond 2^64, but reduces a chance of any occassional
* synchronization between PRNG lanes happening. Practically, Seed1-4 together
* become a single "fused" 256-bit PRNG value, having a summary PRNG period
* of 2^66.
*/
// Macro for a common 64-byte full-performance hashing loop. Expects Msg and
// MsgLen values (greater than 63), requires initialized Seed1-8 values, uses
// r1h-r4h temporary variables.
//
// The "shifting" arrangement of Seed1-4 (below) does not increase individual
// SeedN's PRNG period beyond 2^64, but reduces a chance of any occassional
// synchronization between PRNG lanes happening. Practically, Seed1-4 together
// become a single "fused" 256-bit PRNG value, having a summary PRNG period.

#define KOMIHASH_HASHLOOP64() \
do \
Expand Down Expand Up @@ -405,15 +396,65 @@ static inline uint64_t kh_lpu64ec_l4( const uint8_t* const Msg,
\
} while( KOMIHASH_LIKELY( MsgLen > 63 ));

/**
* The hashing epilogue function (for internal use).
*
* @param Msg Pointer to the remaining part of the message.
* @param MsgLen Remaining part's length, can be zero.
* @param Seed1 Latest Seed1 value.
* @param Seed5 Latest Seed5 value.
* @return 64-bit hash value.
*/

static inline uint64_t komihash_epi( const uint8_t* Msg, size_t MsgLen,
uint64_t Seed1, uint64_t Seed5 )
{
uint64_t r1h, r2h;

KOMIHASH_PREFETCH( Msg );

if( KOMIHASH_LIKELY( MsgLen > 31 ))
{
KOMIHASH_HASH16( Msg );
KOMIHASH_HASH16( Msg + 16 );

Msg += 32;
MsgLen -= 32;
}

if( MsgLen > 15 )
{
KOMIHASH_HASH16( Msg );

Msg += 16;
MsgLen -= 16;
}

if( MsgLen > 7 )
{
r2h = Seed5 ^ kh_lpu64ec_l4( Msg + 8, MsgLen - 8 );
r1h = Seed1 ^ kh_lu64ec( Msg );
}
else
{
r1h = Seed1 ^ kh_lpu64ec_l4( Msg, MsgLen );
r2h = Seed5;
}

KOMIHASH_HASHFIN();
}

/**
* KOMIHASH hash function. Produces and returns a 64-bit hash value of the
* specified message, string, or binary data block. Designed for 64-bit
* hash-table and hash-map uses. Produces identical hashes on both big- and
* little-endian systems.
*
* @param Msg0 The message to produce a hash from. The alignment of this
* pointer is unimportant.
* @param MsgLen Message's length, in bytes.
* pointer is unimportant. It is valid to pass 0 when MsgLen==0 (assuming that
* compiler's implementation of the address prefetch is non-failing for any
* address).
* @param MsgLen Message's length, in bytes, can be zero.
* @param UseSeed Optional value, to use instead of the default seed. To use
* the default seed, set to 0. The UseSeed value can have any bit length and
* statistical quality, and is used only as an additional entropy source. May
Expand Down Expand Up @@ -515,37 +556,7 @@ static inline uint64_t komihash( const void* const Msg0, size_t MsgLen,
Seed1 ^= Seed2 ^ Seed3 ^ Seed4;
}

KOMIHASH_PREFETCH( Msg );

if( KOMIHASH_LIKELY( MsgLen > 31 ))
{
KOMIHASH_HASH16( Msg );
KOMIHASH_HASH16( Msg + 16 );

Msg += 32;
MsgLen -= 32;
}

if( MsgLen > 15 )
{
KOMIHASH_HASH16( Msg );

Msg += 16;
MsgLen -= 16;
}

if( MsgLen > 7 )
{
r2h = Seed5 ^ kh_lpu64ec_l4( Msg + 8, MsgLen - 8 );
r1h = Seed1 ^ kh_lu64ec( Msg );
}
else
{
r1h = Seed1 ^ kh_lpu64ec_l4( Msg, MsgLen );
r2h = Seed5;
}

KOMIHASH_HASHFIN();
return( komihash_epi( Msg, MsgLen, Seed1, Seed5 ));
}

/**
Expand All @@ -571,15 +582,17 @@ static inline uint64_t komirand( uint64_t* const Seed1, uint64_t* const Seed2 )
return( *Seed1 );
}

#ifndef KOMIHASH_BUFSIZE
#define KOMIHASH_BUFSIZE 768 ///< Buffer size in bytes, must be a
///< multiple of 64, and not less than 128.
#endif // KOMIHASH_BUFSIZE
#if !defined( KOMIHASH_BUFSIZE )

#define KOMIHASH_BUFSIZE 768 // Streamed hashing's buffer size in bytes,
// must be a multiple of 64, and not less than 128.

#endif // !defined( KOMIHASH_BUFSIZE )

/**
* Structure that holds streamed hashing state. The komihash_init() function
* should be called to initalize the structure before hashing. Note that the
* default buffer size is modestly-sized, permitting placement of this
* Context structure that holds streamed hashing state. The komihash_init()
* function should be called to initalize the structure before hashing. Note
* that the default buffer size is modest, permitting placement of this
* structure on stack. Seed[ 0 ] is used as initial UseSeed storage.
*/

Expand All @@ -594,7 +607,7 @@ typedef struct {
/**
* Function initializes the streamed "komihash" session.
*
* @param ctx Pointer to the context structure.
* @param[out] ctx Pointer to the context structure.
* @param UseSeed Optional value, to use instead of the default seed. To use
* the default seed, set to 0. The UseSeed value can have any bit length and
* statistical quality, and is used only as an additional entropy source. May
Expand All @@ -605,7 +618,6 @@ typedef struct {
static inline void komihash_stream_init( komihash_stream_t* const ctx,
const uint64_t UseSeed )
{
ctx -> fb[ 7 ] = 0;
ctx -> Seed[ 0 ] = UseSeed;
ctx -> BufFill = 0;
ctx -> IsHashing = 0;
Expand All @@ -614,11 +626,11 @@ static inline void komihash_stream_init( komihash_stream_t* const ctx,
/**
* Function updates the streamed hashing state with a new input data.
*
* @param ctx Pointer to the context structure. The structure must be
* initialized via the komihash_stream_init().
* @param[in,out] ctx Pointer to the context structure. The structure must be
* initialized via the komihash_stream_init() function.
* @param Msg0 The next part of the whole message being hashed. The alignment
* of this pointer is unimportant.
* @param MsgLen Message's length, in bytes.
* of this pointer is unimportant. It is valid to pass 0 when MsgLen==0.
* @param MsgLen Message's length, in bytes, can be zero.
*/

static inline void komihash_stream_update( komihash_stream_t* const ctx,
Expand Down Expand Up @@ -726,20 +738,20 @@ static inline void komihash_stream_update( komihash_stream_t* const ctx,

if( SwMsgLen == 0 )
{
if( MsgLen == 0 )
{
ctx -> fb[ 7 ] = Msg[ -1 ];
ctx -> BufFill = 0;
return;
}

break;
}

Msg = SwMsg;
MsgLen = SwMsgLen;
SwMsgLen = 0;
}

if( MsgLen == 0 )
{
ctx -> fb[ 7 ] = Msg[ -1 ];
ctx -> BufFill = 0;
return;
}
}

memcpy( ctx -> Buf + BufFill, Msg, MsgLen );
Expand All @@ -748,11 +760,15 @@ static inline void komihash_stream_update( komihash_stream_t* const ctx,

/**
* Function finalizes the streamed hashing session, and returns the resulting
* hash of the previously hashed data. The value is equal to the value
* hash value of the previously hashed data. This value is equal to the value
* returned by the komihash() function for the same provided data.
*
* @param ctx Pointer to the context structure. The structure must be
* initialized via the komihash_stream_init().
* Note that since this function is non-destructive for the context structure,
* the function can be used to obtain intermediate hashes of the data stream
* being hashed, and the hashing can then be resumed.
*
* @param[in] ctx Pointer to the context structure. The structure must be
* initialized via the komihash_stream_init() function.
* @return 64-bit hash value.
*/

Expand All @@ -778,49 +794,18 @@ static inline uint64_t komihash_stream_final( komihash_stream_t* const ctx )
uint64_t Seed6 = ctx -> Seed[ 5 ];
uint64_t Seed7 = ctx -> Seed[ 6 ];
uint64_t Seed8 = ctx -> Seed[ 7 ];
uint64_t r1h, r2h;

if( MsgLen > 63 )
{
uint64_t r3h, r4h;
uint64_t r1h, r2h, r3h, r4h;

KOMIHASH_HASHLOOP64();
}

Seed5 ^= Seed6 ^ Seed7 ^ Seed8;
Seed1 ^= Seed2 ^ Seed3 ^ Seed4;

KOMIHASH_PREFETCH( Msg );

if( KOMIHASH_LIKELY( MsgLen > 31 ))
{
KOMIHASH_HASH16( Msg );
KOMIHASH_HASH16( Msg + 16 );

Msg += 32;
MsgLen -= 32;
}

if( MsgLen > 15 )
{
KOMIHASH_HASH16( Msg );

Msg += 16;
MsgLen -= 16;
}

if( MsgLen > 7 )
{
r2h = Seed5 ^ kh_lpu64ec_l4( Msg + 8, MsgLen - 8 );
r1h = Seed1 ^ kh_lu64ec( Msg );
}
else
{
r1h = Seed1 ^ kh_lpu64ec_l4( Msg, MsgLen );
r2h = Seed5;
}

KOMIHASH_HASHFIN();
return( komihash_epi( Msg, MsgLen, Seed1, Seed5 ));
}

/**
Expand Down

0 comments on commit 6dd7586

Please sign in to comment.