-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
This package is initialized using the reviewed mapping functions
contained in these OpenTelemetry-Go PRs: open-telemetry/opentelemetry-go#2982 open-telemetry/opentelemetry-go#2502 The data structure was reviewed by Lightstep engineers for inclusion in otel-launcher-go: lightstep/otel-launcher-go#174 lightstep/otel-launcher-go#215 lightstep/otel-launcher-go#222
- Loading branch information
Joshua MacDonald
committed
Oct 5, 2022
1 parent
d53dd27
commit b3ce265
Showing
16 changed files
with
3,035 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,215 @@ | ||
# Base-2 Exponential Histogram | ||
|
||
## Design | ||
|
||
This is a fixed-size data structure for aggregating the OpenTelemetry | ||
base-2 exponential histogram introduced in [OTEP | ||
149](https://github.com/open-telemetry/oteps/blob/main/text/0149-exponential-histogram.md) | ||
and [described in the metrics data | ||
model](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/datamodel.md#exponentialhistogram). | ||
The exponential histogram data point is characterized by a `scale` | ||
factor that determines resolution. Positive scales correspond with | ||
more resolution, and negatives scales correspond with less resolution. | ||
|
||
Given a maximum size, in terms of the number of buckets, the | ||
implementation determines the best scale possible given the set of | ||
measurements received. The size of the histogram is configured using | ||
the `WithMaxSize()` option, which defaults to 160. | ||
|
||
The implementation here maintains the best resolution possible. Since | ||
the scale parameter is shared by the positive and negative ranges, the | ||
best value of the scale parameter is determined by the range with the | ||
greater difference between minimum and maximum bucket index: | ||
|
||
```golang | ||
func bucketsNeeded(minValue, maxValue float64, scale int32) int32 { | ||
return bucketIndex(maxValue, scale) - bucketIndex(minValue, scale) + 1 | ||
} | ||
|
||
func bucketIndex(value float64, scale int32) int32 { | ||
return math.Log(value) * math.Ldexp(math.Log2E, scale) | ||
} | ||
``` | ||
|
||
The best scale is uniquely determined when `maxSize/2 < | ||
bucketsNeeded(minValue, maxValue, scale) <= maxSize`. This | ||
implementation maintains the best scale by rescaling as needed to stay | ||
within the maximum size. | ||
|
||
## Layout | ||
|
||
### Mapping function | ||
|
||
The `mapping` sub-package contains the equations specified in the [data | ||
model for Exponential Histogram data | ||
points](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/data-model.md#exponentialhistogram). | ||
|
||
There are two mapping functions used, depending on the sign of the | ||
scale. Negative and zero scales use the `mapping/exponent` mapping | ||
function, which computes the bucket index directly from the bits of | ||
the `float64` exponent. This mapping function is used with scale `-10 | ||
<= scale <= 0`. Scales smaller than -10 map the entire normal | ||
`float64` number range into a single bucket, thus are not considered | ||
useful. | ||
|
||
The `mapping/logarithm` mapping function uses `math.Log(value)` times | ||
the scaling factor `math.Ldexp(math.Log2E, scale)`. This mapping | ||
function is used with `0 < scale <= 20`. The maximum scale is | ||
selected because at scale 21, simply, it becomes difficult to test | ||
correctness--at this point `math.MaxFloat64` maps to index | ||
`math.MaxInt32` and the `math/big` logic used in testing breaks down. | ||
|
||
### Data structure | ||
|
||
The `structure` sub-package contains a Histogram aggregator for use by | ||
the OpenTelemetry-Go Metrics SDK as well as OpenTelemetry Collector | ||
receivers, processors, and exporters. | ||
|
||
## Implementation | ||
|
||
The implementation maintains a slice of buckets and grows the array in | ||
size only as necessary given the actual range of values, up to the | ||
maximum size. The structure of a single range of buckets is: | ||
|
||
```golang | ||
type buckets struct { | ||
backing bucketsVarwidth[T] // for T = uint8 | uint16 | uint32 | uint64 | ||
indexBase int32 | ||
indexStart int32 | ||
indexEnd int32 | ||
} | ||
``` | ||
|
||
The `backing` field is a generic slice of `[]uint8`, `[]uint16`, | ||
`[]uint32`, or `[]uint64`. | ||
|
||
The positive and negative backing arrays are independent, so the | ||
maximum space used for `buckets` by one `Aggregator` is twice the | ||
configured maximum size. | ||
|
||
### Backing array | ||
|
||
The backing array is circular. The first observation is counted in | ||
the 0th index of the backing array and the initial bucket number is | ||
stored in `indexBase`. After the initial observation, the backing | ||
array grows in either direction (i.e., larger or smaller bucket | ||
numbers), until rescaling is necessary. This mechanism allows the | ||
histogram to maintain the ideal scale without shifting values inside | ||
the array. | ||
|
||
The `indexStart` and `indexEnd` fields store the current minimum and | ||
maximum bucket number. The initial condition is `indexBase == | ||
indexStart == indexEnd`, representing a single bucket. | ||
|
||
Following the first observation, new observations may fall into a | ||
bucket up to `size-1` in either direction. Growth is possible by | ||
adjusting either `indexEnd` or `indexStart` as long as the constraint | ||
`indexEnd-indexStart < size` remains true. | ||
|
||
Bucket numbers in the range `[indexBase, indexEnd]` are stored in the | ||
interval `[0, indexEnd-indexBase]` of the backing array. Buckets in | ||
the range `[indexStart, indexBase-1]` are stored in the interval | ||
`[size+indexStart-indexBase, size-1]` of the backing array. | ||
|
||
Considering the `aggregation.Buckets` interface, `Offset()` returns | ||
`indexStart`, `Len()` returns `indexEnd-indexStart+1`, and `At()` | ||
locates the correct bucket in the circular array. | ||
|
||
### Determining change of scale | ||
|
||
The algorithm used to determine the (best) change of scale when a new | ||
value arrives is: | ||
|
||
```golang | ||
func newScale(minIndex, maxIndex, scale, maxSize int32) int32 { | ||
return scale - changeScale(minIndex, maxIndex, scale, maxSize) | ||
} | ||
|
||
func changeScale(minIndex, maxIndex, scale, maxSize int32) int32 { | ||
var change int32 | ||
for maxIndex - minIndex >= maxSize { | ||
maxIndex >>= 1 | ||
minIndex >>= 1 | ||
change++ | ||
} | ||
return change | ||
} | ||
``` | ||
|
||
The `changeScale` function is also used to determine how many bits to | ||
shift during `Merge`. | ||
|
||
### Downscale function | ||
|
||
The downscale function rotates the circular backing array so that | ||
`indexStart == indexBase`, using the "3 reversals" method, before | ||
combining the buckets in place. | ||
|
||
### Merge function | ||
|
||
`Merge` first calculates the correct final scale by comparing the | ||
combined positive and negative ranges. The destination aggregator is | ||
then downscaled, if necessary, and the `UpdateByIncr` code path to add | ||
the source buckets to the destination buckets. | ||
|
||
### Scale function | ||
|
||
The `Scale` function returns the current scale of the histogram. | ||
|
||
If the scale is variable and there are no non-zero values in the | ||
histogram, the scale is zero by definition; when there is only a | ||
single value in this case, its scale is MinScale (20) by definition. | ||
|
||
If the scale is fixed because of range limits, the fixed scale will be | ||
returned even for any size histogram. | ||
|
||
### Handling subnormal values | ||
|
||
Subnormal values are those in the range [0x1p-1074, 0x1p-1022), these | ||
being numbers that "gradually underflow" and use less than 52 bits of | ||
precision in the significand at the smallest representable exponent | ||
(i.e., -1022). Subnormal numbers present special challenges for both | ||
the exponent- and logarithm-based mapping function, and to avoid | ||
additional complexity induced by corner cases, subnormal numbers are | ||
rounded up to 0x1p-1022 in this implementation. | ||
|
||
Handling subnormal numbers is difficult for the logarithm mapping | ||
function because Golang's `math.Log()` function rounds subnormal | ||
numbers up to 0x1p-1022. Handling subnormal numbers is difficult for | ||
the exponent mapping function because Golang's `math.Frexp()`, the | ||
natural API for extracting a value's base-2 exponent, also rounds | ||
subnormal numbers up to 0x1p-1022. | ||
|
||
While the additional complexity needed to correctly map subnormal | ||
numbers is small in both cases, there are few real benefits in doing | ||
so because of the inherent loss of precision. As secondary | ||
motivation, clamping values to the range [0x1p-1022, math.MaxFloat64] | ||
increases symmetry. This limit means that minimum bucket index and the | ||
maximum bucket index have similar magnitude, which helps support | ||
greater maximum scale. Supporting numbers smaller than 0x1p-1022 | ||
would mean changing the valid scale interval to [-11,19] compared with | ||
[-10,20]. | ||
|
||
### UpdateByIncr interface | ||
|
||
The OpenTelemetry metrics SDK `Aggregator` type supports an `Update()` | ||
interface which implies updating the histogram by a count of 1. This | ||
implementation also supports `UpdateByIncr()`, which makes it possible | ||
to support counting multiple observations in a single API call. This | ||
extension is useful in applying `Histogram` aggregation to _sampled_ | ||
metric events (e.g. in the [OpenTelemetry statsd | ||
receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/statsdreceiver)). | ||
|
||
Another use for `UpdateByIncr` is in a Span-to-metrics pipeline | ||
following [probability sampling in OpenTelemetry tracing | ||
(WIP)](https://github.com/open-telemetry/opentelemetry-specification/pull/2047). | ||
|
||
## Acknowledgements | ||
|
||
This implementation is based on work by [Yuke | ||
Zhuge](https://github.com/yzhuge) and [Otmar | ||
Ertl](https://github.com/oertl). See | ||
[NrSketch](https://github.com/newrelic-experimental/newrelic-sketch-java/blob/1ce245713603d61ba3a4510f6df930a5479cd3f6/src/main/java/com/newrelic/nrsketch/indexer/LogIndexer.java) | ||
and | ||
[DynaHist](https://github.com/dynatrace-oss/dynahist/blob/9a6003fd0f661a9ef9dfcced0b428a01e303805e/src/main/java/com/dynatrace/dynahist/layout/OpenTelemetryExponentialBucketsLayout.java) | ||
repositories for more detail. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
// expohisto contains two sub-packages: (1) the `mapping` package | ||
// includes ways to convert between values and bucket index numbers as | ||
// a function of scale, (2) the `structure` package contains a generic | ||
// data structure. | ||
package expohisto // import "github.com/lightstep/go-expohisto" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
module github.com/lightstep/go-expohisto | ||
|
||
go 1.19 | ||
|
||
require github.com/stretchr/testify v1.8.0 | ||
|
||
require ( | ||
github.com/davecgh/go-spew v1.1.1 // indirect | ||
github.com/pmezard/go-difflib v1.0.0 // indirect | ||
gopkg.in/yaml.v3 v3.0.1 // indirect | ||
) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= | ||
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= | ||
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= | ||
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= | ||
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= | ||
github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= | ||
github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw= | ||
github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= | ||
github.com/stretchr/testify v1.8.0 h1:pSgiaMZlXftHpm5L7V1+rVB+AZJydKsMxsQBIJw4PKk= | ||
github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU= | ||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM= | ||
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= | ||
gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= | ||
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= | ||
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,127 @@ | ||
// Copyright The OpenTelemetry Authors | ||
// | ||
// Licensed under the Apache License, Version 2.0 (the "License"); | ||
// you may not use this file except in compliance with the License. | ||
// You may obtain a copy of the License at | ||
// | ||
// http://www.apache.org/licenses/LICENSE-2.0 | ||
// | ||
// Unless required by applicable law or agreed to in writing, software | ||
// distributed under the License is distributed on an "AS IS" BASIS, | ||
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
// See the License for the specific language governing permissions and | ||
// limitations under the License. | ||
|
||
package exponent // import "github.com/lightstep/go-expohisto/mapping/exponent" | ||
|
||
import ( | ||
"fmt" | ||
"math" | ||
|
||
"github.com/lightstep/go-expohisto/mapping" | ||
"github.com/lightstep/go-expohisto/mapping/internal" | ||
) | ||
|
||
const ( | ||
// MinScale defines the point at which the exponential mapping | ||
// function becomes useless for float64. With scale -10, ignoring | ||
// subnormal values, bucket indices range from -1 to 1. | ||
MinScale int32 = -10 | ||
|
||
// MaxScale is the largest scale supported in this code. Use | ||
// ../logarithm for larger scales. | ||
MaxScale int32 = 0 | ||
) | ||
|
||
type exponentMapping struct { | ||
shift uint8 // equals negative scale | ||
} | ||
|
||
// exponentMapping is used for negative scales, effectively a | ||
// mapping of the base-2 logarithm of the exponent. | ||
var prebuiltMappings = [-MinScale + 1]exponentMapping{ | ||
{10}, | ||
{9}, | ||
{8}, | ||
{7}, | ||
{6}, | ||
{5}, | ||
{4}, | ||
{3}, | ||
{2}, | ||
{1}, | ||
{0}, | ||
} | ||
|
||
// NewMapping constructs an exponential mapping function, used for scales <= 0. | ||
func NewMapping(scale int32) (mapping.Mapping, error) { | ||
if scale > MaxScale { | ||
return nil, fmt.Errorf("exponent mapping requires scale <= 0") | ||
} | ||
if scale < MinScale { | ||
return nil, fmt.Errorf("scale too low") | ||
} | ||
return &prebuiltMappings[scale-MinScale], nil | ||
} | ||
|
||
// minNormalLowerBoundaryIndex is the largest index such that | ||
// base**index is <= MinValue. A histogram bucket with this index | ||
// covers the range (base**index, base**(index+1)], including | ||
// MinValue. | ||
func (e *exponentMapping) minNormalLowerBoundaryIndex() int32 { | ||
idx := int32(internal.MinNormalExponent) >> e.shift | ||
if e.shift < 2 { | ||
// For scales -1 and 0 the minimum value 2**-1022 | ||
// is a power-of-two multiple, meaning it belongs | ||
// to the index one less. | ||
idx-- | ||
} | ||
return idx | ||
} | ||
|
||
// maxNormalLowerBoundaryIndex is the index such that base**index | ||
// equals the largest representable boundary. A histogram bucket with this | ||
// index covers the range (0x1p+1024/base, 0x1p+1024], which includes | ||
// MaxValue; note that this bucket is incomplete, since the upper | ||
// boundary cannot be represented. One greater than this index | ||
// corresponds with the bucket containing values > 0x1p1024. | ||
func (e *exponentMapping) maxNormalLowerBoundaryIndex() int32 { | ||
return int32(internal.MaxNormalExponent) >> e.shift | ||
} | ||
|
||
// MapToIndex implements mapping.Mapping. | ||
func (e *exponentMapping) MapToIndex(value float64) int32 { | ||
// Note: we can assume not a 0, Inf, or NaN; positive sign bit. | ||
if value < internal.MinValue { | ||
return e.minNormalLowerBoundaryIndex() | ||
} | ||
|
||
// Extract the raw exponent. | ||
rawExp := internal.GetNormalBase2(value) | ||
|
||
// In case the value is an exact power of two, compute a | ||
// correction of -1: | ||
correction := int32((internal.GetSignificand(value) - 1) >> internal.SignificandWidth) | ||
|
||
// Note: bit-shifting does the right thing for negative | ||
// exponents, e.g., -1 >> 1 == -1. | ||
return (rawExp + correction) >> e.shift | ||
} | ||
|
||
// LowerBoundary implements mapping.Mapping. | ||
func (e *exponentMapping) LowerBoundary(index int32) (float64, error) { | ||
if min := e.minNormalLowerBoundaryIndex(); index < min { | ||
return 0, mapping.ErrUnderflow | ||
} | ||
|
||
if max := e.maxNormalLowerBoundaryIndex(); index > max { | ||
return 0, mapping.ErrOverflow | ||
} | ||
|
||
return math.Ldexp(1, int(index<<e.shift)), nil | ||
} | ||
|
||
// Scale implements mapping.Mapping. | ||
func (e *exponentMapping) Scale() int32 { | ||
return -int32(e.shift) | ||
} |
Oops, something went wrong.