DSET-3998 feat: export Logs resource info based on export_resource_info_on_event configuration #1
Conversation
exporter/datasetexporter/README.md (Outdated)

```diff
@@ -33,6 +33,8 @@ If you do not want to specify `api_key` in the file, you can use the [builtin fu
 - `traces`:
   - `aggregate` (default = false): Count the number of spans and errors belonging to a trace.
   - `max_wait` (default = 5s): The maximum waiting for all spans from single trace to arrive; ignored if `aggregate` is false.
+- `logs`:
+  - `export_resource_info_on_event` (default = false): Include resource info to DataSet Event while exporting Logs.
```
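For illustration, enabling the new option in a collector config might look like this (a minimal sketch; the `dataset` exporter key and the env-var placeholder for `api_key` are assumptions, not taken from this diff):

```yaml
exporters:
  dataset:
    api_key: ${env:DATASET_API_KEY}
    logs:
      # Off by default; turning it on repeats resource info on every event.
      export_resource_info_on_event: true
```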
Should probably also add something along the lines of: "Turning this functionality on and including resource information with each log line event can result in a substantial increase in billable DataSet log volume."
And perhaps document somewhere (not sure if the README is the best place) what an example DataSet event looks like with this config value on and off, so it's easier for users to see what kind of impact it can have on event size and they have more data to decide whether to enable it.
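To illustrate the point, a hypothetical before/after (all field and attribute names are invented for this example, not taken from the exporter):

```
# export_resource_info_on_event: false
{"ts": "2023-06-01T12:00:00Z", "severity": "INFO", "message": "order processed"}

# export_resource_info_on_event: true
{"ts": "2023-06-01T12:00:00Z", "severity": "INFO", "message": "order processed",
 "resource.service.name": "checkout",
 "resource.k8s.pod.name": "checkout-5d9f7c-abcde",
 "resource.k8s.namespace.name": "prod"}
```

Every resource attribute is repeated on each event, which is where the billable-volume increase comes from.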
This is already described in config.go, but sure, it's a good idea to do it here as well.
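For context, the setting in config.go presumably looks roughly like this (a sketch; the struct and field names are inferred from the test snippet further down, not copied from the source):

```go
// LogsSettings holds logs-specific settings for the DataSet exporter.
type LogsSettings struct {
	// ExportResourceInfo copies resource attributes onto each DataSet
	// event when exporting logs. Disabled by default, since repeating
	// resource info on every event can significantly increase billable
	// log volume.
	ExportResourceInfo bool `mapstructure:"export_resource_info_on_event"`
}
```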
Ideally, the options table in the README would be auto-generated from the code to avoid duplication, things getting out of sync, etc.
But that can be done in the future.
```go
	})
	err := config.Unmarshal(configMap)
	assert.Nil(t, err)
	assert.Equal(t, true, config.LogsSettings.ExportResourceInfo)
```
Is it also a good idea to add a test case for the default value, aka false?
See line 46
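For reference, such a default-value check might look like the following sketch (the confmap usage, the zero-value `Config`, and the test name are assumptions based on the snippet above, not the repo's actual test):

```go
package datasetexporter

import (
	"testing"

	"github.com/stretchr/testify/assert"
	"go.opentelemetry.io/collector/confmap"
)

// Sketch: an empty config map leaves export_resource_info_on_event unset,
// so unmarshalling should keep the default (false).
func TestUnmarshalDefaultExportResourceInfo(t *testing.T) {
	config := &Config{}
	configMap := confmap.NewFromStringMap(map[string]any{})
	err := config.Unmarshal(configMap)
	assert.Nil(t, err)
	assert.Equal(t, false, config.LogsSettings.ExportResourceInfo)
}
```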
```go
	assert.Equal(t, expected, was)
}

func TestBuildEventFromLogExportResources(t *testing.T) {
```
In scenarios like this, I like to have test cases for 3 scenarios:
- Implicit default value
- Default value explicitly specified in the config
- Default value overridden in the config
It looks like 1) and 3) are already covered, but it may be a good idea to cover 2) as well.
@tomaz-s1 what is the added value of the second scenario?
- 1) verifies the config value when it is missing, and also the behaviour of the application using the false value
- 3) verifies the behaviour of the application using the true value.
It just makes it easier to spot/catch the case where the default value changes (but yeah, this would also be caught by the other test, which exercises the implicit default value).
If the default changes, the scenario with the implicit default value would fail, so hopefully we would then adjust the tests to still cover both scenarios.
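Extending the earlier sketch, a table-driven version covering all three scenarios from this thread might look like this (the `"logs"` map shape is an assumption based on the README diff):

```go
// (Same package and imports as the earlier sketch.)
func TestExportResourceInfoScenarios(t *testing.T) {
	tests := []struct {
		name     string
		logs     map[string]any // contents of the "logs" section of the config
		expected bool
	}{
		{"implicit default", map[string]any{}, false},
		{"default explicitly specified", map[string]any{"export_resource_info_on_event": false}, false},
		{"default overridden", map[string]any{"export_resource_info_on_event": true}, true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			config := &Config{}
			configMap := confmap.NewFromStringMap(map[string]any{"logs": tt.logs})
			err := config.Unmarshal(configMap)
			assert.Nil(t, err)
			assert.Equal(t, tt.expected, config.LogsSettings.ExportResourceInfo)
		})
	}
}
```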
Commits merged from upstream opentelemetry-collector-contrib during this PR:

- sshcheck receiver: emit the `ssh.endpoint` resource attribute that metadata.yml already documented but the receiver did not emit (open-telemetry#24676, tracking issue open-telemetry#24441).
- telemetrygen: add a `--status-code` argument to `telemetrygen traces`, accepting `Unset`, `Error`, `Ok` (case sensitive) or the enum equivalents `0`, `1`, `2`; the default, `Unset`, preserves the previous behaviour (tracking issue 24286).
- telemetrygen: expose the duration of generated spans via a `--span-duration` flag backed by a `DurationVar`, so units such as `10ns`, `10us`, `10ms`, and `10s` are parsed automatically (open-telemetry#29116, originally proposed in open-telemetry#26991).
- filelog receiver: implement the new `container` operator for parsing container logs (crio, containerd, and Docker formats, including recombining partial lines) as proposed in open-telemetry#31959.
- X-Ray exporter: generate the expected DB call subsegment names by parsing JDBC connection strings that start with the `jdbc:` prefix (open-telemetry#33225).
- container parser: add k8s metadata as resource attributes rather than log record attributes (open-telemetry#33353, fixes open-telemetry#33341).
- elasticsearch exporter: add `telemetry.log_request_body` and `telemetry.log_response_body` config options for debugging; request and response bodies are logged in the same log line to avoid interleaving, and the "Request failed" log level is lowered to debug (open-telemetry#33854).
Description: Introduces exporting Logs resource info based on the new `export_resource_info_on_event` configuration option.
Link to tracking Issue: https://sentinelone.atlassian.net/browse/DSET-3998
Testing: TBD
Documentation: see exporter/datasetexporter/README.md