RUMM-2134 Write events to files in TLV format #841

maxep · 2022-05-04T11:49:45Z

What and why?

Write events to files in TLV (Type-Length-Value) format. This change prepares for the introduction of event metadata to batches.

How?

All events will be serialised in TLV format using the following byte alignment:

+-  2 bytes -+-   4 bytes   -+- n bytes -+
| block type | data size (n) |   data    |
+------------+---------------+-----------+

The block type is a 2-byte value describing how to decode the data. In this PR, only code 0x00, which identifies an event, is used.

The data size is written in 4 bytes, which gives us more flexibility in terms of size limit (up to about 4GB per block).
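The layout above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `SketchBlockType` and `SketchBlock` are hypothetical names, and little-endian byte order is an assumption, since the description does not state the endianness.

```swift
import Foundation

/// Hypothetical block type for this sketch; the PR only uses 0x00 (event).
enum SketchBlockType: UInt16 {
    case event = 0x00
}

/// Hypothetical TLV block, not the SDK's `DataBlock`.
struct SketchBlock {
    let type: SketchBlockType
    let data: Data

    /// Serializes as: 2-byte type, 4-byte size, then `size` bytes of data.
    /// Little-endian byte order is an assumption of this sketch.
    /// Returns nil when the payload does not fit in the 4-byte size field.
    func serialize() -> Data? {
        guard let size = UInt32(exactly: data.count) else {
            return nil
        }
        var buffer = Data()
        withUnsafeBytes(of: type.rawValue.littleEndian) { buffer.append(contentsOf: $0) }
        withUnsafeBytes(of: size.littleEndian) { buffer.append(contentsOf: $0) }
        buffer.append(data)
        return buffer
    }
}

let sketchBlock = SketchBlock(type: .event, data: Data([0xAA, 0xBB, 0xCC]))
let sketchBytes = sketchBlock.serialize()
// sketchBytes is [0x00, 0x00, 0x03, 0x00, 0x00, 0x00, 0xAA, 0xBB, 0xCC]
```

The header always costs 6 bytes, so a 3-byte payload yields a 9-byte block.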

Note

By using TLV, the format differs from the RFC, which makes us more resilient to header (metadata) decoding failures.
Batches will be written in v2 folders. This PR does not remove any v1 folder; that will be dealt with in another PR.
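One way to picture that resilience: since every block carries its own size, a reader can step over a block it cannot interpret and continue with the next one. The sketch below is an assumption about how such a reader could behave, not the `DataBlockReader` from this PR; `DecodedBlock`, `readBlocks(from:knownTypes:)` and the little-endian order are all hypothetical.

```swift
import Foundation

/// Hypothetical decoded block for this sketch.
struct DecodedBlock: Equatable {
    let type: UInt16
    let data: Data
}

/// Reads consecutive TLV blocks (2-byte type, 4-byte size, payload).
/// Little-endian order and the `knownTypes` filter are assumptions of this sketch.
/// Blocks with an unknown type are skipped using their declared size;
/// reading stops at the first truncated block.
func readBlocks(from bytes: Data, knownTypes: Set<UInt16> = [0x00]) -> [DecodedBlock] {
    let input = [UInt8](bytes)
    var blocks: [DecodedBlock] = []
    var offset = 0

    while input.count - offset >= 6 {
        let type = UInt16(input[offset]) | (UInt16(input[offset + 1]) << 8)
        var size: UInt32 = 0
        for i in 0..<4 {
            size |= UInt32(input[offset + 2 + i]) << (8 * i)
        }
        offset += 6
        guard let length = Int(exactly: size), length <= input.count - offset else {
            break // declared size exceeds the remaining bytes: truncated block
        }
        if knownTypes.contains(type) {
            blocks.append(DecodedBlock(type: type, data: Data(input[offset..<offset + length])))
        }
        offset += length
    }
    return blocks
}

// One unknown block (type 0x01, 1 byte) followed by one event block (2 bytes).
let sampleStream = Data([0x01, 0x00, 0x01, 0x00, 0x00, 0x00, 0xFF,
                         0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x10, 0x20])
let decodedEvents = readBlocks(from: sampleStream)
// decodedEvents holds a single block of type 0x00 with data [0x10, 0x20]
```

Whether unknown blocks should be skipped silently or reported is a design choice this sketch leaves open.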

Review checklist

Feature or bugfix MUST have appropriate tests (unit, integration)
Make sure each commit and the PR mention the Issue number or JIRA reference
Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

Run unit tests
Run integration tests
Run smoke tests

Sources/Datadog/Core/Persistence/DataBlock.swift

ncreated

I really like the idea of using TLV; it's a nice enhancement to the originally proposed format 👍. It should also be simpler to reason about: we will start with only one BlockType and add more when iterating on V2 ✨.

My biggest worry is about introducing InputStream as part of this refactoring. I left a deeper explanation in the comment. LMKWDYT 🙌.

Sources/Datadog/Core/Persistence/DataBlock.swift

Sources/Datadog/Core/Feature.swift

Sources/Datadog/Core/Persistence/Reading/FileReader.swift

ncreated · 2022-05-05T07:45:54Z

Sources/Datadog/Core/Persistence/DataBlock.swift

+internal extension Data {
+    /// Returns a Data block in Type-Length-Value format.
+    ///
+    /// A block follows TLV, with bytes aligned as follows:
+    ///
+    ///     +-  2 bytes -+-   4 bytes    -+- n bytes  -+
+    ///     | block type | block size (n) | block data |
+    ///     +------------+----------------+------------+
+    ///
+    /// - Parameters:
+    ///   - type: The data type
+    ///   - data: The data
+    /// - Returns: a byte sequence in TLV format.
+    static func block(_ type: BlockType, data: Data) -> Data {
+        return DataBlock(type: type, data: data).serialize()
+    }
+}


Is this extension needed 💭🤔? I can only see its usage in tests, not in the production code. Also, since we're using a Foundation type (Data) only to attach a domain-specific (core) function to it, it doesn't seem to bring much improvement over using DataBlock(type:data:) directly. WDYT?

Yeah, not really needed; it was more of a convenience for future changes. I will remove it!
Could be cool to leverage a pattern like AlamofireExtended in the future, so we can hide any extension behind a dd property!

I like the dd idea 👌. We use something similar in tracing tests to cast OT interfaces to our DD domain, but it could be used even more, as you suggest 👍.
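For reference, the `dd` idea discussed above could look roughly like this. It is only a sketch of the namespacing pattern: `DatadogExtension`, `DatadogExtended` and `blockSize` are hypothetical names that do not exist in the SDK, and the helper is chosen purely for illustration.

```swift
import Foundation

/// Hypothetical wrapper that namespaces SDK helpers behind a `dd` property.
struct DatadogExtension<Base> {
    let base: Base
}

/// Hypothetical marker protocol: conforming types get a `dd` property for free.
protocol DatadogExtended {}

extension DatadogExtended {
    var dd: DatadogExtension<Self> { DatadogExtension(base: self) }
}

// Opt a Foundation type into the namespace without adding methods to it directly.
extension Data: DatadogExtended {}

extension DatadogExtension where Base == Data {
    /// Illustrative helper: the payload size as it would go in a 4-byte size field,
    /// or nil when it does not fit.
    var blockSize: UInt32? { UInt32(exactly: base.count) }
}

let namespacedSize = Data([0x01, 0x02, 0x03]).dd.blockSize
// namespacedSize is 3
```

The domain-specific helper is reachable only through `dd`, so the public surface of `Data` itself stays untouched.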

ncreated

Very elegant solution 👌! I left some feedback, mainly on:

removing remaining base64 encoding for encrypted data (we don't need it anymore)
covering more edge cases in DataBlock unit tests.

Sources/Datadog/Core/Persistence/DataBlock.swift

Sources/Datadog/Core/Persistence/Reading/FileReader.swift

Sources/Datadog/Core/Persistence/Writing/FileWriter.swift

Sources/Datadog/Core/Persistence/DataBlock.swift

ncreated · 2022-05-06T10:20:01Z

Tests/DatadogTests/Datadog/Core/Persistence/DataBlockTests.swift

+        XCTAssertEqual(blocks.first?.data.count, 0)
+        XCTAssertEqual(blocks.last?.data.count, 99)
+    }
+}


DataBlockReader is now a core element in the SDK and we only cover happy paths in these tests. Let's add tests covering pessimistic scenarios:

reading unrecognized BlockType with DataBlockReader,

reading max and min BlockSize with DataBlockReader,

serializing max and min data.count with DataBlock.serialize().

As discussed, we won't be able to test the maximum data block size (4GB) without redesigning the TLV implementation and using generics to mock a smaller block size footprint in bytes. Though, I've added tests for 0-byte data and for a large (10MB) data block.
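To make the "generics" idea concrete, here is a rough sketch of what such a redesign might look like. It was not implemented in this PR; `serializeBlock(type:data:sizeType:)` and `BlockSizeError` are hypothetical, and little-endian order is again an assumption. The size field width becomes a type parameter, so a test can use `UInt8` and reach the limit with 256 bytes instead of 4GB.

```swift
import Foundation

/// Hypothetical error for this sketch.
enum BlockSizeError: Error {
    case payloadTooLarge(count: Int)
}

/// Serializes type + size + payload, with the width of the size field chosen by `Size`.
/// Production code would pass `UInt32.self`; a test can pass `UInt8.self`.
func serializeBlock<Size: FixedWidthInteger & UnsignedInteger>(
    type: UInt16,
    data: Data,
    sizeType: Size.Type
) throws -> Data {
    // Fails instead of overflowing when the payload does not fit in the size field.
    guard let size = Size(exactly: data.count) else {
        throw BlockSizeError.payloadTooLarge(count: data.count)
    }
    var buffer = Data()
    withUnsafeBytes(of: type.littleEndian) { buffer.append(contentsOf: $0) }
    withUnsafeBytes(of: size.littleEndian) { buffer.append(contentsOf: $0) }
    buffer.append(data)
    return buffer
}

let maxPayload = Data(repeating: 0x01, count: 255)
let tooLargePayload = Data(repeating: 0x01, count: 256)

let fitting = try? serializeBlock(type: 0x00, data: maxPayload, sizeType: UInt8.self)
// fitting has 2 + 1 + 255 = 258 bytes
let overflowing = try? serializeBlock(type: 0x00, data: tooLargePayload, sizeType: UInt8.self)
// overflowing is nil because 256 does not fit in a 1-byte size field
```

The trade-off is an extra type parameter in production code that exists mainly for testability.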

ncreated

Looks good 👌🚀

RUMM-2134 Write events to files in TLV format

6a0a80d

maxep commented May 4, 2022

maxep self-assigned this May 4, 2022

maxep commented May 4, 2022

ncreated reviewed May 5, 2022

maxep added 3 commits May 5, 2022 14:10

RUMM-2134 Remove unused methods

443e87a

RUMM-2134 Add data block tests

6a982f8

RUMM-2134 Remove event type example

8938a88

maxep marked this pull request as ready for review May 5, 2022 12:16

maxep requested a review from a team as a code owner May 5, 2022 12:16

maxep requested a review from ncreated May 6, 2022 09:47

ncreated requested changes May 6, 2022

maxep added 2 commits May 9, 2022 10:55

RUMM-2134 Remove b64 for encryption

dbe7c16

RUMM-2134 Prevent data size overflows

6d98992

maxep force-pushed the maxep/RUMM-2134/v2-storage branch from 28ed87b to 6d98992 May 9, 2022 09:01

maxep requested a review from ncreated May 9, 2022 09:11

ncreated approved these changes May 9, 2022

maxep merged commit 727f969 into feature/v2-storage May 9, 2022

maxep deleted the maxep/RUMM-2134/v2-storage branch May 9, 2022 09:47

0xnm mentioned this pull request May 12, 2022

RUMM-2177: Use TLV format for data storage DataDog/dd-sdk-android#931

Merged

3 tasks

ncreated mentioned this pull request Jul 21, 2022

Dogfood recent changes #937

Merged

6 tasks

cltnschlosser mentioned this pull request Dec 16, 2022

Crash: Could not allocate memory #1091

Closed
