-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUMM-2134 Write events to files in TLV format #841
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the idea of using TLV, it's a nice enhancement to the originally proposed format 👍. It should be also simpler to reason about - we will start with only one BlockType
and add more when iterating on V2 ✨.
My biggest worry is on introducing InputStream
as part of this refactoring. I left deeper explanation in the comment. LMKWDYT 🙌.
internal extension Data { | ||
/// Returns a Data block in Type-Lenght-Value format. | ||
/// | ||
/// A block follow TLV with bytes aligned such as: | ||
/// | ||
/// +- 2 bytes -+- 4 bytes -+- n bytes -| | ||
/// | block type | block size (n) | block data | | ||
/// +------------+----------------+------------+ | ||
/// | ||
/// - Parameters: | ||
/// - type: The data type | ||
/// - data: The data | ||
/// - Returns: a byte sequence in TLV format. | ||
static func block(_ type: BlockType, data: Data) -> Data { | ||
return DataBlock(type: type, data: data).serialize() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this extension needed 💭🤔? I can only see its usage in tests, not in the production code. Also, as we're using Foundation type (Data
) to only attach a domain-specific (core) function to, so it doesn't seem to bring much improvement over using DataBlock(type:data:)
directly, WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, not really needed, it was more for convenience for future change. I will remove it!
Could be cool to leverage a pattern like AlamofireExtended in the future, so we can hide any extension behind a dd
property!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the dd
idea 👌. We use something similar in tracing tests to cast OT interfaces to our DD domain, but it could be used even more like you suggest 👍.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very elegant solution 👌! I left few feedbacks, mainly on:
- removing remaining
base64
encoding for encrypted data (we don't need it anymore) - covering more edge cases in
DataBlock
unit tests.
XCTAssertEqual(blocks.first?.data.count, 0) | ||
XCTAssertEqual(blocks.last?.data.count, 99) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataBlockReader
is now a core element in the SDK and we only cover happy paths in these tests. Let's add tests for covering pesimisting scenarios:
- reading unrecognized
BlockType
withDataBlockReader
, - reading max and min
BlockSize
withDataBlockReader
, - serializing max and min
data.count
withDataBlock.serialize()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, we won't be able to test maximum data block sizes (4GB) without redesigning TLV implementation and using generics to mock smaller block size footprint in bytes. Tho, I've added test for 0 bytes data and for large (10MB) data block.
28ed87b
to
6d98992
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 👌🚀
What and why?
Write event to files in TLV format. This change prepares the introduction of event metadata to batches.
How?
All events will be serialised in TLV format using the following byte alignment:
This
block type
is a 2 bytes value describing how to decode the data. In this PR, only code0x00
identifying an event is used.The
data size
is written in 4 bytes so we are more flexible in term of size limit.Note
By using TLV, the format differs from the RFC so we are more resilient to header (metadata) decoding failure.
Batches will be written in
v2
folders, this PR does not remove anyv1
folder, it will be dealt in another PR.Review checklist
Custom CI job configuration (optional)