Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUMM-2134 Write events to files in TLV format #841

Merged
merged 6 commits into from
May 9, 2022

Conversation

maxep
Copy link
Member

@maxep maxep commented May 4, 2022

What and why?

Write event to files in TLV format. This change prepares the introduction of event metadata to batches.

How?

All events will be serialised in TLV format using the following byte alignment:

+-  2 bytes -+-   4 bytes   -+- n bytes -|
| block type | data size (n) |    data   |
+------------+---------------+-----------+

This block type is a 2 bytes value describing how to decode the data. In this PR, only code 0x00 identifying an event is used.

The data size is written in 4 bytes so we are more flexible in term of size limit.

Note

By using TLV, the format differs from the RFC so we are more resilient to header (metadata) decoding failure.
Batches will be written in v2 folders, this PR does not remove any v1 folder, it will be dealt in another PR.

Review checklist

  • Feature or bugfix MUST have appropriate tests (unit, integration)
  • Make sure each commit and the PR mention the Issue number or JIRA reference
  • Add CHANGELOG entry for user facing changes

Custom CI job configuration (optional)

  • Run unit tests
  • Run integration tests
  • Run smoke tests

@maxep maxep self-assigned this May 4, 2022
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of using TLV, it's a nice enhancement to the originally proposed format 👍. It should be also simpler to reason about - we will start with only one BlockType and add more when iterating on V2 ✨.

My biggest worry is on introducing InputStream as part of this refactoring. I left deeper explanation in the comment. LMKWDYT 🙌.

Comment on lines 55 to 71
internal extension Data {
/// Returns a Data block in Type-Lenght-Value format.
///
/// A block follow TLV with bytes aligned such as:
///
/// +- 2 bytes -+- 4 bytes -+- n bytes -|
/// | block type | block size (n) | block data |
/// +------------+----------------+------------+
///
/// - Parameters:
/// - type: The data type
/// - data: The data
/// - Returns: a byte sequence in TLV format.
static func block(_ type: BlockType, data: Data) -> Data {
return DataBlock(type: type, data: data).serialize()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this extension needed 💭🤔? I can only see its usage in tests, not in the production code. Also, as we're using Foundation type (Data) to only attach a domain-specific (core) function to, so it doesn't seem to bring much improvement over using DataBlock(type:data:) directly, WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, not really needed, it was more for convenience for future change. I will remove it!
Could be cool to leverage a pattern like AlamofireExtended in the future, so we can hide any extension behind a dd property!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the dd idea 👌. We use something similar in tracing tests to cast OT interfaces to our DD domain, but it could be used even more like you suggest 👍.

@maxep maxep marked this pull request as ready for review May 5, 2022 12:16
@maxep maxep requested a review from a team as a code owner May 5, 2022 12:16
@maxep maxep requested a review from ncreated May 6, 2022 09:47
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very elegant solution 👌! I left few feedbacks, mainly on:

  • removing remaining base64 encoding for encrypted data (we don't need it anymore)
  • covering more edge cases in DataBlock unit tests.

XCTAssertEqual(blocks.first?.data.count, 0)
XCTAssertEqual(blocks.last?.data.count, 99)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataBlockReader is now a core element in the SDK and we only cover happy paths in these tests. Let's add tests for covering pesimisting scenarios:

  • reading unrecognized BlockType with DataBlockReader,
  • reading max and min BlockSize with DataBlockReader,
  • serializing max and min data.count with DataBlock.serialize().

Copy link
Member Author

@maxep maxep May 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, we won't be able to test maximum data block sizes (4GB) without redesigning TLV implementation and using generics to mock smaller block size footprint in bytes. Tho, I've added test for 0 bytes data and for large (10MB) data block.

@maxep maxep force-pushed the maxep/RUMM-2134/v2-storage branch from 28ed87b to 6d98992 Compare May 9, 2022 09:01
@maxep maxep requested a review from ncreated May 9, 2022 09:11
Copy link
Member

@ncreated ncreated left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 👌🚀

@maxep maxep merged commit 727f969 into feature/v2-storage May 9, 2022
@maxep maxep deleted the maxep/RUMM-2134/v2-storage branch May 9, 2022 09:47
@ncreated ncreated mentioned this pull request Jul 21, 2022
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants