Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logstorage: Adds option to write logs in gzip format #442

Merged
merged 3 commits into from
Apr 19, 2023
Merged

logstorage: Adds option to write logs in gzip format #442

merged 3 commits into from
Apr 19, 2023

Conversation

LiquidityC
Copy link
Contributor

@LiquidityC LiquidityC commented Feb 16, 2023

Adds functionality to allow storing offline logs in gzipped files instead of standard dlt files.

This change got more invasive then I would have liked. But it can't be helped. Since I'm quite new to the project I wouldn't mind some extra scrutiny.

In particular in regards to the dlt_logstorage_get_idx_of_log_file() function which got a major overhaul. Perhaps I'm under-thinking this logic?

Currently the setting GzipCompression=1 is handled as an int. Is this something where we'd like to introduce a boolean settings type? It could still be 0/1 but we can parse it to a boolean instead perhaps? I don't mind putting in some extra work if it helps get this through. We (my company) would really benefit from it.

@LiquidityC LiquidityC marked this pull request as draft February 16, 2023 18:48
@LiquidityC LiquidityC marked this pull request as ready for review February 16, 2023 19:09
@LiquidityC
Copy link
Contributor Author

Fixed your comments. The optional compile possibility introduces a lot of #ifdef stuff. I'm not certain what the preferred approach is on these. I'm not a fan but also couldn't see any obvious alternatives.

@alexmohr
Copy link
Contributor

Fixed your comments. The optional compile possibility introduces a lot of #ifdef stuff. I'm not certain what the preferred approach is on these. I'm not a fan but also couldn't see any obvious alternatives.

That's something for @michael-methner to decide as I actually do not have any permissions here. Just stumbled upon your PR and was interested in the changes

@michael-methner
Copy link
Collaborator

Hello @LiquidityC ,
code changes looks good to me. @minminlittleshrimp will test your changes before we merge.

btw: Have you done performance measurements? Do you have an indication how much cpu performance is affected by turning on the compression?

Thank you very much.

@LiquidityC
Copy link
Contributor Author

Hello @LiquidityC , code changes looks good to me. @minminlittleshrimp will test your changes before we merge.

btw: Have you done performance measurements? Do you have an indication how much cpu performance is affected by turning on the compression?

Thank you very much.

No performance testing done no. I'll have to admit that I didn't check if you had any benchmarking setup in the project.

@LiquidityC
Copy link
Contributor Author

I did some very basic monitoring of CPU usage with and without gzip compression enabled (same build, only config change). With gzip enabled the worst case usage stuck solidly at 1.7% when hammering the daemon with 1 message every 1ms (WriteOnMsg strategy).

Without compression the cpu usage would fluctuate between 1.0-1.7%.

Wild guess is that this fluctuation comes down to I/O buffering when gzip is disabled. But when it's enabled we have a stable cpu (higher) usage consumed by compression and we draw no benefit from I/O buffering done by the OS.

This is by no means scientific. It's just me abusing the daemon and monitoring it with top. But I'd say that we have similar worst case performances but we take a hit on best case performance. This will be impacted by the writing strategies utilized.

In our use case I'm certain we have the cycles to spare. Our concern and the reason for me submitting this change is 'wear leveling' and a reduction in disk usage.

Copy link
Collaborator

@minminlittleshrimp minminlittleshrimp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add status so that user could check when build with zip, maybe like:
message(STATUS "WITH_DLT_LOGSTORAGE_CTRL_UDEV = ${WITH_DLT_LOGSTORAGE_CTRL_UDEV}")
message(STATUS "WITH_DLT_LOGSTORAGE_GZIP = ${WITH_DLT_LOGSTORAGE_GZIP}")
...

@LiquidityC
Copy link
Contributor Author

Squashed the commits.

@minminlittleshrimp
Copy link
Collaborator

Hello all,

I have done some raw stress tests from my side:

Setup:

dlt_logstorage.conf
[FILTER1]
EcuID=ECU1
FileSize=100000
File=FLASH
NOFiles=100
LogLevel=DLT_LOG_VERBOSE
GzipCompression=<1/0>

dlt.conf: enable:

OfflineLogstorageMaxDevices = 1
OfflineLogstorageDirPath = your_path

Enable $dlt-daemon
Enable Test binaries in turn
Observe and record result using Gnome system monitor
Result:


$ dlt-test-multi-process -m 100000 -p 100 -t 1 -l 1000 -d 10 -g
Setup done. Listening. My pid: 19858
without gzip: 0.25 ~ 0.38 ~ 0.5%CPU
with gzip: 0.37 ~ 0.5 ~ 0.63%CPU


$ dlt-test-stress -123
Starting stress test1...

  • Register 3000 contexts...
  • Unregister 3000 contexts...
    Finished stress test1

Starting stress test2...

  • Creating 64 Threads, each of them registers one context,
    sending one log message, then unregisters the context
    Finished stress test2

Starting stress test3...

  • Logging raw data, up to a size of 512
    Finished stress test3

without gzip: max 1.37%CPU
with gzip: max 1.38%CPU


$ dlt-test-stress-user -n 100000 -d 1 -s 10000
without gzip: max 2.18%CPU 10MB 100 ~ 700KiB/s
with gzip: max 2.85%CPU 10.6MB stable 290 ~ 300 KiB/s


Conclusion: Gzip feature is working fine with CPU consumption at an acceptable level
Regards

@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Mar 15, 2023

Hello @LiquidityC

I would like to propose my suggestion for code sequential cohesion
There are 2 points could be considered to apply my patch:

  1. Keep the build logs silent: Current implementation throws some warning msgs
[ 31%] Building C object src/daemon/CMakeFiles/dlt-daemon.dir/__/shared/dlt_user_shared.c.o
[ 32%] Building C object src/daemon/CMakeFiles/dlt-daemon.dir/__/offlinelogstorage/dlt_offline_logstorage.c.o
/home/lum3hc/work/github/PR442/dlt-daemon/src/offlinelogstorage/dlt_offline_logstorage.c: In function ‘dlt_logstorage_check_gzip_compression’:
/home/lum3hc/work/github/PR442/dlt-daemon/src/offlinelogstorage/dlt_offline_logstorage.c:1090:60: warning: unused parameter ‘value’ [-Wunused-parameter]
                                                      char *value)
                                                            ^~~~~
At top level:
/home/lum3hc/work/github/PR442/dlt-daemon/src/offlinelogstorage/dlt_offline_logstorage.c:439:16: warning: ‘dlt_logstorage_read_bool’ defined but not used [-Wunused-function]
 DLT_STATIC int dlt_logstorage_read_bool(unsigned int *boolean, char *value)
                ^~~~~~~~~~~~~~~~~~~~~~~~
[ 34%] Building C object src/daemon/CMakeFiles/dlt-daemon.dir/__/offlinelogstorage/dlt_offline_logstorage_behavior.c.o
[ 35%] Linking C executable dlt-daemon
[ 35%] Built target dlt-daemon

  1. Beautify src code, make them procedural and sequential cohesion
DLT_STATIC int dlt_logstorage_check_nofiles(DltLogStorageFilterConfig *config,
                                            char *value)
{
    if ((config == NULL) || (value == NULL))
        return -1;

    return dlt_logstorage_read_number(&config->num_files, value);
}

DLT_STATIC int dlt_logstorage_check_gzip_compression(DltLogStorageFilterConfig *config,
                                                     char *value)
{
    if ((config == NULL) || (value == NULL))
        return -1;
        
    return dlt_logstorage_read_bool(&config->gzip_compression, value);
}

DLT_STATIC int dlt_logstorage_check_specificsize(DltLogStorageFilterConfig *config,
                                                 char *value)
{
    if ((config == NULL) || (value == NULL))
        return -1;

    return dlt_logstorage_read_number(&config->specific_size, value);
}

Please kindly have a look and leave your comment
My patch: sequential_cohesion.patch

Thank you so much for your contribution!
Minh

@LiquidityC
Copy link
Contributor Author

LiquidityC commented Mar 16, 2023

Hi @minminlittleshrimp

I addressed the comments you had however I don't agree with your patch as it introduces possible errors and still allows miss-configuration.

  1. Moving the #ifdef DLT_LOGSTORAGE_USE_GZIP into the function dlt_logstorage_read_bool() prevents re-use of the function because it's now bound to the GZIP feature. Which will require unrelated edits in future changes if anyone does want to reuse the function.
  2. The introduced #ifdef doesn't prevent the configuration from being set to true which could cause problems. See below: image The boolean is set to 0 within the #ifdef (*boolean = 0) but logic is allowed to continue and can re-set the boolean to 1 if this is what's configured.
  3. Also (looking at the image above). Removing the if (value == NULL) check can cause strnlen(value, 5) to segfault if value is in fact NULL which can be dangerous. The surrounding functions do employ this value == NULL check so I think we should follow suit here.
  4. In regards to beautifying and sequential cohesion. I'm not sure what you mean by sequential cohesion. Are you referring to the groups of functions similarity in logic and layout in code? I agree that this has value to an extent. But not at the cost of logic and safety.

I have submitted a patch that does address the compiler warnings and also prevent config->gzip_compression from being set to true when gzip is not enabled.

@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Mar 16, 2023

Hello @LiquidityC
Thank you for pointing out.
I agree with you, I just did not consider the reusability of the function.
Your analysis is very detail, and I feel fine with your fix.
I would like to complete my review and we will wait for @michael-methner then.
Before that, could you resolve the conflict, and also the unit test result 😀 ?
Great to work with you on this topic!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,
Could you kindly upload unit test result for this feature?
triggered by: -DWITH_DLT_UNIT_TESTS=ON
Many thanks in advance!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,
Please kindly rebase and upload your unit test result onto this review comment.
Thank you in advance!

Copy link
Contributor Author

@LiquidityC LiquidityC Mar 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the log. I needed to fix some tests but there is one that keeps failing. I think the socket is blocking itself somehow. Perhaps this is a known problem test?

image
image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, let me check that fail test case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TADA!

image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for your hard work 💪

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to do one more change. I'm forgetting about the compile flags. It will be in shortly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DltLogStorageFilterConfig::gzlog doesn't exist if WITH_DLT_LOGSTORAGE_GZIP isn't enabled.
I safed up most of the tests by ensuring data is cleared in the DltLogStorageConfig before usage since not doing this can cause some awkward bugs.
New commit submitted with this fix. Testresults are confirmed both with and without WITH_DLT_LOGSTORAGE_GZIP.

Sorry about all the confusion. Lot's of things on my table this week 😄

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, sooner or not DLT team will need to zero initialize those POD objects in those test suites.
Thank you so much!

Adds functionality to allow storing offline logs in gzipped files
instead of standard dlt files.
@minminlittleshrimp
Copy link
Collaborator

@lvklevankhanh
Please review and kindly approve for workflows!

@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Mar 22, 2023

Hello @LiquidityC
For the next 2 weeks we actually plan to release a new minor update alpha version for dlt. Hence, we would like to postpone and merge your feature after the release. It would be great to have this new feature gzip merged in the new version.
FYI: #455

Many thanks,
DLT team

@LiquidityC
Copy link
Contributor Author

@minminlittleshrimp Understandable. Just to double check. I'll just wait. Apart from possible review comments/changes there is nothing else you require from me. Right?

@minminlittleshrimp
Copy link
Collaborator

@minminlittleshrimp Understandable. Just to double check. I'll just wait. Apart from possible review comments/changes there is nothing else you require from me. Right?

Sure, I think we’ve done the best so far. For now please kindly wait for the release.
Thanks for your understanding.

@michael-methner
Copy link
Collaborator

Hello @LiquidityC ,
thanks for the patch. It is ready to merge but we will delay the merge until we have released the next dlt version. We do not want to add features shortly before a release. Please wait for another two weeks.

@michael-methner michael-methner merged commit 5ca800c into COVESA:master Apr 19, 2023
@tjlingesh
Copy link

tjlingesh commented Aug 3, 2023

Hi @michael-methner / @LiquidityC
After including this patch, when I decompress .gz offline storage logs, I am getting 0 kb file size for the decompressed logs.
I am using gzip utility in my target, will that be enough? or do I need to explicitly use zlib library also in my machine/target?
Can you please help me how to proceed?

@LiquidityC
Copy link
Contributor Author

What size are the gz log files? What does your dlt_logstorage.conf look like?

@tjlingesh
Copy link

tjlingesh commented Aug 3, 2023

What size are the gz log files? What does your dlt_logstorage.conf look like?

Compressed log files are around 16 KB.

dlt_logstorage.conf file looks like
[FILTER1]
LogLevel=DLT_LOG_INFO
File=test
FileSize=20000
NOFiles=1
EcuID=ECU
SyncBehavior=ON_SPECIFIC_SIZE
SpecificSize=4096
GzipCompression=on

@LiquidityC
Copy link
Contributor Author

I don't think there's anything in the log then. Your sync behavior is set to write when a certain buffer size has been reached. Are you sure the logs have been written when you are checking the file? Perhaps use a more aggressive sync behavior just to confirm that the log is actually being written?

I also see that you haven't specified any LogAppName or ContextName in your filter. I'm not sure how the log storage behaves if these are omitted. You should probably specify at least one of these.
Is there any output from dlt-daemon when it's being started? If it's not happy with your dlt_logstorage.conf file there might be some info output about that on boot.

@tjlingesh
Copy link

I don't think there's anything in the log then. Your sync behavior is set to write when a certain buffer size has been reached. Are you sure the logs have been written when you are checking the file? Perhaps use a more aggressive sync behavior just to confirm that the log is actually being written?

I also see that you haven't specified any LogAppName or ContextName in your filter. I'm not sure how the log storage behaves if these are omitted. You should probably specify at least one of these. Is there any output from dlt-daemon when it's being started? If it's not happy with your dlt_logstorage.conf file there might be some info output about that on boot.

By adding LogAppName and ContextName and removing sync behavior in dlt_logstorage.conf file, decompressed logs are working fine.

Thanks for the quick response.

@minminlittleshrimp
Copy link
Collaborator

minminlittleshrimp commented Aug 4, 2023

Hello @tjlingesh

There are many options for the filter:

  • Only ECUid
  • Only Appid(s)
  • Only Ctxid(s)
  • Only 1 pair App-Ctx
  • Ecuid+Appid(s)
  • Ecuid+Ctxid(s)
  • Ecuid + 1 Appid + 1 Ctxid

Except for above scenarios, we do not support multi App + multi Ctx at the same time, and at least 1 field (Ecu,app,ctx) must be declared or else no logs will be collected.

For the sync behavior, you need to send a sync signal, say:

$ dlt-logstorage-ctrl -s

Could you please kindly try again with this command for the sync behavior?

Regards
DLT team

@@ -53,28 +53,30 @@

#include <search.h>
#include <stdbool.h>
#include <zlib.h>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, quick question regarding zlib use in dlt-deamon. If none of WITH_DLT_LOGSTORAGE_GZIP, WITH_DLT_COREDUMPHANDLER, or WITH_DLT_FILETRANSFER are defined, isn't ZLIB_LIBRARY undefined? And if so, how does CMake find this header (assuming it's not installed on the system), and how is zlib linked to dlt-daemon?

I'm trying to understand how this project works without zlib installed on the system already (if at all). Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of this might be obsolete now since this PR is old and AFAIC see there has been some re-writing of logstorage. ZLIB is linked with the daemon in src/daemon/CMakeLists.txt. Guarded by the variables you mentioned.
If ZLIB is excluded the variable is set to empty string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants