Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

struct ndpi_detection_module_struct and multi-thread #1344

Closed
IvanNardi opened this issue Oct 13, 2021 · 4 comments
Closed

struct ndpi_detection_module_struct and multi-thread #1344

IvanNardi opened this issue Oct 13, 2021 · 4 comments
Labels

Comments

@IvanNardi
Copy link
Collaborator

I am trying to summarize here the various discussions on multi threads support.

What is the current situation?

*) struct ndpi_detection_module_struct is not thread-safe and it has never been.
*) This limitation is not well documented
*) ndpiReader example configures one struct ndpi_detection_module_struct for each thread
*) any implementations sharing the same struct ndpi_detection_module_struct across multiple threads is buggy
*) struct ndpi_detection_module_struct is not thread-safe because of the LRU caches (at least)
*) commit 730c236 made struct ndpi_detection_module_struct even less thread-safe
*) it seems possible to make struct ndpi_detection_module_struct thread-safe (even after 730c236)

Now, what options do we have?

  1. Do nothing (apart from improving documentation). Keep struct ndpi_detection_module_struct not thread safe; keep using one instance of it for each thread/core

  2. We acknowledge that having some shared structures/data among the thread/core might be useful but we don't want to change struct ndpi_detection_module_struct. We might implement two different "contexts": one "global context" (one for application; thread safe; to be defined) and we keep struct ndpi_detection_module_struct as a "local context" (one for thread/core). Nothing new here: this is a common pattern.

  3. We make struct ndpi_detection_module_struct full thread safe (does that imply a penalty cost when using one instance per thread?)

  4. ...

Let me know if I got something wrong and let's discuss!

IvanNardi referenced this issue Oct 13, 2021
)

There are no real reasons to embed `struct ndpi_packet_struct` (i.e. "packet")
in `struct ndpi_flow_struct` (i.e. "flow"). In other words, we can avoid
saving dissection information of "current packet" into the "flow" state,
i.e. in the flow management table.

The nDPI detection module processes only one packet at the time, so it is
safe to save packet dissection information in `struct ndpi_detection_module_struct`,
reusing always the same "packet" instance and saving a huge amount of memory.
Bottom line: we need only one copy of "packet" (for detection module),
not one for each "flow".

It is not clear how/why "packet" ended up in "flow" in the first place.
It has been there since the beginning of the GIT history, but in the original
OpenDPI code `struct ipoque_packet_struct` was embedded in
`struct ipoque_detection_module_struct`, i.e. there was the same exact
situation this commit wants to achieve.

Most of the changes in this PR are some boilerplate to update something
like "flow->packet" into something like "module->packet" throughout the code.
Some attention has been paid to update `ndpi_init_packet()` since we need
to reset some "packet" fields before starting to process another packet.

There has been one important change, though, in ndpi_detection_giveup().
Nothing changed for the applications/users, but this function can't access
"packet" anymore.
The reason is that this function can be called "asynchronously" with respect
to the data processing, i.e in context where there is no valid notion of
"current packet"; for example ndpiReader calls it after having processed all
the traffic, iterating the entire session table.

Mining LRU stuff seems a bit odd (even before this patch): probably we need
to rethink it, as a follow-up.
@vel21ripn
Copy link
Contributor

It makes sense to discuss.

Fully initializing ndpi_detection_module_struct in every thread is very memory intensive. This is not only the ndpi_detection_module_struct structure itself(228 kB), but also all the data in it. ahocorasick, patricia, ndpi_default_ports_tree_node_t also take up memory.
The efficiency of libcache generally drops to 0.
Based on this, I think that we should try to make libndpi thread-safe.

At this point, we know for sure that there are two elements in struct ndpi_detection_module_struct that are not multithread safe. These are libcache and LRU.
It's easy to fix.
Theoretically, there are common structures of the "ndpi_id_struct" type, but there practically all data is atomic except for irc_port, last_time_port_used.

I think that, first of all, we need a test case that could demonstrate two things: working in the mode of one thread gives and in the mode of several threads gives the same result, and the execution speed should decrease in proportion to the number of threads.

I think an ideal candidate or starting point for this is examples/ndpiSimpleIntegration.c

If successful, you can add constraints to the documentation.

The implementation of multithreading does not require additional resources as long as we have an insignificant amount of common modifiable data.

Before commit 730c236, one limitation was significant: one flow is simultaneously processed by one thread, since all the data we change is in the ndpi_flow structure.

After commit 730c236, it is quite easy to restore multithreading:

  1. replace "& ndpi_struct.packet" with a macro or inline function in the library code.
  2. add a call to the API that returns the address of the ndpi_packet_struct structure (though I don't know why this is needed in ndpiReader).
  3. Fix LRU
  4. replace libcache.

I haven't come across very often implementations of blocking in userspace. You need to understand which locking mechanism to use, provided that there are no lock-free options. So far, I think mutex is the only portable solution.
Maybe there are more lightweight implementations of locks similar to the linux kernel RCU?

@lucaderi
Copy link
Member

I agree that "sharing" a ndpi_struct.packet could save memory, but this if and only if all threads are supposed to do the same job (for example you use RSS to share the load across multiple threads). In this case you have a lot of traffic so a few KB extra isn't a problem if this guarantees you consistency.
I would spend time to see if the memory could be further reduced by removing unnecessary data structures that might ease this task of re-introducing multithread.
As of shared instances such as LRU and libcache, they are places where some synchronization is currently necessary, you are right. I believe that having a per-thread instance is a good solution for avoiding locks in multithreaded environments: what is you view on this?

@vel21ripn
Copy link
Contributor

I started testing the variant with asynchronous work in ndpiReader.
The main thread reads the file and distributes the packets among several threads.
The only limitation is that one connection can only be processed by one thread at a time.
For work, one ndpi_detection_module_struct structure and several threads executing ndpi_detection_process_packet () are used.

@IvanNardi
Copy link
Collaborator Author

Closing: struct ndpi_detection_module_struct is not thread safe but data can be shared among threads via "global context" (see 400cd51)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants