Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iox #324 roudi improve app shutdown #333

Merged

Conversation

marthtz
Copy link
Contributor

@marthtz marthtz commented Nov 2, 2020

Introduce 2-step shutdown procedure for apps/processes.
First with a SIGTERM and if they have not terminated after finallKillTime they're killed with SIGKILL.

marthtz and others added 2 commits November 2, 2020 13:41
…martC comment section

Signed-off-by: Eickhoff Bernhard (CC-AD/ESW1) <[email protected]>
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
@marthtz marthtz added the bugfix Solves a bug label Nov 2, 2020
@marthtz marthtz linked an issue Nov 2, 2020 that may be closed by this pull request
iceoryx_posh/source/roudi/roudi_process.cpp Show resolved Hide resolved
iceoryx_utils/README.md Outdated Show resolved Hide resolved
iceoryx_utils/README.md Show resolved Hide resolved
iceoryx_utils/README.md Outdated Show resolved Hide resolved
iceoryx_utils/README.md Outdated Show resolved Hide resolved
bool removeProcess(const ProcessName_t& name) noexcept;

/// @brief Removes the given process from the managed client process list without taking the list's lock!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this not lead to races ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. Could be dangerous. See @budrus, @elBoberido comments above. Will be addressed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about passing a reference to a lock guard to this function, then it's impossible to call this without locking a mutex beforehand.
This should maybe be a rule of thumb when we have functions which need to be called with a locked mutex. Let the type system help us, memories are like tears in the rain ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a lock guard reference.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the list not be wrapped in a smart lock to avoid this? In most cases it is best that the data structure itself only provides access protected by mutex. If this is not easily possible (requires much refactoring, or undesired for some technical reason) the approach suggested by @elBoberido is the next best thing I guess (we could unfortunately pass an unreleated mutex in this lock_guard...).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @MatthiasKillat . I would also explore and try to wrap the smart_lock around this list. Is this possible @marthtz

Copy link
Member

@elBoberido elBoberido Nov 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elfenpiff then this has to be massively refactored or a recursive mutex has to be used. Keep in mind, the mutex might have to wrap more than the list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay then I would say leave it ... I didn't see that coming.

Copy link
Contributor

@sculpordwarf sculpordwarf Nov 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeProcess is private and only called twice in killAllProcesses. killAllProcesses already takes the lock. Making a mutex recursive, because you can`t manage the calls correctly in your private functions is not the right way in my opinion. If you are afraid of accidentally use your own private function wrong, I would recommend to use a subfunction in killAllProcesses via lambda in c++ and define it after the lock to make it even more clear:

void ProcessManager::killAllProcesses(const units::Duration finalKillTime) noexcept
{
    std::lock_guard<std::mutex> g(m_mutex);
    auto removeProcess = [](ProcessList_t::iterator& processIter) 
    {
       if (processIter != m_processList.end())
       {
          m_portManager.deletePortsOfProcess(processIter->getName());
          m_processIntrospection->removeProcess(processIter->getPid());
          processIter = m_processList.erase(processIter); // delete application
          return true;
       }
       return false;
    };
    cxx::vector<bool, MAX_PROCESS_NUMBER> processStillRunning(m_processList.size(), true);
    int i = 0;
    .
    .
    .
}

iceoryx_posh/include/iceoryx_posh/iceoryx_posh_types.hpp Outdated Show resolved Hide resolved
iceoryx_posh/source/roudi/roudi_process.cpp Show resolved Hide resolved
iceoryx_posh/source/roudi/roudi_process.cpp Outdated Show resolved Hide resolved
iceoryx_posh/source/roudi/roudi_process.cpp Outdated Show resolved Hide resolved
iceoryx_posh/source/roudi/roudi_process.cpp Outdated Show resolved Hide resolved
@@ -57,7 +57,8 @@ class RouDi
const config::MonitoringMode f_monitoringMode = config::MonitoringMode::ON,
const bool f_killProcessesInDestructor = true,
const MQThreadStart mqThreadStart = MQThreadStart::IMMEDIATE,
const version::CompatibilityCheckLevel compatibilityCheckLevel = version::CompatibilityCheckLevel::PATCH);
const version::CompatibilityCheckLevel compatibilityCheckLevel = version::CompatibilityCheckLevel::PATCH,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we reached now the maximum number of arguments for the good old roudi ctor. I would suggest the following:

  1. Create a struct called struct RoudiStartupParameters and move all the arguments from here in there.
  2. Use this struct RoudiStartupParameters in this ctor instead of all the arguments which provides us with the big advantage that we only have to change one place if we add another argument to roudi.
  3. Use it like
RouDi myRoudi(RoudiStartupParameters{memoryInterface, portManager, config::MonitoringMode::ON ....});

then we do not have to change much in the source code. We just have to add RoudiStartupParameters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about creating a new issue for that with a separate PR to follow?! That'll help keep to keep our PRs smaller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I would disagree. The code changes should be minor and the struct would only need some member without any logic at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have something which we could use, it's the RouDiConfig ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checkout latest commit.

iceoryx_posh/source/roudi/roudi_process.cpp Show resolved Hide resolved
iceoryx_utils/README.md Outdated Show resolved Hide resolved
iceoryx_utils/README.md Outdated Show resolved Hide resolved
iceoryx_utils/README.md Outdated Show resolved Hide resolved
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
}
}

auto it = m_processList.begin();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when m_processList is empty. Then it would point to m_processList.end() since it is equal to begin in this case and removeProcess would segfault since we are accessing an invalid element.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the method checks if it is iterator.end() so nothing bad happens. But you are right, it's not obvious from the function call

bool removeProcess(const ProcessName_t& name) noexcept;

/// @brief Removes the given process from the managed client process list without taking the list's lock!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about passing a reference to a lock guard to this function, then it's impossible to call this without locking a mutex beforehand.
This should maybe be a rule of thumb when we have functions which need to be called with a locked mutex. Let the type system help us, memories are like tears in the rain ;)

}
}

auto it = m_processList.begin();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the method checks if it is iterator.end() so nothing bad happens. But you are right, it's not obvious from the function call

iceoryx_posh/source/roudi/roudi_process.cpp Show resolved Hide resolved
iceoryx_utils/README.md Show resolved Hide resolved
@mossmaurice mossmaurice added this to the v1.0 milestone Nov 4, 2020
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
Copy link
Contributor

@elfenpiff elfenpiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve bug in ProcessManager::registerProcess

iceoryx_posh/source/roudi/roudi_cmd_line_parser.cpp Outdated Show resolved Hide resolved
iceoryx_posh/source/roudi/roudi_cmd_line_parser.cpp Outdated Show resolved Hide resolved
iceoryx_posh/source/roudi/roudi_cmd_line_parser.cpp Outdated Show resolved Hide resolved
@@ -49,15 +49,35 @@ class RouDi
DEFER_START
};

struct RoudiStartupParameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at one day, this needs to go to RouDiConfig and also made available in the TOML config

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably also valid for some other predefined values (e.g. discovery interval, etc.), isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would see this topic in a separate PR/issue. I created it here: #353

Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
Signed-off-by: Hintz Martin (CC-AD/ESW1) <[email protected]>
@budrus
Copy link
Contributor

budrus commented Nov 13, 2020

@ithier @elfenpiff LGTM?

@marthtz marthtz requested a review from orecham November 16, 2020 13:22
@marthtz marthtz merged commit 487652b into eclipse-iceoryx:master Nov 16, 2020
@marthtz marthtz deleted the iox-#324-roudi-improve-app-shutdown branch November 16, 2020 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix Solves a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RouDi does not free shared memory properly on x86 Linux
10 participants