Iox #324 roudi improve app shutdown #333

Merged
Changes from 2 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ install/
/**/*.user
iceoryx_utils/doc/html/
iceoryx_utils/doc/latex/
*.project
1 change: 1 addition & 0 deletions iceoryx_posh/include/iceoryx_posh/iceoryx_posh_types.hpp
@@ -72,6 +72,7 @@ constexpr char SHM_NAME[] = "/iceoryx_mgmt";
using namespace units::duration_literals;

// Timeout
constexpr units::Duration PROCESS_FINAL_KILL_TIME = 45_s;
constexpr units::Duration PROCESS_WAITING_FOR_ROUDI_TIMEOUT = 60_s;
constexpr units::Duration DISCOVERY_INTERVAL = 100_ms;
constexpr units::Duration PROCESS_KEEP_ALIVE_INTERVAL = 3 * DISCOVERY_INTERVAL; // > DISCOVERY_INTERVAL
4 changes: 3 additions & 1 deletion iceoryx_posh/include/iceoryx_posh/internal/roudi/roudi.hpp
@@ -57,7 +57,8 @@ class RouDi
const config::MonitoringMode f_monitoringMode = config::MonitoringMode::ON,
const bool f_killProcessesInDestructor = true,
const MQThreadStart mqThreadStart = MQThreadStart::IMMEDIATE,
const version::CompatibilityCheckLevel compatibilityCheckLevel = version::CompatibilityCheckLevel::PATCH);
const version::CompatibilityCheckLevel compatibilityCheckLevel = version::CompatibilityCheckLevel::PATCH,
Contributor

I think we have now reached the maximum number of arguments for the good old RouDi ctor. I would suggest the following:

  1. Create a struct called RoudiStartupParameters and move all the arguments from here into it.
  2. Use RoudiStartupParameters in this ctor instead of the individual arguments. The big advantage is that we only have to change one place if we add another argument to RouDi.
  3. Use it like
RouDi myRoudi(RoudiStartupParameters{memoryInterface, portManager, config::MonitoringMode::ON ....});

Then we do not have to change much in the source code. We just have to add RoudiStartupParameters.
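
A minimal sketch of the parameter-struct idea from this comment, under stated assumptions: `MonitoringMode` and `CompatibilityCheckLevel` are simplified stand-ins for the iceoryx types, `RouDiSketch` and `makeWithKillTime` are hypothetical names, and `std::chrono::seconds` replaces `units::Duration`. It only illustrates how defaults move out of the constructor signature and into one struct.

```cpp
#include <chrono>

// Simplified stand-ins for the real iceoryx enums (assumption for this sketch).
enum class MonitoringMode { ON, OFF };
enum class CompatibilityCheckLevel { OFF, MAJOR, MINOR, PATCH };

// All optional startup settings live in one struct with in-class defaults.
struct RoudiStartupParameters
{
    MonitoringMode monitoringMode{MonitoringMode::ON};
    bool killProcessesInDestructor{true};
    CompatibilityCheckLevel compatibilityCheckLevel{CompatibilityCheckLevel::PATCH};
    std::chrono::seconds finalKillTime{45};
};

// Hypothetical class; only the constructor shape matters here.
class RouDiSketch
{
  public:
    // Adding a new setting only touches the struct, not this signature.
    explicit RouDiSketch(const RoudiStartupParameters& params = RoudiStartupParameters{})
        : m_params(params)
    {
    }

    std::chrono::seconds finalKillTime() const noexcept
    {
        return m_params.finalKillTime;
    }

  private:
    RoudiStartupParameters m_params;
};

// Overrides a single setting and keeps every other default.
inline RouDiSketch makeWithKillTime(std::chrono::seconds killTime)
{
    RoudiStartupParameters params;
    params.finalKillTime = killTime;
    return RouDiSketch{params};
}
```

Callers that are happy with the defaults construct `RouDiSketch{}`; callers that need a different kill time change one member and pass the struct along.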

Contributor Author

What about creating a new issue for that, with a separate PR to follow? That'll help to keep our PRs smaller.

Contributor

In this case I would disagree. The code changes should be minor, and the struct would only need a few members without any logic at all.

Member

we already have something which we could use, it's the RouDiConfig ;)

Contributor Author

Check out the latest commit.

const units::Duration finalKillTime = PROCESS_FINAL_KILL_TIME);

virtual ~RouDi();

@@ -131,6 +132,7 @@ class RouDi

private:
config::MonitoringMode m_monitoringMode{config::MonitoringMode::ON};
units::Duration m_finalKillTime;
};

} // namespace roudi
@@ -131,7 +131,14 @@ class ProcessManager : public ProcessManagerInterface
const uint64_t sessionId,
const version::VersionInfo& versionInfo) noexcept;

void killAllProcesses() noexcept;
/// @brief Kills all registered processes. First tries with SIGTERM; processes that have not terminated after
/// finalKillTime are killed with SIGKILL. If RouDi doesn't have sufficient rights to kill a process, the
/// process is considered killed.
/// @param [in] finalKillTime On termination, RouDi sends SIGTERM to the applications and waits up to the specified
/// time for them to shut down. Processes that are still running after that time are sent SIGKILL. If all
/// processes terminate after the SIGTERM before the specified time has elapsed, RouDi continues immediately.
void killAllProcesses(const units::Duration finalKillTime) noexcept;

void updateLivelinessOfProcess(const ProcessName_t& name) noexcept;

@@ -206,7 +213,42 @@ class ProcessManager : public ProcessManagerInterface
const uint64_t sessionId,
const version::VersionInfo& versionInfo) noexcept;

/// @brief Removes the process from the managed client process list, identified by its name.
/// @param [in] name The process name which should be removed.
/// @return Returns true if the process was found and removed from the internal list.
bool removeProcess(const ProcessName_t& name) noexcept;

/// @brief Removes the given process from the managed client process list without taking the list's lock!
Contributor

Can this not lead to races?

Contributor Author

That's right. Could be dangerous. See @budrus, @elBoberido comments above. Will be addressed.

Member

how about passing a reference to a lock guard to this function, then it's impossible to call this without locking a mutex beforehand.
This should maybe be a rule of thumb when we have functions which need to be called with a locked mutex. Let the type system help us, memories are like tears in the rain ;)
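
A small sketch of the lock-guard-reference idea, assuming a simplified list of process names instead of the real `ProcessList_t`; `ProcessListSketch` and `removeLocked` are hypothetical names. The private helper cannot be called without a `std::lock_guard` in hand, so forgetting to lock becomes a compile error rather than a race.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a process list guarded by a mutex.
class ProcessListSketch
{
  public:
    void add(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_names.push_back(name);
    }

    bool remove(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return removeLocked(lock, name);
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_names.size();
    }

  private:
    // The unused lock_guard parameter documents and enforces the precondition:
    // a caller must already hold a lock. It cannot verify that the guard
    // belongs to m_mutex rather than to some other mutex.
    bool removeLocked(const std::lock_guard<std::mutex>&, const std::string& name)
    {
        const auto it = std::find(m_names.begin(), m_names.end(), name);
        if (it == m_names.end())
        {
            return false;
        }
        m_names.erase(it);
        return true;
    }

    std::mutex m_mutex;
    std::vector<std::string> m_names;
};
```

The guard is passed by const reference because `std::lock_guard` is neither copyable nor movable.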

Contributor Author

Added a lock guard reference.

Contributor

Can the list not be wrapped in a smart lock to avoid this? In most cases it is best that the data structure itself only provides access protected by a mutex. If this is not easily possible (it requires much refactoring, or is undesired for some technical reason), the approach suggested by @elBoberido is the next best thing, I guess (unfortunately, we could still pass an unrelated mutex in this lock_guard...).
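
To illustrate what "the data structure itself only provides access protected by a mutex" can look like, here is a minimal sketch. `Guarded` and `withLock` are hypothetical names chosen for this example; this is not the iceoryx `smart_lock` implementation.

```cpp
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical minimal wrapper: the value is only reachable through withLock().
template <typename T>
class Guarded
{
  public:
    // Runs f with exclusive access to the wrapped value and returns f's result.
    template <typename F>
    auto withLock(F&& f) -> decltype(f(std::declval<T&>()))
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return f(m_value);
    }

  private:
    std::mutex m_mutex;
    T m_value{};
};
```

With a plain `std::mutex`, calling `withLock` again from inside the callable deadlocks, so code that needs several operations under one lock has to do them all in a single callable.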

Contributor

I agree with @MatthiasKillat. I would also explore wrapping the smart_lock around this list. Is this possible, @marthtz?

Member

@elBoberido, Nov 11, 2020

@elfenpiff then this has to be massively refactored or a recursive mutex has to be used. Keep in mind, the mutex might have to wrap more than the list

Contributor

Okay then I would say leave it ... I didn't see that coming.

Contributor

@sculpordwarf, Nov 12, 2020

removeProcess is private and only called twice in killAllProcesses. killAllProcesses already takes the lock. Making a mutex recursive because you can't manage the calls correctly in your private functions is not the right way, in my opinion. If you are afraid of accidentally using your own private function incorrectly, I would recommend using a sub-function in killAllProcesses via a lambda, defined after the lock to make it even clearer:

void ProcessManager::killAllProcesses(const units::Duration finalKillTime) noexcept
{
    std::lock_guard<std::mutex> g(m_mutex);
    auto removeProcess = [this](ProcessList_t::iterator& processIter) // capture this to reach the members
    {
       if (processIter != m_processList.end())
       {
          m_portManager.deletePortsOfProcess(processIter->getName());
          m_processIntrospection->removeProcess(processIter->getPid());
          processIter = m_processList.erase(processIter); // delete application
          return true;
       }
       return false;
    };
    cxx::vector<bool, MAX_PROCESS_NUMBER> processStillRunning(m_processList.size(), true);
    int i = 0;
    .
    .
    .
}

/// @param [in] processIter The process which should be removed.
/// @return Returns true if the process was found and removed from the internal list.
bool removeProcess(ProcessList_t::iterator& processIter) noexcept;

enum class ShutdownPolicy
{
SIG_TERM,
SIG_KILL
};

enum class ShudownLog
{
NONE,
FULL
};

/// @brief Kills the given process in m_processList with the given signal.
/// @param [in] process The process to kill.
/// @param [in] shutdownPolicy The kill signal passed to the system kill function.
/// @param [in] shudownLog Defines the logging detail.
/// @return Returns true if the sent kill signal was successful.
bool requestShutdownOfProcess(const RouDiProcess& process,
ShutdownPolicy shutdownPolicy,
ShudownLog shudownLog) noexcept;

/// @brief Checks if the given process has terminated.
/// @param [in] process The process to be checked.
/// @return True, if the process has terminated.
bool isProcessGone(const RouDiProcess& process) noexcept;

RouDiMemoryInterface& m_roudiMemoryInterface;
PortManager& m_portManager;
mepoo::SegmentManager<>* m_segmentManager{nullptr};
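
The shutdown sequence documented in this header (SIGTERM first, SIGKILL once finalKillTime has passed, and a process that cannot be signalled counts as killed) can be sketched with plain POSIX calls. This is an illustration under assumptions, not the ProcessManager implementation: `ShutdownResult`, `isPidGone`, `terminateWithEscalation`, and the 10 ms polling interval are all hypothetical, and `std::chrono` stands in for `units::Duration`.

```cpp
#include <cerrno>
#include <chrono>
#include <thread>

#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

enum class ShutdownResult
{
    NOT_SIGNALLED, // already gone, or no permission to signal it
    TERMINATED,    // exited after SIGTERM within the grace period
    KILLED         // SIGKILL was sent after the grace period
};

// kill() with signal 0 sends nothing; it only performs the existence and
// permission checks. ESRCH means no such process exists.
inline bool isPidGone(pid_t pid)
{
    return kill(pid, 0) == -1 && errno == ESRCH;
}

inline ShutdownResult terminateWithEscalation(pid_t pid, std::chrono::milliseconds finalKillTime)
{
    if (kill(pid, SIGTERM) == -1)
    {
        // ESRCH or EPERM: treated as "killed", mirroring the doc comment in the diff.
        return ShutdownResult::NOT_SIGNALLED;
    }

    const auto deadline = std::chrono::steady_clock::now() + finalKillTime;
    while (std::chrono::steady_clock::now() < deadline)
    {
        if (isPidGone(pid))
        {
            return ShutdownResult::TERMINATED; // no need to wait out the timeout
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
    }

    if (isPidGone(pid))
    {
        return ShutdownResult::TERMINATED;
    }
    kill(pid, SIGKILL); // hard kill after the grace period
    return ShutdownResult::KILLED;
}
```

One caveat of the `kill(pid, 0)` check: for a child of the calling process it keeps succeeding after the child has exited, until the child is reaped with `waitpid`.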
1 change: 1 addition & 0 deletions iceoryx_posh/include/iceoryx_posh/roudi/roudi_app.hpp
@@ -90,6 +90,7 @@ class RouDiApp
})
.get_value());
version::CompatibilityCheckLevel m_compatibilityCheckLevel{version::CompatibilityCheckLevel::PATCH};
units::Duration m_finalKillTime{PROCESS_FINAL_KILL_TIME};

private:
bool checkAndOptimizeConfig(const RouDiConfig_t& config) noexcept;
@@ -18,6 +18,7 @@
#include "iceoryx_posh/version/compatibility_check_level.hpp"
#include "iceoryx_utils/cxx/expected.hpp"
#include "iceoryx_utils/cxx/optional.hpp"
#include "iceoryx_utils/internal/units/duration.hpp"
#include "iceoryx_utils/log/logcommon.hpp"

namespace iox
@@ -60,18 +61,20 @@ class CmdLineParser
char* argv[],
const CmdLineArgumentParsingMode cmdLineParsingMode = CmdLineArgumentParsingMode::ALL) noexcept;

bool getRun() const;
iox::log::LogLevel getLogLevel() const;
MonitoringMode getMonitoringMode() const;
version::CompatibilityCheckLevel getCompatibilityCheckLevel() const;
bool getRun() const noexcept;
iox::log::LogLevel getLogLevel() const noexcept;
MonitoringMode getMonitoringMode() const noexcept;
version::CompatibilityCheckLevel getCompatibilityCheckLevel() const noexcept;
cxx::optional<uint16_t> getUniqueRouDiId() const noexcept;
units::Duration getFinalKillTime() const noexcept;

protected:
bool m_run{true};
iox::log::LogLevel m_logLevel{iox::log::LogLevel::kWarn};
MonitoringMode m_monitoringMode{MonitoringMode::ON};
version::CompatibilityCheckLevel m_compatibilityCheckLevel{version::CompatibilityCheckLevel::PATCH};
cxx::optional<uint16_t> m_uniqueRouDiId;
units::Duration m_finalKillTime{PROCESS_FINAL_KILL_TIME};
};

} // namespace config
3 changes: 2 additions & 1 deletion iceoryx_posh/source/roudi/application/iceoryx_roudi_app.cpp
@@ -42,7 +42,8 @@ void IceOryxRouDiApp::run() noexcept
m_monitoringMode,
true,
RouDi::MQThreadStart::IMMEDIATE,
m_compatibilityCheckLevel);
m_compatibilityCheckLevel,
m_finalKillTime);
waitForSignal();
}
}
1 change: 1 addition & 0 deletions iceoryx_posh/source/roudi/application/roudi_app.cpp
@@ -170,6 +170,7 @@ void RouDiApp::setCmdLineParserResults(const config::CmdLineParser& cmdLineParse
// the "and" is intentional, just in case the provided RouDiConfig_t is empty
m_run &= cmdLineParser.getRun();
m_compatibilityCheckLevel = cmdLineParser.getCompatibilityCheckLevel();
m_finalKillTime = cmdLineParser.getFinalKillTime();
auto uniqueId = cmdLineParser.getUniqueRouDiId();
if (uniqueId)
{
6 changes: 4 additions & 2 deletions iceoryx_posh/source/roudi/roudi.cpp
@@ -31,7 +31,8 @@ RouDi::RouDi(RouDiMemoryInterface& roudiMemoryInterface,
const config::MonitoringMode monitoringMode,
const bool killProcessesInDestructor,
const MQThreadStart mqThreadStart,
const version::CompatibilityCheckLevel compatibilityCheckLevel)
const version::CompatibilityCheckLevel compatibilityCheckLevel,
const units::Duration finalKillTime)
: m_killProcessesInDestructor(killProcessesInDestructor)
, m_runThreads(true)
, m_roudiMemoryInterface(&roudiMemoryInterface)
@@ -42,6 +43,7 @@ RouDi::RouDi(RouDiMemoryInterface& roudiMemoryInterface,
*m_roudiMemoryInterface->segmentManager().value(),
m_prcMgr.addIntrospectionSenderPort(IntrospectionMempoolService, MQ_ROUDI_NAME))
, m_monitoringMode(monitoringMode)
, m_finalKillTime(finalKillTime)
{
m_processIntrospection.registerSenderPort(
m_prcMgr.addIntrospectionSenderPort(IntrospectionProcessService, MQ_ROUDI_NAME));
@@ -82,7 +84,7 @@ void RouDi::shutdown()

if (m_killProcessesInDestructor)
{
m_prcMgr.killAllProcesses();
m_prcMgr.killAllProcesses(m_finalKillTime);
}

if (m_processManagementThread.joinable())
42 changes: 34 additions & 8 deletions iceoryx_posh/source/roudi/roudi_cmd_line_parser.cpp
@@ -32,10 +32,12 @@ void CmdLineParser::parse(int argc, char* argv[], const CmdLineArgumentParsingMo
{"log-level", required_argument, nullptr, 'l'},
{"ignore-version", required_argument, nullptr, 'i'},
{"unique-roudi-id", required_argument, nullptr, 'u'},
{"compatibility", required_argument, nullptr, 'c'},
{"final-kill-time", required_argument, nullptr, 'f'},
{nullptr, 0, nullptr, 0}};

// colon after shortOption means it requires an argument, two colons mean optional argument
constexpr const char* shortOptions = "hvm:l:u:";
constexpr const char* shortOptions = "hvm:l:u:c:f:";
int32_t index;
int32_t opt{-1};
while ((opt = getopt_long(argc, argv, shortOptions, longOptions, &index), opt != -1))
@@ -56,14 +58,19 @@ void CmdLineParser::parse(int argc, char* argv[], const CmdLineArgumentParsingMo
std::cout << "-l, --log-level <LEVEL> Set log level." << std::endl;
std::cout << " <LEVEL> {off, fatal, error, warning, info, debug, verbose}"
<< std::endl;
std::cout << "-c, --compatibility Set compatibility check level between runtime and RouDi"
std::cout << "-c, --compatibility Set compatibility check level between runtime and RouDi."
<< std::endl;
std::cout << " off: no check" << std::endl;
std::cout << " major: same major version " << std::endl;
std::cout << " minor: same minor version + major check" << std::endl;
std::cout << " patch: same patch version + minor check" << std::endl;
std::cout << " commitId: same commit ID + patch check" << std::endl;
std::cout << " buildDate: same build date + commitId check" << std::endl;
std::cout << "-f, --final-kill-time <UINT> Sets the time when RouDi kills the apps hard, if they"
<< std::endl;
std::cout << " haven't responded after the first soft kill, in seconds."
<< std::endl;

m_run = false;
break;
case 'v':
@@ -72,7 +79,6 @@ void CmdLineParser::parse(int argc, char* argv[], const CmdLineArgumentParsingMo
std::cout << "Commit ID: " << ICEORYX_SHA1 << std::endl;
m_run = false;
break;

case 'u':
{
uint16_t roudiId{0u};
@@ -103,7 +109,6 @@ void CmdLineParser::parse(int argc, char* argv[], const CmdLineArgumentParsingMo
}
break;
}

case 'l':
{
if (strcmp(optarg, "off") == 0)
@@ -142,6 +147,21 @@ void CmdLineParser::parse(int argc, char* argv[], const CmdLineArgumentParsingMo
}
break;
}
case 'f':
{
uint32_t finalKillTimeInSeconds{0u};
constexpr uint64_t MAX_FINAL_KILL_TIME = ((1ull << 32) - 1); // 1ull: unsigned long may be only 32 bits wide
if (!cxx::convert::fromString(optarg, finalKillTimeInSeconds))
{
LogError() << "The final kill time must be in the range of [0, " << MAX_FINAL_KILL_TIME << "]";
m_run = false;
}
else
{
m_finalKillTime = units::Duration::seconds(static_cast<unsigned long long int>(finalKillTimeInSeconds));
}
break;
}
case 'x':
{
if (strcmp(optarg, "off") == 0)
@@ -189,20 +209,20 @@ void CmdLineParser::parse(int argc, char* argv[], const CmdLineArgumentParsingMo
}
}
} // namespace roudi
bool CmdLineParser::getRun() const
bool CmdLineParser::getRun() const noexcept
{
return m_run;
}
iox::log::LogLevel CmdLineParser::getLogLevel() const
iox::log::LogLevel CmdLineParser::getLogLevel() const noexcept
{
return m_logLevel;
}
MonitoringMode CmdLineParser::getMonitoringMode() const
MonitoringMode CmdLineParser::getMonitoringMode() const noexcept
{
return m_monitoringMode;
}

version::CompatibilityCheckLevel CmdLineParser::getCompatibilityCheckLevel() const
version::CompatibilityCheckLevel CmdLineParser::getCompatibilityCheckLevel() const noexcept
{
return m_compatibilityCheckLevel;
}
@@ -211,5 +231,11 @@ cxx::optional<uint16_t> CmdLineParser::getUniqueRouDiId() const noexcept
{
return m_uniqueRouDiId;
}

units::Duration CmdLineParser::getFinalKillTime() const noexcept
{
return m_finalKillTime;
}

} // namespace config
} // namespace iox
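
The `'f'` case in this file appears to rely on `cxx::convert::fromString` to reject input that does not fit into a `uint32_t`. As a rough standalone illustration of that kind of range-checked parsing, the sketch below uses `std::strtoull` from the standard library; `parseSeconds` is a hypothetical helper and not part of iceoryx.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Parses a non-negative decimal number of seconds into out.
// Returns false for empty input, a sign, trailing characters, or a value
// that does not fit into uint32_t. out is only written on success.
inline bool parseSeconds(const char* text, uint32_t& out)
{
    if (text == nullptr || *text < '0' || *text > '9')
    {
        return false; // rejects empty strings, whitespace and leading '+' or '-'
    }

    errno = 0;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(text, &end, 10);
    if (errno == ERANGE || *end != '\0' || value > std::numeric_limits<uint32_t>::max())
    {
        return false;
    }

    out = static_cast<uint32_t>(value);
    return true;
}
```

The explicit first-character check matters because `std::strtoull` accepts a leading minus sign and wraps the result instead of reporting an error.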