Introduce a "ESH start level" functionality #1896

tomhoefer · 2016-07-22T11:46:34Z

Hi all,

In our project we have implemented a lot of event subscribers and registry change listeners, which are naturally called during startup / shutdown of ESH. During the shutdown phase these services update their model accordingly, with the result that the model cannot be rebuilt after the next startup (because the subscribers / listeners assume that the item / thing / link has really been deleted). We should distinguish between event sending / listener notification for the addition / removal of items / things / links during the framework startup / shutdown phase and during normal runtime.

For this reason, I would like to prevent events from being sent / listeners from being notified during startup / shutdown in our project. I can imagine two ways to implement this:

1. Provide a RuntimeStateService that can be queried for whether the runtime is started. Initially the service would consist of a single operation, boolean isStarted(), and it would be injected as a dynamic dependency into the registries (things, items, links, rules). Sending events / notifying listeners would then be skipped if the service is present and the runtime / framework is not fully started.
2. Each service that requires the runtime state implements a RuntimeStateListener interface; these listeners are tracked by a central RuntimeStateService to be provided by solutions based on ESH. As soon as the runtime state changes to started, the service informs all listeners. Once the runtime leaves the started state again, all listeners are informed about that as well. Initially I will implement the new RuntimeStateListener interface in the AbstractRegistry (concrete registries can decide whether the runtime state listener is to be provided as a service).

I think option 2 is the only viable way to implement this requirement. With option 1, the runtime state service could be unregistered too early, so that events would be sent / listeners notified again.
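A minimal sketch of option 2, assuming hypothetical types: RuntimeStateListener and RuntimeStateService follow the names proposed above, while StateAwareRegistry is a simplified stand-in for AbstractRegistry. None of this is existing ESH API, and OSGi service tracking is replaced by plain method calls so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical listener interface from option 2 (not an existing ESH API).
interface RuntimeStateListener {
    void runtimeStarted();

    void runtimeStopping();
}

// Hypothetical central service, provided by the solution built on top of ESH.
class RuntimeStateService {
    private final List<RuntimeStateListener> listeners = new CopyOnWriteArrayList<>();
    private volatile boolean started;

    void addListener(RuntimeStateListener listener) {
        listeners.add(listener);
        if (started) {
            listener.runtimeStarted(); // late listeners learn the current state immediately
        }
    }

    void removeListener(RuntimeStateListener listener) {
        listeners.remove(listener);
    }

    void setStarted(boolean newState) {
        if (newState == started) {
            return;
        }
        started = newState;
        for (RuntimeStateListener listener : listeners) {
            if (newState) {
                listener.runtimeStarted();
            } else {
                listener.runtimeStopping();
            }
        }
    }
}

// Simplified stand-in for AbstractRegistry: elements are always stored,
// but change notifications are only sent while the runtime is started.
class StateAwareRegistry<E> implements RuntimeStateListener {
    private final List<E> elements = new ArrayList<>();
    private final List<Consumer<String>> changeListeners = new CopyOnWriteArrayList<>();
    private volatile boolean notifyListeners;

    void addChangeListener(Consumer<String> listener) {
        changeListeners.add(listener);
    }

    @Override
    public void runtimeStarted() {
        notifyListeners = true;
    }

    @Override
    public void runtimeStopping() {
        notifyListeners = false;
    }

    void add(E element) {
        elements.add(element);
        fire("added " + element);
    }

    void remove(E element) {
        if (elements.remove(element)) {
            fire("removed " + element);
        }
    }

    int size() {
        return elements.size();
    }

    private void fire(String change) {
        if (!notifyListeners) {
            return; // startup/shutdown: stay silent
        }
        for (Consumer<String> listener : changeListeners) {
            listener.accept(change);
        }
    }
}
```

Elements still enter and leave the registry during startup and shutdown; only the notifications are suppressed, so subscribers never see that churn.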

What do you think?


maggu2810 · 2016-07-22T12:32:41Z

How do you want to decide whether the framework is started or shut down?
If I update one Eclipse SmartHome bundle, this triggers some restarts (deactivate, activate) of bundles and / or services. Which one is responsible for an overall "startup" / "shutdown" state?

Isn't it product specific which services need to be available to signal "all present" / "normal runtime"?

tomhoefer · 2016-07-22T12:38:17Z

In our product we can rely on our OSGi runtime implementation to tell us that the framework has started properly. For solutions running e.g. on Equinox, I thought of using a framework listener and listening for FrameworkEvent.STARTED.

maggu2810 · 2016-07-22T12:53:49Z

I assumed you were talking about the startup and shutdown phases of the Eclipse SmartHome framework. But you refer to the start and stop of the OSGi framework. Correct?

So there are some options: using a FrameworkListener, using a SynchronousBundleListener for bundle 0 to handle the STOPPING event, ...
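A sketch of how these two signals could be folded into one flag, assuming a hypothetical FrameworkStateTracker class. In a real OSGi runtime, frameworkStarted() would be called from a FrameworkListener receiving FrameworkEvent.STARTED, and bundleStopping(...) from a SynchronousBundleListener receiving BundleEvent.STOPPING; here those callbacks are modeled as plain method calls to keep the example self-contained.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical adapter. In OSGi, frameworkStarted() would be called by a FrameworkListener
// on FrameworkEvent.STARTED, and bundleStopping(id) by a SynchronousBundleListener
// on BundleEvent.STOPPING.
class FrameworkStateTracker {
    static final long SYSTEM_BUNDLE_ID = 0L; // the OSGi system bundle has id 0

    private final AtomicBoolean started = new AtomicBoolean(false);

    void frameworkStarted() {
        started.set(true);
    }

    void bundleStopping(long bundleId) {
        // Only the system bundle stopping means the whole framework is shutting down;
        // other bundles being updated or restarted leave the flag untouched.
        if (bundleId == SYSTEM_BUNDLE_ID) {
            started.set(false);
        }
    }

    boolean isStarted() {
        return started.get();
    }
}
```

The sketch also makes the limitation discussed next visible: updating or restarting an individual ESH bundle never changes the flag, because only the system bundle is considered.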

Is the intention that the Eclipse SmartHome framework does not fire any events as long as the OSGi framework is not fully started (i.e. while it is starting up or shutting down)?
But I assume we also need to react to restarts of special bundles / services etc.
If a bundle is updated or services are restarted, which framework or bundle events are triggered?
The OSGi framework is still "started", but ESH bundles could disappear (but I could be wrong; I have never watched all the events).

Which one needs to be observed to distinguish between "normal runtime" and the non-normal one?

tomhoefer · 2016-07-22T15:04:35Z

> But you refer to the start and stop of the OSGi Framework. Correct?

Yes

> Is the intention that the Eclipse SmartHome framework does not fire any event as long as the OSGi Framework is not fully started (starting up or shutting down)?

Yes

> If a bundle is updated or services are restarted, which Framework or Bundle Events are triggered?

I think this depends on the OSGi runtime used. In our project we don't want to be informed when entities are added or removed during startup and shutdown. We already have a dedicated state that declares the framework as started.

> Which one needs to be observed to differ between "normal runtime" and non-normal one?

Especially ItemAddedEvent, ItemRemovedEvent, ThingAddedEvent and ThingRemovedEvent.

Can you give me an example for which bundle / service you think we need to react on its restart?

maggu2810 · 2016-07-22T15:14:31Z

> Can you give me an example for which bundle / service you think we need to react on its restart?

No, not ATM.
I need to think about the whole topic in more detail.

You wrote that you are already doing this, so I assume you know that it works and how it works (the architecture). I don't. 😉 Give me some time.

tomhoefer · 2016-07-22T15:21:36Z

I haven't started the implementation yet ;) But because it is urgent, I think I will provide a PR next week.

kaikreuzer · 2016-07-22T15:44:14Z

I agree that it isn't easy to say whether the system is up or not. What does it mean if the OSGi framework keeps running, but ALL ESH bundles are fully stopped and restarted? I would consider ESH NOT up in that case; hence the feature should not be about the OSGi framework, but about ESH itself.

For me, "up" means that certain services have started and are available. How can this be determined, and how can others be notified about reaching (or leaving) this state?

I see several use cases of such a feature (from recent discussions):

- Avoid Item/Thing/etc. added/removed events when the system is merely started/stopped and hence only reconstructs the status quo from the last uptime. I have myself seen the log cluttered on shutdown with 1000 "item removed" events, which clearly makes no sense. Usually, "item removed" should mean that the item has been removed from the system and won't reappear automatically. This is the use case @tomhoefer describes above.
- We recently introduced XML processing vetoing (Reworked XML bundle processing and thing handler initialization #1856). This also just tries to make sure that a certain state (XMLs loaded) has been reached before other services (the thing handlers) are started. The tricky thing here might be that it is more fine-grained, as it blocks single bundles depending on more detailed processing information.
- The right moment for the startup rules is discussed very frequently. So far, they are potentially triggered when not all items have been restored in the registry yet, which causes all kinds of problems. For this, it would also be helpful to be notified about some "system up" state, so that the rules can be executed safely.

sjsf · 2016-08-04T07:43:41Z

IMHO, a single state will not fit all of our needs. As @kaikreuzer pointed out, we e.g. have services that require other services to be up and running and fully loaded (whatever that means). Then again, there might be other services which depend on the previous ones to be started. So we will end up having several different levels of "active", like e.g. the start levels for bundles in OSGi. Additionally, the definition of these levels is going to differ for every solution built on top of ESH.

Generally, introducing such a framework state in a dynamic system is usually a workaround that covers up not-so-ideal design decisions elsewhere. I would suggest first looking into the individual use cases to see whether we can fix the root causes.

Regarding the Item/Thing/etc. added/removed events, the root cause is that we cannot distinguish whether they were loaded or newly created (or removed/unloaded, respectively). I'd suggest that we fix this and also let listeners/subscribers decide what kind of event/notification they actually require, either by introducing new event types (e.g. ThingLoadedEvent) plus a RegistryLoadListener interface, or by amending the existing events and RegistryChangeListener with the corresponding information.
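A sketch of the second alternative (amending the listener with the cause), using hypothetical names: ChangeCause and CauseAwareRegistryChangeListener are illustrations, not existing ESH types. The example subscriber keeps its model in sync but ignores removals that are merely unloads, e.g. during shutdown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: why an element appeared in or disappeared from a registry.
enum ChangeCause { CREATED, LOADED, DELETED, UNLOADED }

// Hypothetical variant of RegistryChangeListener that carries the cause.
interface CauseAwareRegistryChangeListener<E> {
    void added(E element, ChangeCause cause);

    void removed(E element, ChangeCause cause);
}

// Example subscriber that only treats real deletions as deletions.
class ModelSyncListener implements CauseAwareRegistryChangeListener<String> {
    final List<String> model = new ArrayList<>();

    @Override
    public void added(String uid, ChangeCause cause) {
        if (!model.contains(uid)) {
            model.add(uid); // CREATED and LOADED are both handled idempotently
        }
    }

    @Override
    public void removed(String uid, ChangeCause cause) {
        // UNLOADED (e.g. on shutdown) must not wipe the element from the model
        if (cause == ChangeCause.DELETED) {
            model.remove(uid);
        }
    }
}
```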


sjsf · 2017-05-10T08:43:16Z

Okay, it has been a while now... As we can see, there have recently been quite a few topics related to this issue, so I'd like to get back to it now. I still think we should avoid using such a "startup level" construct wherever possible! But I have to admit that there are some use cases which won't really work without it (e.g. related to the rule engine).

There recently was a blog post by @pkriens which addresses this very topic. And I think we could realize our requirements with exactly this idea, using the OSGi means for our purpose. The relevant services that we need to wait for (e.g. XML processing per binding, providers being up and running) would somehow denote that they are "finished" by registering a marker service into the SCR, carrying some defined properties.

Our "AggregateStateService" however must be configurable, as not all the services are available in every solution. Imagine there would be a solution without support for DSL based configuration, then it really does not make sense to wait for the GenericProviders to finish their loading. I'd suggest using config admin for that purpose.

As a first step, I would drop the BundleProcessorVetoManager and use such OSGi services to mark fully loaded bindings accordingly.

As a next step, I would create an AggregateStateService and make all relevant entities denote that they have finished loading. The idea would be that every service that needs to wait (e.g. a SystemStartupTrigger) would depend only on such an aggregated state, not on the services themselves. That way we would turn the dependency on a concrete service into a configurable one with a semantic meaning. At the same time, this allows us to define different levels of "readiness" of the system. Of course, we need to carefully define all the required properties and states, as they effectively become "API for solution providers", i.e. they must not be a big pain to maintain and should change as seldom as possible.
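A minimal sketch of the AggregateStateService idea, under assumptions: marker registrations are modeled as string identifiers (in OSGi they would be marker services carrying a defined property), and the per-level requirements are passed in as a map that would come from config admin. The level names and marker identifiers are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical aggregate state service. Markers are plain strings here; in OSGi they would be
// marker services with a defined property, tracked as they come and go.
class AggregateStateService {
    // level name -> markers that must be present; would be provided via config admin
    private final Map<String, Set<String>> requiredPerLevel;
    private final Set<String> presentMarkers = ConcurrentHashMap.newKeySet();

    AggregateStateService(Map<String, Set<String>> requiredPerLevel) {
        this.requiredPerLevel = requiredPerLevel;
    }

    void markerRegistered(String marker) {
        presentMarkers.add(marker);
    }

    void markerUnregistered(String marker) {
        presentMarkers.remove(marker);
    }

    // A level is reached while all of its configured markers are present.
    boolean isReached(String level) {
        Set<String> required = requiredPerLevel.get(level);
        if (required == null) {
            throw new IllegalArgumentException("Unknown level: " + level);
        }
        return presentMarkers.containsAll(required);
    }
}
```

A solution without DSL-based configuration would simply leave the corresponding provider markers out of its configuration, and a waiting service would depend on a level name rather than on the individual services.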

Does this make sense to you? Any thoughts on this?

pkriens · 2017-05-10T14:27:45Z

Aren't there any companies that could run this through OSGi? This is a very foundational service, and it belongs somewhere low in the stack, like Equinox or Apache Felix.

I could provide an initial implementation, since I already have it running.

pkriens · 2018-01-22T10:36:31Z

@SJKA I think I share your view. The danger is that you start thinking globally, and that always falls apart in a component model. In general, you need to handle the dependency on the requirer side, which has the actual knowledge of what it needs. I.e. a rule that needs X should not be evaluated before X is present. This is much better than waiting to start the rule engine until all devices have started. You need to address these things where you have concrete information (like X.1) instead of trying to handle them globally.
Hope this helps.
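To illustrate handling the dependency on the requirer side, here is a sketch with a hypothetical GuardedRule class: the rule declares the items it needs and is skipped (not failed) until exactly those are present, instead of waiting for a global start state. How the presence check is backed (e.g. by an item registry lookup) is left open.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical rule wrapper: the rule itself knows which items it needs.
class GuardedRule {
    private final Set<String> requiredItems;
    private final Runnable action;

    GuardedRule(Set<String> requiredItems, Runnable action) {
        this.requiredItems = requiredItems;
        this.action = action;
    }

    // Runs the action only if every required item is present; returns whether it ran.
    boolean evaluate(Predicate<String> itemPresent) {
        for (String item : requiredItems) {
            if (!itemPresent.test(item)) {
                return false; // X is not there yet: skip this evaluation instead of failing
            }
        }
        action.run();
        return true;
    }
}
```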

maggu2810 · 2018-01-22T11:03:39Z

I considered things, rules, ... to be ready once the framework parts are ready (a thing handler could start doing its work, rules could be processed, etc.).
Is waiting for "all things need to be present", e.g. to be ready to execute rules, possible at all?
Think about a binding / thing handler that is fully initialized itself but needs an undefined amount of time until it can detect its things (if they are online) and communicate with them.
Should the whole "system started" rule trigger wait for an undefined time?
If a rule needs to access those things, perhaps it should be triggered by "thing online" instead.

What are the main "wait conditions" we need at all -- and which part should wait?

sjsf · 2018-01-22T12:10:45Z

> What are the main "wait conditions" we need at all -- and which part should wait?

Looking at the tons of issues linked to this one, I'm inclined to say: pretty much everything 😉 But that's exactly what I'd like to avoid, as tempting as it is.

However, in the end I think it's mainly about the rule engine(s). The other cases need to be looked into more deeply, and can hopefully be solved locally.

In the rule engine(s), the major pain-point are the "system started" triggers - all other triggers won't be triggered or executed anyway, because the system simply is not "ready enough" to generate and/or receive such events (e.g. ItemStateChangeEvent), so no problem there.

The linked issues mostly refer to "items not present" because this is the most obvious error when the language model cannot infer item references - but as you pointed out, this won't be enough: Once the items are there, we will run into the next problem: the linked things (as well as the links themselves, obviously) also need to be there - otherwise the items can be nicely resolved but any sent command ends up in nirvana. Speaking about that, the corresponding ThingHandlers obviously also need to be finished initializing. If they end up being OFFLINE because they cannot reach their devices: tough luck, this might always happen.

In an ideal world, we could analyze the rule actions for the items that are referenced and wait for their things to become ONLINE/OFFLINE/UNKNOWN. This, however, seems pretty much impossible with more advanced, dynamic scripts where e.g. items are looked up dynamically from the ItemRegistry. And even if we overcome this problem by only considering hard-referenced items and build a 90% approximation, it might still surprise users if e.g. multiple items are changed in a rule but one of them will never become "usable" because the corresponding binding is missing. Why doesn't it execute it for the others? Can't the computer "know" that this binding is missing?
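A sketch of that "ideal world" check, assuming the referenced items are known statically and each linked item maps to exactly one thing. ThingState is a simplified local enum, not the ESH ThingStatus type, and all names are hypothetical. As noted above, it cannot see dynamically looked-up items, and an item whose thing never appears (e.g. because the binding is missing) keeps the rule waiting indefinitely.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Simplified local copy of the thing states discussed above (not the ESH enum itself).
enum ThingState { UNINITIALIZED, INITIALIZING, UNKNOWN, ONLINE, OFFLINE }

class RuleReadiness {
    // A thing counts as settled once initialization has finished, whatever the outcome.
    static final Set<ThingState> SETTLED =
            EnumSet.of(ThingState.ONLINE, ThingState.OFFLINE, ThingState.UNKNOWN);

    // True if every statically referenced item is linked to a thing that has settled.
    // Assumption: items without a link are treated as not ready.
    static boolean ready(Set<String> referencedItems, Map<String, String> itemToThing,
            Map<String, ThingState> thingStates) {
        for (String item : referencedItems) {
            String thing = itemToThing.get(item);
            if (thing == null) {
                return false;
            }
            ThingState state = thingStates.get(thing);
            if (state == null || !SETTLED.contains(state)) {
                return false; // thing missing or still initializing
            }
        }
        return true;
    }
}
```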

> Is waiting for "all things need to be present" e.g. to be ready to execute rules possible at all?

This indeed is the key question! If we build something that isn't capable of solving this, then we won't gain anything and don't even need to start.

jboeddeker · 2018-01-22T12:40:02Z

> In the rule engine(s), the major pain-point are the "system started" triggers - all other triggers won't be triggered or executed anyway, because the system simply is not "ready enough" to generate and/or receive such events (e.g. ItemStateChangeEvent), so no problem there.

No, in my opinion it's not just "system started". More problems are caused by ItemStateChanged triggers fired, for example, by the persistence engines.
And some bindings take more time to initialize than others.

maggu2810 · 2018-01-22T12:47:24Z

> More problems are created from the ItemStateChanged triggers triggered for example by the persistence engines.

Can you add more details? A persistence service can access the item registry on service activation and persist all non-UnDefType.NULL states to its storage (WRT the discussion about who is allowed to set the NULL state: currently it is mostly set by the framework, on item creation only). After it has been activated, it could store every item state change to the storage, too.

jboeddeker · 2018-01-23T00:50:02Z

Sorry, I think that was misleading. It's not the persisting of items but the restoring (strategy = restoreOnStartup) that causes the ItemChanged trigger to fire. In my case this was a major problem, which was mostly solved when I excluded the change from NULL from the trigger condition:

```
//Item someitem changed
Item someitem changed from X to Y or
Item someitem changed from Y to X
```

This change eliminated many of the startup exceptions.
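The same idea as the trigger condition above, expressed as a small Java sketch and assuming a hypothetical StateChange event type; the string "NULL" stands in for UnDefType.NULL. The trigger matches real changes of the item but skips the initial NULL-to-value transition that restoreOnStartup produces.

```java
import java.util.Objects;

// Hypothetical event type carrying an item's old and new state.
class StateChange {
    final String item;
    final String oldState;
    final String newState;

    StateChange(String item, String oldState, String newState) {
        this.item = item;
        this.oldState = oldState;
        this.newState = newState;
    }
}

class ChangedTrigger {
    static final String NULL_STATE = "NULL"; // stand-in for UnDefType.NULL

    private final String item;

    ChangedTrigger(String item) {
        this.item = item;
    }

    // Matches real changes of the item, but not the initial NULL -> value
    // transition produced when persistence restores the state on startup.
    boolean matches(StateChange change) {
        return item.equals(change.item)
                && !NULL_STATE.equals(change.oldState)
                && !Objects.equals(change.oldState, change.newState);
    }
}
```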

mherwege · 2018-01-23T09:46:39Z

I would add two more cases that can cause issues with rules when the system is started. I have seen all of these when starting openHAB. A few restarts usually get me past the problem, but that's not very nice.

- cron-triggered rules, triggered when the system has not yet fully initialized all its items
- a mix of items defined in items files and through Paper UI: this can cause issues if one set is loaded and the other is not loaded yet. The rule could be triggered by an item from the loaded set, but still fail because it does not find another item referenced in the rule body. If this happens, the rule engine may generate a syntax error and never run the rule again.

maggu2810 · 2018-01-23T10:35:47Z

Should a rule be triggered at all if

- items are used that are not available
- items are available but not linked
- items are available, linked, but thing has no handler assigned
- items are available, linked, handler assigned, but thing is offline
- ...

Isn't the rule engine a special use case? I don't think that could be solved with a global "system is started and rules could be executed" state at all.
Isn't it something only the rule writer can know: whether the items need to have linked channels (and thus things) or not, whether the things already need a handler or not, whether the thing itself needs to be online or not, ...?
Do you really think that every "user" wants the same stuff for the same usecase (especially WRT thing communication should be established)?
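A sketch of the point that the required readiness is a per-rule decision: the conditions listed above are modeled as an ordered, hypothetical Readiness enum, and each rule carries the level its author chose. Nothing here is an existing ESH API; it only shows that a single global flag cannot express these different choices.

```java
// Hypothetical, ordered readiness conditions taken from the list above.
enum Readiness { NONE, ITEM_PRESENT, ITEM_LINKED, HANDLER_ASSIGNED, THING_ONLINE }

class RuleCondition {
    private final Readiness required; // chosen by the rule writer, per rule

    RuleCondition(Readiness required) {
        this.required = required;
    }

    // Determines how far along the chain an item currently is.
    static Readiness assess(boolean present, boolean linked, boolean hasHandler, boolean online) {
        if (!present) {
            return Readiness.NONE;
        }
        if (!linked) {
            return Readiness.ITEM_PRESENT;
        }
        if (!hasHandler) {
            return Readiness.ITEM_LINKED;
        }
        if (!online) {
            return Readiness.HANDLER_ASSIGNED;
        }
        return Readiness.THING_ONLINE;
    }

    boolean mayTrigger(Readiness current) {
        return current.compareTo(required) >= 0; // enum declaration order encodes the chain
    }
}
```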

adimova · 2018-02-26T21:10:07Z

> Should a rule be triggered at all if

I agree with @maggu2810, such rules should not become IDLE. The problem is that currently the ModuleHandlers, which have the needed information, have no way to inform the RuleEngine of their state and of changes to it. I've proposed a solution in my comment in #4468.

kaikreuzer · 2018-02-27T08:12:03Z

@maggu2810 for this issue here, we are only talking about services that need to be fully started in the first place as a pre-condition to consider any kind of rule execution. Whatever might happen during normal operation time (items not there, things offline, whatever) is not relevant for this issue here, but is indeed something that needs to be handled in the appropriate components.

lolodomo · 2018-08-23T15:39:57Z

Bump 6 months later.
Is there really no solution we could implement?
The various problems caused by rules starting much too early are the most important issue in openHAB. Fortunately, it is not a blocking issue.
Is there no way to add a setting to delay the startup of the rule engine? With such a setting, I would delay the startup by 2 minutes and 99% of the problems would be solved.

maggu2810 · 2018-08-24T22:09:38Z

@lolodomo For ESH itself we need a clean solution.

For downstream projects, or at least for your setup at home, you can easily delay the startup of the automation part by adding a bundle that does nothing but delay the automation activation.
I tested a simple demo here that delays the bundle start:

- Should be instantiated, opened and closed by a bundle activator: https://github.com/maggu2810/shk/blob/delayed-start/bundles/shk-addon-delayed-automation-start/src/main/java/de/maggu2810/shk/addon/das/impl/automationcore/DelayedAutomationStart.java
- Watch the bundle events: https://github.com/maggu2810/shk/blob/delayed-start/bundles/shk-addon-delayed-automation-start/src/main/java/de/maggu2810/shk/addon/das/impl/automationcore/BundleListenerImpl.java

You can improve it to start the delay as soon as e.g. smarthome core has been started, special services are available, ...

-- edit --

I improved the implementation to delay the activation of the automation bundle only IF other service references are satisfied, and to stop the bundle if those references are no longer available.
See e.g.
https://github.com/maggu2810/shk/blob/delayed-start/bundles/shk-addon-delayed-automation-start/src/main/java/de/maggu2810/shk/addon/das/impl/automationcore/CheckAutomationRequirements.java
If the thing registry and the item registry are available, the automation core bundle is started with a delay of 15 seconds; otherwise the automation bundle is stopped.
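A self-contained sketch of the same behavior, not maggu2810's implementation: a hypothetical DelayedStarter schedules a start callback with a delay once both registries are reported available, cancels a pending start if one of them disappears, and stops the automation part if it is already running. In OSGi the availability calls would come from service tracking, and the callbacks would start/stop the automation bundle.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical controller: starts the automation part with a delay once both registries are
// available, and stops it (or cancels a pending start) as soon as one of them disappears.
class DelayedStarter {
    private final ScheduledExecutorService scheduler;
    private final Runnable start;
    private final Runnable stop;
    private final long delay;
    private final TimeUnit unit;

    private boolean thingRegistryAvailable;
    private boolean itemRegistryAvailable;
    private boolean running;
    private ScheduledFuture<?> pendingStart;
    private long generation; // invalidates start tasks that were cancelled too late

    DelayedStarter(ScheduledExecutorService scheduler, Runnable start, Runnable stop, long delay,
            TimeUnit unit) {
        this.scheduler = scheduler;
        this.start = start;
        this.stop = stop;
        this.delay = delay;
        this.unit = unit;
    }

    synchronized void setThingRegistryAvailable(boolean available) {
        thingRegistryAvailable = available;
        update();
    }

    synchronized void setItemRegistryAvailable(boolean available) {
        itemRegistryAvailable = available;
        update();
    }

    synchronized boolean isRunning() {
        return running;
    }

    private void update() {
        if (thingRegistryAvailable && itemRegistryAvailable) {
            if (!running && pendingStart == null) {
                final long scheduledGeneration = generation;
                pendingStart = scheduler.schedule(() -> fireStart(scheduledGeneration), delay, unit);
            }
        } else {
            if (pendingStart != null) {
                pendingStart.cancel(false);
                pendingStart = null;
                generation++;
            }
            if (running) {
                running = false;
                stop.run();
            }
        }
    }

    private synchronized void fireStart(long scheduledGeneration) {
        if (scheduledGeneration != generation) {
            return; // requirements were lost after this start was scheduled
        }
        pendingStart = null;
        running = true;
        start.run();
    }
}
```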

kaikreuzer · 2018-09-14T08:35:17Z

I just came across https://github.com/apache/felix/tree/trunk/systemready; this sounds like a very nice fit for our issue and is probably worth investigating further.
@cschneider As you seem to be the main author of that project, please feel free to comment/advise here. If you think that it does not fit, or that it is still at too early a stage, that would be helpful input as well 😎.

cschneider · 2018-09-14T08:58:08Z

Systemready is still in an early stage. We currently mainly use it to report ready and alive for kubernetes. There is also a similar concept in sling called health checks. Last Wednesday I talked with the creators of this and we found quite a few things that should be added to systemready.

The main missing thing we found is tags for system checks. Each tag could then represent one of the subsystems you talked about. These tags might then replace the ready and alive types.
Other things are executing each check separately and failing it if it takes too long or blocks. I will create some issues on systemready. Any help with that is welcome.
So I think systemready should be usable soon.
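A sketch of two of the features mentioned above, tagged checks and a per-check timeout, using hypothetical TaggedCheck and ReadinessEvaluator classes. This is not the Felix systemready API; it only shows that a check which blocks past the timeout, or throws, can count as failed instead of hanging the whole evaluation.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical check model (not the Felix systemready API): each check carries tags,
// e.g. one tag per subsystem.
class TaggedCheck {
    final String name;
    final Set<String> tags;
    final Callable<Boolean> check;

    TaggedCheck(String name, Set<String> tags, Callable<Boolean> check) {
        this.name = name;
        this.tags = tags;
        this.check = check;
    }
}

class ReadinessEvaluator {
    private final ExecutorService executor;
    private final long timeoutMillis;

    ReadinessEvaluator(ExecutorService executor, long timeoutMillis) {
        this.executor = executor;
        this.timeoutMillis = timeoutMillis;
    }

    // A tag is ready if every check carrying it returns true within the timeout.
    // Each check runs as a separate task; one that blocks or throws counts as failed.
    boolean isReady(String tag, List<TaggedCheck> checks) {
        for (TaggedCheck c : checks) {
            if (!c.tags.contains(tag)) {
                continue;
            }
            Future<Boolean> result = executor.submit(c.check);
            try {
                if (!Boolean.TRUE.equals(result.get(timeoutMillis, TimeUnit.MILLISECONDS))) {
                    return false;
                }
            } catch (TimeoutException e) {
                result.cancel(true); // interrupt the blocking check
                return false;
            } catch (ExecutionException e) {
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```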

Generally, for determining readiness it is not good enough to look at whether the framework is started or whether all bundles are started. Especially with Declarative Services, a service might appear completely asynchronously from the bundle start. So a list of required services is the only stable way. Unfortunately, we are having quite some difficulty creating and managing such a list for AEM. I wonder if a special annotation could help with that (like adding a tag to a service) that is then reflected in the manifest.

I am not sure, though, whether I would use this for switching the internal eventing of ESH on/off. Maybe there is a different solution for this. How about having different events for a thing that really appears on the binding and a thing that is merely recreated because of a startup? In the same way, when shutting down it should be clear whether a thing is removed externally or just because of the shutdown.

kaikreuzer added enhancement Core labels Jul 25, 2016

kaikreuzer mentioned this issue Jul 27, 2016

[HomeKit] - Losing Rooms/Zones in Homekit openhab/openhab-addons#929

Closed

kaikreuzer changed the title ~~Avoid sending of events and notification of listeners during startup / shutdown of ESH~~ Introduce a "ESH start level" functionality Aug 1, 2016

kaikreuzer mentioned this issue Aug 1, 2016

Start/stop script does not wait until openHAB2 is really up and running openhab/openhab-distro#258

Closed

kaikreuzer mentioned this issue Sep 6, 2016

Refactoring bridge/thing life cycle. #2087

Merged

maggu2810 mentioned this issue Sep 6, 2016

Item initialization / item channel link handling #2121

Closed

svilenvul mentioned this issue Sep 16, 2016

Improve Thing Handler Implementation documentation #2181

Open

kaikreuzer mentioned this issue Nov 18, 2016

BaseThingHandler:channelLinked() called frequently / too much #2491

Closed

kaikreuzer mentioned this issue Dec 9, 2016

changed start level of transformation services openhab/openhab-core#89

Merged

kaikreuzer added a commit to kaikreuzer/smarthome that referenced this issue Dec 14, 2016

setting start-level for transformation bundles as a short-term soluti…

302070d

…on until eclipse-archived#1896 is solved Signed-off-by: Kai Kreuzer <[email protected]>

kaikreuzer mentioned this issue Dec 14, 2016

setting start-level for transformation bundles as a short-term soluti… #2656

Merged

maggu2810 pushed a commit that referenced this issue Dec 15, 2016

setting start-level for transformation bundles as a short-term soluti…

fba8740

…on until #1896 is solved (#2656) Signed-off-by: Kai Kreuzer <[email protected]>

chaton78 pushed a commit to chaton78/smarthome that referenced this issue Dec 23, 2016

setting start-level for transformation bundles as a short-term soluti…

6e338dc

…on until eclipse-archived#1896 is solved (eclipse-archived#2656) Signed-off-by: Kai Kreuzer <[email protected]>

sjsf mentioned this issue Jan 4, 2017

[RFC] introduce CREATED and DELETED events for Things and Items #2743

Closed

This was referenced Mar 15, 2017

rule cannot be processed because of uninitialized script #3131

Open

Karaf: system:shutdown - Could not find an Event Factory #1339

Open

kaikreuzer mentioned this issue Apr 4, 2017

scriptable automation #1783

Merged

sjsf mentioned this issue Apr 12, 2017

Xml with ds #3236

Merged

kaikreuzer mentioned this issue Apr 23, 2017

Items label, state, and category values lost on server restart #2098

Closed

maggu2810 mentioned this issue Feb 6, 2018

Feature request: prevent rules to run before startup is complete openhab/openhab-distro#640

Closed

This was referenced Feb 20, 2018

No error when Thing is used without binding installed. #5122

Open

Fixed circular service reference in automation component #4468

Merged

maggu2810 mentioned this issue Mar 8, 2018

Rule “System started” runs twice #5188

Open

sjsf mentioned this issue Apr 18, 2018

Static Code Analysis Tool reports unaddressed TODO comments in XmlDocumentBundleTracker #5275

Closed

sjsf mentioned this issue Jun 15, 2018

HttpUtil leaks resources/threads on bundle update #5739

Open

kaikreuzer mentioned this issue Sep 17, 2018

System Ready Event openhab/openhab-core#400

Closed

kaikreuzer mentioned this issue Oct 29, 2018

Introduced new ui.start bundle, which brings custom lifecycle status … openhab/openhab-core#419

Merged

mstormi mentioned this issue Dec 22, 2018

major reprocessing on items addition #6410

Open

wborn mentioned this issue Jan 5, 2019

Update persistence.html openhab/openhab-addons#4514

Closed

kaikreuzer mentioned this issue Apr 28, 2020

Replaced "classic" rule engine by a DSLRuleProvider for the NGRE openhab/openhab-core#1451

Merged

kaikreuzer mentioned this issue May 10, 2020

set Karaf start levels to optimize openHAB bundle startup order openhab/openhab-core#1467

Closed

This was referenced May 20, 2020

[homekit] home app lose the room assignment openhab/openhab-addons#7701

Closed

[homekit] bugfix #7701. set correct configuration revision on bundle start. openhab/openhab-addons#7702

Merged

kaikreuzer mentioned this issue Dec 11, 2020

Implemented start level service openhab/openhab-core#1914

Merged
