Introduce a "ESH start level" functionality #1896
Comments
How do you want to decide that the framework is started or shut down? Isn't it product-specific which services need to be available to signal "all present" / "normal runtime"? |
In our product we can rely on our OSGi runtime implementation that the framework is started properly. For solutions running e.g. on Equinox I thought to use a framework listener and listen on the |
I had assumed you were talking about the startup and shutdown phases of the Eclipse SmartHome framework. But you refer to the start and stop of the OSGi framework. Correct? So, there are some options: using a FrameworkListener, using a SynchronousBundleListener for bundle 0 to handle the stopping event, ... Is the intention that the Eclipse SmartHome framework does not fire any event as long as the OSGi framework is not fully started (starting up or shutting down)? Which one needs to be observed to distinguish between "normal runtime" and a non-normal one? |
Yes
Yes
I think this depends on the OSGi runtime used. In our project we don't want to be informed if entities are added or removed during startup and shutdown. We already have a dedicated state that declares the framework as started.
Especially ItemAddedEvent, ItemRemovedEvent, ThingAddedEvent and ThingRemovedEvent. Can you give me an example of a bundle / service whose restart you think we need to react to? |
No, not ATM. You have written you are doing this already, so I assume you know that it is working and how it is working (the architecture). I don't. 😉 Give me some time. |
Haven't started with the implementation yet ;) But because it is urgent, I think I will provide a PR in the following week. |
I agree that it isn't easy to say whether the system is up or not. What does it mean if the OSGi framework keeps running, but ALL ESH bundles are fully stopped and restarted? I would consider this as ESH NOT being up - hence the feature should not be about the OSGi framework, but about ESH itself. "Up" means for me that certain services have started and are available. How can this be determined, and how can others be notified about reaching (or leaving) this state? I see several use cases of such a feature (from recent discussions):
|
IMHO, a single state will not fit all of our needs. As @kaikreuzer pointed out, we e.g. have services that require other services to be up and running and fully loaded (whatever that means). Then again, there might be other services which depend on the previous ones to be started. So we will end up having several different levels of "active", like e.g. the start levels for bundles in OSGi. Additionally, the definition of these levels is going to differ for every solution built on top of ESH. Generally, the introduction of such a framework state in a dynamic system usually is a workaround to cover up for maybe-not-so-ideal design decisions in other places. I would suggest to first look into the individual use-cases and see if we somehow can fix the root causes. Regarding the Item/Thing/etc added/removed events, the root cause is that we cannot distinguish whether they were loaded or newly created (or removed/unloaded respectively). I'd suggest that we fix this and also let listeners/subscribers decide what kind of event/notification they actually require by either introducing new event types (i.e. ThingLoadedEvent, etc...) plus RegistryLoadListener interface, or amending the existing events and RegistryChangeListener with the corresponding information. |
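To make the second variant (amending the existing listener with the corresponding information) a bit more concrete, here is a minimal sketch in plain Java. All names in it (ChangeCause, CauseAwareRegistryListener, SketchRegistry, PersistentModel) are hypothetical and only stand in for the real ESH registry and listener types; it merely shows how a subscriber could ignore "unloaded" while still reacting to "really removed".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: the reason why an element appeared in or left a registry.
enum ChangeCause { LOADED, CREATED, UNLOADED, REMOVED }

// Hypothetical listener that receives the cause along with the element.
interface CauseAwareRegistryListener<E> {
    void added(E element, ChangeCause cause);
    void removed(E element, ChangeCause cause);
}

// Minimal registry stand-in (not the ESH AbstractRegistry).
class SketchRegistry<E> {
    private final List<E> elements = new ArrayList<>();
    private final List<CauseAwareRegistryListener<E>> listeners = new ArrayList<>();

    void addListener(CauseAwareRegistryListener<E> listener) {
        listeners.add(listener);
    }

    // A provider restores an already known element, e.g. during startup.
    void load(E element) {
        elements.add(element);
        listeners.forEach(l -> l.added(element, ChangeCause.LOADED));
    }

    // A user or binding creates a new element at runtime.
    void create(E element) {
        elements.add(element);
        listeners.forEach(l -> l.added(element, ChangeCause.CREATED));
    }

    // A provider goes away, e.g. during shutdown; the element still exists logically.
    void unload(E element) {
        if (elements.remove(element)) {
            listeners.forEach(l -> l.removed(element, ChangeCause.UNLOADED));
        }
    }

    // The element is deleted for real.
    void delete(E element) {
        if (elements.remove(element)) {
            listeners.forEach(l -> l.removed(element, ChangeCause.REMOVED));
        }
    }
}

// Example subscriber: its model only forgets elements that were really deleted.
class PersistentModel implements CauseAwareRegistryListener<String> {
    final List<String> known = new ArrayList<>();

    @Override
    public void added(String element, ChangeCause cause) {
        if (!known.contains(element)) {
            known.add(element);
        }
    }

    @Override
    public void removed(String element, ChangeCause cause) {
        if (cause == ChangeCause.REMOVED) {
            known.remove(element);
        }
    }
}
```

With something like this, the decision stays with the listener: one that does not care about the distinction can simply ignore the cause argument.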
…on until eclipse-archived#1896 is solved Signed-off-by: Kai Kreuzer <[email protected]>
…on until #1896 is solved (#2656) Signed-off-by: Kai Kreuzer <[email protected]>
Okay, it has been a while now... As we can see, there have recently been quite a few topics which relate to this issue, therefore I'd like to get back to it now. I still think we should avoid using such a "startup level" construct wherever possible! But I have to admit that there are some use-cases which won't really work without it (e.g. related to the rule engine). There recently was a blog post by @pkriens which addresses this very topic. And I think we could realize our requirements with exactly this idea, using the OSGi means for our purpose. The relevant services that we need to wait for (e.g. XML processing per binding, providers being up and running) would somehow denote that they are "finished" by registering a marker service into the SCR, carrying some defined properties. Our "AggregateStateService" however must be configurable, as not all the services are available in every solution. Imagine there would be a solution without support for DSL based configuration, then it really does not make sense to wait for the GenericProviders to finish their loading. I'd suggest using config admin for that purpose. As a first step, I would drop the BundleProcessorVetoManager and use such OSGi services to mark fully loaded bindings accordingly. As a next step, I would create an AggregateStateService and make all relevant entities denote that they are finished loading. The idea would be that every service that somehow needs waiting (e.g. a SystemStartupTrigger) would create a dependency on such an aggregated state only, not on the services themselves. By that we would decouple the dependency from a concrete service into a configurable one with a semantic meaning. At the same time this allows us to define different levels of "readiness" of the system. Of course, we need to carefully define all the required properties and states, as they somehow become "API for solution providers", i.e. they must not be a big pain to maintain and should change as seldom as possible.
Does this make sense to you? Any thoughts on this? |
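A rough sketch of how such an aggregated state could behave, in plain Java without any OSGi dependencies. This is only an assumption of the mechanics, not a proposed API: in a real implementation the markers would be OSGi services with properties and the required set would come from config admin. The class name AggregateStateSketch and the marker names used below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical stand-in for the proposed AggregateStateService.
class AggregateStateSketch {
    // Marker names this solution waits for (would be configurable per solution).
    private final Set<String> required;
    // Marker names that are currently registered.
    private final Set<String> present = new HashSet<>();
    private final List<Consumer<Boolean>> listeners = new ArrayList<>();
    private boolean ready;

    AggregateStateSketch(String... requiredMarkers) {
        this.required = new HashSet<>(Arrays.asList(requiredMarkers));
        // With nothing to wait for, the aggregate is ready right away.
        this.ready = required.isEmpty();
    }

    // Dependents (e.g. a "system started" trigger) subscribe to the aggregate only.
    void onChange(Consumer<Boolean> listener) {
        listeners.add(listener);
    }

    // Called when a component signals that it has finished loading.
    void markerRegistered(String name) {
        present.add(name);
        update();
    }

    // Called when a component goes away again.
    void markerUnregistered(String name) {
        present.remove(name);
        update();
    }

    boolean isReady() {
        return ready;
    }

    // Recompute the aggregate and notify listeners only on a transition.
    private void update() {
        boolean now = present.containsAll(required);
        if (now != ready) {
            ready = now;
            listeners.forEach(l -> l.accept(now));
        }
    }
}
```

A solution without DSL support would just leave the corresponding marker out of the required set, e.g. `new AggregateStateSketch("xml.bindings", "provider.items")`; markers that are registered but not required are ignored. Several instances with different required sets would give the different levels of readiness.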
Aren't there any companies that can run this through OSGi? This is a very foundational service and it belongs somewhere low in the stack like Equinox or Apache Felix? I could provide an initial implementation since I got it already running |
@SJKA I think I share your view. The danger is that you start thinking globally, and that always falls apart in a component model. In general, you need to handle the dependency on the requirer side, which has the actual knowledge of what it needs. I.e. a rule that needs X should not be evaluated before X is present. This is much better than waiting to start the rule engine until all devices have started. You need to address these things where you have concrete information (like X.1) instead of trying to handle them globally. |
I considered things, rules, ... ready once the framework stuff is ready (thing handlers could start doing their work, rules could be processed, etc.). What are the main "wait conditions" we need at all -- and which part should wait? |
Looking at the tons of issues which are linked against this one, I'm about to say: pretty much everything 😉 But that's exactly what I'd like to avoid - as tempting as it is. However, in the end I think it's mainly about the rule engine(s). The other cases need to be looked into more deeply, and hopefully can be solved locally. In the rule engine(s), the major pain point is the "system started" triggers - all other triggers won't be triggered or executed anyway, because the system simply is not "ready enough" to generate and/or receive such events (e.g. ItemStateChangeEvent), so no problem there. The linked issues mostly refer to "items not present" because this is the most obvious error when the language model cannot infer item references - but as you pointed out, this won't be enough: Once the items are there, we will run into the next problem: the linked things (as well as the links themselves, obviously) also need to be there - otherwise the items can be nicely resolved but any sent command ends up in nirvana. Speaking about that, the corresponding ThingHandlers obviously also need to have finished initializing. If they end up being OFFLINE because they cannot reach their devices: tough luck, this might always happen. In an ideal world, we could analyze the rule actions for the items which are referenced and wait for their things to become ONLINE/OFFLINE/UNKNOWN. This however seems pretty much impossible with more advanced, dynamic scripts where e.g. items are looked up dynamically from the ItemRegistry. And even if we overcome this problem by only considering hard-referenced items and build a 90% approximation, it might still be surprising to users if e.g. multiple items are changed in a rule but one will never become "usable" because the corresponding binding is missing. Why doesn't it execute it for the others? Can't the computer "know" that this binding is missing?
This indeed is the key question! If we build something that isn't capable of solving this, then we won't win anything and don't even need to start. |
No, in my opinion it's not just "system started". More problems are caused by the ItemStateChanged triggers that are fired, for example, by the persistence engines. |
Can you add more details? A persistence service can access the item registry on service activation and persist all non UnDefType.NULL states (WRT the discussion who is allowed to set the NULL state but that is currently mostly used by the framework on item creation only) to its storage. After it has been activated, it could store every item state change to the storage, too. |
Sorry, I think that was unclear. It's not the persisting of items but the restoring (strategy = restoreOnStartup) which causes the ItemChanged trigger to be fired. In my case this was a major problem, which was mostly solved when I excluded the change from NULL from the trigger condition.
This change eliminated many of the startup exceptions. |
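As an illustration of that workaround, a tiny sketch in plain Java. States are modelled as plain strings here, and the class and method names are hypothetical; "NULL" stands in for UnDefType.NULL, i.e. the state an item has right after creation and before restoreOnStartup has set a value.

```java
import java.util.Objects;

// Hypothetical trigger condition: fire on real changes, but not on the first
// value an item receives after creation (e.g. via restoreOnStartup).
class ChangedTriggerSketch {
    // Stand-in for UnDefType.NULL.
    static final String NULL = "NULL";

    static boolean shouldFire(String oldState, String newState) {
        boolean changed = !Objects.equals(oldState, newState);
        boolean fromUninitialized = NULL.equals(oldState);
        return changed && !fromUninitialized;
    }
}
```

The downside is that a rule guarded like this also misses the very first genuine update of an item that was never persisted, so it is a workaround rather than a fix for the startup ordering.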
I would add two more cases that could cause issues with rules when the system is started. I have seen all of these when starting openHAB. A few restarts usually get me over the problem, but that's not very nice.
|
Should a rule be triggered at all if
Isn't the rule engine a special use case? I don't think that could be solved with a global "system is started and rules could be executed" state at all. |
I agree with @maggu2810, such rules should not become IDLE. The problem is that currently the |
@maggu2810 for this issue here, we are only talking about services that need to be fully started in the first place as a pre-condition to consider any kind of rule execution. Whatever might happen during normal operation time (items not there, things offline, whatever) is not relevant for this issue here, but is indeed something that needs to be handled in the appropriate components. |
Bump 6 months later. |
@lolodomo For ESH itself we need a clean solution. For a downstream project, or at least for your setup at home, you can easily delay the startup of the automation part by adding a bundle that does nothing but delay the automation activation.
You can improve it to start the delay as soon as e.g. smarthome core has been started, special services are available, ... -- edit -- I improved the implementation to delay the activation of the automation bundle until other service references are satisfied and to stop the bundle if those references are not available anymore. |
I just came across https://github.com/apache/felix/tree/trunk/systemready - this sounds like a very nice fit for our issue and probably worth to further investigate. |
Systemready is still in an early stage. We currently mainly use it to report ready and alive for Kubernetes. There is also a similar concept in Sling called health checks. Last Wednesday I talked with the creators of this and we found quite a few things that should be added to systemready. The main missing thing we found is having tags for system checks. Each tag could then represent one of the subsystems you talked about. These tags might then replace the ready and alive types. Generally, for determining readiness it is not good enough to look at framework started or the fact that all bundles are started. Especially with declarative services, a service might appear completely asynchronously from the bundle start. So a list of required services is the only stable way. Unfortunately we are having quite some difficulties creating and managing such a list for AEM. I wonder if a special annotation could help with that (like adding a tag to a service) that is then reflected in the Manifest. I am not sure though if I would use this for switching on/off the internal eventing of ESH. Maybe there is a different solution for this. How about having different events for a thing that really appears on the binding and a thing that is merely recreated because of a startup? In the same way, when shutting down it should be clear if a thing is removed externally or just because of shutdown. |
Hi all,
in our project we have a lot of event subscribers and registry change listeners implemented which are called during startup / shutdown of ESH as a matter of course. In the shutdown phase these services update their model accordingly, which results in the problem that the model cannot be rebuilt after the next startup (because the subscribers / listeners have assumed that the item / thing / link has really been deleted). We should distinguish between event sending / listener notification for adding / removal of items / things / links during the framework startup / shutdown phase and during normal runtime.
For this reason I would like to disable in our project that events are sent / listeners are notified during startup / shutdown. I could imagine two ways to implement this:

1. A RuntimeStateService that can be requested in order to get the information if the runtime is started. So for the beginning the service would only consist of a single operation, boolean : isStarted(), and it will be injected as a dynamic dependency into the registries (things, items, links, rules). Then I would skip sending events / notifications if the service is present and the runtime / framework is not fully started.
2. A RuntimeStateListener interface which is tracked by a central RuntimeStateService to be provided by solutions based on ESH. As soon as the runtime state has changed to started, the service will inform all listeners about this. Once the runtime has left the started state again, all listeners are informed about this. For the beginning I will implement the new RuntimeStateListener interface in the AbstractRegistry (concrete registries can decide if the runtime state listener is to be provided as a service).

I think that 2 is the only valid option to implement this requirement. In 1 the runtime state service could unregister too early so that the events are sent / listeners are notified again.
What do you think?
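To illustrate option 2, here is a minimal sketch in plain Java without OSGi. Only the names RuntimeStateListener and RuntimeStateService come from the proposal above; the method names, the GatedRegistry class (a stand-in for AbstractRegistry) and the list that plays the role of the event bus are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Listener interface from the proposal; the method names are assumptions.
interface RuntimeStateListener {
    void runtimeStarted();
    void runtimeStopping();
}

// Hypothetical stand-in for AbstractRegistry: it only shows the gating of events.
class GatedRegistry implements RuntimeStateListener {
    private final List<String> elements = new ArrayList<>();
    // Stands in for the event bus / registry change listeners.
    final List<String> sentEvents = new ArrayList<>();
    // Assumption: the registry starts gated until it is told the runtime is up.
    private volatile boolean started;

    @Override
    public void runtimeStarted() {
        started = true;
    }

    @Override
    public void runtimeStopping() {
        started = false;
    }

    void add(String element) {
        elements.add(element);
        if (started) {
            sentEvents.add("added:" + element);
        }
    }

    void remove(String element) {
        if (elements.remove(element) && started) {
            sentEvents.add("removed:" + element);
        }
    }
}

// Hypothetical central service, to be provided by the solution built on ESH.
class RuntimeStateServiceSketch {
    private final List<RuntimeStateListener> listeners = new ArrayList<>();

    // In OSGi this would be a dynamic, multiple service reference.
    void track(RuntimeStateListener listener) {
        listeners.add(listener);
    }

    // The solution decides when the runtime counts as started.
    void setStarted(boolean started) {
        for (RuntimeStateListener listener : listeners) {
            if (started) {
                listener.runtimeStarted();
            } else {
                listener.runtimeStopping();
            }
        }
    }
}
```

Because the registry keeps its own flag, it stays gated even if the central service disappears early during shutdown, which is the weakness of option 1. Note that this sketch never sends events unless some service reports "started"; a registry that does not register itself as a listener would need the flag to default to true instead.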