Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing publishednodes.json causes LegacyJobOrchestrator to hang or lock up, does not process changes anymore #1032

Closed
derSchtefan opened this issue Feb 11, 2021 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@derSchtefan
Copy link

derSchtefan commented Feb 11, 2021

Describe the bug
When changing the publishednodes.json file in OpcPublisher 2.7.206.36449, the LegacyJobOrchestrator / PublishedNodesJobConverter often hangs after detecting a file change, and will neither change the published nodes, nor will it ever listen/see another file change. No error is ever logged.

This problem comes in two flavours:

  • Flavour 1: The file change is noticed, but nothing else happens. File changes from here on will not even be noticed anymore. The log will only read this once (and then will never log any publishednodes message again):
[19:20:23 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator] File /appdata/publishednodes.json has changed, reloading...
[19:20:23 DBG Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Models.PublishedNodesJobConverter] Reading published nodes file (00:00:00.0000011
  • Flavour 2: The file change is detected, the job converter succeeds, but it takes 15-25 seconds before the configuration change takes place and the SubscriptionWrapper is reacting. It appears sometimes that it can take even longer.

[18:38:32 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator] File /appdata/publishednodes.json has changed, reloading...
[18:38:32 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Models.PublishedNodesJobConverter] Read 1 items from published nodes file in 00:00:00.0004680
[18:38:32 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Models.PublishedNodesJobConverter] Converted items to jobs in 00:00:00.0005213
[18:38:33 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.DataFlowProcessingEngine] 
  DIAGNOSTICS INFORMATION for          : opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_2be4c25a1c7bb1d95a5226694bc0bcf8c9744d63
  // SNIP
[18:38:43 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.DataFlowProcessingEngine] 
  DIAGNOSTICS INFORMATION for          : opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_2be4c25a1c7bb1d95a5226694bc0bcf8c9744d63
  // SNIP
[18:38:44 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.SubscriptionServices+SubscriptionWrapper] Closing subscription opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_3fd6a2c46567bc44317c56c68678b0dd825aa768[18:38:44 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Subscription 'opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_3fd6a2c46567bc44317c56c68678b0dd825aa768' unregistered from session '2be4c25a1c7bb1d95a5226694bc0bcf8c9744d63' in state Running[18:38:45 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.Worker+JobProcess] Job Standalone_iotedge-vbox-01_OPCPublisher cancelled.[18:38:45 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.SubscriptionServices+SubscriptionWrapper] Closing subscription opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_3fd6a2c46567bc44317c56c68678b0dd825aa768
[18:38:45 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.Worker] Worker: StandalonePublisher_0, Job: Standalone_iotedge-vbox-01_OPCPublisher processing completed ...[18:38:45 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.Worker] Worker: StandalonePublisher_0 processing job: Standalone_iotedge-vbox-01_OPCPublisher, mode: Active
[18:38:45 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Subscription 'opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_3fd6a2c46567bc44317c56c68678b0dd825aa768' registered/updated in session '2be4c25a1c7bb1d95a5226694bc0bcf8c9744d63' in state Running
[18:38:45 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.SubscriptionServices+SubscriptionWrapper] Added 5 monitored items to subscription opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_3fd6a2c46567bc44317c56c68678b0dd825aa768
[18:38:45 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.SubscriptionServices+SubscriptionWrapper] Now monitoring 5 nodes in subscription opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_3fd6a2c46567bc44317c56c68678b0dd825aa768
[18:38:45 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.DefaultSessionManager] Subscription 'opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_3fd6a2c46567bc44317c56c68678b0dd825aa768' registered/updated in session '2be4c25a1c7bb1d95a5226694bc0bcf8c9744d63' in state Refresh
[18:38:45 INF Microsoft.Azure.IIoT.OpcUa.Protocol.Services.SubscriptionServices+SubscriptionWrapper] Set Monitoring to Reporting for 5 nodes in subscription opc.tcp://192.168.11.60:62521/xx.xx:62521/xxxxxServer_3fd6a2c46567bc44317c56c68678b0dd825aa768

It is irrelevant which nodes we change, if we add/remove nodes or change a node's configuration. It seems to randomly work sometimes, and then refuses to work at all after the first failure.

All jsons we deploy are well-formed and after restarting the docker container the container can process the file.

To Reproduce

  • Deploy OpcPublisher standalone in version 2.7.206.36449 (mcr.microsoft.com/iotedge/opc-publisher:2.7.206) as standalone IoT Edge module with the following parameters:
"OPCPublisher": {
            "version": "1.0",
            "type": "docker",
            "status": "running",
            "restartPolicy": "always",
            "settings": {
              "image": "mcr.microsoft.com/iotedge/opc-publisher:latest",
              "createOptions": {
                "Hostname": "publisher",
                "Cmd": [
                  "--aa",
                  "--to",
                  "--tm",
                  "--loglevel=information",
                  "--fetchdisplayname=true",
                  "--pf=/appdata/publishednodes.json",
                  "--di=10",
                  "--fm=true",
                  "--si=3"
                ],
                "HostConfig": {
                  "Binds": [
                    "/iiotedge:/appdata"
                  ]
                }
              }
            }
          },
  • publishednodes.json contains this: (node count irrelevant, heartbeat interval can be present or missing, DisplayName can be present or missing, we use nsu= and i= addressing)
[
  {
    "EndpointUrl": "opc.tcp://192.168.xx.xx:62521/xxxxxServer",
    "UseSecurity": false,
    "OpcNodes": [
      { "Id": "nsu=http://secret.com/MS/Unit;i=6119", "HeartbeatInterval": 1, "OpcSamplingInterval": 1000, "OpcPublishingInterval": 1000 },
      { "Id": "nsu=http://secret.com/MS/Unit;i=6120", "HeartbeatInterval": 1, "OpcSamplingInterval": 1000, "OpcPublishingInterval": 1000 },
      { "Id": "nsu=http://secret.com/MS/Unit;i=6121", "HeartbeatInterval": 1, "OpcSamplingInterval": 1000, "OpcPublishingInterval": 1000 }
    ]
  }
]  
  • Change publishednodes.json by removing a node to this:
[
  {
    "EndpointUrl": "opc.tcp://192.168.xx.xx:62521/xxxxxServer",
    "UseSecurity": false,
    "OpcNodes": [
      { "Id": "nsu=http://secret.com/MS/Unit;i=6119", "HeartbeatInterval": 1, "OpcSamplingInterval": 1000, "OpcPublishingInterval": 1000 },
      { "Id": "nsu=http://secret.com/MS/Unit;i=6120", "HeartbeatInterval": 1, "OpcSamplingInterval": 1000, "OpcPublishingInterval": 1000 }
    ]
  }
]  
  • Observe the docker logs of OPCPublisher while changing the file. Observe occurence and timestamp of "File /appdata/publishednodes.json has changed", "Converted items to jobs", and "Closing subscription" messages
  • Change the file again by adding/removing nodes or changing whitespace, until those messages disappear and/or it takes longer and longer for the subscription change to take place
  • Observe that at this point no file change will ever trigger "File /appdata/publishednodes.json has changed" again.
  • Observe no error message to be logged
  • Observe that this very often already happens on the first change.

Expected behavior

  • "File /appdata/publishednodes.json has changed" messages appear
  • Changed file is processed and configuration is changed in a timely manner
  • Error messages are logged in case of an error in the file, file processing, unsubscribing from OPC UA, etc.
  • Future file changes are honoured

Desktop (please complete the following information):

  • Version 2.7.206.36449

Additional context
It might be that there is a race condition, or something happens during unsubscribing from the OPC UA server. The long time between "converted items to jobs" and actually unsubscribing that sometimes occurs might also give a clue what is going wrong. The LegacyJobOrchestrator might be blocked and/or crashes and then nothing in the system can react to changes.

@koepalex koepalex added bug Something isn't working Publisher labels Feb 26, 2021
@derSchtefan
Copy link
Author

It might relate to issue #1024, associated change 3d9d503, and a new unmerged commit "fix jobs reload issue for standalone mode" 450a518 by #cristipogacean in branch ompadapter/base

@fbeltrao
Copy link
Contributor

fbeltrao commented Jun 11, 2021

I could reproduce this by overwriting the publishednodes.json file with invalid content (notice the space before the port number):

[{"OpcNodes":[{"Id":"http://microsoft.com/Opc/OpcPlc/#s=FastUInt1","OpcSamplingInterval":500,"OpcPublishingInterval":10000,"HeartbeatInterval":10}],"EndpointUrl":"opc.tcp://opcplc1: 50000","UseSecurity":false,"ConfigurationId":"681f9962-ffc9-eb11-94b3-0003ff297e00"}]

Publisher is stuck here:

[07:34:51 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator] File /appdata/publisher1.json has changed, reloading...

Further changes to the file don't trigger the reload. Restarting publisher module results in:

[07:35:35 INF Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator] File /appdata/publisher1.json has changed, reloading...
[07:35:35 ERR Root] Error during module execution - restarting!
Autofac.Core.DependencyResolutionException: An exception was thrown while activating Microsoft.Azure.IIoT.Agent.Framework.Agent.Worker -> Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator.
 ---> Autofac.Core.DependencyResolutionException: An exception was thrown while invoking the constructor 'Void .ctor(Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Models.PublishedNodesJobConverter, Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.ILegacyCliModelProvider, Microsoft.Azure.IIoT.Agent.Framework.IAgentConfigProvider, Microsoft.Azure.IIoT.Agent.Framework.IJobSerializer, Serilog.ILogger, Microsoft.Azure.IIoT.Module.IIdentity)' on type 'LegacyJobOrchestrator'.
 ---> Newtonsoft.Json.JsonSerializationException: Error converting value "opc.tcp://opcplc1: 50000" to type 'System.Uri'. Path '[0].EndpointUrl', line 1, position 188.
 ---> System.ArgumentException: Could not cast or convert from System.String to System.Uri.
   at Newtonsoft.Json.Utilities.ConvertUtils.EnsureTypeAssignable(Object value, Type initialType, Type targetType)
   at Newtonsoft.Json.Utilities.ConvertUtils.ConvertOrCast(Object initialValue, CultureInfo culture, Type targetType)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.EnsureType(JsonReader reader, Object value, CultureInfo culture, JsonContract contract, Type targetType)
   --- End of inner exception stack trace ---
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.EnsureType(JsonReader reader, Object value, CultureInfo culture, JsonContract contract, Type targetType)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateList(IList list, JsonReader reader, JsonArrayContract contract, JsonProperty containerProperty, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateList(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, Object existingValue, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
   at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonSerializer.Deserialize(JsonReader reader, Type objectType)
   at Newtonsoft.Json.JsonSerializer.Deserialize(TextReader reader, Type objectType)
   at Microsoft.Azure.IIoT.Serializers.NewtonSoft.NewtonSoftJsonSerializer.Deserialize(ReadOnlyMemory`1 buffer, Type type) in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Serializers.NewtonSoft\src\Default\NewtonSoftJsonSerializer.cs:line 76
   at Microsoft.Azure.IIoT.Serializers.SerializerEx.Deserialize(ISerializer serializer, String str, Type type) in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Abstractions\src\Serializers\Extensions\SerializerEx.cs:line 143
   at Microsoft.Azure.IIoT.Serializers.SerializerEx.Deserialize[T](ISerializer serializer, String json) in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Abstractions\src\Serializers\Extensions\SerializerEx.cs:line 179
   at Microsoft.Azure.IIoT.Serializers.SerializerEx.Deserialize[T](ISerializer serializer, TextReader reader) in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Abstractions\src\Serializers\Extensions\SerializerEx.cs:line 167
   at Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Models.PublishedNodesJobConverter.Read(TextReader publishedNodesFile, LegacyCliModel legacyCliModel) in D:\a\1\s\components\opc-ua\src\Microsoft.Azure.IIoT.OpcUa.Edge.Publisher\src\Storage\PublishedNodesJobConverter.cs:line 54
   at Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator.RefreshJobFromFile() in D:\a\1\s\components\opc-ua\src\Microsoft.Azure.IIoT.OpcUa.Edge.Publisher\src\Engine\LegacyJobOrchestrator.cs:line 144
   at Microsoft.Azure.IIoT.OpcUa.Edge.Publisher.Engine.LegacyJobOrchestrator..ctor(PublishedNodesJobConverter publishedNodesJobConverter, ILegacyCliModelProvider legacyCliModelProvider, IAgentConfigProvider agentConfigPriovider, IJobSerializer jobSerializer, ILogger logger, IIdentity identity) in D:\a\1\s\components\opc-ua\src\Microsoft.Azure.IIoT.OpcUa.Edge.Publisher\src\Engine\LegacyJobOrchestrator.cs:line 62
   at lambda_method(Closure , Object[] )
   at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   --- End of inner exception stack trace ---
   at Autofac.Core.Activators.Reflection.ConstructorParameterBinding.Instantiate()
   at Autofac.Core.Activators.Reflection.ReflectionActivator.ActivateInstance(IComponentContext context, IEnumerable`1 parameters)
   at Autofac.Core.Resolving.InstanceLookup.CreateInstance(IEnumerable`1 parameters)
   --- End of inner exception stack trace ---
   at Autofac.Core.Resolving.InstanceLookup.CreateInstance(IEnumerable`1 parameters)
   at Autofac.Core.Resolving.InstanceLookup.Execute()
   at Autofac.Core.Resolving.ResolveOperation.GetOrCreateInstance(ISharingLifetimeScope currentOperationScope, ResolveRequest request)
   at Autofac.Core.Resolving.ResolveOperation.ResolveComponent(ResolveRequest request)
   at Autofac.Core.Resolving.ResolveOperation.Execute(ResolveRequest request)
   at Autofac.Core.Lifetime.LifetimeScope.ResolveComponent(ResolveRequest request)
   at Autofac.ResolutionExtensions.TryResolveService(IComponentContext context, Service service, IEnumerable`1 parameters, Object& instance)
   at Autofac.ResolutionExtensions.ResolveService(IComponentContext context, Service service, IEnumerable`1 parameters)
   at Autofac.ResolutionExtensions.Resolve(IComponentContext context, Type serviceType, IEnumerable`1 parameters)
   at Autofac.ResolutionExtensions.Resolve[TService](IComponentContext context, IEnumerable`1 parameters)
   at Autofac.ResolutionExtensions.Resolve[TService](IComponentContext context, Parameter[] parameters)
   at Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor.CreateWorker() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Agent.Framework\src\Agent\Default\WorkerSupervisor.cs:line 66
   at Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor.EnsureWorkersAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Agent.Framework\src\Agent\Default\WorkerSupervisor.cs:line 116
   at Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor.StartAsync() in D:\a\1\s\common\src\Microsoft.Azure.IIoT.Agent.Framework\src\Agent\Default\WorkerSupervisor.cs:line 41
   at Microsoft.Azure.IIoT.Modules.OpcUa.Publisher.ModuleProcess.RunAsync() in D:\a\1\s\modules\src\Microsoft.Azure.IIoT.Modules.OpcUa.Publisher\src\ModuleProcess.cs:line 111
[07:35:35 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor] Stopping worker supervisor
[07:35:35 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor] Worker supervisor successfully stopped
[07:35:35 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor] Stopping worker supervisor
[07:35:35 INF Microsoft.Azure.IIoT.Agent.Framework.Agent.WorkerSupervisor] Worker supervisor successfully stopped
[07:35:35 INF Microsoft.Azure.IIoT.Module.Framework.Hosting.ModuleHost] Stopping Module Host...
[07:35:35 INF Microsoft.Azure.IIoT.Module.Framework.Client.IoTSdkFactory] 1: Module xxxxxxxxxx disconnected due to Client_Close - now Disabled...
[07:35:36 INF Microsoft.Azure.IIoT.Module.Framework.Hosting.ModuleHost] Module Host stopped.
Unhandled exception. System.AggregateException: One or more errors occurred. (Object reference not set to an instance of an object.)
 ---> System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Azure.IIoT.Modules.OpcUa.Publisher.ModuleProcess.RunAsync() in D:\a\1\s\modules\src\Microsoft.Azure.IIoT.Modules.OpcUa.Publisher\src\ModuleProcess.cs:line 126
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at System.Threading.Tasks.Task.Wait()
   at Microsoft.Azure.IIoT.Modules.OpcUa.Publisher.Program.Main(String[] args) in D:\a\1\s\modules\src\Microsoft.Azure.IIoT.Modules.OpcUa.Publisher\src\Program.cs:line 48

OS: Ubuntu 18.04
IoT Edge: 1.1.3
OPC-Publisher: 2.7.199

@dacolgit
Copy link
Member

Closed because the issue cannot be reproduced in version 2.8.
Please reopen the bug if you still see it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants