21/08/21 Fibaro Door sensor Zwave pool temp sensor failed, can't be re-included, is dead. Replaced with Qubino module temp sensor involving digging, channelling and extra wall boxes.
Just to pick one of your issues: I recently had a similar experience with one of my devices and found a way around it without having to exclude and re-include the device. That avoids using up another of the maximum of 232 available z-wave node id's, and I didn't have to adjust any scripts. And of course no digging, wall boxes etc...
To explain why I think this is a good work-around for the problem, I need to give some background information first:
Most z-wave related problems are caused by Domoticz not having (or no longer having) the correct device information in memory for your node. There are 4 ways (A.F.A.I.K.) in which Domoticz updates this in-memory device information:
- when an (unsolicited) value-update is received for a device Domoticz doesn't know about yet/any more,
- when initializing the openzwave library,
- when including a device,
- when you request a node's NIF via the Refresh Node Information button
When a device is (re-)added because a value update is received for an 'unknown' device (situation 1), only the device information for that single value gets updated in Domoticz. That information may be incomplete and may not even be fully correct, because the node's information is not retrieved from the z-wave database (= the file for your node somewhere inside the Config folder). So any overrides or specific details will be missed, and the node's other capabilities won't be known either. The type of the single device that stores this value is "guessed" from the value received. This is not a very good situation and should be avoided as much as possible. (More on this with situation 4).
When initializing the openzwave library (situation 2), the node information for all nodes is read from the openzwave cache (the xml file in your Config folder). Reading the xml file is, A.F.A.I.K., pretty solid code and it seems to work well, although I do have my doubts about a few small things, among them retrieving the node's options. BUT, the information in this cache is only as good as it was at the moment it was put in there. And that's where the problem lies: writing the cache file is not 'transactional'. The cache file gets updated with live status information without any protection for when/if the current operation fails. For example, if you're refreshing a node's information, first the device's information is removed from memory and then a request is sent to the node to send its NIF (i.e. have the node send the meta information on its capabilities). If the node doesn't receive the request or can't respond (correctly), the meta information for the node will still have been erased from Domoticz' memory. And if you shut down Domoticz in this state, or Domoticz crashes, the current device state in memory will automatically be written into the cache file (i.e. with NO device information for this node). The next time you start Domoticz, the node will no longer be known to Domoticz and its device information will not be restored in memory: the node is still included with your controller, but Domoticz doesn't know about it any more. This is only one scenario in which the cache can become corrupted. Other things can also corrupt the information in memory, resulting in other corruptions in the cache file and thus in misreading the device's meta information on the next restart of Domoticz.
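To illustrate what I mean by 'transactional', here is a minimal Python sketch of a refresh that keeps the old information until the new information has actually arrived. This is NOT how Domoticz or OpenZwave are implemented; `refresh_node`, `query_node` and the dictionary layout are names I made up purely for the illustration.

```python
from typing import Callable, Dict, Optional

NodeInfo = Dict[str, str]


def refresh_node(
    nodes: Dict[int, NodeInfo],
    node_id: int,
    query_node: Callable[[int], Optional[NodeInfo]],
) -> bool:
    """Replace the stored info for node_id only if the node answers.

    query_node is a hypothetical stand-in for "ask the node for its
    information"; it returns None when no (complete) answer arrives.
    """
    fresh = query_node(node_id)  # the old entry is still in place here
    if fresh is None:
        return False             # failure: keep what we had
    nodes[node_id] = fresh       # success: swap in the new information
    return True


# Hypothetical example: node 7 does not answer the request.
nodes = {7: {"type": "temperature sensor"}}
ok = refresh_node(nodes, 7, lambda node_id: None)
```

With this ordering a failed refresh leaves the old entry untouched, so a shutdown or crash right after it would still write usable information for the node into the cache.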
When including a new node (situation 3), Domoticz, among other things, requests the node to send its NIF, and upon receipt the meta information is built both in memory and in the cache (xml) file. Sometimes, especially in busier networks, the response doesn't arrive or doesn't arrive completely. The node may then be included with your controller while Domoticz doesn't know about it. Typically you see timeout messages for the new node ID in Domoticz' log file, without the node being listed in Domoticz.
When refreshing a node (situation 4), Domoticz erases the node's meta information from memory and sends out a request for a NIF to the node, then waits for the response(s) to rebuild the in-memory and cached meta information. The same problems as when including a new node can occur here. Refreshing node information is actually our way to fix most of the problems caused by situations 1, 2 and 3 not completing correctly, but it can be a somewhat tedious process because the refresh can be somewhat unreliable itself.
So that is how Domoticz gets to know the meta information on your node(s), and now we get to the other component of the specific problem with this node: it is marked 'dead'. A 'dead' node is one that the controller or other nodes reportedly haven't been able to communicate with for quite some time. Since there is only limited bandwidth to send and receive z-wave messages, nodes that don't respond can slow down the entire network, making devices that do work seem slow or even non-operational. For this reason z-wave has a provision to isolate such non-responding nodes so that the other nodes can continue to operate. This isolation is the 'dead' status: OpenZwave will no longer actually send out any messages to a dead node. Instead it will automatically give such a message a failed status, saving a lot of time in the process. BUT: not sending out any messages to the node also means we can't send it any messages to solve the problem of why it is not responding...
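The effect of the 'dead' status can be pictured with a small Python model. It is only a sketch of the behaviour described above, not OpenZwave code; the `Node` and `Controller` classes and their fields are invented for the illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    node_id: int
    dead: bool = False


@dataclass
class Controller:
    # Messages that were really transmitted (hypothetical bookkeeping).
    sent: List[str] = field(default_factory=list)

    def send(self, node: Node, message: str) -> bool:
        if node.dead:
            return False  # fail immediately, nothing is transmitted
        self.sent.append(f"{node.node_id}:{message}")
        return True


controller = Controller()
sensor = Node(node_id=7, dead=True)
first = controller.send(sensor, "request node info")   # dropped: node is dead
sensor.dead = False                                     # status reset to OK
second = controller.send(sensor, "request node info")  # now it is transmitted
```

As long as the flag is set, the request never leaves the controller, so the node never gets a chance to answer. Only after the status is back to OK does a request actually go out.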
Now that we know all that, what I did was this: I selected the "dead" node, then clicked the 'is node broken' button. It showed me 'Node is OK' and I pressed 'OK'. But more importantly: the status of the node had now changed from 'dead' (the red icon) to 'OK' (the green checkmark). After this I selected the node again and clicked 'Refresh Node info' to request the node's NIF (Node Information Frame). Then I waited for the node to report its NIF (keep the log open in another window and wait for the 'Add_Value' messages), and the device was correctly registered in Domoticz and in the cache file again.
Receiving the responses to the NIF request can be somewhat unreliable: sometimes not all messages sent by the node are received, sometimes no messages are received at all, and sometimes it just takes a very long time for the messages to arrive (timeout messages reported for the node during the process are not uncommon on nodes with multiple capabilities). Sometimes the request for the NIF doesn't even arrive at the node, so it will not send any messages at all. This is why you need to keep a log window open whenever you have OpenZwave request a node's NIF, to see whether the add_value messages have come in. Do be patient, as it may take a couple of minutes before the responses come in if your mesh routing is very bad. However, if they don't come in, or not all of them come in, you may have to retry the NIF request (by doing a Refresh Node information) after 5 or so minutes.
If after a lot of retries you still don't get the in-memory meta information to come in correctly, you may have to (temporarily) move the node closer to the controller or the controller closer to the node. And if it still doesn't work then, this is the point to give up and re-include the node. If the controller still knows about the node, it may assign it the same z-wave node-id. So don't reset the node, just re-include it and start the inclusion process like you would on a newly bought node. Again you must be patient (the NIF request process takes a long time here too) and you may have to repeat it a few times, because on many nodes the procedure to start an inclusion is the same as the procedure to start an exclusion. While the controller is waiting for an inclusion, your node may be performing an exclusion, and you have no way of knowing which one it is doing. So just try starting the inclusion on the node a couple of times, but wait at least 5 minutes after each try and keep a close eye on Domoticz' log file to see if any add_value messages show up. If they do, your mission is accomplished!
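If you'd rather not stare at the log window yourself, a few lines of Python can filter the log for you. This is only a rough sketch: I'm assuming the relevant lines contain the text 'add_value' (in any capitalisation) plus some text that identifies your node. The `node_marker` parameter and the sample lines below are made up, so check what your own log actually looks like first.

```python
from typing import Iterable, List


def add_value_lines(log_lines: Iterable[str], node_marker: str) -> List[str]:
    """Return the log lines that mention both 'add_value' and node_marker."""
    marker = node_marker.lower()
    return [
        line.rstrip("\n")
        for line in log_lines
        if "add_value" in line.lower() and marker in line.lower()
    ]


# Made-up sample lines; a real Domoticz log will look different.
sample = [
    "example: timeout for node 7\n",
    "example: Add_Value node 7 temperature\n",
    "example: Add_Value node 12 switch\n",
]
hits = add_value_lines(sample, "node 7")
```

An empty result after 5 or so minutes is your cue to retry the refresh. Note that this is plain substring matching, so a marker like "node 7" would also match "node 70".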
Godspeed!
Y.M.M.V. (Your Mileage May Vary): it hasn't worked for me in every situation either, but at least it's another tool in our toolbelt to get non-working z-wave nodes back into operation.