Making Domoticz failsafe

Posted: Saturday 23 May 2020 11:37
by thomasbaetge
Hi Everyone!

Due to the Corona situation I have been more or less stranded abroad for over 2 months now, without any physical access to my devices.
I didn't plan that, but that is a different story anyway....

However, a working Domoticz is critical to me: my heating and other essential things rely on it.
During this time I had 2 or 3 failures of my system, so I had to ask a neighbour to go to my house and restart my RPI.
The failures were related to a known bug where the wifi interface stops functioning, which makes the device unreachable.

What I did so far to get on top of this:

Configured watchdog on the Raspi according to this guide:
https://www.domoticz.com/wiki/Setting_u ... i_watchdog
Additionally, a Node-RED flow pings my router every 30 secs and writes a logfile that is monitored by watchdog.
The wifi interface is monitored by watchdog too.
So in case any of this goes belly up, the watchdog should reboot the device (this hasn't happened yet).
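
For readers without Node-RED, the ping-and-logfile idea can be sketched in Python instead. This is only an illustration, not the flow actually used above: the router address, the logfile path and the function names are assumptions. The logfile is appended to only when the ping succeeds, so it stops changing once the router is unreachable, which is the kind of condition the watchdog daemon's file monitoring (the `file` and `change` settings in watchdog.conf) can react to.

```python
import subprocess
import time
from pathlib import Path

ROUTER_IP = "192.168.2.1"                     # assumption: your router's address
HEARTBEAT = Path("/var/log/router-ping.log")  # assumption: file watched by watchdog
INTERVAL_S = 30                               # ping interval mentioned in the post


def router_reachable(ip, ping=subprocess.run):
    """Return True if a single ICMP ping to ip succeeds."""
    result = ping(
        ["ping", "-c", "1", "-W", "2", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def beat(path, ip, ping=subprocess.run):
    """Append a line to the heartbeat file, but only if the router answered."""
    ok = router_reachable(ip, ping)
    if ok:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with path.open("a") as fh:
            fh.write(f"{stamp} router {ip} ok\n")
    return ok


def main():
    """Ping forever; the file simply stops changing while the network is down."""
    while True:
        beat(HEARTBEAT, ROUTER_IP)
        time.sleep(INTERVAL_S)
```

Calling `main()` from a small service or an `@reboot` cron entry would keep it running. The logfile grows without bound in this sketch, so it would need rotating or truncating.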

the PI itself reboots twice a week, controlled by a cronjob
Domoticz service restarts every night

At the moment, the PI gets its power through a powerbank that will keep it alive for a few hours in case of a power outage. I plan to remove this and plug a Shelly Plug into the power supply (when I get back home), so that in the worst case I can hard reboot it from the Shelly's web interface (not the best thing to do, I know).

So....what do you think about my measures? Any comments are welcome, as well as any other/additional strategies... :D

Re: Making Domoticz failsafe

Posted: Saturday 23 May 2020 12:09
by McMelloW
Have a look at this article: https://www.sigmdel.ca/michel/ha/rpi/ra ... og_en.html
This guy has more practical solutions on his site.

Of course there are many roads to Rome :) even in the lockdown time

Re: Making Domoticz failsafe

Posted: Saturday 23 May 2020 12:23
by Egregius
If it’s the wifi connection that fails, why not just use a script to restart that?
Or better, use a cable.
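
A script along those lines could look roughly like the Python sketch below. It is only one possible approach under stated assumptions: the interface name `wlan0` is taken from the syslog later in this thread, while the gateway address, the failure threshold and the function names are made up for illustration. It pings the gateway through the wifi interface and, after a few consecutive failures, takes the interface down and up again with `ip link`. Whether that is enough to recover from the driver bug mentioned in the first post is not something this sketch can promise.

```python
import subprocess
import time

IFACE = "wlan0"          # interface name as seen in the syslog in this thread
GATEWAY = "192.168.2.1"  # assumption: your router's address
MAX_FAILURES = 3         # assumption: consecutive failed pings before acting


def run(cmd):
    """Run a command quietly and return its exit code."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def check_and_recover(failures, run_cmd=run, sleep=time.sleep):
    """One monitoring step; returns the updated consecutive-failure count."""
    if run_cmd(["ping", "-c", "1", "-W", "2", "-I", IFACE, GATEWAY]) == 0:
        return 0  # link is fine, reset the counter
    failures += 1
    if failures >= MAX_FAILURES:
        # Bounce the interface; this needs root privileges.
        run_cmd(["ip", "link", "set", IFACE, "down"])
        sleep(5)
        run_cmd(["ip", "link", "set", IFACE, "up"])
        return 0
    return failures


def main():
    """Check every 30 seconds until stopped."""
    failures = 0
    while True:
        failures = check_and_recover(failures)
        time.sleep(30)
```

Run as root, `main()` would restart the interface after about 90 seconds of failed pings, which is gentler than a full reboot but could still be combined with the watchdog as a last resort.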

Re: Making Domoticz failsafe

Posted: Saturday 23 May 2020 12:28
by elzorrovega
Hello,
I am a newbie to Domoticz (2019) but, like you, I am interested in having a reliable system.

I have the following configuration:

• Domoticz Version 2020.2 running on a Raspberry Pi 3 B+
• Aeon Labs Z‐Stick Gen5 (Z‐Wave USB Adapter00)
• Aeon Labs Siren Gen5 (Z-Wave Siren Gen5)
• FIBARO DOOR/WINDOW SENSOR 2 FGDW-002
• FIBARO MOTION SENSOR FGMS-001-FR-A-V1.01
• FIBARO WALL PLUG FGWP-102
• Zipato MINI KEYPAD RFiD/Z-WAVE
• Domoticz app on iOS


Thanks to Waaren's patient help I managed to solve some issues related to device settings via JSON commands and notifications. Once this is under control, I am thinking about writing a dzVents script to monitor Domoticz's health and notify the user if necessary. I have not yet checked the Forum to see whether other users have already embarked on similar projects.
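
A dzVents script runs inside Domoticz, so it cannot report that Domoticz itself has stopped. As a complement, the health check could also be done from outside. The Python sketch below is a minimal example of that idea, not the dzVents script described above: it asks the Domoticz JSON API for its version and treats anything other than a `"status": "OK"` reply as unhealthy. The base URL, the timeout and the function names are assumptions, and the notification step is left out.

```python
import json
import urllib.request

DOMOTICZ_URL = "http://127.0.0.1:8080"  # assumption: default local Domoticz address


def fetch(url, timeout=5):
    """Fetch a URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def domoticz_healthy(base_url=DOMOTICZ_URL, fetch=fetch):
    """Return True if Domoticz answers the getversion request with status OK."""
    url = f"{base_url}/json.htm?type=command&param=getversion"
    try:
        reply = json.loads(fetch(url))
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP error or a non-JSON answer.
        return False
    return isinstance(reply, dict) and reply.get("status") == "OK"
```

A cron job on the same or a second machine could call `domoticz_healthy()` and send a mail or restart the service when it returns False.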

I just glanced at some lines in the Domoticz literature referring to a second "controller". I work in Industrial Automation, and one of the products we deliver is redundant processors.

Does the idea of using a second RPI sound plausible? I would expect the hardware costs to be minimal, since you would only need to invest in a second RPI and a USB Z-Wave controller. The trick would be to have one as the Main and the other as the Reserve.

ElZorroVega :idea:

Re: Making Domoticz failsafe

Posted: Saturday 23 May 2020 12:40
by thomasbaetge
Well....at the moment I am about 1500km away from plugging in a network cable :D
Besides...I really love using wifi because it gives me the option to place the PI almost anywhere in my house, as long as there's power (that's easier to come by than LAN).

So far the watchdog in my setup seems to work. I restarted my router due to an update and subsequently the PI rebooted, because of the network failure (syslog):

May 23 12:05:53 Controlberry watchdog[873]: device wlan0 did not receive anything since last check
May 23 12:06:01 Controlberry Node-RED[9771]: 23 May 12:06:01 - [info] [mqtt-broker:Controlberry] Verbindung zum Broker mqtt://192.168.2.40:1883 konnte nicht hergestellt werden.
May 23 12:06:07 Controlberry watchdog[873]: device wlan0 did not receive anything since last check
May 23 12:06:16 Controlberry Node-RED[9771]: 23 May 12:06:16 - [info] [mqtt-broker:Controlberry] Verbindung zum Broker mqtt://192.168.2.40:1883 konnte nicht hergestellt werden.
May 23 12:06:21 Controlberry watchdog[873]: device wlan0 did not receive anything since last check
May 23 12:06:31 Controlberry Node-RED[9771]: 23 May 12:06:31 - [info] [mqtt-broker:Controlberry] Verbindung zum Broker mqtt://192.168.2.40:1883 konnte nicht hergestellt werden.
May 23 12:06:35 Controlberry watchdog[873]: device wlan0 did not receive anything since last check
May 23 12:06:35 Controlberry watchdog[873]: Retry timed-out at 70 seconds for wlan0
May 23 12:06:35 Controlberry watchdog[873]: shutting down the system because of error 101 = 'Network is unreachable'
May 23 12:06:45 Controlberry systemd[1]: Received SIGTERM from PID 873 (watchdog).

That is more or less what I wanted to happen. I just hope the same will work if the PI's own interface stops working (but it should).
This time the interface check caused the reboot, because the timeout for the Node-RED ping logfile is set to 1000 secs, much longer than the roughly 70 seconds the interface check needed.