How to monitor if Domoticz is still functioning correctly?

Posted: Saturday 10 April 2021 19:49
by Hesmink
Hello All,

Probably answered somewhere, but searching the forum for 'monitor' and 'Domoticz' doesn't really help.

Today I noticed Domoticz was down (it crashed because my HS110 smart plug did not function correctly).
I do have a slave Domoticz instance that monitors my energy usage, so I'm thinking of also using that second instance to monitor the primary one (and vice versa).

Has anything already been built for that?
Is there an 'alive' query I can send to a Domoticz server that indicates that it is still functioning correctly?

Re: How to monitor if Domoticz is still functioning correctly?

Posted: Saturday 10 April 2021 20:55
by Maxx
Did something like this some time ago. There are probably better ways to do this, but this works for me.

Adapt it to your system by using your own IP addresses and creating the required devices.

Put this script in your secondary system:

Code:

return {
	on = {
		devices = {},
		timer = { 'every minute' },
	},
	data = {
		heartbeatCounter = { initial = 0 },	-- last heartbeat value sent to main
		State            = { initial = 0 },	-- 0 = send, 1 = wait for echo, 2 = alarm
		timeOut          = { initial = 0 },	-- number of checks spent waiting for the echo
	},

	execute = function(dz, triggeredItem)
		-- counter device on this system that the main system writes the heartbeat back to
		local HeartcheckOut = dz.devices('HeartcheckMirror')

		if dz.data.State == 0 then
			-- send an incremented heartbeat to the counter device (idx 5247) on main
			dz.log('Increase heartbeat and send to main')
			dz.data.heartbeatCounter = dz.data.heartbeatCounter + 1
			-- named 'url' rather than 'string' to avoid shadowing Lua's string library
			local url = 'http://192.168.1.231:8080/json.htm?type=command&param=udevice&idx=5247&nvalue=0&svalue=' .. dz.data.heartbeatCounter
			dz.log(url)
			dz.openURL({
				url = url,
				method = 'GET',
				callback = 'dataRetrieved',
			})
			dz.data.timeOut = 0
			dz.data.State = 1
		elseif dz.data.State == 1 then
			-- once per minute, check whether main has echoed the same value back
			dz.log('Wait for heartbeat from main')
			dz.data.timeOut = dz.data.timeOut + 1
			dz.log(HeartcheckOut.counter)
			dz.log(dz.data.heartbeatCounter)
			dz.log('Timeout timer : ' .. dz.data.timeOut)
			if HeartcheckOut.counter == dz.data.heartbeatCounter then
				dz.data.State = 0
			elseif dz.data.timeOut > 10 then
				dz.data.State = 2
			end
		elseif dz.data.State == 2 then
			-- no echo after more than 10 checks: alert, then start a new cycle
			dz.log('No response from main')
			local text = 'No response domoticz main'
			dz.notify('Heartbeat ', text, dz.PRIORITY_HIGH, dz.NSS_TELEGRAM)
			dz.data.timeOut = 0
			dz.data.State = 0
		end
	end
}

And this in your main system:

Code:

return {
	on = {
		devices = {},
		timer = { 'every minute' },
	},

--	logging = { level  = domoticz.LOG_DEBUG,
--	            marker = "Heart" },

	execute = function(dz, triggeredItem)
		-- counter device that the secondary system writes its heartbeat to
		local HeartbeatIn = dz.devices('Heart')
		dz.log(HeartbeatIn.counter)
		-- echo the received value back to the secondary (device idx 33 there)
		dz.openURL({
			url = 'http://192.168.1.232:8080/json.htm?type=command&param=udevice&idx=33&nvalue=0&svalue=' .. HeartbeatIn.counter,
			method = 'GET',
			callback = 'dataRetrieved',
		})
	end
}

Re: How to monitor if Domoticz is still functioning correctly?

Posted: Saturday 10 April 2021 22:26
by waltervl
For standard monitoring of the primary system (and of the secondary too), you can follow the instructions in the wiki, which use the monit application:
https://www.domoticz.com/wiki/Monitoring_domoticz

It uses this JSON command URL:

Code:

http://127.0.0.1:8080/json.htm?type=command&param=getversion
The response should contain "status" : "OK".
You can do the same with a dzVents script if you just want to check the primary from the slave.
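A minimal shell sketch of that check, assuming Domoticz listens on 127.0.0.1:8080 and does not require a password for this request (`DOMOTICZ_URL` is a name introduced here; point it at the primary when running on the slave). It only looks for "status" : "OK" in the `getversion` response, so it does not need `jq`:

```shell
#!/bin/bash
# Sketch: check that Domoticz answers the getversion JSON command with status OK.
# DOMOTICZ_URL is an assumption; adjust host and port for your system.
DOMOTICZ_URL="${DOMOTICZ_URL:-http://127.0.0.1:8080}"

# Return 0 if the JSON read from stdin contains "status" : "OK".
domoticz_status_ok() {
	grep -Eq '"status"[[:space:]]*:[[:space:]]*"OK"'
}

# Query getversion (2 s to connect, 5 s in total) and check the status field.
check_domoticz() {
	curl -s --connect-timeout 2 --max-time 5 \
		"${DOMOTICZ_URL}/json.htm?type=command&param=getversion" | domoticz_status_ok
}
```

For example, `check_domoticz || echo "Domoticz is down"` from a cron script. A connection failure gives an empty response, which is treated the same as a non-OK status.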

Re: How to monitor if Domoticz is still functioning correctly?

Posted: Sunday 11 April 2021 7:03
by Antori91
Hello,
I use my own Node.js code to monitor both the main and backup Domoticz servers (and also MQTT). If the main or backup Domoticz is detected as down, it tries to restart Domoticz. This code also synchronizes chosen devices between the main and backup Domoticz databases (using MQTT): https://github.com/Antori91/Home_Automa ... Cluster.js

Re: How to monitor if Domoticz is still functioning correctly?

Posted: Sunday 11 April 2021 14:20
by Hesmink
Thanks all, I opted for using a modified version of the simple bash shell script from the Wiki.
I modified it to only send a notification once, and use Pushover instead of Telegram.
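The modified script itself is not shown, so this is only a sketch of one way to notify once: a flag file remembers that an alert was sent and is removed as soon as Domoticz answers again. `handle_check_result`, `send_pushover` and the flag path are hypothetical names; the Pushover call uses the messages API with `token`, `user` and `message` form fields, and the keys are placeholders:

```shell
#!/bin/bash
# Sketch: send an alert only once per outage, using a flag file as memory.
# PUSHOVER_TOKEN / PUSHOVER_USER are placeholders for your own keys.
PUSHOVER_TOKEN="${PUSHOVER_TOKEN:-XXXX}"
PUSHOVER_USER="${PUSHOVER_USER:-XXXX}"

# Send a message through the Pushover messages API.
send_pushover() {
	curl -s -o /dev/null \
		--form-string "token=${PUSHOVER_TOKEN}" \
		--form-string "user=${PUSHOVER_USER}" \
		--form-string "message=$1" \
		https://api.pushover.net/1/messages.json
}

# Usage: handle_check_result <exit_status_of_check> <flag_file>
# 0 means Domoticz answered; any other value means it did not.
handle_check_result() {
	local result=$1 flag=$2
	if [ "$result" -eq 0 ]; then
		# Domoticz is up: clear the flag so the next outage is reported again.
		rm -f "$flag"
	elif [ ! -e "$flag" ]; then
		# First failed check since the last success: alert and remember it.
		send_pushover "Domoticz is not responding"
		touch "$flag"
	fi
	# Further failed checks while the flag exists stay silent.
}
```

For example, after any check command: `curl -sf --max-time 5 "$URL" >/dev/null; handle_check_result $? /tmp/domoticz-alert-sent`.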

Re: How to monitor if Domoticz is still functioning correctly?

Posted: Sunday 11 April 2021 16:45
by erem
Here is my script to check if Domoticz is active, and restart it if not.

Code:

#!/bin/bash
#****************************************************************************
# program    : check-domoticz-active.sh
# programmer : RM
# date       : 2020-02-21
#
# install in cron with crontab -e to run every 5 minutes
#
#  m h dom mon dow   command
#  */5 * * * *       /home/pi/domoticz/scripts/check-domoticz-active.sh >/dev/null 2>&1
#
#
# prerequisites
# jq installed (apt-get install jq)
# make sure the cron entry (example above) has the full path to the script!!
#
# revision
# 0.0.1      2020-02-21   initial release
#
#
#****************************************************************************

i=0
STATUS=""

# try up to 3 times, 5 seconds apart
while [ "$i" -lt 3 ]
do
	# check if domoticz responds to a json query
	DOMOTICZ=$(curl -s --connect-timeout 2 --max-time 5 "http://127.0.0.1:8080/json.htm?type=devices&rid=1")
	# an empty or non-JSON answer leaves STATUS empty
	STATUS=$(echo "$DOMOTICZ" | jq -r '.status' 2>/dev/null)
	if [ "$STATUS" = "OK" ] ; then
		echo "Domoticz responded"
		break			# all ok
	else
		i=$(( i + 1 ))
		echo "Domoticz did not respond on try $i"
	fi
	sleep 5
done

# if we do not have an OK, domoticz did not respond, stop and start it
if [ "$STATUS" != "OK" ] ; then
	echo "Stopping domoticz"
	sudo systemctl stop domoticz.service
	sleep 10
	echo "Starting domoticz"
	NOW=$(date +"%Y-%m-%d %H:%M:%S")
	sudo systemctl start domoticz.service
	Exitcode=$? 	# save systemctl exit code
	if [ "$Exitcode" -ne 0 ] ; then
		echo "$NOW - sudo systemctl start domoticz.service failed with exit code $Exitcode"
	else
		echo "$NOW - sudo systemctl start domoticz.service completed with exit code $Exitcode"
	fi
fi

# end of source

Re: How to monitor if Domoticz is still functioning correctly?

Posted: Monday 12 April 2021 12:20
by twoenter
I use monit. I wrote a complete setup tutorial about it for a Synology NAS:
https://www.twoenter.nl/blog/domoticz/d ... met-monit/
The blog is in Dutch, but you can translate it with Google if you need to ;-)

Re: How to monitor if Domoticz is still functioning correctly?

Posted: Monday 12 April 2021 13:36
by jannl
This is my script. I also check whether anyone (me or someone else) is logged on; in that case I do not want Domoticz to be restarted, because it is most likely down intentionally.

Code:

#!/bin/bash
# check domoticz
# do nothing if someone is logged on: Domoticz is then most likely down intentionally
WHOCOUNT=$( who | wc -l )
if [[ "${WHOCOUNT}" != 0 ]]
then
    echo "Someone is logged on. No further actions" >&2
    exit
fi

# -m 20 limits the request to 20 seconds; any "status" value means Domoticz answered
status=$(curl -m 20 -s -H "Accept: application/json" "http://192.168.1.2:8080/json.htm?type=devices&rid=1" | grep '"status"' | awk -F: '{print $2}' | tr -d ' ,"')
if [ -n "$status" ]
then
        echo "Domoticz is already running"
else
        sudo /etc/init.d/domoticz.sh stop
        sleep 30
        sudo /etc/init.d/domoticz.sh start
        # notify via Pushover (replace XXXX with your own token and user key)
        curl -s -o /dev/null --form-string "token=XXXX" --form-string "user=XXXX" --form-string "message=Domoticz has been restarted" https://api.pushover.net/1/messages.json
fi


Re: How to monitor if Domoticz is still functioning correctly?

Posted: Monday 12 April 2021 17:45
by lost
erem wrote (Sunday 11 April 2021 16:45): "Here is my script to check if Domoticz is active, and restart it if not."
Hello,

Be aware that the Domoticz service may still be up, with the event system still running (schedules and scripts still executed...), while the web server side is down, so no user interaction is possible.

In fact, this is what happened to me most often in the past!

So checking that the service is up may not always be enough; instead, you can send a JSON query as others suggested, or use httping (after installing it) in a simple cron-triggered script.

On top of that, when the HTTP server side is down, stopping/restarting the service may not work: at restart, in some situations the HTTP port could not be bound because it was still 'in use'. I then had to restart the whole system...
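One way to reduce that risk is to wait until the old process has actually released the port before starting the service again, instead of relying on a fixed sleep. A minimal sketch, assuming the web server uses port 8080; it relies on bash's `/dev/tcp` redirection, which only detects a socket that still accepts connections, so a port held in other states (e.g. TIME_WAIT) is not detected. `port_in_use` and `wait_port_free` are hypothetical helper names:

```shell
#!/bin/bash
# Sketch: wait until nothing accepts connections on a local TCP port any more.
# Uses bash's /dev/tcp redirection (a listening socket only, not TIME_WAIT).

# Return 0 if something accepts connections on 127.0.0.1:$1.
port_in_use() {
	(exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Usage: wait_port_free <port> <timeout_seconds>
# Returns 0 once the port is free, 1 if it is still in use after the timeout.
wait_port_free() {
	local port=$1 timeout=$2 waited=0
	while port_in_use "$port"; do
		if [ "$waited" -ge "$timeout" ]; then
			return 1
		fi
		sleep 1
		waited=$((waited + 1))
	done
	return 0
}
```

For example: `sudo systemctl stop domoticz.service; wait_port_free 8080 60 || echo "port 8080 still in use, a reboot may be needed"`, then start the service.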

In the end, since Domoticz being down, whatever the reason, always means the HTTP server is down, I just check this. If it is still down on a retry one minute later, I first restart the service and wait one minute for the HTTP server to come up; if it is still down after that, I do a full shutdown/reboot, after saving the last Domoticz log lines for post-mortem debugging if needed.

The /root/checkDomoticz.sh file:

Code:

#!/bin/bash
# Check that domoticz is up (run from a crontab) and restart the whole Pi if needed...

domoticzUrl=localhost:8080
BN=`basename $0`

WHOCOUNT=$(who | wc -l)
if [ ${WHOCOUNT} -ne 0 ]
then
  logger $BN: Someone is logged on, no check.
  exit 0
fi

httping -c 5 -i 0 -t 1 --ts -v -Wsqg $domoticzUrl
STATUS=$?
if [ ${STATUS} -ne 0 ]
then
  logger $BN: Domoticz httping-ed KO, retry after 1mn... 
  sleep 1m
  # Retry once
  httping -c 5 -i 0 -t 1 --ts -v -Wsqg $domoticzUrl
  STATUS=$?
  if [ ${STATUS} -ne 0 ]
  then
    logger $BN: Still KO. Get last logs and try service restart then wait...
    tail -n 20 /tmp/domoticz.txt | logger
    service domoticz restart
    sleep 1m
    logger $BN: Check after service restart...
    httping -c 5 -i 0 -t 1 --ts -v -Wsqg $domoticzUrl
    STATUS=$?
    if [ ${STATUS} -ne 0 ]
    then
      logger $BN: Still KO. Get last logs and REBOOT...
      tail -n 20 /tmp/domoticz.txt | logger
      /sbin/shutdown --no-wall -r now
      STATUS=$?
    else
      logger $BN: Service restart OK !
    fi
  else
    logger $BN: Retry OK !
  fi
else
  logger $BN: Domoticz ALIVE !
fi

wait
logger $BN: DONE, status= ${STATUS} !!!
exit ${STATUS}
This is called from a root cron job every 30 minutes; here is the crontab line:

Code:

0,30  *  *   *   *     /root/checkDomoticz.sh > /dev/null 2>&1
So far this has never failed, although I have fewer issues than in the past (still using v4.10717 with a few web interface/JS fixes, by far the most stable version I have had so far).

Just don't forget to disable the cron job (or rename the script) when intentionally stopping the Domoticz service: I ruined a Raspbian version upgrade that way, in the middle of the process, ending up with a non-bootable system and a full reinstall!

EDIT: I should add an apt lock check to this script, as I may forget about this when Debian 11 comes out...
Added a logged-on user check as suggested above by jannl; that's much better!
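A sketch of such a guard, meant to be called at the top of a check script before any restart or reboot. It combines a manual flag file (an alternative to renaming the script when stopping Domoticz on purpose) with a check for running `apt`, `apt-get` or `dpkg` processes, which only approximates a real apt lock check. `MAINT_FLAG`, its path and `maintenance_in_progress` are hypothetical; the flag lives under `/run`, which is cleared at boot, so a forgotten flag does not disable the watchdog forever:

```shell
#!/bin/bash
# Sketch: tell the watchdog to skip its check while maintenance is going on.
# MAINT_FLAG is a hypothetical path: touch it before stopping Domoticz on
# purpose and remove it afterwards. /run is cleared at boot.
MAINT_FLAG="${MAINT_FLAG:-/run/domoticz-maintenance}"

# Return 0 (skip the check) when maintenance is likely in progress.
maintenance_in_progress() {
	# 1. Manual maintenance flag set by the administrator.
	[ -e "$MAINT_FLAG" ] && return 0
	# 2. Package management running (approximation of an apt lock check).
	pgrep -x 'apt|apt-get|dpkg' >/dev/null 2>&1 && return 0
	return 1
}
```

For example, near the top of checkDomoticz.sh: `if maintenance_in_progress; then logger "$BN: maintenance in progress, no check."; exit 0; fi`.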