
Internal Server Error monitoring in Nagios

Here, the requirement is to monitor a site and, if it returns an Internal Server Error, alert us through the Nagios interface. The check tests whether the domain returns an HTTP 500 error and, if it does, restarts the web server. This approach suits cases where the Internal Server Error is caused by FastCGI problems that a web server restart clears.

The check runs through NRPE (Nagios Remote Plugin Executor) on the server being monitored. On that server, install NRPE and the Nagios plugins:

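yum install nagios-nrpe
yum install nagios-plugins-nrpe
rpm -q --list nagios-plugins-1.4.13-1.el5.rf
chkconfig nrpe on

The rpm query lists the files installed by the plugins package, including the location of check_http used in the script below.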

vi /etc/nagios/nrpe.cfg
Add the IP address of the Nagios server to allowed_hosts, so that it is permitted to query this NRPE daemon (xxx.xxx.xxx.xxx below is the Nagios server's IP):

allowed_hosts=127.0.0.1,xxx.xxx.xxx.xxx
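To make the same allowed_hosts change non-interactively (for example on several servers), here is a minimal shell sketch. The function name add_allowed_host is hypothetical, it assumes GNU sed (for sed -i), and NRPE has to be restarted afterwards for the change to take effect.

```bash
#!/bin/bash
# Hypothetical helper: make sure an IP is listed in allowed_hosts of an nrpe.cfg.
# Usage: add_allowed_host /etc/nagios/nrpe.cfg xxx.xxx.xxx.xxx
add_allowed_host()
{
    local cfg="$1" ip="$2"
    if ! grep -q '^allowed_hosts=' "$cfg"; then
        # No allowed_hosts line yet: add one with localhost and the new IP
        echo "allowed_hosts=127.0.0.1,${ip}" >> "$cfg"
    elif ! grep '^allowed_hosts=' "$cfg" | cut -d= -f2 | tr ',' '\n' | grep -qxF "$ip"; then
        # Line exists but the IP is not in it: append ",IP" (GNU sed -i edits in place)
        sed -i "s/^allowed_hosts=.*/&,${ip}/" "$cfg"
    fi
    # If the IP is already listed, the file is left unchanged
}
```

Usage: `add_allowed_host /etc/nagios/nrpe.cfg xxx.xxx.xxx.xxx && service nrpe restart`. Running it twice is harmless, because an IP that is already listed is not appended again.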

Add the command that the Nagios server will call:
command[check_divya]=/opt/nagios_checker.sh
Then restart NRPE so that it rereads nrpe.cfg: service nrpe restart

vi /opt/nagios_checker.sh (the script checks the HTTP status and, if it is 500, restarts lighttpd)

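#!/bin/bash
# Query the site with check_http; grep exits 0 if a 500 was reported, 1 if not.
check()
{
    /usr/lib64/nagios/plugins/check_http -H domain.com 2>&1 | grep -qi "500 *Internal"
    test=$?
}

check
if [ $test -eq 1 ]
then
    echo "OK"
    exit 0
elif [ $test -eq 0 ]
then
    # 500 detected: restart lighttpd (needs the sudoers rule below), then re-check.
    # Restart output is discarded so it does not replace the status text in Nagios.
    sudo /etc/init.d/lighttpd restart > /dev/null 2>&1
    sleep 6
    check
    if [ $test -eq 1 ]
    then
        echo "OK"
        exit 0
    else
        echo "CRITICAL"
        exit 2
    fi
fi

# grep itself failed (exit status 2), so the state cannot be determined
echo "UNKNOWN"
exit 3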
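NRPE passes the script's output and exit status back to Nagios, which interprets the status using the plugin convention 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. To try the 500-detection logic without a live site, the grep can be factored into a function that reads check_http output on standard input. This is a sketch only: the function name classify_http_output is hypothetical, and it assumes, as the script above does, that check_http's output contains the server's "500 Internal" status text.

```bash
#!/bin/bash
# Hypothetical helper: classify one run of check_http output (read from stdin).
# Prints a Nagios status word and returns the matching plugin exit code.
classify_http_output()
{
    grep -qi "500 *Internal"
    case $? in
        0) echo "CRITICAL"; return 2 ;;  # output mentions a 500 Internal Server Error
        1) echo "OK";       return 0 ;;  # no 500 found in the output
        *) echo "UNKNOWN";  return 3 ;;  # grep itself failed
    esac
}
```

Usage: `/usr/lib64/nagios/plugins/check_http -H domain.com | classify_http_output`. Unlike nagios_checker.sh, this function only reports the state and never restarts lighttpd, so it is safe to run by hand.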

chown nagios:nagios /opt/nagios_checker.sh
chmod 700 /opt/nagios_checker.sh
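visudo
Comment out 'Defaults requiretty' (NRPE runs the script without a terminal, so sudo would otherwise refuse to run), and allow the nagios user to run the lighttpd init script without a password, replacing hostname with the server's hostname:
nagios hostname=NOPASSWD: /etc/init.d/lighttpd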
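If you prefer not to disable requiretty for every user, sudoers also accepts per-user defaults and rules limited to exact arguments. The fragment below, entered through visudo, is an alternative sketch: it lifts requiretty only for the nagios user and permits only the restart action the script uses. ALL in place of hostname matches any host; substitute the server's hostname if you want the rule tied to it.

```
Defaults:nagios !requiretty
nagios ALL=(root) NOPASSWD: /etc/init.d/lighttpd restart
```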

Open TCP port 5666 (the NRPE port) in CSF.
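Opening 5666 through TCP_IN in /etc/csf/csf.conf allows the port from every address. An alternative sketch is to allow it only from the Nagios server with an advanced filter line in /etc/csf/csf.allow, where xxx.xxx.xxx.xxx stands for the Nagios server's IP, and then reload CSF with csf -r:

```
tcp|in|d=5666|s=xxx.xxx.xxx.xxx
```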

On the Nagios server, first check that the command works (run from the Nagios plugins directory, with hostIP being the monitored server's IP):
./check_nrpe -H hostIP -c check_divya

If it prints OK, define the service in services.cfg:

define service{
        use                     generic-service
        host_name               domain.com
        service_description     check500error
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   3
        retry_check_interval    1
        contact_groups          cp-techs
        notification_interval   120
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_nrpe!check_divya
        }
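The check_command line relies on a check_nrpe command object on the Nagios server, where $ARG1$ carries the name after the '!' (here check_divya). Many installations already define it; if yours does not, a typical definition for commands.cfg looks like this, assuming $USER1$ points at the plugin directory that contains check_nrpe:

```
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
```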

Now check in the Nagios interface that the service is working. If the alert persists for some time and the automatic restart of lighttpd doesn't fix it, log in to the remote server, stop NRPE (so the check stops restarting lighttpd while you work), correct the issue, and then start NRPE again.
