
Monitoring Adaptec RAID using Nagios

Adaptec RAID can be monitored through the NRPE plugin, which needs to be installed on the Nagios server as well as on the server being monitored.

Steps to be performed on the server being monitored:
————————————————-

First check the architecture of the server and make sure you download the appropriate source files.

root@host [~]# arch
x86_64

Here it is a 64-bit system, so the x64 build of arcconf is the one to download.
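
If you would rather script the architecture check, something like the following could work. It is only a sketch: the helper name is_64bit and the list of machine strings are my own assumptions (a few common 64-bit values, not an exhaustive list).

```python
import platform

# Machine strings commonly reported by 64-bit Linux kernels.
# Illustrative only; extend the set for other platforms.
SIXTY_FOUR_BIT = {"x86_64", "amd64", "aarch64", "ppc64", "ppc64le", "s390x"}

def is_64bit(machine: str) -> bool:
    """Return True if the machine string (as printed by `arch`) looks 64-bit."""
    return machine.lower() in SIXTY_FOUR_BIT

machine = platform.machine()  # on Linux this is the value `arch` prints
label = "64-bit" if is_64bit(machine) else "not recognised as 64-bit"
print(machine, "->", label)
```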

arcconf is the command line utility for Adaptec based RAID.

root@host[~]# mkdir /usr/StorMan/
root@host[~]# cd /usr/StorMan/


root@host[/usr/StorMan]# wget http://www.obvious.co.nz/aacraid/arcconf/arcconf-x64-6.10-b18350.tgz
root@host[/usr/StorMan]# tar -xvzf arcconf-x64-6.10-b18350.tgz
root@host[/usr/StorMan]# cd linux-x64/cmdline
root@host [/usr/StorMan/linux-x64/cmdline]# cp arcconf /usr/StorMan/
root@host [~]# /usr/StorMan/arcconf GETCONFIG 1
/usr/StorMan/arcconf: error while loading shared libraries: libstdc++.so.5: cannot open shared object file: No such file or directory

If you get the above error, install the compatibility C++ library:

yum install compat-libstdc++-33
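
When a binary fails like this, the name of the missing library is in the error text itself. A small sketch of pulling it out, assuming the loader message has the same shape as the one above (the function name missing_library is hypothetical):

```python
import re
from typing import Optional

def missing_library(stderr_text: str) -> Optional[str]:
    """Return the shared library named in a loader error, or None if absent."""
    match = re.search(r"error while loading shared libraries: (\S+?):", stderr_text)
    return match.group(1) if match else None

error = ("/usr/StorMan/arcconf: error while loading shared libraries: "
         "libstdc++.so.5: cannot open shared object file: No such file or directory")
print(missing_library(error))  # libstdc++.so.5
```

The library name can then be looked up in your package manager to find the package that provides it.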

Now run “/usr/StorMan/arcconf GETCONFIG 1” again; you should get output such as:

=======================
root@host [~]# /usr/StorMan/arcconf GETCONFIG 1
Controllers found: 1
———————————————————————-
Controller information
———————————————————————-
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Adaptec 2405
Controller Serial Number : 8C4110B2854
Physical Slot : 5
Temperature : 41 C/ 105 F (Normal)
Installed memory : 128 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Global task priority : High
Performance Mode : Default/Dynamic
Stayawake period : Disabled
Spinup limit internal drives : 0
Spinup limit external drives : 0
Defunct disk drive count : 0
Logical devices/Failed/Degraded : 1/0/0
——————————————————–
Controller Version Information
——————————————————–
BIOS : 5.2-0 (16116)
Firmware : 5.2-0 (16116)
Driver : 1.1-5 (2453)
Boot Flash : 5.2-0 (16116)
——————————————————–
Controller Battery Information
——————————————————–
Status : Not Installed

———————————————————————-
Logical device information
———————————————————————-
Logical device number 0
Logical device name : Data
RAID level : 10
Status of logical device : Optimal
Size : 279788 MB
Stripe-unit size : 256 KB
Read-cache mode : Enabled
Write-cache mode : Disabled (write-through)
Write-cache setting : Disabled (write-through)
Partitioned : Yes
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
Power settings : Disabled
——————————————————–
Logical device segment information
——————————————————–
Group 0, Segment 0 : Present (0,0) 3LN17RCY000098113NP7
Group 0, Segment 1 : Present (0,1) 3LN2DBGX00009811TUZ5
Group 1, Segment 0 : Present (0,2) 3LN0N1FV00009812YLAZ
Group 1, Segment 1 : Present (0,3) 3LN18AN100009811SN63

———————————————————————-
Physical Device information
———————————————————————-
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device(T:L) : 0,0(0:0)
Reported Location : Enclosure 0, Slot 0
Reported ESD(T:L) : 2,0(0:0)
Vendor : SEAGATE
Model : ST3146855SS
Firmware : 0002
Serial number : 3LN17RCY000098113NP7
World-wide name : 5000C50006996154
Size : 140014 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device(T:L) : 0,1(1:0)
Reported Location : Enclosure 0, Slot 1
Reported ESD(T:L) : 2,0(0:0)
Vendor : SEAGATE
Model : ST3146855SS
Firmware : 0002
Serial number : 3LN2DBGX00009811TUZ5
World-wide name : 5000C50006999700
Size : 140014 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
Device #2
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device(T:L) : 0,2(2:0)
Reported Location : Enclosure 0, Slot 2
Reported ESD(T:L) : 2,0(0:0)
Vendor : SEAGATE
Model : ST3146855SS
Firmware : 0002
Serial number : 3LN0N1FV00009812YLAZ
World-wide name : 5000C5000698060C
Size : 140014 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
Device #3
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SAS 3.0 Gb/s
Reported Channel,Device(T:L) : 0,3(3:0)
Reported Location : Enclosure 0, Slot 3
Reported ESD(T:L) : 2,0(0:0)
Vendor : SEAGATE
Model : ST3146855SS
Firmware : 0002
Serial number : 3LN18AN100009811SN63
World-wide name : 5000C50006995B28
Size : 140014 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
Device #4
Device is an Enclosure services device
Reported Channel,Device(T:L) : 2,0(0:0)
Enclosure ID : 0
Type : SES2
Vendor : ADAPTEC
Model : Virtual SGPIO
Firmware : 0001
Status of Enclosure services device
Temperature : Normal
=========================
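
To give an idea of what a check script does with this output, here is a minimal sketch in Python. It is not the script from Nagios Exchange used later in this post; the function name check_raid and the rule that anything other than “Optimal” is reported as CRITICAL are my own assumptions.

```python
import re

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes

def check_raid(getconfig_output: str):
    """Return (exit_code, message) built from `arcconf GETCONFIG 1` text."""
    controller = re.search(r"Controller Status\s*:\s*(.+)", getconfig_output)
    # One (number, status) pair per logical device section.
    logical = re.findall(
        r"Logical device number (\d+).*?Status of logical device\s*:\s*([^\n]+)",
        getconfig_output, re.S)
    if controller is None or not logical:
        return UNKNOWN, "No output from arcconf"
    controller_state = controller.group(1).strip()
    parts = [f"Logical Device {num} {status.strip()}" for num, status in logical]
    parts.append(f"Controller {controller_state}")
    states = [status.strip() for _, status in logical] + [controller_state]
    # Assumption: any state other than "Optimal" is treated as CRITICAL.
    code = OK if all(state == "Optimal" for state in states) else CRITICAL
    return code, ",".join(parts)

# Shortened sample of the output shown above.
sample = """Controller Status : Optimal
Logical device number 0
Logical device name : Data
RAID level : 10
Status of logical device : Optimal
"""
code, message = check_raid(sample)
print(code, message)  # 0 Logical Device 0 Optimal,Controller Optimal
```

A real plugin would run arcconf itself, print the message, and exit with the code so that NRPE can pass both back to Nagios.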

Next, NRPE needs the Nagios plugins to be installed. Build them from source:

cd /usr/local/src
wget -O nagios-plugins-1.4.13.tar.gz "http://downloads.sourceforge.net/nagiosplug/nagios-plugins-1.4.13.tar.gz?modtime=1222335829&big_mirror=0"
tar -xvzf nagios-plugins-1.4.13.tar.gz
chown -R root:root nagios-plugins-1.4.13
cd nagios-plugins-1.4.13
mkdir /usr/local/nagios
./configure --prefix=/usr/local/nagios
make
make install

Now install the NRPE plugin:
cd /usr/local/src
wget http://internap.dl.sourceforge.net/sourceforge/nagios/nrpe-2.7.tar.gz
tar zxvf nrpe-2.7.tar.gz
chown -R root:root nrpe-2.7
cd nrpe-2.7
./configure
make all

This will configure NRPE with:

NRPE port: 5666
NRPE user: nagios
NRPE group: nagios

cp src/nrpe /usr/local/nagios/libexec/
cp src/check_nrpe /usr/local/nagios/libexec/
cp sample-config/nrpe.cfg /usr/local/nagios/libexec/

vi /usr/local/nagios/libexec/nrpe.cfg

Allow localhost and the Nagios server to connect to NRPE:

Add “allowed_hosts=127.0.0.1,enterhostip” below “# ALLOWED HOST ADDRESSES”, replacing enterhostip with the IP address of the Nagios server.
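
To double-check which addresses ended up in the file, a sketch like the one below can read them back. The function name is made up, 192.0.2.10 is just a documentation address standing in for your Nagios server, and the sketch simply keeps the last uncommented allowed_hosts line it sees.

```python
def allowed_hosts(nrpe_cfg_text: str) -> list:
    """Return the addresses from the last uncommented allowed_hosts line."""
    hosts = []
    for line in nrpe_cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines that are not key=value
        key, value = line.split("=", 1)
        if key.strip() == "allowed_hosts":
            hosts = [h.strip() for h in value.split(",") if h.strip()]
    return hosts

cfg = """# ALLOWED HOST ADDRESSES
allowed_hosts=127.0.0.1,192.0.2.10
"""
print(allowed_hosts(cfg))  # ['127.0.0.1', '192.0.2.10']
```

On the server you would pass in the contents of /usr/local/nagios/libexec/nrpe.cfg instead of the inline sample.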

Now, download the Python script which parses the output of arcconf from Nagios Exchange:

http://www.nagiosexchange.org/cgi-bin/page.cgi?g=Detailed/2817.html;d=1

Save it as /usr/local/sbin/check-aacraid.py, either by unpacking the download into /usr/local/sbin/ or by pasting the script from the site into a new file:

cd /usr/local/sbin/
vi check-aacraid.py

chmod +x /usr/local/sbin/check-aacraid.py

vi /usr/local/nagios/libexec/nrpe.cfg
Add the line “command[check_aacraid]=sudo /usr/local/sbin/check-aacraid.py” and comment out the other example commands.
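
A quick way to confirm that only the RAID check is still active is to list the uncommented command definitions. This is a sketch under my own assumptions: the function name is hypothetical and the commented check_users line is only an illustration of a disabled example.

```python
import re

# Matches lines of the form: command[name]=command line
COMMAND_RE = re.compile(r"^command\[([^\]]+)\]\s*=\s*(.+)$")

def nrpe_commands(nrpe_cfg_text: str) -> dict:
    """Map each active (uncommented) NRPE command name to its command line."""
    commands = {}
    for line in nrpe_cfg_text.splitlines():
        match = COMMAND_RE.match(line.strip())
        if match:
            commands[match.group(1)] = match.group(2).strip()
    return commands

cfg = """#command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_aacraid]=sudo /usr/local/sbin/check-aacraid.py
"""
print(nrpe_commands(cfg))
# {'check_aacraid': 'sudo /usr/local/sbin/check-aacraid.py'}
```

The name inside the square brackets is what check_nrpe is given with -c later on.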

adduser nagios
chown -R nagios:nagios /usr/local/nagios/

Add “nrpe 5666/tcp # NRPE” to /etc/services

Allow port 5666 in your firewall.

Create a startup script for nrpe:

cd /usr/local/src/nrpe-2.7
cp init-script /etc/rc.d/nrpe
Make sure the path for NrpeBin and NrpeCfg in the startup script is:

NrpeBin=/usr/local/nagios/libexec/nrpe
NrpeCfg=/usr/local/nagios/libexec/nrpe.cfg

chmod +x /etc/rc.d/nrpe
ln -s /etc/rc.d/nrpe /etc/init.d/nrpe

Configure NRPE to start at boot, then start it now:
chkconfig nrpe on
/etc/init.d/nrpe start

Check if nrpe is running:

root@host [~]# ps aux|grep nrpe
nagios 15075 0.0 0.0 39888 1024 ? Ss 21:25 0:00 /usr/local/nagios/libexec/nrpe -c /usr/local/nagios/libexec/nrpe.cfg -d
root 15343 0.0 0.0 61128 724 pts/0 S+ 21:27 0:00 grep nrpe

Also, check that it is listening on port 5666:

root@host [~]# telnet localhost 5666
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
^]
telnet> q
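
The same port test can be done without telnet. A minimal sketch, assuming Python is installed; the function name port_open is my own, and it only confirms that something accepts TCP connections on the port, not that it speaks the NRPE protocol.

```python
import socket

def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed
        return False

print("NRPE port reachable:", port_open("localhost", 5666))
```

Running it from the Nagios server with the monitored host's address also tells you whether the firewall rule for port 5666 is in place.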

Now, you need to give the user ‘nagios’ sudo access to the specific commands it runs.

For this, edit the sudoers file by running visudo and add the following line, where hostname is the name of this server:

nagios hostname= NOPASSWD: /usr/local/sbin/check-aacraid.py

or, to allow arcconf itself instead:

nagios ALL=(ALL) NOPASSWD: /usr/StorMan/arcconf GETCONFIG 1 *

Also comment out the line “Defaults requiretty”, since NRPE runs the check without a terminal.

Check that NRPE is able to execute the plugin and return its output:

root@host [~]# /usr/local/nagios/libexec/check_nrpe -H localhost -c check_aacraid
Logical Device 0 Optimal,Controller Optimal (This output will be shown after configuring in Nagios)

The rest of the RAID monitoring setup is done on the Nagios server, which needs the NRPE plugin installed as well.

Configuration on Nagios server:
——————————

Assuming the monitored host is already configured for other services, you may proceed as follows:

cd /usr/local/nagios/etc
Check whether the following entries are present in checkcommands.cfg. They may already be there if NRPE-based RAID monitoring has been configured for other servers; if so, there is no need to add them again.

———————-
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define command {
command_name check_aacraid
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_aacraid
}
———————-

Add the following in services.cfg

—————

define service{
use generic-service
host_name <hostname of the monitored server>
service_description RAID
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 3
retry_check_interval 1
contact_groups cp-techs
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_aacraid
}
—————–
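
If you have several servers with Adaptec controllers, typing this block out for each one gets tedious. Below is a rough sketch that fills in the host name for you. The function name and the example host names are placeholders of mine, the values are copied from the definition above, and the time period is written as 24x7 with a plain letter x.

```python
SERVICE_TEMPLATE = """define service{{
use generic-service
host_name {host}
service_description RAID
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 3
retry_check_interval 1
contact_groups cp-techs
notification_interval 120
notification_period 24x7
notification_options w,u,c,r
check_command check_nrpe!check_aacraid
}}
"""

def raid_service(host: str) -> str:
    """Return the RAID service definition for one monitored host."""
    return SERVICE_TEMPLATE.format(host=host)

# Placeholder host names; use the names already defined in your Nagios hosts file.
for name in ["db1.example.com", "db2.example.com"]:
    print(raid_service(name))
```

The printed blocks can be appended to services.cfg and checked with the pre-flight command in the next step.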

Check Nagios configuration for any errors:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

If you see “Things look okay - No serious problems were detected during the pre-flight check”, restart Nagios:

/usr/local/etc/rc.d/nagios stop
/usr/local/etc/rc.d/nagios start

NRPE should also be running as a daemon on the Nagios server:

/usr/local/nagios/libexec/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Test if NRPE can execute the plugin on the remote server:

/usr/local/nagios/libexec/check_nrpe -H <remotehost> -c check_aacraid

It should return “Logical Device 0 Optimal,Controller Optimal” if the configuration is proper and if RAID is in good health.
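
Nagios decides the service state from the exit code of the check: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. Here is a small sketch that runs a check and shows both the state and the message. The function name run_check is my own, and the demo uses a stand-in command so that it runs on any machine.

```python
import subprocess
import sys

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

def run_check(argv, timeout=30):
    """Run a Nagios-style check; return (state_name, first_line_of_output)."""
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    return state, (lines[0] if lines else "")

# Stand-in command for demonstration. On the Nagios server you would pass
# ["/usr/local/nagios/libexec/check_nrpe", "-H", "<remotehost>", "-c", "check_aacraid"]
demo = [sys.executable, "-c", "print('Logical Device 0 Optimal,Controller Optimal')"]
print(run_check(demo))  # ('OK', 'Logical Device 0 Optimal,Controller Optimal')
```

An UNKNOWN state with an empty message usually points at a connection or permission problem rather than at the RAID itself.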

That’s all.

If you get errors, try running the script while logged in as the user nagios; sometimes the cause is a sudo error.
If you get the error “no output from arcconf”, check the sudo permissions:
chmod 4111 /usr/sbin/sudo
