Setting up Nagios on Debian or Ubuntu

 Posted on March 1, 2012

Nagios is an open-source monitoring tool that is ideal for keeping an eye on your systems. It can monitor all sorts of services on local and remote machines, warn you in advance when things start to go wrong, and alert you when a service becomes unavailable.
I have used it on numerous occasions to get an overview of my systems. It scales very well, and you can define roles and access levels.
It even supports paging or sending a text message to your phone when there is a problem. In this tutorial I will show you how to set up such a system.

Installing Nagios

You will need a Debian or Ubuntu system (or similar) with a working web server (Apache2) and MySQL. If you do not have these packages yet, install them first by running:

apt-get install apache2 libapache2-mod-php5 mysql-server

We will need four packages:

  • nagios3
  • ndoutils-nagios3-mysql
  • nagios-plugins
  • nagios-nrpe-plugin

Install them by running this command:

apt-get install nagios3 nagios-plugins ndoutils-nagios3-mysql nagios-nrpe-plugin

After the installation we can access the nagios3 web interface by going to: 'http://www.example.com/nagios3'

You will be prompted to log in. If you do not have an account yet, create one by running:

 htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin

Now restart Apache2 by issuing :

/etc/init.d/apache2 restart

When we go back to the web interface at 'http://www.example.com/nagios3' and log in, we see the Nagios interface. If you click on Hosts, you will see your localhost. Have a look around: if you click on the traffic light, you get details about the checks being performed.

Client configuration

On the clients you want to monitor you need to install the nagios-nrpe-server package.

apt-get install nagios-nrpe-server

This installs the NRPE daemon, which lets the monitoring host run checks on its clients. Next, add the monitoring host's IP address as a trusted host on each client. In /etc/nagios/nrpe.cfg, change this line:

allowed_hosts=1.2.3.4

Here 1.2.3.4 is the IP address of your monitoring host. Save the file and restart the NRPE service:

/etc/init.d/nagios-nrpe-server restart
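
If you have many clients, the allowed_hosts change can be scripted. The following is only a minimal sketch: the helper name set_allowed_hosts is made up for this example, and the demonstration runs on a scratch file instead of the real /etc/nagios/nrpe.cfg (which you would edit as root).

```shell
#!/bin/sh
# Hypothetical helper: rewrite the allowed_hosts line in an NRPE config file.
# $1 = path to the config file, $2 = IP address of the monitoring host
set_allowed_hosts() {
    cfg=$1
    monitor_ip=$2
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file first, then copy it back so the
    # original file keeps its owner and permissions.
    if sed "s/^allowed_hosts=.*/allowed_hosts=${monitor_ip}/" "$cfg" > "$tmp"; then
        cat "$tmp" > "$cfg"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Demonstration on a scratch file (assumption: a two-line stand-in for nrpe.cfg).
demo_cfg=$(mktemp)
printf 'server_port=5666\nallowed_hosts=127.0.0.1\n' > "$demo_cfg"
set_allowed_hosts "$demo_cfg" 1.2.3.4
grep '^allowed_hosts=' "$demo_cfg"
```

On a real client you would call set_allowed_hosts /etc/nagios/nrpe.cfg with your monitoring host's IP and then restart the NRPE service as shown above.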

Adding Objects

Nagios3 works with objects: hosts, services, contacts, groups and so on. This has one very nice advantage: objects can use inheritance, but more on that later. All the relevant config files for adding objects are in this folder:

/etc/nagios3/conf.d/

Have a look at these config files before you proceed. You can add any of the following definitions to any of the .cfg files and they will be included. However, I advise you to use logical file names to make your life easier. You can always create another config file; it is included automatically as long as it is in this folder and ends in .cfg.

Adding host-groups

First let's add some host groups. A host group is a grouping of servers, and one host can belong to multiple host groups. You could define host groups by:

  • type
  • use
  • organisational unit
  • geographical place

# example hostgroup
define hostgroup{
        hostgroup_name  example                             ; the name of the host group
        alias           example Servers                     ; an alias where you can put some info
        members         example-server-1,example-server-2   ; servers in the group
        }

# http-host group
define hostgroup{
        hostgroup_name  HTTP admins                         ; the name of the host group
        alias           Http Servers                        ; an alias where you can put some info
        members         webserver-1,webserver-2             ; servers in the group
        }

define hostgroup{
        hostgroup_name  all-group
        alias           All Servers
        members         *
        contact_groups  head-admins
        }

Now we have defined three host groups, one of which refers to a contact group. We still need to define these contact groups.

Adding contacts

First of all let's define some people:

define contact{
        contact_name                    lucas
        alias                           Lucas Kauffman
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-service-by-email,notify-service-by-sms
        host_notification_commands      notify-host-by-email,notify-host-by-sms
        email                           [email protected]
        pager                           +32479896254202
        address1                        [email protected]
        address2                        2424242442
}

Service notification:

  • w = notify on WARNING service states
  • u = notify on UNKNOWN service states
  • c = notify on CRITICAL service states
  • r = notify on service recoveries (OK states)
  • f = notify when the service starts and stops flapping
  • n = the contact will not receive any type of service notifications

Host Notification:

  • d = notify on DOWN host states
  • u = notify on UNREACHABLE host states
  • r = notify on host recoveries (UP states)
  • f = notify when the host starts and stops flapping
  • s = send notifications when host or service scheduled downtime starts and ends
  • n = the contact will not receive any type of host notifications
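
As a quick way to read an option string such as w,u,c,r, the letters listed above can be decoded with a small script. This is only an illustrative sketch: the helper explain_options is made up for this post and is not part of Nagios.

```shell
#!/bin/sh
# Hypothetical helper: explain a notification option string.
# $1 = "service" or "host", $2 = comma-separated option letters, e.g. w,u,c,r
# The body runs in a subshell so the changed IFS does not leak out.
explain_options() (
    kind=$1
    IFS=,
    for opt in $2; do
        case "$kind:$opt" in
            service:w) echo "w: WARNING states" ;;
            service:u) echo "u: UNKNOWN states" ;;
            service:c) echo "c: CRITICAL states" ;;
            service:r) echo "r: recoveries (OK states)" ;;
            host:d)    echo "d: DOWN states" ;;
            host:u)    echo "u: UNREACHABLE states" ;;
            host:r)    echo "r: recoveries (UP states)" ;;
            *:f)       echo "f: flapping start and stop" ;;
            *:s)       echo "s: scheduled downtime start and end" ;;
            *:n)       echo "n: no notifications" ;;
            *)         echo "$opt: not a valid $kind option" ;;
        esac
    done
)

explain_options service w,u,c,r
explain_options host d,u,r
```

Note that u means UNKNOWN for a service but UNREACHABLE for a host, which is why the helper needs to know which kind of option string it is reading.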

address1 and address2 are extra addresses; you can define up to six of them. Also mind the notify-host-by-sms command: it will not work unless you have a GSM phone or GSM module installed. A tutorial on how to get this working can be found at the bottom of the page. Now we need to define a contact group:

define contactgroup{
	contactgroup_name		example-admins
	alias				example administrators
	members				lucas,filipe,arvid
}

Adding hosts

We can make different host templates and then inherit from them.

define host{
        name                            example-host    ; The name of this host template
        contact_groups                  example-admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL HOST, JUST A TEMPLATE!
}

Now let's define a host that makes use of this template:

define host{
        use             example-host
        host_name       webserver-1		;name we will refer to in nagios
        alias           example Webserver 1
        address         4.5.6.7
}
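
With more than a handful of hosts, writing each definition by hand gets tedious. The following is a minimal sketch that prints host definitions inheriting from the example-host template; the helper print_host and the second address (4.5.6.8) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one host definition that inherits from a template.
# $1 = template name, $2 = host_name, $3 = alias, $4 = address
print_host() {
    cat <<EOF
define host{
        use             $1
        host_name       $2
        alias           $3
        address         $4
}
EOF
}

# Example inventory (made-up data): host name, address, then the alias.
while read -r name address alias; do
    print_host example-host "$name" "$alias" "$address"
done <<'HOSTS'
webserver-1 4.5.6.7 example Webserver 1
webserver-2 4.5.6.8 example Webserver 2
HOSTS
```

Redirect the output to a file such as /etc/nagios3/conf.d/hosts.cfg (the file name is only a suggestion) so that Nagios picks it up together with the other object files.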

Adding services to check

First of all we will use the generic service template, because it enables lots of options. If you use the same default options on several services, you can put them in a template.
Below you can find the default template, which I renamed to example-service:

# generic service template definition
define service{
        name                            example-service ; The 'name' of this service template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0       ; Only send notifications on status change by default.
        is_volatile                     0
        check_period                    24x7
        normal_check_interval           5
        retry_check_interval            1
        max_check_attempts              4
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  example-admins
        check_command                   check-host-alive        ; By default I want to check if the host can be pinged

        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }

Let's add some services to check :

define service{
        use                             example-service         ; Name of service template to use
        hostgroup_name                  example
        service_description             Disk Space
        contact_groups                  example-admins
        check_command                   check_nrpe_1arg!check_disk
        }

define service{
        use                             other-service           ; Name of service template to use
        hostgroup_name                  other-group
        service_description             Disk Space
        contact_groups                  other-admins
        check_command                   check_nrpe_1arg!check_disk
        }

define service{
        use                             example-service         ; Name of service template to use
        hostgroup_name                  all-group
        service_description             Disk Space
        contact_groups                  admins-group1,admins-group2
        check_command                   check_nrpe_1arg!check_disk
        }

I've added three similar services, but for different host groups. If you are in one contact group, you will normally only get to see your own servers. I use the host group since I want to run this check on every host in it. Options that you specify again after inheriting from a template override the inherited values. If I want to run a specific check on a single host only, I can do:

define service{
        use                             example-service         ; Inherit default values from a template
        host_name                       webserver-2
        service_description             SSH
        check_command                   check_ssh
}

This adds an additional check to webserver-2 to monitor SSH. Now that we have added all these configs, we need to restart the Nagios 3 service:

/etc/init.d/nagios3 restart
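
A broken object definition will keep Nagios from starting, so it can help to verify the configuration before restarting. The following is a minimal sketch, assuming the Debian paths used in this post (/etc/nagios3/nagios.cfg and /etc/init.d/nagios3) and a nagios3 binary on the PATH; the function name verify_then_restart is made up. The -v flag asks Nagios to check the configuration without starting the daemon.

```shell
#!/bin/sh
# Hypothetical helper: restart Nagios only if the configuration check passes.
# $1 = nagios binary, $2 = main config file, $3 = init script
verify_then_restart() {
    nagios_bin=$1
    main_cfg=$2
    init_script=$3
    if "$nagios_bin" -v "$main_cfg" >/dev/null 2>&1; then
        "$init_script" restart
    else
        echo "configuration check failed, not restarting" >&2
        return 1
    fi
}

# Only run on a machine where nagios3 is actually installed.
if command -v nagios3 >/dev/null 2>&1; then
    verify_then_restart nagios3 /etc/nagios3/nagios.cfg /etc/init.d/nagios3
fi
```

If the check fails, run nagios3 -v /etc/nagios3/nagios.cfg by hand to see which definition it complains about.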

Note that some checks also need configuration on the NRPE client. For instance, check_users, check_disk and check_load each need a command definition on the client side, and we pass the name of that command as an argument to check_nrpe. Take check_disk: we added a service with the command check_nrpe_1arg!check_disk as described above. On the client we add a line to the bottom of /etc/nagios/nrpe.cfg that says which disk should be checked and with which arguments. Here we set the thresholds to warning at less than 20% free and critical at less than 10% free:

#commands
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/xvda1

The command can also be tested and executed from the shell by running :

/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/xvda1
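
When you run a plugin by hand, its exit status tells Nagios the state: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. To make the two thresholds concrete, here is an illustration of how a free-space percentage maps onto those states. It is a sketch only, not the plugin's actual code, and classify_free_space is a made-up name.

```shell
#!/bin/sh
# Illustration only: classify a free-space percentage against the
# warning and critical thresholds, using the plugin exit code convention.
# $1 = free space in percent (integer), $2 = warning threshold, $3 = critical threshold
classify_free_space() {
    free=$1
    warn=$2
    crit=$3
    if [ "$free" -lt "$crit" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$free" -lt "$warn" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}

# Same thresholds as the check_disk command above: warning 20%, critical 10%.
classify_free_space 35 20 10 || echo "exit status: $?"
classify_free_space 15 20 10 || echo "exit status: $?"
classify_free_space 5 20 10 || echo "exit status: $?"
```

After running the real check_disk command from the shell, echo $? shows the exit status that Nagios would act on.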

I prefer to configure this check on the client side because the device name of the disk may differ per machine: sometimes it is /dev/xvda1, sometimes /dev/sda2, and so on. Should you want to control the check from the server side, including its arguments, then configure the service on the server like this:

define service{
        use                             example-service         ; Name of service template to use
        hostgroup_name                  all-group
        service_description             Disk Space
	contact_groups			admins-group1,admins-group2
        check_command                   check_nrpe!check_disk!20%!10%!/dev/xvda1
        }

And on the client side :

#commands
command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
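
To see what the client ends up running, the substitution of $ARG1$, $ARG2$ and $ARG3$ can be mimicked in a few lines of shell. This is an illustration only, not how NRPE is implemented, and expand_args is a made-up helper.

```shell
#!/bin/sh
# Illustration only: replace $ARG1$, $ARG2$, ... in a command template
# with the argument values sent by the monitoring host.
# $1 = command template, remaining parameters = argument values in order
expand_args() {
    line=$1
    shift
    n=1
    for value in "$@"; do
        # Escape characters that are special in a sed replacement.
        safe=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
        line=$(printf '%s' "$line" | sed "s|\\\$ARG${n}\\\$|${safe}|g")
        n=$((n + 1))
    done
    printf '%s\n' "$line"
}

template='/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$'
expand_args "$template" 20% 10% /dev/xvda1
```

On the monitoring host, the arguments are handed to check_nrpe with its -a option, for example check_nrpe -H 4.5.6.7 -c check_disk -a 20% 10% /dev/xvda1, which is a handy way to test the client definition from the shell.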

You also need to explicitly allow command arguments in the /etc/nagios/nrpe.cfg file by changing this setting from 0 to 1:

dont_blame_nrpe=1

However, this is a security risk and I do not recommend doing it this way. Please refer to the documentation of the plugins for their syntax. After this, restart the NRPE service with:

/etc/init.d/nagios-nrpe-server restart

There are many check modules (plugins) for Nagios; normally all modules that are considered stable are installed already. If you want to find out what modules there are, take a look here.

(Screenshot in the original post: an example of the different possible warnings.)

Add SSL for Nagios webinterface

It might be handy to use SSL for your web interface. If you want to set this up, have a look at this apache2 SSL tutorial.

More is available

There are many, many directives and options; have a look here. If you want to know how to set up your pager to work with Nagios, check out this blog post.

  3 Responses to “Setting up Nagios on Debian or Ubuntu”

  1. please give me a complete procedure about how to configure email notification in nagios3

    • Your question is a bit broad, what is it you do not understand? Normally you just install postfix it will automatically send notifications to your clients. I will write a post about how you can notify multiple groups depending on the amount of notifications this weekend.

  2. Hi,

    thank you so much for this great article. I used it to set-up nagios3 on my Ubuntu 12.04 VPS and it works just fine.

    However, I thought I would be able to manage nagios from its web interface but then I figured out that's not the case. So, a quick google search on this led me to this awesome article which installs an extremely good interface to nagios called check_mk.

    I highly recommend check_mk as it simplifies the configuration of nagios at least for those like me which are not that skilled when it comes to manually editing configuration files..

    thanks again
