Splunk: integrating Nagios and OSSEC

18 Jun 2012

I've started on my bachelor's thesis and decided to have a look at Splunk. We are building our own cloud using OpenStack, and I'm in charge of monitoring and securing all of our machines and instances. The problem these days is that a lot of apps come with their own web interface. It's OK to have 3, 4 or maybe even 5 interfaces, but imagine having 20 or even 30 different web applications, each with its own interface. This is where Splunk comes in.

What is Splunk

Splunk is built to analyze machine-generated data. This means that it can search through logs... a lot of logs. It is designed to digest and analyze "big data": data that grows rapidly in volume and complexity. Splunk crunches all of that data and presents you with something a lot more comprehensible. You just feed data to Splunk and it will munch through it. You can then run queries on it and generate graphs. This is useful for identifying problems in your IT infrastructure, for example when an application generates tons of logs. With Splunk you search for a specific keyword, extract the fields you care about, and turn the result into a nice graph. For instance, if you have a distributed application that generates errors, you can quickly generate a graph to see where and when the errors occur. For a system administrator this makes finding and resolving problems a whole lot easier. The nice part is that it can also generate weekly reports or send you alerts when it sees something specific pass by in the logs. There is just one big downside to all this awesomeness: Splunk isn't free. You can use it for free for about 500 MB of logs per day (the free version is a bit more limited than the enterprise version); if you go over that amount you will have to dig deep into your pockets. For pricing have a look here.
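To make the "search for a keyword, extract something, graph it" idea a bit more concrete, here is a minimal Python sketch of what such a query boils down to. The log format (date, time, host, level, message) and the host names are made up for illustration. Splunk does this at scale with its own search language, so treat this as an analogy only, not as a description of how Splunk works internally.

```python
from collections import Counter

# Hypothetical log lines: "<date> <time> <host> <level> <message>"
SAMPLE_LOG = """\
2012-06-18 10:02:11 node1 ERROR connection refused
2012-06-18 10:15:42 node2 INFO request served
2012-06-18 10:47:03 node1 ERROR connection refused
2012-06-18 11:05:30 node3 ERROR disk full
"""


def errors_per_host_and_hour(log_text, keyword="ERROR"):
    """Count lines containing `keyword`, grouped by (host, hour)."""
    counts = Counter()
    for line in log_text.splitlines():
        if keyword not in line:
            continue
        date, time, host = line.split()[:3]
        hour = f"{date} {time[:2]}:00"
        counts[(host, hour)] += 1
    return counts


# Print one row per (host, hour) bucket: the raw numbers behind a graph.
for (host, hour), count in sorted(errors_per_host_and_hour(SAMPLE_LOG).items()):
    print(f"{hour}  {host}  {count}")
```

The resulting buckets answer the "where and when do the errors occur" question from the paragraph above.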

Nagios integration

There is a Splunk app for Nagios and it is pretty easy to install: go to Apps -> Find more apps and search for "Splunk for Nagios". Install it and restart Splunk. You will need to add some settings to your nagios.cfg:

Configuring Nagios

perfdata_timeout=5
process_performance_data=1
host_perfdata_command=nagios-process-host-perfdata
service_perfdata_command=nagios-process-service-perfdata
host_perfdata_file_mode=a
service_perfdata_file_mode=a
host_perfdata_file_processing_interval=86400
service_perfdata_file_processing_interval=86400
host_perfdata_file_processing_command=nagios-process-host-perfdata-file
service_perfdata_file_processing_command=nagios-process-service-perfdata-file

Add these lines to your commands.cfg (check first that they are not already present):

# 'nagios-process-host-perfdata' command definition
define command{
        command_name    nagios-process-host-perfdata
        command_line    /usr/bin/printf "%b" "$TIMET$ src_host=\"$HOSTNAME$\" perfdata=\"HOSTPERFDATA\" hoststate=\"$HOSTSTATE$\" attempt=\"$HOSTATTEMPT$\" statetype=\"$HOSTSTATETYPE$\" executiontime=\"$HOSTEXECUTIONTIME$\" reason=\"$HOSTOUTPUT$\" result=\"$HOSTPERFDATA$\"\n" >> /opt/nagios/var/host-perfdata
        }
# 'nagios-process-service-perfdata' command definition
define command{
        command_name    nagios-process-service-perfdata
        command_line    /usr/bin/printf "%b" "$TIMET$ src_host=\"$HOSTNAME$\" perfdata=\"SERVICEPERFDATA\" name=\"$SERVICEDESC$\" severity=\"$SERVICESTATE$\" attempt=\"$SERVICEATTEMPT$\" statetype=\"$SERVICESTATETYPE$\" executiontime=\"$SERVICEEXECUTIONTIME$\" latency=\"$SERVICELATENCY$\" reason=\"$SERVICEOUTPUT$\" result=\"$SERVICEPERFDATA$\"\n" >> /opt/nagios/var/service-perfdata
        }
# 'nagios-process-host-perfdata-file' command definition
define command{
        command_name    nagios-process-host-perfdata-file
        command_line    /bin/cat /dev/null > /opt/nagios/var/host-perfdata
        }
# 'nagios-process-service-perfdata-file' command definition
define command{
        command_name    nagios-process-service-perfdata-file
        command_line    /bin/cat /dev/null > /opt/nagios/var/service-perfdata
        }

The /opt/nagios/var/* paths should be replaced with the location where your Nagios logs are stored. For me (Ubuntu 12.04 LTS) this is /var/log/nagios3/service-perfdata. Now add these parameters to your templates.cfg:

process_perf_data               1

Now reload your Nagios server with /etc/init.d/nagios reload.
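Once Nagios has reloaded, the perfdata files should start filling up with lines in the key="value" layout defined by the printf commands above. The following Python sketch parses one such line into a dictionary, which is handy for checking that the file looks right before pointing Splunk at it. The sample line is hypothetical (the host name, timestamp and plugin output are invented), and the parser assumes the plugin output itself contains no double quotes.

```python
import re

# Matches key="value" pairs as written by the printf command definitions.
PAIR_RE = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical sample line; real values depend on your hosts and plugins.
SAMPLE_LINE = (
    '1340000000 src_host="web01" perfdata="HOSTPERFDATA" hoststate="UP" '
    'attempt="1" statetype="HARD" executiontime="0.012" '
    'reason="PING OK - Packet loss = 0%, RTA = 0.45 ms" '
    'result="rta=0.450000ms;3000.000000;5000.000000;0.000000"\n'
)


def parse_perfdata_line(line):
    """Split a perfdata line into its epoch timestamp and key/value fields."""
    timestamp, _, rest = line.strip().partition(" ")
    record = {"timestamp": int(timestamp)}
    record.update(PAIR_RE.findall(rest))
    return record


record = parse_perfdata_line(SAMPLE_LINE)
print(record["src_host"], record["hoststate"], record["result"])
```

If a line from your own file does not parse into the expected keys, the command definitions are the first place to look.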

Configuring Splunk

If Splunk is not running on the same machine as Nagios, the official tutorial has you rsync the log files across. I personally feel 5 minutes can be a bit of a delay if you want to use real-time data, so in my opinion it is preferable to set up a Universal Forwarder for Splunk. The next step, adding the sources, is almost the same as what you find on the Splunk for Nagios configuration page.
  • a/ nagios.log :
    • Click Manager > Data inputs > Files & Directories > New
    • Specify the source: Continuously index data from a file or directory this Splunk instance can access
    • Full path to your data: eg. /log/nagios/nagios.log
    • Tick More settings
    • Set host: constant value
    • Host field value: eg. hostname.abc.com.au
    • Set the source type: Manual
    • Source type: nagios
    • Index: nagios
    • Click Save
  • b/ host-perfdata :
    • Click Manager > Data inputs > Files & Directories > New
    • Specify the source: Continuously index data from a file or directory this Splunk instance can access
    • Full path to your data: eg. /log/nagios/host-perfdata
    • Tick More settings
    • Set host: constant value
    • Host field value: eg. hostname.abc.com.au
    • Set the source type: Manual
    • Source type: nagioshostperf
    • Index: nagios
    • Click Save
  • c/ service-perfdata :
    • Click Manager > Data inputs > Files & Directories > New
    • Specify the source: Continuously index data from a file or directory this Splunk instance can access
    • Full path to your data: eg. /log/nagios/service-perfdata
    • Tick More settings
    • Set host: constant value
    • Host field value: eg. hostname.abc.com.au
    • Set the source type: Manual
    • Source type: nagiosserviceperf
    • Index: nagios
    • Click Save
None of this is possible through the web interface when using the forwarder. If you want to use the Universal Forwarder, create the indexes on your Splunk server and define the monitors in the forwarder's inputs.conf. When defining the monitor for each file, make sure you use the correct index and sourcetype names:

[monitor:///var/log/nagios3/nagios.log]
index=nagios
sourcetype=nagios

[monitor:///var/log/nagios3/host-perfdata]
index=nagios
sourcetype=nagioshostperf

[monitor:///var/log/nagios3/service-perfdata]
index=nagios
sourcetype=nagiosserviceperf
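Because a misspelled sourcetype fails silently (the data is indexed, but the app never picks it up), it can be worth checking the forwarder's inputs.conf before restarting it. Here is a small Python sketch that does so with the standard configparser module. The expected names are the ones from the Splunk Web steps earlier in this section; the function name and the sample text are my own, and the check assumes the files are called nagios.log, host-perfdata and service-perfdata.

```python
import configparser

# Sourcetypes expected by Splunk for Nagios, keyed by monitored file name.
EXPECTED_SOURCETYPES = {
    "nagios.log": "nagios",
    "host-perfdata": "nagioshostperf",
    "service-perfdata": "nagiosserviceperf",
}


def check_inputs(conf_text):
    """Return a list of problems found in the monitor stanzas of inputs.conf."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    problems = []
    for section in parser.sections():
        if not section.startswith("monitor://"):
            continue
        filename = section.rsplit("/", 1)[-1]
        expected = EXPECTED_SOURCETYPES.get(filename)
        if expected is None:
            problems.append(f"{section}: unexpected file name {filename!r}")
            continue
        actual = parser.get(section, "sourcetype", fallback=None)
        if actual != expected:
            problems.append(f"{section}: sourcetype {actual!r}, expected {expected!r}")
        if parser.get(section, "index", fallback=None) != "nagios":
            problems.append(f"{section}: index should be 'nagios'")
    return problems


SAMPLE = """\
[monitor:///var/log/nagios3/host-perfdata]
index=nagios
sourcetype=nagioshostperf
"""
print(check_inputs(SAMPLE))
```

An empty list means every monitor stanza uses the names the app expects.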

Configuring performance graphs

When you open the performance graphs via Apps => Splunk for Nagios => Performance Data => Nagios Linux Performance Data (select a host from the drop-down menu in the left corner) you will normally see some of the graphs. If for some reason you do not see CPU or memory performance data, you will need to change the description of the corresponding check in your commands.cfg on your Nagios server or in the Splunk script. The supported plugins are:

You can use these scripts with NRPE. The important part here is to use the correct service_description when defining a service. If you do not, Splunk for Nagios will not detect the data. My services are configured like this:

define service{
        use openstack-service
        hostgroup_name openstack
        service_description     CPU Usage
        check_command timeout_nrpe!check_cpu_perf!20
}
define service{
        use openstack-service
        hostgroup_name openstack
        service_description Memory Usage
        check_command timeout_nrpe!check_mem!20
}

I created a new command for the NRPE calls because I saw that Nagios did not wait long enough for the remote scripts to finish (/etc/commands.cfg):

define command{
 command_name timeout_nrpe
 command_line /usr/lib/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -t $ARG2$
 }

If you want to use this approach you will need to download the scripts to every host. Then define the commands on each host in /etc/nagios/nrpe.cfg:

command[check_cpu_perf]=/usr/lib/nagios/plugins/check_cpu_perf.sh 20 10
command[check_mem]=/usr/lib/nagios/plugins/check_mem -w 10 -c 5 -f
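Before waiting for graphs to appear, you can run the same check by hand from the Nagios server. The Python sketch below builds the exact command line used by the timeout_nrpe definition above and maps the exit code to the standard Nagios plugin states. The function names are my own, and the host address in the usage comment is a placeholder.

```python
import subprocess

CHECK_NRPE = "/usr/lib/nagios/plugins/check_nrpe"

# Standard Nagios plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_nrpe_command(host_address, nrpe_command, timeout_seconds):
    """Mirror the timeout_nrpe command_line: -H host -c command -t timeout."""
    return [CHECK_NRPE, "-H", host_address, "-c", nrpe_command,
            "-t", str(timeout_seconds)]


def run_nrpe_check(host_address, nrpe_command, timeout_seconds=20):
    """Run check_nrpe and return (state, plugin output)."""
    result = subprocess.run(
        build_nrpe_command(host_address, nrpe_command, timeout_seconds),
        capture_output=True,
        text=True,
    )
    return STATES.get(result.returncode, "UNKNOWN"), result.stdout.strip()


# Example (placeholder address, needs check_nrpe installed):
#   state, output = run_nrpe_check("10.0.0.5", "check_mem")
print(" ".join(build_nrpe_command("10.0.0.5", "check_mem", 20)))
```

If the output contains no performance data (the part after the | character), Splunk for Nagios has nothing to graph for that service.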

If all goes well you get graphs like this:

OSSEC Integration

Integrating Splunk with OSSEC is very easy: just go to Apps and then search for Splunk for OSSEC. Install it, restart Splunk and off you go. If your OSSEC server is running on a separate machine you will need to configure syslog_output on OSSEC.

  • Inside ossec.conf add a syslog_output block specifying your Splunk system IP address and the port it is listening on:
  •  <syslog_output>
       <server>172.10.2.3</server>
       <port>10002</port>
     </syslog_output>
    

  • Now you need to enable the syslog_output module and restart OSSEC:
  •  #/var/ossec/bin/ossec-control enable client-syslog
     #/var/ossec/bin/ossec-control restart
    

  • On the Splunk side, add this stanza to $SPLUNK_HOME/etc/system/local/inputs.conf:
  •  # 172.10.2.4 is the IP address of the OSSEC server
     [udp://172.10.2.4:10002]
     disabled = false
     sourcetype = ossec
    

By setting the sourcetype to ossec you're ready to take advantage of the Splunk for OSSEC app which can be found here. Make sure you update any local or network firewalls that this communication is traversing and then restart Splunk.
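If no OSSEC events show up in Splunk, a quick way to rule out firewall problems is to send a single test datagram to the UDP input. Here is a minimal Python sketch for that. The IP address and port in the usage comment are the ones from the example configuration above, and the function name is my own. As far as I understand it, a udp:// stanza that names a host only accepts data from that host, so run this from the OSSEC server itself.

```python
import socket


def send_test_event(host, port, message="ossec-splunk connectivity test"):
    """Send one UDP datagram and return the number of bytes handed to the OS."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(message.encode("utf-8"), (host, port))


# Example, using the Splunk address and port from the syslog_output block:
#   send_test_event("172.10.2.3", 10002)
```

UDP gives no delivery confirmation, so a successful send only means the packet left the machine. Search Splunk for the test message to confirm that it arrived.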

Remote Agent Management

Now you can add all your OSSEC agents to Splunk. This means the Splunk service account gets access to log into your OSSEC server and run commands. Be careful, because this has security implications for your environment. The following guide was made by southeringtonp.
  • Remote Access Configuration: First, you will need to make sure that the Splunk server can log into the OSSEC server to run management commands.
    • On the OSSEC server, create a new login account for the Splunk server to use when connecting.
    •     ossec_server$   useradd splunk
      

    • On the Splunk server, create an SSH keypair for the root user (or whichever account splunkd is running as), and copy the public key to the OSSEC server.
    •     splunk_server$   sudo su -
          splunk_server#   ssh-keygen
          splunk_server#   scp .ssh/id_rsa.pub splunk@ossec_server:authorized_keys
      

    • On the OSSEC server, log in as the splunk account and configure the authorized_keys file to allow SSH logins without a password:
    •     ossec_server$   mkdir .ssh
          ossec_server$   mv authorized_keys .ssh/
          ossec_server$   chmod -R go-rwx .ssh
      

    • Verify that the Splunk server can log into the OSSEC server without a password prompt. You MUST do this at least once and say yes to the SSH key prompt. The second run should not prompt.
    •     splunk_server#   ssh splunk@ossec_server
          ossec_server$    exit
          splunk_server#   ssh splunk@ossec_server
          ossec_server$    exit
      

    • On the OSSEC server, configure sudo to allow the splunk login account to run agent management commands without prompting.
    •    ossec_server#   /usr/sbin/visudo
      
      (Add the following two lines):
      
                  splunk  ALL=NOPASSWD: /var/ossec/bin/agent_control -l
                  splunk  ALL=NOPASSWD: /var/ossec/bin/manage_agents
      

    • On the OSSEC server, verify that the new splunk account can run the agent management commands without prompting. If either of the following commands prompts for a password, you may have made a mistake in the previous step:
    •     ossec_server$   sudo /var/ossec/bin/agent_control -l
          ossec_server$   sudo /var/ossec/bin/manage_agents
      

    • On the Splunk server, verify that you can remotely run the commands without a password:
    •     splunk_server$   ssh ossec-server -t -l splunk sudo /var/ossec/bin/agent_control -l
          splunk_server$   ssh ossec-server -t -l splunk sudo /var/ossec/bin/manage_agents
      
      

    • App Configuration:
      • All of the following steps are performed on the Splunk server.
      • Check to see if you already have a local copy of ossec_servers.conf:
      •     splunk_server#   cd /opt/splunk/etc/apps/ossec
            splunk_server#   ls -l local
        

      • Create the local directory and ossec_servers.conf file if they are missing:
      •     splunk_server#   mkdir local
            splunk_server#   cp default/ossec_servers.conf local/
        

      • Edit local/ossec_servers.conf and disable the local machine if you do not have an OSSEC server on the local machine.
      •     [_local]
            DISABLED = True
        

      • In local/ossec_servers.conf, add your new server:
      • (If your SSH key is in the default path, the '-i' parameter used in some examples is not required)

            [ossec_server]
            AGENT_CONTROL = ssh ossec-server -t -l splunk sudo /var/ossec/bin/agent_control
            MANAGE_AGENTS = ssh ossec-server -t -l splunk sudo /var/ossec/bin/manage_agents
        

    • Restart Splunkd
    •  /opt/splunk/bin/splunk restart splunkd
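Since splunkd runs these SSH commands unattended, any password or host-key prompt will make them hang or fail. The Python sketch below is one way to test the setup non-interactively from the Splunk server, under the same account splunkd runs as. It uses the ossec-server host name and splunk login from the steps above. The BatchMode=yes option (ssh) and the -n flag (sudo) are my additions: they make both tools fail instead of prompting. Only agent_control -l is tested, because manage_agents is an interactive menu.

```python
import subprocess


def build_remote_command(host, user, remote_command):
    """Build an ssh command line that fails rather than asking for input."""
    return ["ssh", "-o", "BatchMode=yes", "-t", "-l", user, host,
            "sudo", "-n"] + list(remote_command)


def can_run_unattended(host="ossec-server", user="splunk"):
    """Return True if 'agent_control -l' runs remotely without any prompt."""
    command = build_remote_command(
        host, user, ["/var/ossec/bin/agent_control", "-l"]
    )
    result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    return result.returncode == 0


# Example: print(can_run_unattended())
```

A False result points at either the SSH key setup or the sudoers entries on the OSSEC server.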
      

    Final Word

    I only worked with Splunk for a few weeks to see how it works and what it can do. I really am quite impressed with this piece of software: you can feed it tons of data and recover relevant information easily. I think this can increase the productivity of an administrator significantly when troubleshooting a problem.