Nagios Config Part 2: adding service check via SSH

August 4, 2017

As part of my attempt to set up Nagios, I will now configure it to check services instead of just hosts. It is good to know more than whether a server is simply dead, which is all that a “ping” will tell you. If a machine is physically connected, turned on, and booted up in a very basic way, it will respond to a “ping” to basically say, “yes, I am here.” But what if it’s not sharing the cat videos it’s supposed to? What if the hard drive is full and it can’t do its job? That is what service checks are for.

There are two ways to talk to the machine and check things that aren’t readily available to the outside world (on the flip side, checking that the cat videos are being shared should be simple, since that is already a publicly available asset). The first is using SSH, and the second is through NRPE (Nagios Remote Plugin Executor).

SSH is a connection service typically used to open a “terminal” and work on the machine without being physically seated at it. Most servers have it enabled. Leveraging it, a person could drop Nagios plugins (the programs that run the actual checks and return a text response) in a user folder, and execute them remotely without installing anything else.

The downside is that checking hundreds or thousands of things would be slow, because a separate SSH connection is created for each one, and the overhead of setting up all those connections would be significant. Enter NRPE, which lets Nagios run multiple checks over the same connection. Its downside is that a daemon needs to be installed and left running on the target. If you are the admin on the target machine, this is doable.

For now, though, I am going to set up the process to use SSH. It means minimal installation of software on my server, and I don’t plan to run anywhere near 100 checks on the machine.

Configure SSH Public Key

For this to work, SSH needs to be able to function without interactively entering a password. We do this by setting up key-based authentication. The process seems wonky and backward at first. Think of the private key as your metal key; you don’t want anyone else to get access to it. The public key is the equivalent of a lock in a door, except that multiple locks can be installed in the same door, and each one can open it.

So, starting on the machine that initiates the connection (“nagios” in my example), and installing the public key on the server to be checked, which receives the connection (“nas4free” in my example), I executed the following commands:

nagios@nagios:~$ ssh-keygen -t rsa
 Generating public/private rsa key pair.
 Enter file in which to save the key (/home/nagios/.ssh/id_rsa):
 Enter passphrase (empty for no passphrase):
 Enter same passphrase again:
 Your identification has been saved in /home/nagios/.ssh/id_rsa.
 Your public key has been saved in /home/nagios/.ssh/

Then the newly created public key gets sent to the receiving server. touch ensures authorized_keys exists, and >> appends, so an existing file doesn’t get overwritten (this also assumes the .ssh folder already exists):

nagios@nagios:~$ rsync .ssh/ nagios@
 nagios@'s password:
nagios@nagios:~$ ssh nagios@
 nagios@'s password:
[nagios@nas4free ~]$ cd .ssh
[nagios@nas4free ~/.ssh]$ touch authorized_keys 
[nagios@nas4free ~/.ssh]$ cat >> authorized_keys 
[nagios@nas4free ~/.ssh]$ logout
Connection to closed.
nagios@nagios:~$ ssh nagios@
Last login: Sun Jul 30 16:56:59 2017 from
[nagios@nas4free ~]$

Now that nagios can log in without a password, the Nagios software can run its checks.

Adding Plugins To The Remote Machine

On my NAS, I don’t have the ability to compile the plugins right now. So I installed the plugins package (with precompiled binaries) into its own folder, copied out the binaries, and then deleted the folder.

so as root:

mkdir -p /mnt/data/opt/nagios/tmp
pkg --rootdir /mnt/data/opt/nagios/tmp install -y nagios-plugins
chown -R nagios:nagios /mnt/data/opt/nagios

and then as nagios:

cd /mnt/data/opt/nagios
mkdir plugins
mv tmp/usr/local/libexec/nagios/* plugins
rm -rf tmp

Add Service Checks

The service checks will use the check_by_ssh plugin on the Nagios server, and various plugins run remotely on the machine to be checked. So for example, to add a swap check:

In my /usr/local/nagios/etc/objects/servers/nas4free.cfg, I added:

define command {
  command_name check_swap
  command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "/mnt/data/opt/nagios/plugins/check_swap -w 92% -c 85%"
}

define service {
  use generic-service
  service_description Check Swap
  check_command check_swap
  host_name nas4free.local
  register 1
}
There are ways to make things more general (and make less typing when adding similar services to this or other hosts in the future), but for now, this makes things as self-contained as possible.
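For instance, the generalization usually means passing the remote command as an argument, so one command definition serves every SSH-based check. A sketch, not what I have running (the command name here is my own invention):

```cfg
define command {
  command_name check_by_ssh_generic
  ; $ARG1$ carries the remote plugin invocation, supplied after the ! below
  command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -C "$ARG1$"
}

define service {
  use generic-service
  service_description Check Swap
  check_command check_by_ssh_generic!/mnt/data/opt/nagios/plugins/check_swap -w 92% -c 85%
  host_name nas4free.local
}
```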

Verify the configuration, restart the service, and the newly created service shows up in the “Services” view. Yay!


Nagios Config Part 1: setting up host monitoring

July 28, 2017

In my quest to learn more DevOps tools, I previously setup Nagios. I had intended to get the host monitoring up and running then, but I sort of had to set it up twice, so here’s the post about actually monitoring the servers. I have a couple servers that I would like to use as examples. The first is my NAS (Network Attached Storage) server. This is actually the thing I would most like to keep an eye on in the long run.

I set up Nagios Core. I was expecting to be able to add servers to monitor through the GUI. Instead, one has to add servers by creating cfg files. I guess that’s how they make Nagios XI a value-add. Ah well, I can hang, especially when we are talking about doing things the free way.

Useful error messages are useful

At first, I was simply restarting the service. When it would fail to start, though, it wouldn’t tell me WHY. Also, if I restart and my configuration breaks things, the service is now down. To prevent this, Nagios has a verify flag. At first it sounded like you were supposed to provide each file to be checked, but eventually I tripped on the fact that you only provide your main config file. Then, when everything passes, you can restart the service. So, this line…

cd /usr/local/nagios/
bin/nagios -v etc/nagios.cfg

…will give you useful error messages about your whole configuration BEFORE you restart the service.
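In practice the two steps chain nicely, so the restart only happens on a clean verify (a sketch):

```shell
cd /usr/local/nagios
# only restart if the configuration verifies cleanly; a failed check
# leaves the running service untouched
bin/nagios -v etc/nagios.cfg && sudo service nagios restart
```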

Enabling config files

What took me embarrassingly long to figure out was my cfg file wasn’t being loaded by the system. I saved my file, restarted the service, and didn’t get any error message. Excellent! Only to enter the hosts section of the page and not see my host. Boo!

Warning: ALWAYS backup the original nagios.cfg before you start modifying it so you can refer to its default state!

What I discovered was that at the top of the /usr/local/nagios/etc/nagios.cfg file, which is expressly used by the service as the main config file, there is a separate cfg_file line for each of the default files. What I hadn’t done is add a line for my new file. I think this is redundant and not a great idea. Obviously, I’m not the first one to have thought that, because there is an example line in nagios.cfg to process all cfg files in a directory. So, I disabled this line


enabled and modified this one
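From the stock nagios.cfg, the two edits presumably look like this (lines reconstructed from the default file; my actual paths may differ slightly):

```cfg
# disabled: each default file no longer needs its own line
#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# enabled, and modified to point at my new folder
cfg_dir=/usr/local/nagios/etc/objects/servers
```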


and then made that folder and moved in the localhost.cfg file (after changing to the user nagios)

sudo su nagios
mkdir /usr/local/nagios/etc/objects/servers
mv /usr/local/nagios/etc/objects/localhost.cfg /usr/local/nagios/etc/objects/servers

restarted the service, went to the hosts section of the Nagios page, and there is my localhost, still happy.

Adding my new host

Now that I know how to get the config files loaded, I’ll create my new host. I made the following file

sudo su nagios

vi /usr/local/nagios/etc/objects/servers/nas4free.cfg

and pasted this code inside

define host {
  use linux-server
  host_name nas4free.local
  alias nas4free
}
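
By the way, if name resolution is ever a problem on the Nagios box, the host definition can pin an address explicitly. A sketch with a hypothetical IP:

```cfg
define host {
  use linux-server
  host_name nas4free.local
  alias nas4free
  address 192.168.1.50  ; hypothetical IP
}
```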

restart the service, reload the hosts page (for a little while it says “PENDING”), and…


Next steps

So a ping test is fine, but all it really tells you is if the server is on the network. None of the services may be available, or it may be using so much swap that it’s actually locked up. Next I’ll start adding service checks to actually monitor the health of the server, not simply “I’m not dead, yet!”

Nagios on a Raspberry Pi or VirtualBox?

July 14, 2017

I was digging around for DevOps software to learn, and an old name came to the surface: Nagios. This is monitoring software. It tells you when a server goes down, or a hard drive is acting strange and may be close to failing. Since I had never actually gotten around to learning much about it, and since DevOps is all about commanding swarms of computers, it seemed like a nice place to look into.

But where should it live? My main server is an obvious choice, but I’ve been having a hard time getting it to run everything I want already, plus, it’s something I would want to monitor anyway. The target machine should probably be as simple and dependable as possible. It also occurred to me it should probably not run anything else, both so it can concentrate on its job and remain as reliable as possible. I lamented that such a server would under-utilize any investment in hardware.

Then two things occurred to me: a Raspberry Pi and a VM. If this could work on a Raspberry Pi, I think it would be the best in the long run. It could monitor anything I get running in the cloud. And even if you were truly paranoid, it could probably run until the apes build their society on a cheap UPS. What I’ll do in the short term, though, is hold off on spending the $40 at Adafruit and run it in a VM. There is a pre-built VM on their site, and I can easily play around with it monitoring my “render farm” while I get this up and running.

So first I downloaded the VM, and then imported the .ova file into VirtualBox on my Mac. The default login information and URL are shown on the splash screen.

I then added the machine to my “render farm” network, and opened a port into it like I did before, and now I can view the start screen by going to localhost.


Click Access Nagios XI, fill in whatever fields you want to change, click Install, and you’re done (“Installation Complete”). Login, and you’re offered a tour and presented with the home screen.

I started adding things to be monitored, and only THEN did I see this is the PAID version! I don’t have $1000 to play around with! I could keep recreating the VM, which would be a pretty simple way of restarting the trial, but I think I’ll investigate the open source version, Nagios Core.

So I scrapped the other VM (aren’t VMs wonderful?), and I made a fresh Ubuntu VM. I then followed these instructions. But they weren’t quite complete, so I filled them in with these instructions.

I installed the pre-req’s with

sudo -i
apt-get update
sudo apt-get install wget build-essential apache2 php libapache2-mod-php7.0 php-gd libgd-dev unzip sendmail

and created the user this will run under

useradd nagios
groupadd nagcmd
usermod -a -G nagcmd nagios
usermod -a -G nagios,nagcmd www-data

Update: this only created the user; it didn’t allow me to log in as the user. Later on, I had to enable the user by giving it a password, a home directory, and a shell

passwd nagios
 Enter new UNIX password:
 Retype new UNIX password:
 passwd: password updated successfully
mkdir /home/nagios
chown nagios:nagios /home/nagios
chsh -s /bin/bash nagios

Then, I downloaded the source files

mkdir /tmp/nagios
cd /tmp/nagios

NOTE: Nagios is up to 4.3.2, but I didn’t realize this while I was following instructions. I’ll upgrade if I decide to move forward with my Nagios system.

unpack the files

for e in *; do tar -xvzf "$e"; done

install the main system

cd /tmp/nagios/nagios-4.2.0
./configure --with-command-group=nagcmd --with-httpd-conf=/etc/apache2/ --with-mail=/usr/sbin/sendmail
make all
make install
make install-init
make install-config
make install-commandmode
make install-webconf
cp -R  contrib/eventhandlers/ /usr/local/nagios/libexec/
chown -R nagios:nagios /usr/local/nagios/libexec/eventhandlers
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

added the new site to Apache

/usr/bin/install -c -m 644 sample-config/httpd.conf /etc/apache2/sites-available/nagios.conf
a2ensite nagios
a2enmod rewrite cgi

created a (web gui) login password for the nagiosadmin user

htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

And then installed the plugins

cd /tmp/nagios/nagios-plugins-2.1.2
./configure --with-nagios-user=nagios --with-nagios-group=nagios
make install

set the service to start on boot

sudo update-rc.d nagios defaults

and restarted all the services

service nagios restart
service apache2 restart

I added the VM to my “render farm” network, forwarded the VM’s port 80 to 8081 on the host, went to http://localhost:8081/nagios, logged in with my newly created web gui password, and…


Next, I need to start adding machines to monitor to fully test things out and see if I want to invest the time putting this on a Raspberry Pi.

Experimenting with Docker

June 30, 2017

Following this guide, here is what I did.

On my Amazon instance, I ran

sudo yum install -y docker
sudo service docker start

add the user to the docker group so that you don’t need to keep calling sudo

sudo usermod -a -G docker ec2-user

logout, and log back in to pick up the user changes.

Use git to clone an example PHP Docker application.

sudo mkdir -m 777 /Docker
cd /Docker
git clone

Wait, it said a Dockerhub login was optional, but now you need it to proceed in the tutorial? Maybe not. I built the example container with

docker build -t ecs-example ecs-demo-php-simple-app/

then viewed the images with

docker images

amongst other information, this gives the ID of the image that was just built (e.g. 4189cb456b0b). Not that this was needed, since I gave it a name. So to run it, and make it available:

docker run -p 9000:80 ecs-example

-p maps port 9000 on the machine to port 80 in the container.
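So with the container running, the app should answer on the host port (a quick check; the output depends on the sample app):

```shell
curl -s http://localhost:9000 | head -n 5
```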

I also downloaded and built the container in this tutorial. It features cat gifs. It deserves an honorable mention for levity, at the very least.

git clone
docker build -t flask-app docker-curriculum/flask-app/
docker run -p 9000:5000 flask-app

and then -dit to daemonize it and run it in the background, and ‘--restart unless-stopped’ to always have it running

docker run -p 9000:5000 -dit --restart unless-stopped flask-app

restart the instance, and lo and behold, cat pictures are still happening. Yay!

The container format is obviously powerful. Just like virtual machines, you can create instances, try things out, and then simply throw the container away if you decide not to keep it. I think I will have to spend some time in the future learning to write my own containers.
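For future reference, a minimal Dockerfile is only a few lines. This sketch (file names hypothetical) would containerize a small Flask-style app like the one above:

```dockerfile
# Hypothetical minimal image for a small Python web app
FROM python:3
COPY requirements.txt app.py /app/
RUN pip install -r /app/requirements.txt
EXPOSE 5000
CMD ["python", "/app/app.py"]
```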

Update on my NAS server

June 24, 2017

At first, it seemed my modifications made the memory usage steadily grow. That doesn’t appear to be the case any more. As with so many things, it appears to be more complicated.

Saving memory

It appears what triggered the problem was that the base system did not fit into memory with my extensions. So swap started being used, which eventually led to the system locking up. I was able to give myself a bunch of breathing room by tuning ZFS. ZFS tries to cache files in memory (the ARC) to improve performance (subsequent reads/writes to a cached file only go as far as memory, instead of all the way back to the slow disk). I started by telling it to use less memory for this.

This is a pretty good guide for tuning ZFS … I think. Honestly, making changes didn’t have as much effect as I thought it would. There are quite a few php-cgi processes that seem to be taking up memory, and doing something about them will probably be next. One thing to note from that guide: the various values need to be added to loader.conf. On NAS4FREE, this is done in the GUI, under System > Advanced, and then the sub-tab loader.conf, which is to the right.

Cleaner use of mount_unionfs

Before, when I used mount_unionfs, it mounted the new folder above the target folder. While this appears to allow things to work as they should, EVERY change gets added to the new folder. The result is that a lot of operating cruft winds up in this folder. So when you update the firmware (which is the only reason to use the embedded system over the full one), some of the old system is carried over, and things can get corrupted.

What I ended up doing was to mount the new folder under the target folder, and manually target the extension installs to the new folder, an idea that came from this article. I modified it, since that article had some redundancy, and needed updating to get the ‘pw’ calls to work.

I moved pieces of this into startup scripts so that these changes appear permanent, but here are the basics

make the extensions folder

sudo mkdir /mnt/data/opt

make folders and files to allow ‘pw’ calls to work (I need this to install git and ffmpeg)

sudo mkdir /mnt/data/opt/etc
cd /mnt/data/opt/etc
sudo touch group master.passwd passwd pwd.db spwd.db

install my packages

sudo pkg --rootdir /mnt/data/opt install -y git ffmpeg
sudo pkg --rootdir /mnt/data/opt install -y python py27-sqlite3 rsnapshot wget

Now, everything that is new comes through the mount_unionfs, but nothing is inadvertently deposited in my new folder unless I expressly put it there.
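One piece not shown above is the union mount itself. The idea, as a sketch from my reading of mount_unionfs(8) and using /usr/local as a hypothetical target, is the ‘below’ option:

```
# Mount the extensions folder *below* the target: reads fall through to
# /mnt/data/opt, but stray writes to the target no longer land there.
mount_unionfs -o below /mnt/data/opt /usr/local
```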

Dangerous Swapping

Despite trying to make everything fit in memory, I still had swap getting used. This in itself isn’t a problem (I have learned from reading that swap is actually quite elegant). But if things are getting added to swap faster than they are coming out, it is obviously not sustainable. I wrote myself a script that checks memory and swap usage and, if everything is getting too dangerous, reboots the system. This simple script runs before rsnapshot, and also intermittently throughout the hour. This is a pretty inelegant way of dealing with things, but I got tired of manually restarting the system when I noticed it was hung. Also, the hangs were usually occurring during an rsnapshot run, which messes up the snapshot files.
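My actual script isn’t shown here, but the core of such a watchdog might look like this sketch (the swapinfo parsing in the comment is an assumption about FreeBSD’s output format, and the reboot is left commented out):

```shell
#!/bin/sh
# too_high USED THRESHOLD: succeeds when swap usage is past the limit
too_high() {
  [ "$1" -gt "$2" ]
}

# On the NAS, USED would come from swapinfo(8), something like:
#   used=$(swapinfo | awk 'END { sub("%", "", $5); print $5 }')
# Demo with a hypothetical reading of 20% against a 15% limit:
if too_high 20 15; then
  echo "swap past limit: would reboot here"
  # shutdown -r now
fi
```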

In a pinch, when swap is getting up around 15%, I can

sudo killall php-cgi

but this only works for a little while, and because these processes are largely inactive, they tend to get pushed to swap, so that’s where most of the cleanup shows up. I saw some people changing the settings for how long these processes are allowed to live idle, but I can’t find a persistent way of doing this on FreeBSD right now.


Swapping media servers

I have used Fuppes since forever. It works for me. The updates are slow because the whole db needs to be updated, and the cpu usage even while idle is ridiculous, but it works. I tried miniDLNA, but it would disappear after a while, and I could only get it back with a system reboot: bleh. So I went back to Fuppes. As far as I can tell, Fuppes hasn’t been updated in years, but the version included with NAS4FREE doesn’t appear to be the latest and greatest. I tried to update it, but the compile instructions don’t work.

So now that I have the ability to install packages again, I’ve been trying Mediatomb. So far it seems to run extremely efficiently and seems pretty feature rich. Setting up share directories wasn’t obvious at first: you have to click Filesystem, browse to the directory you want to add, click on it, and then in the upper right you can add it as a scan directory. I set it to autoscan. I don’t know how brute-force this is going to turn out. There’s also an inotify option, but it isn’t compiled into the package, and I would also have to reorganize my folder structure. But given it’s a sqlite3 database, if needed, I can write a script to update single files in the future.

What I found while using Mediatomb is that it doesn’t seem to support BubbleUPnP’s search feature, which Fuppes used to do. It doesn’t seem to be a configurable item, so now I’m deciding how annoying this is, because when I need this feature, I really need it!

Installing Jenkins on the “farm”

June 16, 2017

Previously I had a couple posts on creating my own “render farm”, and installing SaltStack on it.

I was deciding between Puppet and Jenkins for the next thing to setup and explore. I decided to let Dice decide. Puppet: 123 hits. Jenkins: 172 hits. Looks like Jenkins wins.

So, according to the Getting Started guide, and this page about installing Java, on my queen:

sudo apt-get install -y default-jre # Java is a pre-req
wget -q -O - | sudo apt-key add -
sudo sh -c 'echo deb binary/ >> /etc/apt/sources.list.d/jenkins.list'
sudo apt-get update
sudo apt-get install jenkins

Port Forward into the “farm”

I have my farm set up on its own internal “NAT Network”, which means, for me to access Jenkins, or to ssh in (don’t forget to

apt-get install -y openssh-server

to enable ssh), I need to edit the network’s “port forwarding” rules, and…



( is the current address of my queen)

Part of the Jenkins install is that the process is started, so now, with the port forwarding setup, on the host machine I go to http://localhost:8080, and…


I then follow the on-screen prompt to finish the security, and…


I hit the “X” in the upper right

Update: I ended up doing this over again on its own VM. This time, though, I hit “Install suggested plugins.” This ended up being important as further tutorials used these suggested plugins to proceed.

Salt States on my “render farm”

June 9, 2017

Previous entries:

Now I’m going to figure out States. From my basic understanding, a state is a recipe for setting up a machine. It seems to me that it does the same job as Puppet or Chef, except that you can also run remote commands, too, which would seem to make it WAY more powerful than either of those.

Running through the official tutorial, my first snag turned out to be the file location. I didn’t have a /srv/salt folder, and I couldn’t run a state from an absolute path, so I created /srv/salt and put my *.sls file in it. Then it appeared the state modules weren’t installed, so I spent a long time trying to figure out how to install them. But when I had originally created the file, I had indented it with tabs. When this didn’t work, I had removed the tabs, and the subsequent error made me think the modules weren’t installed. It turns out the *.sls format uses structured whitespace (which, being a Python guy, I can appreciate), but it uses specific whitespace: two spaces. Once I got everything correctly “tabbed”, it ran as expected.

I used the example state, and then needed to create the ‘common’ state, since I have a clean install. I added ‘net-tools’ to my common file, since it brings in ‘ifconfig’, which I found earlier was missing. So my files ended up being:


nettools.sls:

    - pkgs:
      - rsync
      - lftp
      - curl

common.sls:

    - pkgs:
      - openssh-server
      - net-tools
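Filled out with their state IDs and the pkg.installed function, the complete files would look something like this (install_network_packages is the ID used in the official walkthrough; install_common_packages is my own naming):

```yaml
# /srv/salt/nettools.sls
install_network_packages:
  pkg.installed:
    - pkgs:
      - rsync
      - lftp
      - curl

# /srv/salt/common.sls
install_common_packages:
  pkg.installed:
    - pkgs:
      - openssh-server
      - net-tools
```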

Then I ran the following command to apply it to drone1 to test things out

salt drone1 state.apply nettools

Then I created a top.sls file to target states. This allows you to name machines based on their function, and then have states get run automatically. My top file looks like this:

    - common
    - nettools

so then to get everything set up according to this script, on the queen:

salt '*' state.apply

and it just knows to look at the top file and set everything up according to the names.
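For reference, the complete top file wraps those state names in an environment and a target pattern; mine would look something like this (the catch-all '*' target is an assumption, matching the state.apply call above):

```yaml
# /srv/salt/top.sls
base:
  '*':
    - common
    - nettools
```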