Daniele Duca
 


21/01/05 - Release 0.4

What does a real sysadmin need?

He needs something that makes him hate his work more, like a tool that sends an SMS to his mobile phone while he is drunk on some exotic beach, complaining that some server is not responding to pings. Yes, he needs that, he needs TMON! :>

This tool does exactly that: you can set it up to check your servers every N minutes, with the hated ability to warn you by SMS when something goes wrong. Let's see how.

Prerequisites:
  1. Curl
  2. a Vola.it account
Limitations:
  1. Vola.it is an Italian SMS provider and I have an Italian mobile phone, so I receive the SMS without problems. I don't know whether it also works for non-Italian mobiles, but probably every country has a similar provider; just look around for one. The changes you would need to make to the script should not be too difficult.
From now on I assume you have a Vola.it account. The setup is very simple. First, download TMON to your monitoring server and remember to make it executable with:

# chmod +x /path/to/tmon
 

After that, edit the script and look for these lines:


tmon
..
VOLAUID="yourVolaUID"
VOLAPASS="yourVolaPASS"
..

 

As you can easily imagine, write your userid and password in these lines. After that, look for:


tmon
..
MASTERS[1]="Your Name"
MASTERS[2]="Your-mobile-number-with-prefix"
MASTERS[3]="Another Name"
MASTERS[4]="Another-mobile-number-with-prefix"
..
 

If you want to be the only one who gets warned, fill in only the first pair of array elements and delete the rest. But if you also want to disturb your colleagues, you can add as many numbers as you want, keeping in mind that the syntax is very important. Elements MUST be filled in two at a time, with the first containing the name and the second containing the phone number, WITH the country prefix and WITHOUT the "+" sign.
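If you want to double-check the array before going live, something like the following could help. It is only a sketch and not part of TMON: the function name check_masters and the sample numbers are made up for illustration, and it assumes the array is filled starting from index 1 as shown above.

```shell
#!/bin/bash
# Hypothetical sanity check for the MASTERS array (not part of TMON).
# The phone numbers below are placeholders, not real numbers.
MASTERS[1]="Your Name"
MASTERS[2]="390000000000"
MASTERS[3]="Another Name"
MASTERS[4]="+390000000001"   # deliberately wrong: leading "+"

# check_masters: walk the array two elements at a time (name, number).
# Prints one line per pair and returns 1 if any number is malformed.
check_masters() {
    local i=1 status=0 name number
    while [ -n "${MASTERS[$i]:-}" ]; do
        name="${MASTERS[$i]}"
        number="${MASTERS[$((i + 1))]:-}"
        if [[ "$number" =~ ^[0-9]+$ ]]; then
            echo "OK: $name -> $number"
        else
            echo "BAD: $name -> '$number' (digits only, with country prefix, no +)"
            status=1
        fi
        i=$((i + 2))
    done
    return $status
}

check_masters || echo "Fix the MASTERS array before running TMON"
```

With the sample values, the first pair is reported as OK and the second as BAD because of the "+" sign. A name without a number after it is also reported as BAD.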

When you are ready, move on to the point where you see these lines:


tmon
..
echo "$ST - Starting TMON script"
pinghost some.host.name

host[1]="another.host.name"
host[2]="25"
host[3]="110"
fullcheck

host[1]="any.another.host.name"
host[2]="80"
fullcheck

..
 

If you only need to be sure that a server is responding to ICMP pings, use the pinghost function. The fullcheck function is similar in that it pings the hostname, but it also checks for open TCP ports. Be warned that an open TCP port doesn't mean that the service is responding correctly: you could have inetd or another superserver keeping the port open while something has gone wrong behind it. Just keep this in mind.
If you use the fullcheck function you MUST fill the host array before calling it. Write the hostname in the first element and the ports you want to check in the following ones.
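To make the calling convention clearer, here is a rough sketch of how a function could read the host array and probe each port. It is not the code TMON ships with: port_open and sketch_fullcheck are hypothetical names, the ping step is left out, the probe relies on bash's /dev/tcp redirection and on the coreutils timeout command, and the 5 second limit is an arbitrary choice. Clearing the array at the end is also my assumption, so that ports from one host do not leak into the next call.

```shell
#!/bin/bash
# port_open HOST PORT
#   Succeeds if a TCP connection to HOST:PORT can be opened within 5 seconds.
#   Needs a bash built with /dev/tcp support and coreutils' timeout.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# sketch_fullcheck (hypothetical stand-in for fullcheck, without the ping)
#   Reads the hostname from host[1] and the ports from host[2], host[3], ...
sketch_fullcheck() {
    local name="${host[1]:-}" i=2
    while [ -n "${host[$i]:-}" ]; do
        if port_open "$name" "${host[$i]}"; then
            echo "$name:${host[$i]} open"
        else
            echo "$name:${host[$i]} CLOSED"
        fi
        i=$((i + 1))
    done
    unset host   # assumption: start with an empty array for the next host
}

# Usage, same pattern as in the script:
#   host[1]="another.host.name"
#   host[2]="25"
#   host[3]="110"
#   sketch_fullcheck
```

As said above, an "open" result only tells you that something accepted the connection, not that the service behind it is healthy.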

After that, add a crontab entry to run the script every 5 minutes (or whatever interval you prefer):


# crontab -l
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /bin/tmon >>/var/log/tmon/check
 

Finished! Some notes about this script. When a service is found to be broken, or a server doesn't reply to the ping, the script delays the warning for another cycle. This keeps temporary network problems from triggering too many false positives. If you receive an SMS, it basically means one of these things:
  1. The server or the service is really down
  2. The network connection from the monitoring station to the monitored server is down
  3. The network is too congested
  4. The monitored server is under a very heavy load (very rare, but possible)
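The one-cycle delay mentioned above can be pictured with a small flag file per target. This is only a sketch of the idea under my own assumptions, not necessarily how TMON keeps its state: STATE_DIR, record_result and the "warn"/"quiet" words are all made up for the example.

```shell
#!/bin/bash
# Hypothetical state directory; TMON's real bookkeeping may differ.
STATE_DIR="${STATE_DIR:-/tmp/tmon-state}"
mkdir -p "$STATE_DIR"

# record_result TARGET STATUS
#   STATUS is "up" or "down". Prints "warn" only when TARGET was already
#   down in the previous cycle; otherwise prints "quiet".
record_result() {
    local flag="$STATE_DIR/$1.down"
    if [ "$2" = "up" ]; then
        rm -f "$flag"        # target recovered: forget the earlier failure
        echo "quiet"
    elif [ -e "$flag" ]; then
        echo "warn"          # second failure in a row: time to send the SMS
    else
        : > "$flag"          # first failure: remember it, stay silent
        echo "quiet"
    fi
}
```

A single failed check leaves only the flag behind, and the flag disappears as soon as the target answers again. Note that this sketch keeps printing "warn" on every cycle while the target stays down; a second flag would be needed to send only one SMS per outage.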
If you modify this script to support another SMS provider, please drop me a note so I can update this document with your version. Thank you :)


DISCLAIMER

No liability for the contents of this document can be accepted. Use the concepts, examples and other content at your own risk. There may be errors and inaccuracies that could damage your system. Proceed with caution: although damage is highly unlikely, the author does not and cannot take any responsibility for any damage to your system that may occur as a direct or indirect result of information contained in this document. You are strongly recommended to make a backup of your system before proceeding and to keep backing up at regular intervals.

Information on this page is released under the GNU FDL License
This page last updated: 21/01/05