Adding new hosts and services

The uxmon-net file

The agent (uxmon) reads its configuration from the file /etc/bigsister/uxmon-net.

A well-structured uxmon-net file consists of:

  • Global DEFAULT values (for certain arguments)

  • DESCR statements describing the monitored systems, so that the subsequent healthchecks know which environment to expect and which features are available on the tested platform

  • The healthchecks together with the required parameters

Example 2.1. A simple uxmon-net configuration file

# Set the default SNMP community to "public", the
# default frequency is 1/5min anyway
DEFAULT         community=public frequency=5 perf=5 ALL

# Set the default version and protocol for rpc checks to "1" and "udp"
DEFAULT         version=1 proto=udp rpc
DEFAULT         proto=icmp ping

DESCR           features=unix,linux,redhat,redhat71 localhost
DESCR           features=unix,linux,suse,suse70 susix
DESCR           features=unix,linux,redhat,redhat71 rejectix
DESCR           features=unix,linux,redhat,redhat72 publizistix-ext

localhost       perf=30 disk                    memory
localhost       cpuload                         syslog
localhost       fs=.0(20-60) dumpdates          dns
localhost       proc=sendmail min=1 max=30 procs

susix           ping                            perf=5 service=smtp tcp
susix           dns                             ntp

rejectix        ping                            dns


publizistix-ext(publizistix)    ping                            dns
publizistix-ext(publizistix)    perf=5 service=smtp tcp
publizistix-ext(publizistix)    perf=5 service=ssh tcp

Multiple uxmon-net configuration files

Under Unix, multiple uxmon-net files may exist. Their names must start with uxmon-net followed by an arbitrary suffix (note that, for example, uxmon-net.bak is therefore a valid agent configuration file!). In this case bb_start will start one instance of uxmon for each of these configuration files.

The uxmon-asroot configuration file(s)

Under Unix, access to some (socket) commands might be prohibited for ordinary users and therefore for the bigsister account. For example, on some systems only the root user can use ICMP sockets to send ICMP pings. For such cases, Big Sister offers the option of creating a special configuration file named uxmon-asroot. For the healthchecks configured in the uxmon-asroot file, bb_start will start a uxmon instance running with root privileges.

As with the uxmon-net configuration file, multiple uxmon-asroot files can be used, each with a name starting with uxmon-asroot followed by an arbitrary suffix.
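The naming rule for both kinds of configuration file is a plain prefix match, which is why a backup copy such as uxmon-net.bak is picked up as well. The following Python sketch only illustrates that rule; it is not how bb_start is implemented, and the function name select_config_files and the directory listing are hypothetical.

```python
def select_config_files(filenames, prefix):
    """Return every name that starts with the given prefix, sorted."""
    return sorted(name for name in filenames if name.startswith(prefix))


# Hypothetical listing of the configuration directory.
listing = [
    "uxmon-net",
    "uxmon-net.bak",
    "uxmon-net.dmz",
    "uxmon-asroot",
    "uxmon-asroot.ping",
    "resources",
]

# One ordinary uxmon instance would be started per entry here ...
net_files = select_config_files(listing, "uxmon-net")
# ... and one instance with root privileges per entry here.
asroot_files = select_config_files(listing, "uxmon-asroot")

print(net_files)
print(asroot_files)
```

In this listing the backup file uxmon-net.bak is selected too, so stale copies are better kept outside the configuration directory or given a name that does not start with uxmon-net.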

Network objects with simple names

Note

The agent (uxmon) needs a resolvable name or an IP address (reverse lookup is not required) to find the system on which the health checks are to be run. It is important to understand that this name does not need to be identical to the name under which the data collected from this host is processed and grouped by the Big Sister server. In some cases it is convenient to use the hostname both for the agent and as the network object name on the Big Sister server. In other cases a finer granularity might be desired.

The syntax of the uxmon-net file is kept very simple: each entry starts with the name or IP address of the system to be monitored, followed by a list of health checks that should be applied to this system, e.g.:

localhost fstype=ext2 disk

will create a network object on the Big Sister server named after the output of /bin/hostname and run the diskfree test against all mounted partitions of the local system that hold an ext2 file system. In the HTML status display on the server, all test results are grouped under the name of the network object. Working name resolution on the agent node, either through the /etc/hosts file or DNS, is a prerequisite for using hostnames in the uxmon-net file.

Warning

In the uxmon-net file, either localhost or the output of /bin/hostname can be used to point checks at the local system. Unless an alias name (see section Network objects with alias names) is specified, the gathered data will be reported to a network object on the Big Sister server that carries the system's hostname!

Network objects with alias names

Sometimes it is necessary to distinguish between the host you are running a check against and the name of the network object that is reported to the server. For instance, imagine you are running a check against a multihomed host (a host with multiple IP addresses) and you want to access this target system via a well-defined network interface. In this case you can use a configuration line like the following:

192.168.1.17(myhost) ping

This will run the ping test against 192.168.1.17. The result of the test however is then reported to be related to the network object myhost (see figure 2.2).

Figure 2.1. Host Aliases


A host definition of the form

name1(name2)

is always treated by uxmon in the following way: name1 is used internally by the agent while name2 is the name reported to the Big Sister server. The server will be completely unaware of the uxmon internal name (name1).
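To make the name1(name2) convention concrete, here is a minimal Python sketch that splits a host token into the name used by the agent and the name reported to the server. It is an illustration only, not uxmon's own parser: the function name split_host and the regular expression are assumptions, and the special handling of localhost (which is reported under the system's hostname) is deliberately left out.

```python
import re

# name1 optionally followed by (name2); neither part may contain
# whitespace or parentheses in this simplified model.
_HOST_RE = re.compile(r"^(?P<target>[^()\s]+)(?:\((?P<alias>[^()\s]+)\))?$")


def split_host(token):
    """Return (name used by the agent, name reported to the server)."""
    match = _HOST_RE.match(token)
    if match is None:
        raise ValueError(f"not a valid host token: {token!r}")
    target = match.group("target")
    alias = match.group("alias")
    # Without an alias, this sketch reports under the target name itself.
    return target, alias if alias is not None else target


print(split_host("192.168.1.17(myhost)"))
print(split_host("publizistix-ext(publizistix)"))
print(split_host("susix"))
```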

Tip

The alias associated with a physical host can be chosen on a per healthcheck basis. Thus, healthcheck results can be displayed on the status pages under different logical host names. For instance, for a physical host A system health could be reported as A-system, while application healthchecks report under A-applications, making it possible to group check results on the Big Sister display in both a system/hardware and an application view.

Health checks

Big Sister uses intelligent health checks to ensure system, application and content availability.

  • System availability can be ensured, for example, through remote ping checks on (one of) the system's NICs. Disk capacity, errors in syslog and other parameters can be made available for further processing through agent software (uxmon) installed on the monitored system.

  • Application availability can be ensured through an agent on the system that runs the application or, for a network application, by connecting to its TCP service port. Full path healthchecks, which go beyond checking the TCP service port, exist for some network applications, e.g. mail, web or DNS servers.

  • Content availability might be desired especially on webservers. This special healthcheck tests if a specified URL is available.

Defining the scope of monitored systems: DESCR

This section applies to Big Sister rev. 0.98beta6 or later. The DESCR statement has been designed for fairly broad use, so that Big Sister can differentiate between device types (computer, router, switch and others) and tailor healthchecks to the features of the specified system. Currently Big Sister uses only one device type (computer).

The methods used for monitoring a certain computer depend on the operating system and on whether it is the local host or a remote system. For example, a uxmon agent on the local host will monitor the sendmail service by checking whether at least one daemon is running. To test a remote system, it will establish a connection to TCP port 25 and wait for certain patterns or return codes. A well-formed configuration file has one DESCR statement for every monitored system (see also A simple uxmon-net configuration file).

Big Sister knows the following features:

aix
bsd
hpux
linux
macos
netware
remote
local
solaris
sysv
tru64
true64
unix
windows
Note

The remote and local features are chosen automatically by uxmon, depending on whether the target host is considered to be the host the agent is running on or some remote host. Adding remote or local explicitly to the features list in uxmon-net overrides this automatic selection.
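The automatic choice between local and remote can be pictured with the following Python sketch. It is an assumption-laden illustration, not uxmon's code: the function name effective_features is hypothetical, and "local" is decided here simply by comparing the target with localhost and with the agent's own hostname, as suggested by the earlier remark that either name points checks at the local system.

```python
def effective_features(host, declared, local_hostname):
    """Return the declared features plus 'local' or 'remote' if neither is given."""
    features = list(declared)
    if "local" in features or "remote" in features:
        # An explicit entry in uxmon-net overrides the automatic choice.
        return features
    is_local = host in ("localhost", local_hostname)
    features.append("local" if is_local else "remote")
    return features


# "mybox" stands in for the output of /bin/hostname on the agent node.
print(effective_features("localhost", ["unix", "linux", "redhat"], "mybox"))
print(effective_features("susix", ["unix", "linux", "suse"], "mybox"))
print(effective_features("localhost", ["unix", "remote"], "mybox"))
```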

Common syntax

uxmon-net entries may span multiple lines. Usually the end of a line automatically ends the respective configuration entry. However, if a line ends with a \ character, the following line is assumed to be part of that entry as well. If a # character is found, all characters after it are treated as a comment.
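The line-joining and comment rules can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name logical_entries is hypothetical, and the sketch strips the comment first and then looks for a trailing backslash, an ordering the text above does not specify.

```python
def logical_entries(text):
    """Turn raw uxmon-net text into a list of complete configuration entries."""
    entries = []
    pending = ""
    for raw in text.splitlines():
        # Everything from the first '#' to the end of the line is a comment.
        line = raw.split("#", 1)[0].rstrip()
        if line.endswith("\\"):
            # A trailing backslash means the entry continues on the next line.
            pending += line[:-1] + " "
            continue
        full = (pending + line).strip()
        pending = ""
        if full:
            entries.append(" ".join(full.split()))
    if pending.strip():
        # The file ended in the middle of a continued entry.
        entries.append(" ".join(pending.split()))
    return entries


sample = (
    "# monitor the local system\n"
    "localhost perf=30 disk \\\n"
    "          memory  # continued from the previous line\n"
    "susix ping\n"
)
print(logical_entries(sample))
```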

Most of the checks accept arguments. Arguments always precede the check and are of the form

argument1=value1 argument2=value2 ... check

The argument list only applies to the check immediately following the last argument in the list, thus

myhost proto=icmp ping ping

will run two ping checks against the host myhost: the first one does an ICMP ping, while the second one pings using the default protocol (usually UDP). The proto argument does not influence the second ping check, but of course you can do something (rather pointless) like

myhost proto=icmp ping proto=icmp ping

Figure 2.2. Agent to server

You will find a complete and hopefully up to date list of available health checks with their arguments in the reference part of this documentation!
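The rule that an argument list applies only to the check immediately following it can be expressed as a small Python sketch. It is not uxmon's parser: the function name parse_entry is hypothetical, and the sketch assumes that any token containing an = sign is an argument and any other token after the host name is a check.

```python
def parse_entry(entry):
    """Split one entry into (host, [(check, {argument: value}), ...])."""
    tokens = entry.split()
    if not tokens:
        raise ValueError("empty entry")
    host, rest = tokens[0], tokens[1:]
    checks = []
    pending = {}
    for token in rest:
        if "=" in token:
            # Collect arguments until the next check name shows up.
            key, value = token.split("=", 1)
            pending[key] = value
        else:
            # The collected arguments belong to this check only.
            checks.append((token, pending))
            pending = {}
    if pending:
        raise ValueError(f"arguments without a following check: {pending}")
    return host, checks


print(parse_entry("myhost proto=icmp ping ping"))
print(parse_entry("susix ping perf=5 service=smtp tcp"))
```

In the first entry only the first ping receives proto=icmp; the second ping is left with an empty argument list and would therefore fall back to whatever defaults apply.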

Pointing the agent to your server

Two special pseudo-checks in the uxmon-net file point your agent to the server(s) that status reports should be sent to: the bsdisplay and bbdisplay checks. The line

myserver bsdisplay

will, for instance, make the agent send status information to the server myserver, which is expected to speak the Big Sister protocol. You can use uxmon in conjunction with a Big Brother server by changing the above line into

myserver bbdisplay

In this case the agent will suppress any non-Big-Brother features and use the Big Brother protocol to talk to the server (see figure 2.2).

Of course, multiple bsdisplay / bbdisplay lines may appear in uxmon-net. In this case the agent will report its status information multiple times to (potentially) different servers.

As one would expect the bsdisplay pseudo-check accepts a few arguments, e.g. the line

myserver fqdn=no bsdisplay 

will report status information to myserver with the domain stripped from all host names.

Other useful arguments, which are relevant if you want Big Sister to keep statistics on performance data, are listed in section ??.

Check frequencies

By default uxmon runs checks and reports information at 5-minute intervals. However, some checks put noticeable load on the target system or are of no short-term relevance, so you may prefer to run them less frequently. Other checks might be extremely important and should run more often. For such cases you can define your own check frequencies. Every check (including the bsdisplay and bbdisplay pseudo-checks) accepts the special argument frequency. For example, the uxmon-net lines

localhost frequency=180 metastat

importantmachine frequency=1 ping

will run the metastat check against the local machine every 180 minutes while the ping check is run every minute.

Note: Frequencies

The argument name frequency is a little misleading: it is not really a frequency but rather the time interval in minutes between two runs of a check. Big Sister 1.00 or later supports fractional values, making a configuration like frequency=0.5 valid.

When defining check frequencies keep some rules in mind:

  • Running a check more often than the fastest bsdisplay pseudo-check is pointless. Check results are only reported during bsdisplay runs, not necessarily immediately after each check's run. So the above example only makes sense if your uxmon-net looks, for instance, like this:

    localhost frequency=180 metastat 
    importantmachine frequency=1 ping 
    myserver frequency=1 bsdisplay

  • You must run bsdisplay checks at least every 10 minutes. The server relies on the agents to report their status rather frequently. If no status information is coming in for 15 minutes the server will assume the agent or communication to the agent is dead and will set status to purple (no report).

  • The above rule does not apply to other checks. Even if you run certain checks only every few hours the bsdisplay check will report the result of the last check run. So the server will not change status to purple.
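The rules above lend themselves to a simple sanity check on a planned configuration. The Python sketch below is a hypothetical helper, not part of Big Sister: the function name interval_warnings and the dictionary layout are assumptions, and all intervals are given in minutes, as for the frequency argument.

```python
def interval_warnings(checks, displays, max_display_minutes=10.0):
    """Return warnings for intervals that conflict with the rules above.

    checks:   {check name: interval in minutes}
    displays: {server name: interval in minutes of its bsdisplay entry}
    """
    if not displays:
        return ["no bsdisplay/bbdisplay entry: nothing would be reported"]
    warnings = []
    fastest = min(displays.values())
    for name, minutes in sorted(checks.items()):
        # Results only leave the agent during a bsdisplay run.
        if minutes < fastest:
            warnings.append(
                f"{name}: runs every {minutes} min, "
                f"but results are reported only every {fastest} min"
            )
    for server, minutes in sorted(displays.items()):
        # The server expects to hear from the agent rather frequently.
        if minutes > max_display_minutes:
            warnings.append(
                f"{server}: reports every {minutes} min, "
                f"more than the {max_display_minutes} min limit"
            )
    return warnings


ok = interval_warnings({"metastat": 180, "ping": 1}, {"myserver": 1})
too_fast = interval_warnings({"ping": 1}, {"myserver": 5})
too_slow = interval_warnings({"metastat": 180}, {"myserver": 15})
print(ok, too_fast, too_slow, sep="\n")
```

A slow check such as metastat at 180 minutes raises no warning, in line with the last rule: the bsdisplay run keeps reporting the result of the most recent check run.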

Defaults

As your uxmon-net files grow more complex, you will probably tire of listing the same check arguments again and again. Fortunately, uxmon supports setting global defaults for certain arguments. For instance, the configuration:

localhost frequency=2 type=ext2 diskfree  
localhost frequency=2 memory 
localhost frequency=2 procs=sendmail procs
fileserver frequency=5 nfs
myserver frequency=2 bsdisplay

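can be simplified by declaring the shared arguments once in DEFAULT statements, so that the individual check lines no longer have to repeat them:

DEFAULT frequency=2 ALL
DEFAULT type=ext2 diskfree

The DEFAULT statements in this example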

  • set the default check interval for all (ALL) the checks to 2 minutes

  • set the default type for all diskfree checks to ext2

Of course you can override defaults by just listing an argument with a non-default value as usual. For instance in the example above the interval for the nfs check is explicitly set to 5 minutes overriding the default interval of 2 minutes.
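How defaults and explicit arguments combine can be sketched as a three-step merge in Python. This illustrates the behaviour described above and is not uxmon's implementation: the function name effective_args is hypothetical, and the assumption that a check-specific DEFAULT takes precedence over an ALL default is not stated in the text.

```python
def effective_args(check, explicit, defaults):
    """Merge ALL defaults, per-check defaults and explicit arguments.

    defaults maps a check name (or "ALL") to its default arguments.
    """
    merged = dict(defaults.get("ALL", {}))
    # Assumption: a default for one check beats a default for ALL checks.
    merged.update(defaults.get(check, {}))
    # Arguments listed on the check itself always win.
    merged.update(explicit)
    return merged


# Equivalent of: DEFAULT frequency=2 ALL / DEFAULT type=ext2 diskfree
defaults = {"ALL": {"frequency": "2"}, "diskfree": {"type": "ext2"}}

print(effective_args("diskfree", {}, defaults))
print(effective_args("memory", {}, defaults))
print(effective_args("nfs", {"frequency": "5"}, defaults))
```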

Using self-contained healthchecks: testers

As of Big Sister rev 0.98, so-called "self-contained" healthchecks exist. Self-contained healthchecks are coupled with the testers command, which can be run from the command line.

The testers command is used to query self-contained healthchecks for the syntax and arguments available on the system where the command is issued. By default, the output is plain ASCII text. Depending on the command-line options and the software installed on your system, XML and man page are also valid output formats.

The testers command is located under the {bigsister-root}/bin/ directory. Please read the section on testers in the reference part of this manual for a detailed description.