The Grid Engine Testsuite



Who should read this document

Objectives of the Testsuite

Security Implications

Usage Outline

Configuration
     binary-path.conf
     gid-range.conf
     loadsensor.conf
     local-spool.conf
     Setup Configuration

Internal testsuite illustration
     Structure
     Path/File variables
     Test system specific variables

Implementation Approach

About TCL
     Syntax
     Procedures
     Lists
     Strings
     Arrays
     Info command
     Catch command
     Upvar command
     Procedure lists
     File I/O
     more File commands
     Environment variables

About Expect
     expect
     send



Who should read this document

This document is intended for Grid Engine developers and for engineers assigned to quality assurance of the Grid Engine software. Read it if you want to test the functionality of a Grid Engine version, especially after enhancements have been made. If you are going to make enhancements, you should also look for ways to include tests for the capabilities you have added or modified. Finally, if you are fixing bugs, we ask you to consider adding a test that would have found each bug you are fixing, so that the bugs do not recur.

The testsuite is written in Tcl and Expect, so this document also contains a brief introduction to the Tcl scripting language and the Expect extensions. This should allow you to enhance the testsuite even if you are not familiar with Tcl or Expect.
 
 


Objectives of the testsuite


Testing complex client/server software with many options and features, such as Grid Engine, can be a very time-consuming and error-prone task. No kind of testing, automatic or manual, can guarantee that a system like Grid Engine is absolutely bug free. However, the testsuite presented here is a big step towards establishing a considerable level of confidence in the quality of the software, in so short a time and with so much convenience that tests can be run overnight, every night if desired.

On the more practical level, the testsuite aims at fulfilling the following objectives:

1) Features of previous releases should still work correctly in new software releases.
2) Known bugs should not re-emerge in newer software releases.
3) Automated tests help to save time & money.
4) Daily builds on different hardware and operating systems help to detect platform problems earlier.
 


Security Implications

The testsuite has three run modes:
 

User mode

The testsuite will install Grid Engine / Grid Engine Enterprise Edition using a nonprivileged user account, so no root access is used. The testsuite will run with the permissions of the user who starts it. This mode is selected when the user who starts the testsuite does not enter the root password and just presses return when asked for it at startup. Not all tests will run in this mode.

Root User mode

The testsuite will install Grid Engine / Grid Engine Enterprise Edition as the root user. The user who starts the testsuite has to enter the root password (which should be the same on all cluster hosts). The password is stored in a global variable and automatically sent in response to any login request when root access is needed. In this mode all tests are available. Because the testsuite is written in a scripting language, any unauthorized person who changes the script so that it dumps the root password to a file gains full access to your system. Use this mode only for an isolated cluster behind a firewall.

SSH mode (secure shell mode)

The testsuite will execute ssh -l root <hostname> to get root access on a given host. The ssh keys must allow the testsuite user to become root without typing the root password. This means the testsuite user can become root once the system administrator adds the testsuite user's public key to root's "authorized_keys" file. Be careful when you configure the ssh accounts: this is a security problem! Please consult the SSH manual for more information on setting up secure shell. In this mode all tests are available and no interactive question has to be answered.
For the best security combined with full test access, the SSH mode is recommended.
 
 


Usage outline

There is a testsuite framework script which allows you to set up, configure and operate your test environment interactively. Tcl/Tk 8.3.2 (or higher) and Expect 5.32.2 (or higher) must be properly installed at your site before you can start the framework script.

The framework script "check.exp" can be found in the testsuite root directory, which is located in gridengine/testsuite after checking out the Grid Engine source code. This directory also contains all other testsuite source code. You have to make sure that it can be reached from every host in the test cluster you intend to use. Such a directory will normally have to reside on a filesystem shared via NFS.

To invoke the framework script help output, change to the testsuite root directory and type the following command:
 
 

expect check.exp help
usage: expect check.exp [options]

options are:
help          show this
install       just install core system and exit
setup         remove defaults file and run setup
file FILE     use FILE as defaults file
re_init       use allready installed system (will only shutdown/reconfigure the cluster!)
              when test install_core_system is called or install option is set
compile       checkout source code and recompile
all RUNLEVEL  run every test automatically up to runlevel RUNLEVEL
              (RUNLEVEL is a value from 0=short tests, up to 4=week tests)
no_main       don't run main part (usefull for sourceing this file)
test_perm     test remote shell and user permissions on each cluster host
debug LEVEL   run testsuite in debuglevel 0,1 or 2
              0=no debug, 1=more output, 2=1+user response
quiet         no output on setup
 

Before starting the framework script, the configuration files (binary-path.conf, gid-range.conf, loadsensor.conf and local-spool.conf) have to be set up. The framework script also needs a file with the cluster configuration settings, called the defaults file. When the framework script is started for the first time and no defaults file is found (or given with the "file" option), the testsuite behaves as if the "setup" option had been given. The user then has to enter the cluster configuration settings, and the answers are stored in a file called "defaults.sav" (the defaults file).

The information stored in the defaults file is specific to a particular test environment and may change whenever you decide to test another version of Grid Engine or when you change the test cluster, for example. More than one defaults file containing the above information can be maintained. The testsuite framework script option "file" can be used to select the defaults configuration.

In order to run the testsuite from a cron table entry (that is, to update the code, build the binaries, pre-install the binaries, install the cluster software and run some or all tests unattended), ssh (secure shell) has to be set up on each execution and master host. The ssh option bypasses the interactive login sequences that ask for the root password. This is a security issue! Please refer to the section "Security Implications" above.

After the configuration has been set up, the setup procedure tries to resolve the given hostnames using programs that are part of the Grid Engine binary set. On the first startup this resolving will fail, because the necessary utility binaries have not been compiled yet. In that case you should nevertheless confirm the correctness of the setup with "y". After that, restart the testsuite with the "compile" option. When the compile step has completed, the setup procedure is invoked again with the previously defined parameters; it resolves the missing hostnames and stores the completed configuration.

Now the user can start tests.

A detailed step-by-step walkthrough of a testsuite setup can be found in the section "Setup configuration".
 


Configuration

Before starting any test runs it is necessary to set up the testsuite configuration. The testsuite root directory contains a subdirectory called "demo_config_dir". The files in this subdirectory contain a default configuration. Edit the configuration files as described below.
 

binary-path.conf
The testsuite uses full binary path names to start programs. Here the user can set the correct binary paths for every cluster host. The testsuite currently needs two binaries: expect and vim. The expect version should be 5.31.5 or greater. The vim binary path can also point to a vi binary.

The first column of each line is the hostname on which the binary exists. The entry "all" can be used to setup a binary for every host in the cluster.

The second column is the binary name used by the testsuite to resolve the correct path.

The third column is the binary the testsuite should use in place of the name in the second column. The full path is expected. The testsuite uses its own architecture variable: you can use "$ARCH" to insert the architecture name of the host (given in the first column) into the path. (The $ARCH values of your hosts are shown after compiling your cluster software.)

examples:
 

all expect /vol2/TCL831/$ARCH/bin/expect
all vim /vol2/tools/common/$ARCH/bin/vim
expo1 expect /vol2/TCL831/solaris64/bin/expect
expo1 vim /usr/bin/vi
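
The lookup described above can be sketched in Tcl as follows. This is only an illustration, not the testsuite's own code: the procedure name resolve_binary is hypothetical, and the rule that a host-specific entry takes precedence over an "all" entry is an assumption suggested by the example, not something this document states.

```tcl
# Sketch only: resolve a binary path from binary-path.conf style text.
# Assumption (not stated in this document): a host-specific entry wins
# over an "all" entry, regardless of the order of the lines.
proc resolve_binary {conf_text host binary arch} {
    set fallback ""
    foreach line [split $conf_text "\n"] {
        # split the line into whitespace separated columns
        set fields [regexp -all -inline {\S+} $line]
        if {[llength $fields] < 3} { continue }
        set entry_host [lindex $fields 0]
        set entry_bin  [lindex $fields 1]
        # replace the literal text $ARCH by the architecture name
        set path [string map [list {$ARCH} $arch] [lindex $fields 2]]
        if {![string equal $entry_bin $binary]} { continue }
        if {[string equal $entry_host $host]} { return $path }
        if {[string equal $entry_host "all"]} { set fallback $path }
    }
    return $fallback
}

# the example entries from above (braces prevent substitution of $ARCH)
set conf {
all expect /vol2/TCL831/$ARCH/bin/expect
all vim /vol2/tools/common/$ARCH/bin/vim
expo1 expect /vol2/TCL831/solaris64/bin/expect
expo1 vim /usr/bin/vi
}
puts [resolve_binary $conf expo1 vim solaris64]
puts [resolve_binary $conf expo2 vim solaris64]
```

With these entries the first call yields /usr/bin/vi (host-specific entry) and the second yields /vol2/tools/common/solaris64/bin/vim (the "all" entry with $ARCH replaced).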


gid-range.conf
Each Grid Engine cluster must have its own gid range for its jobs. The gid range is used to attach a certain gid to the processes of a job in order to control the job. Please see the cluster software documentation for more about gid ranges.
The testsuite automatically appends any new system it starts to this file and generates a default gid range for it, so no manual editing of this file is necessary.
A cluster is identified by the login of the user starting the testsuite and the COMMD_PORT number defined for the cluster.



loadsensor.conf


This configuration file sets the path to any loadsensor files that may be needed. On some architectures the execution daemon needs an external loadsensor binary to obtain the current load values of its host. The testsuite will install the loadsensor automatically when the loadsensor.conf file has an entry for the architecture.

The first column of each line is the architecture string for which a loadsensor should be installed.

The second column is the full path to the loadsensor.

examples:
 

aix42 /apath/aix42/qloadsensor.aix42
aix43 /apath/aix43/qloadsensor.aix43

For more information about loadsensors, please look into your cluster software documentation.



local-spool.conf
The "local-spool.conf" file must be created manually. It defines the local spool directory for each host entry (if the testsuite finds no such file, no local spool directories are used). The first column is a host name (where an execution or master daemon is to be installed) and the second column is the path to a local directory on that host. In this directory the testsuite creates a subdirectory named after the COMMD_PORT number of the cluster under test. The local spool directory of the execution and/or master daemon resides in that subdirectory.

By default the spool directories are located in the cell directory of the installed cluster software (e.g. /GE/default/spool/$hostname or /GE/default/spool/qmaster). This is a globally shared directory (NFS). To reduce network traffic it is useful to set a local spool directory for each execution host.

examples:
 
 

expo1 /usr/tmp/testsuite
expo2 /scratch/testsuite
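
As a rough illustration of the directory layout described above, the following Tcl sketch looks up the base directory for a host and appends the COMMD_PORT subdirectory. The procedure name local_spool_port_dir and the port number 6000 are hypothetical; the names of the daemon spool directories below the port subdirectory are not given in this document and are therefore left out.

```tcl
# Sketch only: find the per-cluster local spool subdirectory of a host.
# Returns "" when the host has no entry, in which case the default
# (shared) spool directory would be used.
proc local_spool_port_dir {conf_text host commd_port} {
    foreach line [split $conf_text "\n"] {
        # columns: host name, local directory
        set fields [regexp -all -inline {\S+} $line]
        if {[llength $fields] < 2} { continue }
        if {[string equal [lindex $fields 0] $host]} {
            # <local directory>/<COMMD_PORT>
            return [file join [lindex $fields 1] $commd_port]
        }
    }
    return ""
}

# the example entries from above
set spool_conf {
expo1 /usr/tmp/testsuite
expo2 /scratch/testsuite
}
puts [local_spool_port_dir $spool_conf expo2 6000]
```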


Setup configuration
After the configuration of the testsuite configuration files, a typical first startup would look like this:
 
cd [your testsuite root directory]
expect check.exp

After pressing return, the testsuite looks for the defaults.sav file. If it doesn't exist, the setup procedure is called. After that vi (vim) is started and the user can change the default values:

 
Cluster configuration settings parameter (defaults.sav)
     Description

testsuite_root_directory
     Path to the testsuite root directory. This directory must be available from all hosts in the cluster. Default is the current working directory when starting the testsuite framework script.

testsuite_config_dir
     Path to the configuration files (binary-path.conf, gid-range.conf, loadsensor.conf, local-spool.conf). This directory must be available from all hosts in the cluster.

checktree_root_directory
     Path to the checktree. By default "checktree" is a subdirectory of testsuite_root_directory and it contains all tests. Every "check.exp" file in each subdirectory is interpreted as a test. This directory must be available from all hosts in the cluster.

results_root_directory
     Path to a directory where the results of each test are stored. Each defaults.sav file should have its own results_root_directory. The lockfile of a testsuite run is also stored in this directory, so if you want to run more than one testsuite at the same time for different clusters it is absolutely necessary to create a results_root_directory for every cluster. This directory must be available from all hosts in the cluster. (The performance tests save their results in the subdirectory "protocols".)

qmaster_host
     The host name of the qmaster host of the test cluster. (This must be the host on which the testsuite is started.)

list_of_execution_hosts
     A blank separated list of execution host names. The first entry is the name of the qmaster host, because an execution daemon is also installed on the qmaster host.

processors_on_each_execution_host
     A blank separated list of numbers. Each number gives the number of processors of the corresponding execution host named in the list_of_execution_hosts parameter.

commd_portnumber
     The port number used for the TCP/IP connection of the communication daemons (COMMD_PORT environment variable).

product_type
     Type of the cluster software: "sge" for a Grid Engine system and "sgeee" for a Grid Engine Enterprise Edition system (default is "sge").

product_root_directory
     Path to the directory where the cluster software should be installed. This directory must be available from all hosts in the cluster. (This is the $SGE_ROOT directory.)

source_root_directory
     Path to the directory with the source code of the cluster software (used for the compile option). The compile option will CVS update the source code in the "gridengine/source" directory (use "compile no_update" to skip the CVS update).

cvs_hostname
     The host where the "cvs update" command should be started (rsh access required).

cvs_release_tag
     The cvs release tag of the source (default is "maintrunc"). Use the "cvs log" command to find out more about available tags.

list_of_compile_hosts
     A blank separated list of host names on which the cluster software should be compiled (rsh access required).

first_foreign_system_username
     The user name of an additional cluster user. This should not be the user name of the testsuite user (who starts the testsuite). Some tests check whether the shepherd daemon sets the correct user account.

second_foreign_system_username
     The user name of a second additional cluster user. This should not be the user name of the testsuite user (who starts the testsuite). Some tests check whether the shepherd daemon sets the correct user account.

first_foreign_system_groupname
     The group name of an additional cluster user. This should not be the group of the testsuite user (who starts the testsuite). Some tests check whether the shepherd daemon sets the correct user group.

second_foreign_system_groupname
     The group name of a second additional cluster user. This should not be the group of the testsuite user (who starts the testsuite). Some tests check whether the shepherd daemon sets the correct user group.

default_domain
     The value for default_domain, used at installation time of the cluster (see the cluster documentation for more information). Default is "none" (recommended).

mailx_hostname
     The testsuite can send error, warning and report mails. It uses the mailx binary to do this. mailx_hostname is the name of the host on which the mailx binary should be used (rsh required).

mail_to
     E-mail address of the recipient of error, warning and report mails. (Default is "none", which means mail sending is disabled.)

mail_cc
     CC addresses for mail reports. (Default is "none".)

enable_error_mails
     "1": the testsuite will send error, warning and report mail messages (default).
     "0": the testsuite will only send report mail messages.

max_run_all_mails
     The maximum number of error/warning mails. No more than max_run_all_mails mails are sent in one "run all tests" run of the testsuite (default is 400).

use_ssh
     "1": use ssh for login.
     "0": use expect and send the password on the "login:" question (default).

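Two of the rules in the parameter descriptions above can be checked mechanically: the host list and the processor list must have the same number of entries, and the first execution host must be the qmaster host. The following Tcl sketch shows one way to do that; the procedure name check_host_settings is hypothetical and the testsuite may perform different or additional checks.

```tcl
# Sketch only: sanity check for three defaults.sav values.
# Returns a list of problem descriptions (empty list = no problem found).
proc check_host_settings {qmaster_host exec_hosts processors} {
    set problems {}
    # one processor count is needed per execution host
    if {[llength $exec_hosts] != [llength $processors]} {
        lappend problems "host list and processor list differ in length"
    }
    # the qmaster host also runs an execution daemon and is listed first
    if {![string equal [lindex $exec_hosts 0] $qmaster_host]} {
        lappend problems "first execution host is not the qmaster host"
    }
    return $problems
}

puts [check_host_settings expo1 "expo1 expo2" "1 2"]
puts [check_host_settings expo1 "expo2 expo1" "1"]
```

The first call reports no problems; the second reports both.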

After setting the correct values, exit vi with ":wq". The testsuite will try to resolve the hostnames and will report some errors at the first startup:
 

"architecture or file "<product_root_directory>/util/arch" not found",
proc reslove_host - gethostname error or file "<product_root_directory>/utilbin/unknown/gethostname" not found

This is because the system has not been compiled and installed at this point. The user has to review the settings and answer the question with "y" if everything is ok.

If no installed system was configured the following screen display should appear:
 
 

===============================================================================
 system version    :  system not installed - run compile option first
 current dir       :  /homedir/myusername/docu_test_src/testsuite/checktree
===============================================================================
 max. runlevel     :  short medium long week
 selected runlevels:  short medium long day week
===============================================================================
  1 test(s) available in subdir: functional
  1 test(s) available in subdir: install_core_system
  1 test(s) available in subdir: performance
 10 test(s) available in subdir: system_tests
===============================================================================
 13 test(s) available in current subdirs
===============================================================================

  (0) select runlevels
  (1) change dir
  (2) run not completed tests (including subdirectories)
  (3) show test descriptions
  (4) exit (press ^C to exit without shutdown of the cluster)
  (5) show completed test list
  (6) show not completed test list
  (7) reset completed test list (for all subdirectories)
  (8) create check report
  (9) run all tests continuously
 (10) use file "unknown.checklog" for output

The first line of the main menu shows the installed cluster software version, or the reason why the system version cannot be determined. The next line displays the current path in the checktree. The "max. runlevel" line shows all runlevels used by the tests implemented in the subdirectories. The different runlevels are described in the chapter "Structure". By default all possible runlevels are selected, which means that all possible tests will run when choosing, for example, menu item (2).

(0) - select runlevels:
The user can select the runlevels to run. When choosing "select runlevels" the following screen display will appear:
 

===============================================================
selected runlevels:  short medium long day week
===============================================================
 

please select/unselect new runlevels: 

(0) short  (   0 min - 15 min / run level   0 -  99 )
(1) medium (  16 min -  1 h   / run level 100 - 199 )
(2) long   (   1 h   -  4 h   / run level 200 - 299 )
(3) day    ( > 4 h   - 24 h   / run level 300 - 399 )
(4) week   ( >24 h            / run level 400 - 499 )

(5) return to previous menu

(1) - change dir:
Select this to walk through the checktree structure in order to select a single test or a special subtree.

(2) - run not completed tests
This menu item runs all tests of the selected runlevels in each subdirectory. When a test starts, the framework creates a file called "testsuite_lockfile". This makes sure that only one test runs at a time.

(3) - show test descriptions
Shows all test descriptions (from all tests in the subdirectories)

(4) - exit
Shuts down the tested cluster system and exits. If no "install_core_system" test was run before, some error output may appear. (If the cluster system should not be shut down, the testsuite framework can be stopped with "CTRL + C" in the main menu.)
Each test tries to create a new file called "testsuite_lockfile" in the check result directory. This prevents more than one test from running at the same time: while a testsuite_lockfile exists, no other test will run.

(5) - show completed test list
Show the list of successfully completed tests

(6) - show not completed test list
Show the list of failed or not yet completed tests

(7) - reset completed test list (for all subdirectories)
Set the status of all tests in this and in all subdirectories to uncompleted

(8) - create check report
Creates a report of good results and a report of bad results for the host running the testsuite in the test result path. The testsuite will report the exact location of the report files.

(9) - run continuously
Like (2), but creates a report at the end of all tests and starts up again

(10) - use logfile for output (HOST.checklog)
This will put all the output from the testsuite framework into a file called HOSTNAME.checklog. The file is stored in the test result path.
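
The "testsuite_lockfile" mentioned under menu items (2) and (4) relies on the fact that only one process can create the file. How the framework actually creates the lock is not shown in this document; the following Tcl sketch shows one common way, using an exclusive create. The procedure names try_lock and unlock are hypothetical, and writing the process id into the file is an assumption made for illustration.

```tcl
# Sketch only: exclusive lockfile creation in a results directory.
# Returns 1 if the lock was obtained, 0 if a lockfile already exists.
proc try_lock {results_dir} {
    set lockfile [file join $results_dir testsuite_lockfile]
    # CREAT together with EXCL makes open fail if the file already exists
    if {[catch {open $lockfile {WRONLY CREAT EXCL}} fd]} {
        return 0
    }
    # assumption: remember who holds the lock
    puts $fd [pid]
    close $fd
    return 1
}

# Remove the lockfile again after the test has finished.
proc unlock {results_dir} {
    file delete [file join $results_dir testsuite_lockfile]
}
```

A caller would loop on try_lock (sleeping between attempts) until it returns 1, run the test, and then call unlock. Note that exclusive creation may not be reliable on every shared filesystem.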

If the testsuite shows "system not installed - run compile option first" in the system version line, the user can press "CTRL + C" to stop the testsuite and then restart it with the command:
 

expect check.exp compile

The testsuite will compile the cluster software on each host in the list_of_compile_hosts. After that the binaries are installed at the product_root_directory. After the installation of the binaries the testsuite will resolve each hostname and complete the setup procedure which had errors at the first run. If no errors occur the testsuite can be restarted. The screen should show the following display:
 

===============================================================================
 system version    :  system not running - run install test first
 current dir       :  /homedir/myusername/docu_test_src/testsuite/checktree
===============================================================================
 max. runlevel     :  short medium long week
 selected runlevels:  short medium long day week
===============================================================================
  1 test(s) available in subdir: functional
  1 test(s) available in subdir: install_core_system
  1 test(s) available in subdir: performance
 10 test(s) available in subdir: system_tests
===============================================================================
 13 test(s) available in current subdirs
===============================================================================

  (0) select runlevels
  (1) change dir
  (2) run not completed tests (including subdirectories)
  (3) show test descriptions
  (4) exit (press ^C to exit without shutdown of the cluster)
  (5) show completed test list
  (6) show not completed test list
  (7) reset completed test list (for all subdirectories)
  (8) create check report
  (9) run all tests continuously
 (10) use file "ahostname.checklog" for output

Nearly all tests need a running cluster software system. The test called "install_core_system" installs it. Most tests therefore have the dependency "install_core_system", which means that they won't run until the test "install_core_system" has been performed successfully. Initially the core system is not installed, so we can only run tests that don't need it. Change to the directory "../checktree/system_tests/utilbin/loadcheck". The following lines will appear:
 

===============================================================================
 system version    :  system not running - run install test first
 current dir       :  /homedir/myusername/docu_test_src/testsuite/checktree/sys
                      tem_tests/utilbin/loadcheck
===============================================================================
 max. runlevel     :  short
 selected runlevels:  short medium long day week
===============================================================================
===============================================================================
  0 test(s) available in current subdirs
===============================================================================

  (0) select runlevels
  (1) change dir
  (2) run not completed tests (including subdirectories)
  (3) show test descriptions
  (4) exit (press ^C to exit without shutdown of the cluster)
  (5) show completed test list
  (6) show not completed test list
  (7) reset completed test list (for all subdirectories)
  (8) create check report
  (9) run all tests continuously
 (10) use file "myhostname.checklog" for output
 (11) show test descriptions of local test
 (12) run local test
 (13) run local test continously

A successful run (no. 12 + enter) of the current test should produce the following output:
 

running local test ...
deleting old temp script files ...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>> loadcheck
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

-----------------------------------------------> enter check level 0
waiting for lock ...
starting test functions (runlevel is 0)...

>>>>>>>>>>>>>>>>>>>>>>
check_numb_proc
runlevel: short(0)
>>>>>>>>>>>>>>>>>>>>>>
expo1: Expected processors: 1
expo1: read processors:     1
S U C C E S S F U L L Y performed "loadcheck" in run level 0 !
saving results for loadcheck (level 0) ...
<results_dir>/expo1.completed
<results_dir>/expo1.uncompleted
Looking for creation of the file "<results_dir>/expo1.uncompleted/loadcheck.res.0" ...
Looking for deletion of the file "<results_dir>/expo1.uncompleted/loadcheck.res.0" ...

set state of "loadcheck" for level 0 to completed !
removing lockfile "<results_dir>/testsuite_lockfile"
testsuite unlocked!
--> mail_report - no mail address

press enter...

Now let's install the cluster software. Go to the directory ../checktree/install_core_system and start the test. After starting the test the user is asked for the root password, because the install_core_system test needs root access. The root password is stored in a global variable, so once it has been entered the testsuite will not ask for it again until the whole testsuite is restarted. This is of course a security problem; see "Security Implications" for more information. If the user doesn't enter the root password the testsuite cannot run all tests. After running the install_core_system test the testsuite screen output should look like this:
 
 

===============================================================================
 system version    :  Grid Engine 5.3
 current dir       :  /homedir/myusername/docu_test_src/testsuite/checktree/ins
                      tall_core_system
===============================================================================
 max. runlevel     :  short
 selected runlevels:  short medium long day week
===============================================================================
===============================================================================
  0 test(s) available in current subdirs
===============================================================================

  (0) select runlevels
  (1) change dir
  (2) run not completed tests (including subdirectories)
  (3) show test descriptions
  (4) exit (press ^C to exit without shutdown of the cluster)
  (5) show completed test list
  (6) show not completed test list
  (7) reset completed test list (for all subdirectories)
  (8) create check report
  (9) run all tests continuously
 (10) use file "ahostname.checklog" for output
 (11) show test descriptions of local test
 (12) run local test
 (13) run local test continously


Internal testsuite illustration

This chapter shows the internal structure of the testsuite. It is recommended to read the chapters "About TCL" and "About Expect" first to get a short insight into the programming language.

The later chapter "Implementation Approach" shows exactly how to integrate a new test, using a short test example.



Structure
The testsuite's framework is written in Tcl (Tool Command Language) and its Expect extensions (Expect is a Tcl-based toolkit for automating interactive programs). To make it simple to integrate new test procedures, each test has its own directory in a hierarchical structure called "checktree". Each directory can have a file named "check.exp", which is a Tcl/Expect script containing one or more test procedures. These scripts are sourced by the testsuite's framework script, which also calls the test procedures.

The subdirectory "install_core_system" of the "checktree" directory is used for the installation and setup of the cluster system under test.
The framework script will parse the directory structure and display all available system tests.
 

Each test script must set up the following variables:
 
 

# globals to define:
global check_name check_description check_needs
global check_functions check_errno check_errstr
global check_init_level_procedure check_highest_level
global check_root_access_needs

# this is check specific
set check_root_access_needs "yes"
set check_name       "qconf"
set check_description(0)  "test qconf -aq and qconf -dq switch"
set check_needs      "init_core_system qstat qsub"
set check_functions   "addqueue submitjob checkjob removequeue"
set check_errno      "-1"    ;# 0 -> OK , != 0 means error
set check_errstr     "was never running";# string for error description
set check_init_level_procedure "onlevelchange"
set check_highest_level  0

The initial value "-1" for the variable "check_errno" is shown for the sake of completeness (the testsuite sets it to "-1" automatically when starting a new test). A test is recorded as "successfully performed" only when the value of "check_errno" is "0". So if a test never explicitly sets "check_errno" to "0", it can never complete successfully. This strategy ensures that a test programmer cannot forget to set the correct result state.

check_root_access_needs
This variable should be set to "yes" when the test needs root permissions. If the variable is not set or set to "" the testsuite will not ask for the root password. Once the root password is known the testsuite won't ask again for it.

check_name
Unique name of the test. Each "check.exp" file must have its own name.

check_description
Array of short verbal test descriptions, one for each runlevel. Each array element (number in brackets) should contain a short string which describes the test at that runlevel.

check_needs
The names of the tests which must run before this one. This defines dependencies on other tests.

check_functions
This is a space separated list of procedure names. The framework script calls the procedures in the given order. Each procedure should perform a test and finally set the "check_errno" and "check_errstr" variables.

check_errno and check_errstr
Any test procedure can set the values check_errno and check_errstr. If a procedure successfully performed  its test, it should set check_errno to "0" and put a comment like "no errors" into check_errstr. If check_errno is not set to "0" the framework script will report the check_errstr content and store the test in the directory for failed checks. The following table will show the different meanings of the check_errno value.
 

$check_errno   meaning
0              no errors
-1             error, but testsuite will run till end
-2             error, stop current test
-3             warning message (stop at next runlevel)

It is very important that the check_errstr variable states the reason for the error in words, because the testsuite shows this text to the testsuite user. The more accurately check_errstr describes the problem, the more easily the user will find the error.

The programmer can set the two variables by using the procedure set_error. The first parameter is the number for the check_errno, the second parameter is the error text for check_errstr.
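As an illustration, a test procedure might report its result as sketched below. The set_error defined here is only a stand-in so that the fragment runs outside the testsuite; it assumes that the real procedure simply assigns the two global variables. The procedure demo_check and its argument are hypothetical.

```tcl
# Stand-in for the testsuite's set_error (assumption: it only sets the two globals).
proc set_error { errno errstr } {
    global check_errno check_errstr
    set check_errno  $errno
    set check_errstr $errstr
}

# Hypothetical test procedure: reports an error unless the given path exists.
proc demo_check { path } {
    if { ![file exists $path] } {
        set_error -1 "file \"$path\" does not exist"   ;# error, testsuite continues
        return
    }
    set_error 0 "no errors"
}

demo_check "/this/path/should/not/exist"
puts "$check_errno: $check_errstr"
```

Note that every exit path of the procedure calls set_error, so the result state is never left at its initial value by accident.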

check_init_level_procedure
The testsuite can run each test in different check levels. The lowest check level number is 0. Each check can define its own highest check level (variable check_highest_level). If the test programmer defines a check_init_level_procedure, the testsuite framework calls this procedure at the start of each check level, before running the procedures in check_functions. This allows the programmer to change some values for each level's test run. The global variable CHECK_ACT_LEVEL holds the current run level of the testsuite.

The check_init_level_procedure must return "-1" if a check level is not implemented or "0" if the check can run at that level. Each test script must have at least a run level "0" to test its functional behavior.

If the test does not define check_highest_level and check_init_level_procedure the testsuite will only call run level 0.
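The interplay between the highest check level and the initialization procedure can be pictured with the following sketch. It is not the framework's actual code (that lives in "check.exp" and may work differently); demo_run_levels and demo_init_level are hypothetical names, and the loop assumes that every level from 0 up to the highest level is tried.

```tcl
# Hypothetical level initialization procedure: only levels 0 and 2 are implemented.
proc demo_init_level {} {
    global CHECK_ACT_LEVEL
    switch -- $CHECK_ACT_LEVEL {
        0 { return 0 }
        2 { return 0 }
    }
    return -1   ;# level not implemented
}

# Hypothetical driver loop: try every level and remember the ones that would run.
proc demo_run_levels { init_proc highest_level } {
    global CHECK_ACT_LEVEL
    set levels_run {}
    for {set CHECK_ACT_LEVEL 0} {$CHECK_ACT_LEVEL <= $highest_level} {incr CHECK_ACT_LEVEL} {
        if { [$init_proc] == 0 } {
            # here the framework would call the procedures in check_functions
            lappend levels_run $CHECK_ACT_LEVEL
        }
    }
    return $levels_run
}

puts [demo_run_levels demo_init_level 3]   ;# prints: 0 2
```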
 

check_highest_level
If a test can run with, for example, 2 different setups, check_highest_level should be set to 1. Of course this only makes sense when a check_init_level_procedure is defined.

The two main characteristics of a test are its runtime and its category. The test category is defined by the position in the checktree hierarchy.

The test's runtime is defined by the use of different run levels. The table below gives a guideline for test levels:
 

run level      test time
0-99           0-15 min.
100-199        16 min. - 1h
200-299        1h - 4h
300-399        4h - 24h
400-499        > 24h

File and path variables
 
CHECK_TESTSUITE_ROOT path to the testsuite directory
CHECK_CHECKTREE_ROOT path where checktree resides
CHECK_RESULT_DIR check result directory for host specific tests which had no errors.
CHECK_BAD_RESULT_DIR check result directory for failed host specific tests or tests which were not started.
CHECK_CORE_RESULT_DIR same meaning as $CHECK_RESULT_DIR, but not host specific. Some tests like the install_core_system test are effective for the whole cluster. 
CHECK_CORE_BAD_RESULT_DIR same meaning as $CHECK_BAD_RESULT_DIR, but not host specific. Some tests like the install_core_system test are effective for the whole cluster. 
CHECK_PROTOCOL_DIR directory for test protocols, log files, etc.
CHECK_ACTUAL_TEST_PATH directory of currently running test
CHECK_PRODUCT_ROOT path to Grid Engine system ($SGE_ROOT)
CHECK_OUTPUT output channel ( e.g. stdout, filepointer) for test debug output (puts $CHECK_OUTPUT "error ..")
CHECK_CURRENT_WORKING_DIR current working directory (same as echo $PWD)
CHECK_SPOOL_DIR_CONFIG_FILE file for local spooldirs in $CHECK_CONFIG_DIR path
CHECK_BINARY_DIR_CONFIG_FILE file for binary directories in $CHECK_CONFIG_DIR
CHECK_LOADSENSOR_DIR_CONFIG_FILE file for architecture specific loadsensor binaries in $CHECK_CONFIG_DIR
CHECK_CONFIG_DIR global configuration testsuite directory
CHECK_TCL_SCRIPTFILE_DIR testsuite's subdir for tcl files 
CHECK_SCRIPT_FILE_DIR testsuite's subdir for script files
CHECK_DEFAULT_DOMAIN default domain name for install_core_system test 

Test system variables
 
CHECK_PRODUCT_VERSION_NUMBER version string of the system to test (first line of qstat -help)
CHECK_PRODUCT_TYPE "sge" or "sgeee"
CHECK_COMMD_PORT equal to $COMMD_PORT
CHECK_ARCH system architecture (e.g. solaris64)
CHECK_USER user who started the test
CHECK_GROUP group of user who started the test
CHECK_HOST system hostname (e.g. etna5)
CHECK_CORE_EXECD known execd hostnames
CHECK_CORE_MASTER qmaster hostname
CHECK_CORE_PROCESSORS number of processors on each execution host (in order of $CHECK_CORE_EXECD)
CHECK_ACT_LEVEL actual check level (value from 0 up to xxx)
CHECK_SOURCE_DIR path to source code directory (gridengine/source)
CHECK_SOURCE_HOSTNAME name of the host on which cvs is installed
CHECK_SOURCE_COMPILE_HOSTS list of hostnames on which the source should be compiled
CHECK_FIRST_FOREIGN_SYSTEM_USER other system user for e.g. submitting jobs
CHECK_SECOND_FOREIGN_SYSTEM_USER other system user for e.g. submitting jobs
CHECK_FIRST_FOREIGN_SYSTEM_GROUP other system group
CHECK_SECOND_FOREIGN_SYSTEM_GROUP other system group
CHECK_MAILX_HOST host where the mailx binary is available
CHECK_REPORT_EMAIL_TO mail reports/errors to this email account
CHECK_REPORT_EMAIL_CC cc reports/errors to these email accounts
CHECK_USE_SSH enable/disable use of ssh (0 = disable), default is "disable"
CHECK_ADMIN_USER_SYSTEM 0 = root password is set, 1 = no root password set

There are some more global definitions, but they are less important for test programmers. Please refer to the comment at the beginning of the "check.exp" file in the testsuite's root directory for all global definitions.
 
 


About Tcl

This chapter gives a short summary of the relevant Tcl syntax. The following lines are actual Tcl commands. A ";#" within a line starts a comment, as does a "#" at the start of a line.


Tcl Syntax
# here some Tcl commands:

set x "hello world!"  ;# set variable x to hello world! (" are omitted)
 

# a semicolon (;) is used to separate commands
# commands between "[" and "]" brackets are evaluated first; their result
# replaces the bracketed text in the surrounding command.
# the next three commands have the same result (3 x "hello world!")

tcl> puts $x ; puts [set x] ; puts stdout [set x];
hello world!
hello world!
hello world!
tcl>

# if you want to calculate, use the expr command:
puts [expr ( 1 + 4)]       ;# Output: 5
puts [ expr ( [expr (1 + 4 ) ] + 4 ) ] ;# Output: 9
 

# values between "{" and "}" brackets are not interpreted
puts "hallo $x"    ;# output: hallo hello world!
puts {"hallo $x"}  ;# output: "hallo $x"
 

# while loop and incr example
# use incr to increase or decrease a variable
set count 0
set backcount 0
while { $count < 5} {
  incr count 1
  incr backcount -1
  puts "$count,$backcount"
}

>1,-1
>2,-2
>3,-3
>4,-4
>5,-5
 

# for loop
# initialization, stop criteria, increment
for {} {1} {} {      ;# endless loop
}

for {set count 2} { $count > 0 } { incr count -1 } {
  puts "count is \"$count\""
}

>count is "2"
>count is "1"
 

# if ... else (like "C")
if { $a == 1 } {
  puts "a is 1"
} else {
  puts "a is not 1"
}
 

# switch
switch -- $count {
  0       { puts "count is 0" }
  default { puts "default" }
}
 

# break; continue; have the same meaning in loops as in the "C" language



Procedures
You can define procedures with the following syntax:
 

proc test { name } {
  puts "name is \"$name\""
}

tcl>test me
name is "me"

Once a procedure is declared (or sourced) you can use it. The source command loads a Tcl script file into the name space. Once a Tcl script is sourced, every procedure in it is available:

# the source command:
source /tmp/myprocedures.tcl

Procedures can be called recursively. Values can be returned with the return command. You can use a global variable to stop the recursive walk.
 

# simple recursive procedure
 

global runs       ;# define the global variable runs

proc callme { call } {
   global runs    ;# you want to access the global variable runs
   puts "this is call $call"
   set back ""
   if { $call < $runs } {
      set back [callme [ expr ( $call + 1 ) ]]
      return $back
   } else {
      return $call
   }
}

The output looks as follows:

tcl> set runs 5
tcl> callme 0
this is call 0
this is call 1
this is call 2
this is call 3
this is call 4
this is call 5
5
tcl>

A script can stop its run with the exit command:

exit 1  ;# stop script with exit code 1
 

Tcl has powerful list procedures. A list is a string of entries separated by whitespace. Tcl uses {...} to group an entry that itself contains whitespace.

# define a list with 3 entries
set list "hello you {     out there }"

# show list entry no. 3
puts [ lindex $list 2 ]   ;# start from "0"

The output looks as follows:

>   out there
 

Here are some basic list procedures:

llength "abc d"      ;# returns "2"
lindex  "a b c" 0     ;# returns "a"
lrange  "a b c" 0 1    ;# returns "a b"   (sublist)
 

print the whole list with the foreach command:

foreach elem $list {
  puts $elem
}

>hello
>you
>   out there



Procedures to manipulate lists


append a list to another:

set new_list [ concat $list $list]
 

print out the new list:

puts $new_list
>hello you {     out there } hello you {     out there }
 

append an element to a list (lappend manipulates the list variable directly, so pass its name without the "$" sign):

lappend list "what's up"

puts $list
>hello you {     out there } {what's up}
 

insert element into a list:

# 3 is the index of where the "." should be; linsert returns a new list,
# so the result is stored back into the variable
set list [linsert $list 3 "."]
>hello you {     out there } . {what's up}
 

replace one or more elements:

# this replaces the elements 3 to 3 with !
lreplace $list 3 3 "!"
>hello you {     out there } ! {what's up}
 

search elements (wildcards are allowed unless the -exact switch is used):

lsearch $list "?ou"     ;# returns 1
lsearch -exact $list "you"   ;# returns 1
 

sort elements:

# sort switches are -integer -real -decreasing -increasing

set list "1 5 6 4 3"

>lsort -increasing $list ;# (default)
1 3 4 5 6

>lsort -decreasing $list
6 5 4 3 1

> set list "1.5 1.3 1.4 2.3"
1.5 1.3 1.4 2.3

> lsort -real $list
1.3 1.4 1.5 2.3

> lsort -integer $list
expected integer but got "1.5"
 

split and join lists:

split "1:2:3:4" ":"
>1 2 3 4

join "1 2 3 4" ":"
>1:2:3:4



String procedures
# example for string compare
if { [ string compare "ja" "ja" ] == 0 } {
   puts "strings are equal"
}

# other commands
string compare   $a     $b            ;# compare strings
string first     "uu"   "uunet.uu"        ;# result 0 (first position of "uu")
string last      "uu"   "uunet.uu"        ;# result 6 (last position of "uu")
string length    "foo"                    ;# result 3
string index     "foo"  2                 ;# result o
string range     "foo"  1 2               ;# result oo
string tolower   "FOO"                    ;# result foo
string toupper   "foo"         ;# result FOO
string trim      "  foo     do  "         ;# result "foo     do"
string trimleft  "  foo     do  "         ;# result "foo     do  "
string trimright "  foo     do  "         ;# result "  foo     do"



Arrays in Tcl
Arrays in Tcl are variables with "(" ")" brackets behind them. Here are some examples:

set uid(0) "root"
set uid(1) "mike"
set uid(2) "ralf"

The term inside the brackets identifies the value and can also be a string:

set uid(root) "0"
set uid(mike) "1"

Arrays can also simulate more dimensions by using a key that contains commas:

set user(name,0) "mike"
set user(name,1) "mike@balrog"

Get the size of an array and more ...:

array size user    ;# returns "2"
array names user   ;# returns "name,0 name,1"
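The following short sketch shows how the array commands above can be combined to walk over all elements. It only uses the user array from the example; since array names returns the keys in no guaranteed order, the keys are sorted first.

```tcl
set user(name,0) "mike"
set user(name,1) "mike@balrog"

# array names returns the keys in no guaranteed order, so sort them first
foreach key [lsort [array names user]] {
    puts "$key -> $user($key)"
}

# info exists also works for single array elements
puts [info exists user(name,0)]   ;# prints 1
puts [info exists user(name,7)]   ;# prints 0
```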



The info command
With the info exists command you can find out if a variable is visible in the local name space:

info exists list      ;# returns 1
info exists something    ;# returns 0

More info commands (see man page for details) :

info locals
info globals
info vars
info procs
info level       ;# returns the local name space level depth (0 = global level)



The catch command
If a command raises an error, Tcl by default terminates the script and shows the error message including a stack trace. The catch command is used to prevent abnormal script termination in case of an error.

set result [ catch { exec date } output ]

catch returns 0 if the command between the "{}" brackets completed correctly; otherwise it returns 1, which then becomes the value of the result variable. The output variable receives the output of the command. So if the date program is not found, the error information is stored in output and the script does not terminate.
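Both cases can be tried without an external program. The sketch below uses expr instead of exec; the variable names are chosen for illustration only.

```tcl
# A command that fails: catch returns 1 and msg holds the error text
set result [catch { expr {1 / 0} } msg]
puts "result=$result msg=$msg"

# A command that succeeds: catch returns 0 and msg holds the command's result
set result [catch { expr {1 + 4} } msg]
puts "result=$result msg=$msg"     ;# prints: result=0 msg=5
```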



Procedure lists
It is possible to store procedure names in a list and call the procedure named by a list element. The eval command is useful for entries that hold a procedure call together with its arguments (e.g. "test 4"): it splits the entry into the procedure name and its arguments before calling it.

set my_procedures "{test 4} testit"

eval [lindex $my_procedures 0]    ;# call test procedure
[lindex $my_procedures 1]    ;# call testit procedure (without arguments)
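A list like check_functions could be processed in a similar way. The sketch below is only an illustration of the technique, not the framework's code; all demo_ names are hypothetical.

```tcl
proc demo_first  {}    { return "first done" }
proc demo_second { n } { return "second done with $n" }

# each entry is a procedure name, optionally followed by its arguments
set demo_functions "demo_first {demo_second 4}"

set demo_results {}
foreach entry $demo_functions {
    lappend demo_results [eval $entry]   ;# eval splits "demo_second 4" into name and argument
}
puts [join $demo_results "\n"]
```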



The upvar command
The upvar command gives a procedure access to a variable in the name space of its caller.

proc print_name {} {
   upvar name l_name
   puts $l_name
}

set name "hello world!"
print_name

>hello world!
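A common use of upvar is to pass the name of a variable to a procedure so that the procedure can change it. The following sketch shows this; demo_increment and its parameters are hypothetical names.

```tcl
# the caller passes the NAME of its variable, not the value
proc demo_increment { var_name amount } {
    upvar $var_name counter     ;# counter is now an alias for the caller's variable
    incr counter $amount
}

set jobs 3
demo_increment jobs 2
puts $jobs     ;# prints 5
```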



File I/O in Tcl
# write in file
set save_list "lothar joachim andy"
set output [open "file" "w"]
puts $output "$save_list"
flush $output          ;# write output immediately to file
close $output

# read from file
set input [open "file" "r"]
gets $input load_list     ;# returns number of characters read
close $input
puts $load_list
 

# look for end of file
set input [open "file" "r"]
while { [eof $input] != 1 } {    ;# eof returns 1 if end of file is reached
   if { [gets $input line] >= 0 } {
      puts $line
   }
}
close $input

# get file names in local directory
tcl> glob *              ;# returns all files matching * (each file)
CVS check.exp readme.txt checktree remoteinstall.csh results

tcl>glob *.txt          ;# returns only readme.txt
readme.txt



Other useful file commands:
file dirname   "/usr/local/bin/xterm" ;# returns "/usr/local/bin"
file tail      "/usr/local/bin/xterm" ;# returns "xterm"
file extension "/usr/local/bin/settings.csh" ;# returns ".csh"
file rootname  "/usr/local/bin/settings.csh" ;# returns "/usr/local/bin/settings"

file isdirectory "results"    ;# returns 1 if "results" is a directory, else 0
file isfile      "readme.txt" ;# returns 1 if file , else 0
file executable  ;# for more information look at the man pages ....
file exists
file owned
file readable
file writable
file size
file atime
file mtime
file type
file readlink
file stat
file lstat
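The glob and file commands can be combined, for example to collect all "check.exp" files below a directory. The following is a minimal sketch of such a search, not the way the testsuite itself scans the checktree; demo_find_checks and the throw-away directory names are hypothetical.

```tcl
# Hypothetical helper: return all "check.exp" files below the directory dir.
proc demo_find_checks { dir } {
    set found {}
    foreach entry [lsort [glob -nocomplain -directory $dir *]] {
        if { [file isdirectory $entry] } {
            # descend into subdirectories
            set found [concat $found [demo_find_checks $entry]]
        } elseif { [file tail $entry] eq "check.exp" } {
            lappend found $entry
        }
    }
    return $found
}

# build a small throw-away tree to try it out
set demo_root [file join [pwd] "demo_checktree_[pid]"]
file mkdir [file join $demo_root practice simple]
close [open [file join $demo_root practice simple check.exp] "w"]

puts [demo_find_checks $demo_root]
file delete -force $demo_root
```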



Access to the environment variables in Tcl
The user's environment variables are stored in the array "env":

proc print_PATH {} {

  global env

  puts [set env(PATH)]   ;# a call to set VARIABLE without a value will just
                         ;# return the content of VARIABLE!
              ;# [set env(PATH)] is equal to $env(PATH)

}

#get a list of all set environment variables:
set local_environment [array names env]

#show each value
foreach elem $local_environment {
  puts "$elem is set to [set env($elem)]"
}

>SGE_ROOT is set to /homedir/myusername/test/v502a
>LD_LIBRARY_PATH is set to /usr/sgitcl/lib
>...

#set or change the value of an environment variable
set env(PATH) ".:[set env(PATH)]"   ;# prepend current directory
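Reading an environment variable that is not set raises an error, so it can be useful to test for it with info exists first. The helper below is a sketch; demo_get_env is a hypothetical name and not part of the testsuite.

```tcl
# Hypothetical helper: return the value of an environment variable,
# or a default value if the variable is not set.
proc demo_get_env { name default_value } {
    global env
    if { [info exists env($name)] } {
        return $env($name)
    }
    return $default_value
}

puts [demo_get_env SGE_ROOT "SGE_ROOT is not set"]
```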
 


About Expect


The testsuite uses the Tcl extensions from the Expect toolkit. The important commands are spawn (start an application), expect (wait for specific output from the application) and send (send input to the application).

To start an application under the control of expect use the following procedures:

- proc open_spawn_process {args}

and

- proc close_spawn_process { id }
 

Here is a small example of how to use these procedures:

proc addqueue {} {
  global CHECK_INSTALLED_SYSTEM CHECK_ARCH CHECK_HOST env
 

# set default EDITOR environment to vi (qconf will start the $EDITOR application)
  set env(EDITOR) "vi"
  set id [ open_spawn_process "$CHECK_INSTALLED_SYSTEM/bin/$CHECK_ARCH/qconf" "-aq"]
  set timeout 30   ;# wait max. 30 seconds for qconf's vi start
  expect ".*"
 

# now we can send some vi commands (qconf is starting the vi):
  send ":%s/qname.*template.*$/qname   ${CHECK_HOST}.qtest/\n"
  send ":%s/hostname.*unknown.*$/hostname   $CHECK_HOST/\n"
  send ":%s/load_thresholds.*np_load_avg.*$/load_thresholds   np_load_avg=5.00/\n"
  send ":wq\n"

  set timeout 30

  expect {
      timeout {
         set_error -1 "got timeout"
      }
      "added*\"${CHECK_HOST}.qtest\"" {
         set_error 0 "no errors"
      }
      "queue*exists" {
         set_error -1 "queue \"${CHECK_HOST}.qtest\" already exists"
      }
      default {
         set_error -1 "could not add queue"
      }
  }

  close_spawn_process $id
}


expect
Once an application is started with the open_spawn_process procedure (which calls spawn), you can scan the interactive output of the application with the expect command. Patterns may contain special characters such as "^" and "$". Because expect is not line oriented, "^" anchors a pattern at the beginning of the output buffered so far and "$" at its end, not at the beginning and end of a line ("Exploring Expect" by Don Libes, O'Reilly & Associates, Inc., page 73 ff., describes this in detail):

expect "\nlog*\n"   ;# look for "log" directly after a line break, followed by
         ;# arbitrary characters and a closing "\n"
 

The expect call above returns when the application outputs something like this:

time:  19
login: tom
usage: 12

When expect receives the second line ("login: tom\n"), the expect command returns and the Tcl script continues. The output read so far, up to and including the matched text, is stored in the variable expect_out(buffer). The text that matched the pattern is stored in the variable expect_out(0,string). Please refer to the expect man page for more information.

The timeout variable limits how long expect waits for a pattern that does not match. If timeout is set to a value greater than "0", the expect call returns after $timeout seconds at the latest. "-1" means no timeout and "0" makes expect return immediately. timeout is also a keyword for an expect { } case segment.

Other keywords for the expect {} case segment:

timeout      - perform action when the timeout expires
eof          - perform action on end of file (application terminates)
default      - perform action on timeout or end of file



send


If it is necessary to reply to an application, the send command is used. The addqueue example above sends some vi command strings.
 
 



 

Implementation approach


In this chapter a simple test is developed and integrated into the testsuite. The first step is to find the correct position for the test in the "checktree". If no existing test category ("functional", "install_core_system", "performance", "system_tests") fits, a new directory must be created:
 
 

[40] pwd
/homedir/myusername/src/testsuite/checktree
[41] mkdir practice
[42] ls
CVS                  install_core_system  practice
functional           performance          system_tests
[43] cd practice/
[45] mkdir simple
[47] cd simple
[49] vi check.exp

A "check.exp" file must define some global variables:
 
 

# define global variable in this namespace
global check_name
global check_description
global check_needs
global check_functions
global check_errno
global check_errstr
global check_highest_level
global check_init_level_procedure
global check_root_access_needs
 

# no, we don't need root access
set check_root_access_needs ""

# define a level initialization procedure:
set check_init_level_procedure "init_level"

# define test's name and run level descriptions
set check_name       "simple_test"
set check_highest_level      5
set check_description(0)     "default runlevel test"     ;# run level 0 description
set check_description(5)     "another simple dummy test" ;# run level 5 description

# define test's dependencies
set check_needs       "" ;# just a list of "check_name" names of other test

# define test's procedure order
set check_functions      "create_file"
lappend check_functions      "check_time"

The example above defines a test with a run level initialization procedure ("init_level"), two run levels (0 and 5), no dependencies, and two test procedures ("create_file" and "check_time").

The test programmer now has to implement the three procedures "init_level", "create_file" and "check_time".
 

Procedure "init_level":
 

# test's own globals:
global sleep_time

# run level initialization
proc init_level {} {
  global CHECK_ACT_LEVEL sleep_time

  switch -- $CHECK_ACT_LEVEL {
     "0" {
        set sleep_time 10 ;# run level 0
        return 0
     }
     "5" {
        set sleep_time 60;# run level 5
        return 0
     }
  }
  return -1  ;# no other level
}

The procedure is called before any of the procedures listed in "check_functions". It returns "0" if the current level is supported; otherwise it returns "-1". The global variable "sleep_time" is set to "10" in run level "0" and to "60" in run level "5". The main test procedures (listed in the global "check_functions") use the "sleep_time" variable.
 
 
 

Procedure "create_file":
 

proc create_file {} {

   global sleep_time CHECK_ACTUAL_TEST_PATH CHECK_OUTPUT
 

   puts $CHECK_OUTPUT "writing file ..."

   # get unix timestamp
   set actual_time [timestamp]

   # open file "testfile" in current testpath for writing
   if { [ catch { open "$CHECK_ACTUAL_TEST_PATH/testfile" "w" } file ] != 0 } {
      set_error -2 "can't open file for writing"
      return
   }

   # write timestamp into file
   puts $file $actual_time

   # close file
   close $file
 

   puts $CHECK_OUTPUT "sleeping $sleep_time seconds ..."

   # wait time for this level
   sleep $sleep_time

   set_error 0 "no errors"
}
 

This procedure opens a file in the directory of the current test and writes the current Unix timestamp into it. If an error occurs while opening the file, the procedure sets check_errno to -2 (error, stop current test) and returns. After writing the file it sleeps for the number of seconds defined for this run level and sets check_errno to 0 (no errors).
 

Procedure "check_time"
 

proc check_time {} {
   global sleep_time CHECK_ACTUAL_TEST_PATH CHECK_OUTPUT

   puts $CHECK_OUTPUT "reading file ..."

   # open file "testfile" in current testpath for reading
   if { [ catch { open "$CHECK_ACTUAL_TEST_PATH/testfile" "r" } file ] != 0 } {
      set_error -2 "can't open file for reading"
      return
   }

   # timestamp from file
   gets $file write_time

   # close file
   close $file

   # get unix timestamp
   set actual_time [timestamp]

   puts $CHECK_OUTPUT "write_time: $write_time"
   puts $CHECK_OUTPUT "actual_time: $actual_time"

   # set test results
   set_error 0 "no errors"
   if { [ expr ( $actual_time - $write_time ) ]  < $sleep_time } {
       set_error -1 "time error"
   }
}

This procedure reopens the file written by the procedure "create_file" and stores its content in the variable "write_time". It also gets the "actual_time" by calling Expect's timestamp command. If fewer than "sleep_time" seconds lie between the timestamp in the file and the actual time, check_errno is set to -1 (error, testsuite continues).
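The logic of the two procedures can also be tried in a plain tclsh without the testsuite. The sketch below is such a stand-alone variant under stated assumptions: "clock seconds" stands in for Expect's timestamp, "after" stands in for sleep, and the return value stands in for the check_errno state. The name demo_time_check and the file name are hypothetical.

```tcl
# Plain-Tcl variant of the write/read check; runs in tclsh without the testsuite.
proc demo_time_check { path wait_seconds } {
    # write the current time into the file
    set out [open $path "w"]
    puts $out [clock seconds]
    close $out

    after [expr { $wait_seconds * 1000 }]   ;# after takes milliseconds

    # read the time back and compare it with the current time
    set in [open $path "r"]
    gets $in write_time
    close $in
    file delete $path

    set elapsed [expr { [clock seconds] - $write_time }]
    if { $elapsed < $wait_seconds } {
        return -1     ;# less time passed than we waited: error
    }
    return 0          ;# no errors
}

puts [demo_time_check [file join [pwd] "demo_testfile_[pid]"] 1]
```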
 
 

Start the new "check.exp" test

To start the new test the subdirectory "../checktree/practice/simple" has to be the current working directory. Selecting "run local test" will start one testrun with all the enabled run levels. The output should look like this:
 

12
running local test ...
----------------------------------------------

checking dependencies of "simple_test" in "/homedir/myusername/src/testsuite/checktree/practice/simple" ...
 

-> enter check level 0
-> lock_testsuite: pid=25935 host=MYHOSTNAME user=myusername
waiting for lock ...
new lockfile written! Testing for correct lock ...
waiting to get lock
file size is: 18
waiting to get lock
file size is: 18
lock success!
OK - starting test functions (runlevel is 0)...

calling init level function "init_level" ...

starting function "create_file" in runlevel 0
writing file ...
sleeping 10 seconds ...
status:  procedure create_file: no errors

starting function "check_time" in runlevel 0
reading file ...
write_time: 963924383
actual_time: 963924393
status:  procedure check_time: no errors
S U C C E S S F U L L Y performed "simple_test" in run level 0 !
saving results for simple_test (level 0) ...
/homedir/myusername/src/testsuite/results/MYHOSTNAME.completed
/homedir/myusername/src/testsuite/results/MYHOSTNAME.uncompleted
testsuite unlocked!
 

-> enter check level 5
-> lock_testsuite: pid=25935 host=MYHOSTNAME user=myusername
waiting for lock ...
new lockfile written! Testing for correct lock ...
waiting to get lock
file size is: 18
waiting to get lock
file size is: 18
lock success!
OK - starting test functions (runlevel is 5)...

calling init level function "init_level" ...

starting function "create_file" in runlevel 5
writing file ...
sleeping 60 seconds ...
status:  procedure create_file: no errors

starting function "check_time" in runlevel 5
reading file ...
write_time: 963924410
actual_time: 963924470
status:  procedure check_time: no errors
S U C C E S S F U L L Y performed "simple_test" in run level 5 !
saving results for simple_test (level 5) ...
/homedir/myusername/src/testsuite/results/MYHOSTNAME.completed
/homedir/myusername/src/testsuite/results/MYHOSTNAME.uncompleted
testsuite unlocked!

press enter...


 


Copyright 2001 Sun Microsystems, Inc. All rights reserved.