ssync


Author: Michael W. Shaffer
Current Version: 2.3
Status: Stable
Release Date: 2002-11-06
Source Archive: ssync-2.3.tar.gz
DEB Package (i386 woody): ssync_2.3-1_i386.deb
RPM Package (i386 RH6.2): ssync-2.3-1.i386.rpm
SRPM Package: ssync-2.3-1.src.rpm
Mac OS X Binaries: ssync-2.3-osx.tar.gz
Change Log: CHANGES
License: GPL

What is it?

Ssync is a minimalistic tool for keeping filesystems in synchronization. My main goals in writing ssync were correctness, simplicity, speed, low-resource consumption, and portability. It features a number of options to control how things are synchronized and under what conditions, as well as useful dry-run and verbose modes.

Ssync has been compiled and is known to work on the following systems:

I use ssync in production on a number of systems, including some with moderately large filesystems of around 400 GiB and 2 million+ files. So far it has worked well in my environment under very heavy and constant usage. If you are using FreeBSD, ssync is included in the Ports collection of recent releases.

It should build and function correctly on most UNIX-like platforms with a working ANSI- or C89-compliant compiler using the default makefile, with appropriate tweaks to things like the CC variables, etc. If you have problems building ssync on your particular UNIX platform, or if you come up with a makefile to build it successfully, I would appreciate your feedback.

There are specific makefiles for HP-UX (using c89) and OS X, as well as for building Debian .deb and RedHat .rpm packages. To use an alternate makefile, run make with the -f option, as such:

make -f makefile.osx

Why another synchronization tool?

The name ssync is a contraction of [s]imple filesystem [sync]hronizer. It was designed to be an extremely simple and reliable solution to a significant operational need. On the network I manage, I recently put into production a pair of loosely coupled, highly available Linux file servers which run Samba, NFS, and dhttpd to service the file sharing needs of about 500 users with client machines running Windows and various UNIX platforms. For various reasons, I chose not to use any of the currently available HA packages to manage these systems. The actual monitoring and failover features are handled by a separate daemon I created called peerd.

Since the implementation does not rely on a shared disk subsystem, some means of keeping the two separate filesystems of the peer machines in relatively close synchronization was needed. Originally, the solution to this requirement was a shell script which ran various rsync commands, first using a connection to an rsync server process on the master machine and later relying on a couple of NFS filesystems exported on the master and mounted on the slave specifically for the replication. As it turned out, this solution was less than satisfactory, since rsync would randomly but fairly frequently fail to complete the synchronization of one or more directory trees, either hanging indefinitely or barfing out numerous puzzling and spurious errors.

The more I thought about it, the more I was convinced that what was needed was something much less complex and hopefully more reliable than rsync seemed to be in this application, and thus was born ssync / ssyncd. I don't pretend that this program is useful for anything besides the rather narrow mission for which it was designed (and it may not even be useful for that). I do think, however, that it at least provides an alternative sync tool for certain situations, and I was unable to find any viable alternative to rsync in the open source world when I wrote this.

Features

Limitations

The basic function of ssync is simply to make the directories, files, and links on a destination filesystem match those on a source filesystem. The default behavior is to read a list of paths to sync from a specified file and recursively process each of them. You may also specify the paths to sync with the (-f | --src-path) and (-t | --dst-path) command line options if you just want to quickly sync two paths without bothering to create configuration and work files.

Building and Installing

Binary packages have been available since version 1.8. If you have a Linux system which uses either the .rpm or .deb package format, then all you have to do is install the package and edit the config files. I have tested and deployed ssync on both RedHat and Debian Linux. I am not aware of any Linux-specific features which it uses, so I think it will work fine on most other UNIX-like platforms as well.

As of the 2.0 release, I have eliminated what few GCC-isms the code contained and added the GCC -ansi and -pedantic flags to the makefile, so I think it will now build and work on most UNIX systems with a reasonably ANSI- or C89-compliant compiler. With the GCC -ansi flag on, and because I did use snprintf(), lstat(), lchown(), and a couple of other not-strictly-POSIX things, it does require -D_BSD_SOURCE to build on Linux. If your platform does not have any of these functions for some reason, just let me know and I'll see if there are any workarounds.

There is no configure script since I just didn't feel like writing one and I don't really think one is necessary at this point. There may be one in the future. You may need to change the makefile if you don't have gcc available. Otherwise, a plain old make should do it. The build will produce two binaries: ssync (the interactive version) and ssyncd (the daemon). Also included is a rather generic ssyncd.init startup script which can be copied to /etc/init.d or wherever your distribution puts startup files. Examples of the config files /etc/ssyncd.conf and /etc/ssyncd.work are provided, and they should be edited as appropriate to your situation. If you are running the interactive ssync version, it will obey whatever command line options you give as well as any configuration it might find in a file called .ssyncrc in the current directory. I have not yet gotten around to implementing any behavior for ssync to look for a .ssyncrc file in the user's home directory.

Configuration

All of the available configuration options are shown in the example ssyncd.conf configuration file and can be set either in this file (for ssyncd), in .ssyncrc (for ssync), or on the command line (for both). A summary of config options is below. The -c option, of course, only makes sense on the command line (duh). You will see a complete list of all updates, deletions, and exceptions at the default log-level of 0. If you want to suppress everything except errors, set log level 3 (warn). Log level 2 (info) is probably what most people want.

Config file   Long Option     Short Option  Comment
-             --help          -h            display usage message and version
conf-path     --conf-path     -c            read alternative config file from the default
interval      --interval      -i            number of seconds to sleep between completing one run and starting the next
work-file     --work-file     -w            path for file containing work paths (see also src-path and dst-path)
src-path      --src-path      -f            alternative way to specify a single source path
dst-path      --dst-path      -t            alternative way to specify a single destination path
priority      --priority      -n            scheduling priority (-20 - +20), see renice(8)
no-detach     --no-detach     -F            do not daemonize (use with log-mode: stderr)
no-sync-data  --no-sync-data  -D            do not sync data (content) of files
no-sync-time  --no-sync-time  -T            do not sync atime / mtime
no-sync-meta  --no-sync-meta  -M            do not sync meta-data (uid / gid / mode)
update-only   --update-only   -U            only sync things if source mtime is > destination mtime
test          --test          -X            run sync procedure and collect statistics without actually modifying anything
pid-path      --pid-path      -p            path for pid file
log-mode      --log-mode      -m            [file|syslog|stderr] logging mode
log-path      --log-path      -l            path for log file if using file based logging
log-ident     --log-ident     -s            identification string if using syslog based logging
log-level     --log-level     -v            logging verbosity (0 - 5), lower levels are more verbose (2 is normal, 3 is errors only, 0 lists all updates and deletions)
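
To make the update-only rule from the option summary concrete, here is a minimal Python sketch of the comparison it describes: sync only when the source mtime is strictly greater than the destination mtime. This is an illustration, not ssync's actual C code. The function name passes_update_only is hypothetical, and treating a missing destination as "needs sync" is an assumption rather than documented behavior.

```python
import os
import tempfile


def passes_update_only(src, dst):
    """Return True if src should be synced to dst under an update-only rule.

    Hypothetical helper for illustration; not part of ssync itself.
    """
    src_mtime = os.lstat(src).st_mtime
    try:
        dst_mtime = os.lstat(dst).st_mtime
    except FileNotFoundError:
        # Assumption: a destination that does not exist yet is always created.
        return True
    # The documented rule: source mtime must be greater than destination mtime.
    return src_mtime > dst_mtime


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")
    dst = os.path.join(tmp, "dst.txt")
    for path in (src, dst):
        with open(path, "w") as fh:
            fh.write("data\n")

    # Destination newer than source: update-only would skip this file.
    os.utime(src, (1000, 1000))
    os.utime(dst, (2000, 2000))
    print(passes_update_only(src, dst))

    # Source newer than destination: update-only would sync this file.
    os.utime(src, (3000, 3000))
    print(passes_update_only(src, dst))
```

Note that with a strict comparison, equal timestamps count as "nothing to do", so a destination file modified in place without a newer source would be left alone.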

Here's the example ssyncd.conf file:


#
# ssyncd.conf
#

interval:		300			# time between sync runs in seconds
work-file:		/etc/ssyncd.work	# list of paths to synchronize (you can also specify
                                                # a single source and destination in the config file
						# or on the command line with src-path and dst-path
#src-path:		/src/path		# alternative specification of one source path
#dst-path:		/dst/path		# alternative specification of one destination path

priority:		0			# scheduling priority (range -20 - +20)
                                                # be careful with this! and read renice(8)
                                                # if you don't know what it means

#no-detach:             yes                     # [y|n] do not detach from terminal
#no-sync-data:		yes			# [y|n] do not sync data (file contents)
#no-sync-time:		yes			# [y|n] do not sync atime / mtime
#no-sync-meta:		yes			# [y|n] do not sync meta-data (uid / gid / mode)
#update-only:		yes			# [y|n] update only if source mtime > dest mtime
#test:			yes			# [y|n] test only (modify nothing in dest.)

pid-path:		/var/run/ssyncd.pid	# path for pid file

log-mode:               file                    # [file|syslog|stderr] logging mode
log-path:		/var/log/ssyncd.log	# path for file based logging
log-ident:		ssyncd			# id for syslog based logging
log-level:		2	# 0 - ALL
				# 1 - TRACE
				# 2 - INFO
				# 3 - WARN
				# 4 - SEVERE
				# 5 - FATAL
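
For readers who want to inspect or generate such a file from a script, the layout above can be read with a few lines of Python. This is only a sketch of the format as it appears in the example (key, colon, value, optional # comment), not the parser ssync uses. The function name parse_conf is hypothetical, and treating # as a comment marker anywhere on a line is an assumption based on the trailing comments shown above.

```python
def parse_conf(text):
    """Parse ssyncd.conf-style text into a dict of option name -> value.

    Hypothetical helper for illustration; not part of ssync itself.
    """
    settings = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment anywhere on the line.
        uncommented = raw.split("#", 1)[0]
        # Discard all whitespace, so 'interval:   300' becomes 'interval:300'.
        line = "".join(uncommented.split())
        if not line:
            continue  # blank, comment-only, or commented-out option
        key, sep, value = line.partition(":")
        if not sep or not key:
            raise ValueError(f"malformed config line: {raw!r}")
        settings[key] = value
    return settings


sample = """\
interval:       300                 # time between sync runs in seconds
work-file:      /etc/ssyncd.work    # list of paths to synchronize
#test:          yes                 # [y|n] test only
log-level:      2   # 0 - ALL
                    # 1 - TRACE
"""

for name, value in parse_conf(sample).items():
    print(name, "=", value)
```

Values are kept as strings here; converting interval or log-level to integers is left to the caller.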

The work file just contains a list of work items, one per line, in the form:

/source/path | /destination/path

The paths can be either files or directories, and source directories are processed recursively. There is no form of substitution or environment variable parsing, and there is no facility for excluding things.

If the destination is a different type than the source (e.g. the source is a file and the destination is a directory), then the program will unlink the destination object (recursively) and re-create it as the new type. This means that if you want to sync a file into a directory, you should give the full path name of the destination, including the file name. This 'feature' might also have some disastrously unexpected effects if you specify a symlink to a directory or file as the source path and a real directory or file as the destination.

The config file parsing routines are really simple-minded and will just discard all whitespace in either config file (meaning paths with whitespace will not be parsed correctly). If this causes a lot of issues, I may refine this behavior in the future.

Here's the example ssyncd.work file:

#
# ssyncd.work:   Example work file for ssync / ssyncd
#
# Each line must be of the form:
#
#   source path | destination path
#

# Individual files
/mnt/peer/etc/aliases          | /etc/aliases
/mnt/peer/etc/group            | /etc/group
/mnt/peer/etc/group-           | /etc/group-
/mnt/peer/etc/gshadow-         | /etc/gshadow-
/mnt/peer/etc/gshadow          | /etc/gshadow
/mnt/peer/etc/passwd           | /etc/passwd
/mnt/peer/etc/passwd-          | /etc/passwd-
/mnt/peer/etc/shadow-          | /etc/shadow-
/mnt/peer/etc/shadow           | /etc/shadow

# Directory trees
/mnt/peer/etc/cron.d           | /etc/cron.d
/mnt/peer/etc/cron.daily       | /etc/cron.daily
/mnt/peer/etc/cron.monthly     | /etc/cron.monthly
/mnt/peer/etc/cron.weekly      | /etc/cron.weekly
/mnt/peer/etc/init.d           | /etc/init.d
/mnt/peer/etc/logrotate.d      | /etc/logrotate.d
/mnt/peer/etc/rc0.d            | /etc/rc0.d
/mnt/peer/etc/rc1.d            | /etc/rc1.d
/mnt/peer/etc/rc2.d            | /etc/rc2.d
/mnt/peer/etc/rc3.d            | /etc/rc3.d
/mnt/peer/etc/rc4.d            | /etc/rc4.d
/mnt/peer/etc/rc5.d            | /etc/rc5.d
/mnt/peer/etc/rc6.d            | /etc/rc6.d
/mnt/peer/etc/rcS.d            | /etc/rcS.d
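
The work file format is simple enough to check with a short script before handing it to ssyncd. The Python sketch below mimics the behavior described above, including the discarding of all whitespace, so it also shows why a path containing a space does not survive parsing. It is an illustration only, not ssync's own parser. The function name parse_work_file is hypothetical, and skipping only lines that begin with # is an assumption drawn from the example file.

```python
def parse_work_file(text):
    """Return a list of (source, destination) pairs from work-file text.

    Hypothetical helper for illustration; not part of ssync itself.
    """
    items = []
    for raw in text.splitlines():
        # Discard ALL whitespace, as the description of the parser says.
        line = "".join(raw.split())
        # Assumption: only lines starting with '#' are comments.
        if not line or line.startswith("#"):
            continue
        src, sep, dst = line.partition("|")
        if not sep or not src or not dst:
            raise ValueError(f"malformed work item: {raw!r}")
        items.append((src, dst))
    return items


sample = """\
# Individual files
/mnt/peer/etc/aliases          | /etc/aliases

# Directory trees
/mnt/peer/etc/cron.d           | /etc/cron.d

# A path with a space loses it: '/mnt/peer/my docs' becomes '/mnt/peer/mydocs'
/mnt/peer/my docs              | /srv/docs
"""

for src, dst in parse_work_file(sample):
    print(src, "->", dst)
```

Running a check like this over a work file is a cheap way to spot entries whose paths would be silently altered by whitespace removal.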

mwshaffer@yahoo.com