LinkChecker
To check a URL like http://www.example.org/ it is enough to type linkchecker www.example.org/ on the command line or to type www.example.org in the GUI application. This will check the complete domain of http://www.example.org recursively. All links pointing outside of the domain are also checked for validity.
All URLs have to pass a preliminary syntax test. Minor quoting mistakes will issue a warning; all other invalid syntax issues are errors. After the syntax check passes, the URL is queued for connection checking. All connection check types are described below.
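As a rough illustration of such a preliminary syntax test (not LinkChecker's actual parser), the following Python sketch uses the standard urllib.parse module to reject a URL without a scheme and to warn about unquoted characters in the path; the function name and messages are made up for the example.

```python
# A minimal sketch of a preliminary URL syntax test using only the
# standard library. Illustrative only, not LinkChecker's parser.
from urllib.parse import urlsplit, quote

def syntax_check(url: str) -> str:
    parts = urlsplit(url)
    if not parts.scheme:
        # Missing scheme: invalid syntax, reported as an error.
        return "error: missing URL scheme"
    if parts.path != quote(parts.path, safe="/%"):
        # Minor quoting problems only produce a warning.
        return "warning: path characters should be quoted"
    return "ok"

print(syntax_check("http://www.example.org/"))     # ok
print(syntax_check("http://www.example.org/a b"))  # warning
```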
HTTP links (http:, https:)
After connecting to the given HTTP server the given path or query is requested. All redirections are followed, and if user/password is given it will be used as authorization when necessary. Permanently moved pages issue a warning. All final HTTP status codes other than 2xx are errors.
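A compressed version of this logic might look like the following Python sketch. It assumes the third-party requests package; the function name and printed messages are illustrative only.

```python
# A minimal sketch of an HTTP link check. Assumes the third-party
# "requests" package; not LinkChecker's actual implementation.
import requests

def check_http_link(url: str, auth=None) -> None:
    # Follow all redirections; pass user/password as authorization if given.
    response = requests.get(url, auth=auth, allow_redirects=True, timeout=30)
    if any(r.status_code == 301 for r in response.history):
        print("warning: page was permanently moved")
    if 200 <= response.status_code < 300:
        print("ok")
    else:
        # Any final status code outside the 2xx range is an error.
        print(f"error: HTTP status {response.status_code}")

check_http_link("http://www.example.org/")
```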
Local files (file:)
A regular, readable file that can be opened is valid. A readable directory is also valid. All other files, for example device files, unreadable files, or non-existing files, are errors. File contents are checked for recursion.
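For example, this validity rule for file: URLs could be approximated with the standard os module as below; the helper name and example path are hypothetical.

```python
# A minimal sketch of the file: check using the standard library.
import os

def check_file_link(path: str) -> None:
    if os.path.isdir(path) and os.access(path, os.R_OK):
        print("ok: readable directory")
    elif os.path.isfile(path) and os.access(path, os.R_OK):
        print("ok: readable regular file")
    else:
        # Device files, unreadable or non-existing files are errors.
        print("error: not a readable file or directory")

check_file_link("/etc/hosts")  # example path
```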
Mail links (mailto:)
A mailto: link eventually resolves to a list of email addresses. If one address fails, the whole list will fail. For each mail address the following things are checked (a code sketch follows below):

1. The address syntax, both of the part before and after the @ sign.
2. The MX DNS records of the mail domain. If no MX record is found, an error is printed.
3. Whether one of the mail hosts accepts an SMTP connection. Hosts with higher priority are checked first. If no host accepts SMTP, a warning is printed.
4. Verification of the address with the VRFY command. If an answer is received, the verified address is printed as an info.
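A rough Python sketch of these checks is shown below. It assumes the third-party dnspython package for the MX lookup and uses the standard smtplib module; many mail servers disable VRFY, so the answer is informational only, and the address is a placeholder.

```python
# A minimal sketch of the mailto: checks. Assumes the third-party
# dnspython package (pip install dnspython); not LinkChecker's code.
import smtplib
import dns.resolver

def check_mail_address(address: str) -> None:
    local, _, domain = address.partition("@")
    if not local or not domain:
        print("error: invalid address syntax")
        return
    # Look up the MX DNS records; finding none is an error.
    records = sorted(dns.resolver.resolve(domain, "MX"),
                     key=lambda r: r.preference)
    if not records:
        print("error: no MX record found")
        return
    # Try the mail hosts, highest priority (lowest preference value) first.
    for record in records:
        host = str(record.exchange).rstrip(".")
        try:
            with smtplib.SMTP(host, timeout=10) as smtp:
                # Try to verify the address with the VRFY command.
                code, message = smtp.verify(address)
                print(f"info: {host} answered {code} {message!r}")
                return
        except OSError:
            continue
    print("warning: no mail host accepted an SMTP connection")

check_mail_address("postmaster@example.org")  # placeholder address
```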
FTP links (ftp:)
For FTP links the following is checked (a code sketch follows below):

1. A connection to the given host is made.
2. A login with the given user and password is tried. The default user is anonymous, the default password is anonymous@.
3. A change into the given directory is tried.
4. The file is listed with the NLST command.
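Sketched with the standard ftplib module (illustrative only, not LinkChecker's implementation), the check could look like this; the host and path in the example are placeholders.

```python
# A minimal sketch of the FTP checks using the standard ftplib module.
from ftplib import FTP, error_perm

def check_ftp_link(host, path, user="anonymous", password="anonymous@"):
    with FTP(host, timeout=30) as ftp:      # 1. connect to the given host
        ftp.login(user, password)           # 2. login (anonymous by default)
        try:
            ftp.cwd(path)                   # 3. change to the given directory
            print("ok: directory exists")
        except error_perm:
            try:
                names = ftp.nlst(path)      # 4. list the file with NLST
                print("ok" if names else "error: empty listing")
            except error_perm as exc:
                print(f"error: {exc}")

check_ftp_link("ftp.gnu.org", "/gnu")       # placeholder host and path
```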
Telnet links (telnet:)
A connect is tried, and if user/password are given, a login to the given telnet server is attempted.
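Because the standard telnetlib module has been removed from recent Python releases, the sketch below only shows the connect step with the plain socket module; a login exchange would follow if credentials are given. The host name is a placeholder.

```python
# A minimal sketch of the telnet: connect check using a plain socket.
import socket

def check_telnet_link(host: str, port: int = 23) -> None:
    try:
        with socket.create_connection((host, port), timeout=10):
            print("ok: telnet server accepted the connection")
            # A login attempt would follow here if user/password are given.
    except OSError as exc:
        print(f"error: {exc}")

check_telnet_link("telnet.example.org")  # placeholder host
```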
NNTP links (news:, snews:, nntp:)
A connect to the given NNTP server is tried. If a news group or article is specified, it will be requested from the server.
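Since the standard nntplib module has also been removed from recent Python releases, the sketch below talks NNTP over a raw socket: it checks the server greeting and, if a group is given, issues a GROUP command (a 211 reply means the group exists). The server and group names are placeholders.

```python
# A minimal sketch of the NNTP check over a raw socket.
# Illustrative only, not LinkChecker's implementation.
import socket

def check_nntp_link(host: str, group: str = "", port: int = 119) -> None:
    with socket.create_connection((host, port), timeout=10) as sock:
        reader = sock.makefile("rb")
        greeting = reader.readline().decode("ascii", "replace").strip()
        if not greeting.startswith(("200", "201")):
            print(f"error: unexpected greeting {greeting!r}")
            return
        if group:
            # Request the news group; a 211 reply means it exists.
            sock.sendall(f"GROUP {group}\r\n".encode("ascii"))
            reply = reader.readline().decode("ascii", "replace").strip()
            print("ok" if reply.startswith("211") else f"error: {reply}")
        else:
            print("ok: server greeted with " + greeting)

check_nntp_link("news.example.org", "comp.lang.python")  # placeholders
```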
Ignored links (javascript:, etc.)
An ignored link will print a warning, but no error. No further checking will be made.
Here is the complete list of recognized, but ignored links. The most prominent of them are JavaScript links.
- acap: (application configuration access protocol)
- afs: (Andrew File System global file names)
- chrome: (Mozilla specific)
- cid: (content identifier)
- clsid: (Microsoft specific)
- data: (data)
- dav: (dav)
- fax: (fax)
- find: (Mozilla specific)
- gopher: (Gopher)
- imap: (internet message access protocol)
- irc: (internet relay chat)
- isbn: (ISBN (int. book numbers))
- javascript: (JavaScript)
- ldap: (Lightweight Directory Access Protocol)
- mailserver: (Access to data available from mail servers)
- mid: (message identifier)
- mms: (multimedia stream)
- modem: (modem)
- nfs: (network file system protocol)
- opaquelocktoken: (opaquelocktoken)
- pop: (Post Office Protocol v3)
- prospero: (Prospero Directory Service)
- rsync: (rsync protocol)
- rtsp: (real time streaming protocol)
- service: (service location)
- shttp: (secure HTTP)
- sip: (session initiation protocol)
- tel: (telephone)
- tip: (Transaction Internet Protocol)
- tn3270: (Interactive 3270 emulation sessions)
- vemmi: (versatile multimedia interface)
- wais: (Wide Area Information Servers)
- z39.50r: (Z39.50 Retrieval)
- z39.50s: (Z39.50 Session)

Before descending recursively into a URL, it has to fulfill several conditions. The conditions are checked in this order:
- The maximum recursion level must not be exceeded. It is configured with the --recursion-level command line option, the recursion level GUI option, or through the configuration file. The recursion level is unlimited by default.
- The URL must not match the ignored URL list. This is controlled with the --ignore-url command line option or through the configuration file (see the sketch below).

Note that the local and FTP directory recursion reads all files in that directory, not just a subset like index.htm*.
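The two configurable conditions above can be pictured with a small Python helper. It is hypothetical, not LinkChecker's code: a negative level stands in for the unlimited default, and the ignore list is treated here as regular expressions matched against the URL.

```python
# A hypothetical sketch of the two configurable recursion conditions.
import re

def may_recurse(url, level, max_level=-1, ignore_patterns=()):
    # The maximum recursion level must not be exceeded
    # (a negative max_level models the unlimited default).
    if max_level >= 0 and level > max_level:
        return False
    # The URL must not match any entry of the ignored URL list.
    return not any(re.search(pattern, url) for pattern in ignore_patterns)

print(may_recurse("http://www.example.org/", 1))                # True
print(may_recurse("mailto:user@example.org", 1,
                  ignore_patterns=(r"^mailto:",)))              # False
```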