SAX

SAX stands for Simple API for XML, and was originally a Java API for reading XML. (Full details at http://saxproject.org). SAX implementations exist for most common modern computer languages.

FoX includes a SAX implementation, which translates most of the Java API into Fortran, and makes it accessible to Fortran programs, enabling them to read in XML documents in a fashion as close and familiar as possible to other languages.

SAX is a stream-based, event callback API. Conceptually, running a SAX parser over a document causes the parser to generate events as it encounters different XML components, and to send those events to the main program, which can read them and take suitable action.

Events

Events are generated when the parser encounters, for example, an element opening tag, or some text, and most events carry some data with them - the name of the tag, or the contents of the text.

The full list of events is quite extensive, and may be seen below. Most users, though, are unlikely to need more than the 5 most common events, documented here.

Given these events and accompanying information, a program can extract data from an XML document.

Invoking the parser.

Any program using the FoX SAX parser must a) use the FoX module, and b) declare a derived type variable to hold the parser, like so:

   use FoX_sax
   type(xml_t) :: xp

The FoX SAX parser then works by requiring the programmer to write a module containing subroutines to receive any of the events they are interested in, and passing these subroutines to the parser.

Firstly, the parser must be initialized, by passing it XML data. This can be done either by giving a filename, which the parser will open and read, or by passing a string containing an XML document. Thus:

  call open_xml_file(xp, "input.xml", iostat)

The iostat variable will report back any errors in opening the file.

Alternatively,

  call open_xml_string(xp, XMLstring)

where XMLstring is a character variable.

To now run the parser over the file, you simply do:

 call parse(xp, list_of_event_handlers)

And once you're finished, you can close the file, and clean up the parser, with:

 call close_xml_t(xp)
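As a rough sketch of the same open, parse, close sequence applied to a string rather than a file, something like the following could be used. The module name string_demo_handlers and the sample document are hypothetical, chosen only for illustration; the handler itself follows the form described under "Receiving events" below.

```fortran
module string_demo_handlers
  implicit none
contains

  ! Called by the parser for each chunk of character data.
  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars

    print*, chars
  end subroutine characters_handler
end module string_demo_handlers

program parse_from_string
  use FoX_sax
  use string_demo_handlers
  implicit none
  type(xml_t) :: xp
  ! Hypothetical sample document held in a character variable.
  character(len=*), parameter :: XMLstring = "<greeting>Hello</greeting>"

  call open_xml_string(xp, XMLstring)
  call parse(xp, characters_handler=characters_handler)
  call close_xml_t(xp)
end program parse_from_string
```

The only difference from the file-based sequence is the call used to initialize xp.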

Receiving events

To receive events, you must construct a module containing event handling subroutines. These are subroutines of a prescribed form - the input & output is predetermined by the requirements of the SAX interface, but the body of the subroutine is up to you.

The required forms are shown in the API documentation below, but here are some simple examples.

To receive notification of character events, you must write a subroutine which takes as input one string, which will contain the characters received. So:

module event_handling
  use FoX_sax
contains

  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars

    print*, chars
  end subroutine
end module

That does very little - it simply prints out the data it receives. However, since the subroutine is in a module, you can save the data to a module variable, and manipulate it elsewhere; alternatively you can choose to call other subroutines based on the input.
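To illustrate saving data to a module variable, here is a minimal sketch that accumulates the text it receives in a fixed-size buffer. The names text_collector, collected, ncollected and maxlen are assumptions made for this example, and the main program calls the handler directly to stand in for the parser, so the sketch compiles without FoX.

```fortran
module text_collector
  implicit none
  private
  public :: characters_handler, collected, ncollected

  integer, parameter :: maxlen = 1024      ! hypothetical buffer size
  character(len=maxlen) :: collected = ""  ! text gathered so far
  integer :: ncollected = 0                ! number of characters stored

contains

  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars
    integer :: n

    ! Append as much of chars as still fits in the buffer.
    n = min(len(chars), maxlen - ncollected)
    if (n > 0) then
      collected(ncollected+1:ncollected+n) = chars(1:n)
      ncollected = ncollected + n
    end if
  end subroutine characters_handler

end module text_collector

program collect_demo
  use text_collector
  implicit none

  ! Simulate two consecutive character events.
  call characters_handler("Hello, ")
  call characters_handler("world")

  print "(a)", collected(1:ncollected)   ! prints: Hello, world
end program collect_demo
```

In a real program the two direct calls would be replaced by a call to parse with characters_handler=characters_handler, and the collected text would be inspected after parsing has finished.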

So, a complete program which reads in all the text from an XML document looks like this:

module event_handling
  use FoX_sax
contains

  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars

    print*, chars
  end subroutine
end module

program XMLreader
  use FoX_sax
  use event_handling
  type(xml_t) :: xp
  call open_xml_file(xp, 'input.xml')
  call parse(xp, characters_handler=characters_handler)
  call close_xml_t(xp)
end program
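The program above does not check whether the file could be opened. A variant that does, assuming the event_handling module shown above and using the iostat argument introduced earlier (a value of 0 indicates success), might look like this. The program name and the message text are illustrative only.

```fortran
program XMLreader_checked
  use FoX_sax
  use event_handling
  implicit none
  type(xml_t) :: xp
  integer :: iostat

  call open_xml_file(xp, 'input.xml', iostat)
  if (iostat /= 0) then
    ! The file could not be opened, so there is nothing to parse.
    print*, "Could not open input.xml, iostat = ", iostat
    stop
  end if

  call parse(xp, characters_handler=characters_handler)
  call close_xml_t(xp)
end program XMLreader_checked
```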

Attribute dictionaries.

The other event you are most likely to need is the startElement event. Handling this involves writing a subroutine which takes as input three strings (the namespace URI, local name, and fully qualified name of the tag, in that order) and a dictionary of attributes.

An attribute dictionary is essentially a set of key:value pairs - where the key is the attribute's name, and the value is its value. (When considering namespaces, each attribute also has a URI and localName.)

Full details of all the dictionary-manipulation routines are given in AttributeDictionaries(AttributeDictionaries.html), but here we shall show the most common.

So, a simple subroutine to receive a startElement event would look like:

module event_handling
  use FoX_sax   ! provides dictionary_t and the dictionary routines used below
contains

  subroutine startElement_handler(URI, localname, name, attributes)
    character(len=*), intent(in)   :: URI
    character(len=*), intent(in)   :: localname
    character(len=*), intent(in)   :: name
    type(dictionary_t), intent(in) :: attributes

    integer :: i

    ! Print the qualified name of the element ...
    print*, name

    ! ... followed by each of its attributes as key=value
    do i = 1, len(attributes)
      print*, getKey(attributes, i), '=', getValue(attributes, i)
    enddo

  end subroutine startElement_handler
end module

program XMLreader
 use FoX_sax
 use event_handling
 type(xml_t) :: xp
 call open_xml_file(xp, 'input.xml')
 call parse(xp, startElement_handler=startElement_handler)
 call close_xml_t(xp)
end program

Again, this does nothing but print out the name of the element, and the names and values of all of its attributes. However, by using module variables, or calling other subroutines, the data could be manipulated further.
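As a sketch of the module-variable approach, the handler below counts how many times a particular element occurs instead of printing anything. The module name element_counter, the variable n_matches and the element name "item" are hypothetical, chosen only for illustration; the handler signature and the parser calls are those shown above.

```fortran
module element_counter
  use FoX_sax
  implicit none

  character(len=*), parameter :: target_name = "item"  ! hypothetical element name
  integer :: n_matches = 0                              ! updated by the handler

contains

  subroutine startElement_handler(URI, localname, name, attributes)
    character(len=*), intent(in)   :: URI
    character(len=*), intent(in)   :: localname
    character(len=*), intent(in)   :: name
    type(dictionary_t), intent(in) :: attributes

    ! Only the qualified name is inspected; the other arguments are unused here.
    if (name == target_name) n_matches = n_matches + 1
  end subroutine startElement_handler

end module element_counter

program count_items
  use FoX_sax
  use element_counter
  implicit none
  type(xml_t) :: xp

  call open_xml_file(xp, 'input.xml')
  call parse(xp, startElement_handler=startElement_handler)
  call close_xml_t(xp)

  print*, "Number of ", target_name, " elements: ", n_matches
end program count_items
```

Because n_matches lives in the module, it remains available to the main program after parse returns.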

Error handling

The SAX parser detects all XML well-formedness errors. By default, when it encounters an error, it will simply halt the program with a suitable error message. However, it is possible to pass in an error handling subroutine if some other behaviour is desired - for example it may be nice to report the error to the user, and carry on with some other task.

In any case, once an error is encountered, the parser will finish. There is no way to continue reading past an error.

An error handling subroutine works in the same way as any other event handler, with the event data being an error message. Thus, you could write:

subroutine error_handler(msg)
  character(len=*), intent(in) :: msg

  print*, "The SAX parser encountered an error:"
  print*, msg
  print*, "Never mind, carrying on with the rest of the calculation."
end subroutine
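If the rest of the program needs to know that parsing failed, the handler can record the fact in module variables rather than only printing. The following is a minimal sketch under that assumption; the names error_state, parse_failed and last_error are hypothetical, and the handler is called directly here as a stand-in for the parser, so the sketch compiles without FoX.

```fortran
module error_state
  implicit none
  logical :: parse_failed = .false.       ! set once an error has been reported
  character(len=256) :: last_error = ""   ! message, truncated to 256 characters

contains

  subroutine error_handler(msg)
    character(len=*), intent(in) :: msg

    parse_failed = .true.
    last_error = msg
  end subroutine error_handler

end module error_state

program error_demo
  use error_state
  implicit none

  ! Stand-in for the parser reporting a problem.
  call error_handler("example error message")

  if (parse_failed) then
    print "(a)", "The SAX parser encountered an error: " // trim(last_error)
    print "(a)", "Carrying on with default settings."
  end if
end program error_demo
```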

Full API

Derived types

There is one derived type, xml_t. This is entirely opaque, and is used as a handle for the parser.

Subroutines

There are four subroutines: open_xml_file, open_xml_string, close_xml_t, and parse.

open_xml_file: This opens a file. xp is initialized, and prepared for parsing. string must contain the name of the file to be opened. iostat reports on the success of opening the file. A value of 0 indicates success.

open_xml_string: This initializes xp from a character variable containing an XML document, instead of from a file.

close_xml_t: This closes down the parser (and closes the file, if input was coming from a file). xp is left uninitialized, ready to be used again if necessary.

parse: This runs the parser over the input, calling whichever event handlers were passed in.

(Advanced: By default, this will be done in a non-validating way, testing only for well-formedness errors. However, if validate is set to true, FoX will attempt to diagnose validation errors. Note that FoX is not a full validating parser, and will not read external entities, so do not rely on this behaviour.)

The full list of event handlers is in the next section. To use them, the interface must be placed in a module, and the body of the subroutine filled in as desired; then it should be specified as an argument to parse as:
name_of_event_handler = name_of_user_written_subroutine
Thus a typical call to parse might look something like:

  call parse(xp, startElement_handler = mystartelement, endElement_handler = myendelement, characters_handler = mychars)

where mystartelement, myendelement, and mychars are all subroutines written by you according to the interfaces listed below.


Callbacks.

All of the callbacks specified by SAX 2 are implemented. Documentation of the SAX 2 interfaces is available in the JavaDoc at http://saxproject.org, but as the interfaces needed adjustment for Fortran, they are listed here.

For documentation on the meaning of the callbacks and of their arguments, please refer to the Java SAX documentation.

characters_handler: Triggered when some character data is read from between tags.

NB All character data is reported, including whitespace. Thus you will probably get a lot of characters events containing nothing but whitespace in a typical XML document.

NB Note also that it is not required that large chunks of character data all come as one event - they may come as multiple consecutive events.

endDocument_handler: Triggered when the parser reaches the end of the document.

endElement_handler: Triggered by a closing tag.

endPrefixMapping_handler: Triggered when a namespace prefix mapping goes out of scope.

ignorableWhitespace_handler: Triggered when whitespace is encountered within an element declared as EMPTY. (Only active in validating mode.)

processingInstruction_handler: Triggered by a Processing Instruction.

skippedEntity_handler: Triggered when either an external entity, or an undeclared entity, is skipped.

startDocument_handler: Triggered when the parser starts reading the document.

startElement_handler: Triggered when an opening tag is encountered. (See AttributeDictionaries for documentation on handling attribute dictionaries.)

startPrefixMapping_handler: Triggered when a namespace prefix mapping starts.

notationDecl_handler: Triggered when a NOTATION declaration is made in the DTD.

unparsedEntityDecl_handler: Triggered when an unparsed entity is declared.

error_handler: Triggered when a normal parsing error is encountered. Parsing will cease after this event.

fatalError_handler: Triggered when a fatal parsing error is encountered. Parsing will cease after this event.

warning_handler: Triggered when a parser warning is generated. Parsing will continue after this event.

attributeDecl_handler: Triggered when an attribute declaration is encountered in the DTD.

elementDecl_handler: Triggered when an element declaration is encountered in the DTD.

externalEntityDecl_handler: Triggered when a parsed external entity is declared in the DTD.

internalEntityDecl_handler: Triggered when an internal entity is declared in the DTD.

comment_handler: Triggered when a comment is encountered.

endCdata_handler: Triggered by the end of a CData section.

endDTD_handler: Triggered by the end of a DTD.

endEntity_handler: Triggered at the end of entity expansion.

startCdata_handler: Triggered by the start of a CData section.

startDTD_handler: Triggered by the start of a DTD section.

startEntity_handler: Triggered by the start of entity expansion.
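Since character events are delivered for whitespace as well as for real text, a characters handler will often want to skip chunks that contain nothing else. The following is a minimal sketch of one way to do that with the standard verify intrinsic. The names text_filter, ws and n_text_events are assumptions made for this example, and the main program calls the handler directly in place of the parser, so the sketch compiles without FoX.

```fortran
module text_filter
  implicit none

  ! Space, tab, line feed and carriage return.
  character(len=*), parameter :: ws = " " // achar(9) // achar(10) // achar(13)
  integer :: n_text_events = 0   ! number of chunks that were not pure whitespace

contains

  subroutine characters_handler(chars)
    character(len=*), intent(in) :: chars

    ! verify returns 0 when every character of chars appears in ws,
    ! which is also the case for a zero-length chunk.
    if (verify(chars, ws) == 0) return

    n_text_events = n_text_events + 1
    print "(a)", chars
  end subroutine characters_handler

end module text_filter

program filter_demo
  use text_filter
  implicit none

  ! Simulate three character events; only the second carries real text.
  call characters_handler(achar(10) // "   ")
  call characters_handler("some text")
  call characters_handler("  ")

  print "(a,i0)", "non-blank events: ", n_text_events
end program filter_demo
```

Run as written, this prints the single non-blank chunk followed by a count of 1. Note that a chunk is only skipped when it is entirely whitespace; leading or trailing whitespace inside a chunk of real text is left alone.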


Exceptions.

Although FoX tries very hard to work to the letter of the XML and SAX standards, it falls short in a few areas.

(This includes non-ASCII characters present only by character reference.)

It will, however, happily accept documents labelled as UTF-8 encoded.

Beyond this, any aspects of XML and SAX which FoX fails to do justice to are bugs.

Note that (as permissible within XML) FoX acts primarily as a non-validating parser, and thus all constraints marked as Validity Constraints by XML-1.0/1.1 are ignored by default. A subset of them will be picked up by FoX's validation mode, but only a small subset.

Note also that FoX will not read external entities when processing an XML document.


What of Java SAX 2 is not included in FoX?

The differences between Java & Fortran mean that none of the SAX APIs can be copied directly. However, FoX offers data types, subroutines, and interfaces covering a large proportion of the facilities offered by SAX. Where it does not, this is mentioned here.

org.xml.sax:

org.xml.sax.ext:

org.xml.sax.helpers: