This manual is for liblouisxml (version 1.8.0, 26 January 2009), an xml to Braille Translation Library.
This file may contain code borrowed from the Linux screenreader BRLTTY, Copyright © 1999-2006 by the BRLTTY Team.
Copyright © 2004-2007 ViewPlus Technologies, Inc. www.viewplus.com and Copyright © 2007,2008 JJB Software, Inc. www.jjb-software.com.
This file is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser (or library) General Public License (LGPL) as published by the Free Software Foundation; either version 3, or (at your option) any later version. This file is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser (or Library) General Public License (LGPL) for more details.
You should have received a copy of the GNU Lesser (or Library) General Public License (LGPL) along with this program; see the file COPYING. If not, write to the Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
liblouisxml is a software component which can be incorporated into software packages to provide the capability of translating any file in the computer lingua franca xml format into properly transcribed braille. This includes translation into grade two, if desired, mathematical codes, etc. It also includes formatting according to a built-in style sheet which can be modified by the user. The first program into which liblouisxml has been incorporated is xml2brl. This program will translate an xml or text file into an embosser-ready braille file. It is not necessary to know xml, because MSWord and other word processors can export files in this format. If the word processor has been used correctly xml2brl will produce an excellent braille file.
There is a Mac GUI application incorporating liblouisxml called louis. For a link to it go to www.jjb-software.com/downloads. A similar Windows application is in the works.
Computer programmers who wish to use liblouisxml in their software can find the information they need in Programming with liblouisxml. Those who wish to change the output generated by liblouisxml should read Customization: Configuring liblouisxml. If you encounter a type of xml file with which liblouisxml is not familiar, you can learn how to tell it how to process that file by reading Connecting with the xml Document. Finally, if you wish to implement a new braille mathematics code, read Implementing Braille Mathematics Codes.
You will also find it advantageous to be acquainted with the companion library liblouis, which is a braille translator and back-translator (see Overview).
Liblouisxml may contain code borrowed from the Linux screenreader BRLTTY, Copyright © 1999-2006 by the BRLTTY Team.
Copyright © 2004-2007 ViewPlus Technologies, Inc. www.viewplus.com.
Copyright © 2007,2008 JJB Software, Inc. www.jjb-software.com.
Liblouisxml is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
Liblouisxml is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with Liblouis. If not, see http://www.gnu.org/licenses/.
liblouisxml is an "extensible renderer", designed to translate a wide variety of xml documents into braille, but with a special emphasis on technical material. The overall operation of liblouisxml is controlled by a configuration file. The way in which a particular type of xml document is to be rendered is specified by a semantic-action file for that document type. Braille translation is done by the liblouis braille translation and back-translation library (see Overview). Its operation, in turn, is controlled by translation table files. All these files are plain text and can be created and edited in any text editor. Configuration settings can also be specified on the command line of the console-mode transcription program xml2brl.
The general operation of liblouisxml is as follows. It uses the libxml2 library to construct a parse tree of the xml document. After the parse tree is constructed, a function called examine_document looks it over and determines whether math translation tables, etc. are needed. examine_document also constructs a prototype semantic-action file, if one does not exist already. When it is finished, another function, called transcribe_document, does the actual braille transcription. It calls transcribe_math to handle MathML subtrees, transcribe_chemistry for chemical formula subtrees, transcribe_graphic for SVG graphics, etc. Entities are translated to Unicode, if they are not already. Sequences of symbols indicate superscripts, return to the baseline, subscripts, start and end of fractions, etc. The Braille translator and back-translator library liblouis is used to do the braille translation.
The transcribe_math function works in conjunction with the latest version of liblouis and a special math translation table to transcribe most mathematical expressions into fairly good Nemeth Code. Much refinement is still necessary. Other braille mathematical codes can be handled by modifying the translation table.
The functions which are not needed at the moment, such as transcribe_chemistry, are only skeletons. However, I hope that transcribe_graphics can be expanded in the near future to use the graphics capability of the Tiger tactile graphics embossers.
The latest versions of liblouisxml and liblouis can be downloaded from www.jjb-software.com. Note that liblouisxml will only work with the latest version of liblouis.
liblouisxml can be compiled to use either 16-bit or 32-bit Unicode internally. This is inherited from liblouis, so liblouis must be compiled first and then liblouisxml. Wherever 16 bits are mentioned in this document, read 32 if you have compiled the library for 32 bits.
As stated in the previous section, liblouisxml uses three kinds of files: configuration files, semantic-action files, and liblouis translation tables. The first two are discussed later in this documentation. liblouis translation tables are discussed in the liblouis guide (see Overview), which is distributed with liblouis. These files can be placed on various paths, which are determined at compile time. One of these paths should be to the lbx_files directory provided by liblouisxml, which contains the principal configuration file (canonical.cfg) and the semantic-action files. Another should be to the tables directory in the liblouis distribution. Note that liblouisxml also generates some files, all of which are placed in the current directory. These files are new prototype semantic-action files, additions to old semantic-action files, temporary files, and log files. The first two can be used to extend the capability of liblouisxml to process xml documents. The latter two are useful for debugging.
Paths are set by changing a few lines of code in the paths.c module. If you are preparing liblouisxml for Windows a function which finds the name of the "Program Files" directory for your locale is called automatically. You can then modify the line containing the term ‘yourSubDir’ as needed.
If you are preparing liblouisxml for a Unix-type system, look for the line that says ‘Set Unix Paths’. The following three lines establish a path to the lbx_files directory in your home directory. As distributed, this directory contains the semantic-action files and some configuration files. You can choose to copy the tables from the liblouis distribution into it as well, or you can modify the following three lines to point to the actual location of the tables. You can also choose to place both the lbx_files and the tables directory in /etc.
The function addPath takes care of adding a path to liblouisxml properly. You can specify many more than two paths.
char *lbx_version (void)
This function returns a pointer to a character string containing the version of liblouisxml, plus other information such as the release date and perhaps notable changes.
void * lbx_initialize ( const char *const configFilelist, const char const *logFileName, const char *const settingsString)
This function initializes the libxml2 library, processes canonical.cfg and configuration settings given in settingsString and the configuration files given in configFilelist. configFilelist is a list of configuration file names separated by commas. If the first character is a comma it is taken to be a string containing configuration settings and is processed like the settingsString string. Such a string must conform to the format of a configuration file. Newlines should be represented with ASCII 10. If logFileName is not null, a log file is produced in the current directory. If it is null, any messages are printed on stderr. The function returns a pointer to the UserData structure. This pointer has type void * and must be cast to (UserData *) in the calling program. To access the information in this structure you must include louisxml.h. This function is used by xml2brl.
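The rule for telling a file list from a settings string can be sketched in a few lines of C. This is only an illustration of the convention described above, not code from the library; the function names is_settings_string and count_config_files are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if configFileList is to be read as a settings string
   (its first character is a comma), 0 if it is a list of file names. */
int
is_settings_string (const char *configFileList)
{
  assert (configFileList != NULL);
  return configFileList[0] == ',';
}

/* Counts the comma-separated file names in a list such as
   "default.cfg,mystyles.cfg". Returns 0 for a settings string or an
   empty list. */
int
count_config_files (const char *configFileList)
{
  int count = 1;
  if (is_settings_string (configFileList) || configFileList[0] == '\0')
    return 0;
  for (; *configFileList != '\0'; configFileList++)
    if (*configFileList == ',')
      count++;
  return count;
}
```

A caller could use such a check to decide whether the first argument names files to be opened or carries the settings themselves.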
int lbx_translateString ( const char *const configfilelist, char * inbuf, widechar *outbuf, int *outlen, unsigned int mode)
This function takes a well-formed xml expression in inbuf and translates it into a string of 16-bit (or 32-bit if this has been specified in liblouis) braille characters in outbuf. The xml expression must be immediately followed by a zero or null byte. Leading whitespace is ignored. If it does not then begin with the characters ‘<?xml’ an xml header is added. If it does not begin with ‘<’ it is assumed to be a text string and is translated accordingly. The header is specified by the xmlHeader line in the configuration file. If no such line is present, a default header specifying UTF-8 encoding is used. The mode parameter specifies whether you want the library to be initialized. If it is 0 everything is reset, the canonical.cfg file is processed and the configuration file and/or string (see previous section) are processed. If mode is 1 liblouisxml simply prepares to handle a new document. For more on the mode parameter see the next section.
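The way the contents of inbuf are classified, as described above, can be sketched as follows. This is an illustration of the stated rules only, not the library's own code; the enumeration InputKind and the function classify_input are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical labels for the three cases described in the text. */
typedef enum
{
  INPUT_XML_WITH_HEADER,  /* begins with "<?xml": used as it stands */
  INPUT_XML_NEEDS_HEADER, /* begins with '<': an xml header is added */
  INPUT_PLAIN_TEXT        /* anything else: treated as a text string */
} InputKind;

/* Classify a null-terminated buffer after skipping leading whitespace. */
InputKind
classify_input (const char *inbuf)
{
  assert (inbuf != NULL);
  while (*inbuf != '\0' && isspace ((unsigned char) *inbuf))
    inbuf++;
  if (strncmp (inbuf, "<?xml", 5) == 0)
    return INPUT_XML_WITH_HEADER;
  if (*inbuf == '<')
    return INPUT_XML_NEEDS_HEADER;
  return INPUT_PLAIN_TEXT;
}
```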
Which 16-bit character in outbuf represents which dot pattern is indicated in the liblouis translation tables. The configfilelist parameter points to a configuration file or string. Among other things, this file specifies translation tables. It is these tables which control just how the translation is made, whether in Grade 2, Grade 1, the Nemeth Code of Braille Mathematics or something else.
Note that the outlen parameter is a pointer to an integer. When the function is called, this integer contains the maximum output length. When it returns, it is set to the actual length used. The function returns 1 if no errors were encountered and a negative number if a complete translation could not be done.
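The in/out convention for outlen can be illustrated with a stand-in function. The function fake_translate below is hypothetical and does no braille translation at all; it merely copies characters, so as to show how the caller sets the integer to the capacity of outbuf before the call and reads the length actually used afterwards. The widechar typedef is an assumption standing in for the liblouis definition in a 16-bit build.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned short widechar; /* assumption: 16-bit build */

/* Stand-in with the same in/out length convention as described above.
   On entry *outlen is the capacity of outbuf; on return it is the
   number of cells written. Returns 1 on success and -1 if the output
   did not fit. */
int
fake_translate (const char *inbuf, widechar *outbuf, int *outlen)
{
  int capacity, used = 0;
  assert (inbuf != NULL && outbuf != NULL && outlen != NULL);
  capacity = *outlen;
  while (inbuf[used] != '\0')
    {
      if (used >= capacity)
        {
          *outlen = used;
          return -1; /* incomplete "translation" */
        }
      outbuf[used] = (widechar) (unsigned char) inbuf[used];
      used++;
    }
  *outlen = used;
  return 1;
}
```

Because the integer is overwritten on return, a caller that reuses the same buffer for several calls has to reset it to the buffer capacity before each call.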
int lbx_translateFile ( char *configfilelist, char *inputFileName, char *outputFileName, unsigned int mode)
This function accepts a well-formed xml document in inputFileName and produces a braille translation in outputFileName. As for lbx_translateString, the mode parameter specifies whether the library is to be initialized with new configuration information or simply prepared to handle a new document. In addition, the mode parameter can specify that a document is in html, not xhtml. liblouisxml.h contains an enumeration type with the values dontInit and htmlDoc. These can be combined with an or (‘|’) operator. The input file is assumed to be encoded in UTF-8, unless otherwise specified in the xml header. The encoding of the output file may be UTF-8, UTF-16, UTF-32 or Ascii8. This is specified by the outputEncoding line in the configuration file, configfilelist. The function returns 1 if the translation was successful.
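Combining and testing the mode values might look like the sketch below. The enumeration here is a hypothetical stand-in, not the one in liblouisxml.h: the value 1 for the "do not initialize" flag is inferred from the description of mode 1 above, and the value 2 for the html flag is purely an assumption made so that the example is self-contained.

```c
#include <assert.h>

/* Hypothetical stand-in for the enumeration in liblouisxml.h. Both
   numeric values are assumptions made for this sketch. */
typedef enum
{
  sketch_dontInit = 1,
  sketch_htmlDoc = 2
} SketchMode;

/* Returns 1 if the library would be fully re-initialized for this mode. */
int
mode_reinitializes (unsigned int mode)
{
  return (mode & sketch_dontInit) == 0;
}

/* Returns 1 if the input is to be parsed as html rather than xhtml. */
int
mode_is_html (unsigned int mode)
{
  return (mode & sketch_htmlDoc) != 0;
}
```

In real code the names from liblouisxml.h should be used rather than numeric literals, since the actual values are defined by the header.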
int lbx_translateTextFile ( char *configfilelist, char *inputFileName, char *outputFileName, unsigned int mode)
This function accepts a text file in inputFileName and produces a braille translation in outputFileName. The input file is assumed to be encoded in Ascii8. Blank lines indicate the divisions between paragraphs. Two blank lines cause a blank line between paragraphs (or headers). The output file may be in UTF-8, UTF-16, or Ascii8, as specified by the outputEncoding line in the configuration file, configfilelist. As for lbx_translateString, the mode parameter specifies whether complete initialization is to be done or simply initialization for a new document.
int lbx_backTranslateFile ( char *configfilelist, char *inputFileName, char *outputFileName, unsigned int mode)
This function accepts a braille file in inputFileName and produces a back-translation in outputFileName. The input file is assumed to be encoded in Ascii8. The output file is in either plain text or html, according to the setting of backFormat in the configuration file. Html files are encoded in UTF-8. In plain text, blank lines are inserted between paragraphs. The output file may be in UTF-8, UTF-16, or Ascii8, as specified by the outputEncoding line in the configuration file, configfilelist. The mode parameter specifies whether or not the library is to be initialized with new configuration information, as described for lbx_translateString (see lbx_translateString).
void lbx_free (void)
This function should be called at the end of the application to free all memory allocated by liblouisxml and liblouis. If you wish to change configuration files during your application, use a mode parameter of 0 on the function call using the new configuration information.
At the moment, actual transcription with liblouisxml is done with the command-line (or console) program xml2brl. The line to type is:
xml2brl [OPTIONS] [-f config-file] [infile] [outfile]
The brackets indicate that something is optional. You will see that nothing is required except the program name itself, xml2brl. The various optional parts control how the program will behave, as follows:
xml2brl is set up so that it can be used in a "pipe". To do this, omit both infile and outfile. Input is then taken from the standard input unit.
The first file name encountered (a word not preceded by a minus sign) is taken to be the input file and the second to be the output file. If you wish input to be taken from stdin and still want to specify an output file use two minus signs (‘--’) for the input file.
If only the program name is typed xml2brl assumes that the configuration file is default.cfg, input is from the standard input unit, and output is to the standard output unit.
msword2brl infile outfile
Infile must be a Microsoft Word file. The msword2brl script first calls the antiword program, so you must have this installed on your machine. antiword is called with -x db, which causes the output to be in docbook format. This is piped to xml2brl. The output file from xml2brl contains much of the formatting, including emphasis, of the Word file.
The operation of liblouisxml is controlled by two types of files: semantic-action files and configuration files. The former are discussed in the section Connecting with the xml Document - Semantic-Action Files. The latter are discussed in this section. A third type of file, braille translation tables, is discussed in the liblouis documentation (see Overview). Another section of the present document which may be of interest is Implementing Braille Mathematics Codes.
liblouisxml (with liblouis) can be used as the braille transcription component in any number of applications with different overall purposes and user interfaces. However, as of now the principal application is xml2brl, which is a console application for Mac and Linux. (There is also a Mac GUI application called louis.) The information below therefore applies to xml2brl as much as to liblouisxml.
Before discussing configuration files in detail it is worth noting that the application program has access to the information in the configuration files by calling the liblouisxml function lbx_initialize. This function returns a pointer to a data structure containing the configuration information.
xml2brl uses the configuration file default.cfg unless a different one is specified via the -f command-line option. The configuration file name may include a full path. In this case, liblouisxml will consider this to be the user path. (This can be changed at compile time; see Files and Paths.) If just a file name (or list) is given, liblouisxml will consider the current directory as the user path.
The configuration "file" specified with the -f option need not be a single filename. It can be several file names separated by commas. Only the first filename may have a path component. This path is taken as the user path, as discussed in the previous paragraph. This file-list feature is also found in liblouis. It enables you to combine configuration files on the command line. For example, a file list may consist of one file specifying the output format used in your establishment, a comma, and then the name of a stylesheet.
After the path, if any, has been evaluated, but before reading any of the files, liblouisxml reads in a file called canonical.cfg. This file specifies values for all possible settings. It is needed to complete the initialization of the program. You may alter the values in the distribution canonical.cfg, but you should not delete any settings. If a configuration file read in later contains a particular setting name, the value specified simply replaces the one specified in canonical.cfg.
As you will see by looking at canonical.cfg, it contains four main sections, outputFormat, translation, xml and styles. In addition, a configuration file can contain an include entry. This causes the file named on that line to be read in at the point where the line occurs. The sections need not follow each other in any particular order, nor is the order of settings within each section important. In this document and in the canonical.cfg file, where section and setting names consist of more than one word, the first letter of each word following the initial one is capitalized. This is merely for readability. The case of the letters in these names is ignored by the program. Section and setting names may not contain spaces.
Here, then, is an explanation of each section and setting in the canonical.cfg file. When you look at this file you will see that the section names start at the left margin, while the settings are indented one tab stop. This is done for readability. It has no effect on the meaning of the lines. You will also see lines beginning with a number sign (‘#’), which are comments. Blank lines can also be used anywhere in a configuration file. In general, a section name is a single word or combination of unspaced words. However, each style has a section of its own, so the word ‘style’ is followed by the name of the style. Setting lines begin with the name of the setting, followed by at least one space or tab, followed by the value of the setting. A few settings have two values.
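The line format just described can be made concrete with a small sketch in C. This is not the parser used by liblouisxml; split_config_line is a hypothetical function which assumes that indentation may be skipped before a comment is recognized, and which lower-cases the name to reflect the fact that case is ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Split one configuration line into a lower-cased name and a value.
   Returns 1 if the line held a name, 0 for blank lines and comments.
   name and value must each have room for strlen (line) + 1 bytes. */
int
split_config_line (const char *line, char *name, char *value)
{
  size_t n = 0, v = 0;
  assert (line != NULL && name != NULL && value != NULL);
  while (*line == ' ' || *line == '\t')
    line++; /* indentation carries no meaning */
  if (*line == '\0' || *line == '\n' || *line == '\r' || *line == '#')
    return 0;
  while (*line != '\0' && !isspace ((unsigned char) *line))
    name[n++] = (char) tolower ((unsigned char) *line++);
  name[n] = '\0';
  while (*line == ' ' || *line == '\t')
    line++; /* at least one space or tab separates name and value */
  while (*line != '\0' && *line != '\n' && *line != '\r')
    value[v++] = *line++;
  value[v] = '\0';
  return 1;
}
```

A section name such as ‘outputFormat’ comes back with an empty value, while a style section such as ‘style para’ comes back with the style name as its value. For the few settings that take two values, the value string holds both and would need to be split again.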
This section specifies the format of the output file (or string, if no file name is given).
cellsPerLine 40
LinesPerPage 25
interpoint no
lineEnd \r\n
pageEnd \f
fileEnd ^z
printPages yes
braillePages yes
paragraphs yes
beginningPageNumber 1
printPageNumberAt top
braillePageNumberAt bottom
hyphenate no
outputEncoding ascii8
inputTextEncoding ascii8
formatFor textDevice
backFormat plain
backLineLength 70
interline no
This section specifies the liblouis translation tables to be used for various purposes.
literaryTextTable en-us-g2.ctb
uncontractedTable en-us-g1.ctb
compbrailleTable en-us-compbrl.ctb
mathtextTable en-us-mathtext.ctb
MathexpTable nemeth.ctb
editTable edittable.ctb
interlineBackTable en-us-interline.ctb
This section provides various information for the processing of xml files.
semanticFiles *,nemeth.semm
xmlheader <?xml version='1.0' encoding='UTF8' standalone='yes'?>
entity nbsp ^1
internetAccess yes
newEntries yes
The following sections all deal with styles. Each style has its own section. Style section names are unlike other section names in that they consist of the word style, followed by a space, followed by a style name. More styles may be added as the software develops, and some may be dropped.
This section specifies the style of the whole document. The settings given in it are applied to all other styles. If a section for another style is given, the settings in it replace those from the document style for that section. Because the settings in the document style apply to all other styles, if a document style section is given it must precede the sections for all other styles.
linesBefore 0
This setting gives the number of blank lines which should be left before the text to which this style applies. It is set to a non-zero value for some header styles.
linesAfter 0
The number of blank lines which should be left after the text to which this style applies.
leftMargin 0
The number of cells by which the left margin of all lines in the text should be indented. Used for hanging indents, among other things.
firstLineIndent 0
The number of cells by which the first line is to be indented relative to leftMargin. firstLineIndent may be negative. If the result is less than 0 it will be set to 0.
translate contracted
This setting is currently inactive. It may be used in the future. This setting tells how text in this style should be translated. Possible values are ‘contracted’, ‘uncontracted’, ‘compbrl’, ‘mathtext’ and ‘mathexpr’.
skipNumberLines no
If this setting is ‘yes’ the top and bottom lines on the page will be skipped if they contain braille or print page numbers. This is useful in some of the mathematical and graphical styles.
format leftJustified
The format setting controls how the text in the style will be formatted. Valid values are ‘leftJustified’, ‘rightJustified’, ‘centered’, ‘computerCoded’, ‘alignColumnsLeft’, ‘alignColumnsRight’, ‘listColumns’ and ‘listLines’. The first three are self-explanatory. ‘computerCoded’ is used for computer programs and similar material. The next three are used for tabular material. ‘alignColumnsLeft’ causes the left ends of columns to be aligned. ‘alignColumnsRight’ causes the right ends of columns to be aligned. ‘listColumns’ causes columns to be placed one after the other, separated by whatever separation character has been specified in the semantic-action file, followed by a space. An escape character (hex 1b) must also be specified to indicate the end of the column. Two escape characters must be specified to indicate the end of a row. Indentation of the lines in a row is controlled by the leftMargin and firstLineIndent settings. ‘listLines’ is similar except that it lists lines, as in poetry stanzas. The semantic-action file must specify two escape characters to indicate the end of a line.
newPageBefore no
If this setting is ‘yes’, the text will begin on a new page. This is useful for certain mathematical and graphical styles. Page numbers are handled properly.
newPageAfter no
If this setting is ‘yes’ any remaining space on the page after the material covered by this style is handled is left blank, except for page numbers.
rightHandPage no
If this setting is ‘yes’ and interpoint is ‘yes’, the material covered by this style will start on a right-hand page. This may cause a left-hand page to be left blank except for page numbers. If interpoint is ‘no’ this setting is equivalent to newPageBefore.
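The interaction of leftMargin and firstLineIndent described above can be written out as a one-line calculation. The sketch below assumes that "the result" in the description of firstLineIndent means the sum of the two settings; first_line_start is a hypothetical name and not a function of the library.

```c
#include <assert.h>

/* Cell at which the first line of a block starts: leftMargin plus
   firstLineIndent, with a negative result set to 0. */
int
first_line_start (int leftMargin, int firstLineIndent)
{
  int start = leftMargin + firstLineIndent;
  return start < 0 ? 0 : start;
}
```

Under this reading, a hanging indent is obtained with a positive leftMargin and a negative firstLineIndent of the same size, so that the first line starts at cell 0 and the remaining lines are indented.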
This style is used for arithmetic examples in elementary math books. On recognizing this style, the translator formats the material in a special way. This style has no settings different from those of the document style at the moment. Nevertheless, the line ‘style arith’ must be included in canonical.cfg so that it will be set up properly.
This style is used for an attribution following a quotation.
format rightJustified
This style is used for bibliographies. Settings will be added later.
This style is used for picture captions.
This style is used for computer programs.
This is for entries in a table of contents.
This style is for the dedication of a book.
This is for giving directions for exercises.
This is for showing mathematics that is set off from the text.
leftMargin 2
This is for text that is set off from the rest of the text.
This is the first level in a set of exercises where there are sublevels.
This is for the second level of exercises, such as exercise a following exercise 1.
This is for the third level of exercises.
firstLineIndent 2
Section: style graph
This style reserves space for a graph or other tactile material.
skipNumberLines yes
This style reserves space for the label of a graph.
This style is used for main headings, such as chapter titles.
The first level of subheadings after the main heading.
firstLineIndent 4
The fourth and final level of headings.
firstLineIndent 4
This style is used for indexes. The extra ‘x’ is not an error. It is there to prevent conflict with names elsewhere in the software.
This is for the individual items in a list.
This style causes its contents to be formatted in a way suitable for the representation of matrices.
format alignColumnsLeft
This style is used for braille music.
skipNumberLines yes
This style is used for footnotes.
Paragraph. This is ordinary body text.
firstLineIndent 2
This style is used for quotations that are set off from the rest of the text.
This style is used for a section with a section number.
firstLineIndent 4
This style is used for mathematical material that is arranged spatially, such as large fractions.
This style is used for stanzas in poetry.
This and the subsequent numbered styles can be used by the user for any purpose.
This style is used for subsections with a subsection number.
firstLineIndent 4
This style is used for ordinary tables.
This style is used to begin a title page.
newPageAfter yes
This style is used for transcriber's notes which are set off from the text.
This style is used to indicate the beginning of a braille volume.
When liblouisxml (or xml2brl) processes an xml document, it needs to be told how to use the information in that document to produce a properly translated and formatted braille document. These instructions are provided by a semantic-action file, so called because it explains the meaning, or semantics, of the various specifications in the xml document. To understand how this works, it is necessary to have a basic knowledge of the organization of an xml document.
An xml document is organized like a book, but with much finer detail. First there is the title of the whole book. Then there are various sections, such as author, copyright, table of contents, dedication, acknowledgments, preface, various chapters, bibliography, index, and so on. Each chapter may be divided into sections, and these in turn can be divided into subsections, subsubsections, etc. In a book the parts have names or titles distinguished by capitalization, type fonts, spacing, and so forth. In an xml document the names of the parts are enclosed in angle brackets (‘<>’). For example, if liblouisxml encounters <html> at the beginning of a document, it knows it is dealing with a document that conforms to the standards of the extensible hypertext markup language (xhtml) - at least we hope it does. When you see a book, you know it's a book. The computer can know only by being told. Something enclosed in angle brackets is called an "element" (more properly, a "tag") in xml parlance. (There may be more between the angle brackets than just the name of the element. More of this later.) The first "element" in a document thus tells liblouisxml what kind of document it is dealing with. This element is called the "root element" because the document is visualized as branching out from it like a tree. Some examples of root elements are <html>, <math>, <book>, <dtbook3> and <wordDocument>. Whenever liblouisxml encounters a root element that it doesn't know about it creates a new file called a semantic-action file. The name of this file is formed by stripping the angle brackets from the root element and adding a period plus the letters ‘sem’. If you look in a directory containing semantic-action files you will see names like html.sem, dtbook3.sem, math.sem, and so on.
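The naming rule for semantic-action files is simple enough to sketch in C. The function sem_file_name below is hypothetical and is not taken from the library; it only illustrates stripping the angle brackets from a root element and appending ‘.sem’.

```c
#include <assert.h>
#include <string.h>

/* Build the semantic-action file name for a root element such as
   "<html>" or "html". Returns 1 on success, 0 if buf is too small or
   the element name is empty. */
int
sem_file_name (const char *rootElement, char *buf, size_t bufSize)
{
  size_t len;
  assert (rootElement != NULL && buf != NULL);
  if (*rootElement == '<')
    rootElement++;
  /* The name ends at '>' or at the first blank before any attributes. */
  len = strcspn (rootElement, "> \t");
  if (len == 0 || len + 5 > bufSize) /* name + ".sem" + '\0' */
    return 0;
  memcpy (buf, rootElement, len);
  strcpy (buf + len, ".sem");
  return 1;
}
```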
Sometimes it is advantageous to preempt the creation of a semantic-action file for a new root element. For example, an article written according to the docbook specification may have the root element <article>. However, the specification itself has the root element <book>. In this case you can specify the book.sem file in the configuration file by writing, in the xml section:
semanticFiles book.sem
You will note that this setting uses the plural of "file". This is because you can actually specify a list of file names separated by commas. You might want to do this to specify the semantic-action file for the particular braille mathematical code to be used. For example:
semanticFiles book.sem,ukmath.sem
As you will see in the next section, different braille style conventions and different braille mathematical codes may require different semantic-action files.
liblouisxml records the names of all elements found in the document in the semantic-action file. The document has a multitude of elements, which can be thought of as describing the headings of various parts of the document. One element is used to denote a chapter heading. Another is used to denote a paragraph, still another to denote text in bold type, and so on. In other words, the elements take the place of the capitalization, changes in type font, spacing, etc. in a book. However, the computer still does not know what to do when it encounters an element. The semantic-action file tells it that.
Consider html.sem. A copy is included as part of this documentation with the name example_sem. It may differ from the file that liblouisxml is currently using. You will see that it begins with some lines about copyrights. Each line begins with a number sign (‘#’). This indicates that it is a "comment", intended for the human reader and the computer should ignore it. Then there is a blank line. Finally, there are two other comments explaining that the file must be edited to get proper output. This is because a human being must tell the computer what to do with each element. The semantic files for common types of documents have already been edited, so you generally don't have to worry about this. But if you encounter a new type of document or wish to specify special handling for styles or mathematics you may have to edit the semantic-action file or send it to the maintainer for editing. In any case the rest of this section is essential for understanding how liblouisxml handles documents and for making changes if the way it does so is not correct.
After another blank line you will see a table consisting of two, and sometimes three, columns. The first column contains a word which tells the computer to do something. For example, the first entry in the table is: ‘include nemeth.sem’. This tells liblouisxml to include the information in the nemeth.sem file when it is deciphering an html (actually xhtml) document (it may be preferable to use the semanticFiles setting in the configuration file rather than an include).
The second row of the table is:
no hr
‘hr’ is an element with the angle brackets removed. It means nothing in itself. However, the first column contains the word ‘no’. This tells liblouisxml "no do", that is, do nothing.
After a few more lines with ‘no’ in the first column, we see one that says:
softreturn br
This means that when the element <br> is encountered, liblouisxml is to do a soft return, that is, start a new line without starting a new paragraph.
The next line says:
heading1 h1
This tells liblouisxml that when it encounters the element <h1> it is to format the text which follows as a first-level braille heading, that is, the text will be centered and preceded and followed by blank lines. (You can change this by changing the definition of the heading1 style).
The next line says:
italicx em
This tells liblouisxml that when it encounters the element <em> it is to enclose the text which follows in braille italic indicators. The ‘x’ at the end of the semantic action name is there to prevent conflicts with names elsewhere in the software. Just where the italic indicators will be placed is controlled by the liblouis translation table in use.
The next line says:
skip style
This tells liblouisxml to simply skip ahead until it encounters the element </style>. Nothing in between will have any effect on the braille output. Note the slash (‘/’) before the ‘style’. This means the end of whatever the <style> element was referring to. Actually, it was referring to specifications of how things should be printed. If liblouisxml had not been told to skip these specifications, the braille output would have contained a lot of gobbledygook.
The next line says:
italicx strong
This tells liblouisxml to also use the italic braille indicators for the text between the <strong> and </strong> elements.
After a few more lines with ‘no’ in the first column we come to the line:
document html
This tells liblouisxml that everything between <html> and </html> is an entire document. <html> was the root element of this document, so this is logical.
After another ‘no’ line we come to:
para p
liblouisxml will consider everything between <p> and </p> to be a normal body text paragraph.
The next line is:
heading1 title
This causes the title of the document to also be treated as a braille level 1 heading.
Next we have the line:
list li
The xhtml <li> and </li> pair of elements is used to enclose an item in a list. liblouisxml will format this with its own list style. That is, the first line will begin at the left margin and subsequent lines will be indented two cells.
Next we have:
table table
You will note that the names of actions and elements are often identical. This is because they are both mnemonic. In any case, this line tells liblouisxml to format the table contained in the xhtml document according to the table formatting rules it has been given for braille output.
Next we have the line:
heading2 h2
This means that the text between <h2> and </h2> is to be formatted according to the liblouisxml style heading2. A blank line will be left before the heading and the first line will be indented four spaces.
After a few more lines we come to:
no table,cellpadding
Note the comma in the second column. This divides the column into two subcolumns. The first is the table element name. The second is called an "attribute" in xml. It gives further instructions about the material enclosed between the starting and ending "tags" of the element (<table> and </table>). Full information requires three subcolumns. The third is called the value and gives the actual information. The attribute is merely the name of the information.
Much further down we find:
no table,border,0
Here the element is table, the attribute is border and the value is 0. If liblouisxml were to interpret this, it would mean that the table was to have a border of 0 width. It is not told to do so because tables in braille do not have borders.
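The column layout described above can be sketched in a few lines of Python. This is only an illustration of the file format as this section describes it, not the parser liblouisxml actually uses. The names SemEntry and parse_sem_line are hypothetical, and the sketch assumes that columns are separated by whitespace and subcolumns by commas.

```python
from typing import NamedTuple, Optional


class SemEntry(NamedTuple):
    action: str                # first column, e.g. "no" or "heading1"
    element: str               # element name from the second column
    attribute: Optional[str]   # optional second subcolumn of the second column
    value: Optional[str]       # optional third subcolumn of the second column
    insertions: Optional[str]  # optional third column, left unparsed here


def parse_sem_line(line: str) -> Optional[SemEntry]:
    """Split one semantic-action line into its columns (illustrative only)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank lines and comments carry no instruction
    columns = stripped.split(None, 2)  # at most three whitespace-separated columns
    if len(columns) < 2:
        raise ValueError("expected at least two columns: " + repr(line))
    action, target = columns[0], columns[1]
    insertions = columns[2] if len(columns) == 3 else None
    # Pad with None so that element-only and element,attribute forms also unpack.
    parts = target.split(",", 2) + [None, None]
    element, attribute, value = parts[0], parts[1], parts[2]
    return SemEntry(action, element, attribute, value, insertions)


if __name__ == "__main__":
    for text in ("no hr", "no table,cellpadding", "no table,border,0"):
        print(parse_sem_line(text))
```

Note that a line such as ‘include nemeth.sem’ would come back with the file name in the element field, so a real reader would have to treat include specially.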
Now let's look at the file which is included at the beginning of the html.sem file. This is nemeth.sem. As with html.sem, a copy is included in the documentation directory with the name example_nemeth.sem, but it is not necessarily the one that liblouisxml is currently using. It illustrates several more things about how liblouisxml uses semantic-action files.
The first thing you will notice is that for quite a few lines the first and second columns are identical. This is because the MathML element and attribute names are part of a standard, and it was simplest to use the element names for the semantic actions as well.
The first line of real interest is:
math math
Every mathematical expression begins with the element <math> (which may have attributes and values), and ends with </math>. This is therefore the root element of a mathematical expression. However, mathematical expressions are usually part of a document, so it is not given the semantic action document. The math semantic action causes liblouisxml to carry out special interpretation actions. These will become clearer as we continue to look at the nemeth.sem file. You will note that this line has three columns. The meaning of the third column is discussed below.
After another uninteresting line we come to two that illustrate several more facts about semantic-action files:
mfrac mfrac ^?,/,^#
mfrac mfrac,linethickness,0 ^(,^;%,^)
Like the math entry above, the first line has three columns. While the first two columns must always be present, the third column is optional. Here, it is also divided into subcolumns by commas. The element <mfrac> indicates a fraction. A fraction has two parts, a numerator and a denominator. In xml, we call these parts children of <mfrac>. They may be represented in various ways, which need not concern us here. What is of real importance is that the third column tells liblouisxml to put the characters ‘^?’ before the numerator, ‘/’ between the numerator and denominator, and ‘^#’ after the denominator. Later on, liblouis will translate these characters into the proper representation of a fraction in the Nemeth Code of Braille Mathematics. (For other mathematical codes, see Implementing Braille Mathematics Codes).
The second line is of even greater interest. The first column is again ‘mfrac’, but this line is for a binomial coefficient. The second column contains three subcolumns, an element name, an attribute name and an attribute value. The attribute linethickness specifies the thickness of the line separating the numerator and denominator. Here it is 0, so there is no line. This is how the binomial coefficient is represented in print. The third column tells how to represent it in braille. liblouisxml will supply ‘^(’, upper number, ‘^;%’, lower number, ‘^)’ to liblouis, which will then produce the proper braille representation for the binomial coefficient.
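The two mfrac entries suggest that an entry naming an attribute and value is used in preference to one naming only the element. The Python sketch below illustrates that idea. The precedence order, the ENTRIES dictionary and the lookup function are assumptions made for illustration; this section does not spell out how liblouisxml resolves competing entries.

```python
from typing import Dict, Optional, Tuple

# Keys are (element, attribute, value); None marks a subcolumn that is absent.
Key = Tuple[str, Optional[str], Optional[str]]

# The two mfrac entries from nemeth.sem quoted above, third column only.
ENTRIES: Dict[Key, str] = {
    ("mfrac", None, None): "^?,/,^#",
    ("mfrac", "linethickness", "0"): "^(,^;%,^)",
}


def lookup(element: str, attributes: Dict[str, str]) -> Optional[str]:
    """Return the third column of the most specific matching entry (assumed rule)."""
    # 1. element,attribute,value
    for name, value in attributes.items():
        hit = ENTRIES.get((element, name, value))
        if hit is not None:
            return hit
    # 2. element,attribute
    for name in attributes:
        hit = ENTRIES.get((element, name, None))
        if hit is not None:
            return hit
    # 3. element alone
    return ENTRIES.get((element, None, None))


if __name__ == "__main__":
    print(lookup("mfrac", {}))                      # ordinary fraction entry
    print(lookup("mfrac", {"linethickness": "0"}))  # binomial coefficient entry
```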
Returning to the line for the math element, we see that the third column begins with a backslash followed by an asterisk. The backslash is an escape character which gives a special meaning to the character which follows it. Here the asterisk means that what follows is to be placed at the very end of the mathematical expression, no matter how complex it is.
For further discussion of how the third column is used see Implementing Braille Mathematics Codes. The third column is not limited to mathematics. It can be used to add characters to anything enclosed by an xml tag.
Here is a complete list of the semantic actions which liblouisxml recognizes. Many of them are also the names of styles. These are listed first, preceded by an asterisk. For a discussion of these, see Customization: Configuring liblouisxml.
* arith
* attribution
* biblio
* blanklinebefore
* caption
* code
* contents
* dedication
* directions
* dispmath
* disptext
* document
* exercise1
* exercise2
* exercise3
* glossary
* graph
* graphlabel
* heading1
* heading2
* heading3
* heading4
* indexx
* list
* matrix
* music
* note
* para
* quotation
* section
* spatial
* stanza
* style1
* style2
* style3
* style4
* style5
* subsection
* table
* titlepage
* trnote
* volume
acknowledge
allcaps
author
blankline
bodymatter
boldx
booktitle
boxline
cdata
center
chemistry
contracted
copyright
endnotes
footer
frontmatter
graphic
italicx
jacket
line
linkto
maction
maligngroup
malignmark
math
menclose
merror
mfenced
mfrac
mglyph
mi
mlabeledtr
mmultiscripts
mn
mo
mover
mpadded
mphantom
mprescripts
mroot
mrow
ms
mspace
msqrt
mstyle
msub
msubsup
msup
mtd
mtext
mtr
munder
munderover
newpage
no
noindent
none
preface
rearmatter
rightalign
righthandpage
runninghead
semantics
skip
softreturn
specsym
tblbody
tblcol
tblhead
tblrow
tnpage
transcriber
uncontracted
The Nemeth Code of Braille Mathematical and Science Notation has been implemented. Other braille mathematics codes can be implemented by following the same pattern. The Nemeth Code implementation is discussed as an example below.
Four tables are used to translate xml documents containing a mixture of text and mathematics into the Nemeth code. They can be found in the subdirectory lbx_files of the liblouisxml directory. First, the semantic-action file nemeth.sem is used to interpret the mathematical portions of the xml document (The text portions are interpreted by another semantic-action file which will not be discussed here). After the math and text have been interpreted, two liblouis tables, nemeth.ctb and en-mathtext.ctb are used to translate them. Each piece of mathematics or text is translated separately and the pieces are strung together with blanks between them. This results in inaccuracies where mathematics meets text. The fourth table, also a liblouis table, is used to remove these inaccuracies. It is called edittable.ctb, and it does things like removing the multi-purpose indicator before a blank, inserting the punctuation indicator before a punctuation mark following a math expression, and removing extra spaces.
The general format and use of semantic-action files were discussed in the previous section (see Connecting with the xml Document - Semantic-Action Files). In this section we shall concentrate on the optional third column, which is used a lot in nemeth.sem. While the first two columns can be generated by liblouisxml but must be edited by a person, the third column must always be provided by a human.
As previously stated, the third column tells liblouisxml what characters to insert to inform liblouis how to translate the math expression. Look at the following line:
mfrac mfrac ^?,/,^#
You will see that the third column contains two commas. This means that it has three subcolumns. A fraction has a numerator and a denominator. These are called children of the mfrac element. The first subcolumn specifies the characters that liblouisxml should place in front of the numerator. The second subcolumn gives the characters to be placed between the numerator and denominator. Finally, the third subcolumn gives the characters to place after the denominator. You will see that the first subcolumn contains a caret followed by a question mark. The dot pattern for the question mark in computer braille is the same as for the Nemeth start-fraction indicator. The caret is used so that liblouis can tell this apart from a question mark, which also has the same dot pattern in computer braille. The second subcolumn contains a slash but no caret. This is because there is no danger of confusion where the slash is concerned. The third subcolumn does contain a caret, and it also contains a number sign, which corresponds to the Nemeth end-fraction indicator. When liblouisxml encounters the MathML representation of the fraction one-half it produces the following string of characters: ‘^?1/2^#’. liblouis then removes the carets to get ‘?1/2#’.
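The caret removal described above can be pictured with a short Python sketch. It works on characters only, mirroring the ‘^?1/2^#’ to ‘?1/2#’ example in the text. The real work is done by entries in the liblouis table, which produce dot patterns rather than characters, and the function name strip_carets is hypothetical.

```python
def strip_carets(marked: str) -> str:
    """Replace each caret-plus-character pair with the character itself."""
    out = []
    i = 0
    while i < len(marked):
        if marked[i] == "^" and i + 1 < len(marked):
            out.append(marked[i + 1])  # keep the marked character, drop the caret
            i += 2
        else:
            out.append(marked[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(strip_carets("^?1/2^#"))
```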
As another example, consider the entry in nemeth.sem for a subscript.
msub msub ,^;,^"
Here the first subcolumn is blank, because nothing is to be placed before the subscripted symbol. The second subcolumn contains a caret and a semicolon (in computer braille). This corresponds to the Nemeth subscript indicator. The third subcolumn contains a caret and a quotation mark, corresponding to the Nemeth baseline indicator. liblouisxml translates the MathML expression for x subscript i into ‘x^;i^"’. liblouis subsequently produces ‘x;i"’. There are other steps if the subscript is numeric. These are handled by pass2 opcodes in the liblouis translation table, nemeth.ctb.
You will notice that the entries in nemeth.sem have various numbers of subcolumns in the third column. In general, the characters given in the first subcolumn are placed before the first child of the element given in the second column. The characters in the second subcolumn are placed before the second child, and so on, until the characters given in the last subcolumn are placed after the last child.
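The placement rule just stated can be sketched in Python as follows. The sketch assumes an element with a fixed number of children and exactly one more subcolumn than children. What liblouisxml does when the counts differ is not covered here, and the name apply_subcolumns is hypothetical.

```python
from typing import List


def apply_subcolumns(subcolumns: List[str], children: List[str]) -> str:
    """Put subcolumn i before child i and the last subcolumn after the last child."""
    if len(subcolumns) != len(children) + 1:
        raise ValueError("this sketch needs one more subcolumn than children")
    pieces = []
    for before, child in zip(subcolumns, children):
        pieces.append(before)
        pieces.append(child)
    pieces.append(subcolumns[-1])  # goes after the last child
    return "".join(pieces)


if __name__ == "__main__":
    # mfrac mfrac ^?,/,^#  applied to the fraction one-half
    print(apply_subcolumns(["^?", "/", "^#"], ["1", "2"]))
    # msub msub ,^;,^"  applied to x with subscript i
    print(apply_subcolumns(["", "^;", '^"'], ["x", "i"]))
```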
Sometimes an element or tag can have an indeterminate number of children. This is true of <math> itself. Yet, it may be necessary to place some characters after the very last element. Let us look at the <math> entry.
math math \eb,\*\ee
First let us discuss escape sequences starting with a backslash. These are basically the same as in liblouis. The sequence ‘\e’ is shorthand for the escape character, which would otherwise be represented by ‘\x001b’. The beginning of a math expression is denoted by an escape character followed by the letter b and the end by an escape character followed by the letter ‘e’. This enables the editing table to do such things as drop the baseline indicator at the end of a math expression and insert a number sign at the beginning, if needed.
Not found in liblouis is the sequence ‘\*’. This means to put what follows after the very last child of the math element, no matter how many there are.
As another example consider:
mtd mtd \*\ec
mtd is the MathML tag for a table column. There may be many children of this tag. The entry says to put an escape character (hex 1b), plus the letter ‘c’, after the very last of them.
As a final example consider:
mtr mtr ^.^\,^(,\*^.^\,^)\er
mtr is the MathML tag for a row in a table, in this case a matrix. Each row in a matrix must begin with the dot pattern ‘46-6-12356’ and end with the dot pattern ‘46-6-12456’. As usual a caret is placed before the corresponding characters. Since dot 6 is a comma, it must be escaped. This is done by placing a backslash before the comma. There are two subcolumns. The first contains the characters to be placed at the beginning of each row. The second starts with ‘\*’, signifying that the characters following it are to be placed at the end of everything in this row. A subcolumn starting with ‘\*’ must be the last (or only) subcolumn.
Here this last subcolumn ends with an escape character and the letter ‘r’, signifying the end of a row.
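The escape handling in the third column can be summarized with a Python sketch that splits a third column into its subcolumns. It covers only the three sequences discussed in this section, ‘\e’, ‘\,’ and ‘\*’, and rejects anything else. The function name split_third_column and its return shape are hypothetical, and this is not the code liblouisxml uses.

```python
from typing import List, Optional, Tuple

ESC = "\x1b"  # the escape character that '\e' stands for


def split_third_column(column: str) -> Tuple[List[str], Optional[str]]:
    """Return (ordinary subcolumns, text to place after the very last child)."""
    subcolumns: List[str] = []
    current: List[str] = []
    star_at: Optional[int] = None  # index of the subcolumn that starts with \*
    i = 0
    while i < len(column):
        ch = column[i]
        if ch == "\\":
            nxt = column[i + 1 : i + 2]
            if nxt == "e":
                current.append(ESC)
            elif nxt == ",":
                current.append(",")  # literal comma, not a separator
            elif nxt == "*" and not current:
                star_at = len(subcolumns)
            else:
                raise ValueError("escape not handled in this sketch: " + repr(ch + nxt))
            i += 2
        elif ch == ",":
            subcolumns.append("".join(current))  # unescaped comma ends a subcolumn
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    subcolumns.append("".join(current))
    if star_at is None:
        return subcolumns, None
    if star_at != len(subcolumns) - 1:
        raise ValueError("a subcolumn starting with \\* must be the last one")
    return subcolumns[:-1], subcolumns[-1]


if __name__ == "__main__":
    for text in (r"\eb,\*\ee", r"\*\ec", r"^.^\,^(,\*^.^\,^)\er"):
        print(repr(split_third_column(text)))
```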
So much for the semantic-action file. Even though the characters in the third column were chosen to correspond with Nemeth characters, they may not have to be changed for other math codes. liblouis can replace them with anything needed.
This brings us to a consideration of the two tables used by liblouis to translate mathematics texts. The first, en-mathtext.ctb, is used to translate text appearing outside math expressions. It is necessary because the Nemeth code requires modifications of Grade 2 braille. Other math codes may not have this requirement.
The table actually used to translate mathematics is nemeth.ctb. It includes two other tables, chardefs.cti and nemethdefs.cti. The first gives ordinary character definitions and is included by all the other tables. Note, however, that the unbreakable space, ‘\x00a0’, is translated by dot 9. This is used before and after the equal sign and other symbols in nemeth.ctb. The second table contains character definitions for special math symbols, most of which are Unicode characters greater than ‘\x00ff’. The Greek letters are here. So are symbols like the integral sign.
Most of the entries in nemeth.ctb should be familiar from other tables. The unfamiliar ones follow the comments ‘# Semantic pairs’ and ‘# pass2 corrections’. The first simply replace characters preceded by a caret with the character itself. The second make adjustments in the code generated directly from the nemeth.sem file. The pass2 opcode is discussed in the liblouis guide (see Overview). Here are some comments on a few of the entries in nemeth.ctb.
pass2 @1456-1456 @6-1456
Replaces double start-fraction indicators with the start-complex-fraction indicator.
pass2 @3456-3456 @6-3456
Replaces double end-fraction indicators with the end-complex-fraction indicator.
pass2 @56[$d1-5]@5 *
Removes the subscript and baseline indicators from numeric subscripts.
pass2 @5-9 @9
Removes the baseline or multipurpose indicator before an unbreakable space generated by the translation of an equal sign, etc.
pass2 @45-3-5 @3
Replaces a superscript apostrophe with a simple prime symbol.
pass2 @9[]$d @3456
Puts a number sign before a digit preceded by a blank.
pass2 @9-0 @9
Removes a space following an unbreakable space.
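To make the simplest of these entries concrete, here is a Python sketch that applies a literal dot-pattern replacement to a sequence of braille cells. It handles only entries in which both sides are plain ‘@’ dot patterns, such as the first, second, fourth and last entries above. The bracket, ‘$d’ and ‘*’ forms are not modeled. The helper names parse_dots and apply_rule are hypothetical, and the single left-to-right scan is an assumption, not a statement about how liblouis applies pass2.

```python
from typing import List


def parse_dots(spec: str) -> List[str]:
    """Turn a dot specification such as '@6-1456' into ['6', '1456']."""
    if not spec.startswith("@"):
        raise ValueError("this sketch only understands @dots: " + repr(spec))
    return spec[1:].split("-")


def apply_rule(cells: List[str], match: List[str], replacement: List[str]) -> List[str]:
    """Replace every non-overlapping occurrence of match, scanning left to right."""
    out: List[str] = []
    i = 0
    n = len(match)
    while i < len(cells):
        if cells[i : i + n] == match:
            out.extend(replacement)
            i += n
        else:
            out.append(cells[i])
            i += 1
    return out


if __name__ == "__main__":
    # pass2 @1456-1456 @6-1456: two start-fraction cells become 6-1456
    cells = ["1456", "1456", "2", "34", "3"]
    print(apply_rule(cells, parse_dots("@1456-1456"), parse_dots("@6-1456")))
    # pass2 @9-0 @9: drop the blank (dot 0) after an unbreakable space (dot 9)
    print(apply_rule(["9", "0", "1346"], parse_dots("@9-0"), parse_dots("@9")))
```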
We now come to the fourth and last table used for math translation, the editing table, edittable.ctb. As explained at the beginning, this table is used to remove inaccuracies where math translation butts up against text translation. For example, the Nemeth code puts numbers in the lower part of the cell. However, punctuation marks are also in the lower part of the cell. So Nemeth puts a punctuation indicator, dots ‘456’, in front of any lower-cell punctuation that immediately follows a mathematical expression. If this occurs inside MathML it is handled by nemeth.ctb. However, a MathML expression is often followed by a punctuation mark which is the first part of text. liblouisxml puts a blank between math and text, but this can result in a mathematical expression followed by a blank and then, say, a period, dots ‘256’. edittable.ctb replaces the blank with the punctuation indicator.
When you look at edittable.ctb you will see that it begins with an include of chardefs.cti. Most of the entries are ordinary, but some are interesting. For example,
always "\s 0
replaces the baseline or multipurpose indicator followed by a space with just a space.
backFormat: outputFormat
backLineLength: outputFormat
BeginingPageNumber: outputFormat
braillePageNumberAt: outputFormat
braillePages: outputFormat
cellsPerLine: outputFormat
center: style
compbrailleTable: translation
editTable: translation
entity: xml
fileEnd: outputFormat
firstLineIndent: style
format: style
formatFor: outputFormat
hyphenate: outputFormat
inputTextEncoding: outputFormat
interline: outputFormat
interlineBackTable: translation
internetAccess: xml
interpoint: outputFormat
leftMargin: style
lineEnd: outputFormat
linesAfter: style
linesBefore: style
LinesPerPage: outputFormat
literaryTextTable: translation
MathexpTable: translation
mathtextTable: translation
newEntries: xml
newPageAfter: style
newPageBefore: style
outputEncoding: outputFormat
pageEnd: outputFormat
paragraphs: outputFormat
printPageNumberAt: outputFormat
printPages: outputFormat
rightHandPage: style
semanticFiles: xml
skipNumberLines: style
translate: style
uncontractedTable: translation
xmlheader: xml

lbx_backTranslateFile: lbx_backTranslateFile
lbx_free: lbx_free
lbx_initialize: lbx_initialize
lbx_translateFile: lbx_translateFile
lbx_translateString: lbx_translateString
lbx_translateTextFile: lbx_translateTextFile
lbx_version: lbx_version