4 Writing Documents
4.1 Getting Started with XML
The Extensible Markup Language is not a data description language
in itself. Rather, it defines a syntax that lets you design your
own customized markup languages for arbitrary data models. Take a
look at the following example:
<?xml version="1.0" encoding="UTF-8"?>
<cocktail alcoholic="yes">
    <name>Pina Colada</name>
    <ingredient>
        <name>rum</name>
        <amount unit="oz">3</amount>
    </ingredient>
    <ingredient>
        <name>coconut milk</name>
        <amount unit="tbsp">3</amount>
    </ingredient>
    <ingredient>
        <name>pineapple</name>
        <amount unit="tbsp">3</amount>
    </ingredient>
    <ingredient>
        <name>ice</name>
        <amount unit="cup">2</amount>
    </ingredient>
</cocktail>
What you see is a complete XML document describing the ingredients
needed to make the popular Pina Colada cocktail. On the first
line, you see the XML declaration. It specifies the XML
version and the character encoding of the document. In general,
you should use a Unicode character encoding such as UTF-8 or
UTF-16, since every standards-conforming XML parser is required to
be able to read these.
What follows is the root element cocktail that contains
a name element, telling us the name of the cocktail, followed
by the various ingredients that you need to make a Pina Colada.
The textual elements delimiting the beginning and end of an XML
element are called “tags”. An opening tag has the form
<tag_name> and the corresponding closing tag is written
</tag_name>.
XML elements must always be properly nested. In addition, there must be
only one root element. Thus, you can picture the logical structure of
an XML document as a tree. A single element of this tree is
called a “node”. Its direct descendants are called its “children”,
and the node from which it descends is called its “parent”.
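To make the tree terminology concrete, the following fragment reuses part of the cocktail document from above, indented by depth and annotated with XML comments. The indentation and the comments are for illustration only; XML does not require them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- cocktail is the root element; it has no parent -->
<cocktail alcoholic="yes">
    <!-- name and ingredient are children of cocktail -->
    <name>Pina Colada</name>
    <ingredient>
        <!-- ingredient is the parent of this name and amount -->
        <name>rum</name>
        <amount unit="oz">3</amount>
    </ingredient>
</cocktail>
```

Every opening tag is closed before its parent is closed, which is what "properly nested" means here.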
Nodes may carry additional attributes. Our cocktail from above,
for instance, has the attribute alcoholic, which in this
case is set appropriately to “yes”.
The names of elements and attributes can be chosen arbitrarily to
represent a given data model. In our example, we modeled
a beverage, but you might just as well define a set of tags to describe
the parts of a motor vehicle.
When working with eCromedos, you will be using XML-based markup to describe
the logical structure of standard text documents.
4.2 Available Document Classes
In version 1.0, eCromedos defines three document classes: report,
book and article. The difference between these
is mainly cosmetic and only visible in printed output. Their definition
is formally laid down in a set of Document Type Definitions
(DTDs), which the document processor uses to verify the correct
structure of documents before attempting to transform them.
Take a look at the following listing for an example of a simple
book in the eCromedos markup language:
<book lang="english" secsplitdepth="1" secnumdepth="1" tocdepth="1">
    <head>
        <subject>Subject</subject>
        <title>Document Title</title>
        <author>Document Author</author>
        <date>Jan. 16, 1980</date>
        <publisher>Example Publisher</publisher>
    </head>
    <chapter>
        <title>My very First Document</title>
        <p>
            Hi Everybody!
        </p>
    </chapter>
</book>
Documents always have a head, regardless of the
document class used. In contrast to HTML, the order of the
header elements is not arbitrary. The elements title
and author are mandatory, and you may specify multiple authors.
As you can see, our book has a chapter with a single paragraph
of text. A paragraph is the simplest textual element that
may occur inside a section.
A report is essentially the same as a book, except that books
are laid out double-sided with uneven margins and reports are
laid out one-sided with even margins.
Articles differ from books and reports in that the primary
sectioning element is section instead of chapter.
Furthermore, sections in an article are printed directly in sequence,
whereas in books and reports a new chapter will always start
a new page.
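As a rough sketch of the difference, the book listing above might be rewritten as an article as follows. It assumes that the article root element accepts the same attributes and head elements as book, which this section does not state explicitly.

```xml
<article lang="english" secsplitdepth="1" secnumdepth="1" tocdepth="1">
    <head>
        <subject>Subject</subject>
        <title>Document Title</title>
        <author>Document Author</author>
        <date>Jan. 16, 1980</date>
        <publisher>Example Publisher</publisher>
    </head>
    <!-- section, not chapter, is the primary sectioning element here -->
    <section>
        <title>My very First Document</title>
        <p>
            Hi Everybody!
        </p>
    </section>
</article>
```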
4.3 Structuring Documents
In general, you will be using the sectioning elements chapter,
section, subsection and subsubsection
to structure your documents.
Sectioning elements must be given a title and they must
be nested in the correct hierarchical order, i.e. you cannot have a chapter
in a section and you cannot have a subsection in a chapter without
first opening a section.
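The nesting rule can be illustrated with a chapter that descends through all four levels. This is only a sketch: the titles are placeholders, and it assumes that a sectioning element may contain a deeper section without any paragraphs before it.

```xml
<chapter>
    <title>Chapter Title</title>
    <section>
        <title>Section Title</title>
        <subsection>
            <title>Subsection Title</title>
            <subsubsection>
                <title>Subsubsection Title</title>
                <p>
                    Text at the deepest sectioning level.
                </p>
            </subsubsection>
        </subsection>
    </section>
</chapter>
```

Placing the subsection directly inside the chapter, without the enclosing section, would violate the hierarchy described above.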
4.3.1 Minisections
Minisections are set with the minisection tag. They
may appear anywhere in the section hierarchy below the primary
sectioning element for the particular document class. The title
of a minisection will not be numbered and will not receive an
entry in the table of contents.
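A minisection inside a section might look like the sketch below. It assumes that minisection takes a title child and paragraphs in the same way as the other sectioning elements; the titles and text are made up for illustration.

```xml
<chapter>
    <title>Cocktails</title>
    <section>
        <title>Pina Colada</title>
        <p>
            Regular section text.
        </p>
        <minisection>
            <title>A Short Aside</title>
            <p>
                This heading is neither numbered nor listed
                in the table of contents.
            </p>
        </minisection>
    </section>
</chapter>
```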
4.3.2 Prefaces
In books and reports you may use the preface element
to set an arbitrary number of prefaces right after the document
head. The title of a preface will not be numbered and will not
appear in the table of contents (TOC) when generating printed
output. However, it will receive an entry in the TOC when generating
HTML.
A preface may contain paragraphs of text, as well as block elements,
such as figures and tables. It must not contain any deeper sections.
If you feel you need to section your preface, you should probably
make it a chapter.
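The placement of a preface can be sketched as follows, reusing the root attributes from the book listing in section 4.2. The example assumes that preface takes a title child and that the head may consist of just the two mandatory elements.

```xml
<book lang="english" secsplitdepth="1" secnumdepth="1" tocdepth="1">
    <head>
        <title>Document Title</title>
        <author>Document Author</author>
    </head>
    <!-- prefaces come right after the head, before the first chapter -->
    <preface>
        <title>Preface</title>
        <p>
            Paragraphs are allowed here, but no deeper sections.
        </p>
    </preface>
    <chapter>
        <title>First Chapter</title>
        <p>
            Regular content starts here.
        </p>
    </chapter>
</book>
```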
4.3.3 Appendices
An appendix is essentially the same as a chapter. Only
the numbering differs: the first part of the
section counter is a Latin letter instead of an Arabic
numeral. Appendices may occur only in document classes book and
report. They are to be placed right after the last primary
section of a document.
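A minimal sketch of an appendix following the last chapter is shown below. It assumes that appendix accepts a title and paragraphs exactly like chapter, as the description above suggests; the titles are placeholders.

```xml
<book lang="english" secsplitdepth="1" secnumdepth="1" tocdepth="1">
    <head>
        <title>Document Title</title>
        <author>Document Author</author>
    </head>
    <chapter>
        <title>Last Regular Chapter</title>
        <p>
            This chapter is numbered with an Arabic numeral.
        </p>
    </chapter>
    <!-- appendices follow the last primary section -->
    <appendix>
        <title>Supplementary Material</title>
        <p>
            This appendix is numbered with a Latin letter.
        </p>
    </appendix>
</book>
```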
4.3.4 Glossaries
A glossary can be placed after the last regular section,
which is, depending on the document class, either the last chapter,
the last section or the last appendix. A glossary is basically an
extra section that must contain nothing but a definition list
(see section 5.1).
At this time, eCromedos does not provide functionality for creating
and sorting glossaries automatically. This is due to the complexity
of implementing this for arbitrary languages. An interface to
xindy, the flexible index generator (see [3])
is planned for the future.
4.3.5 Bibliographies
Bibliographies are entered with the biblio tag and
individual entries with bibitem. A bibliography may
occur only after the glossary, if there is one, or otherwise after the
last section in the document. Currently, eCromedos
does not support bibliographies after individual sections.
Here is an example:
<biblio number="yes">
    <bibitem label="KOCH06">
        Tobias Koch. eCromedos User Manual.
        <tt>http://www.ecromedos.org</tt>,
        2006.
    </bibitem>
    <bibitem label="WALSH03">
        Norman Walsh, Leonard Muellner.
        DocBook: The Definitive Guide.
        O'Reilly, 2003.
    </bibitem>
</biblio>
The number attribute controls whether the
individual items are numbered sequentially or whether the
user-supplied labels are used.
In the main part of your document, you can use the cite
tag to cite an entry from the bibliography. For example, referring
to the listing above, you could write <cite label="KOCH06"/>,
which the document processor would replace with “[1]” when
numbering is turned on and with “[KOCH06]” when numbering is off.
4.4 Formatting Text
From your word processor you may be used to being able to emphasize
text by setting it in bold or italic letters or by underlining it.
With eCromedos you can achieve this by enclosing the span of text
to be formatted inside the tags b for bold print, i
for italic letters or u for underlining. You may also combine
these arbitrarily.
Sometimes, you may want to set certain terms or expressions, such
as internet addresses, in a font with fixed character width. To
this end, there is the tt tag, which prints text in
typewriter letters.
Examples of Formatting Text

Markup | Result
<u>Underlined text</u> | Underlined text
<i>Italicized text</i> | Italicized text
<b>Bold letters</b> | Bold letters
<b><i>Bold face and italics</i></b> | Bold face and italics
<tt>Typewriter letters</tt> | Typewriter letters
For the sake of completeness, there are also seven elements for
setting the font size. In a serious document you should hardly
have any reason to use these, though.
Examples of Modifying the Font Size

Markup | Result
<xx-small>Text in XXS</xx-small> | Text in XXS
<x-small>Text in XS</x-small> | Text in XS
<small>Small letters</small> | Small letters
<medium>Regular size</medium> | Regular size
<large>Large letters</large> | Large letters
<x-large>Text in XL</x-large> | Text in XL
<xx-large>Text in XXL</xx-large> | Text in XXL
4.5 Hyphenation
In printed output, text is set justified over the entire width of
the text body. In order to avoid large gaps between words on single
lines, LaTeX applies language-specific patterns to automatically
hyphenate and break words at the right margin.
Unfortunately, LaTeX's hyphenation mechanism is not always able
to split words correctly and in rare cases cannot hyphenate certain
words at all.
You can provide hints telling LaTeX in which places a given word
may be split by marking the corresponding spots with the y
tag. For example, to tell LaTeX that it may hyphenate “bibliography”
only in between “biblio” and “graphy” you would write
biblio<y/>graphy in your markup.
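Inside running text, such hints might be used as in the following sketch. The second word carries several hints at once, which assumes that more than one y tag may appear in a single word; the break points chosen are merely examples.

```xml
<p>
    The biblio<y/>graphy lists all works cited in this
    docu<y/>men<y/>ta<y/>tion.
</p>
```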
4.6 Line and Page Breaks
In general, you should not worry about where a line breaks or where
to start a new page, because it is the job of the formatting engine
(i.e. LaTeX or your web browser) to take care of this.
In rare cases, however, you may have to intervene manually. You
can use <br/> to break the current line and
<pagebreak/> to start a new page. You should not
use multiple brs or multiple pagebreaks in a row.
Of course, a pagebreak is only visible in printed output.
When you need to prevent line breaks in certain places, you
can either use the non-breaking space (&nbsp;) or protect
the specific stretch of text with the nobr tag. For example,
a title or degree should not be separated from the name that follows
it. Consequently, you should write Dr.&nbsp;Pepper
or <nobr>Dr. Pepper</nobr> to
prevent the formatting engine from possibly breaking the line right
before Pepper.
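The following sketch combines a manual line break with a protected span of text. The address is invented for illustration, and the example uses nobr rather than the non-breaking space entity so that it does not depend on a document type declaration.

```xml
<p>
    Please send your letters to:<br/>
    <nobr>Dr. Pepper</nobr><br/>
    1 Example Street
</p>
```

Each br ends the current line, while nobr keeps the title and the name together on one line.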
4.7 Cross-References
Sometimes you will want to refer to the contents of a different
section in your manuscript, e.g. you may write something like “[...]
you will find out more about this on page XYZ”. However,
at the time of writing your markup, you cannot tell on which page
the section you are referring to will actually be printed. The
solution is to label the location you wish to reference and let
eCromedos do the math.
To label a certain spot in your text, you use the label tag.
This tag has a single mandatory attribute, name, which holds the name
of the label. The name must be a unique identifier among all labels
in your document. Take a look at the following example:
<chapter>
    <title>The Show about Nothing</title>
    <p>
        Seinfeld<label name="seinfeld"/> is the best
        sitcom of all time.
    </p>
</chapter>
You can now use the element ref to obtain the section
number and the element pageref to get the page number like this:
<chapter>
    <title>About Myself</title>
    <p>
        I really enjoy watching Seinfeld. You can read more
        about Seinfeld in section <ref name="seinfeld"/> on
        page <pageref name="seinfeld"/>.
    </p>
</chapter>
ref and pageref can also point to any other
object with a label, such as a figure or a numbered equation. In
that case ref will resolve to the corresponding object
counter instead of the section counter.
4.8 Marginals and Footnotes
Marginal notes can be placed with the marginal tag. And
yes, they also work in HTML output. Try this example:
<p>
    In this episode<marginal>The Summer of George</marginal>,
    George finally loses his job at the Yankee Stadium but
    gets an extra three months' pay-off.
</p>
LaTeX does not allow marginals in table cells. For HTML output
this limitation does not exist.
Footnotes are placed in the same fashion by use of the footnote
tag. They do work inside tables without restrictions.
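By analogy with the marginal note above, a footnote might be written as in this sketch, which simply swaps the tag name:

```xml
<p>
    In this episode<footnote>The Summer of George</footnote>,
    George finally loses his job at the Yankee Stadium.
</p>
```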
4.9 Quoting
Unless you are setting your text in typewriter letters, you will
not be able to enter the correct quotation marks for your language
directly with your keyboard. You could use XML character entities
to access the glyphs, but that is tedious. Instead you should use
the q and qq tags for single and double quoting,
respectively.
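A short sketch of both quoting elements in running text is shown below. The sentence itself is made up, and the actual glyphs produced depend on the document language.

```xml
<p>
    The motto was <qq>a show about nothing</qq>, and the
    so-called <q>nothing</q> turned out to be quite a lot.
</p>
```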
4.10 Predefined Entities
eCromedos predefines a number of entity names that may come in
handy in certain situations. The following table lists all
available names and how they are resolved:
Entity | Description
&tex; | Resolves to “TeX”
&latex; | Resolves to “LaTeX”
&nbsp; | The non-breaking space
&zwsp; | The zero-width space
&endash; | The en-dash (–)
&emdash; | The em-dash (—)
&dots; | Resolves to “...”
Note: The zero-width space is particularly useful
for making long path names or Internet addresses break across lines
without introducing hyphens or spaces.
In order to use these, you have to include a document type declaration
at the top of your document, appropriate to the document class you
are using. For a report, it might look as follows:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE report SYSTEM "http://www.ecromedos.net/dtd/1.0/report.dtd">
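Put together, a small report that uses some of the predefined entities might look like the sketch below. It assumes that the report root element accepts the same attributes as the book listing in section 4.2; the chapter title and text are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE report SYSTEM "http://www.ecromedos.net/dtd/1.0/report.dtd">
<report lang="english" secsplitdepth="1" secnumdepth="1" tocdepth="1">
    <head>
        <title>Document Title</title>
        <author>Document Author</author>
    </head>
    <chapter>
        <title>Typesetting</title>
        <p>
            Printed output is produced with &latex;, which in
            turn builds on &tex;&dots;
        </p>
    </chapter>
</report>
```

Without the document type declaration, an XML parser would reject the entity references as undefined.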
You can also insert these entities indirectly via the entity
element, in which case you don't need the document type declaration.