<copyDocument
  to = Path
  selection = boolean : false
  preserveInclusions = boolean : false
  saveCharsAsEntityRefs = boolean : false
  indent = boolean : false
  encoding = (ISO-8859-1|ISO-8859-13|ISO-8859-15|ISO-8859-2|
              ISO-8859-3|ISO-8859-4|ISO-8859-5|ISO-8859-7|
              ISO-8859-9|KOI8-R|MacRoman|US-ASCII|UTF-16|UTF-8|
              Windows-1250|Windows-1251|Windows-1252|Windows-1253|
              Windows-1257) : UTF-8
>
  Content: [ extract ]* [ resources ]*
</copyDocument>

<extract
  xpath = Absolute XPath (subset)
  dataType = anyURI|hexBinary|base64Binary|XML
  toDir = Path
  baseName = File basename without an extension
  extension = file name extension
>
  <processingInstruction target = Name data = string /> |
  <attribute name = QName value = string /> |
  any element
</extract>

<resources
  match = Regexp pattern
  copyTo = Path
  referenceAs = anyURI
/>
Copies the document being edited to the location specified by the required attribute to.
Attribute | Description |
---|---|
to | Specifies the file where the document (or the node selection) is to be copied. |
selection | If this attribute is specified with value true and nodes are explicitly selected, the node selection is saved to the specified location. If multiple nodes are explicitly selected, their parent element is saved and a special processing instruction identifying the selected child nodes is added to it. Example, the user has selected the paragraphs containing 2, 3 and 4: <div> <?select-child-nodes 3-5?> <p>1</p> <p>2</p> <p>3</p> <p>4</p> </div> Otherwise, the whole document is saved to the specified location. |
preserveInclusions | If this attribute is specified with value true, the inclusions found in the document are preserved in the copy. Otherwise, they are not preserved. |
saveCharsAsEntityRefs | If this attribute is specified with value true, characters are saved as entity references. Otherwise, the generated XML file contains numeric character references. |
indent | If this attribute is specified with value true, the generated XML file is indented. Otherwise, the generated XML file is not indented. |
encoding | Specifies the encoding of the generated XML file. |
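The attributes above can be combined as in the following minimal sketch. It assumes the command is used inside a process element, as in the example at the end of this section; the file name selection.xml is hypothetical.

```xml
<process>
  <!-- Save the explicit node selection (if any) rather than the whole
       document, indent the result and encode it as ISO-8859-1.
       "selection.xml" is a hypothetical file name. -->
  <copyDocument to="selection.xml" selection="true" indent="true"
                encoding="ISO-8859-1" />
</process>
```

Attributes left unspecified keep the defaults listed in the synopsis (false for the booleans, UTF-8 for encoding).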
<extract
  xpath = Absolute XPath (subset)
  dataType = anyURI|hexBinary|base64Binary|XML
  toDir = Path
  baseName = File basename without an extension
  extension = File name extension
>
  <processingInstruction target = Name data = string /> |
  <attribute name = QName value = string /> |
  any element
</extract>
The extract element is designed to ease the writing of XSLT style sheets that need to transform XML documents in which binary images (TIFF, PNG, etc.) or XML images (typically SVG) are embedded.
To do this, the extract element copies the image data found in the element or attribute specified by attribute xpath to a file created in the directory specified by attribute toDir.
The name of the image file is automatically generated by extract. However, attributes baseName and extension may be used to parameterize, to a certain extent, the generation of this file name.
Now the question is: how does the XSLT style sheet know about the "extracted" image files? The extract element offers three options:
Replace the element containing image data by the one specified as a child element of extract.
If xpath selects an attribute instead of an element, the element containing the selected attribute is replaced.
DocBook example: replace embedded svg:svg (allowed in "-//OASIS//DTD DocBook SVG Module V1.0//EN") by the much simpler imagedata:
<cfg:extract xmlns="" xpath="//imageobject/svg:svg" toDir="raw">
  <imagedata fileref="resources/{$url.rootName}.png" />
</cfg:extract>
OR, replace the element containing image data by the attribute specified using the attribute child element of extract. This attribute is added to the parent element of the element containing image data.
If xpath selects an attribute instead of an element, the element containing the selected attribute is replaced.
DocBook 5 example: replace embedded db5:imagedata/svg:svg by db5:imagedata/@fileref:
<cfg:extract xmlns="" xmlns:db5="http://docbook.org/ns/docbook"
             xmlns:svg="http://www.w3.org/2000/svg"
             xpath="//db5:imagedata/svg:svg" toDir="raw">
  <cfg:attribute name="fileref" value="resources/{$url.rootName}.png" />
</cfg:extract>
OR, as a more general approach, insert a processing instruction (specified using the processingInstruction child element of extract) at the beginning of the element from which data has been extracted.
If xpath selects an attribute instead of an element, the processing instruction is inserted in the element containing the selected attribute.
Example: insert <?extracted extracted_file_name?> in imgd:image_ab and imgd:image_eb:
<extract xpath="//imgd:image_ab/@data | //imgd:image_eb" toDir="raw">
  <processingInstruction target="extracted"
                         data="resources/{$url.rootName}.png" />
</extract>
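On the style sheet side, the inserted processing instruction can then be read back. The following XSLT 1.0 fragment is only a sketch of one way to do so: it assumes the processing instruction target is extracted, as in the example above, and it emits a hypothetical img element that is not part of the original configuration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Match any element that received an <?extracted ...?> child
       during the extraction step. -->
  <xsl:template match="*[processing-instruction('extracted')]">
    <!-- The string value of a processing instruction is its data,
         here the reference to the extracted image file.
         "img" is a hypothetical output element. -->
    <img src="{processing-instruction('extracted')}" />
  </xsl:template>
</xsl:stylesheet>
```

Matching on the processing instruction rather than on a specific element name keeps the template independent of the vocabulary being transformed.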
The replacement element (attribute values or text nodes in the element or in any of its descendants) and the inserted processing instruction (target and data) can reference the following variables, which are substituted by their values during the extraction step:
Variable | Value |
---|---|
{$file.path} | Pathname of the extracted image file. Example: "/tmp/xxe1234/book_image_3.svg". |
{$file.parent} | Pathname of the directory containing the extracted image file. Example: "/tmp/xxe1234/". |
{$file.name} | Name of the extracted image file. Example: "book_image_3.svg". |
{$file.rootName} | Name of the extracted image file, but without an extension. Example: "book_image_3". |
{$file.extension} | Extension of the extracted image file name. Example: "svg". |
{$file.separator} | Native path component separator of the platform. |
{$url} | URL of the extracted image file. |
{$url.parent} | URL of the directory containing the extracted image file. Example: "file:///tmp/xxe1234". Note that this URL does not end with a '/'. |
{$url.name} | Name of the extracted image file. Example: "book_image_3.svg". |
{$url.rootName} | Name of the extracted image file, but without an extension. Example: "book_image_3". |
{$url.extension} | Extension of the extracted image file name. Example: "svg". |
In fact, any XPath expression (full XPath 1.0, not just the subset used in attribute xpath), not only variable references, can be put between curly braces (example: {./@id}). Such XPath expressions are evaluated as strings in the context of the element selected by attribute xpath. If attribute xpath selects an attribute, its parent element is used as the evaluation context for the XPath expression.
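As an illustration of mixing a variable reference and an ordinary XPath expression, here is a variation on the DocBook example given earlier. It is a sketch only: using the role attribute of imagedata to carry the identifier is an assumption made for this example, not something the original configuration does.

```xml
<cfg:extract xmlns="" xpath="//imageobject/svg:svg" toDir="raw">
  <!-- {$url} is replaced by the URL of the extracted file.
       {./@id} is evaluated as a string in the context of the selected
       svg:svg element; it yields an empty string when that element
       has no id attribute. -->
  <imagedata fileref="{$url}" role="{./@id}" />
</cfg:extract>
```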
Attributes:
xpath
Selects elements and attributes containing the image data to be extracted.
This XPath expression must conform to the XPath subset needed to implement W3C XML Schemas, except that absolute paths are allowed in addition to relative paths.
dataType
Specifies how the image data is "stored" in the elements or the attributes selected by the above XPath expression: anyURI, hexBinary, base64Binary or XML. This cannot be guessed for documents conforming to a DTD and for documents not constrained by a grammar.
Default: find the data type using the grammar of the document being processed.
toDir
Specifies the directory where extracted image files are to be created. Relative directories are relative to the temporary directory created during the execution of the process (that is, %W).
Default: use the temporary directory created during the execution of the process (that is, %W).
baseName
Specifies the start of the extracted image file names. An automatically generated part is always added after this user prefix.
Default: the base name of an extracted image file is automatically generated in its entirety.
extension
Specifies which extension to use for extracted image file names. Specifying "svgz" for extracted SVG images makes it possible to create compressed SVG files.
Default: the extension is guessed by XXE for a number of common image formats.
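The following sketch sets every attribute explicitly, which may be needed for documents whose grammar does not reveal the data type. It reuses the imgd:image_ab element from the example above; the choice of base64Binary, the fig prefix and the png extension are assumptions made for illustration.

```xml
<!-- Assumes the data attribute holds base64-encoded PNG data. -->
<extract xpath="//imgd:image_ab/@data" dataType="base64Binary"
         toDir="raw" baseName="fig" extension="png">
  <!-- Record where the image was written. -->
  <processingInstruction target="extracted" data="{$url}" />
</extract>
```

With these settings, the files are created in the raw subdirectory of %W, their names start with fig followed by an automatically generated part, and they end with the png extension.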
<resources match = Regexp pattern copyTo = Path referenceAs = anyURI />
The resources child element specifies what to do with the resources which are logically part of the document.
The resources which are logically part of the document are specified using another configuration element: documentResources (see Section 7, “documentResources”).
Note that elements replaced during an extraction step specified by the extract element are never scanned for resources.
The default resources child elements are:
<resources match="(https|http|ftp)://.*" />
<resources match=".+" copyTo="." />
Attributes of the resources child element:
match
For each resource of the document found using the documentResources element, its URI is tested to see if it matches the first resources child element. If it does not, the second resources child element is tried, and so on until a matching resources child element is found.
If the matching resources element has no copyTo or referenceAs attribute, the resource is ignored. For example, the rule <resources match="(https|http|ftp)://.*" /> is designed to ignore resources with an absolute URL.
copyTo
Specifies where to copy the matched resource. This can be a file name or a directory name.
The value of this attribute can contain $1, $2, ..., $9 variables, which are substituted with the substrings matching the parenthesized groups of the match regular expression.
Example:
<resources match=".*/([^/]+)\.jpg" copyTo="resources/$1.jpeg" />
This rule matches images/logo.jpg, so file logo.jpg is copied to resources/logo.jpeg.
referenceAs
Specifies the reference to the resource in the document created by the copyDocument configuration element.
As with copyTo, the value of this attribute can contain $1, $2, ..., $9 variables.
Generally, this attribute is not needed because the reference implied by the value of the copyTo attribute is sufficient. But this attribute can be useful if images are to be converted from their original format to a format supported by an FO processor.
Example (excerpt of XXE_addon_dir/slides_config/xslMenu.incl):
<process>
  <mkdir dir="resources" />
  <mkdir dir="raw" />
  <copyDocument to="__doc.xml">
    <resources match="(https|http|ftp)://.*" />
    <resources match=".+\.(png|jpg|jpeg|gif)" copyTo="resources" />
    <resources match="(?:.+/)?(.+)\.(\w+)" copyTo="raw"
               referenceAs="resources/$1.png" />
    <resources match=".+" copyTo="resources" />
  </copyDocument>
  <convertImage from="raw" to="resources" format="png" />
  ...
</process>
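Because the first matching resources element wins, the order of the rules matters. The sketch below spells out that order in comments; the directory names svg and other are hypothetical and chosen only for illustration.

```xml
<copyDocument to="__doc.xml">
  <!-- Tried first: resources with an absolute URL match here and,
       because the rule has neither copyTo nor referenceAs, they are
       ignored. -->
  <resources match="(https|http|ftp)://.*" />
  <!-- Tried second: SVG files are copied to the hypothetical
       directory "svg". -->
  <resources match=".+\.svg" copyTo="svg" />
  <!-- Tried last: any resource not matched above is copied to the
       hypothetical directory "other". -->
  <resources match=".+" copyTo="other" />
</copyDocument>
```

Placing the catch-all ".+" rule anywhere but last would presumably prevent the more specific rules after it from ever being reached.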