[Galax]

XQuery Optimization             

You have reached Galax's XQuery optimization page!

The purpose of this page is to document our progress in developing state-of-the-art XML query optimization in Galax. We are currently redesigning Galax's architecture to support query optimization, and we are working on a number of specific optimization features.

What you will find here:

  • Some documentation of Galax's internal architecture.
  • Some research papers and technical reports on query optimization.
  • Some demos of specific optimization features.
  • Benchmarking reports.
  • On occasion, development versions of Galax that include some of our latest optimization code.


Overview and Architecture

One of the challenges in building an efficient query processor is the need to work at many different levels of the system in a cohesive fashion. The following figure presents an idealized version of the Galax query processing architecture.

[Global Architecture]

From a query processing point of view, evaluation in Galax can be separated into the following phases:

  • Pre-processing: Before optimization techniques can be applied, a number of transformations must be performed on the query. This phase includes parsing, query normalization, static typing, and a number of syntactic simplifications applied to the query AST.
  • Algebraic optimization: Standard query optimization can then take place. This step typically requires the use of an XML algebra.
  • Optimization for specific XQuery operations: Certain classes of XQuery operations that are expensive or used very often during query processing require dedicated algorithms.
  • Physical representation, storage and indices

Query pre-processing

[Architecture -- Query Rewriting]

Before optimization techniques can be applied, a number of transformations must be performed on the query. This phase includes parsing, query normalization, static typing, and a number of syntactic simplifications applied to the query abstract syntax tree (AST). Some of those transformations already have a direct impact on performance, for instance removing the implicit casting of nodes to values or removing sorting by document order. In addition, many other optimization techniques based on XML algebras cannot be applied before a serious clean-up of the XQuery AST.
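
To make the "removing sorting by document order" example concrete, here is a minimal sketch in Python of one such simplification. It is not Galax's actual rewrite: the `Root`, `Step` and `DocOrder` node types and the `simplify` function are hypothetical names introduced for illustration, and the rule covers only `child` and `descendant` steps. It assumes normalization has wrapped every path step in a sort-and-deduplicate operation, and drops a wrapper when its input is already known to be sorted and duplicate-free.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Root:            # the document node, written "/" in a path
    pass


@dataclass(frozen=True)
class Step:            # one axis step applied to every node of `source`
    source: object
    axis: str          # "child" or "descendant" (the only axes handled here)
    name: str


@dataclass(frozen=True)
class DocOrder:        # sort by document order and remove duplicates
    source: object


def simplify(expr):
    """Return (rewritten expr, is_sorted, is_unrelated).

    is_sorted:    the result is in document order without duplicates.
    is_unrelated: no node in the result is an ancestor of another one.
    """
    if isinstance(expr, Root):
        return expr, True, True
    if isinstance(expr, Step):
        inner, is_sorted, is_unrelated = simplify(expr.source)
        step = Step(inner, expr.axis, expr.name)
        safe = is_sorted and is_unrelated
        if expr.axis == "child":
            # Children of sorted, unrelated nodes are sorted and unrelated.
            return step, safe, safe
        # Descendants of sorted, unrelated nodes are sorted, but a result
        # node may now be an ancestor of another result node.
        return step, safe, False
    if isinstance(expr, DocOrder):
        inner, is_sorted, is_unrelated = simplify(expr.source)
        if is_sorted:
            return inner, True, is_unrelated      # the sort is redundant
        return DocOrder(inner), True, is_unrelated
    raise TypeError(f"unexpected expression: {expr!r}")


# /child::a/child::b after normalization: every step is wrapped in DocOrder.
normalized = DocOrder(Step(DocOrder(Step(Root(), "child", "a")), "child", "b"))
simplified, _, _ = simplify(normalized)
```

In this sketch both wrappers are removed for `/child::a/child::b`, while for `/descendant::a/child::b` the outer sort is kept, because the children of nested `a` nodes may come out interleaved.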

More details about this phase and its impact can be found in the following papers.


Algebraic Query Optimization

[Architecture -- Specific Operations]

After the query has been normalized and simplified, the next step is to compile it into an algebra. Currently, Galax uses a variant of the XQuery core with support for tuples as such an algebra. We are interested in using a more complete algebra, but the question of which algebra is right for XML query processing is still largely open. One possible candidate is the Timber algebra developed by our friends at the University of Michigan.
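
As a rough illustration of what "support for tuples" can mean, the following Python sketch evaluates a small FLWOR expression as a pipeline of operators over a stream of tuples (variable bindings). It is a toy under stated assumptions, not Galax's algebra: the operator names `map_concat`, `select` and `map_to_items` are hypothetical and chosen only for this example.

```python
def map_concat(tuples, var, expr):
    """For each input tuple, bind `var` to every item produced by `expr`."""
    for t in tuples:
        for item in expr(t):
            yield {**t, var: item}


def select(tuples, predicate):
    """Keep only the tuples that satisfy `predicate`."""
    return (t for t in tuples if predicate(t))


def map_to_items(tuples, expr):
    """Leave the tuple world: compute one result item per tuple."""
    return (expr(t) for t in tuples)


def run_query():
    # for $x in (1, 2, 3), $y in (10, 20) where $x > 1 return $x + $y
    plan = map_to_items(
        select(
            map_concat(
                map_concat([{}], "x", lambda t: [1, 2, 3]),
                "y", lambda t: [10, 20]),
            lambda t: t["x"] > 1),
        lambda t: t["x"] + t["y"])
    return list(plan)
```

Once a query is in this form, classical rewrites become expressible. For example, the selection on `$x` could be pushed below the second `map_concat`, since it does not depend on `$y`.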


Specific XQuery Operations

[Architecture -- Specific Operations]

We are currently developing specific algorithms for certain classes of operations which are expensive or used very often during XQuery processing. Note that efficient support for those operations typically assumes specific knowledge about the physical representation.

XML Projection

One of the bottlenecks of query processing for main-memory XQuery implementations is the size of the tree representations for XML documents (e.g., DOM or the XQuery Data Model). XML projection is a physical operation that can be used to remove unnecessary nodes from the XML data model based on the paths used in a given query. Document projection takes an XML stream as input and loads a projected document according to a set of input path expressions. Those paths are inferred from the query using a static analysis algorithm described in detail in the following papers.
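
The loading side of document projection can be sketched in a few lines of Python. This is a simplified illustration, not the Galax implementation: the `project` function is a hypothetical name, the paths are assumed to be plain absolute child-step paths such as `/site/people/person/name`, and the static analysis that infers the paths from a query is not shown. The sketch reads the input as a stream of parse events and builds only the nodes that lie on one of the paths.

```python
import io
import xml.etree.ElementTree as ET


def project(xml_source, paths):
    """Load only the nodes that lie on one of `paths`.

    A node is kept if it is an ancestor of a path target, the target
    itself, or a descendant of the target. Text is copied only at or
    below a target; tail text (mixed content) is ignored in this sketch.
    """
    wanted = [tuple(p.strip("/").split("/")) for p in paths]

    def on_path(current):
        for w in wanted:
            n = min(len(w), len(current))
            if current[:n] == w[:n]:
                return True
        return False

    def in_target(current):
        return any(current[:len(w)] == w for w in wanted)

    root = None
    tags = []   # tag names from the root down to the current node
    out = []    # projected copy of each open node, or None if dropped
    for event, elem in ET.iterparse(xml_source, events=("start", "end")):
        if event == "start":
            tags.append(elem.tag)
            copy = None
            if on_path(tuple(tags)):
                if not out:
                    copy = root = ET.Element(elem.tag, elem.attrib)
                elif out[-1] is not None:
                    copy = ET.SubElement(out[-1], elem.tag, elem.attrib)
            out.append(copy)
        else:
            copy = out.pop()
            if copy is not None and in_target(tuple(tags)):
                copy.text = elem.text
            tags.pop()
            elem.clear()   # release the content of the source node
    return root


doc = (b"<site><people><person id='p1'><name>Ann</name>"
       b"<bio>long text</bio></person></people>"
       b"<regions><item>x</item></regions></site>")
projected = project(io.BytesIO(doc), ["/site/people/person/name"])
print(ET.tostring(projected, encoding="unicode"))
```

With the single path above, the `bio` and `regions` subtrees are never copied into the projected document, which is where the memory savings would come from on large inputs.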

If you want to try out document projection for yourself, you can download the following archive, which contains the complete source code for Galax with document projection:


Physical representation, storage and indices

[Architecture -- Physical Layer]

XML is a very versatile markup language, suitable for many kinds of applications in many kinds of environments. Galax aims to be a very versatile XQuery implementation that works on a variety of physical XML representations. We are especially interested in experimenting with Galax in the following environments.

  • XML Files. Many applications just deal with ordinary XML files. Files are easy to write and maintain using many ordinary XML tools, and they work well for lightweight applications.
  • XML Streams. In the context of distributed applications, such as information integration or Web services, XML documents are accessed as streams from the network.
  • Native XML repository. Data-intensive applications dealing with large amounts of XML data often require the use of a storage manager. An XML storage manager can provide important benefits to your application, including scalability, performance, concurrency control, crash recovery, and high availability.

Currently, Galax only supports access to local XML files, but we are working on hooking up an HTTP/SOAP client inside Galax that will allow it to receive XML streams from the network. We are also considering the development of a storage manager that will allow Galax users to efficiently process large amounts of XML data. Please come back to check on our progress!