The query language processor is activated in the GUI simple search entry when the search mode selector is set to Query Language. It can also be used with the KIO slave or the command line search. It broadly has the same capabilities as the complex search interface in the GUI. Additionally, the query language is for now the only way to access the important Recoll field search capabilities.
The language is roughly based on the Xesam user search language specification.
If the results of a query language search puzzle you and you are unsure what was actually searched for, you can use the GUI show query link at the top of the result list to check the exact query that Xapian finally executed.
Here follows a sample request that we are going to explain:
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
This would search for all documents with John Doe appearing as a phrase in the author field (exactly what this is depends on the document type, for example the From: header for an email message), and containing either beatles or lennon and either live or unplugged but not potatoes (in any part of the document).
An element is composed of an optional field specification, and a value, separated by a colon. Examples: Beatles, author:balzac, dc:title:grandet
The colon, if present, means "contains". Xesam defines other relations, which are not supported for now.
All elements in the search entry are normally combined with an implicit AND. It is possible to specify that elements be OR'ed instead, as in Beatles OR Lennon. The OR must be entered literally (capitals), and it has priority over the AND associations: word1 word2 OR word3 means word1 AND (word2 OR word3), not (word1 AND word2) OR word3. Do not enter explicit parentheses, they are not supported for now.
An element preceded by a - specifies a term that should not appear. Pure negative queries are forbidden.
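The grouping rules above can be illustrated with a small Python sketch. This is not Recoll's parser: the function name group_query is hypothetical, tokenizing with shlex is an assumption made for brevity (it strips the quotes around phrases), and phrase modifiers and the special OR behaviour of some fields are ignored. It only shows how OR binds tighter than the implicit AND, and how elements preceded by a - are set aside.

```python
import shlex


def group_query(query):
    """Return (groups, excluded) for a query string.

    groups is a list of OR-groups (lists of elements) which are ANDed
    together; excluded holds the elements that were prefixed with '-'.
    Illustrative only: this is not how Recoll itself parses queries.
    """
    groups, excluded = [], []
    join_next = False  # True just after a literal OR token
    for tok in shlex.split(query):
        if tok == "OR":
            join_next = True
            continue
        if tok.startswith("-") and len(tok) > 1:
            # Negated element: it must not appear in the document.
            excluded.append(tok[1:])
        elif join_next and groups:
            # OR has priority: attach to the previous element's group.
            groups[-1].append(tok)
        else:
            # Implicit AND: start a new group.
            groups.append([tok])
        join_next = False
    return groups, excluded


print(group_query("word1 word2 OR word3"))
# ([['word1'], ['word2', 'word3']], [])
print(group_query('author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes'))
# ([['author:john doe'], ['Beatles', 'Lennon'], ['Live', 'Unplugged']], ['potatoes'])
```

Each inner list reads as an OR of its elements, and the outer list as an AND of the groups, which matches the word1 AND (word2 OR word3) interpretation.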
As usual, words inside quotes define a phrase (the order of words is significant), so that title:"prejudice pride" is not the same as title:prejudice title:pride, and is unlikely to find a result.
Most Xesam phrase modifiers are unsupported, except for l (small ell) to disable stemming, and p to turn a phrase into a NEAR (unordered proximity) search. Example: "prejudice pride"p
Recoll currently manages the following default fields:
title, subject or caption are synonyms which specify data to be searched for in the document title or subject.
author or from for searching the document's originators.
recipient or to for searching the document's recipients.
keyword for searching the document-specified keywords (few documents actually have any).
filename for the document's file name.
ext specifies the file name extension (Ex: ext:html)
The field syntax also supports a few field-like, but special, criteria:
dir for filtering the results on file location (Ex: dir:/home/me/somedir). Please note that this is quite inefficient, that it may produce very slow searches, and that in some cases it may be worthwhile to set up separate databases instead.
date for searching or filtering on dates. The syntax for the argument is based on the ISO8601 standard for dates and time intervals. Only dates are supported, no times. The general syntax is 2 elements separated by a / character. Each element can be a date or a period of time. Periods are specified as PnYnMnD. The n numbers are the respective numbers of years, months or days, any of which may be missing. Dates are specified as YYYY-MM-DD. The days and months parts may be missing. If the / is present but an element is missing, the missing element is interpreted as the lowest or highest date in the index. Examples:
2001-03-01/2002-05-01 the basic syntax for an interval of dates.
2001-03-01/P1Y2M the same specified with a period.
2001/ from the beginning of 2001 to the latest date in the index.
2001 the whole year of 2001
P2D/ means 2 days ago up to now if there are no documents with dates in the future.
/2003 all documents from 2003 or older.
Periods can also be specified with small letters (e.g. p2y).
mime or format for specifying the mime type. This one is quite special because you can specify several values which will be OR'ed (the normal default for the language is AND). Ex: mime:text/plain mime:text/html. Specifying an explicit boolean operator or negation (-) before a mime specification is not supported and will produce strange results. Note that mime is the ONLY field with an OR default. For example, you do need to use an explicit OR between ext terms.
type or rclcat for specifying the category (as in text/media/presentation/etc.). The classification of mime types in categories is defined in the Recoll configuration (mimeconf), and can be modified or extended. The default category names are those which permit filtering results in the main GUI screen. Categories are OR'ed like mime types above.
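To make the date criterion described earlier in this list more concrete, here is a Python sketch that turns a date argument into a (start, end) pair. It is an illustration, not Recoll's code. All names (date_range, parse_period, parse_date, shift) are hypothetical, and several points are assumptions: a partial date expands to the first day it covers when used as a start and to the last day when used as an end, a leading period is counted back from the end date (or from today if the end is missing), and month arithmetic clamps the day to the end of the month. The today, lowest and highest arguments stand for the current date and the index date bounds.

```python
import calendar
import re
from datetime import date, timedelta

# PnYnMnD, each part optional, letters in either case.
PERIOD_RE = re.compile(r"^[Pp](?:(\d+)[Yy])?(?:(\d+)[Mm])?(?:(\d+)[Dd])?$")
# YYYY, YYYY-MM or YYYY-MM-DD.
DATE_RE = re.compile(r"^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$")


def parse_period(text):
    """Return (years, months, days) for a period string, else None."""
    m = PERIOD_RE.match(text)
    if not m or not any(m.groups()):
        return None
    return tuple(int(g or 0) for g in m.groups())


def parse_date(text, end_of_range):
    """Expand a possibly partial date to its first or last day."""
    m = DATE_RE.match(text)
    if not m:
        raise ValueError(f"bad date: {text!r}")
    year = int(m.group(1))
    month = int(m.group(2)) if m.group(2) else (12 if end_of_range else 1)
    if m.group(3):
        day = int(m.group(3))
    else:
        day = calendar.monthrange(year, month)[1] if end_of_range else 1
    return date(year, month, day)


def shift(day, period, sign):
    """Move a date forward (sign=1) or backward (sign=-1) by a period."""
    years, months, days = period
    index = day.year * 12 + (day.month - 1) + sign * (years * 12 + months)
    year, month = divmod(index, 12)
    month += 1
    last = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last)) + timedelta(days=sign * days)


def date_range(spec, today, lowest, highest):
    """Return (start, end) for a date argument such as 2001/ or P2D/."""
    if "/" not in spec:
        return parse_date(spec, False), parse_date(spec, True)
    left, right = spec.split("/", 1)
    left_period = parse_period(left) if left else None
    right_period = parse_period(right) if right else None
    if left_period and right_period:
        raise ValueError("at most one element may be a period")
    end = None
    if not right:
        end = highest  # missing element: highest date in the index
    elif not right_period:
        end = parse_date(right, True)
    if left_period:
        start = shift(end if right else today, left_period, -1)
    elif left:
        start = parse_date(left, False)
    else:
        start = lowest  # missing element: lowest date in the index
    if right_period:
        end = shift(start, right_period, 1)
    return start, end


bounds = dict(today=date(2010, 6, 15), lowest=date(1995, 1, 1),
              highest=date(2010, 6, 15))
for spec in ("2001-03-01/P1Y2M", "2001/", "2001", "P2D/", "/2003"):
    print(spec, date_range(spec, **bounds))
```

With these assumptions, 2001-03-01/P1Y2M yields the same interval as 2001-03-01/2002-05-01, as stated in the examples.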
Words inside phrases and capitalized words are not stem-expanded. Wildcards may be used anywhere inside a term. Specifying a wildcard on the left of a term can produce a very slow search (or even an incorrect one if the expansion is truncated because of excessive size). Also see More about wildcards.
The document filters used while indexing have the possibility to create other fields with arbitrary names, and aliases may be defined in the configuration, so that the exact field search possibilities may be different for you if someone took care of the customisation.
All words entered in Recoll search fields will be processed for wildcard expansion before the request is finally executed.
The wildcard characters are:
* which matches 0 or more characters.
? which matches a single character.
[] which allows defining sets of characters to be matched (ex: [abc] matches a single character which may be 'a' or 'b' or 'c', [0-9] matches any digit).
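The three wildcard forms behave like shell-style patterns, so their effect can be tried out with Python's standard fnmatch module. This is only a way to experiment with the syntax: Recoll does its own expansion against the index, and the term list and the expand helper below are made up for the example.

```python
from fnmatch import fnmatchcase

# A made-up list standing in for the index terms.
TERMS = ["prejudice", "pride", "prides", "proud", "page1", "page2", "pageX"]


def expand(pattern, terms=TERMS):
    """Return the terms matched by a wildcard pattern (case-sensitive)."""
    return [t for t in terms if fnmatchcase(t, pattern)]


print(expand("pride*"))      # ['pride', 'prides']   (* matches 0 or more)
print(expand("pride?"))      # ['prides']            (? matches exactly one)
print(expand("page[0-9]"))   # ['page1', 'page2']
print(expand("pr[io]*"))     # ['pride', 'prides', 'proud']
```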
You should be aware of a few things before using wildcards.
Using a wildcard character at the beginning of a word can make for a slow search because Recoll will have to scan the whole index term list to find the matches.
Using a * at the end of a word can produce more matches than you would think, and strange search results. You can use the term explorer tool to check what completions exist for a given term. You can also see exactly what search was performed by clicking on the link at the top of the result list. In general, for natural language terms, stem expansion will produce better results than an ending * (stem expansion is turned off when any wildcard character appears in the term).
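The cost difference between a leading and a trailing wildcard can be pictured with a sorted term list. The Python sketch below is a simplified model and says nothing about how Xapian actually stores terms; the function names are hypothetical. With a fixed prefix, the expansion can jump straight to the first candidate and stop at the first non-matching term, whereas a pattern starting with a wildcard leaves no choice but to test every term.

```python
from bisect import bisect_left
from fnmatch import fnmatchcase


def expand_trailing_star(sorted_terms, prefix):
    """Expand 'prefix*' by seeking to the prefix in a sorted term list."""
    matches = []
    i = bisect_left(sorted_terms, prefix)
    # Only the contiguous run of terms sharing the prefix is visited.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


def expand_by_scan(sorted_terms, pattern):
    """Expand any pattern by testing every term (needed for '*ride')."""
    return [t for t in sorted_terms if fnmatchcase(t, pattern)]


terms = sorted(["bride", "pride", "prides", "prejudice", "stride"])
print(expand_trailing_star(terms, "pri"))  # ['pride', 'prides']
print(expand_by_scan(terms, "*ride"))      # ['bride', 'pride', 'stride']
```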