General rules:
- The query consists of search clauses linked by boolean operators (and,
or, not, prox).
- A search clause always includes a term, and may also include both
an index name and relation (it must include either both or neither).
Examples: (1) ' "cats and bats" ' (2) ' title exact "cats and bats" '.
- An index name always includes a base name and may also include a prefix
(the name of an index set). Examples: (1) title, (2) dc.title
- "srw" is a well-known prefix, denoting an index set of utility
indexes.
- "resultSetName" is a well-known index in the srw index set.
A search clause may be a result set name. This is a special case, where
the index and relation are expressed as "srw.resultSetName ="
and the term is the result set name. This special case conforms to, but
is not explicitly distinguished in, the BNF defined below.
- serverChoice is a well-known index in the srw index set, and is the
default when the index and relation are omitted from a search clause.
srw.serverChoice means that the server will choose the most appropriate
index for the given term.
- Terms may be enclosed in double quotes (see the backslash rule under
the masking rules below for escaping a quote within a term). Terms are
not required to be enclosed in double quotes, but they should be so
enclosed if they contain any of the following characters: slash (/),
equal (=), less (<), greater (>), double quote ("), or left or right
parenthesis.
- Relations. See the list of relations in the BNF below.
  - "=" is used:
    - When the index is srw.resultSetName, e.g.
      'srw.resultSetName = "resultSetA" '.
    - For word adjacency, when the term is a list of words.
    - For numeric equality.
  - "scr" is used to mean "server choice relation".
    It is used when the client wishes the server to choose the most
    appropriate relation for the specified (or chosen) index. It is
    assumed when the relation is omitted (in which case the index is
    omitted too, since either both are supplied or both omitted,
    and in that case srw.serverChoice is the assumed index).
  - "exact" is used for string matching, when the term is
    a character string.
  - "all" and "any" may be used when the term is a list of words,
    to indicate "all words" or "any words".
  - An order-or-equal relation may be used for numeric comparisons,
    for example 'temperature <= "100" '.
- Qualifiers. "stem", "relevant", "fuzzy",
and "phonetic" are currently defined. Additional qualifiers
may be defined in the future.
- Boolean operators are evaluated left to right; however, parentheses
may enclose a boolean triple, i.e. Operand BOOLEAN Operand, to override
left-to-right evaluation.
- Proximity is expressed in terms of:
  - relation ("<", ">", "<=", ">=", "=", "<>"; default "<="),
  - distance (integer; default: 1 for word, zero otherwise),
  - unit ("word", "sentence", "paragraph", or "element";
    default "word"), and
  - ordering ("ordered" or "unordered"; default "unordered").
  The proximity operator is of the form
  "prox/<relation>/<distance>/<unit>/<ordering>",
  but any of these specifiers may be omitted, defaulting as specified
  above, and any trailing part of the operator consisting entirely of
  slashes (because the defaults are used) may be omitted. Examples:
    prox
    prox/=
    prox//3
    prox/<//sentence
    prox/>/3/word
    prox/<=///unordered
- The following masking rules and special characters apply for search
  terms:
  - A single asterisk (*) is used to mask zero or more characters.
  - A single question mark (?) is used to mask a single character.
    (Thus N consecutive question marks mask N characters.)
  - Caret/hat (^) is used as an anchor character for terms that are
    word lists, that is, where the relation is 'all' or 'any', or '='
    when used for word adjacency. It may not be used to anchor a string,
    that is, when the relation is 'exact' (string matches are, by default,
    anchored). It may occur at the beginning or end of a word (with
    no intervening space) to mean left- or right-anchored, respectively
    (that is, the word must appear at the beginning or at the end).
    "^" has no special meaning when it occurs within a word (not at the
    beginning or end) or string, but must be escaped nevertheless (see
    the backslash rule below). Examples:
    - title any "cat ^dog rat" means: find title with
      cat (anywhere), or with rat (anywhere), or with dog (but only
      at the beginning). Would find: "cat eats dog", "cat
      eats hat", "hat eats cat", "dog eats hat",
      but not "hat eats dog".
    - title any "^cat ^dog" would find "cat eats
      dog", "cat eats rat", "dog eats rat",
      but not "rat eats dog".
    - title any "^dog ^cat" AND title="eats house"
      would find "dog eats house" as well as "cat eats
      house".
    - title all "^cat ^dog" would result in no hits.
    - title all "^cat dog^" would find "cat eats
      dog", but not "dog eats cat".
    - title = "^cat dog^" would find "cat dog"
      only.
  - Backslash (\) is used to escape '*', '?', quote (") and '^',
    as well as itself. A backslash not followed immediately by one of
    these characters is an error. Examples:
      \? is a literal ?
      \\ is a literal \
      \\* is a literal backslash followed by an active *.
      \\\* is a literal \*
      \^ is a literal ^, whether at the beginning, within, or at the end
      of a word.
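
The quoting and backslash rules above can be sketched in a few lines of Python. This is only an illustration, not part of the specification: the function names (escape_literal, quote_term, check_escapes) are hypothetical, and quoting a term because it contains whitespace is an assumption of the sketch rather than a stated rule.

```python
ESCAPABLE = set('*?"^\\')       # characters a backslash may precede
NEEDS_QUOTES = set('/=<>"()')   # characters that call for a quoted term


def escape_literal(text: str) -> str:
    """Backslash-escape every character that would otherwise be special."""
    return "".join("\\" + ch if ch in ESCAPABLE else ch for ch in text)


def quote_term(term: str) -> str:
    """Wrap an already-escaped term in double quotes when it needs them.

    Quoting on whitespace is an assumption of this sketch.
    """
    if any(ch in NEEDS_QUOTES or ch.isspace() for ch in term):
        return '"' + term + '"'
    return term


def check_escapes(term: str) -> None:
    """Raise ValueError if a backslash is not followed by an escapable character."""
    i = 0
    while i < len(term):
        if term[i] == "\\":
            if i + 1 >= len(term) or term[i + 1] not in ESCAPABLE:
                raise ValueError(f"invalid escape at position {i}")
            i += 2  # skip the backslash and the character it escapes
        else:
            i += 1
```

A client that wants to search for user-supplied text literally would call escape_literal first and quote_term second, so that masking characters typed by the user are not treated as active.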
BNF for CQL Syntax
Note: two consecutive double quotes (with no intervening
space) are used in the BNF to indicate a single literal double quote.
This is used to indicate that term and identifier (when used for the result
set name) are to be enclosed in double quotes.
cql-query       ::= cql-query boolean search-clause | search-clause
boolean         ::= "and" | "or" | "not" | prox
search-clause   ::= "(" cql-query ")" | [ index-name relation ] term
index-name      ::= [ index-prefix "." ] index-base-name
relation        ::= base-relation { "/" qualifier }
base-relation   ::= order-relation | "=" | "exact" | "all" | "any" | "scr"
qualifier       ::= "relevant" | "fuzzy" | "stem" | "phonetic"
order-relation  ::= "<" | ">" | "<=" | ">=" | "<>"
prox            ::= "prox" [ "/" prox-qualifiers ]
prox-qualifiers ::= [ prox-relation ] "/" [ distance ] "/" [ unit ] "/" ordering
                  | [ prox-relation ] "/" [ distance ] "/" unit
                  | [ prox-relation ] "/" distance
                  | prox-relation
unit            ::= "word" | "sentence" | "paragraph" | "element"
prox-relation   ::= order-relation | "="
distance        ::= non-negative-integer
ordering        ::= "ordered" | "unordered"
index-prefix    ::= identifier
index-base-name ::= identifier
identifier      ::= string
term            ::= string | ""string""
string          ::= a character string
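
As an illustration of the prox productions and the defaults given in the proximity rule, the following Python sketch splits a proximity operator into its four specifiers. The function name parse_prox and the dictionary it returns are hypothetical choices made for this example. The sketch is also slightly more lenient than the BNF: it accepts trailing slashes that the grammar would not produce, and it does not tolerate whitespace inside the operator.

```python
PROX_RELATIONS = {"<", ">", "<=", ">=", "=", "<>"}
UNITS = {"word", "sentence", "paragraph", "element"}
ORDERINGS = {"ordered", "unordered"}


def parse_prox(op: str) -> dict:
    """Split a prox operator into its specifiers, applying the stated defaults."""
    parts = op.split("/")
    if parts[0] != "prox" or len(parts) > 5:
        raise ValueError(f"not a prox operator: {op!r}")
    # Pad omitted trailing specifiers with empty strings.
    relation, distance, unit, ordering = (parts[1:] + [""] * 4)[:4]

    relation = relation or "<="
    unit = unit or "word"
    ordering = ordering or "unordered"
    if relation not in PROX_RELATIONS:
        raise ValueError(f"unknown relation: {relation!r}")
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    if ordering not in ORDERINGS:
        raise ValueError(f"unknown ordering: {ordering!r}")

    if distance == "":
        # Default distance is 1 for words and zero for every other unit.
        value = 1 if unit == "word" else 0
    elif distance.isascii() and distance.isdigit():
        value = int(distance)
    else:
        raise ValueError(f"distance must be a non-negative integer: {distance!r}")

    return {"relation": relation, "distance": value,
            "unit": unit, "ordering": ordering}
```

For example, parse_prox("prox//3") keeps the default relation "<=" and the default unit "word" while setting the distance to 3.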