ZING
CQL
"Common Query Language"
Philosophy
All CQL queries must have an unambiguous mapping to classic
Z39.50 Type-1 queries.
Syntax
Boolean ::= 'AND' | 'OR' | 'NOT' | Adjacency
Adjacency ::= 'W/' Digit
IndexQualifier ::= [IndexSet '.'] IndexID
Relationship ::= Equality | '>' | '<' | '>=' | '<=' | '@fuzzy@' | '@stem@' | '@relevance@'
Equality ::= ':' | '='
QualifiedTerm ::= [IndexQualifier Relationship] Term | QualifiedTerm Boolean QualifiedTerm | '(' QualifiedTerm Boolean QualifiedTerm ')'
Term ::= NonBlankCharacter* | '"' Character* '"'
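To make the grammar concrete, here is a minimal Python sketch of a tokenizer and recursive-descent parser for it. It is an illustration, not a reference implementation. The names tokenize and parse, the tuple-shaped parse tree, and the choice to treat Boolean operators as left-associative are all assumptions; the grammar itself does not say how to resolve that ambiguity. The sketch also assumes that an unquoted Term stops at whitespace, quotes, parentheses, and relationship characters.

```python
import re

# One token per match: quoted term, parenthesis, relationship, or bare word.
TOKEN = re.compile(r'''
    \s*(?:
        (?P<quoted>"[^"]*")
      | (?P<paren>[()])
      | (?P<rel>>=|<=|@fuzzy@|@stem@|@relevance@|[:=<>])
      | (?P<word>[^\s()":=<>]+)
    )''', re.VERBOSE)

def tokenize(query):
    """Split a query into (kind, text) pairs; Booleans get kind 'bool'."""
    query = query.strip()
    tokens, pos = [], 0
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:
            raise ValueError("cannot tokenize at position %d" % pos)
        kind, text = m.lastgroup, m.group(m.lastgroup)
        if kind == 'word' and (text in ('AND', 'OR', 'NOT')
                               or re.fullmatch(r'W/\d', text)):
            kind = 'bool'
        tokens.append((kind, text))
        pos = m.end()
    return tokens

def parse(tokens):
    """Return a nested tuple tree for a full token list."""
    node, rest = _expr(tokens)
    if rest:
        raise ValueError("unexpected token %r" % (rest[0],))
    return node

def _expr(toks):
    # Assumption: chains of Booleans associate to the left.
    left, toks = _clause(toks)
    while toks and toks[0][0] == 'bool':
        op = toks[0][1]
        right, toks = _clause(toks[1:])
        left = (op, left, right)
    return left, toks

def _clause(toks):
    if not toks:
        raise ValueError("unexpected end of query")
    kind, text = toks[0]
    if (kind, text) == ('paren', '('):
        node, toks = _expr(toks[1:])
        if not toks or toks[0] != ('paren', ')'):
            raise ValueError("missing ')'")
        return node, toks[1:]
    # [IndexQualifier Relationship] Term
    if kind == 'word' and len(toks) >= 3 and toks[1][0] == 'rel':
        if toks[2][0] not in ('word', 'quoted'):
            raise ValueError("expected a term after the relationship")
        return ('term', text, toks[1][1], toks[2][1].strip('"')), toks[3:]
    if kind in ('word', 'quoted'):
        return ('term', None, None, text.strip('"')), toks[1:]
    raise ValueError("unexpected token %r" % (toks[0],))
```

For instance, parse(tokenize('Author:"levan ralph"')) yields a single term node carrying the index qualifier Author, the relationship ':' and the term text.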
Examples
- Author:"levan ralph" does an adjacency word list search against the author index
- Bib1.AuthorPhrase="levan, ralph" does a string match against the author index
Clarifications
IndexQualifiers map to combinations of Use and Structure attributes.
IndexSets do not necessarily map to AttributeSets, but the IndexIDs within
the IndexSets do get explicitly mapped to a combination of a Use and a
Structure attribute from AttributeSets.
I have made the ‘:’ and ‘=’ characters equivalent.
The Structure attributes implicitly supported by CQL are
String and AdjacencyWordList.
Type-1 Mappings
All terms are assumed to have a Truncation attribute of 104
(Z39.58 Masking). This supports
the use of ‘?’ and ‘#’ as masking characters. (See http://lcweb.loc.gov/z3950/agency/defns/bib1.html#55.)
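As a rough illustration of what the masking characters mean to a server, the Python sketch below turns a masked term into a regular expression. It assumes the usual reading of Z39.58 masking, where '?' stands for zero or more characters and '#' for exactly one. The function name masking_to_regex is hypothetical, the case-insensitive match is an assumption, and any numeric-limit form of '?' is deliberately not handled.

```python
import re

def masking_to_regex(term):
    """Compile a term with '?' and '#' masking into a regular expression."""
    parts = []
    for ch in term:
        if ch == '?':
            parts.append('.*')   # assumed: zero or more characters
        elif ch == '#':
            parts.append('.')    # assumed: exactly one character
        else:
            parts.append(re.escape(ch))
    # Case-insensitive matching is an assumption of this sketch.
    return re.compile(''.join(parts), re.IGNORECASE)
```

With this reading, masking_to_regex('wom#n').fullmatch(...) accepts both "woman" and "women", while masking_to_regex('comput?') accepts "computer" as well as "comput" itself.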
All terms are assumed to have a Completeness attribute of 1
(incomplete subfield).
All terms are assumed to have a Position attribute of 3 (any
position in field).
IndexQualifiers map to combinations of Use and Structure
attributes.
While the Structure attributes implicitly supported by CQL
are String and AdjacencyWordList, a smart mapping to Type-1 will probably
convert them to the more ambiguous Phrase and WordList Structure attributes.
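The mapping rules above can be sketched as a small lookup in Python. The attribute type numbers (1 Use, 3 Position, 4 Structure, 5 Truncation, 6 Completeness) follow Bib-1, and the three default values are the ones stated in this section. Everything else is an assumption for illustration: the INDEX_TABLE entries, the pairing of the author indexes with Use attribute 1003, and the function name attributes_for. Structure values are kept as symbolic names because this note does not give numeric codes for them.

```python
# Bib-1 attribute type numbers.
USE, POSITION, STRUCTURE, TRUNCATION, COMPLETENESS = 1, 3, 4, 5, 6

# Defaults stated in this section for every term.
DEFAULTS = {
    TRUNCATION: 104,    # Z39.58 masking
    COMPLETENESS: 1,    # incomplete subfield
    POSITION: 3,        # any position in field
}

# Hypothetical index table: each IndexQualifier maps to a Use attribute
# plus a Structure attribute (symbolic, since no codes are given here).
INDEX_TABLE = {
    "author": {USE: 1003, STRUCTURE: "AdjacencyWordList"},
    "bib1.authorphrase": {USE: 1003, STRUCTURE: "String"},
}

def attributes_for(index_qualifier):
    """Return the full attribute combination for one qualified term."""
    attrs = dict(DEFAULTS)
    attrs.update(INDEX_TABLE[index_qualifier.lower()])
    return attrs
```

A server profile would supply its own table; the point is only that the defaults are fixed while Use and Structure come from the IndexQualifier.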
Issues
- Human readable?
- Human enterable? CQL is a
potential area of disagreement between SRW and SRU. SRW may assume that
client software can manipulate a human-entered query into CQL. SRU has no
such advantage; users must type in a CQL query. So we need to make CQL
easy to enter.
- Internationally friendly? E.g.,
by using numbers in preference to symbolic identifiers that mean something
in English.
- Support for multiple
attribute sets in one query?
- Distributed searching?
(Ability to issue a single query against multiple collections without
change.) Different servers provide different access points to their data.
In Z39.50 that means different attribute combinations. So in SRW, does
this mean different indexes? Or can we profile SRW as we do Z39.50?
- Direct mapping to Z39.50
constructs?
- How to define scope names?
(Are they attribute sets or just a logical grouping for names? E.g., Dublin
Core attributes are defined in the Bib-1 attribute set at present.) Use the
current exact Z39.50 attribute sets etc. for mapping onto CQL field names?
Or should creators of index sets describe each index in terms of Z39.50
attributes?
- How to manage the population
of field-set names? Should there be a central CQL registry of such names?
If it can change per server, then reusing a query against multiple servers
will be difficult. Should sites be able to define their own new, local
sets without going to the global registry? Instead of 'dc.Title', should it
be a URL? That is, Dublin Core XML namespace URI + DC element name? Or
should queries be CQL text plus a set of definitions for mapping
"dc" to "Dublin Core URI", etc.? A suggestion is to
provide a URI in Explain that points the user/application to the Index to
Attribute Set mapping.
- The pattern match characters
don't seem to follow any existing standard. (Rather, they mix several
existing standards.) Stick to CCL (# and ?) and drop '*'? Want to map to
Z39.50 easily; Z39.50 has got a CCL regex attribute already.
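One of the options raised in the list above is to send CQL text together with a set of definitions that map a short scope name such as "dc" to a URI. A minimal Python sketch of that idea follows. The function name resolve_field and the PREFIXES table are hypothetical, and simply concatenating the namespace URI and the element name is an assumption about how the two would be combined.

```python
# Hypothetical prefix definitions that would accompany a query.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",  # Dublin Core element namespace
}

def resolve_field(field, prefixes):
    """Expand a two-part field name such as 'dc.title' to a URI."""
    scope, dot, name = field.partition('.')
    if not dot or not name:
        raise ValueError("expected a two-part name like 'dc.title'")
    if scope not in prefixes:
        raise KeyError("no definition for scope %r" % scope)
    # Assumption: the URI is the namespace followed by the element name.
    return prefixes[scope] + name
```

Under this scheme two servers could use different short names for the same scope and still agree on what a field means, at the cost of longer requests.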
Suggested Queries
- dc.Title = "Power and Fame"
- dc.Title = ("Power" AND "Fame")
- dc.Contributor = "LOC" AND dc.Subject = "Standards"
- dc.Contributor = "LOC" AND agls.Identifier = "xyzzy"
- bib1.Author, dc.Contributor = "Smith"
Other Suggestions
- Queries are Unicode text
(UTF-8, etc).
- All text to be searched
always inside quotes. This allows new reserved words to be added later
without breaking old queries.
- Reserved words upper case?
(But not for "EHE".)
- Fields to be searched
identified by a two-part identifier where the first part identifies the
scope for the second part. E.g., dc.title.