CQL - Profiling New Relations and Modifiers

15th September 2003

Eliot Christian, U.S. Geological Survey
Adam Dickmeiss, Index Data
Sebastian Hammer, Index Data
Rob Sanderson, Liverpool University
Mike Taylor, Index Data

1. Background
        1.1. The Problem
        1.2. First rejected solution
        1.3. Second rejected solution
2. Proposal
3. Examples
4. Status

1. Background

1.1. The Problem

The GEO Z39.50 profile (http://www.blueangeltech.com/standards/GeoProfile/geo22.htm) is unusually complex and powerful, dealing as it does with many concepts not normally encountered by more typical, library-oriented applications of Z39.50. Its attribute set (http://www.blueangeltech.com/standards/GeoProfile/annex_a.htm) includes support for complex non-bibliographic queries.

Some months ago, the GEO community expressed interest in using SRW as its web-services protocol. The only significant barrier to its adoption was a perceived lack of expressive power in SRW's query language, CQL. In particular, CQL as currently defined (www.loc.gov/zing/cql/cql-syntax.html) does not provide a way to express GEO relations such as "Overlaps", "Enclosed" and "Before or During", nor term structures such as "Coordinate string" and "Date string".

1.2. First rejected solution

The GEO community's first proposal to remedy this deficiency was to make the Type-1 query a part of SRW. The idea was to introduce some way to encode a Z39.50 Type-1 query so that it could be transmitted as part of an SRW search request - either by base64-encoding the query's BER code, or by translating the tree into equivalent XML. Under this arrangement, the GEO community could continue to use its existing attribute set with the new protocol.

There was a feeling among the CQL developers that, while this would solve GEO's immediate problems, it would be a wasted opportunity to broaden the scope and applicability of CQL. Since CQL is intended to be a fully general-purpose and expressive query language, its developers felt that it should be extended to cater for the GEO requirements, rendering the use of the Type-1 query unnecessary.

1.3. Second rejected solution

The CQL developers accordingly developed an informal proposal to add a set of new relations and relation modifiers to CQL, supporting specific GEO requirements. This approach involved several new keywords (within, overlaps, ISOdate, etc.).

It was hard to detect much enthusiasm for this approach from the GEO community. After some discussion, it became apparent that part of the problem was the inflexibility of this approach: it relies on the CQL developers correctly anticipating and interpreting all of the GEO community's needs.

A further weakness of this approach is that the same process would have to be repeated every time another new community needed CQL extensions in order to express its queries. In contrast, Z39.50's notion of an attribute set allows independent communities to unilaterally invent new kinds of relation and term-structure as required, without reference to a central governing body (the ZIG, in this case). This flexibility is seen as highly desirable.

2. Proposal

The new proposal is not GEO-specific, but instead allows communities more flexibility in defining their own CQL semantics. It turns out to be surprisingly simple both to state and implement:

Under these rules, all previously valid CQL queries are still valid and have the same interpretation as they previously had.

Note: CQL does not have an explicit concept of a term-structure specifier, but relation modifiers fulfil this role neatly - just as, in programming languages like Perl, related but differing operators like = and eq indicate the type of their operands for the purposes of comparison. So, for example, the relation-and-modifier =/x.ISOdate might be defined to mean "compare for equality, treating the term as a date in ISO format".
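
As a minimal sketch of this idea, the Python below splits a relation token into its base relation and modifiers, then uses the modifier to decide how the terms are compared. The function names split_relation and equal, and the reading of x.ISOdate as "parse both terms as YYYY-MM-DD calendar dates", are assumptions made for illustration; they are not part of CQL or of any existing implementation.

```python
from datetime import date


def split_relation(token):
    """Split 'base/mod1/mod2' into (base, [mod1, mod2])."""
    base, *modifiers = token.split("/")
    return base, modifiers


def equal(relation, left, right):
    """Compare two terms for equality as directed by the relation's modifiers."""
    base, modifiers = split_relation(relation)
    if base != "=":
        raise ValueError("this sketch only handles '=': " + relation)
    if "x.ISOdate" in modifiers:
        # Hypothetical modifier: treat both terms as ISO-format dates.
        # date.fromisoformat raises ValueError for a malformed term.
        return date.fromisoformat(left) == date.fromisoformat(right)
    # No modifier: fall back to plain string comparison.
    return left == right
```

With the modifier present, a term that is not a date is rejected rather than silently compared as a string, which is the practical effect of declaring the term's structure.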

3. Examples

Assuming that the xyz context-set is in force:

dc.date = 2000-01-08				OK (has old meaning)

dc.date any 2000-01-08				OK, though strange (has old meaning)

dc.date foobar 2000-01-08			error ("foobar" is not in the CQL context-set)

dc.date xyz.foobar 2000-01-08			OK

dc.date xyz.foobar/ISOdate 2000-01-08		error ("ISOdate" is not in the CQL context-set)

dc.date xyz.foobar/std.ISOdate 2000-01-08	OK

dc.date and 2000-01-08				"and" is interpreted as a boolean, not a relation

dc.date xyz.and 2000-01-08			OK ("xyz.and" is interpreted as a relation)

dc.date paragraph 2000-01-08			syntax error: "paragraph" is a keyword

dc.date xyz.paragraph 2000-01-08		OK

foo or/xyz.exclusive bar			either foo or bar but not both
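
Several of the rows above can be reproduced with a small check on relation tokens. The sketch below rests on assumptions: the rule it applies (an unqualified name must belong to the CQL context-set, while any name qualified with a prefix is accepted) is inferred from the examples, and the sets CQL_RELATIONS and CQL_MODIFIERS are illustrative subsets rather than the authoritative CQL lists. It does not model booleans or the keyword case.

```python
# Illustrative subsets only; the real CQL context-set defines the full lists.
CQL_RELATIONS = {"=", "<", ">", "<=", ">=", "<>", "exact", "any", "all"}
CQL_MODIFIERS = {"stem", "relevant", "fuzzy", "phonetic"}


def check_name(name, builtin):
    """Return None if the name is acceptable, else an error message."""
    if "." in name:
        # Qualified with a context-set prefix: left to that set to define.
        return None
    if name in builtin:
        return None
    return '"%s" is not in the CQL context-set' % name


def check_relation(token):
    """Check a relation token such as 'xyz.foobar/std.ISOdate'."""
    base, *modifiers = token.split("/")
    error = check_name(base, CQL_RELATIONS)
    if error:
        return error
    for modifier in modifiers:
        error = check_name(modifier, CQL_MODIFIERS)
        if error:
            return error
    return None
```

A real parser would also have to check that a qualified prefix is actually bound to a context-set; this sketch accepts any prefix.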
  

Assuming nothing about the prevailing context-set:

>geo="http://www.blueangeltech.com/standards/GeoProfile/cql/"
    >dc="http://www.loc.gov/z3950/agency/zing/cql/dc-indexes.html"
    dc.subject all "dinosaurs tracks" and geo.location geo.within texas
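
A minimal sketch of how a client might read the prefix assignments off the front of such a query is given below. The function name split_prefixes and the regular expression are assumptions for illustration; they handle only the quoted >prefix="uri" form shown here, not the full CQL grammar.

```python
import re

# Matches one leading assignment of the form >prefix="uri".
PREFIX_RE = re.compile(r'\s*>\s*(\w+)\s*=\s*"([^"]*)"')


def split_prefixes(query):
    """Return (prefix-to-URI dict, remaining query text)."""
    prefixes = {}
    pos = 0
    while True:
        match = PREFIX_RE.match(query, pos)
        if match is None:
            break
        prefixes[match.group(1)] = match.group(2)
        pos = match.end()
    return prefixes, query[pos:].strip()
```

Applied to the query above, this yields a mapping for the geo and dc prefixes and leaves the search clause itself as the remaining text.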
  

An automatically generated example from Eliot Christian's SRU Geospatial Search Demo page at www.gils.net/sru-geo.html:

>geoIndex="http://www.blueangeltech.com/Standards/GeoProfile/annex_a.htm#Use%20Attributes"
    >geoRelation="http://www.blueangeltech.com/Standards/GeoProfile/annex_a.htm#Relation%20Attributes"
    >geoStructure="http://www.blueangeltech.com/Standards/GeoProfile/annex_a.htm#Structure%20Attributes"
    (geoIndex.title geoRelation.match/geoStructure.phrase "sickle claw")
    and (geoIndex.timeperiod =/geo.date "19680312,19980318")
    or (geoIndex.northbc < "78")
    and (geoIndex.coordinates geoRelation.overlap/geoStructure.coordinate "-106.7,25.8,-93.5,36.5")

4. Status

This proposal is accepted. Having been approved by all five authors, it was submitted to the SRW working group for consideration. It was approved for inclusion in CQL 1.1 (part of SRW 1.1) at the meeting of 25th-26th September 2003.

We know of three CQL parsing implementations (all of them free software):

As proof of concept, support for the extension described in this document has been added to all three of these implementations. It is available for download in release 0.7 of CQL-Java and release 2.0.4 of YAZ.

Rob's sample GEO-profiled CQL-to-Z39.50 gateway is at http://srw.o-r-g.org:8080/metar/docs.html

Feedback to <mike@indexdata.com> is welcome!
