Proposal: SRU over HTTP POST

1st February 2006

Problem

The present version of the SRU protocol, as described at www.loc.gov/sru, is bound to the HTTP GET operation, and is described in terms of a URI that includes query parameters.

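This use of HTTP GET subjects SRU to several limitations, e.g. restrictions on the length of the parameter set, and the inability to include characters from outside the seven-bit US-ASCII repertoire or to specify the character encoding in use.

(Both of these are real problems that real implementations have encountered, not merely theoretical ones.)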

Solution

These limitations can easily be overcome by allowing SRU requests to be expressed using HTTP POST as well as GET. This removes all length restrictions on the parameter set, allows characters other than those from the seven-bit US-ASCII repertoire to be included in the parameters, and allows the character encoding in use to be specified (as part of the Content-Type header).
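Why declaring the character encoding matters can be illustrated with a minimal Python sketch. The helper name encode_sru_body and the sample query term are hypothetical, chosen only for illustration; the sketch assumes the parameters are sent as a form-encoded body, and shows that the same query produces different bytes under different charsets, so the receiver needs the Content-Type header to decode it correctly.

```python
from urllib.parse import urlencode, parse_qs


def encode_sru_body(params, charset):
    """Return (content_type, body_bytes) for a form-encoded request body.

    Hypothetical helper: percent-encodes the parameters using the given
    charset and builds a Content-Type header value that declares it.
    """
    body = urlencode(params, encoding=charset).encode("ascii")
    content_type = f"application/x-www-form-urlencoded; charset={charset}"
    return content_type, body


# The same non-ASCII query term yields different bytes per charset.
latin1_type, latin1_body = encode_sru_body({"query": "café"}, "iso-8859-1")
utf8_type, utf8_body = encode_sru_body({"query": "café"}, "utf-8")

# A receiver that honours the declared charset recovers the original term.
decoded = parse_qs(latin1_body.decode("ascii"), encoding="iso-8859-1")
```

Here latin1_body is the bytes query=caf%E9 and utf8_body is query=caf%C3%A9; without the charset parameter a server could not tell which interpretation was intended.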

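The mapping between the GET and POST methods is trivial: the parameters that follow the ? in the GET URI are sent instead as the body of the POST request, using the standard form encoding (application/x-www-form-urlencoded). For example, the SRU-over-GET request: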

   GET /voyager?query=dinosaur&startRecord=1&maximumRecords=1&recordSchema=dc&version=1.1&operation=searchRetrieve HTTP/1.1
   Host: z3950.loc.gov:7090

would be expressed as follows in SRU-over-POST:

   POST /voyager HTTP/1.1
   Host: z3950.loc.gov:7090
   Content-Type: application/x-www-form-urlencoded; charset=iso-8859-1
   Content-Length: 98

   query=dinosaur&startRecord=1&maximumRecords=1&recordSchema=dc&version=1.1&operation=searchRetrieve

The semantics of SRU would remain identical whether requested via GET or POST.
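The correspondence between the two forms can be sketched in a few lines of Python. This is an illustration only, not part of the proposal: the function names build_get and build_post are hypothetical, and no request is actually sent. The parameter values and the /voyager path are taken from the example above.

```python
from urllib.parse import urlencode

# Parameters from the example request, in the same order.
SRU_PARAMS = {
    "query": "dinosaur",
    "startRecord": "1",
    "maximumRecords": "1",
    "recordSchema": "dc",
    "version": "1.1",
    "operation": "searchRetrieve",
}


def build_get(path, params):
    """Return the request target for SRU-over-GET: path plus query string."""
    return f"{path}?{urlencode(params)}"


def build_post(path, params, charset="iso-8859-1"):
    """Return (path, headers, body) for the equivalent SRU-over-POST request."""
    body = urlencode(params, encoding=charset).encode("ascii")
    headers = {
        "Content-Type": f"application/x-www-form-urlencoded; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return path, headers, body


get_target = build_get("/voyager", SRU_PARAMS)
post_path, post_headers, post_body = build_post("/voyager", SRU_PARAMS)
```

The POST body is exactly the text that follows the ? in the GET target, and its length is the 98 bytes given in the Content-Length header of the example.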

Issues

Proliferation of protocols

A universal IR-standards client already needs to support Z39.50, SRW and SRU. Adding the further requirement of SRU/POST is regrettable. However, consensus seems to be that this is not a great additional burden given how very similar HTTP's GET and POST methods are.

Servers, too, would have the option of adding support for SRU/POST. Supporting it instead of SRU/GET would not be recommended, since the latter is in wide use and is likely to continue to be favoured by simple clients.

Protocol recognition

Currently SRW and SRU messages go to the same base URL. Some toolkits assume that anything received via POST is SRW while GET messages are assumed to be SRU. This naive dichotomy would no longer apply if SRU/POST were supported. Instead, it would be necessary to consult the Content-Type header, which is text/xml for SRW and application/x-www-form-urlencoded for SRU/POST.
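A toolkit's dispatch logic along these lines might look like the following Python sketch. The function name classify_request and the returned labels are hypothetical, and a real toolkit would take the method and header from its own request object; the sketch only shows the decision described above.

```python
def classify_request(method, content_type):
    """Guess which protocol a request uses from its method and Content-Type.

    Hypothetical helper: returns "SRU/GET", "SRU/POST", "SRW" or "unknown".
    """
    method = method.upper()
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()

    if method == "GET":
        return "SRU/GET"
    if method == "POST" and media_type == "application/x-www-form-urlencoded":
        return "SRU/POST"
    if method == "POST" and media_type == "text/xml":
        return "SRW"
    return "unknown"


kind = classify_request(
    "POST", "application/x-www-form-urlencoded; charset=iso-8859-1"
)
```

The media type is compared after stripping any parameters, so a charset declaration on either kind of POST does not affect the decision.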

For toolkits too stupid to do this, another approach is to use different base URLs for the different methods. In this case, the two services' Explain records would refer to each other using the <links> element in <databaseInfo>.

Conclusion

Since some real queries (not just hypothetical ones) cannot be sent using SRU/GET, the people affected are likely to go ahead and implement SRU/POST, whether it is "allowed" or not. This being so, the editorial board should probably explicitly permit and specify it.

Feedback to <mike@indexdata.com> is welcome!