A Persistent URL (PURL) is a permanent identifier, and therefore a permanent address, for a web resource. PURLs let a service manage the resolution of URLs, and they can also provide metadata about the resources they identify.

Other persistent identifier schemes include Digital Object Identifiers (DOIs), Life Sciences Identifiers (LSIDs) and INFO URIs. Some of these schemes, such as DOIs, do support curation, but DOIs are seen by some as too commercial. LSIDs are functionally similar to PURLs since they may be mapped to a URL scheme and an administration service. INFO URIs provide neither real-time resolution nor real-time administration.

PURLs are vulnerable to changes in Domain Name System (DNS) registrations and to dependencies on the host computer. A failure to resolve a PURL can also lead to an ambiguous state.

A PURL solves the problem of changing URIs in a location-based URI scheme such as HTTP by providing a permanent identifier for a web resource. Unlike a regular URL, which simply provides the address of a web resource, a PURL redirects the browser to another web resource.

PURLs ensure that clients can rely on the same web address to get a web resource, even if the location of that resource changes. They allow the decentralized management and real-time administration of persistent identifiers (i.e. curation). PURL services can also solve the Web’s back link problem for content important enough to warrant their use.

PURLs have been used to address persistent identifier needs in the library and Linked Data communities for the past fifteen years. Many Linked Data vocabularies are hosted at purl.org including the Dublin Core element set and FOAF.

A public PURL service has been operated by the Online Computer Library Center since 1995.

Creating a PURL

To create a PURL, open the folder in which you want to store it and select PURL from the create menu. You will be presented with a form.

When you press the Create button, the PURL will appear in the Callimachus folder, just like any other Callimachus resource.

Here is a description of the fields used to specify a PURL:

| Field | Description |
| --- | --- |
| Local name | The name of your PURL. This name will be appended to the URL of the folder you are in to form the PURL's address. |
| Comment | An optional field provided for your own use. |
| GET status | See PURL GET status. |
| GET content location | The target location pattern for the GET response content from the PURL. |
| GET cache control | How long the results of a PURL should be cached by a proxy or a client. |
| POST request target | The response content location pattern for POST requests. |
| PUT request target | The response content location pattern for PUT requests. |
| PATCH request target | The response content location pattern for PATCH requests. |
| DELETE request target | The response content location pattern for DELETE requests. |

URL Target Patterns

The content location patterns are URI templates, optionally prefixed by a regular expression (on the same line), which is in turn optionally prefixed by a comma-separated list of request methods.

GET location patterns for 200, 404, and 410 responses, and POST/PUT/PATCH/DELETE target patterns, may also include request headers (each on its own line) and, optionally, a request body (separated from the headers by a blank line). Callimachus uses these to create the outgoing HTTP request.

[[Method ]RegEx ]URI-Template[
Request-Header]*[

Request-Body]

The regular expression is applied against the entire (absolute) request URI, including the query string, starting at the end of this resource's URI. If the regular expression does not match, or the request method does not match one of those provided (if any are provided), the URI template is ignored and an error condition is returned to the client. If no regular expression is provided, the rule will only match requests that have the same path as this resource.
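The matching rule described above can be sketched in Python. This is an illustration only, not the Callimachus implementation: the function name `match_rule`, its parameters, and the example URIs are hypothetical, and anchoring the regular expression exactly at the end of the PURL's URI is an assumption about what "starting at the end of this resource's URI" means.

```python
import re
from urllib.parse import urlsplit

def match_rule(purl_uri, request_uri, method, regex=None, methods=None):
    """Return template variables if the rule applies, otherwise None.

    Hypothetical helper: names and return shape are illustrative only.
    """
    # A rule with a method list only applies to the listed methods.
    if methods and method not in methods:
        return None
    if not request_uri.startswith(purl_uri):
        return None
    if regex is None:
        # Without a regex, only requests with the PURL's own path match.
        same_path = urlsplit(request_uri).path == urlsplit(purl_uri).path
        return {} if same_path else None
    # Assumption: the pattern is anchored where the PURL's own URI ends.
    m = re.compile(regex).match(request_uri, len(purl_uri))
    if m is None:
        return None
    # Expose groups by number and by name, as template variables.
    variables = {str(i): v for i, v in enumerate(m.groups(), start=1)
                 if v is not None}
    variables.update({k: v for k, v in m.groupdict().items()
                      if v is not None})
    return variables

purl = "http://example.com/docs"
print(match_rule(purl, "http://example.com/docs/2024/report?x=1", "GET",
                 regex=r"/(?P<year>\d{4})/(.*)"))
```

A return value of `None` stands in for the error condition returned to the client; how Callimachus reports that error is not covered by this sketch.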

The URI-Template (and outgoing request headers and body, if applicable) uses the query parameters and regex group names and numbers as template variables. If multiple values exist for the variable, the values will be comma separated, unless the explode ('*') modifier is used, in which case the separator depends on the type of expansion.

| Expansion Form | Description | Example | Expansion |
| --- | --- | --- | --- |
| Literal | The literal character is copied directly to the result | /target-uri | /target-uri |
| Unknown variable | Variables that are undefined are ignored by the expansion process | O{undef}X | OX |
| Simple | Percent-encoded values, comma separated | {hello} | Hello%20World%21 |
| Reserved | Value substitution, comma separated | {+path}/here | /foo/bar/here |
| Fragment | If any value, append crosshatch and values, comma separated | {#x,hello,y} | #1024,Hello%20World!,768 |
| Label with Dot-Prefix | Value substitution, each prefixed by the dot operator | www{.dom*} | www.example.com |
| Path Segment | Value substitution, each prefixed by the slash operator | {/list*} | /red/green/blue |
| Path-Style Parameter | Name-value pairs prefixed by semicolon, separated by "=" | {;list*} | ;list=red;list=green;list=blue |
| Form-Style Query | Percent-encoded name-value pairs prefixed by "?" or "&", separated by "=" | {?x,y} | ?x=1024&y=768 |
| Form-Style Continuation | Percent-encoded name-value pairs prefixed by "&", separated by "=" | {&list*} | &list=red&list=green&list=blue |
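To make the expansion forms in the table concrete, here is a minimal Python sketch of a URI template expander. It is not the Callimachus implementation; the function `expand`, the `OPERATORS` table, and the sample variable values are assumptions chosen to reproduce the examples in the table. It handles string and list values and the explode (`*`) modifier only, and it does not preserve percent-encoded triplets that are already present in a value.

```python
import re
from urllib.parse import quote

RESERVED = ":/?#[]@!$&'()*+,;="

# operator: (prefix, separator, named, if_empty, allow_reserved)
OPERATORS = {
    "":  ("",  ",", False, "",  False),   # Simple
    "+": ("",  ",", False, "",  True),    # Reserved
    "#": ("#", ",", False, "",  True),    # Fragment
    ".": (".", ".", False, "",  False),   # Label with dot-prefix
    "/": ("/", "/", False, "",  False),   # Path segment
    ";": (";", ";", True,  "",  False),   # Path-style parameter
    "?": ("?", "&", True,  "=", False),   # Form-style query
    "&": ("&", "&", True,  "=", False),   # Form-style continuation
}

def expand(template, variables):
    """Expand {expressions} in template; values must be str or list of str."""
    def expand_expression(match):
        body = match.group(1)
        op = body[0] if body[0] in OPERATORS else ""
        prefix, sep, named, if_empty, allow_reserved = OPERATORS[op]
        safe = RESERVED if allow_reserved else ""
        parts = []
        for spec in body[len(op):].split(","):
            explode = spec.endswith("*")
            name = spec.rstrip("*")
            value = variables.get(name)
            if value is None or value == []:
                continue  # undefined variables are ignored
            values = [value] if isinstance(value, str) else list(value)
            encoded = [quote(v, safe=safe) for v in values]
            if explode:
                if named:
                    parts.extend(f"{name}={v}" if v else name + if_empty
                                 for v in encoded)
                else:
                    parts.extend(encoded)
            else:
                joined = ",".join(encoded)
                if named:
                    parts.append(f"{name}={joined}" if joined
                                 else name + if_empty)
                else:
                    parts.append(joined)
        return prefix + sep.join(parts) if parts else ""
    return re.sub(r"\{([^{}]+)\}", expand_expression, template)

# Sample values (assumed) matching the examples in the table.
sample = {"hello": "Hello World!", "path": "/foo/bar", "x": "1024",
          "y": "768", "dom": ["example", "com"],
          "list": ["red", "green", "blue"]}
print(expand("{hello}", sample))       # Hello%20World%21
print(expand("{+path}/here", sample))  # /foo/bar/here
print(expand("{;list*}", sample))      # ;list=red;list=green;list=blue
```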

If the target URL has no query string component, the incoming request query string is appended to the target URL.
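The query string pass-through rule can be illustrated with a short Python sketch. The function name `forward_query` and the example URLs are hypothetical, and the sketch assumes the rule is applied to the fully expanded target URL.

```python
from urllib.parse import urlsplit, urlunsplit

def forward_query(target_url, request_url):
    """Append the request's query string when the target has none."""
    target = urlsplit(target_url)
    incoming = urlsplit(request_url).query
    if target.query or not incoming:
        # The target already has its own query string, or there is
        # nothing to forward: leave the target untouched.
        return target_url
    return urlunsplit(target._replace(query=incoming))

print(forward_query("http://example.org/page",
                    "http://example.com/purl?first=Joe&last=Bloggs"))
# http://example.org/page?first=Joe&last=Bloggs
```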

For example, a call can be made to an existing PURL with parameters like this:

/existing-purl?first=Joe&last=Bloggs

The variables passed in this way can then be passed on to the content location pattern of the PURL using curly-bracket notation.

existing-pipeline.xpl?results{&first,last} -> /existing-pipeline.xpl?results&first=Joe&last=Bloggs

The variable names inside the curly brackets should match the variable names used in the query string (or regex named group or number) when the PURL is called. This notation will pass along a URL-encoded version of the parameter's value. If you want to pass along an unencoded version use the reserved expansion, indicated with a plus sign before the variable name, as shown in the table above and example below.

/{+last}/{+first}.txt -> /Bloggs/Joe.txt
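The difference between the encoded and the reserved notation only shows up when a value contains reserved characters such as a slash. The Python sketch below uses a hypothetical query string and the standard `urllib.parse` module to imitate both behaviours; it is not how Callimachus performs the expansion.

```python
from urllib.parse import parse_qs, quote

RESERVED = ":/?#[]@!$&'()*+,;="

# Hypothetical incoming query string; parse_qs decodes the values.
query = "dir=reports%2F2024&name=Q1%20summary"
params = {key: values[0] for key, values in parse_qs(query).items()}

# {dir} percent-encodes everything outside the unreserved set.
simple = quote(params["dir"], safe="")
# {+dir} leaves reserved characters such as "/" untouched.
reserved = quote(params["dir"], safe=RESERVED)

print(simple)    # reports%2F2024
print(reserved)  # reports/2024
```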

PURL GET status

PURLs are categorized by the HTTP response code they result in. Callimachus implements the following types of GET responses:

| Response code | Label | Description |
| --- | --- | --- |
| 200 | Copy | A PURL resource of this type is a cached copy of its target location. It is most often used in Callimachus to refer to XProc pipelines for the collection, transformation and rendering of remote content. |
| 301 | Canonical | The resource has been assigned a new permanent URI and any future references to this resource SHOULD use the given location URI. The PURL redirects a client to the target location. |
| 302 | Alternate | A representation of the resource currently resides at the given location. The PURL redirects a client to the target location. |
| 303 | Described by | The target location provides information about the requested resource, but the requested resource may not be an "information resource" (that is, it may be a real-world object). This type of PURL is most often used when resolving RDF and Linked Data resources. |
| 307 | Resides | The resource location has been temporarily changed. Redirection to the target location should be considered temporary and may change. |
| 308 | Moved | The resource URL has been permanently moved to the target location. |
| 404 | Missing | The resource is not available. It is "Missing". This situation may or may not be temporary. The content of the target location is used as the response content. The content might not be shown to a user if it is less than 512 bytes in length. |
| 410 | Gone | The resource is no longer available and no forwarding address is known. This condition is expected to be considered permanent. The content of the target location is used as the response content. The content might not be shown to a user if it is less than 512 bytes in length. |
| 451 | Illegal | Access to the resource is denied for legal reasons. |

Note: The 200 PURL is also known as an "Active PURL" because the PURL takes an active role in creating its response. Other types of PURLs are passive in that they simply redirect to existing content and/or return an appropriate HTTP response code.
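The distinction between PURLs that return target content and PURLs that redirect can be summarised in a small Python sketch. The names `PURL_TYPES` and `describe` are hypothetical, the grouping is read from the descriptions above, and the behaviour string for 451 is a placeholder because this page does not say how its response content is produced.

```python
# Labels as listed in the table of GET responses.
PURL_TYPES = {
    200: "Copy", 301: "Canonical", 302: "Alternate", 303: "Described by",
    307: "Resides", 308: "Moved", 404: "Missing", 410: "Gone",
    451: "Illegal",
}

REDIRECTS = {301, 302, 303, 307, 308}   # client is sent to the target
TARGET_CONTENT = {200, 404, 410}        # target content becomes the response

def describe(status):
    """Return (label, behaviour) for a PURL GET status code."""
    label = PURL_TYPES[status]
    if status in REDIRECTS:
        return label, "redirect"
    if status in TARGET_CONTENT:
        return label, "content"
    return label, "unspecified"  # assumption: not described on this page

print(describe(303))  # ('Described by', 'redirect')
print(describe(200))  # ('Copy', 'content')
```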

PURL example

In this example, we will name our PURL "blogrollup" to match the example XProc pipeline given in the XProc RSS feed example. This example uses a PURL of type 200 in order to resolve against a dynamically generated RSS feed.

| Field | Value |
| --- | --- |
| Local name | blogrollup |
| GET status | Copy (200) |
| GET content location | blogrollup.xpl?results |
| Cache Control | max-age=300 |

The GET content location field is set to the PURL's target location. PURLs of type 404 and 410 also require a GET location for their response content. In this example, we provide a relative URL to the XProc pipeline and append ?results to it to get the results of the pipeline instead of the pipeline itself.

The Cache Control field determines how long the results of a PURL should be cached by a proxy or a client. In this example, we have reduced the max-age parameter from 3600 seconds (1 hour) to 300 seconds (5 minutes).
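As an illustration of what the `max-age` value means for a client or proxy, the following Python sketch parses the directive and checks whether a cached response is still fresh. The function names are hypothetical and the sketch ignores every other Cache-Control directive.

```python
import re

def max_age_seconds(cache_control):
    """Extract the max-age value from a Cache-Control header, if present."""
    m = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control,
                  re.IGNORECASE)
    return int(m.group(1)) if m else None

def is_fresh(cache_control, age_seconds):
    """A cached response is fresh while its age is below max-age."""
    max_age = max_age_seconds(cache_control)
    return max_age is not None and age_seconds < max_age

print(is_fresh("max-age=300", 120))  # True: cached 2 minutes ago
print(is_fresh("max-age=300", 301))  # False: older than 5 minutes
```

With the value used in this example, a copy of the feed that is more than five minutes old would be fetched again rather than served from the cache.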

How it works

The XProc pipeline "blogrollup(.xpl)" serves as the target for the PURL "blogrollup". When the PURL "blogrollup" is resolved, the target location blogrollup.xpl?results is invoked, which creates the RSS feed and returns it as the result of the PURL.

Benefit of using this PURL

You could just call the XProc pipeline directly. However, the PURL provides a significant benefit: the PURL address can stay persistent, even if the location of the target URL changes. Additionally, either the PURL or the pipeline can take parameters that affect the output of the pipeline. Equally important is the ability of 200 type PURLs to cache their results.

Support for partial PURLs

Callimachus does not implement "Partial PURLs" as of the 1.4 release. Partial PURLs "allow PURLs to be created which refer to a directory level portion of a URL; any path information appended to a partial redirect PURL may in turn be appended to its target URL. That allows a single PURL to redirect to a hierarchy on a target Web server." They are useful in that they allow PURL targets to refer to database content. Callimachus may implement support for partial PURLs in a later release.
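The idea behind a partial PURL, as quoted above, can be sketched in a few lines of Python. This is a conceptual illustration only: Callimachus does not provide this feature as of the release mentioned here, and the function name `resolve_partial` and the example paths are hypothetical.

```python
def resolve_partial(purl_prefix, target_prefix, request_path):
    """Append the path remainder after the PURL prefix to the target."""
    if not request_path.startswith(purl_prefix):
        return None  # the request is outside this partial PURL
    return target_prefix + request_path[len(purl_prefix):]

print(resolve_partial("/docs/", "http://example.org/archive/",
                      "/docs/2024/report.html"))
# http://example.org/archive/2024/report.html
```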