Definition

A Persistent URL (PURL) is a permanent identifier, and therefore a permanent address, for a web resource. PURLs let a service manage the resolution of URLs, and they can also provide metadata for the resources they identify.

Other persistent identifier schemes include Digital Object Identifiers (DOIs), Life Sciences Identifiers (LSIDs) and INFO URIs. DOIs also support curation, but they are seen by some as too commercial. LSIDs are functionally similar to PURLs, since they may be mapped to a URL scheme and an administration service. INFO URIs provide neither real-time resolution nor real-time administration.

PURLs are vulnerable to changes in Domain Name System (DNS) registrations and depend on the host computer that serves them. In addition, a failure to resolve a PURL can lead to an ambiguous state.

Purpose

A PURL solves the problem of changing URIs in a location-based URI scheme such as HTTP by providing the permanent identification of a web resource. Unlike a regular URL that simply provides an address to a web resource, a PURL redirects the browser to another Web resource.

PURLs ensure that clients can rely on the same Web address to get a web resource, even if the location of that resource changes. They allow the decentralized management and real-time administration of persistent identifiers (i.e. curation). In addition, PURL services can solve the Web’s back link problem for content important enough to warrant their use.
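The idea of a stable address whose target can be re-pointed can be sketched in a few lines of Python. This is only an illustration, not how Callimachus or purl.org is implemented: the in-memory table, the example.org URLs, and the function names resolve and retarget are all assumptions made for the sketch.

```python
# Hypothetical in-memory PURL table: persistent path -> current target URL.
purl_table = {
    "/docs/guide": "http://example.org/2012/guide.html",
}

def resolve(purl_path):
    """Return (status, headers) for a request to a PURL path."""
    target = purl_table.get(purl_path)
    if target is None:
        return 404, {}
    # 302 is used here only as one example of a redirecting response.
    return 302, {"Location": target}

def retarget(purl_path, new_target):
    """Curation step: point an existing PURL at a new location."""
    purl_table[purl_path] = new_target
```

Clients keep requesting /docs/guide; only the table entry changes when the resource moves.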

PURLs have been used to address persistent identifier needs in the library and Linked Data communities for the past fifteen years. Many Linked Data vocabularies are hosted at purl.org, including the Dublin Core element set and FOAF.

A public PURL service has been operated by the Online Computer Library Center since 1995.

Creating a PURL

To create a PURL, open the folder in which you wish to store the PURL and select PURL from the create menu. You will be presented with the following dialogue:

When you press the Create button the PURL will appear in a Callimachus folder, just like any other Callimachus resource.

PURL fields

Here is a description of the fields used to specify a PURL:

Field: Description
Local name: The name of your PURL. This name will be appended to the URL of the folder you are in to form the PURL's address.
Comment: An optional field provided for your own use.
Type: See Types of PURLs.
Content location: The target location for the response content from the PURL. All types of PURLs require a target location.
Cache control: How long the results of a PURL should be cached by a proxy or a client.

Parameters in PURLs

Parameters can be passed to and from PURLs on the query string. For example, a call can be made to an existing PURL with parameters like this:

POST existing-purl?first=Joe&last=Bloggs

The variables passed in can then be forwarded to the PURL's content target location using curly-bracket notation:

existing-pipeline.xpl?results&first={first}&last={last}

The variable names inside the curly brackets should match the variable names used in the query string when the PURL is called. This notation will pass along a URL-encoded version of the parameter's value. If you want to pass along an unencoded version, insert a plus sign before the variable name:

/{+last}/{+first}.txt
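The substitution rules described above can be approximated with the following Python sketch. It is not the Callimachus implementation: the expand function is hypothetical, and treating a missing variable as an empty string is an assumption. It percent-encodes values for {name} and passes values through untouched for {+name}, as the text describes.

```python
import re
from urllib.parse import quote

def expand(template, params):
    """Substitute {name} (percent-encoded) and {+name} (as-is) in a target template."""
    def replace(match):
        raw, name = match.group(1), match.group(2)
        value = params.get(name, "")  # assumption: missing variables expand to ""
        # {+name} passes the value through unencoded; {name} percent-encodes it.
        return value if raw else quote(value, safe="")
    return re.sub(r"\{(\+?)([A-Za-z0-9_]+)\}", replace, template)

params = {"first": "Joe", "last": "Bloggs"}
print(expand("existing-pipeline.xpl?results&first={first}&last={last}", params))
print(expand("/{+last}/{+first}.txt", params))
```

The difference between the two forms only shows up when a value contains characters such as spaces or slashes, which the plain form encodes and the plus form leaves alone.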

Types of PURLs

PURLs are categorized by the HTTP response code they result in. Callimachus implements the following types of PURLs:

Response code (Label): Description
200 (Copy): A PURL resource of this type is a cached copy of its target location. It is most often used in Callimachus to refer to XProc pipelines for the collection, transformation and rendering of remote content.
301 (Canonical): The resource has been assigned a new permanent URI and any future references to this resource SHOULD use the given location URI. The PURL redirects a client to the target location.
302 (Alternate): A representation of the resource resides temporarily at the given location. The PURL redirects a client to the target location.
303 (Described by): The target location provides information about the requested resource, but the requested resource may not be an "information resource" (that is, it may be a real-world object). This type of PURL is most often used when resolving RDF and Linked Data resources.
307 (Resides): The resource location has been temporarily changed. Redirection to the target location should be considered temporary and may change.
308 (Moved): The resource URL has been permanently moved to the target location.
404 (Missing): The resource is not available. It is "Missing". This situation may or may not be temporary. The content of the target location is used as the response content. The content might be shown to a user if it is more than 512 bytes in length.
410 (Gone): The resource is no longer available and no forwarding address is known. This condition is expected to be considered permanent. The content of the target location is used as the response content. The content might be shown to a user if it is more than 512 bytes in length.

Note: The 200 PURL is also known as an "Active PURL" because the PURL takes an active role in creating its response. Other types of PURLs are passive in that they simply redirect to existing content and/or return an appropriate HTTP response code.
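The split between redirecting types and types that return the target's content can be summarized in a short Python sketch. The status codes and labels come from the table above; the respond helper and the shape of the dictionary it returns are hypothetical and only illustrate the distinction.

```python
# Status codes and labels from the table of PURL types.
PURL_TYPES = {
    200: "Copy",
    301: "Canonical",
    302: "Alternate",
    303: "Described by",
    307: "Resides",
    308: "Moved",
    404: "Missing",
    410: "Gone",
}

def respond(status, target):
    """Sketch of the response shape for each PURL type (hypothetical helper)."""
    if status not in PURL_TYPES:
        raise ValueError(f"unsupported PURL type: {status}")
    if 300 <= status < 400:
        # Redirecting PURL: send the client to the target location.
        return {"status": status, "headers": {"Location": target}, "body_from": None}
    # 200, 404 and 410: the target's content becomes the response body.
    return {"status": status, "headers": {}, "body_from": target}
```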

PURL example

In this example, we will name our PURL "blogrollup" to match the example XProc pipeline given in the XProc RSS feed example. This example uses a PURL of type 200 in order to resolve against a dynamically generated RSS feed.

The "Content location" field is set to the PURL's target location. PURLs of type 404 and 410 also require a target location for their response content. In this example, we will provide a relative URL to the XProc pipeline and append ?results to it, so that the PURL returns the results of the pipeline instead of the pipeline itself.

The Cache Control field determines how long the results of a PURL should be cached by a proxy or a client. In this example, we have reduced the max-age parameter from 3600 seconds (1 hour) to 300 seconds (5 minutes).
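To make the effect of max-age concrete, here is a small Python sketch of how a cache might decide whether a stored response is still fresh. It handles only the max-age directive and ignores the rest of HTTP caching; the function names are hypothetical.

```python
def max_age(cache_control):
    """Extract max-age (in seconds) from a Cache-Control header value, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None

def is_fresh(age_seconds, cache_control):
    """True if a response of the given age may still be served from cache."""
    limit = max_age(cache_control)
    return limit is not None and age_seconds < limit
```

With max-age=300, a response cached two minutes ago is reused, while one cached ten minutes ago triggers a new request to the PURL.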

The screenshot below shows the blogrollup PURL next to two XProc pipelines: blogrollup and hello-world.

How it works

The XProc pipeline "blogrollup(.xpl)" serves as the target for the PURL "blogrollup". When the PURL "blogrollup" is resolved, the target location blogrollup.xpl?results is invoked, which creates the RSS feed and returns it as the result of the PURL.

Benefit of using this PURL

You could just call the XProc pipeline directly. However, the PURL provides a significant benefit: the PURL address can stay persistent, even if the location of the target URL changes. Additionally, either the PURL or the pipeline can take parameters that affect the output of the pipeline. Equally important is the ability of 200 type PURLs to cache their results.

Support for partial PURLs

Callimachus does not implement "Partial PURLs" as of the 1.2 release. Partial PURLs "allow PURLs to be created which refer to a directory level portion of a URL; any path information appended to a partial redirect PURL may in turn be appended to its target URL. That allows a single PURL to redirect to a hierarchy on a target Web server." They are useful in that they allow PURL targets to refer to database content. Callimachus may implement support for partial PURLs in a later release.
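The behaviour quoted above can be sketched in Python as prefix matching followed by appending the remaining path to the target. Since Callimachus does not implement partial PURLs, this is purely an illustration of the concept; the table layout, the example.org URLs, and the resolve_partial function are assumptions.

```python
def resolve_partial(partial_purls, request_path):
    """Return the target URL for request_path, or None if no partial PURL matches."""
    # Prefer the longest matching prefix so nested partial PURLs win.
    for prefix in sorted(partial_purls, key=len, reverse=True):
        base = prefix.rstrip("/")
        if request_path == base or request_path.startswith(base + "/"):
            remainder = request_path[len(base):]  # "" or "/further/path"
            return partial_purls[prefix].rstrip("/") + remainder
    return None

# Hypothetical partial PURL covering a whole directory hierarchy.
table = {"/net/docs/": "http://example.org/archive/"}
print(resolve_partial(table, "/net/docs/2012/a.html"))
```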