A. Overview

Callimachus 0.18 introduced an XProc implementation. XProc is an XML pipeline language and can be used to script the transformation of content such as SPARQL query results or XML data gathered from the Web. XProc is a W3C Recommendation. This page provides examples of XProc pipelines that perform common tasks.

The XProc Web site provides links to other materials, including a tutorial and examples. The XProc specification provides the complete details of the XProc syntax and should be consulted before using XProc for complex tasks. The purpose of this page is to ease your transition into XProc usage by providing a short tutorial.

Callimachus uses XProc as an extension mechanism. As of Callimachus 0.18, one can create PURLs that resolve to XProc pipelines. XProc replaces the Action with no side effects pattern that used an executable Turtle file to orchestrate the transformation of SPARQL query results to other formats via XSLT.

An XProc pipeline works similarly to other pipeline implementations: a series of "steps" is defined, and the steps are processed in order. Consider a Unix pipeline as an analogy:

$ cat README.txt | grep Callimachus > callimachus-refs.txt

That Unix pipeline consists of two steps (the 'cat' command and the 'grep' command). The first step echoes the contents of the file README.txt, which is passed to the 'grep' command. The 'grep' command outputs only those lines containing the word 'Callimachus'. Finally, the results are written to the process's STDOUT, which the shell redirects to a file. An XProc pipeline works in a similar manner: a number of steps are defined, and the output of each becomes the input to the next. The equivalent of STDIN in XProc is a port called source, which carries zero or more XML documents. The equivalent of STDOUT in XProc is a port called result, which also carries zero or more XML documents.
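The same idea can be pictured outside XProc. The following Python fragment is only an illustrative sketch, not part of Callimachus or of any XProc processor: the names Step, identity, grep and run_pipeline are hypothetical, and plain strings stand in for XML documents.

```python
from functools import reduce
from typing import Callable, Iterable, List

# A step receives a sequence of documents ("source") and
# returns a new sequence of documents ("result").
Step = Callable[[List[str]], List[str]]


def identity(documents: List[str]) -> List[str]:
    """Pass the input through unchanged."""
    return list(documents)


def grep(pattern: str) -> Step:
    """Build a step that keeps only the documents containing `pattern`."""
    def step(documents: List[str]) -> List[str]:
        return [doc for doc in documents if pattern in doc]
    return step


def run_pipeline(source: List[str], steps: Iterable[Step]) -> List[str]:
    """Feed `source` through each step in order; the output of one
    step becomes the input of the next."""
    return reduce(lambda documents, step: step(documents), steps, source)


if __name__ == "__main__":
    lines = ["Callimachus is a Linked Data platform", "Nothing to see here"]
    print(run_pipeline(lines, [identity, grep("Callimachus")]))
```

An empty list of steps simply returns the source unchanged, and an empty source is legal, which mirrors the "zero or more documents" rule for the source and result ports.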


B. "Hello, World" in XProc 

XProc pipelines may be created in Callimachus just like any other type of resource. Navigate to any Callimachus folder where you have write access and select Pipeline from the menu.

You will be redirected to an editor with the default pipeline definition pre-populated in it.

To create your pipeline, simply type (or copy) your XProc definition into the editor. A simple XProc pipeline that outputs "Hello, World" looks like this:

<?xml version="1.0" encoding="UTF-8" ?> 
  
<p:pipeline version="1.0" 
      xmlns:p="http://www.w3.org/ns/xproc" 
      xmlns:c="http://www.w3.org/ns/xproc-step" 
      xmlns:l="http://xproc.org/library"> 
  
<p:serialization port="result" media-type="text/plain" method="text" /> 
  
<p:identity> 
  <p:input port="source"> 
    <p:inline> 
      <c:data content-type="text/plain">Hello, World</c:data> 
    </p:inline> 
  </p:input> 
</p:identity> 

</p:pipeline>

The output of this pipeline may be retrieved by resolving its URL with the suffix "?results" appended.

If you saved the above pipeline at the URL http://example.com/test/hello-world.xpl then you can get the pipeline itself at that URL, a human-readable HTML page containing the pipeline at http://example.com/test/hello-world.xpl?view and the results of the pipeline at http://example.com/test/hello-world.xpl?results, like this:

$ curl http://example.com/test/hello-world.xpl 
<?xml version="1.0" encoding="UTF-8" ?> 
<p:pipeline version="1.0" 
      xmlns:p="http://www.w3.org/ns/xproc" 
      xmlns:c="http://www.w3.org/ns/xproc-step" 
      xmlns:l="http://xproc.org/library"> 

<p:serialization port="result" media-type="text/plain" method="text" /> 

<p:identity> 
  <p:input port="source"> 
    <p:inline> 
      <c:data content-type="text/plain">Hello, World</c:data> 
    </p:inline> 
  </p:input> 
</p:identity> 

</p:pipeline>

$ curl http://example.com/test/hello-world.xpl?results 

                     Hello, World
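
The three URLs differ only in their suffix. As a small sketch of that convention (the helper names pipeline_urls and fetch_results are hypothetical and not part of Callimachus), the same requests could be made from Python:

```python
from urllib.request import urlopen


def pipeline_urls(pipeline_url: str) -> dict:
    """Return the three URLs described above for a stored pipeline."""
    return {
        "source": pipeline_url,                 # the pipeline definition itself
        "view": pipeline_url + "?view",         # human-readable HTML page
        "results": pipeline_url + "?results",   # output of running the pipeline
    }


def fetch_results(pipeline_url: str) -> str:
    """Run the pipeline by requesting its ?results URL (needs network access)."""
    with urlopen(pipeline_urls(pipeline_url)["results"]) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # example.com is the placeholder host used in the text above.
    for name, url in pipeline_urls("http://example.com/test/hello-world.xpl").items():
        print(name, url)
```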

You might be thinking, "Wow! That's typical XML! Very verbose to do so little." We hope to convince you that XProc is an excellent way for us to extend Callimachus into new areas, such as gathering, transforming and rendering data from anywhere on the Web.

In the "Hello, World" example, the serialization tag sets the MIME type of the result. If this tag is not present, XProc defaults to a MIME type of "application/xml" and a method of "xml".

The p:identity tag is used to echo any defined input to the output. In this case, we specified a literal string and asked that it be put to the output.

We could do something similar by getting a resource from the Web and putting it to the output. It would be more interesting to get two resources from the Web and combine them. The following section gets two Atom feeds from blogs, combines them and creates a new RSS feed of the results.


C. Create an RSS feed from two named Atom feeds

This example uses XSLT to convert each input feed from Atom format to RSS format, collects their entries and places them into an RSS template. XProc p:document tags read the Atom feeds from the blog sites, and p:xslt steps convert them to RSS. Lastly, a p:insert step inserts the items from the produced channels into the otherwise empty RSS channel.

<?xml version="1.0" encoding="UTF-8" ?> 
<p:pipeline version="1.0" 
      xmlns:p="http://www.w3.org/ns/xproc" 
      xmlns:c="http://www.w3.org/ns/xproc-step" 
      xmlns:l="http://xproc.org/library"> 

<p:serialization port="result" media-type="application/rss+xml" method="xml" /> 

<p:xslt name="prototypo-rss"> 
  <p:input port="source"> 
    <p:document href="http://prototypo.blogspot.com/feeds/posts/default" /> 
  </p:input> 
  <p:input port="stylesheet"> 
    <p:document href="http://atom.geekhood.net/atom2rss.xsl" /> 
  </p:input> 
</p:xslt> 

<p:xslt name="jamesrdf-rss"> 
  <p:input port="source"> 
    <p:document href="http://jamesrdf.blogspot.com/feeds/posts/default" /> 
  </p:input> 
  <p:input port="stylesheet"> 
    <p:document href="http://atom.geekhood.net/atom2rss.xsl" /> 
  </p:input> 
</p:xslt> 

<p:insert match="/rss/channel" position="last-child"> 
  <p:input port="source"> 
    <p:inline> 
      <rss version="2.0"> 
        <channel> 
          <title>RSS Title</title> 
          <description>This is an example of an RSS feed</description> 
          <link>http://www.someexamplerssdomain.com/main.html</link> 
          <lastBuildDate>Mon, 06 Sep 2010 00:01:00 +0000 </lastBuildDate> 
          <pubDate>Mon, 06 Sep 2009 16:45:00 +0000 </pubDate> 
          <ttl>1800</ttl> 
        </channel> 
      </rss> 
    </p:inline> 
  </p:input> 
  <p:input port="insertion" select="/rss/channel/item"> 
    <p:pipe step="prototypo-rss" port="result" /> 
    <p:pipe step="jamesrdf-rss" port="result" /> 
  </p:input> 
</p:insert>

</p:pipeline>
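
To make the effect of the final p:insert step concrete, here is a rough Python equivalent of that one step, offered as a sketch only. It assumes the feeds have already been converted to RSS; the function name merge_rss_items is hypothetical, and the standard xml.etree.ElementTree module stands in for the XProc processor.

```python
import xml.etree.ElementTree as ET
from typing import List


def merge_rss_items(template_xml: str, feed_xmls: List[str]) -> str:
    """Append every /rss/channel/item from each feed as the last
    children of the template's /rss/channel element."""
    template = ET.fromstring(template_xml)       # root element is <rss>
    channel = template.find("channel")
    if channel is None:
        raise ValueError("template has no /rss/channel element")
    for feed_xml in feed_xmls:
        feed = ET.fromstring(feed_xml)
        # Corresponds to select="/rss/channel/item" on the insertion port.
        for item in feed.findall("channel/item"):
            channel.append(item)                 # position="last-child"
    return ET.tostring(template, encoding="unicode")


if __name__ == "__main__":
    template = '<rss version="2.0"><channel><title>RSS Title</title></channel></rss>'
    feed_a = "<rss><channel><item><title>A</title></item></channel></rss>"
    feed_b = "<rss><channel><item><title>B</title></item></channel></rss>"
    print(merge_rss_items(template, [feed_a, feed_b]))
```

Items keep the order of the feeds they came from, just as the two p:pipe elements on the insertion port are read in document order.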

D. Rendering RDF from Pipeline into HTML using Callimachus Templates

<?xml version="1.0" encoding="UTF-8" ?> 
<p:pipeline version="1.0" 
      xmlns:p="http://www.w3.org/ns/xproc" 
      xmlns:c="http://www.w3.org/ns/xproc-step" 
      xmlns:l="http://xproc.org/library" 
      xmlns:calli="http://callimachusproject.org/rdf/2009/framework#"> 

<p:serialization port="result" media-type="text/html" method="html" doctype-system="about:legacy-compat" /> 

<p:import href="/callimachus/library.xpl" /> 

<calli:render-html> 
  <p:input port="source"> 
    <p:document href="http://localhost:8080/moon?describe" /> 
    <p:document href="http://localhost:8080/sun?describe" /> 
  </p:input> 
  <p:input port="query"> 
    <p:inline> 
      <c:data content-type="application/sparql-query">
        <![CDATA[
          SELECT ?this { BIND (<$target> AS ?this) }
        ]]>
      </c:data> 
    </p:inline> 
  </p:input> 
  <p:with-param name="target" select="'moon'" /> 
  <p:input port="template"> 
    <p:document href="/callimachus/templates/concept-view.xhtml" /> 
  </p:input> 
</calli:render-html> 

</p:pipeline>

E. Rendering SPARQL Endpoint into HTML using Callimachus Templates


<?xml version="1.0" encoding="UTF-8" ?> 
<p:pipeline version="1.0" 
      xmlns:p="http://www.w3.org/ns/xproc" 
      xmlns:c="http://www.w3.org/ns/xproc-step" 
      xmlns:l="http://xproc.org/library" 
      xmlns:calli="http://callimachusproject.org/rdf/2009/framework#"> 
 
<p:serialization port="result" media-type="text/html" method="html" doctype-system="about:legacy-compat"/> 
 
<p:option name="id" required="true"/>
 
<p:import href="/callimachus/library.xpl"/> 
 
<calli:render-html endpoint="/data/endpoint">
  <p:input port="query"> 
    <p:inline> 
      <c:data content-type="application/sparql-query">
        <![CDATA[
          SELECT ?this { BIND (<$target> AS ?this) }
        ]]>
      </c:data> 
    </p:inline> 
  </p:input> 
  <p:with-param name="target" select="$id"/>
  <p:input port="template">
    <p:document href="endpoint-view.xhtml"/> 
  </p:input> 
</calli:render-html> 
 
</p:pipeline>

F. Displaying Named Query Results on OpenStreetMap via Pipeline

This example will demonstrate a few core pieces of functionality strung together to create an interesting and useful application. It will walk through the following steps:

  1. Creating a named query
  2. Using that named query as the input source of an XML Processing (XProc) pipeline
  3. Assigning a Persistent URL (PURL) to the results of that XProc pipeline [Optional]
  4. Generating an OpenStreetMap display from a call to that PURL

Specifically, this example will plot all the nuclear plants in the United States as points on a map and allow the user to click on any of those facilities to learn more about it.

1. Create a named query

This query must return data in a way that is compliant with use by OpenStreetMap. Here is what the query looks like in Callimachus:

As you can see above, the query specifically selects the URL (?link), name (?title), description (?description, which is always the same), and latitude (?lat) and longitude (?long) of the facility. This query generates a SPARQL Protocol and RDF Query Language (SPARQL) XML result set which must now be transformed into Geographically Encoded Objects for Rich Site Summary (GeoRSS) via an XProc pipeline.

2. Use that named query as the input source of an XProc pipeline

To generate GeoRSS from SPARQL XML results, use an XProc pipeline with the appropriate Extensible Stylesheet Language (XSL) stylesheet.

<?xml version="1.0" encoding="UTF-8" ?> 
<p:pipeline version="1.0" 
      xmlns:p="http://www.w3.org/ns/xproc" 
      xmlns:c="http://www.w3.org/ns/xproc-step" 
      xmlns:l="http://xproc.org/library"> 

<p:serialization port="result" media-type="application/rss+xml" method="xml" /> 

<p:load href="nukemap.rq?results" /> 

<p:xslt name="nukemap"> 
  <p:input port="stylesheet"> 
    <p:document href="sparql-georss.xsl" /> 
  </p:input> 
</p:xslt>

</p:pipeline>

There are a small number of tags here, each with its own distinct purpose. The serialization tag (<p:serialization>) uses the port attribute (port="result") to declare that the results of the pipeline will be a form of RSS+XML (media-type="application/rss+xml"). The load tag (<p:load>) executes the named query created in step 1 and returns the results to the pipeline.

XSL Transformations (XSLT) is a language for transforming one form of an XML document into another form of XML. In this case we'll be going from SPARQL XML to GeoRSS. The XSLT tag (<p:xslt>) declares the transformation step and names it "nukemap". Inside that step, the input tag (<p:input>) uses the port attribute (port="stylesheet") to specify the stylesheet that will be applied to the results of the named query. The document tag (<p:document>) specifies the URL of the stylesheet to be used (href="sparql-georss.xsl"), which in this case is located in the same directory as the pipeline.
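
The stylesheet sparql-georss.xsl is not reproduced on this page. As a rough illustration of the kind of mapping such a stylesheet has to perform, the following Python sketch turns a SPARQL XML result set into RSS items with a GeoRSS-Simple point. It assumes the variable names from step 1 (link, title, description, lat, long); the function name sparql_to_georss is hypothetical, and the actual stylesheet may structure its output differently.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = "http://www.w3.org/2005/sparql-results#"
GEORSS_NS = "http://www.georss.org/georss"


def sparql_to_georss(sparql_xml: str, channel_title: str) -> str:
    """Map each SPARQL result row to an RSS <item> carrying a georss:point."""
    ns = {"s": SPARQL_NS}
    ET.register_namespace("georss", GEORSS_NS)
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    for result in ET.fromstring(sparql_xml).findall("s:results/s:result", ns):
        # Collect the row as {variable name: value}; each binding wraps
        # a single <uri> or <literal> child element.
        row = {b.get("name"): b[0].text
               for b in result.findall("s:binding", ns) if len(b)}
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row.get("title")
        ET.SubElement(item, "link").text = row.get("link")
        ET.SubElement(item, "description").text = row.get("description")
        # GeoRSS-Simple encodes a point as "latitude longitude".
        point = ET.SubElement(item, "{%s}point" % GEORSS_NS)
        point.text = "%s %s" % (row.get("lat"), row.get("long"))
    return ET.tostring(rss, encoding="unicode")
```

Note the order inside georss:point: latitude first, then longitude, separated by a space.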

3. Assign a PURL to the results of that XProc pipeline [Optional]

To enable cache control, and to be able to move or edit the XProc pipeline without having to change the link, it is wise to assign a PURL to the results of the pipeline. However, if you are simply putting together a prototype, or do not anticipate relocating your resource, a PURL is entirely optional: the map can call the URL of the XProc pipeline without the intermediary PURL. If you were to create a PURL, it would look like this:

The PURL is given a name (nukemap) and a type of 200 which means that the results of this PURL will be copied from their location. Content location specifies where the PURL should look for content, which in this case we want to be the results of our XProc pipeline (nukemap.xpl?results). Lastly, we can set cache control on the PURL. This is a way of improving efficiency by not having to constantly fetch new results if a number of requests are made in quick succession. In this case we are specifying that if multiple requests are made over the course of 3,600 seconds, rather than following the PURL and getting new results each time, simply grab results from cache. 
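
The caching behaviour described above can be summarised in a few lines of Python. This is a conceptual sketch only, not how Callimachus implements cache control: the class name MaxAgeCache and its parameters are hypothetical, and the clock is injectable so the example can be exercised without waiting an hour.

```python
import time
from typing import Callable, Optional


class MaxAgeCache:
    """Serve a cached value until it is `max_age` seconds old, then refetch."""

    def __init__(self, fetch: Callable[[], str], max_age: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. a call to nukemap.xpl?results
        self._max_age = max_age      # 3,600 seconds in the example above
        self._clock = clock
        self._value: Optional[str] = None
        self._fetched_at: Optional[float] = None

    def get(self) -> str:
        now = self._clock()
        expired = (self._fetched_at is None
                   or now - self._fetched_at >= self._max_age)
        if expired:
            # Nothing cached yet, or the cached copy is too old: fetch again.
            self._value = self._fetch()
            self._fetched_at = now
        return self._value
```

With max_age set to 3600, any number of calls to get() within the hour trigger a single fetch; the first call after the hour has elapsed triggers a second one.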

4. Generate OpenStreetMap display from a PURL

The last step of this example is to tie it all together. At this point there is a single PURL (nukemap) that returns the results of the named query in GeoRSS format. That PURL now needs to be worked into the OpenStreetMap code as the input for generating the map itself. If you are unfamiliar with OpenStreetMap (OSM) at a high level, it would be beneficial to read some of its basic documentation and examples before diving into this code.

<?xml-stylesheet type="text/xsl" href="/callimachus/template.xsl"?> 
<html xmlns="http://www.w3.org/1999/xhtml"> 
<head> 
  <title>U.S. Nuclear Plants</title> 
  <script src="//www.openlayers.org/api/OpenLayers.js"></script> 
  <script> 
  // <![CDATA[ 
    jQuery(function() { 
      var map = new OpenLayers.Map("mapdiv"); 
      map.addLayer(new OpenLayers.Layer.OSM()); 
      
      map.addLayer(new OpenLayers.Layer.GeoRSS("My Points", "nukemap", { 
        tileOptions: { crossOriginKeyword: null } 
      })); 

      map.setCenter(new OpenLayers.LonLat("-98.0", "38.0").transform( 
          new OpenLayers.Projection("EPSG:4326"), // transform from WGS 1984 
          map.getProjectionObject() // to Spherical Mercator Projection 
        ), 4 // Zoom level 
      ); 
    }); 
  // ]]> 
  </script> 

  <style> 
    .leaflet-map-pane img, /* OpenStreetMap */ 
    .olMapViewport img { /* general OpenLayers maps */ 
      max-width: none; 
     } 
  </style> 

</head> 
<body> 
  <h1>U.S. Nuclear Plants</h1> 
  <table width="100%"> 
    <tbody> 
      <tr> 
        <td style="width:100%">
          <div id="mapdiv" style="height:800px"></div>
        </td> 
      </tr> 
    </tbody> 
  </table> 
</body> 
</html>

There is really one important line that makes this code different from a standard implementation of OSM and that line is:

map.addLayer(new OpenLayers.Layer.GeoRSS("My Points", "nukemap", { tileOptions: { crossOriginKeyword: null } }));

which is where we call the nukemap PURL as the source of our GeoRSS. With the infrastructure in place it is a simple adjustment to standard code that allows for a simple collection of data files to become an interesting visualization of nuclear plants in the United States.

NB: Twitter Bootstrap (which Callimachus' default theme is based on) has a default setting that interferes with OpenStreetMap. Luckily this can be easily overcome with a bit of CSS at the top of the page. The code is below.

<style> 
  .leaflet-map-pane img, /* OpenStreetMap */ 
  .olMapViewport img { /* general OpenLayers maps */ 
    max-width: none; 
  } 
</style>

G. Passing Parameters into a Pipeline for Use in a Named Query

Passing parameters into an XProc Pipeline is structurally very similar to passing them into Named Queries. The URI is still query.rq?results&variableName=variable. However, once the parameter has been successfully passed into the pipeline, the pipeline must interpret it correctly depending on where it will be used. It starts the same way, with the <p:option> tag. The name attribute assigns the parameter a name, while the required attribute states whether or not the parameter is necessary for the pipeline to execute.

From there, if the intent is to use the parameter in a named query, the query string must be built up inside the <p:load> tag in order for it to be passed correctly to the named query. In this example the <p:with-option> tag has two attributes: the name attribute, which identifies the option being set as the link (href), and the select attribute, which computes the value of that link. Inside the select attribute is where the query string must be constructed using the concat function. The string is simply the file name for the query (nuclear-chemical-amount.rq), the suffix that returns results in SPARQL/XML (?results), the variable name (&amp;substance=) and the URI-encoded value (encode-for-uri($substance)). $substance here refers to the option that was defined in the previous step.

After this step, the SPARQL/XML that is returned can be passed to any other necessary steps just as it could be in a pipeline that does not use parameters.

<?xml version="1.0" encoding="UTF-8" ?>
<p:pipeline version="1.0" 
      xmlns:p="http://www.w3.org/ns/xproc" 
      xmlns:c="http://www.w3.org/ns/xproc-step" 
      xmlns:l="http://xproc.org/library"> 

<p:serialization port="result" media-type="application/json" method="text" /> 

<p:option name="substance" required="true" /> 

<p:load> 
    <p:with-option 
        name="href" 
        select="concat( 
            'nuclear-chemical-amount.rq?results&amp;substance=', 
            encode-for-uri($substance) 
        )" 
    /> 
</p:load> 

<p:xslt name="piechart">
    <p:input port="stylesheet">
        <p:document href="../coordinate-points-d3-json-sparql.xsl" /> 
    </p:input> 
    <p:with-param name="x-coordinate-variable" select="'name'" /> 
    <p:with-param name="y-coordinate-variable" select="'total'" /> 
</p:xslt> 

</p:pipeline>
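
For readers more familiar with general-purpose languages, the href computed by the <p:with-option> above corresponds roughly to the following Python. This is a sketch for illustration: the function name build_query_href is hypothetical, and urllib.parse.quote with safe="" is used as an approximation of XPath's encode-for-uri, since both leave only letters, digits and the characters - _ . ~ unescaped.

```python
from urllib.parse import quote


def build_query_href(substance: str,
                     query_file: str = "nuclear-chemical-amount.rq") -> str:
    """Build the relative URL that <p:load> requests.

    In the XML source the ampersand is written as &amp;, but the
    attribute value seen by the processor contains a plain "&".
    """
    return query_file + "?results&substance=" + quote(substance, safe="")


if __name__ == "__main__":
    print(build_query_href("Uranium 235"))
```

Encoding the value matters: without it, a substance name containing a space, an ampersand or an equals sign would change the structure of the query string instead of being passed through as data.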

H. Aggregating RDF through a pipeline

If you are trying to expose some raw RDF and a describe query is not the right solution for you, for example because you want to intelligently aggregate data from multiple sources, then the following pattern is useful. It also demonstrates how to generate a CONSTRUCT SPARQL query and call it from within XProc.

The intent of the example below is to fetch the RDF for a specific facility. It then aggregates any referenced data from the same data source (filtering out what is not required). Finally, it aggregates data from another data source.

Note the with-option line: <p:with-option name="replace" select="concat('replace(., &quot;%ID%&quot;, &quot;', $id, '&quot;)')" />. This is hard to read, but it takes the pipeline parameter $id and constructs a replace function call to modify the SPARQL query. For example, if the id were 1234, the generated function call would be

replace(., "%ID%", "1234")

Having built the SPARQL query, the pipeline then executes it by a POST to the relevant endpoint, and the results are returned as XML.

<?xml version="1.0" encoding="UTF-8"?>
<p:pipeline version="1.0"
    xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step"
    xmlns:l="http://xproc.org/library"
    xmlns:frs="http://opendata.epa.gov/frs/schema#"
    xmlns:rcra="http://opendata.epa.gov/rcra/schema/"
>

<p:serialization port="result" media-type="application/rdf+xml" method="xml"/> 

<p:option name="id" required="true" />

<p:string-replace match="/c:request/c:body/text()">
    <p:with-option name="replace" select="concat('replace(., &quot;%ID%&quot;, &quot;', $id, '&quot;)')" />
    <p:input port="source">
        <p:inline>
            <c:request method="post" href="/usepa/data/frs">
                <c:body content-type="application/sparql-query">
                <![CDATA[
                PREFIX owl: <http://www.w3.org/2002/07/owl#>
                PREFIX places: <http://purl.org/ontology/places#>
                PREFIX rcra: <http://opendata.epa.gov/rcra/schema/>
                CONSTRUCT { ?s ?p ?o }
                WHERE {{
                    BIND(<http://opendata.epa.gov/facilities/%ID%> AS ?s) ?s ?p ?o
                } UNION {
                    BIND(<http://opendata.epa.gov/facilities/%ID%> AS ?facility) ?facility ?fp ?s .
                    ?s ?p ?o .
                    FILTER (?p NOT IN(places:contains))
                } UNION {
                    SERVICE </usepa/data/rcra> {
                        SELECT DISTINCT * {{
                            BIND(<http://opendata.epa.gov/facilities/%ID%> AS ?facility)
                            ?s owl:sameAs $facility .
                            ?s ?p ?o.
                        } UNION {
                            BIND(<http://opendata.epa.gov/facilities/%ID%> AS ?facility)
                            ?handler owl:sameAs $facility .
                            ?handler ?handlerPred1 ?s .
                            FILTER (?handlerPred1 NOT IN(rcra:hasActivity))
                            ?s ?p ?o .
                        } UNION {
                            BIND(<http://opendata.epa.gov/facilities/%ID%> AS ?facility)
                            ?handler owl:sameAs $facility .
                            ?handler ?handlerPred2 ?s .
                            FILTER (?handlerPred2 IN(rcra:hasActivity))
                            ?s ?p ?o .
                            FILTER (?p IN(rcra:receivedReportOn, rcra:reportType, rcra:reportedInCycle))
                        }}
                    }
                }}
                ]]>
              </c:body>
            </c:request>
        </p:inline>
    </p:input>
</p:string-replace>

<p:http-request/>

</p:pipeline>
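
The %ID% substitution performed by <p:string-replace> in the pipeline above can be pictured with a short Python sketch. The function names are hypothetical and the code only mirrors the string handling, not the HTTP request. Because the select expression wraps its arguments in &quot;, the generated XPath call uses double quotes.

```python
def build_replace_expression(facility_id: str) -> str:
    """Mirror concat('replace(., "%ID%", "', $id, '")') from the pipeline."""
    return 'replace(., "%ID%", "' + facility_id + '")'


def fill_query_template(query_template: str, facility_id: str) -> str:
    """Mirror the effect of evaluating that expression on the query text:
    every occurrence of the %ID% placeholder is replaced by the id."""
    return query_template.replace("%ID%", facility_id)


if __name__ == "__main__":
    template = "BIND(<http://opendata.epa.gov/facilities/%ID%> AS ?s) ?s ?p ?o"
    print(build_replace_expression("1234"))
    print(fill_query_template(template, "1234"))
```

One caveat follows from this construction: an id that itself contains a double quote would break the generated expression, so the technique is best suited to ids known to be simple tokens such as numbers.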