May 02, 2015

Comprehensive RDF Query API

Original post by Chimezie Ogbuji (edited for contemporary use by Graham Higgins <gjh@bel-epa.com>)

RDFLib’s support for SPARQL has come full circle. I wasn’t planning on blogging about the developments until they had settled somewhat, and they have. In particular, the last piece was finalizing a set of APIs for querying and result processing that fit well within the framework of RDFLib’s various Graph APIs. The other issue was for the query APIs to accommodate eventual support for other query languages that can pick up the slack where SPARQL is wanting. Transitive closures are one instance: try composing a concise SPARQL query for calculating the transitive closure of a given node along the rdfs:subClassOf property and you’ll immediately see what I mean.
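To make the transitive-closure example concrete, here is a minimal sketch in plain Python of the computation being described. It does not use RDFLib at all: the triples are ordinary tuples of strings, and the names (TRIPLES, transitive_closure, the ex: classes) are invented for illustration.

```python
# Toy triples as (subject, predicate, object) strings; all names are hypothetical.
SUBCLASS_OF = "rdfs:subClassOf"

TRIPLES = {
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:Animal", SUBCLASS_OF, "ex:LivingThing"),
    ("ex:Cat", SUBCLASS_OF, "ex:Mammal"),
}


def transitive_closure(triples, start, predicate):
    """Return every node reachable from `start` by following `predicate`."""
    seen = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            # Follow only edges that leave the current node along `predicate`.
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


superclasses = transitive_closure(TRIPLES, "ex:Dog", SUBCLASS_OF)
print(sorted(superclasses))  # ['ex:Animal', 'ex:LivingThing', 'ex:Mammal']
```

The loop keeps following the property until no new nodes turn up, which is the open-ended traversal that a fixed-size graph pattern cannot express concisely.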

Querying

Every Graph instance has a query method through which RDF queries can be dispatched:

def query(self,
            strOrQuery, initBindings={}, initNs={},
            DEBUG=False, processor="sparql"):
    """
    Executes a SPARQL query (eventually will support Versa queries with
    same method) against this Conjunctive Graph

    :Params:

    :strOrQuery: - Is either a string consisting of the SPARQL query
        or an instance of rdflib.sparql.bison.Query.Query

    :initBindings: - A mapping from variable name to an RDFLib term
        (used for initial bindings for SPARQL query)

    :initNs: - A mapping from a namespace prefix to an instance of
        rdflib.Namespace (used for SPARQL query)

    :DEBUG: - A boolean flag passed on to the SPARQL parser and
        evaluation engine

    :processor: - The kind of RDF query. Choose 'sparql' to use the
        pure-Python "nOSQL" SPARQL processor, choose 'sparql2sql' to
        use the pure-Python "SPARQL2SQL" SPARQL processor.

    """

The first positional argument strOrQuery is either a query string or a pre-compiled query object (compiled using the appropriate BisonGen mechanism for the target query language). Pre-compilation can be useful for avoiding redundant parsing overhead for queries that need to be evaluated repeatedly:

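from rdfextras.sparql2sql.bison import Parse

sparqlString = "SELECT ?s WHERE { ?s ?p ?o }"  # any SPARQL query text
queryObject = Parse(sparqlString)  # parse once, then pass as strOrQuery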
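The saving from pre-compilation can be illustrated without any RDF machinery. The sketch below uses a hypothetical stand-in parser (parse_query is not an RDFLib function) and the standard library's lru_cache, so that each distinct query string is parsed only once however often it is evaluated.

```python
from functools import lru_cache

PARSE_CALLS = 0  # counts how often the stand-in parser actually runs


def parse_query(query_string):
    """Hypothetical stand-in for a real parser; returns a tuple of tokens."""
    global PARSE_CALLS
    PARSE_CALLS += 1
    return tuple(query_string.split())


@lru_cache(maxsize=128)
def compiled(query_string):
    """Parse each distinct query string once and reuse the result."""
    return parse_query(query_string)


QUERY = "SELECT ?s WHERE { ?s ?p ?o }"
for _ in range(3):
    query_object = compiled(QUERY)

print(PARSE_CALLS)  # 1
```

Holding on to the parsed object yourself, as in the snippet above, achieves the same thing; a cache is just one convenient way to do it when the query strings are not known in advance.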

The initBindings keyword argument is a dictionary that maps variables to their values, which are expected to be RDFLib terms. It is passed on to the SPARQL processor as the initial variable bindings.
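As a rough illustration of what initial bindings do, the following toy matcher treats any variable found in init_bindings as already bound before matching starts. It is a sketch under simplifying assumptions: terms are plain strings, there is only one triple pattern, and match_pattern is a made-up helper, not the processor's actual code.

```python
def match_pattern(triples, pattern, init_bindings=None):
    """Yield one bindings dict per triple that matches `pattern`.

    Pattern terms starting with '?' are variables; any variable present in
    `init_bindings` is treated as already bound to that value.
    """
    init_bindings = dict(init_bindings or {})
    for triple in triples:
        bindings = dict(init_bindings)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # setdefault binds a free variable, or returns the existing binding.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:carol"),
]

pattern = ("?who", "foaf:knows", "?friend")
unbound = list(match_pattern(TRIPLES, pattern))
prebound = list(match_pattern(TRIPLES, pattern, {"?who": "ex:bob"}))
print(len(unbound))  # 3
print(prebound)      # [{'?who': 'ex:bob', '?friend': 'ex:carol'}]
```

The same pattern can therefore be reused with different starting values without rewriting the query text.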

initNs is yet another top-level parameter for the query processor: a namespace mapping from prefixes to namespace URIs.
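Conceptually, such a mapping lets prefixed names in a query be expanded to full URIs. A minimal sketch, assuming plain strings rather than rdflib.Namespace instances and a hypothetical helper named expand_qname:

```python
def expand_qname(qname, init_ns):
    """Expand 'prefix:local' to a full URI using the prefix mapping."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in init_ns:
        raise KeyError(f"no namespace bound for {qname!r}")
    return init_ns[prefix] + local


INIT_NS = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

print(expand_qname("rdfs:subClassOf", INIT_NS))
# http://www.w3.org/2000/01/rdf-schema#subClassOf
```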

The DEBUG flag is largely self-explanatory. When set to True, it prints additional diagnostic output for the parsing of the query (when the sparql2sql processor is selected) as well as for the patterns and constraints passed on to the processor (for SPARQL queries).

Finally, the processor keyword specifies which kind of processor to use to evaluate the query: sparql or sparql2sql.

Result formats

SPARQL has two result formats (JSON and XML). Thanks to Ivan Herman’s recent contribution, the SPARQL processor now supports both. The query method (above) returns instances of QueryResult, a common class for RDF query results, which defines the following method:

def serialize(self, format='xml'):
    # real code required ...
    pass

The format argument determines which result format to use. For SPARQL queries, the allowable values are graph (for CONSTRUCT / DESCRIBE queries, in which case a resulting Graph object is returned), json, or xml. The resulting object also acts as an iterator over the bindings to allow for manipulation in the host language (Python).
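For a sense of what the json format looks like, here is a small standard-library sketch that lays out SELECT bindings in the shape defined by the SPARQL JSON results format (a head listing the variables, then results.bindings). The to_sparql_json helper and the sample rows are assumptions made for illustration; this is not the QueryResult implementation.

```python
import json


def to_sparql_json(variables, rows):
    """Serialise SELECT rows in the layout of the SPARQL JSON results format.

    Each row maps a variable name to a (kind, value) pair, where kind is
    'uri', 'literal' or 'bnode'.
    """
    bindings = [
        {var: {"type": kind, "value": value} for var, (kind, value) in row.items()}
        for row in rows
    ]
    document = {"head": {"vars": list(variables)}, "results": {"bindings": bindings}}
    return json.dumps(document, indent=2)


# Hypothetical result rows for a query such as: SELECT ?person ?name WHERE { ... }
ROWS = [
    {"person": ("uri", "http://example.org/alice"), "name": ("literal", "Alice")},
    {"person": ("uri", "http://example.org/bob"), "name": ("literal", "Bob")},
]

# Iterating over the bindings directly, as one would in host-language code.
names = [row["name"][1] for row in ROWS]

serialized = to_sparql_json(["person", "name"], ROWS)
print(names)  # ['Alice', 'Bob']
print(serialized)
```

Iterating suits in-process manipulation, while the serialized form is what would be handed to another tool or sent over the wire.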
