Spark


Spark is a Java client API for accessing remote SPARQL processors, similar in style to JDBC, the Java Database Connectivity API for accessing remote relational databases. SPARQL is a query language for RDF, commonly used to access data in semantic web applications.

Jena and Sesame are popular open-source Java frameworks for working with RDF and SPARQL; however, these frameworks tend to be either too much or not enough for some common needs. Both frameworks provide the ability to model RDF datasets, import/export a variety of formats, plug in custom storage engines, and execute queries over models with those storage engines.

After working with Jena, Sesame, a variety of triple stores, and our own SPARQL products, we concluded that there is a gap in the stack for a JDBC-style, connection-oriented library for accessing remote SPARQL processors. Jena and Sesame both open up storage APIs (for potentially remote stores), but those are at a lower level; the client API is assumed to work at the level of graphs. The SPARQL HTTP protocol is widely used and supported, but it offers no client-side programming API or server-side library, does not support connection-oriented use cases, and relies on results sent in a text form (usually XML or JSON). As a result, the protocol does not perform as well as triple-store-specific APIs.
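To make the stateless, text-based nature of the SPARQL HTTP protocol concrete, here is a minimal sketch of what a client has to assemble by hand for a single query. It uses only the JDK's `java.net.http` classes (Java 11+) and is not part of Spark; the endpoint URL `http://example.org/sparql` and the class and method names are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class SparqlProtocolSketch {

    // Builds a SPARQL Protocol GET request: the query text travels in the
    // URL-encoded "query" parameter, and the Accept header asks for results
    // in a text serialization (JSON here).
    static HttpRequest buildQueryRequest(String endpoint, String sparql) {
        String encoded = URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; the request is only built, never sent.
        HttpRequest request = buildQueryRequest(
                "http://example.org/sparql",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 10");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Each request built this way is independent of the last: nothing carries over between calls, and the response still has to be parsed from text before the application can use it.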

Spark is:

  • Connection-oriented so connections can hold the state of an interaction 
  • Client-server to leverage either ubiquitous SPARQL endpoints or custom SPARQL processor APIs 
  • Interface-oriented and system-agnostic so one API can be used for many SPARQL processors and communication protocols 
  • A query API focusing on cursored access to results 
  • A lightweight RDF data API, and NOT a graph API (you should still use Jena, Sesame, etc. to work with in-memory graphs) 
  • [work in progress] A metadata API, defining a common way to retrieve metadata from SPARQL processors 
  • [future] Able to support updates and transactions 
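
The connection-oriented, cursored style described in the list above can be sketched roughly as follows. This is not the Spark API: `SketchConnection`, `SketchCursor`, `inMemory`, and `collect` are hypothetical names invented for this illustration, and the in-memory implementation returns canned rows (ignoring the query text) only so the example runs without a server. See the spark-api javadoc for the real interfaces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class CursorSketch {

    /** Hypothetical connection: holds the state of one interaction with a processor. */
    interface SketchConnection extends AutoCloseable {
        SketchCursor executeQuery(String sparql);
        @Override void close();
    }

    /** Hypothetical forward-only cursor over solutions (variable name to value). */
    interface SketchCursor extends AutoCloseable {
        boolean next();
        String getValue(String variable);
        @Override void close();
    }

    /** Toy connection backed by canned rows; a real one would talk to a remote processor. */
    static SketchConnection inMemory(List<Map<String, String>> rows) {
        return new SketchConnection() {
            private boolean open = true;

            @Override public SketchCursor executeQuery(String sparql) {
                if (!open) throw new IllegalStateException("connection closed");
                Iterator<Map<String, String>> it = rows.iterator();
                return new SketchCursor() {
                    private Map<String, String> current;

                    @Override public boolean next() {
                        if (!it.hasNext()) { current = null; return false; }
                        current = it.next();
                        return true;
                    }

                    @Override public String getValue(String variable) {
                        if (current == null) throw new IllegalStateException("no current row");
                        return current.get(variable);
                    }

                    @Override public void close() { current = null; }
                };
            }

            @Override public void close() { open = false; }
        };
    }

    /** Walks the cursor one row at a time and collects the values bound to one variable. */
    static List<String> collect(SketchConnection conn, String sparql, String variable) {
        List<String> values = new ArrayList<>();
        try (SketchCursor cursor = conn.executeQuery(sparql)) {
            while (cursor.next()) {
                values.add(cursor.getValue(variable));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = List.of(Map.of("s", "urn:a"), Map.of("s", "urn:b"));
        try (SketchConnection conn = inMemory(rows)) {
            System.out.println(collect(conn, "SELECT ?s WHERE { ?s ?p ?o }", "s"));
        }
    }
}
```

As with a JDBC `ResultSet`, the caller pulls rows one at a time, so an implementation is free to stream results from the server rather than materialize a whole graph in memory.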

Spark consists of the following artifacts:

  • spark-api – the Spark API, only interfaces (javadoc) (example)
  • spark-spi – the Spark SPI, classes helpful for building an implementation of the API (javadoc)
  • spark-protocol – an implementation of the Spark API for accessing SPARQL endpoints over HTTP (javadoc)

See also: Sherpa, a high-performance, language-agnostic, binary protocol for SPARQL processor communication.

Open source

Spark is an open source project hosted on GitHub in the spark project under the Revelytix organization account. It is released under the Apache License, Version 2.0.

Spark is being developed for use within the Revelytix product suite; however, we felt that many users of SPARQL processors could benefit from such an API. We welcome participation in defining the Spark API and in building implementations of Spark for other SPARQL processors.

If you’re interested in using Spark or working on it, please discuss your ideas on the revelytix-oss Google group.