Data Services Newsletter

Volume 8 : No 1 : March 2006

Searchable Product Archive and Distribution Engine (SPADE)

The Searchable Product Archive and Distribution Engine project (SPADE, née UPDS) at the DMC provides a permanent, searchable archive for XML-encoded data products. A product can be essentially any XML document, and products can be searched by any field that has been designated as searchable. SPADE provides a single, uniform, web services-based tool to query and access all manner of scientific data products.

The system consists of a pair of servers, one for submission and one for querying. Product documents are submitted through a web service client to the Submission server, which extracts searchable metadata into a relational database and archives the XML document in its entirety. The web services-based API supports both a stand-alone Java GUI client and a web browser interface for querying the archive.
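The submission step described above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite database, not the SPADE implementation: the element names, the `SEARCHABLE_FIELDS` mapping, the `hypocenter` table layout, and the `submit` function are all assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical product document; the element names are illustrative only.
PRODUCT_XML = """<hypocenter>
  <sourceId>example-source</sourceId>
  <productId>hypo-0001</productId>
  <latitude>47.6</latitude>
  <longitude>-122.3</longitude>
  <magnitude>5.1</magnitude>
</hypocenter>"""

# Hypothetical mapping from searchable field name to its element path.
SEARCHABLE_FIELDS = {
    "latitude": "latitude",
    "longitude": "longitude",
    "magnitude": "magnitude",
}


def submit(conn, xml_text):
    """Extract searchable metadata and archive the full XML document."""
    root = ET.fromstring(xml_text)
    product_id = root.findtext("productId")
    source_id = root.findtext("sourceId")
    if not product_id or not source_id:
        raise ValueError("product must carry source and product IDs")
    # Pull each searchable value out of the document by its path.
    values = [float(root.findtext(path)) for path in SEARCHABLE_FIELDS.values()]
    conn.execute(
        "INSERT INTO hypocenter VALUES (?, ?, ?, ?, ?, ?)",
        (product_id, source_id, *values, xml_text),
    )
    return product_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hypocenter (product_id TEXT PRIMARY KEY, source_id TEXT, "
    "latitude REAL, longitude REAL, magnitude REAL, document TEXT)"
)
submitted_id = submit(conn, PRODUCT_XML)
```

The searchable values go into ordinary columns so they can be filtered with SQL, while the unmodified document is stored alongside them so it can be returned in its entirety.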

Figure 1: The SPADE System.

To search the archive, the user first selects a product type, which determines the searchable fields, and then enters query constraints for that type. Each product type has its own set of searchable metadata fields.

Figure 2: SPADE – Selecting a Product Type.

Figure 3: SPADE – Entering queries.

The query will return a list of products that matched the specified filter criteria, allowing the user to view them directly or have the products packaged and downloaded as a group.
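As a rough illustration of how per-type query constraints might be turned into a database query, consider the Python sketch below. The `SEARCHABLE` set, the `build_query` helper, the table layout, and the sample rows are hypothetical and stand in for whatever the query server actually does.

```python
import sqlite3

# Hypothetical searchable fields for one product type.
SEARCHABLE = {"latitude", "longitude", "magnitude"}
OPERATORS = {"=", "<", "<=", ">", ">="}


def build_query(table, constraints):
    """Turn (field, operator, value) constraints into parameterized SQL.

    The table name is assumed to come from a trusted list of product types.
    """
    clauses, params = [], []
    for field, op, value in constraints:
        if field not in SEARCHABLE or op not in OPERATORS:
            raise ValueError(f"unsupported constraint: {field} {op}")
        clauses.append(f"{field} {op} ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1"
    sql = f"SELECT product_id FROM {table} WHERE {where} ORDER BY product_id"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hypocenter "
    "(product_id TEXT, latitude REAL, longitude REAL, magnitude REAL)"
)
conn.executemany(
    "INSERT INTO hypocenter VALUES (?, ?, ?, ?)",
    [
        ("hypo-0001", 47.6, -122.3, 5.1),
        ("hypo-0002", 35.0, 139.0, 6.4),
        ("hypo-0003", -33.4, -70.6, 4.2),
    ],
)

# Northern-hemisphere events of magnitude 5.0 or greater.
sql, params = build_query(
    "hypocenter", [("magnitude", ">=", 5.0), ("latitude", ">", 0)]
)
matches = [row[0] for row in conn.execute(sql, params)]
```

Field names and operators are checked against fixed sets before being placed in the SQL text, and the values themselves are passed as bound parameters.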

Currently, queries are available only by product type: you can query one type of product at a time, using the metadata available for that type. In the next release, there will be an expanded set of common metadata fields with which to search across different products. For example, one might search for all products that relate to a certain geographic region. Common metadata fields will include latitude and longitude extents, geographic region information, time extents, keywords, Dublin Core, and others.
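A cross-product search by geographic region could, in principle, reduce to a bounding-box overlap test on the common latitude and longitude extents. The Python sketch below assumes a hypothetical `CommonMetadata` record and sample values, and ignores regions that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass
class CommonMetadata:
    """Hypothetical common metadata shared by all product types."""
    product_type: str
    product_id: str
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float


def overlaps(m, min_lat, max_lat, min_lon, max_lon):
    """True if the product's extents intersect the query region."""
    return (
        m.min_lat <= max_lat
        and m.max_lat >= min_lat
        and m.min_lon <= max_lon
        and m.max_lon >= min_lon
    )


def search_region(products, min_lat, max_lat, min_lon, max_lon):
    """Return products of any type whose extents touch the region."""
    return [
        p for p in products
        if overlaps(p, min_lat, max_lat, min_lon, max_lon)
    ]


# Illustrative records only; a point product has equal min and max extents.
products = [
    CommonMetadata("hypocenter", "hypo-0001", 47.6, 47.6, -122.3, -122.3),
    CommonMetadata("cmt", "cmt-0001", 35.0, 35.0, 139.0, 139.0),
    CommonMetadata("strain", "strain-0001", 40.0, 48.0, -125.0, -115.0),
]

hits = search_region(products, 45.0, 50.0, -125.0, -120.0)
hit_ids = [p.product_id for p in hits]
```

Because the test uses only the shared extent fields, the same search applies to every product type without knowing anything about its type-specific metadata.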

The system is structured so that new and as-yet unforeseen product types can be added to the archive with minimal effort. To add a new product type to the archive, a configuration document is created and registered with the system. This configuration document describes the product's searchable fields and is used to create the database tables, guide the metadata extraction, and build the query page. Currently, creating this configuration document is a manual process, similar to creating an XML Schema document. In the future we will provide tools to simplify and at least partially automate the process. The products can have essentially any XML structure, as long as they conform to a minimal set of system requirements, particularly the inclusion of unique source and product IDs.
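To make the role of the configuration document concrete, here is a small Python sketch in which one configuration drives both table creation and the list of fields shown on a query page. The configuration format shown (`productType` and `field` elements with `name`, `path`, and `type` attributes) is invented for this example and is not the actual SPADE configuration schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical configuration document for one product type.
CONFIG_XML = """<productType name="hypocenter">
  <field name="latitude" path="origin/latitude" type="REAL"/>
  <field name="longitude" path="origin/longitude" type="REAL"/>
  <field name="magnitude" path="magnitude/value" type="REAL"/>
</productType>"""

ALLOWED_TYPES = {"REAL", "INTEGER", "TEXT"}


def load_config(xml_text):
    """Parse the configuration into a table name and field descriptions."""
    root = ET.fromstring(xml_text)
    table = root.get("name")
    fields = [
        (f.get("name"), f.get("path"), f.get("type"))
        for f in root.findall("field")
    ]
    for name, _path, sql_type in fields:
        if not name.isidentifier() or sql_type not in ALLOWED_TYPES:
            raise ValueError(f"invalid field definition: {name}")
    return table, fields


def create_table_sql(table, fields):
    """Build the CREATE TABLE statement, always including the two IDs."""
    columns = ["product_id TEXT PRIMARY KEY", "source_id TEXT NOT NULL"]
    columns += [f"{name} {sql_type}" for name, _path, sql_type in fields]
    return f"CREATE TABLE {table} ({', '.join(columns)})"


def query_form_fields(fields):
    """Names of the fields a query page would offer for this type."""
    return [name for name, _path, _sql_type in fields]


table, fields = load_config(CONFIG_XML)
conn = sqlite3.connect(":memory:")
conn.execute(create_table_sql(table, fields))
form_fields = query_form_fields(fields)
```

The `path` attribute is carried along so that the same configuration could also guide metadata extraction from submitted documents.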

The archive is populated with over 280,000 products, including over 250,000 XML Hypocenters, over 21,000 Harvard CMTs going back to 1962, PBO strain data, and XML FARM products. Some of these products are experimental and will likely change before the final release. SPADE is currently in beta release, with a 1.0 release expected in the first quarter of 2007.

More information can be found at http://www.iris.edu/spud (formerly http://www.iris.edu/spade) or by contacting the IRIS DMC.

Note: SPADE is now called SPUD

by Linus Kamb (IRIS Data Management Center)
