
Introduction

Data services in the research domain support the use of research collections and datasets by providing automated functions for the creation, access, processing and analysis of data. More and more data providers are publishing their data through services. In Australia, for example, research organisations, science agencies, government departments and a number of national research infrastructure facilities are all moving to more formal publishing of data through services. Also, data consumers are increasingly accessing data services and connecting them with other services or tools (e.g. virtual laboratories) for data analysis, processing and visualisation.

Context

 In 2017-18, the Australian Research Data Commons (ARDC) convened a Data Services Interest Group around data service provision and consumption across the NCRIS facilities, science agencies, universities, and broader public sector. To improve discovery and use of data and related services across these organisations, the Interest Group agreed on some “end user” scenarios that the group aspired to support:

  • Individual researchers looking to “plug data into” their own tool or model using standard services

  • Virtual laboratories providing tools over data from various common data services

  • Third party innovators leveraging data across a pool of services for development work.

The Interest Group set out to identify “shared practice” for exposing information about data and related services across all organisations by asking:

  • What information set would a data steward need to possess to satisfy the requests in those scenarios?

  • In current information technology practice where might such information typically be stored and exposed in data management systems?

Core metadata for services and related collections

The Interest Group reviewed data and related service metadata terms from a number of metadata schemas, and attempted to group these terms by concept in Data and Data-Service Metadata Concepts and Schemas (Google Sheet). Each row of the sheet is a group of terms from the various schemas; the groups are named in column B, ‘Concepts - for data and related services’.

Based on this, the Interest Group agreed upon a core information set for “data and related services” which might then be encoded in a given standard and exchanged using the corresponding protocol. The set has been tested against common OGC/W3C/OpenAPI/Web-index standards to make sure it works within a given metadata-protocol combination (e.g. ISO 19115 and CSW). However, it does not prescribe a particular metadata schema, exchange protocol, or information management approach. Further detail on this approach is available in the document Data and related Services: discovery and use (Google Document).

The agreed core set of information for data and related services follows. Future work for the Interest Group includes developing community standards for the encoding of values for these concepts:

Essential = information required to respond to the three end-user scenarios listed above (more details here)
Recommended = desirable information for discovery, appraisal, citation, re-use, etc.

| Concepts, for data-services and related data | Requirement |
|---|---|
| service URL; service identifier (if different from the URL) | Essential * |
| service type: protocol and version - e.g. ‘wms 1.3’; service-use documentation (if protocol is non-standard - e.g. URL to service description); service type: function (if protocol is non-standard - e.g. ‘download’) | Essential * |
| service type: resource type (e.g. ‘service’) | Essential |
| data subject (e.g. 'observedProperty', 'variableMeasured') | Essential |
| service title | Essential |
| data spatial coverage | Essential if available |
| data geographic/projected CRS | Essential if available |
| data temporal coverage | Essential if available |
| service description/abstract | Recommended |
| data format | Recommended |
| service date (modified) | Recommended |
| service rights | Recommended |
| data rights | Recommended |
| data contributor/owner/publisher | Recommended |
| data language | Recommended |
| service language | Recommended |
| data identifying information - its text name, or an identifier such as a uuid or doi to a landing page | Recommended |
| service contributor/owner/publisher | Recommended |

* = essential for a minimum response
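As a rough illustration of how the Essential rows above could be checked, the following Python sketch tests whether a service description covers them. The dictionary keys (`service_url`, `service_protocol` and so on) are hypothetical field names chosen for this example, not an agreed encoding, and treating a grouped row as satisfied when any one of its parts is present is a simplifying assumption. The sample values come from the Geoscience Australia example further down this page.

```python
# Essential concepts from the table above. Each label maps to the
# (hypothetical) record keys that can satisfy it; a group counts as
# present when at least one of its keys has a value (an assumption).
ESSENTIAL_GROUPS = {
    "service URL / service identifier": ("service_url", "service_identifier"),
    "service type: protocol, documentation or function": (
        "service_protocol",
        "service_documentation",
        "service_function",
    ),
    "service type: resource type": ("service_resource_type",),
    "data subject": ("data_subject",),
    "service title": ("service_title",),
}


def missing_essentials(record):
    """Return the labels of Essential concepts that have no value in record."""
    return [
        label
        for label, keys in ESSENTIAL_GROUPS.items()
        if not any(record.get(key) for key in keys)
    ]


# Values copied from the Geoscience Australia example on this page.
GA_RECORD = {
    "service_url": "http://services.ga.gov.au/gis/services/DEM_SRTM_1Second_Slope/MapServer/WMSServer",
    "service_protocol": "WMS 1.3.0, 1.1.1",
    "service_resource_type": "service",
    "data_subject": "Land topography models, Ecology landscape, elevation, slope",
    "service_title": "Digital Elevation Model (DEM) of Australia derived from SRTM "
                     "with 1 Second Grid - Smoothed Percentage Slope WMS",
}

if __name__ == "__main__":
    print(missing_essentials(GA_RECORD))
```

An empty list means every Essential concept has a value. The "Essential if available" and Recommended rows are deliberately left out of this check.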

Community standards for the mapping of these concepts (in development):

 RIF-CS

| Service Record RIF-CS Xpath | Collection Record RIF-CS Xpath | Concepts, for data-services and related data |
|---|---|---|
| Service/identifier | Collection/relatedInfo/identifier AND/OR Collection/relatedInfo/relation/url (service URL to this dataset) | service URL; service identifier (if different from the URL) |
| Service/@type; Service/relatedInfo[@type='reuseInformation']/identifier[@type='url'] | Collection/relatedInfo[@type='service']/{title AND/OR notes} | service type: protocol and version - e.g. ‘wms 1.3’; service-use documentation (if protocol is non-standard - e.g. URL to service description); service type: function (if protocol is non-standard - e.g. ‘download’) |
| Service | Collection/relatedInfo[@type='service'] | service type: resource type (e.g. ‘service’) |
| Service/subject OR Service/relatedInfo[@type='collection']/{title OR notes} | Collection/subject | data subject |
| Service/name | Collection/relatedInfo[@type='service']/{title AND/OR notes} | service title |
| Service/coverage/spatial | Collection/location/spatial | data spatial coverage |
| Service/coverage/spatial | Collection/location/spatial | data geographic/projected CRS |
| Service/coverage/temporal | Collection/coverage/temporal | data temporal coverage |
| Service/description | Collection/relatedInfo[@type='service']/notes | service description/abstract |
| Service/relatedInfo[@type='collection']/{title OR notes} | Collection/relatedInfo[@type='service']/format | data format |
| Service/@dateModified | Collection/relatedInfo[@type='service']/notes | service date (modified) |
| Service/rights | Collection/relatedInfo[@type='service']/notes | service rights |
| Service/relatedInfo[@type='collection']/{title OR notes} | Collection/rights | data rights |
| Service/relatedInfo[@type='collection']/{title OR notes} | Collection/relatedObject/key OR Collection/relatedInformation[@type='party'] | data contributor/owner/publisher |
| Service/relatedInfo[@type='collection']/{title OR notes} | Collection/name/@lang | data language |
| Service/name/@lang | Collection/relatedInfo[@type='service']/notes | service language |
| Service/relatedInfo | Collection/identifier; Collection/name | data identifying information - its text name, or an identifier such as a uuid or doi to a landing page |
| Service/relatedObject/key | Collection/relatedInfo[@type='service']/notes | service contributor/owner/publisher |

 DCAT

Following analysis and discussion in the W3C Dataset Exchange Working Group (DXWG), a proposed solution for cataloguing services in the context of a DCAT catalog has been developed; see the DCAT-2 Working Draft and Editor's Draft. Classes for DataService and DataDistributionService have been added to the DCAT vocabulary. Examples of their use are shown here and here. Note that the second example is modelled on an instance from Research Data Australia. The solution is summarised below:

| DCAT | Concepts, for data-services and related data | Comment |
|---|---|---|
| dct:title | service title | |
| dct:description | service description/abstract | |
| dcat:endpointDescription | service description/abstract | Link to machine-readable endpointDescription, such as a Swagger or GetCapabilities document |
| rdf:type | service type: resource type | |
| dct:conformsTo | service type: protocol and version | In DCAT a single service might support multiple interfaces or protocols, so this property may be repeated |
| dct:type | service type: function | In DCAT a service might have multiple classifiers, either at different levels of refinement or with values taken from different controlled vocabularies, so this property may be repeated |
| dcat:endpointURL | service URL | service API |
| dcat:landingPage | service URL | Human useable landing-page for the service (which might be provided at a different URL to the service API) |
| dcat:servesDataset | data subject, data spatial coverage, data geographic/projected CRS, data temporal coverage, data format, data rights, data contributor/owner/publisher, data language, data identifying information | In DCAT this is a link to a description of the dataset(s) served, which is packaged as a separate record. All of the dataset descriptors are provided as properties of that resource, and not as properties of the service itself. |
| dct:language | service language | |
| dct:accessRights | service rights | |
| dct:modified | service date (modified) | |
| dct:creator, dct:publisher, dct:contributor; prov:wasAttributedTo | service contributor/owner/publisher | |
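The mapping above can be sketched in Python as a small function that builds a JSON-LD style description of a service. This is an illustrative sketch only, not an official DCAT serialisation: the input keys are hypothetical names chosen for this example, the protocol values are passed through as plain strings rather than as IRIs of the relevant standards, and wrapping the dataset UUID as a `urn:uuid:` link is an assumption. The sample values come from the Geoscience Australia example on this page.

```python
import json

# Namespace IRIs for the prefixes used in the mapping table.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}


def service_to_dcat(record):
    """Build a JSON-LD style dict for a service from hypothetical record keys."""
    doc = {
        "@context": CONTEXT,
        "@type": "dcat:DataService",                     # service type: resource type
        "dct:title": record["service_title"],            # service title
        "dcat:endpointURL": {"@id": record["service_url"]},  # service URL
    }
    if record.get("service_description"):
        doc["dct:description"] = record["service_description"]
    if record.get("service_protocols"):
        # dct:conformsTo may be repeated, so it is always emitted as a list.
        doc["dct:conformsTo"] = list(record["service_protocols"])
    if record.get("service_modified"):
        doc["dct:modified"] = record["service_modified"]
    if record.get("dataset_uuid"):
        # Assumption: the dataset record is linked by a urn:uuid identifier.
        doc["dcat:servesDataset"] = {"@id": "urn:uuid:" + record["dataset_uuid"]}
    return doc


if __name__ == "__main__":
    ga = {
        "service_title": "Digital Elevation Model (DEM) of Australia derived from SRTM "
                         "with 1 Second Grid - Smoothed Percentage Slope WMS",
        "service_url": "http://services.ga.gov.au/gis/services/DEM_SRTM_1Second_Slope/MapServer/WMSServer",
        "service_protocols": ["WMS 1.3.0", "WMS 1.1.1"],
        "service_modified": "2018-06-18",
        "dataset_uuid": "aac46307-fce8-449d-e044-00144fdd4fa6",
    }
    print(json.dumps(service_to_dcat(ga), indent=2))
```

Following the table, the dataset descriptors (subject, coverage, format, rights and so on) are not copied onto the service; only the link to the dataset record is emitted.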

 

 

 

Examples of service description using the agreed core concepts

Data provision organisations are not assumed to maintain independent metadata descriptions of services. There is, however, a shared expectation that a core set of information about data and related services is available from somewhere in the data management system. Sources might include references to services within dataset records, “self-describing” interfaces such as GetCapabilities, or a combination of both (see Data and related Services - Metadata Views for more information). The image below demonstrates how information about three core concepts ("service title", "service creator" and "data subject") could be obtained from various locations in the data management system.

 

Following are three real-life examples from some of the data provision organisations we have been working with:

 

 Example 1 - Service metadata provided in a dedicated service record (Geoscience Australia)

Service record from the GA metadata catalogue

Service metadata extracted from the above and mapped to the core service concepts: 

 

| Service concept | Value |
|---|---|
| service URL | http://services.ga.gov.au/gis/services/DEM_SRTM_1Second_Slope/MapServer/WMSServer |
| service type: protocol and version | Protocol: WMS 1.3.0, 1.1.1 |
| service type: resource type | service |
| data subject | Land topography models, Ecology landscape, elevation, slope |
| service title | Digital Elevation Model (DEM) of Australia derived from SRTM with 1 Second Grid - Smoothed Percentage Slope WMS |
| data spatial coverage | ["112.000000 -44.000000,154.000000 -44.000000,154.000000 -9.000000,112.000000 -9.000000,112.000000 -44.000000"] |
| data geographic/projected CRS | |
| data temporal coverage | |
| service description/abstract | Digital Elevation Model (DEM) of Australia derived from SRTM with 1 Second Grid - Smoothed Percentage Slope WMS |
| data format | |
| service date (modified) | 2018-06-18 |
| service rights | |
| data rights | |
| data contributor/owner/publisher | Geoscience Australia |
| data language | |
| service language | |
| data identifying information | UUID: aac46307-fce8-449d-e044-00144fdd4fa6 |
| service contributor/owner/publisher | Geoscience Australia |
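The spatial coverage value in this example is a closed ring of coordinate pairs. As a minimal sketch, the Python below reduces such a ring to a bounding box. It assumes each pair is "longitude latitude" separated by a space and that pairs are separated by commas; the record does not state the axis order or CRS, so that ordering is an assumption based on the values themselves.

```python
def bounding_box(ring):
    """Return (min_lon, min_lat, max_lon, max_lat) for a 'lon lat,lon lat,...' ring.

    Assumes longitude comes first in each pair, which is not stated
    in the source record.
    """
    points = [tuple(float(v) for v in pair.split()) for pair in ring.split(",")]
    lons = [lon for lon, _lat in points]
    lats = [lat for _lon, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


# Ring copied from the "data spatial coverage" value above.
GA_RING = (
    "112.000000 -44.000000,154.000000 -44.000000,154.000000 -9.000000,"
    "112.000000 -9.000000,112.000000 -44.000000"
)

if __name__ == "__main__":
    print(bounding_box(GA_RING))  # (112.0, -44.0, 154.0, -9.0)
```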

 Example 2 - Service metadata from within a dataset record combined with the response from a self-describing service (AODN)

IMAS UTAS dataset record in the AODN portal:

 

Hyperlink to Service URL within dataset record:

 

Service description at Service URL (OGC WFS) extracted from dataset metadata XML:

 

Summary of service metadata extracted from the above and mapped to the core service concepts:

| Service concept | Value |
|---|---|
| service URL | http://geoserver.imas.utas.edu.au/geoserver/seamap/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=seamap:SeamapAus_NSW_marine_habitats_2002&outputFormat=SHAPE-ZIP |
| service type: protocol and version; service type: function | Protocol: WFS; Function: Access |
| service type: resource type | service |
| data subject | bathymetry/seafloor topography... |
| service title | seamap |
| data spatial coverage | -27.64400, 149.28540, 154.29516, -37.60255 |
| data geographic/projected CRS | |
| data temporal coverage | 2002-05-30 |
| service description/abstract | |
| data format | SHAPE-ZIP |
| service date (modified) | |
| service rights | |
| data rights | CC-BY 4.0 |
| data contributor/owner/publisher | IMAS UTAS |
| data language | English |
| service language | English |
| data identifying information | ID: 9a94d1ba-8509-4d78-8b55-d25fd222cdff; Name: MAP - NSW marine habitats |
| service contributor/owner/publisher | IMAS UTAS |
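Several of the values in this example (protocol, version and data format) can be read directly from the query string of the service URL. The Python sketch below shows one way to do that with the standard library. The output key names are hypothetical and chosen for this illustration; lower-casing the parameter names is an assumption made because OGC key-value parameter names are commonly written in mixed case.

```python
from urllib.parse import parse_qs, urlsplit

# Service URL copied from the table above.
SERVICE_URL = (
    "http://geoserver.imas.utas.edu.au/geoserver/seamap/ows"
    "?service=WFS&version=1.0.0&request=GetFeature"
    "&typeName=seamap:SeamapAus_NSW_marine_habitats_2002&outputFormat=SHAPE-ZIP"
)


def describe_ows_url(url):
    """Pull core service concepts out of an OGC key-value-pair request URL."""
    parts = urlsplit(url)
    # Keep the first value of each parameter and compare names case-insensitively.
    params = {name.lower(): values[0] for name, values in parse_qs(parts.query).items()}
    return {
        "endpoint": f"{parts.scheme}://{parts.netloc}{parts.path}",
        "protocol": params.get("service"),
        "version": params.get("version"),
        "request": params.get("request"),
        "layer": params.get("typename"),
        "data_format": params.get("outputformat"),
    }


if __name__ == "__main__":
    for key, value in describe_ows_url(SERVICE_URL).items():
        print(f"{key}: {value}")
```

Concepts that the URL does not carry, such as data rights or data subject, still have to come from the dataset record.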

 Example 3 - Service metadata from self-describing service OpenAPI (Atlas of Living Australia)

Service OpenAPI URL - description for multiple service endpoints provided in json:

 

Summary of service metadata extracted from the above and mapped to the core service concepts:

| Service concept | Value |
|---|---|
| service URL | http://biocache-ws.ala.org.au/ws/occurrences/search* |
| service type: protocol and version | webservice; search; Parameters: fq - array(False), formattedFq - array(False), facets - array(False), formattedQuery - string(False), q - string(False), ... |
| service type: resource type | service |
| data subject | occurrence |
| service title | occurrenceSearchUsingGET |
| data spatial coverage | |
| data geographic/projected CRS | |
| data temporal coverage | |
| service description/abstract | occurrenceSearchUsingGET operation available at biocache-service API |
| data format | json |
| service date (modified) | |
| service rights | |
| data rights | |
| data contributor/owner/publisher | |
| data language | |
| service language | |
| data identifying information | |
| service contributor/owner/publisher | Atlas of Living Australia |
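To show how values like those above might be pulled from a self-describing OpenAPI document, the Python sketch below walks the `paths` section of a Swagger 2.0 style description and lists each operation with its parameters in the "name - type(required)" form used in the table. The `SPEC_FRAGMENT` here is a hypothetical, hand-written fragment built from the values in the table; it is not the actual Atlas of Living Australia document.

```python
# Hypothetical fragment in Swagger 2.0 layout, built from the table values.
SPEC_FRAGMENT = {
    "paths": {
        "/occurrences/search": {
            "get": {
                "operationId": "occurrenceSearchUsingGET",
                "parameters": [
                    {"name": "fq", "in": "query", "required": False, "type": "array"},
                    {"name": "q", "in": "query", "required": False, "type": "string"},
                ],
            }
        }
    }
}


def summarise_operations(spec):
    """List each operation with its parameters as 'name - type(required)'."""
    operations = []
    for path, methods in spec.get("paths", {}).items():
        for method, operation in methods.items():
            parameters = [
                f"{p['name']} - {p.get('type')}({p.get('required', False)})"
                for p in operation.get("parameters", [])
            ]
            operations.append({
                "path": path,
                "method": method.upper(),
                "title": operation.get("operationId"),
                "parameters": parameters,
            })
    return operations


if __name__ == "__main__":
    for op in summarise_operations(SPEC_FRAGMENT):
        print(op["method"], op["path"], op["title"], op["parameters"])
```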

 

 
