EarthServer is fully committed to standards, most importantly the datacube standards of OGC,ISO, and European INSPIRE, centering around coverages. This tutorial gives a one-stop shop of information about datacube standards: a primer, interactive demos, slides, standards, publications...you name it!
Actionable datacubes provide a powerful, yet simplifying paradigm for getting insight from massive spatio-temporal data. This concept, which generalizes the concept of seamless maps from 2-D to n-D, is based on preprocessing incoming data so as to integrate all data form one sensor into one logical array, say 3-D x/y/t for image timeseries or 4-D x/y/z/t for weather forecasts. This enables spatial analysis (both horizontally and vertically) and multi-temporal analysis simultaneously. Adequate service interfaces enable “shipping code to the data” to avoid excessive data transport. In standardization, datacubes belong to the category of coverages as established by ISO and OGC. Datacube services based on these standards are in operational use since many years now, serving multi-Petabyte assets. Below is a kaleidoscope of coverage-enabled services, with their individual portals and clients supported.
However, despite the accepted advantages many questions reach us on datacubes - not surprisingly, as this concept is new (and not yet taught comprehensively at universities). This EarthServer tutorial provides answers and links to get you from zero (or above) ot hero!
Coverages are the worldwide accepted concept for representing raster data and datacubes, such as: 1-D sensor timeseries, 2-D images and DEMs, 3-D x/y/t image timeseries and x/y/z geophysics data, 4-D x/y/z/t atmospheric data, etc. Generally, coverages represent regular and irregular grids, point clouds and general meshes, although today raster data and datacubes prevail.
A coverage consists of
OGC, ISO, and EU INSPIRE agree on a common interoperable Coverage Implementation Schema which comes with broad format encoding support, such as XML, JSON, RDF, GeoTIFF, NetCDF, etc. [more on coverage data]
The OGC Web Coverage Service (WCS) standards suite esablishes a structured, modular collection of increasingly powerful services. WCS Core defines basic functionality: coverage subsetting and format encoding. Any implementation claiming to support WCS must support this Core functionality (which is not hard, it's tentatively kept simple) and may additionally support any of the WCS extensions defined, such as the Earth datacube analytics and fusion language, Web Coverage Processing Service (WCPS). [more on coverage services]
The authoritative source for coverage data is OGC CIS which is also ISO standard (ISO 19123-2:2018). OGC WCS, together with its datacube analytics language WCPS, is the accepted way to offer flexible, easy-to-use services on spatio-temporal coverages, in particular: datacubes.
OGC provides a rigorous conformance test suite validating implementations down to pixel level. Any implementation can test itself by downloading and executing this free and open test suite.
...read on below, get "from zero to hero"!
Abstractly, a coverage is a function mapping location (i.e., coordinates) to data values - in plain words: a coverage offers some value (such as a color pixel) for each of the coordinates it covers. These coordinates are called direct positions - only at those direct positions the coverage stores a value; via interpolation, values can also be generated for coordinates between direct positions - more about that later. To represent this, the coverage data structure technically consists of four main components (plus some details which we ignore at this level of detail):
The UML diagram illustrates this structure; it shows the four components, of which metadata are optional, as well as the inheritance from the basic geo objecttype, Feature. Quit simple, isn't it?
Let us inspect these components in turn.
Domain set. The domain set consists of direct positions where values are located. As we focus on raster data / datacubes we only consider gid coverages where the domain set forms a (regular or irregular) grid. Such a grid, as well as its grid coordinates, can be of any number of dimensions (better said: axes), made up from spatial, temporal, and other axes such as spectral frequencies, for example. The underlying grid space of a domain set is defined by its corresponding Coordinate Reference System (CRS), more about that later.
In gridded coverages, which is what we focus on with raster data and datacubes, coordinates are aligned on some grid, obviously.
Still, there is an amazing variety among possible grid types: Cartesian or geo-referenced, space or time or something else, regular or irregular, etc.
With growing complexity the description of a grid, as part of the domain set, grows, and likewise does the size of the corresponding domain set.
The simplest case is a regular Cartesian or geo-referenced axis; in this case, we simply need to store lower and upper bound as well as resolution.
In case of an irregular axis this is more involved: all the individual grid points on the axis between its lower and upper bound need to be stored explicitly.
More complex grids require even more involved representations.
Below is an example for a grid, in XML representation, that involves both regular axes (Lat and Long) and an irregular one (time).
Note the definition of Lat
and Long
as regular axis with a given resolution,
as opposed to the explicit enumeration of the time steps in the date
axis. The underlying (Cartesian) grid,
modelling the array data structure, is given by the GridLimits
- but that is just a technical detail.
<DomainSet> <GeneralGrid srsName="http://www.opengis.net/def/crs-compound?1=http://www.opengis.net/def/crs/EPSG/0/4326&2=http://www.opengis.net/def/crs/OGC/0/AnsiDate" axisLabels="Lat Long date" uomLabels="deg deg d"> <RegularAxis axisLabel="Lat" uomLabel="deg" lowerBound="40" upperBound="60" resolution="10"/> <RegularAxis axisLabel="Long" uomLabel="deg" lowerBound="-10" upperBound="10" resolution="10"/> <IrregularAxis axisLabel="date" uomLabel="d"> <C>2015-12-01</C> <C>2015-12-02</C> <C>2016-00-12</C> <C>2016-05-01</C> </IrregularAxis> <GridLimits srsName="http://www.opengis.net/def/crs/OGC/0/Index3D" axisLabels="i j k"> <IndexAxis axisLabel="i" lowerBound="0" upperBound="2"/> <IndexAxis axisLabel="j" lowerBound="0" upperBound="2"/> <IndexAxis axisLabel="k" lowerBound="0" upperBound="2"/> </GridLimits> </GeneralGrid> </DomainSet>
Range values. For storage, these values will need to be linearized following one of many possible schemes, but this is an implementation detail of the particular format chosen and does not affect the fact that coordinates are determined by the coverage axes. There is also the question on how the direct positions of the domain set are connected to their respective values. Actually, there are several ways of achieving this:
Range type. A coverage’s range type captures the semantics of the range set values; its definition is based on SWE (Sensor Web Enablement) Common so that sensor data can be transformed into coverages without information loss, thereby enabling seamless service chains from upstream data acquisition (e.g., through OGC SOS) to downstream analysis-ready user services (such as OGC WMS, WCS, and WCPS). Notably, the range type can go far beyond just a datatype indicator (such as integer versus float); unit of measure, accuracy, nil values, and the semantics (by way of a URL reference), and more information can be provided with a range type, thereby accurately describing the meaning of the values. The following is an example range type definition for panchromatic optical data, encoded in GML:
<swe:field name="panchromatic"> <swe:Quantity definition="http://opengis.net/def/property/OGC/0/Radiance"> <swe:description>panchromatic sensor</swe:description> <swe:NilValues> <swe:nilValue reason="http://www.opengis.net/def/nil/OGC/0/AboveDetectionRange">255</swe:nilValue> </swe:NilValues> <swe:uom code="W.m-2.sr-1.nm-1"/> </swe:Quantity> </swe:field>
Metadata.
This optional part is left unspecified in the standard, it can contain any number of anything, literally (xs:any
, for the XML experts).
In addition to domain set and range type, the mandatory technical metadata of a coverage, these are completely application dependent.
Of course, the coverage cannot understand them, but they will duly be transported so that the connection between data and metadata is preserved.
One example for such metadata is given by the European INSPIRE legal framework for a common Spatial Data Infrastructure.
INSPIRE prescribes canonical metadata for each object following a specific schema. Coverage metadata are the right place to keep those.
This demonstration showcases use of INSPIRE metadata.
Note the "any number": different applications may add their own metadata, and each application in practice would only look at those metadata it understands.
Here is an example where three different, independent metadata records coexist in the coverage; note how they all bring along their own schema:
<GeneralGridCoverage xmlns="http://www.opengis.net/cis/1.1/gml" ...> <DomainSet> ... </DomainSet> <RangeSet> ... </RangeSet> <RangeType> ... </RangeType> <Metadata> <el-covmd:ElevationGridCoverageMetadata xmlns:el-covmd="http://inspire.ec.europa.eu/schemas/el-covmd/4.0" ...> ... </el-covmd:ElevationGridCoverageMetadata> <card4l:Card4lMetadata xmlns:card4l="..." ...> ... </card4l:Card4lMetadata> <special:MySpecialMetadata xmlns:special="..." ...> ... </special:MySpecialMetadata> </Metadata> </GeneralGridCoverage>
Let's come back to CRSs and the cumbersome srsName
attribute we skipped explaining in the above domain set example.
Science tells us that a coordinate is meaningless as long as there is no indication about the reference system in which it is expressed.
A value of 42 - is that degrees (referring to what datum?), meters, years since epoch, or million years backwards?
All of that information is provided by the CRS.
As per OGC Naming Authority decision, CRSs shall be expressed in URLs, and that is what we find in the srsName
attribute
(which, BTW, expands to spatial reference system name - some legacy terminology stemming from GML).
These URLs you can indeed resolve by simply following
http://www.opengis.net/def/crs/EPSG/0/4326 or
http://www.opengis.net/def/crs/OGC/0/AnsiDate;
a so-called CRS resolver service, running an open-source implementation by Jacobs University, does that trick.
Being annoyed about the complexity, CIS 1.1 allows any commonly accepted notation, including notations like EPSG:4326
.
So far, so good - but we need multi-dimensional CRSs. The EPSG catalog is large, but preparing all possible axis combinations is just unfeasible.
Therefore, and foillowing ISO 19111-2, CRS and axis composition is provided where the base URL ends with crs-compound
,
followed by an ordered list of component CRSs and axis. And that is exactly what our funny srsName
does - slightly reformatted it becomes clear:
http://www.opengis.net/def/crs-compound? 1=http://www.opengis.net/def/crs/EPSG/0/4326 & 2=http://www.opengis.net/def/crs/OGC/0/AnsiDate
This completes the information necessary to understand a domain set. But here is some more service:
it would slow down services considerable if, for each coverage decoding, it first needs to retrieve the CRS definition.
Actually, not all of it is needed anyway - most important are the axis names and units of measure.
The axisLabels="Lat Long date"
and uomLabels="deg deg d"
attributes provide this excerpt directly, for the tool's convenience.
Additionally, the axis labels define the sequence of axes;
this axis sequence ambiguity actually is a problem recurring in GeoJSON and other OGC services where it is just implicitly assumed.
The coverage structure is represented 1:1 in the XML, JSON, and RDF encodings coming with CIS 1.1. However, while these encodings are "informationally complete" in that they carry all of a coverage's information they are not always inefficient. For 1-D diagrams formats like JSON are ideal because the file will be small anyway, and can be consumed conveniently by some JavaScript or TypeScript client and its diagramming libraries. A satellite scene, on the other hand, nobody would want to store and exchange in one monolithic XML file.
Therefore, binary formats are supported for coverage encoding in addition. A lineup of widely used binary formats is available in CIS extensions, including GeoTIFF, NetCDF, GRIB2, JPEG2000, and several more. Of course, it is well known that these formats have their individual ways of representing "metadata" that could carry domain set, range type, and coverage metadata. Actually, often one or the other component cannot be transported at all - think of JPEG, for example. Well, often this is ok, and if a user explicitly requests such a format they will know hwat they do. Citing the JPEG example again, if the coverage should just be portrayed on a screen it is perfectly acceptable to miss part of the data.But as always in life, we want it all: informationally complete and, at the same time, efficient. For this purpose a container concept has been introduced with CIS 1.0 and extended with CIS 1.1. A coverage container consists of a "shell" that may use any suitable format that can hold files inside, such as Multipart/MIME, zip, SAFE, etc. Inside there is a canonical header in some information-lossless format, like the abovementioned XML, JSON, or RDF, followed by one or more files which typically would contain some efficient binary encoding of the range set (but could also be metadata or domain set, for that matter). The header holds the coverage data in general, but for the "outsourced" parts instead of the data stores a refernce to the file(s) coming. By the way, this mechanism can also be used to encode tiled coverages.
In summary, the common, unified structure of a coverage can be encoded in a variety of alternativees, with individual pros and cons, with options for all situations arising.
The coverage standards of OGC and ISO – which are kept identical – divide coverage definition into an abstract and a “concrete” level. On abstract level we find ISO 19123, which is identical to OGC Abstract Topic 6, providing coverage concepts and definitions in a very general way. This definition still allows manifold implementations which are not necessarily interoperable. This fact is underpinned by the observation that such services regularly come with their own clients, which are not interchangeable across different server implementations. Therefore, ISO and OGC provide a concretization of coverages, based on these abstract standards but establishing interoperability; again, specifications ISO 19123-2 and OGC Coverage Implementation Schema (CIS) are identical. This concretization gets manifest in OGC conformance tests which assess any Web Coverage Service (WCS) and the coverages it delivers down to the level of single pixels; hence, CIS/WCS are interoperable and can be used in any combination of clients and servers.
Coverages are standardized by ISO and OGC in close synchronization. Concepts and Terminology are established in ISO 19123-1 (primer) which is currently under adoption vote for an update of outdated OGC Abstract Topic 6. A concrete, interoperability-testable data structure based on these concepts is given by OGC Coverage Implementation Schema 1.1, specifically: its General Grid Coverage structure with its schemata for XML, JSON, and RDF encoding (primer). OGC Coverage Implementation Schema (CIS) 1.0 was adopted by ISO as 19123-2; work in ISO has commenced to adopt the current OGC revision, CIS 1.1, which introduces the more powerful and easier to handle General Grid Coverage. After that adoption process, ISO and OGC will be in sync with their coverage standards suites.
While coverages can be served through a variety of interfaces - including WFS, WMS, and WPS - only WebCoverage Service (WCS) offers comprehensive, tailored functionality. Similar to the modularly organized CIS, the WCS suite also has a mandatory Core around which a series of optionally implementable extensions gather. This structure is tentqtive so as to have the lowest possible effort and skill level for WCS implementers while, at the same time, allow advanced implementers building compliant, but more elaborate WCS tools.
WCS Core offers very simple, focused coverage access and extraction functionality:
Technically, WCS Core offers three request types:
GetCapabilities
: what data does this service offer, and what are its capabilities?
DescribeCoverage
: tell me details about this coverage!
GetCoverage
: give me that coverage (or part of it), and in my favourite format!
We focus on GetCoverage
as the central workhorse of WCS and explain it by example.
The following request, using GET/KVP encoding, performs trimming in space and slicing in time,
effectively extracting a 2D image from a 3D timeseries datacube, delivered as GeoTIFF
(we use this WCS demo service):
https://ows.rasdaman.org/rasdaman/ows ? SERVICE=WCS & VERSION=2.0.1 & REQUEST=GetCoverage & COVERAGEID=AvgTemperatureColorScaled & SUBSET=Lat(-30,70) & SUBSET=Long(-140,140) & SUBSET=ansi(%222000-02-01T00:00:00.000Z%22) & FORMAT=image/jpeg
We can see the SUBSET
parameters, one for each axis (yes, unix
is a crazy name for a time axis, agreed).
In this particular example we might obtain a time slice at a particular point in time and for some particular geographic region from an x/y/t datacube for display on screen.
The encoding format in which a coverage is stored in the server is called the coverage's Native Format; without any FORMAT parameter in a request the coverage will be delivered in this Native Format (also in case of subsetting). The CRS in which the coverage’s domain set coordinates are expressed is referred to as its Native CRS. A WCS Core will always interpret subsetting coordinates and deliver coverage results in the Native CRS of the coverage addressed.
A central promise of WCS Core is that coverage values will be delivered guaranteed unchanged – no rounding, no resampling etc. may take place. Of course, this may change if some lossy format is used (such as JPEG with a quality factor less than 100%). Further, WCS extensions may change this behavior as we will see below.
Both CIS and WCS are organised in a modular way:
A synoptic view of the OGC coverage data and service model is presented below.
Now I hear you ask: And how do I find out which extension the particular service supports?
The answer is: The list of extensions as well as data formats supported, CRSs supported, and so on is listed in the Capabilities document of a service,
to be obtained via a GetCapabilities
request.
The following diagram shows the architecture of the complete WCS suite with its components and their relationships:
WCS extensions can be classified into functionality extensions and protocol bindings. The latter allow expressing any request of Core and extensions in one of GET/KVP, POST/XML, SOAP, and (still under discussion) OpenAPI.
The functionality-enhancing extensions provide a variety of useful extras over WCS Core; we look at each in turn.
GetCoverage
request with an extra, optional parameter
to select range components (“bands”, “variables”). With RANGESUBSET
a selection of bands can
be extracted by band name, allowing to select and also reorder.
For example, the following request extracts a false-color image from some hyperspectral scene:
http://www.acme.com/wcs ? SERVICE=WCS & VERSION=2.1 & REQUEST=GetCoverage & COVERAGEID=c001 & RANGESUBSET=nir,red,green
Generally, one or more bands can be extracted and brought into any sequence desired.
For large amounts of bands, intervals can be specified, such as red:blue
, based on the order defined in the range type.
SCALING
parameter is added to the GetCoverage request.
The interpolation method applied is implementation dependent unless WCS Interpolation is implemented in the server (see below).
Obviously, in presence of scaling it cannot be guaranteed any longer that range values remain unchanged.
OUTPUTCRS
lets users indicate the target CRS in which the coverage should be delivered.
Additionally, the CRS of a subsetting bbox given by one or more SUBSET
parameters
can be changed from the coverage's Native CRS through the optional request parameter SUBSETTINGCRS
.
Following OGC convention, CRSs are expressed as URLs, such as in the following example:
http://www.acme.com/ows?SERVICE=WCS & VERSION=2.1 & REQUEST=GetCoverage & COVERAGEID=c001 & OUTPUTCRS=http://www.opengis.net/def/crs/EPSG/0/4326
The list of CRSs supported by a server are listed in its capabilities document which can be retrieved through a GetCapabilities
request.
GetCapabilities
request.
Such methods might include nearest-neighbor, linear, or quadratic.
ProcessCoverages
.
It has a single parameter, query
, which is supposed to contain a WCPS query string. See the WCPS section for details.
InsertCoverage
creates a new coverage in the server;
DeleteCoverage
removes one or more, and UpdateCoverage
performs a partial replacement within an existing coverage.
For example, the following request generates a new coverage in the server by reading the coverage from some other server via the URL indicated:
http://www.acme.com/wcs ? SERVICE=WCS & VERSION=2.1 & REQUEST=InsertCoverage & COVERAGEREF=http://bcme.com/archive/hurricane.nc
Since its inception, WCS offers several different protocol bindings, i.e., languages between client and server. This does not address coverage encodings - they are just the optic of communication - but rather the way requests and responses are formatted.
GET
requests and responses, with functionality (REQUEST=...
) and parameters (SUBSET=...
)
encoded in the query part of the URL.
This allows for a convenient single-line command that any browser can send, but in is restricted in its length by built-in restrictions of Web client and server libraries.
POST
requests instead, with commands and parameters transmitted through an XML structure specified in the standards.
This is convenient in particular when large data have to be transmitted (such as in a WCS-T request), but cannot be copy-pasted into a browser.
OGC OAPI-Coverages is a draft-stage effort in OGC to unify feature, coverage, map, and processing based on OpenAPI concepts. From a WCS perspective, this is an evolution, not a disruptive replacement, that constitutes an additional way of expressing requests to the server and handling responses. Most notably, OAPI-Coverages is based on the Coverage Implementation Schema (CIS) 1.1 just as WCS 2.1 is. Currently, OAPI-Coverages functionality is a small subset of WCS (more or less: WCS-Core).
For compliance testing, OGC provides a free, open test suite which examines WCS implementations and the coverages delivered down to the level of single pixels, thereby ensuring interoperability across the large and increasing ecosystem of open-source and proprietary servers and clients.
OGC WCS 2 is the current version (WCS 1 is deprecated since long). WCS 2.0.1 is based on CIS 1.0; WCS 2.1, adopted in 2019, is identical except that it also knows about CIS 1.1 General Grid Coverages. The WCS 2 extensions likewise have been enhanced; some of them have been at version 1.0 and are lifted now to 1.1, some already had 2.0 and are now 2.1. The WCPS language (which is independent from the WCS request protocol binding) is originally 1.0 and has been elevated to 1.1 to incorporate the CIS 1.1 coverage structure. OGC WQS has the status of a Discussion Paper.
OAPI-Coverages is under discussion for quite some time now (official kickoff in May 2018), but not yet stable and still changing; recent estimates expect adoption sometime 2022. From an implementation perspective there are the caveats that (i) the specification in places still is inconcise which prohibits interoperability, (ii) not implementation proven, and (iii) there are no clients available yet.
A WCPS query has several parts, the two most important ones being the for
and the return
clauses.
The for
part serves to indicate the coverage objects we want to work with, and to tie them to a variable - a shorthand, if you will -
for subsequent processing directives. Say, we want to inspect coverages A
, B
, and C
in turn
using variable $c
as an iterator (we note in passing that such variables start with a $
symbol). This is written as:
for $c in ( A, B, C )
This is fine, but not complete, as after all it does not give us any result. We need to add a return
clause where we state what we actually want to get back.
The following "hello world" example returns 42 (assuming coverages A
, B
, and C
exists on the server, otherwise adapt the name accordingly):
for $c in ( A, B, C ) return 42
Ahem, nothing impressive, admittedly: all we get back is 3 times the literal 42
as below - but hey, we need to start somewhere.
42 42 42
Not impressive, but a first sign of life. Oh, by the way, such a simple number is returned in ASCII as text. This can be printed or easily converted into any programming language's number representation. If you will, ASCII is the transfer encoding.
Things get more complicated once we start extraction from coverages - as we all know there is a plethora of format encodings available, end everybody has their own favourite. Now WCPS is graceful on this and lets you select your personal preference. As long as the data allows this - the server will have a hard time encoding something 4D into PNG which is just made for 2D, so that won't work.
Understood, but how do we express our desired encoding format? Well, by applying an
encode()
function on the result wich the coverage output as first parameter and the encoding format as second parameter,
indicated through its MIME type like "image/tiff" for GeoTIFF.
Assuming we have a small 2D coverage (be careful, you really don't want to try this with a multi-Terabyte object!) this could read as follows:
for $c in ( A ) return encode( $c, "image/tiff" )
The above query would return one GeoTIFF file; had we indicated a list of say 3 coverages then we would get back 3 GeoTIFF files, one for each iteration.
Now we are ready to roll - let us inspect next how we can extract from coverages. After all, as mentioned, we would not want to download coverages which typically are Big Data, hence "too big to transport".
Oh, by the way, the for
clause can do even more for us: combining several coverages.
This is achieved by listing several variable declarations in the for
clause.
For example, let us request a band ratio (more on that below) from bands stored in two separate coverages, say MODIS-nir
and MODIS-red
.
Ignoring the processing magic in the return
clause for now - we will address this below - we can establish fusion requests like the following:
for $nir in ( MODIS-nir ), $red in ( MODIS-red ) return encode( (unsigned int) 127*(($red-$nir)/($red+$nir)+1), "image/tiff" )
Generally speaking, for some lineup
for $x1 in (X1,1,X1,2,...), $x2 in (X2,1,X2,2,...), ..., $xn in (Xn,1,Xn,2,...) return f($x1,$x2,...,$xn)each combination of the
$xi,j
is generated and evaluated in turn. This allows data fusion on an arbitrary number of coverages.
As an example, this INSPIRE WCS service contains a 4-way fusion.
At the heart of WCPS are coverage expressions - they actually perform the heavy lifting of Big Data Analytics. A rich set of operators is provided by the language, effectively covering some mileage of Tensor Algebra, up to about the Fast Fourier Transform). We start our tour with the most basic operation: coordinate-based extraction from a coverage.
The most immediate operation, subsetting, we know already from WCS trimming and slicing.
In WCPS, notation is more compact: instead of the WCS SUBSET
sequence subsetting in WCPS is enshrined in brackets, such as:
$c[ Lat(30:60), Long(40:60), date("2020-03-21") ]
Sequence of axes does not matter, as we have always indicate the name of the axis addressed. As with WCS, axes left out will be delivered uncut. Needless to say, the lower bound must not be higher than the upper bound.
In so-called induced operations some operation which is known on pixels silently gets applied to all pixels simultaneously. Any unary or binary function defined on the cell types qualifies for induced use, includng all the well-known arithmetic, comparison, Boolean, trigonometric, and exponential/logarithmic operations, as well as case distinction. Talking about case distinction, the following example performs a color classification of land temperature (note the null value 99999 which is mapped to white, see the query result in the figure right):
for $c in ( AvgLandTemp ) return encode( switch case $c = 99999 return {red: 255; green: 255; blue: 255} case $c < 18 return {red: 0; green: 0; blue: 255} case $c < 23 return {red: 255; green: 255; blue: 0} case $c < 30 return {red: 255; green: 140; blue: 0} default return {red: 255; green: 0; blue: 0}, "image/png" )
As always it is corageous to let the expression iterate over the full coverage. Being streetwise we therefore add subsetting:
for $c in ( AvgLandTemp ) return encode( switch case $c[ date("2014-07"), Lat(35:75), Long(-20:40) ] = 99999 return {red: 255; green: 255; blue: 255} case $c[ date("2014-07"), Lat(35:75), Long(-20:40) ] < 18 return {red: 0; green: 0; blue: 255} case $c[ date("2014-07"), Lat(35:75), Long(-20:40) ] < 23 return {red: 255; green: 255; blue: 0} case $c[ date("2014-07"), Lat(35:75), Long(-20:40) ] < 30 return {red: 255; green: 140; blue: 0} default return {red: 255; green: 0; blue: 0}, "image/png" )
By instinct we have expressed the processing steps in the sequence that is most efficient: first subsetting, then pixel computations. However, things are getting unwieldy, repeating subset coordinates all over is onerous. We should not be forced to do it that way. Actually, a good optimizer, such as the one implemented in rasdaman, will rearrange automatically to the most efficient execution sequence. If we know that our WCPS server is that smart we can just as well indicate the subsetting once and for all at the end:
for $c in ( AvgLandTemp ) return encode( ( switch case $c = 99999 return {red: 255; green: 255; blue: 255} case $c < 18 return {red: 0; green: 0; blue: 255} case $c < 23 return {red: 255; green: 255; blue: 0} case $c < 30 return {red: 255; green: 140; blue: 0} default return {red: 255; green: 0; blue: 0} ) [ date("2014-07"), Lat(35:75), Long(-20:40) ], "image/png")
And here is yet another way of making expressions crisper and easier to read: pulling parts out into separate variable definitions.
This is what the let
clause accomplishes by letting us give names to expressions;
the above query would now read:
for $c in ( AvgLandTemp ) let $subset := [ date("2014-07"), Lat(35:75), Long(-20:40) ] return encode( ( switch case $c = 99999 return {red: 255; green: 255; blue: 255} case $c < 18 return {red: 0; green: 0; blue: 255} case $c < 23 return {red: 255; green: 255; blue: 0} case $c < 30 return {red: 255; green: 140; blue: 0} default return {red: 255; green: 0; blue: 0} ) [ $subset ], "image/png" )
any number of variables can be defined, and it is highly recommended as it nicely structures queries for human eyes.
Ready for a fun fact? We can construct entirely new coverages, cell-by-cell. We want to do this, for example, to support WebGL clients in doing 3D terrain draping. The trick is to provide a PNG image where RGB is built from some any source (like a satellite image) while feeding the alpha channel with the elevation data. Of course we will need to adjust resolution as both typically differ widely; we choose to re-scale the DEM. This is what the corresponding query looks like:
for $s in (SatImage), $d in (DEM) return encode( struct { red: $s.b7, green: $s.b5, blue: $s.b0, alpha: scale( $d, 20 ) }, "image/png" )
In passing we note that also casts to different pixel types are available. This is necessary, for example, when some arithmetic operation would return float while the format (here: PNG) expects 8-bit unsigned quantities.
In the functional, side-effect free language of WCPS we already have derived new coverages, to be shipped to the client in some encoding.
What we have manipulated was the mostly pixel values.
The domain set was either given by a subsetting expressions or carried over implicitly.
The new range type, finally, was either forced by a cast or derived automatically,
All these are special cases of a general constructor capable of building a completely new coverage with all elements controlled.
This coverage constructor has three clauses. In the coverage
section a name is given to the new coverage,
in the over
section the extent is defined for each axis the coverage will have,
and in the values
clause an expression is indicated for computing the coverage's cell values.
Here is an example creating a 256x256 matrix with axes x
and y
containing a white-to-black grey bar:
coverage GreyMatrix over $px x ( 0 : 255 ), $py y ( 0 : 255 ) values (char) ( ( $px + $py ) / 2 )
Should we want to refer to another coverage then this is no problem, in both domain and range set.
Auxiliary function imageCrsDomain()
delivers the extent for some axis, in Cartesian coordinates so that iteration is possible:
coverage GreyMatrix over $px x imageCrsDomain( $c, Long ) $py y (imageCrsDomain( $c, Lat ) values (char) ( ( $px + $py ) / 2 )
What's missing still is some way to aggregate coverage values. In coverage speak this is said to condense values.
Such a condenser function receives a coverage expression to iterate over
and delivers min
, max
, sum
, avg
, count
as well as some
(is there any true value?) and all
(are all values true?).
For example, this below yields the maximum value in the red band of a satellite scene:
for $s in ( LandsatScene ) return max( $s.red )
Note that there is no encode()
- the result, a single number, is simply returned as an ASCII string.
Very often condensers are used in conjunction with a coverage constructor. Here is an example, deriving a histogram from some satellite band:
for $s in ( LandsatScene ) return encode( coverage $bucket ( 0 : 255 ) values count( $s.red = $bucket ), "text/csv" )
And just like there is a constructor generalizing induced operations there is also a general condenser construct behind these condenser functions,
which now turn out to be special cases.
The general structure is, in analogy to the constructor, the keyword condense
followed by the operation to be executed,
an over
clause indicating the region to iterate over, and finally a using
clause with an expression to be evaluated at each position.
Hence, the above max()
example is equivalent to the following writing:
for $s in ( LandsatScene ) return condense max over $p in imageCrsDomain( $s ) using $s.red
So why should we use this much more complicated construct? Well, it has access to the coordinates, thereby allowing to express neighbourhood operations. This can be seen nicely when expressing a matrix multiplication (remember, it is ai,k*bk,j):
for $a in ( MatrixA ), $b in ( MatrixB ) return encode( coverage MatMult domain $i in crsDomain($a)[0], $j in crsDomain($b)[1] values condense + over $k in crsDomain($a)[1] using $a[ $i, $k ] * $b[ $k, $j ] , "text/csv" )
Sometimes we are interested in just some coverages, but we don't know upfront which of them qualify before we have seen their data contents.
Downloading all candidates for client-side inspection is not an option as they are "too big to transport", so we need to let the server find out.
This is accomplished with the where
clause which performs server-side preselection among coverage sets by applying any criterion that you may provide; only thise coverages passing the filter will be forwarded to the return
processing and delivery.
Here is an example that will give you the Bavaria DEM only if it contains heights over 3000m - which it doesn't, so the result will be empty (try with a smaller number!):
for $c in ( Bavaria_50_DSM ) where max( $c ) > 3000 return 1
Ready for a final effort? To show the expressive power of WCPS let us do edge detection using the Sobel oeprator given as
To WCPS this translates as:
for $c in (NIR) let $kernel1 := coverage kernel1 over $x x (-1:1), $y y (-1:1) value list < 1; 0; -1; 2; 0; -2; 1; 0; -1 >, $kernel2 := coverage kernel1 over $x x (-1:1), $y y (-1:1) value list < 1; 2; 1; 0; 0; 0; -1; -2; -1 >, $cutOut := [ ${cutOut} ] return encode( sqrt( pow( coverage Gx over $px1 i( imageCrsdomain( $c[$cutOut], i ) ), $py1 j( imageCrsdomain( $c[$cutOut], j ) ) values condense + over $kx1 x( imageCrsdomain( $kernel1, x ) ), $ky1 y( imageCrsdomain( $kernel1, y ) ) using $kernel1[ x($kx1), y($ky1) ] * $c.${selectedBand}[ i($px1 + $kx1), j($py1 + $ky1) ] , 2.0 ) + pow( coverage Gy over $px2 i( imageCrsdomain( $c[$cutOut], i ) ), $py2 j( imageCrsdomain( $c[$cutOut], j ) ) values condense + over $kx2 x( imageCrsdomain($kernel2, x ) ), $ky2 y( imageCrsdomain($kernel2, y ) ) using $kernel2[ x($kx2), y($ky2) ] * $c.${selectedBand}[ i($px2 + $kx2), j($py2 + $ky2) ] , 2.0 ) ), "image/jpeg" )
When applied in this demo to a false color image (left) the server returns this image (right):
Remains one mystery to be resolved: How is such a request sent to the server?
Basically, it can be done via a https GET
or a POST
request. The GET
variant has the advantage that the request can be sent as a single line even from a browser, but has the disadvantage that we need to URL-encode various characters that are not allowed to appear literally, such as spaces. While in common environments like JavaScript this is even done automatically we use the POST
for demonstration through Linux command lines. The curl
command is our friend here. It allows to specify both target URL and parameters, sends it to the server, and receives the result. Here is a working example:
curl 'https://ows.rasdaman.org/rasdaman/ows?service=WCS&version=2.0.1&request=ProcessCoverages' -d 'query=for $c in (mean_summer_airtemp) return encode($c, "image/png")' -o wcps.png
The first parameter is the URL following the pattern of WCS, of which the ProcessCoverages
request serving to send WCPS queries is a part.
The -d
parameter contains the query string, and the -o
parameter finally determines the result file name.
This concludes our primer on WCPS; for more introductory and detail information on various levels refer to the list below.
The rasdaman engine has pioneered Actionable Datacubes® and Array Databases. With its enabling approach of a high-level datacube analytics language -- adopted into ISO SQL -- and underpinned by a powerful datacube architecture rasdaman remains the gold standard for modern multi-dimensional raster data services.
We gladly share our experience to answer any questions you may have, from strategic issues down to any technical depth. This can be discussed based on your own data, your own ecosystem requirements, and of course under strict confidentiality (such as under an NDA). Webinars as well as on-site meetings are possible.
Contact us - we gladly share our experience and insight from many years of writing, implementing, and deploying OGC, ISO, and INSPIRE standards, from Raspberry Pi to dozens-of-Petabytes DIAS archives.