
Energy March Edition 2019


SPATIAL & GIS

Why should the metadata of my dataset be structured this way? In a cadastre, for example, a property can be called a lot, a parcel, a land boundary, a property boundary, a title boundary, or one of several other terms. When someone is searching for datasets to meet the requirements of a solution, these alternative descriptions become important for a search engine. In most cases, however, search engines do not account for the fact that one “thing” may be called many different things by different groups of people. If you describe a dataset as containing information on “tree canopy”, a user querying the search engine with a more general term such as “vegetation” would not find it, and so would not be aware of a dataset that may meet their requirements, simply because it was described using other terms.

Google Dataset Search is currently a leader in this regard. For example, querying “bore hole” and “borehole” yields effectively the same results. This is in contrast with other dataset search engines in use, such as CKAN, which ignores all records containing “borehole” if the search query is “bore hole”, and vice versa.

By expressing metadata in a structured RDF format, vocabularies can be linked to elements of the metadata to “expand” or broaden the content. For example, existing or expert-generated vocabularies describing alternative representations for “bore hole”, “cadastre” or “tree canopy” could be used to automatically expand the keywords listed in the metadata records for the cases discussed above.

Spatial data also intrinsically contains extra context, whether it is implied by the location of the spatial data itself or by the geographic extent the data covers, which may be specifically described in a metadata record. By applying the principles of RDF “triples” to create context in the published metadata, dataset search engine results can be tailored for the end user by looking at the spatial relevance or suitability of a dataset. One example would be describing a dataset’s extent as “Northam”, a town in the Wheatbelt region of Western Australia. Using RDF-compliant vocabularies, a user can query a search engine with a phrase such as “in the Wheatbelt” and find that dataset. A user looking to compare data from a set of related geographic areas then needs only a single search query, rather than the many currently required.

The Spatial Infrastructures program of FrontierSI has been at the forefront of research in this area for the past several years, and new applications such as Google Dataset Search show ongoing promise that we are on the right path. For now, FrontierSI is continuing to improve how spatial metadata can better leverage the “web of data”, while supporting Australian data publishers to ready their data for Google Dataset Search.
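To make the idea of vocabulary-driven expansion concrete, the following is a minimal sketch in plain Python rather than a real RDF toolkit. It assumes a tiny hand-written vocabulary of (subject, predicate, object) triples; the predicate names (`altLabel`, `broader`, `within`), the two dataset records and every function name are hypothetical and chosen only for illustration. They are not taken from FrontierSI, CKAN or Google Dataset Search.

```python
# Hypothetical mini-vocabulary as (subject, predicate, object) triples.
# The predicate names loosely echo SKOS and spatial relations; they are
# illustrative only, not a published vocabulary.
TRIPLES = {
    ("borehole", "altLabel", "bore hole"),
    ("tree canopy", "broader", "vegetation"),
    ("Northam", "within", "Wheatbelt"),
    ("Wheatbelt", "within", "Western Australia"),
}


def objects(subject, predicate):
    """Return every object directly linked from subject by predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def closure(term, predicate):
    """Follow predicate transitively, e.g. Northam -> Wheatbelt -> Western Australia."""
    found, frontier = set(), {term}
    while frontier:
        current = frontier.pop()
        for nxt in objects(current, predicate) - found:
            found.add(nxt)
            frontier.add(nxt)
    return found


def preferred(term):
    """Map an alternative label back to its preferred term, if one is known."""
    for s, p, o in TRIPLES:
        if p == "altLabel" and o == term:
            return s
    return term


def expand_keyword(keyword):
    """Expand one keyword with its synonyms and all broader terms."""
    concept = preferred(keyword)
    return {concept} | objects(concept, "altLabel") | closure(concept, "broader")


def expand_extent(place):
    """Expand a place name with every region that contains it."""
    return {place} | closure(place, "within")


# Hypothetical metadata records, as a publisher might originally describe them.
DATASETS = {
    "Northam tree canopy": {"keywords": {"tree canopy"}, "extent": "Northam"},
    "Northam groundwater bores": {"keywords": {"bore hole"}, "extent": "Northam"},
}


def search(keyword=None, region=None):
    """Return titles of datasets whose expanded metadata matches the query."""
    hits = []
    for title, record in DATASETS.items():
        keywords = set().union(*(expand_keyword(k) for k in record["keywords"]))
        regions = expand_extent(record["extent"])
        if keyword is not None and keyword not in keywords:
            continue
        if region is not None and region not in regions:
            continue
        hits.append(title)
    return sorted(hits)
```

With these records, `search(keyword="vegetation")` finds the tree canopy dataset even though its publisher never used that word, `search(keyword="borehole")` and `search(keyword="bore hole")` find the same bore dataset, and `search(region="Wheatbelt")` finds both Northam datasets in a single query. A production system would more likely hold the triples in an RDF store and use published vocabularies, but the expansion logic would follow the same pattern.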

Figure 2. Represented Graph Data Model.


March 2019 ISSUE 5

www.energymagazine.com.au

