Data models and data specifications

Use of models, scope and specification objectives. Modelling frameworks and applied examples

WELCOME!

With the following slides and interactive material you will be able to take part in the journey to discover Spatial Data Infrastructures (SDI), their components and their benefits through multiple examples and exercises.

You can navigate through the course by pressing the navigation arrows at the bottom of each slide or by using the arrow keys on your keyboard. You can move horizontally (← →) to view each theme and vertically (↑↓) to view extra recommended information.

Data models and data specifications

#	Contents
1	Motivation and Background
2	Scope & objectives of data specifications for SDIs
3	The modelling framework for data specifications (ISO 19131)
4	Development of data specifications
5	Examples of data models: ISO 19152

SDI components (example of INSPIRE)

Notes for slide 3 At the bottom you have the data components of the infrastructure: the spatial data itself, but also the metadata that describe the data and make it possible for users to find and understand the data sets on offer. The data component side also holds different registers to store information commonly used by different datasets: code lists, thesauri, and so on.
On top of the data layer we find the service layer, containing services that users and applications can call to retrieve information on the data (such as metadata or web maps) or to retrieve the data itself. These services are grouped by the “Service Bus”. Between the services and the service bus there might be a layer managing access to the different services: the Geospatial Rights Management layer (GeoRM).
At the very top is the application and geoportal layer, which is used by the users of the data.
The subject of this training module is the part indicated by the red rectangle: the data specifications (DS).

Interoperability of data

Motivation and Background

The starting point

Access to spatial data in various ways: copies via CD

User has to deal with interpreting heterogeneous data in different formats, identify, extract and post-process the data needed

→ Lack of interoperability

Examples of incompatibility and inconsistency of spatial data

Semantic and schematic differences

Notes for slide 5 This slide addresses some issues of incompatibility and inconsistency that can occur at borders (between different Member States, MS).
We make a distinction between two types of inconsistencies.
On the one hand, there are semantic and schematic differences, illustrated here with two examples:
different ways of modelling houses: as individual houses or as building blocks (an aggregation of individual houses)
OR
different classifications for the same entity (industrial zone on one side of the border and built-up area on the other side).
On the other hand, there are different spatial representations:
Vector vs raster
2D vs 3D
River as polygon or as line (depending on scale, but also on the content you want to model and on the use of the data).
Different boundaries → agree to use the same geometry
Overlapping and shift
Inconsistency between data themes (a road that does not follow the DEM)

Levels of heterogeneity (1)

Syntactic heterogeneity

Data may be implemented in different syntaxes or paradigms, such as relational or object-oriented models. Syntactic heterogeneity is also related to the geometric representation of geographic objects, e.g., raster and vector representations.

Structural or schematic heterogeneity

Objects in one database are considered as properties in another, or object classes can have different aggregation or generalisation hierarchies, although they might describe the same Real World concepts.

Notes for slide 6 We just talked about the lack of interoperability between different data sources.
Let’s now have a closer look at the different levels of heterogeneity.

The first level of heterogeneity is what I call “syntactic heterogeneity”. Data may be implemented in a different syntax, which in practice means data stored in different formats: shapefile, ASCII file, relational versus object-oriented database models. It is also related to the geometric representation of geographic objects: for example, some datasets use the vector approach while others use the raster approach to depict the same type of objects.
With today’s technologies this is not an insurmountable problem. In most cases, conversion algorithms are offered within the most commonly used software products. It would nevertheless be very convenient if everyone used the same syntax, so that conversion becomes unnecessary.

The second level of heterogeneity is structural, also called schematic, heterogeneity. This has to do with how you model the information in a certain structure inside the database. Even if you use the same syntax (or database format) and deal with objects/concepts that have the same semantic meaning, you can still model them differently in different schemas. If you ask two people to model a certain piece of information in the same database model, you will probably end up with two different database schemas that nevertheless cover the same information/semantic concepts. It all concerns the way the information is organised inside the database. You can have different levels of aggregation or detail. Objects in one schema can be considered as properties in another, but in the end they concern the same Real World concepts. You can see a schema as one person’s perception of a Real World concept.
This second level of heterogeneity is also technical in nature and can be solved with current technologies that align different schemas describing the same Real World concepts. But again, it would be very convenient if everybody applied the same schemas.
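As a minimal sketch of schematic heterogeneity, the snippet below re-organises records from one schema into another. Both schemas, the field names and the sample values are hypothetical and only illustrate the idea that an object in one schema (the building block) is a plain property in the other.

```python
from collections import defaultdict

# Schema A (hypothetical): every house is its own object and carries
# the block it belongs to as a plain property.
houses_schema_a = [
    {"house_id": "H1", "block": "B1", "floors": 2},
    {"house_id": "H2", "block": "B1", "floors": 3},
    {"house_id": "H3", "block": "B2", "floors": 1},
]


def to_schema_b(houses):
    """Re-organise schema A records into schema B, where the building
    block is the object and the houses are one of its properties."""
    blocks = defaultdict(list)
    for house in houses:
        blocks[house["block"]].append(house["house_id"])
    return [
        {"block_id": block_id, "house_ids": sorted(ids)}
        for block_id, ids in sorted(blocks.items())
    ]


blocks_schema_b = to_schema_b(houses_schema_a)
print(blocks_schema_b)
```

Note that the aggregation drops the per-house "floors" property: aligning schemas with different levels of detail can lose information in one direction.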

Levels of heterogeneity (2)

Semantic heterogeneity

A Real World concept may have more than one meaning to comply with various disciplines, giving as a consequence semantic heterogeneity.

e.g. Different classifications/definitions of roads when viewed from different perspectives (traffic network, route directions, spatial planning…): no one-to-one match

Notes for slide 7 The third level of heterogeneity is semantic heterogeneity.

This is the most complex of the three levels to deal with. It means that people give different meanings to the same Real World concept. This is not strange: it depends on the discipline from which you look at the concept.
A typical example is the geographical object “road”.
For the road authority, responsible for the quality of the pavement, the concept “road” has a different meaning than for the people responsible for traffic flows. For the latter, the type of pavement (concrete or asphalt) is not important; they are more interested in the connectivity of the roads in order to manage traffic flows, traffic jams and so on.
The same concept can have different meanings, and the challenge is to align these meanings. So if one MS talks about a certain type of soil, we would like to know that this type of soil has the same name and the same meaning in another MS.

Interoperability of data (1)

Technical interoperability

should guarantee that system components can interoperate

Semantic interoperability

should guarantee that data content is understood by all in the same way

Notes for slide 8 The aim now is to make data interoperable and to resolve the different levels of heterogeneity that we discussed in the previous slides. The result would be that producers and users can easily exchange data with each other, with or without the aid of a coordinator in between.
As I said before, establishing technical interoperability is not that difficult, because today’s applications and software can help in the harmonisation process. It only requires means and actions.
Implementing semantic interoperability, however, is something else. Here we have to make sure that the data content is understood by everybody in the same way. It means we all have to agree on harmonised definitions and classifications for objects, while at this moment many different definitions and classification systems exist for the same objects in different countries. The idea would be to map the existing heterogeneity to these universal definitions and classifications, so that data becomes exchangeable and is understood by everybody in the same way.

Technical: already established by many systems
Semantic: still a long way to go. INSPIRE is a first step; Linked Data / semantic web
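The idea of mapping existing national classifications to commonly agreed ones can be sketched as a lookup table. The code list, the country names and the mappings below are all hypothetical; the point is only that some national values map cleanly while others (here “built-up area”) have no unambiguous counterpart.

```python
# Hypothetical harmonised code list agreed between data providers.
COMMON_CODES = {"industrial", "residential", "agricultural"}

# Hypothetical national classifications mapped onto the common codes.
# A national value with no unambiguous counterpart is mapped to None.
NATIONAL_TO_COMMON = {
    "country_a": {"industrial zone": "industrial", "housing": "residential"},
    "country_b": {"built-up area": None, "farmland": "agricultural"},
}


def harmonise(country, national_value):
    """Return the common code for a national value, or None when there
    is no unambiguous one-to-one match."""
    mapping = NATIONAL_TO_COMMON[country]
    common = mapping.get(national_value.lower())
    if common is not None and common not in COMMON_CODES:
        raise ValueError(f"{common!r} is not in the common code list")
    return common


print(harmonise("country_a", "Industrial zone"))
print(harmonise("country_b", "Built-up area"))
```

The values that come back as None are exactly the cases that need agreement between domain experts rather than a technical conversion.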

Interoperability of data (2)

…what an SDI is aiming at

Provide access to spatial data via network services and according to a harmonised data specification to achieve interoperability of data

Datasets used within organizations may remain unchanged

Data or service providers have to provide a transformation between their internal model and the harmonised data specification

How?

Facilitate data use and interoperability by adopting common cross-domain models to exchange data

DATA INTEROPERABILITY

Implementation alternatives

Notes for slide 11 What are the different implementation alternatives for reaching this harmonisation towards the INSPIRE DS? At the start of the INSPIRE story, three different implementation strategies were proposed:
The first is the on-the-fly transformation of spatial data through download services that automatically transform the source data to the schema of the INSPIRE DS. The data is processed on the fly after the user has sent the request for the data.
The second proposed implementation strategy is the offline transformation of spatial data. Each data provider transforms their own data into the appropriate INSPIRE DS and offers the result to the user by means of an INSPIRE download service.
The third possibility is the external transformation of spatial data by a separate network service. Here the idea is that the user first downloads the source dataset and then uses a separate transformation service to transform the data into the INSPIRE DS.
At this point in time, the implementation of INSPIRE is in full progress, and it has become clear that the second option is the preferred and most widely implemented strategy.
Why are the other options not so popular?
The first alternative is technically possible, but the disadvantage is that the processing happens on the fly, so it takes time to retrieve the final result, certainly with large datasets.
The last option has the same problem as the first, but there is also another reason why it is not commonly used in practice: each dataset requires a specific transformation to arrive at the DS, so a specific transformation service must be created for each existing data source, which makes it difficult to keep an overview of all services.
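The difference between the first two strategies can be sketched as follows. The source records, the field names and the `transform` mapping are hypothetical; a real transformation to an INSPIRE DS is far more involved.

```python
def transform(source_record):
    """Hypothetical mapping from a provider's internal model to the
    harmonised target schema (field names are illustrative only)."""
    return {
        "localId": str(source_record["ID"]),
        "name": source_record["NAME_LOCAL"],
    }


source_data = [
    {"ID": 1, "NAME_LOCAL": "Parcel A"},
    {"ID": 2, "NAME_LOCAL": "Parcel B"},
]

# Strategy 2, offline transformation: transform once, store the result,
# and let the download service serve the stored copy.
harmonised_store = [transform(record) for record in source_data]


def download_offline():
    return harmonised_store


# Strategy 1, on-the-fly transformation: the download service transforms
# the source data again for every request.
def download_on_the_fly():
    return [transform(record) for record in source_data]


print(download_offline() == download_on_the_fly())
```

Both functions return the same harmonised records; the trade-off is that the offline copy must be refreshed when the source changes, while the on-the-fly variant pays the transformation cost on every request.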

Conclusion: common Data Specifications are the goal

Member States should make data available within the scope of INSPIRE using

the same spatial object types (and definitions)
the same attributes (and definitions, types, code lists) and relationships to other types, e.g. BuildingHeight, BuildingSize
a common encoding (GML application schemas)
common portrayal rules

This facilitates interoperability and pan-European/cross-border applications (e.g. information systems, reporting systems, forecasting models)

Targeted benefit

Source: EC Joint Research Centre

Example: key requirements of the INSPIRE directive (1)

Art 3(7): “Interoperability means the possibility for spatial data sets to be combined, and for services to interact, without repetitive manual intervention, in such a way that the result is coherent and the added value of the data sets and services is enhanced”

Art 7(1): “Implementing rules laying down technical arrangements for the interoperability and, where practicable, harmonisation of spatial data sets and services … shall be adopted…. Relevant user requirements, existing initiatives and international standards for the harmonisation of spatial data sets, as well as feasibility and cost-benefit considerations shall be taken into account in the development of the implementing rules.”

Example: key requirements of the INSPIRE directive (2)

Art 8(2): The implementing rules shall address the following aspects of spatial data:

(a) a common framework for the unique identification of spatial objects, to which identifiers under national systems can be mapped in order to ensure interoperability between them;

(b) the relationship between spatial objects;

(c) the key attributes and the corresponding multilingual thesauri commonly required for policies which may have an impact on the environment;

(d) information on the temporal dimension of the data;

(e) updates of the data.

02 | Scope & objectives of data specifications for SDIs

Thematic scope

SDI data scope

Scope is spatial data – not all kinds of thematic/descriptive data

Re-use the INSPIRE data specs for own usage

Extensions
Additional constraints
Re-use of common objects

Notes for slide 18 Although we tend to attach a lot of thematic data to spatial data, this is not the scope of INSPIRE.
We have now seen the thematic domains that INSPIRE focuses on. The next question is: what kind of thematic data falls within the scope? Here we often notice misinterpretations of the directive. The data scope of INSPIRE is strictly “spatial data” and not all kinds of thematic data. It really concerns the spatial objects that can be identified within a certain thematic domain or within multiple domains, including some crucial key attributes necessary to describe these objects. Business data, which is mostly non-spatial but can be spatially referenced, is out of scope. As an example: information from water quality measurements in surface water is out of scope, but the spatial objects delineating these surface waters are definitely within the scope.
MS are encouraged to re-use the INSPIRE DS for their own usage, and it is possible to extend them, for example to include business data, but this is not an obligation.

Exercise 1: Find your scope

Go to INSPIRE website https://inspire.ec.europa.eu/inspire-tools

Use the tool “Find your scope” (toolkit):

In catalogue of INSPIRE objects:
- find “zone” --> limit to only “Spatial object type” --> narrow search “terrestrial zone”
- Which object, INSPIRE data theme and application schema do you find?
- What are the other possible specialisations of TransportArea?

Find the spatial object type that should be used for a dataset that stores the locations of stations where magnetic measurements are performed (use “Direct Search”).

Find your own scope…

Example of INSPIRE

Exercise 1: Result (1)

PortArea – Transport Networks – Water TN

Exercise 1: Result (2)

Direct search: “magnetic field”

Relevant objects? Observed Event (NZ) vs Geoph Station (GE - Geophysics)

03 | The modelling framework for data specifications (ISO 19131)

Data harmonisation and data specs aspects

Example from INSPIRE

Harmonisation General Principles

The modelling framework

Notes for slide 25 We can distinguish some categories in the list of harmonisation aspects. First of all there are some “General principles”.
These include the INSPIRE principles, which say that:
spatial data must be stored and made available at the most appropriate level;
it must be possible to share and combine spatial data from different sources in a consistent way.
Another principle is the use of consistent language when referring to terms, to overcome semantic problems. This can be achieved by using a glossary with common definitions.
The possibility to fall back on a reference model is also one of the base principles. It is the framework of all the technical parts that support us in modelling information and administering data.

INSPIRE principles: Data harmonisation is a methodology to reach these goals.
Terminology: This component will support the use of a consistent language when referring to terms via a glossary. This needs to be registered and managed through change control with multilingual support. The ESDI needs to select a common terminology from all of the existing terminologies and/or their translations.
Ref Model: This component will define the framework of the technical parts, including topics like information modelling (i.e. a conceptual modelling framework with rules for application schemas) and data administration (i.e. reference systems). It will provide a structure which allows the components of INSPIRE that are related to data specifications to be described in a consistent manner.

Harmonisation Schemas

Notes for slide 26 Besides the general principles, we also have some principles related to schemas.

One aspect deals with the question: how should these schemas be described? If everybody is allowed to document schemas in their own way, we will again end up with undesired heterogeneity. So there must be some general rules that define a common way of documenting. The rule is to establish feature catalogues, which define the types of spatial objects and their properties, while the application schema gives a full description of the content and structure, expressed in a conceptual schema language like UML.

Harmonising data also means dealing with spatial and temporal aspects. There is a need for common information on spatial geometries and topology (which geometry types are to be used and what are their characteristics), but also on the way temporal characteristics of data will be managed.

When talking about exchanging spatial data, we cannot avoid the aspect of coordinate referencing. Instructions, guidelines and recommendations on common European reference systems, including the definition of European geographical grids, are needed to create a harmonised data space across Europe.

To avoid storing geometries more than once, it is necessary to have a mechanism to reference information to existing base topographic/spatial objects. For example, you can use the geometry of the theme “buildings” to add other thematic information, such as industrial facilities, by referencing the relevant geometry instead of redefining it. How this should be done is tackled in the GCM under the aspect of Object Reference Modelling.

A feature catalogue defines the types of spatial objects and their properties (attributes, association roles, operations) as well as constraints, and is required when turning the data into usable information.
The full description of the contents and structure of a spatial dataset is given by the application schema, which is expressed in a formal conceptual schema language.
The feature catalogue defines the meaning of the spatial object types and their properties, while the application schema describes the formal structure. Text elements in the feature catalogues should be maintained at least in the official European languages.
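The object referencing idea from these notes (re-using the geometry of a building for an industrial facility instead of storing it twice) can be sketched as below. The identifiers, field names and coordinates are hypothetical and do not follow any INSPIRE application schema.

```python
# Hypothetical base data set: buildings carry the geometry.
buildings = {
    "BU.001": {"geometry": [(0, 0), (10, 0), (10, 8), (0, 8)]},
}

# Hypothetical thematic data set: the facility stores only a reference
# to the building instead of a second copy of the geometry.
facilities = [
    {
        "facility_id": "PF.42",
        "activity": "chemical plant",
        "building_ref": "BU.001",
    },
]


def resolve_geometry(facility):
    """Look up the geometry through the object reference."""
    return buildings[facility["building_ref"]]["geometry"]


print(resolve_geometry(facilities[0]))
```

Because the geometry lives in one place only, an update to the building is automatically visible to every data set that references it. This works only if the referenced identifier is stable.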

Harmonisation Translations

Harmonisation Identification

Notes for slide 28 The next category of harmonisation aspects concerns the identification of objects.

Identifier management is one of the key issues in INSPIRE. It states that every spatial object (at least for annex I and II) should get a unique and persistent identifier (i.e. the INSPIRE identifier). All annex III themes followed this good practice although it was initially not required. The “INSPIRE id” makes it possible to reference each object by means of its identifier.

Registers are very important in the harmonisation process. They function as the libraries/dictionaries that hold the commonly agreed information. There will be registers listing possible reference systems, possible units of measure (UOM), code lists (i.e. values of classifications) used in the different themes, thesauri, …
Those registers will become available through registry services so they can be used by other models or applications.

Metadata (MD) also play an important part in the identification of datasets and services. Information stored in the MD gives users an idea of what they may expect from the described dataset or service. There are different levels of MD, as we will see later on in this module.
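A unique and persistent identifier can be sketched as a small value type. The three-part structure (namespace, local identifier, optional version identifier) is assumed here from the INSPIRE base types; the Python names, the key format and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InspireId:
    namespace: str                     # identifies the data source
    local_id: str                      # unique within the namespace
    version_id: Optional[str] = None   # distinguishes versions of one object

    def as_key(self):
        """Version-independent key used to reference the object
        (the 'namespace/localId' format is an assumption)."""
        return f"{self.namespace}/{self.local_id}"


first = InspireId("XX.EXAMPLE.CP", "P-123", "1")
second = InspireId("XX.EXAMPLE.CP", "P-123", "2")
print(first.as_key() == second.as_key())
print(first == second)
```

The two values describe different versions of the same spatial object: they share one reference key, which is what makes the identifier persistent across updates.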

Harmonisation Data Quality

Notes for slide 29 A last category of harmonisation aspects concerns “data quality” elements.
One of these elements focuses on the maintenance of data products and of the spatial objects within those products. How should updates be dealt with? What are the best practices for versioning objects (introducing a new object and deprecating the old one)?
The quality component advises publishing the quality levels of each spatial dataset using the criteria defined in the ISO 19100 series of standards, including completeness, consistency, currency and accuracy. This includes best-practice methods for publishing Acceptable Quality Levels (AQL).
This category also addresses consistency between data, in terms of format and of logical and topological agreement, and so on.
Data quality (DQ) also concerns multiple representations, or the best practices for aggregating data:
over time and space;
across different resolutions, also called generalisation of data.

Maintenance: This component will define best practice in ensuring that application data can be managed against updates of reference information without interruption of services. This will require the definition of mechanisms by different stakeholder areas to manage where this is required and feasible. RSS feeds for change information?

Quality: This component will advise the need to publish quality levels of each spatial dataset using the criteria defined in the ISO 19100 series of standards, including completeness, consistency, currency and accuracy. This will include methods of best practice in publishing AQL, etc.

Consistency: Format, logical, topological, etc.

- Multiple representations
- Derived reporting (example: typically water samples at 1 km intervals are reported to the European level)

Harmonisation Other aspects

Notes for slide 30 The aspects on this slide do not fit in one of the previous categories, but they cannot be disregarded when it comes to data harmonisation.
The data transfer aspect focuses mainly on the encoding of data. As said before, within the INSPIRE framework GML is seen as the standard for encoding data. For coverage data, however, this might not be the proper exchange format. Therefore the GCM gives some guidance on alternative encoding mechanisms.
The data capturing aspect covers the DS-specific criteria regarding which spatial objects are to be taken on board (in scope or out of scope) or which coordinates will represent certain spatial objects. A certain accuracy of data capture can also be required.
And last but not least, the conformance aspect. For a dataset to be declared INSPIRE conformant, it needs to pass the conformance tests specified in the DS of the theme it belongs to.

This overview of harmonisation aspects is not meant to bore you: they all come back in the different chapters of the final DS documents. Some of these aspects are repeated from one of the general framework documents (like the GCM or the guidelines for O&M), while others are described by the Data Specification document itself.

Some wrap-up questions

The modelling framework

What does the abbreviation GCM stand for (as used in the INSPIRE context)?

How many thematic domains, divided in how many annexes, are addressed by the INSPIRE directive?

Which kind of heterogeneity?

Besides “Encoding” and “Harmonised vocabularies”, what are the other two major cornerstones of data interoperability?

04 | Development of data specifications

What is a data specification?

DATA SPECIFICATION

Synonym to data product specification

Detailed description of a data set or data set series together with additional information that will enable it to be created, supplied to and used by another party

[ISO 19131]

Data Specification

Step-wise methodology

Notes for slide 34 The creation of the INSPIRE DS followed a step-wise methodology with moments of feedback and with the involvement of the relevant stakeholders. On this slide you can see the workflow of the process, including the feedback possibilities (iterations) between the different steps.
It started with the development of relevant “use cases”. From these use cases, user requirements were drafted and the spatial objects were identified. In parallel, an as-is analysis was made of the existing datasets. Based on those requirements and the description of the real situation, gaps were identified, and the results of all three processes were used to draft the first version of the DS. Then followed an implementation, testing and validation phase, after which a new round of feedback was possible and the comments were used to fine-tune the DS.
To give you an idea of the duration of this process: for the annex II and III themes, which were developed in parallel, it took approximately 2 years until the final DS were ready for adoption.
In the coming slides I will explain the separate steps in more detail.

Use case development

Step 1

Major sources are:

European environmental policies
User requirements survey
SDIC/LMO reference material
EU-funded initiatives and projects

Notes for slide 35 The first step in the development process was defining relevant use cases (UC).
Many stakeholders could propose use cases to be taken into account in the further development process. The major sources that inspired the UC are listed here.
Relevant UC could be derived from or influenced by:
existing European environmental policies, for example where MS already have certain reporting obligations concerning environmental information;
a survey on user requirements;
reference material provided by Spatial Data Interest Communities (SDIC) and Legally Mandated Organisations (LMO). Both groups were already involved in other aspects of the INSPIRE framework. This material was based on real-life implementations of certain themes, which gave good insight into how and why datasets were modelled in a specific way;
similar EU-funded initiatives and projects.
All this illustrates that the DS did not come out of the blue: a lot of material came from existing situations and had to be considered as UC to serve as a basis for the DS development.

Identification of user requirements and spatial object types

Step 2

Identify requirements on:

the data content

metadata, data quality, portrayal and other elements of the data specification

As-is analysis

Step 3

Analyse the current situation regarding spatial data sets for the theme, based on:

the reference material
existing internationally standardised data specifications
expertise/field experts

Example of As-is analysis

Gap analysis

Step 4

Compare identified data sources with identified user requirements
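A gap analysis of this kind can be sketched with set operations. The required properties and the contents of the two data sets below are hypothetical stand-ins for the outcomes of steps 2 and 3.

```python
# Hypothetical outcome of step 2: properties the users require.
required = {"geometry", "identifier", "label", "area", "valid_from"}

# Hypothetical outcome of step 3: properties found in existing data sets.
available = {
    "dataset_a": {"geometry", "identifier", "label"},
    "dataset_b": {"geometry", "identifier", "area"},
}


def gap_analysis(required, available):
    """Return, per data set, the required properties it lacks, plus the
    required properties that no existing data set provides."""
    per_dataset = {name: required - props for name, props in available.items()}
    covered = set().union(*available.values())
    return per_dataset, required - covered


per_dataset, uncovered = gap_analysis(required, available)
print(per_dataset)
print(uncovered)
```

Properties that end up in `uncovered` are the ones that would require collecting new data, which is what step 5 asks the specification to avoid.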

Data specification development

Step 5

The data specifications must be designed to ensure easy mapping between existing data and the harmonised data specification.

Consider:

No excessive costs
No collection of new data!

Implementation, validation and Cost-Benefit Analysis

Development

Step 6-7

review process
test under real world conditions
analyse costs and benefits

Final round of harmonisation

Which level of harmonisation is “just right”?

Requires:

an iterative process
well-established requirements
good understanding of the existing geographic information
testing and validation

Result

Data specification for all Annex Themes

Textual description of the data model
UML model
GML application schema

05 | Examples of data models: ISO 19152

CP - Scope

The scope of the cadastral information in the INSPIRE context is limited to the geographic side of the cadastral information systems (land administration)

INSPIRE does not aim at harmonising the concepts of ownership and rights related to the parcels

Cadastral parcels should serve the purpose of generic information locators. Having included the reference to the national registers as a property (attribute) of the INSPIRE parcels, national data sources can be reached.

If you look at the scope of CP as listed in the DS document, you can see that in the INSPIRE context cadastral information is limited to the geographic side of the cadastral information systems (managed by a land administration). So all business information related to the geographic component is out of scope and does not have to be made available through INSPIRE.
INSPIRE does not aim at harmonising the concepts of ownership and rights related to the parcels, so these are not modelled within INSPIRE either.
Cadastral parcels should serve the purpose of generic information locators. This means that it is sufficient to include only a reference with a code or a key (as an attribute of a parcel object) to the national registers in order to reach the information stored in the national data sources.

CP - Background

All countries run a register
- Usually a partition of the country with exceptions
Basic unit of the system is the parcel

The cadastral parcels should be, as much as possible, single areas of Earth surface (land and/or water) under homogenous real property rights and unique ownership, where real property rights and ownership are defined by national laws.

CP - Basic components

Parcel (basic unit)

Subdivision (municipalities, sections, districts, parishes, urban or rural blocks, etc)
- Carry information for the parcels inside the subdivision: accuracy or scale
Cadastral boundaries
- Only necessary if spatial accuracy is associated with them

CP - Application schema

CP – Feature types

CadastralParcel (mandatory)
CadastralZoning (auxiliary)
CadastralBoundary (auxiliary)
BasicPropertyUnit (auxiliary)

Core Profile

Notes for slide 49 Here we see a list of the four classes as feature types (i.e. spatially identifiable objects). Only one feature type is mandatory (CadastralParcel); the rest are auxiliary (mandatory under specific conditions), so the core profile is actually limited to the CadastralParcel feature type.
This is a good example showing that implementing the INSPIRE DS is not as complex as is often argued. If we look at the attributes that belong to a cadastral parcel (CP), we see:
geometry
inspireId: External object identifier of the spatial object
label: Text commonly used to display the cadastral parcel identification
nationalCadastralReference: Thematic identifier at national level of the cadastral parcel, which ensures the link to the national cadastral register or equivalent.
Some voidable properties
areaValue: Registered area value
referencePoint: Point within the parcel used for label placement
validFrom and validTo: Official date and time when the cadastral parcel was legally established and deprecated
two lifeCycleInfo properties:
beginLifespanVersion and endLifespanVersion: Date and time at which this version of the spatial object was inserted or changed in the spatial data set, and the date and time at which it was superseded or retired
Apart from the attributes, you can see in the last compartment of the class that some constraints are also defined, which put extra limitations on the content of some attributes.
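The attribute list above can be sketched as a small data class. This is a simplified illustration, not the INSPIRE application schema: the Python types are assumptions, `None` stands in for a void value, and the date-order check is an assumed constraint for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CadastralParcel:
    # Properties listed first in the notes above.
    geometry: list                      # simplified: ring of (x, y) tuples
    inspire_id: str
    label: str
    national_cadastral_reference: str
    # Voidable properties: None stands in for a void value.
    area_value: Optional[float] = None
    reference_point: Optional[tuple] = None
    valid_from: Optional[datetime] = None
    valid_to: Optional[datetime] = None
    # Life-cycle properties, treated as optional in this sketch.
    begin_lifespan_version: Optional[datetime] = None
    end_lifespan_version: Optional[datetime] = None

    def problems(self):
        """Return a list of violated checks (empty when none are found)."""
        found = []
        if not self.national_cadastral_reference:
            found.append("nationalCadastralReference is empty")
        # Assumed constraint, for illustration only.
        if self.valid_from and self.valid_to and self.valid_to < self.valid_from:
            found.append("validTo is earlier than validFrom")
        return found


parcel = CadastralParcel(
    geometry=[(0, 0), (20, 0), (20, 15), (0, 15)],
    inspire_id="XX.EXAMPLE.CP/P-123",
    label="123",
    national_cadastral_reference="REF-123",
)
print(parcel.problems())
```

Only four values are needed to create a parcel in this sketch, which mirrors the point made in the notes that the core profile is small.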

CP - Requirements & Recommendations

On this slide you see an IR requirement (remember the style: red with a double border) and some recommendations that were defined for the CP theme.
The IR requirement makes the thematic identifier, in the form of the nationalCadastralReference, mandatory for a CP. It forms the link to the national cadastral registers with ownership and rights information.
There is a recommendation to provide the geometry as a GM_Surface (i.e. a concept from ISO to model a polygon).
There are some topological recommendations to avoid topological overlaps and gaps between CPs. You may wonder why this is only a recommendation and not a requirement, because for a dataset such as CP you would not expect such a weak demand on topological quality.
But not all MS have national cadastral systems of that quality yet; this has to do with the history of the method used to inventory the parcels. In many countries there is already a digital layer, but it does not have any legal value yet, and the only legally binding document is a descriptive record in a kind of codex, which does not guarantee topological correctness. That is why this was taken on board only as a recommendation and not as a requirement.

CP - Requirements & Recommendations

CP - Geometry

0-, 1-, 2- and 2.5-dimensional geometries

CP - Enumerations/codelists

Land Administration Domain Model

ISO 19152

Major packages of the data model

Party
Administrative
Spatial unit
Surveying and representation
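How the packages relate can be sketched with a few data classes. The class names loosely follow the LA_-prefixed core classes of ISO 19152; the attributes are heavily simplified and hypothetical, and the surveying and representation package is left out of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LAParty:            # Party package: person or organisation
    party_id: str
    name: str


@dataclass
class LASpatialUnit:      # Spatial unit package: e.g. a parcel
    su_id: str


@dataclass
class LARRR:              # Administrative package: right, restriction
    kind: str             # or responsibility, e.g. "ownership"
    party: LAParty


@dataclass
class LABAUnit:           # Administrative package: basic administrative unit
    ba_unit_id: str
    rrrs: List[LARRR] = field(default_factory=list)
    spatial_units: List[LASpatialUnit] = field(default_factory=list)

    def parties(self):
        """Names of all parties holding an RRR on this unit."""
        return [rrr.party.name for rrr in self.rrrs]


owner = LAParty("PTY-1", "A. Owner")
unit = LABAUnit(
    "BAU-1",
    rrrs=[LARRR("ownership", owner)],
    spatial_units=[LASpatialUnit("SU-1")],
)
print(unit.parties())
```

The basic administrative unit is the hinge in this sketch: it links parties (through their rights) to spatial units, which is the ownership and rights side that the INSPIRE CP theme deliberately leaves out.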

Reference list

ISO/TC 211 Geographic information/Geomatics. (2007). ISO 19131:2007. ISO. Retrieved October 19, 2020, from https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/03/67/36760.html

Brodeur, J., & Badard, T. (2008). Modeling with ISO 191xx Standards. In S. Shekhar & H. Xiong (Eds.), Encyclopedia of GIS (pp. 705–716). Springer US. https://doi.org/10.1007/978-0-387-35973-1_811

ISO. (2015). ISO 19109:2015 Geographic information - Rules for application schema. International Organization for Standardization.