Data models and data specifications
Use of models, scope and specification objectives. Modelling frameworks and applied examples
WELCOME!
With the following slides and interactive material you will be able to take part in the journey to discover Spatial Data Infrastructures (SDI), their components and benefits, through the observation of multiple examples and exercises.
# Contents
1 Motivation and Background
2 Scope & objectives of data specifications for SDIs
3 The modelling framework for data specifications (ISO 19131)
4 Development of data specifications
5 Examples of data models: ISO 19152
Interoperability of data
Motivation and Background
Access to spatial data in various ways: copies via CD
User has to deal with interpreting heterogeneous data in different formats, identify, extract and post-process the data needed
→ Lack of interoperability
Notes for slide 4
This slide shows the situation when there is no interoperability at all. This reflects how it used to be.
A user had to search for input data by contacting different data producers, all having their own rules for access and use of their data products. In many cases these datasets were retrieved in different formats (e.g. shapefiles, ASCII files, …) and it was up to the user to interpret this heterogeneous data and to identify, extract and post-process the data in order to end up with a harmonised dataset to be used for further analysis.
So this is something we would like to solve in the future. The whole process of searching, retrieving and combining different data sources should be made much easier and preferably automatic.
Examples of incompatibility and inconsistency of spatial data
Semantic and schematic differences
Notes for slide 5
This slide addresses some issues of incompatibility and inconsistency which can occur at the borders between different Member States (MS).
We make a distinction between two types of inconsistencies.
On the one hand, the semantic and schematic differences, illustrated here with two examples:
different ways of modelling houses: as individual houses or as building blocks (an aggregation of individual houses)
OR
different classifications for the same entity (industrial zone on one side of the border and built-up area on the other side of the border).
On the other hand, different spatial representations:
Vector vs raster
2D vs 3D
River as polygon or as line (depending on scale but also on the content you want to model and on the use of the data).
Different boundaries: neighbours need to agree to use the same geometry
Overlapping and shift
Inconsistency between data themes (a road that does not follow the DEM)
Levels of heterogeneity (1)
Syntactic heterogeneity
Data may be implemented in a different syntax of different paradigms, such as relational or object-oriented models. Syntactic heterogeneity is also related to the geometric representation of geographic objects, e.g., raster and vector representations.
Structural or schematic heterogeneity
Objects in one database are considered as properties in another, or object classes can have different aggregation or generalisation hierarchies, although they might describe the same Real World concepts.
Notes for slide 6
We just talked about the lack of interoperability in different data sources.
Let’s now have a closer look at the different levels of heterogeneity.
The first level of heterogeneity is what I call the “syntactic heterogeneity”. Data may be implemented in a different syntax, which means that data is stored in different formats: shapefile, ASCII file, relational versus object-oriented database models. But it is also related to the geometric representation of geographic objects, e.g. some datasets may use the vector approach while other datasets use the raster approach to depict the same type of objects.
With today’s technologies this is not an insurmountable problem. In most cases conversion algorithms are offered within the most commonly used software products. It would be very convenient, though, if everyone used the same syntax, so that conversion would be unnecessary.
The second level of heterogeneity is the structural, also called schematic, heterogeneity. This has to do with how you model the information in a certain structure inside the database. So even if you use the same syntax (or database format) and you deal with objects/concepts with the same semantic meaning, you can still model them differently in different schemas. If you ask two people to model a certain piece of information in the same database model, you will probably end up with two different database schemas that cover the same information/semantic concepts. It all concerns the way the information is organised inside the database. You can have different levels of aggregation or detail. Objects in one schema can be considered as properties in another, but in the end it concerns the same Real World concepts. You can see a schema as somebody’s personal perception of a real world concept.
This second level of heterogeneity is also of a technical kind and can be solved by current technologies that can align different schemas describing the same real world concepts. But again it would be very convenient if everybody applied the same schemas.
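To make the idea of aligning schemas a bit more concrete, here is a minimal sketch. The two sources, their field names and the target structure are all invented for illustration: in source A the owner is a property of a parcel, in source B the owner is a separate object that references a parcel.

```python
# Hypothetical data: two sources describing the same real-world concept
# (a parcel and its owner) with different structures.
source_a = [{"parcel_id": "P1", "owner": "Alice"}]   # owner is a property
source_b = {                                          # owner is its own object
    "parcels": [{"id": "P2"}],
    "owners": [{"name": "Bob", "parcel": "P2"}],
}


def align_a(records):
    """Rename the fields of source A to the shared target structure."""
    return [{"parcel": r["parcel_id"], "owner": r["owner"]} for r in records]


def align_b(data):
    """Fold the separate owner objects of source B back into parcel records."""
    owner_by_parcel = {o["parcel"]: o["name"] for o in data["owners"]}
    return [
        {"parcel": p["id"], "owner": owner_by_parcel.get(p["id"])}
        for p in data["parcels"]
    ]


# Both sources now share one structure and can be combined.
aligned = align_a(source_a) + align_b(source_b)
print(aligned)
```

Each source needs its own alignment function, which is exactly the effort that disappears when everybody applies the same schema.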
Levels of heterogeneity (2)
Semantic heterogeneity
A Real World concept may have more than one meaning to comply with various disciplines, giving as a consequence semantic heterogeneity.
e.g. different classifications/definitions of roads when viewed from different perspectives (traffic network route directions, spatial planning, …): there is no one-to-one match
Notes for slide 7
The third level of heterogeneity is the semantic heterogeneity.
This is the most complex of the three levels to deal with. It means that people give different meanings to the same real world concept. This is not strange: it depends on the discipline from which you are looking at the real world concept.
A typical example is the geographical object “road”.
For the road authority, responsible for the quality of the pavement, the concept “road” has a different meaning than for the people responsible for the traffic flows. For the latter the type of pavement (concrete or asphalt) is not important; they are more interested in the connectivity of the roads in order to manage traffic flows, traffic jams and so on.
The same concept can have different meanings, and the challenge here is to align these different meanings. So if in one MS they talk about a certain type of soil, we would like to know that this type of soil has the same name and the same meaning in another MS.
Interoperability of data (1)
Technical interoperability
should guarantee that system components can interoperate
Semantic interoperability
should guarantee that data content is understood by all in the same way
Notes for slide 8
The aim now is to make data interoperable and to solve the different levels of heterogeneity that we talked about in the previous slides. The result would be that producers and users can easily exchange data with each other, with or without the aid of a coordinator in between.
As I said before, establishing technical interoperability is not that difficult because today’s applications and software are able to help in the harmonisation process. It only requires means and actions.
Implementing semantic interoperability, however, is something else. Here we have to make sure that the data content is understood by everybody in the same way. It means we all have to agree on using harmonised definitions and classifications for objects, while at this moment many different definitions and classification systems exist for the same objects in different countries. The idea would be to map the existing heterogeneity to these universal definitions and classifications so that data becomes exchangeable and is understood by everybody in the same way.
Technical: already established by many systems
Semantic: still a long way to go. INSPIRE is a first step, followed by Linked Data / the semantic web
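Mapping existing classifications to commonly agreed ones can be pictured as a lookup table. The sketch below is only an illustration: the country names, the local road classes and the shared class "motorway" are all made up, and a real mapping would be agreed between domain experts.

```python
# Hypothetical mapping table: (source, local class) -> shared class.
# None means there is no one-to-one match and an expert has to decide.
ROAD_CLASS_MAPPING = {
    ("country_a", "autoroute"): "motorway",
    ("country_b", "snelweg"): "motorway",
    ("country_b", "fietsstraat"): None,
}


def to_common_class(source, local_class):
    """Return the shared class, or None when no agreed mapping exists."""
    return ROAD_CLASS_MAPPING.get((source, local_class))


def unmapped(records):
    """List the records that still need a semantic decision."""
    return [r for r in records if to_common_class(r["source"], r["class"]) is None]


roads = [
    {"source": "country_a", "class": "autoroute"},
    {"source": "country_b", "class": "snelweg"},
    {"source": "country_b", "class": "fietsstraat"},
]
print([to_common_class(r["source"], r["class"]) for r in roads])
print(unmapped(roads))
```

The technical part (the lookup) is trivial; the hard part is agreeing on the content of the table, which is why semantic interoperability still has a long way to go.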
Interoperability of data (2)
Provide access to spatial data via network services and according to a harmonised data specification to achieve interoperability of data
Datasets used within organizations may remain unchanged
Data or service providers have to provide a transformation between their internal model and the harmonised data specification
Notes for slide 9
This slide shows in a graphical way what INSPIRE is aiming at:
Instead of every user searching with his own means for a dataset that fits his needs, and spending a lot of time pre-processing it to the desired format in order to combine it with other datasets, INSPIRE would offer access to spatial data via network services. The data made available would follow the structure of a harmonised data specification to achieve interoperability.
There is an important remark to make about this provision of interoperability. The MS are not obliged to modify the datasets that are used internally. But the data providers should at least offer a transformation between their internal model and the harmonised data specification, so that the data can be exchanged.
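The point that internal datasets may remain unchanged can be sketched as follows. The internal field names (`bldg_nr`, `height_cm`) and the target names (`localId`, `heightAboveGround_m`) are assumptions made for this example and are not taken from any INSPIRE data specification.

```python
import copy


def to_harmonised(internal):
    """Build a new record in the target structure; the input is not modified."""
    return {
        "localId": str(internal["bldg_nr"]),                # rename + retype
        "heightAboveGround_m": internal["height_cm"] / 100,  # unit conversion
    }


internal_record = {"bldg_nr": 17, "height_cm": 1250}
snapshot = copy.deepcopy(internal_record)

published = to_harmonised(internal_record)
print(published)
print(internal_record == snapshot)  # the internal dataset stays as it was
```

The provider keeps working with the internal model and only the published view follows the harmonised structure.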
How?
Facilitate data use and interoperability by adopting common cross-domain models to exchange data
DATA INTEROPERABILITY
Notes for slide 10
How can we make this facilitation of data access, data use and interoperability work? It is through defining common cross-domain models that will be used to exchange the data. You can compare it with selecting the right plug that fits the socket.
Implementation alternatives
Notes for slide 11
What are the different implementation alternatives to reach this harmonisation towards the INSPIRE data specifications (DS)? At the start of the INSPIRE story, three different implementation strategies were proposed:
The first one is the on-the-fly transformation of spatial data through download services that automatically transform the source data to the schema of the INSPIRE DS. The data is processed on the fly after the request for the data has been sent by the user;
The second proposed implementation strategy is the offline transformation of spatial data. It means that each data provider transforms his own data into the appropriate INSPIRE DS and offers the result to the user by means of an INSPIRE download service.
The third possibility is the external transformation of spatial data by a separate network service. Here the idea is that the user first downloads the source dataset and makes use of a separate transformation service to transform the data into the INSPIRE DS.
At this point in time, the implementation of INSPIRE is in full progress and it has become clear that the second option is the preferred one and also the most implemented strategy.
Why are the other options not so popular?
The first alternative is technically possible, but the disadvantage is that the processing happens on the fly, so it takes time to retrieve the final result, certainly with large datasets.
The last option has the same problem as the first one, but there is also another reason why it is not commonly used in practice: each dataset requires a specific transformation to arrive at the DS. A specific transformation service must therefore be created for each existing data source, which makes it difficult to keep an overview of all services.
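The difference between the first two strategies is mainly a question of when the transformation runs. The following sketch uses invented class names and a trivial `transform` function, and simply counts how often the transformation is executed.

```python
calls = {"count": 0}


def transform(record):
    """Stand-in for a (possibly expensive) schema transformation."""
    calls["count"] += 1
    return {"name": record["naam"]}


class OnTheFlyDownload:
    """Transforms the source data again for every request."""

    def __init__(self, source):
        self.source = source

    def get(self):
        return [transform(r) for r in self.source]


class OfflineDownload:
    """Transforms once in advance and serves the stored result."""

    def __init__(self, source):
        self.prepared = [transform(r) for r in source]

    def get(self):
        return list(self.prepared)


source = [{"naam": "Gent"}, {"naam": "Leuven"}]

offline = OfflineDownload(source)
offline.get()
offline.get()
calls_after_offline = calls["count"]      # 2: one pass, before any request

on_the_fly = OnTheFlyDownload(source)
on_the_fly.get()
on_the_fly.get()
calls_after_on_the_fly = calls["count"]   # 6: two more passes, one per request

print(calls_after_offline, calls_after_on_the_fly)
```

With large datasets the per-request cost of the on-the-fly variant is what makes users wait, while the offline variant pays the cost once but has to be re-run when the source data changes.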
Conclusion: common Data Specifications is the goal
Member States should make data available within the scope of INSPIRE using
the same spatial object types (and definitions)
the same attributes (and definitions, types, code lists) and relationships to other types, e.g. BuildingHeight, BuildingSize
a common encoding (GML application schemas)
common portrayal rules
This facilitates interoperability and pan-European/cross-border applications (e.g. information systems, reporting systems, forecasting models)
Notes for slide 12
The common data specification goal can be summarised as follows:
When MSs make their spatial data available within the scope of INSPIRE using the same spatial object types, the same attributes and relationships with other types, a common GML encoding and common portrayal rules as specified in the DS technical guidelines, interoperability will be facilitated. This will result in valuable pan-European/cross-border applications which can benefit from the availability of harmonised datasets: applications that can be useful to policy makers, to risk management and so on…
Targeted benefit
Source: EC Joint Research Centre
Notes for slide 13
Picture of the ideal world as we would like to have it.
We see a situation where there is cross-domain/ cross-sector interoperability of spatial data. This makes it easy to exchange and combine different datasets from different sectors in order to perform all kinds of analysis.
In reality, we’re still far away from it.
Benefits
Improved data comparability and consistency,
Cross-sector data interoperability
Cross-border interoperability
Reducing barriers among organisations,
Promoting inter-institutional collaboration
Example: key requirements of the INSPIRE directive (1)
Art 3(7): “Interoperability means the possibility for spatial data sets to be combined, and for services to interact, without repetitive manual intervention, in such a way that the result is coherent and the added value of the data sets and services is enhanced”
Art 7(1): “Implementing rules laying down technical arrangements for the interoperability and, where practicable, harmonisation of spatial data sets and services … shall be adopted…. Relevant user requirements, existing initiatives and international standards for the harmonisation of spatial data sets, as well as feasibility and cost-benefit considerations shall be taken into account in the development of the implementing rules.”
Notes for slide 14
Without repetitive manual intervention: no need for preprocessing anymore
Coherent: it means that the result can be interpreted in its totality
In another article we can read that INSPIRE will lay down implementing rules for reaching this interoperability and where practicable harmonization of spatial datasets. To develop these rules existing initiatives and especially international standards for the harmonization of spatial datasets will be taken into account.
Example: key requirements of the INSPIRE directive (2)
Art 8(2): The implementing rules shall address the following aspects of spatial data:
(a) a common framework for the unique identification of spatial objects, to which identifiers under national systems can be mapped in order to ensure interoperability between them;
(b) the relationship between spatial objects;
(c) the key attributes and the corresponding multilingual thesauri commonly required for policies which may have an impact on the environment;
(d) information on the temporal dimension of the data;
(e) updates of the data.
Notes for slide 15
In article 8 we see that these implementing rules will address a certain number of aspects of spatial data:
One of these aspects is the use of a common framework for the unique identification of spatial objects, which means that each spatial object will be identifiable through a persistent and unique key.
Another important aspect is that there will be some key attributes defined for every geographical object and that multilingual thesauri will be required for different policies which may have an impact on the environment.
Based on these articles we can say that INSPIRE is about making existing heterogeneous data in different MS exchangeable and interoperable within the whole European Union.
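Mapping national identifiers to a common framework can be sketched as below. The sketch assumes an identifier made of a namespace, a local identifier and an optional version; the namespace value and the helper name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Identifier:
    """Persistent, unique key for a spatial object (structure assumed here)."""
    namespace: str                    # identifies the data source
    local_id: str                     # unique within that namespace
    version_id: Optional[str] = None  # distinguishes versions of one object


NAMESPACE = "example.buildings"  # hypothetical namespace of one data provider


def from_national_id(national_id, version=None):
    """Wrap an identifier from a national system in the common structure."""
    return Identifier(NAMESPACE, str(national_id), version)


a = from_national_id(1042)
b = from_national_id(1042)
c = from_national_id(1043)
print(a == b, a == c)
```

Because the namespace is part of the key, two providers can keep their own numbering without their objects colliding once the data is combined.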
02 | Scope & objectives of data specifications for SDIs
Thematic scope
Notes for slide 17
In general we can say that INSPIRE is aiming at thematic domains that concern environmental policies - public sector data. This slide shows the complete list of thematic domains (34 in total), which are grouped in 3 annexes because of different priorities, timelines and deadlines for implementing the data components of the INSPIRE directive.
Annex 1 themes are the basic reference data and they were the first themes to be modelled because themes from the other annexes could depend on them.
SDI data scope
Scope is spatial data – not all kinds of thematic/descriptive data
Re-use the INSPIRE data specs for own usage
Extensions
Additional constraints
Re-use of common objects
Notes for slide 18
Although we tend to attach a lot of thematic data to the spatial data, this is not the scope of INSPIRE.
We have now seen the thematic domains that INSPIRE focusses on. But the next question is: what kind of thematic data falls within the scope? Here we can often notice misinterpretations of the directive. The data scope of INSPIRE is strictly “spatial data” and not all kinds of thematic data. It really concerns the spatial objects that can be identified within a certain thematic domain or within multiple domains, including some crucial key attributes necessary to describe these objects. Business data, which is mostly non-spatial but can be spatially referenced, is out of scope. As an example: information from water quality measurements in surface water is out of scope, but the spatial objects delineating these surface waters are definitely within the scope.
MS are encouraged to re-use the INSPIRE DS for their own usage and there is a possibility to extend them for example to include business data, but it is not an obligation.
Exercise 1: Find your scope
Go to INSPIRE website https://inspire.ec.europa.eu/inspire-tools
Use the tool “Find your scope” (toolkit):
In catalogue of INSPIRE objects:
find “zone” --> limit to only “Spatial object type” --> narrow search “terrestrial zone”
Which Object, INSPIRE Data Theme, Application Schema
What are the other possible specialisations of TransportArea?
Find the spatial object that should be used for a dataset that stores the locations of stations where magnetic measurements are performed (use “Direct Search”).
Find your own scope…
Exercise 1: Result (1)
PortArea – Transport Networks – Water TN
Exercise 1: Result (2)
Direct search: “magnetic field”
Relevant objects? Observed Event (NZ) vs Geoph Station (GE - Geophysics)
03 | The modelling framework for data specifications (ISO 19131)
Data harmonisation and data specs aspects
Notes for slide 24
We already saw that the Generic Conceptual Model (GCM) is the guidance for good modelling practices for all DS. The result of the modelling should be a harmonised set of DS for all themes.
This harmonisation can be reached by following the different harmonisation aspects covered in the GCM. They are listed on this slide (from the INSPIRE principles through identifier management and data capturing requirements to conformance issues). In the next slides we will highlight some of the most important harmonisation aspects.
Harmonisation General Principles
Notes for slide 25
We can distinguish some categories in the list of harmonisation aspects. First of all there are some “General principles”:
Of course these include the INSPIRE principles saying that:
spatial data must be stored and made available at the most appropriate level;
it must be possible to share and combine spatial data from different sources in a consistent way;
another principle is the use of consistent language when referring to terms, to overcome semantic problems. This can be reached by making use of a glossary with common definitions;
the possibility to fall back on a reference model is also one of the base principles. It is the framework of all the technical parts that support us in modelling information and administering data.
INSPIRE principles: Data harmonisation is a methodology to reach these goals.
Terminology: This component will support the use of a consistent language when referring to terms via a glossary. This needs to be registered and managed through change control with multi-lingual support. The ESDI needs to select a common terminology from all of the existing terminologies and/or their translations.
Ref Model: This component will define the framework of the technical parts including topics like information modelling (i.e. conceptual modelling framework with rules for application schemas) and data administration (i.e. reference systems). It will provide a structure which allows the components of INSPIRE which are related to data specifications to be described in a consistent manner.
Harmonisation Schemas
Notes for slide 26
Beside the general principles we also have some principles related to schemas:
One aspect deals with the question: how to describe these schemas? If everybody is allowed to use his own way of documenting these schemas, we will end up again with an undesired heterogeneity. So there must be some general rules that define a common way of documenting. The rule is to establish feature catalogues, which define the types of spatial objects and their properties, while the application schema gives a full description of the content and structure, expressed in a conceptual schema language like UML.
Harmonising data also means dealing with spatial and temporal aspects. There is a need for common information on spatial geometries and topology (what are the geometry types to be used and what are their characteristics) but also on the way temporal characteristics of data will be managed.
When talking about exchanging spatial data we cannot get around the aspect of coordinate referencing. So instructions/guidelines/recommendations towards common European reference systems, including the definition of European geographical grids, are needed for creating a harmonised data space across Europe.
To avoid storing geometries more than once, it is necessary to have a mechanism to reference information to existing base-topographic/spatial objects, e.g. you can use the geometry of the theme “buildings” to add other thematic information like industrial facilities by referencing the relevant geometry instead of redefining it. How this should be done is tackled in the GCM by the aspect of Object Reference Modelling.
A feature catalogue defines the types of spatial objects and their properties (attributes, association roles, operations) as well as constraints, and is required when turning the data into usable information.
An application schema gives the full description of the contents and structure of a spatial dataset and is expressed in a formal conceptual schema language.
The feature catalogue defines the meaning of the spatial object types and their properties while the application schema describes the formal structure. Text elements in the feature catalogues should be maintained at least in the official European languages.
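The split between meaning (feature catalogue) and formal structure (application schema) can be illustrated with two small dictionaries. This is only a sketch: the type `Building`, its attributes and the definitions are invented, and a real application schema would be written in UML rather than in Python.

```python
# Feature catalogue: what the types and properties mean (text for humans).
FEATURE_CATALOGUE = {
    "Building": {
        "definition": "A roofed construction intended for shelter.",
        "attributes": {
            "height": "Height above ground, in metres.",
            "geometry": "Footprint of the building as WKT.",
        },
    }
}

# Application schema: the formal structure (names and data types).
APPLICATION_SCHEMA = {"Building": {"height": float, "geometry": str}}


def conforms(type_name, record):
    """Check that a record has exactly the attributes and types of the schema."""
    schema = APPLICATION_SCHEMA[type_name]
    return set(record) == set(schema) and all(
        isinstance(record[name], expected) for name, expected in schema.items()
    )


good = {"height": 12.5, "geometry": "POLYGON((0 0, 1 0, 1 1, 0 0))"}
bad = {"height": "12.5", "geometry": "POLYGON((0 0, 1 0, 1 1, 0 0))"}
print(conforms("Building", good), conforms("Building", bad))
```

A structural check like `conforms` can be automated, whereas the definitions in the catalogue are what people need in order to interpret the data in the same way.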
Harmonisation Translations
Notes for slide 27
This slide brings us to the category of translation aspects.
As the INSPIRE directive is applicable in all European MS, it is logical that translations are made in all official European languages, at least for the most crucial information, so that the DS can be understood by all MS.
Data translation from a local application schema to the INSPIRE application schema of the relevant theme is also addressed in the GCM.
Another kind of translation is the “portrayal model”. This will clarify how standardised portrayal catalogues can be used to harmonise the portrayal of data, i.e. objects of the same type will be displayed with the same symbol even if they come from different data sources.
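A portrayal catalogue can be thought of as a lookup from object type to symbol. The catalogue content and the feature records below are hypothetical and only serve to show that the data source plays no role in the choice of symbol.

```python
# Hypothetical portrayal catalogue: one agreed style per spatial object type.
PORTRAYAL_CATALOGUE = {
    "Road": {"stroke": "#555555", "width": 2},
    "Watercourse": {"stroke": "#3366cc", "width": 1},
}


def style_for(feature):
    """Pick the style from the object type only, never from the data source."""
    return PORTRAYAL_CATALOGUE[feature["type"]]


road_a = {"type": "Road", "source": "provider_a"}
road_b = {"type": "Road", "source": "provider_b"}
river = {"type": "Watercourse", "source": "provider_a"}

print(style_for(road_a) == style_for(road_b))  # same type, same symbol
print(style_for(road_a) == style_for(river))   # different type, different symbol
```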
Harmonisation Identification
Notes for slide 28
The next category of harmonisation aspects concerns the identification of objects.
Identifier management is one of the key issues in INSPIRE. It states that every spatial object (at least for annex I, II) should get a unique and persistent identifier (i.e. the INSPIRE identifier). All annex III themes followed this good practice although it was initially not required. The “INSPIRE id” makes it possible to reference each object by means of its identifier.
Registers are very important in the harmonisation process. They function as the libraries/dictionaries that hold the commonly agreed information. There will be registers listing possible reference systems, possible units of measure (UOM), code lists (i.e. values of classifications) used in the different themes, thesauri, …
Those registers will become available through registry services so they can be used by other models or applications.
Metadata (MD) also play an important part in the identification of datasets and services. Information stored in the MD gives the user an idea of what he may expect from the described dataset or service. There are different levels of MD, as we will see later on in this module.
Harmonisation Data Quality
Notes for slide 29
A last category of harmonisation aspects concerns “data quality” elements.
One of these elements focusses on the maintenance of data products and of the spatial objects within those products. How to deal with updates? What are the best practices for versioning of objects (introducing a new object and deprecating the old one)?
The quality component will advise the need to publish quality levels of each spatial dataset using the criteria defined in the ISO 19100 series of standards, including completeness, consistency, currency and accuracy. This will include methods of best practice in publishing Acceptable Quality Levels (AQL)…
In this category consistency between data is also addressed, in terms of format, logical and topological accordance, and so on.
Data quality (DQ) also concerns multiple representations, or the best practices for how data can be aggregated:
over time and space;
but also across different resolutions also called generalisation of data.
Maintenance: This component will define best practice in ensuring that application data can be managed against updates of reference information without interruption of services. This will require the definition of mechanisms by different stakeholder areas to manage this where it is required and feasible. RSS feeds for change information?
Quality: This component will advise the need to publish quality levels of each spatial dataset using the criteria defined in the ISO 19100 series of standards, including completeness, consistency, currency and accuracy. This will include methods of best practice in publishing AQL etc.
Consistency: Format, logical, topological etc.
- Multiple representations
- Derived reporting (example: typically water samples at 1 km intervals are reported to the European level)
Harmonisation Other aspects
Notes for slide 30
The aspects on this slide do not fit in one of the previous categories, but they cannot be disregarded when it comes to data harmonisation.
The data transfer aspect focusses mainly on the encoding of data. As said before, within the INSPIRE framework GML is seen as the standard for encoding data. But for coverage data this might not be the proper format to exchange data. Therefore the GCM gives some guidance on alternative encoding mechanisms.
The data capturing aspect covers the DS-specific criteria regarding which spatial objects are to be taken on board (in scope or out of scope) or which coordinates will represent certain spatial objects. A certain accuracy of data capture can also be required.
And last but not least, the conformance aspect. For a dataset to be declared INSPIRE conformant it needs to pass the conformance tests specified in the individual DS of the theme it belongs to.
This overview of harmonisation aspects is not meant to bore you: all of them come back in the final DS documents throughout the different chapters. Some of these aspects are repeated from one of the general framework documents (like the GCM or the guidelines for O&M), while others are described by the Data Specification document itself.
Some wrap-up questions
The modelling framework
What does the abbreviation GCM stand for (as used in the INSPIRE context)?
How many thematic domains, divided in how many annexes, are addressed by the INSPIRE directive?
Which kind of heterogeneity?
Besides “Encoding” and “Harmonised vocabularies”, what are the other two major cornerstones of data interoperability?
04 | Development of data specifications
Let’s now have a quick look at the development process of the DS, in part 3 of this module.
The knowledge of the process is not crucial to understand the DS but it gives you some useful background on the different steps followed and the involvement of different people and communities.
What is a data specification?
DATA SPECIFICATION
Synonym of data product specification
Detailed description of a data set or data set series together with additional information that will enable it to be created, supplied to and used by another party
[ISO 19131]
Notes for slide 33
We already talked a lot about DS:
What are their scope and objectives?
What are the building blocks and the principles behind them?
But what is now a proper definition of a DS, also called a data product specification?
Detailed description of a data set or data set series together with additional information that will enable it to be created, supplied to and used by another party [ISO 19131]
Or in plain language and applied to the INSPIRE framework: it is the user manual for anyone who has to create or modify a dataset within the scope of INSPIRE.
Data Specification
Notes for slide 34
The creation process of the INSPIRE DS followed a step-wise methodology with moments of feedback and with the involvement of the relevant stakeholders. On this slide you can see the workflow of the process, including the feedback possibilities (iterations) between the different steps.
It started with the development of relevant “use cases”. From these use cases, user requirements were drafted and the spatial objects were identified. In parallel an as-is analysis was made of the existing datasets. Based on those requirements and the description of the real situation, gaps were identified, and the results of all three processes were used to draft the first version of the DS. Then followed an implementation, testing and validation phase, after which a new round of feedback was possible and the comments were used to fine-tune the DS.
To give you an idea of the duration of this process: for the annex II and III themes, which were developed in parallel with each other, it took approximately 2 years until the final DS were ready for adoption.
In the coming slides I will explain the separate steps in more detail.
Use case development
Step 1
Major sources are:
European environmental policies
User requirements survey
SDIC/LMO reference material
EU-funded initiatives and projects
Notes for slide 35
The first step in the development process was defining relevant use cases (UC).
Many stakeholders could propose use cases that should be taken into account in the further development process. The major sources that inspired the UC are listed here.
Relevant UC could be derived from or influenced by:
existing European environmental policies, e.g. MS already have certain reporting obligations concerning environmental information;
a survey on user requirements;
reference material provided by Spatial Data Interest Communities (SDIC) and Legally Mandated Organisations (LMO). Both groups were already involved in other aspects of the INSPIRE framework. This material was based on real-life implementations of certain themes, which gave a good insight into how and why datasets were modelled in a specific way;
similar EU-funded initiatives and projects.
This all illustrates that the DS did not come out of the blue: a lot of material came from existing situations and had to be considered as UC to serve as a base for the DS development.
Identification of user requirements and spatial object types
Step 2
Identify requirements on:
the data content
metadata, data quality, portrayal and other elements of the data specification
Notes for slide 36
The second step was to identify the user requirements and the essential spatial objects from all the material gathered in the first step.
The requirements could concern:
data content
metadata (MD)
data quality (DQ)
portrayal
other elements of the DS
As-is analysis
Step 3
Analyse the current situation regarding spatial data sets for the theme, based on:
Notes for slide 37
The third step is based on the reference material provided in the first step. It is an as-is analysis of the current situation regarding procedures and workflows for (theme-specific) spatial data sets.
Apart from the reference material, the analysis also looked at the extent to which existing international standards were already in use, and of course the knowledge of field experts could not be disregarded in this phase.
Example of As-is analysis
Notes for slide 38
This slide gives an example of the workflow that MS should follow to report on habitats under a certain article of the Habitats Directive. The MS report and upload data to the EEA and ETC/BD, which perform a quality check (QC); based on the result, a second delivery with corrections can be requested. The final result is then stored in a common repository, from which a harmonised database on the state of habitats throughout Europe is created. The EEA can then use this database to make new assessments according to the Biogeographical Regions of Europe.
So this is an example of an established workflow that cannot be neglected in the development process of the DS.
Gap analysis
Step 4
Compare identified data sources with identified user requirements
Notes for slide 39
In the fourth step the user requirements are compared to what is already present in existing data sources. User requirements that are not met yet are identified as “Gaps”. This is called the gap analysis phase. DS must be designed to close these gaps.
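The comparison made in this step can be pictured as a set difference. The sketch below is illustrative only: the requirement names and the existing data sources are invented for the example and do not come from any INSPIRE document.

```python
# Illustrative gap analysis: which user requirements are not covered by
# any existing data source? All names below are hypothetical.

def find_gaps(requirements, sources):
    """Return the requirements that no existing source provides, sorted."""
    covered = set()
    for provided in sources.values():
        covered |= set(provided)
    return sorted(set(requirements) - covered)


user_requirements = {"parcel geometry", "national reference", "registered area"}
existing_sources = {
    "national cadastre A": {"parcel geometry", "national reference"},
    "regional dataset B": {"parcel geometry"},
}

gaps = find_gaps(user_requirements, existing_sources)
print(gaps)
```

In this made-up case the only gap is the registered area, which no existing source provides.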
Data specification development
Step 5
The data specifications must be designed to ensure easy mapping between existing data and the harmonised data specification.
Consider:
Notes for slide 40
All previous steps belong to the preparation phase for the actual development of the DS. The DS were developed by a group of domain experts, the Thematic Working Group (TWG). That group had to design the DS in such a way that the user requirements were met and that the mapping between the existing data and the final harmonised model was straightforward.
The group had to consider two major, related aspects:
the data harmonisation process should not lead to excessive costs for the data providers, because then they will not make the effort to transform their data;
the data specification should not require the collection of new data.
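One way to picture an "easy mapping" is a plain renaming table from a provider's existing fields to the harmonised attributes. The sketch below is an assumption-laden illustration: the source field names (PARCEL_NO, AREA_M2, LABEL_TXT) are hypothetical, and the target names are borrowed from the CadastralParcel attributes discussed later in this module.

```python
# Hypothetical field mapping from an existing national dataset to the
# harmonised attribute names. The source field names are invented.
FIELD_MAP = {
    "PARCEL_NO": "nationalCadastralReference",
    "AREA_M2": "areaValue",
    "LABEL_TXT": "label",
}


def harmonise(record, field_map=FIELD_MAP):
    """Rename known fields; report target attributes the source cannot fill."""
    harmonised = {target: record[source]
                  for source, target in field_map.items() if source in record}
    missing = sorted(set(field_map.values()) - set(harmonised))
    return harmonised, missing


row = {"PARCEL_NO": "12345-A", "AREA_M2": 820.5}
result, missing = harmonise(row)
print(result)
print(missing)
```

Attributes that end up in `missing` would require new data collection, which is exactly what the second consideration tries to avoid.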
Implementation, validation and Cost-Benefit Analysis
Step 6-7
review process
test under real world conditions
analyse costs and benefits
Final round of harmonisation
Notes for slide 41
The resulting draft versions of the DS were then implemented and sent back to the data communities for testing and validation in real-world conditions.
Their remarks were evaluated and processed in a new iteration round in order to arrive at the final DS.
(A cost-benefit analysis is difficult to make in a quantitative way. Economic value is difficult to estimate, but several studies exist and many papers on this topic have been published in the IJSDIR.)
Which level of harmonisation is “just right”?
Notes for slide 42
When developing DS it is important to find the appropriate level of harmonisation.
If the model is too simple, there will be no added value and nobody will use it because it cannot fulfil the user requirements.
If the model is too complex, on the other hand, it might be difficult to implement, a number of benefits will only be available to a limited number of users, and the cost of harmonisation will be too high.
The compromise is somewhere in the middle. Finding this equilibrium required:
an iterative process with testing, validation and adaptation phases;
well-defined user requirements;
and a good understanding of the existing geographic information.
Result
Data specification for all Annex Themes
Notes for slide 43
To summarise, and to make the link to the next part of the module, ‘Understanding the technical guidelines’, here again is the result of the DS process.
For each annex theme of INSPIRE, a DS document was created with a textual description of the model, the model as a UML diagram, and the final GML application schema, which is used when transforming data sources into the data model of the DS.
These documents are the key resources for data providers carrying out the transformation. The other framework documents that we talked about can be considered essential background material.
05 | Examples of data models: ISO 19152
In the following slides I will try to guide you through the DS document of Cadastral Parcels (CP), which is an Annex I theme. I will highlight only the elements that are really theme-specific, following the DS document structure.
CP - Scope
The scope of the cadastral information in the INSPIRE context is limited to the geographic side of the cadastral information systems (land administration)
INSPIRE does not aim at harmonising the concepts of ownership and rights related to the parcels
Cadastral parcels should serve the purpose of generic information locators. Having included the reference to the national registers as a property (attribute) of the INSPIRE parcels, national data sources can be reached.
If you look at the scope of CP as listed in the DS document, you can see that in the INSPIRE context cadastral information is limited to the geographic side of the cadastral information systems (managed by a land administration). So all business information related to the geographic component is out of scope and should not be made available through INSPIRE.
INSPIRE does not aim at harmonising the concepts of ownership and rights related to the parcels, so these are not modelled within INSPIRE either.
Cadastral parcels should serve the purpose of generic information locators. This means that it is sufficient to include only a reference to the national registers, in the form of a code or a key stored as an attribute of the parcel object, in order to reach the information stored in the national data sources.
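The "information locator" idea can be sketched as a simple key lookup: the INSPIRE parcel carries only the key, and the national register holds the details. In the sketch below the register is mocked as a dictionary, and its contents and the reference value are invented for illustration.

```python
# Sketch of a parcel acting as an "information locator". The INSPIRE parcel
# holds only the key; the national register (mocked here as a dict) holds the
# ownership and rights details. All values are invented.

national_register = {
    "BE-0001": {"owner": "example owner", "rights": "freehold"},
}

inspire_parcel = {"nationalCadastralReference": "BE-0001"}


def locate(parcel, register):
    """Follow the national reference to the national record, if any."""
    return register.get(parcel["nationalCadastralReference"])


record = locate(inspire_parcel, national_register)
print(record)
```

Ownership and rights stay in the national system; only the reference crosses into the INSPIRE data set.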
CP - Background
All countries run a register
Usually a partition of the country, with exceptions
Basic unit of the system is the parcel
The cadastral parcels should be, as much as possible, single areas of Earth surface (land and/or water) under homogeneous real property rights and unique ownership, where real property rights and ownership are defined by national laws.
By looking at the background documentation of the CP DS, you will see that:
- All countries run a register (for storing cadastral information); sometimes it covers only a part of the country;
- The basic unit used in these registers is the parcel;
- One of the criteria is that a parcel should, if possible, be a single area of Earth surface under homogeneous property rights and unique ownership.
CP - Basic components
Parcel (basic unit)
Subdivision (municipalities, sections, districts, parishes, urban or rural blocks, etc)
Carry information for the parcels inside the subdivision: accuracy or scale
Cadastral boundaries
Only necessary if spatial accuracy is associated with them
By taking into account this background information we can better understand how the basic components of the CP data model were chosen.
- So first of all there is the Parcel defined as the basic unit
- Sometimes parcels are aggregated into higher-level units such as municipalities, sections, districts and so on. These aggregations are modelled as “CadastralZoning” in the CP data model, and they can carry higher-level information concerning that aggregation.
- If cadastral information is attached to the boundaries of cadastral parcels, the cadastral boundaries must be stored; otherwise this is optional.
CP - Application schema
This slide shows the UML diagram of the CP application schema, which is not complicated. We can distinguish four classes that are related to each other. In the centre is the cadastral parcel, which is part of a cadastral zone (the class on the right). The cadastral parcel is defined by its boundaries (the class at the bottom). At the top of the slide is the basic property unit, which is optional and intended for countries where the national cadastral reference is given to one parcel or a group of parcels defined by unique ownership and homogeneous real property rights.
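The relationships between the four classes can be sketched with plain data classes. This is a simplified illustration, not the normative schema: the Python attribute names, the types and the optionality shown here are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CadastralZoning:
    label: str


@dataclass
class BasicPropertyUnit:
    national_cadastral_reference: str


@dataclass
class CadastralBoundary:
    # Simplified: a boundary is a line given as a list of (x, y) vertices.
    vertices: List[Tuple[float, float]]


@dataclass
class CadastralParcel:
    national_cadastral_reference: str
    zoning: Optional[CadastralZoning] = None
    basic_property_unit: Optional[BasicPropertyUnit] = None
    boundaries: List[CadastralBoundary] = field(default_factory=list)


zone = CadastralZoning(label="Section A")
parcel = CadastralParcel("12345-A", zoning=zone)
print(parcel.zoning.label)
```

The parcel sits in the centre and points to the other three classes, which mirrors the layout of the diagram.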
CP – Feature types
CadastralParcel (mandatory)
CadastralZoning (auxiliary)
CadastralBoundary (auxiliary)
BasicPropertyUnit (auxiliary)
Notes for slide 49
Here we see the four classes listed as feature types (i.e. spatially identifiable objects). Only one feature type, CadastralParcel, is mandatory; the rest are auxiliary (mandatory under specific conditions), so the core profile is actually limited to the CadastralParcel feature type.
This is a good example showing that implementing INSPIRE DS is not as complex as is often argued. If we look at the attributes that belong to a CP, we see:
geometry
inspireId: External object identifier of the spatial object
label: Text commonly used to display the cadastral parcel identification
nationalCadastralReference: Thematic identifier at national level of the cadastral parcel which ensures the link to the national cadastral register or equivalent.
Some voidable properties
areaValue: Registered area value
referencePoint: point within the parcel used for label placement
validFrom and validTo: Official date and time when the cadastral parcel is legally established and deprecated;
two lifeCycleInfo properties:
beginLifespanVersion and endLifespanVersion: Date and time at which this version of the spatial object was inserted or changed in the spatial data set and the date when it was superseded or retired
Apart from the attributes you can see in the last compartment of the class that there are also some constraints defined which put some extra limitations on the content of some attributes.
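The attribute list and the idea of constraints can be sketched as a data class with a small validation method. This is an illustration under stated assumptions: voidable properties are modelled simply as optional values (None standing for "void"), the geometry is reduced to a list of vertices, and the two checks shown are assumed examples of constraints, not a copy of the constraints in the DS.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class CadastralParcelRecord:
    geometry: List[Tuple[float, float]]   # simplified: polygon vertices
    inspire_id: str
    label: str
    national_cadastral_reference: str
    # Voidable properties: modelled here as Optional, None meaning "void".
    area_value: Optional[float] = None
    valid_from: Optional[datetime] = None
    valid_to: Optional[datetime] = None

    def constraint_violations(self):
        """Return a list of violated (assumed) constraints."""
        problems = []
        if self.area_value is not None and self.area_value < 0:
            problems.append("areaValue must not be negative")
        if (self.valid_from is not None and self.valid_to is not None
                and self.valid_to < self.valid_from):
            problems.append("validTo must not be earlier than validFrom")
        return problems


p = CadastralParcelRecord(
    geometry=[(0, 0), (10, 0), (10, 10), (0, 10)],
    inspire_id="id-1",
    label="12",
    national_cadastral_reference="12345-A",
    valid_from=datetime(2020, 1, 1),
    valid_to=datetime(2019, 1, 1),
)
print(p.constraint_violations())
```

A record with void (missing) voidable values passes the checks; only values that are present can violate a constraint.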
CP - Requirements & Recommendations
On this slide you see an IR requirement (remember the style, red and the double border), and some recommendations that were defined for the CP theme.
The IR requirement makes the thematic identifier in the form of the nationalCadastralReference mandatory for a CP. It forms the link to the national cadastral registers with ownership and rights information.
There is a recommendation that advises providing the geometry as a GM_Surface (i.e. a concept from ISO to model a polygon);
There are some topological recommendations to avoid topological overlaps and gaps between CPs. You may wonder why this is only a recommendation and not a requirement, because for a dataset such as CP you would not expect such a weak demand on topological quality.
However, not all MS have national cadastral systems of that quality yet; this has to do with the history of how the parcels were inventoried. In many countries there is already a digital layer, but it does not have any legal value yet, and the only legally binding document is an informative description in a kind of codex, which does not guarantee topological correctness. That is why this was taken on board only as a recommendation and not as a requirement.
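To make the overlap recommendation concrete, the sketch below checks a handful of parcels for overlaps. It is deliberately simplified: parcels are reduced to axis-aligned rectangles (xmin, ymin, xmax, ymax), whereas real parcels are arbitrary polygons that would need a geometry library; the parcel identifiers and coordinates are invented.

```python
from itertools import combinations


def overlap_area(a, b):
    """Area shared by two axis-aligned rectangles (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def find_overlaps(parcels):
    """Return pairs of parcel ids whose rectangles overlap."""
    return [(id_a, id_b)
            for (id_a, box_a), (id_b, box_b) in combinations(parcels.items(), 2)
            if overlap_area(box_a, box_b) > 0]


parcels = {
    "P1": (0, 0, 10, 10),
    "P2": (10, 0, 20, 10),   # shares an edge with P1: no overlap
    "P3": (8, 8, 12, 12),    # overlaps both P1 and P2
}
overlaps = find_overlaps(parcels)
print(overlaps)
```

Parcels that only share an edge are not reported, since touching boundaries are what a clean partition looks like.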
CP - Requirements & Recommendations
This slide illustrates that the DS is a technical guidance for data providers. In this case it shows how to deal with lifecycle issues of CPs. It gives some real-world examples of how a CP can change over time and how these types of changes should be treated: either as a new parcel with a new identifier, or just as a new version of the parcel with the same identifier, which then implies that the old version is retired.
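The "new version, same identifier" case can be sketched as follows. This is a minimal illustration: the class and function names are hypothetical, and deciding which real-world changes lead to a new version rather than a new parcel is a matter for the DS, not for this sketch.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ParcelVersion:
    inspire_id: str
    area_value: float
    begin_lifespan_version: datetime
    end_lifespan_version: Optional[datetime] = None  # None = current version


def add_version(history, area_value, when):
    """Retire the current version and append a new one with the same id."""
    current = history[-1]
    retired = replace(current, end_lifespan_version=when)
    new = ParcelVersion(current.inspire_id, area_value, when)
    return history[:-1] + [retired, new]


history = [ParcelVersion("id-1", 800.0, datetime(2015, 1, 1))]
history = add_version(history, 820.5, datetime(2020, 6, 1))
for version in history:
    print(version)
```

The identifier is unchanged across versions; only the lifespan dates tell the retired version apart from the current one.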
CP - Requirements & Recommendations
Here we have an important recommendation that says that INSPIRE cadastral parcels should only be published when the parcels are officially published in the national register. So work in progress does not need to be made available through INSPIRE until it is published in the national register, i.e. when it is officially validated.
CP - Geometry
0-, 1-, 2- and 2.5-dimensional geometries
We have already seen these recommendations on geometry: polygons are recommended, but by looking at the model we have seen that 1-dimensional features (boundaries) are also acceptable under certain circumstances.
CP - Enumerations/codelists
Here you have an example of a codelist that provides the values that should be used to indicate the levels of aggregation (hierarchy) of cadastral zones.
All the INSPIRE-managed codelists are made available online through the INSPIRE codelist register.
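In code, a codelist behaves like a closed set of allowed values. The sketch below shows the idea with an enumeration; the three level values are an assumption made for the example, and the INSPIRE codelist register remains the authoritative source for the actual list.

```python
from enum import Enum


class CadastralZoningLevel(Enum):
    # Assumed values for the hierarchy level of a cadastral zoning.
    FIRST_ORDER = "1stOrder"
    SECOND_ORDER = "2ndOrder"
    THIRD_ORDER = "3rdOrder"


def parse_level(value):
    """Return the matching codelist entry or raise ValueError."""
    return CadastralZoningLevel(value)


print(parse_level("2ndOrder"))

try:
    parse_level("district")
except ValueError as error:
    print("rejected:", error)
```

Values outside the list are rejected, which is how a codelist keeps aggregation levels comparable across data providers.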
Land Administration Domain Model
Major packages of the data model
Reference list
ISO/TC 211 Geographic information/Geomatics. (2007). ISO 19131:2007. ISO. Retrieved October 19, 2020, from https://www.iso.org/cms/render/live/en/sites/isoorg/contents/data/standard/03/67/36760.html
Brodeur, J., & Badard, T. (2008). Modeling with ISO 191xx Standards. In S. Shekhar & H. Xiong (Eds.), Encyclopedia of GIS (pp. 705–716). Springer US. https://doi.org/10.1007/978-0-387-35973-1_811
ISO/TC 211 Geographic information/Geomatics. (2015). ISO 19109:2015 Geographic information: Rules for application schema. International Organization for Standardization.