Managing the Data Quality with Metadata

From SIMSTADT

Concept of Metadata

Generality about Metadata

Data quality is a major challenge for 3D city models. The availability and uncertainty of building data vary widely across existing urban areas, and the precision and reliability of simulations based on a 3D city model depend directly on them.

To assess the reliability of the simulation results, good traceability of the 3D city model data and of their uncertainty is essential. For this purpose, a concept of metadata (data about data) is being developed that gives each attribute value a context:

  • its origin (a person/organisation)
  • its age (when it was last checked)
  • its determination method
  • its assessed precision

The unit of the data value could also be added to the list above. By convention, however, we always use the International System of Units (SI).

Sometimes the data is a numerical or string code from a specific nomenclature, for instance the ALKIS nomenclature for building usage.
In this case the name of the nomenclature, which is attached to the dataset rather than to individual values, must be indicated once for the whole city model, for example in a metadata file linked to the CityGML 3D model.

Metadata Code

These metadata must be easy to use in the uncertainty propagation calculation and must not make the CityGML file much heavier. Their format therefore has to be a numerical code. Some code nomenclatures may be common to all data; others may be specific to particular data with their own determination methods.

1. Age of the data determination / last validation

Each metadata code must include the date at which the data value was determined or last validated, so that the recency of the data can be judged. A value established with high precision may become outdated several years later simply because the city to which the 3D city model refers evolves, with new buildings, refurbishment measures, urban transformations, etc.

In this context, an absolute value (which year?) makes more sense than a relative value (how many years ago?), since the metadata can then stay fixed as long as the data it refers to is not updated. One part of the code can therefore hold the year of determination or last validation (or at least its last two digits).

2. Source of the data

One part of the metadata code must also specify the source of the data value (the city, an engineering firm, an owner, a tenant, etc.). It is not used directly in the uncertainty calculation, but it gives an important indication of the reliability of some data, and it allows a reviewer of the 3D city model to contact the organisation or person responsible for determining or collecting a value in case of doubt.

3. Determination method

Finally, the metadata code must indicate the determination method of the data, and thereby indirectly its precision. The code nomenclature can refer to direct collection, a measurement (with a certain precision), a calculation, an assumption based on benchmark values, a statement for a scenario, etc. In contrast to the age and source of the data, the determination method generally depends directly on the nature of the data.

The section #Metadata Nomenclature per Data details these determination methods and proposes metadata codes for each data item.
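
The three parts described above (year, source, determination method) could be packed into a single integer. The following is only a minimal sketch: the six-digit layout YYSSMM, the class name MetadataCode and the SOURCES table are assumptions made for illustration, since this page does not prescribe a concrete layout.

```python
from dataclasses import dataclass

# Hypothetical source codes; this table is not defined on the page.
SOURCES = {1: "city", 2: "engineering firm", 3: "owner", 4: "tenant"}


@dataclass(frozen=True)
class MetadataCode:
    year: int    # last two digits of the year of determination / last validation
    source: int  # key into SOURCES (assumed table)
    method: int  # per-attribute determination method code, e.g. 0..99

    def pack(self) -> int:
        """Pack the three parts into one integer laid out as YYSSMM (assumed layout)."""
        for part in (self.year, self.source, self.method):
            if not 0 <= part <= 99:
                raise ValueError("each part must fit in two digits")
        return self.year * 10_000 + self.source * 100 + self.method

    @classmethod
    def unpack(cls, code: int) -> "MetadataCode":
        """Inverse of pack(): split a YYSSMM integer into its parts."""
        if not 0 <= code <= 999_999:
            raise ValueError("code must have at most six digits")
        year, rest = divmod(code, 10_000)
        source, method = divmod(rest, 100)
        return cls(year, source, method)


# Value validated in 2013 by an engineering firm, determination method 3.
code = MetadataCode(year=13, source=2, method=3).pack()
print(code)  # 130203
```

An absolute two-digit year keeps the code stable over time, as argued above; a four-digit year would avoid the century ambiguity at the cost of two more digits.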


Metadata and Uncertainties

Metadata allow the precision of individual input data, of calculated intermediate variables, and of simulation results to be judged. For simulation results, an uncertainty propagation calculation is required that takes the uncertainty of each input into account.

Uncertainty Quantification

Uncertainty quantification (UQ) is a science which tries to determine how likely certain outcomes are if some aspects of the system are not exactly known.

Two classification schemes exist in parallel.
The first categorizes the sources of uncertainty as follows:

  • Parameter uncertainty, which comes from the model parameters that are inputs to the computer model (mathematical model) but whose exact values are unknown to experimentalists and cannot be controlled in physical experiments.
  • Structural uncertainty, aka model inadequacy, model bias, or model discrepancy, which comes from the lack of knowledge of the underlying true physics. It depends on how accurately a mathematical model describes the true system for a real-life situation.
  • Algorithmic uncertainty, aka numerical uncertainty, which comes from numerical errors and numerical approximations per implementation of the computer model.
  • Parametric variability, which comes from the variability of input variables of the model.
  • Experimental uncertainty, aka observation error, which comes from the variability of experimental measurements. Experimental uncertainty is inevitable and can be observed by repeating a measurement many times using exactly the same settings for all inputs/variables.
  • Interpolation uncertainty, which comes from a lack of available data collected from computer model simulations and/or experimental measurements. For input settings that have no simulation data or experimental measurements, one must interpolate or extrapolate in order to predict the corresponding responses.

Another way of categorization is to classify uncertainty into two categories:[1][2]

  • Aleatoric uncertainty, aka statistical uncertainty, which covers unknowns that differ each time we run the same experiment.
  • Epistemic uncertainty, aka systematic uncertainty, which is due to things we could in principle know but don't in practice. This may be because we have not measured a quantity sufficiently accurately, or because our model neglects certain effects, or because particular data are deliberately hidden.


These uncertainties can be absolute (for the building year, for instance) or relative.
Relative uncertainty is the measurement uncertainty divided by the measured value. The "Guide to the Expression of Uncertainty in Measurement", commonly known as the GUM, is the definitive document on this subject. The GUM has been adopted by all major National Measurement Institutes and by laboratory accreditation standards such as ISO 17025, and it is employed in most modern national and international documentary standards on measurement methods and technology.
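
As a small illustration of the definition above, the conversion from absolute to relative uncertainty can be sketched as follows. The function name and the numbers are assumptions chosen for the example, not values from a real city model.

```python
def relative_uncertainty(value: float, absolute_uncertainty: float) -> float:
    """Relative uncertainty = absolute uncertainty divided by the measured value."""
    if value == 0:
        raise ValueError("relative uncertainty is undefined for a zero value")
    return abs(absolute_uncertainty / value)


# Hypothetical ground surface area of 120 m² known to within 6 m².
print(relative_uncertainty(120.0, 6.0))  # 0.05
```

For a quantity such as the building year, the absolute form (for example "within 5 years") is the meaningful one, since dividing by the year itself carries no information.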

Propagation of uncertainty

Existing uncertainty propagation approaches include probabilistic approaches and non-probabilistic approaches. There are basically five categories of probabilistic approaches for uncertainty propagation:[3]

  • Simulation-based methods: Monte Carlo method, importance sampling, adaptive sampling, etc.
  • Local expansion-based methods: Taylor series, perturbation method, etc. These methods have advantages when dealing with relatively small input variability and outputs that don't express high nonlinearity.
  • Functional expansion-based methods: Neumann expansion, polynomial chaos expansion (PCE).
  • Most probable point (MPP)-based methods: first-order reliability method (FORM) and second-order reliability method (SORM).
  • Numerical integration-based methods: Full factorial numerical integration (FFNI) and dimension reduction (DR).

For non-probabilistic approaches, interval analysis [4], fuzzy theory, possibility theory and evidence theory are among the most widely used.

The probabilistic approach is considered the most rigorous approach to uncertainty analysis in engineering design because of its consistency with the theory of decision analysis.
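
Two of the probabilistic categories listed above can be compared on a toy model. The sketch below propagates input uncertainty through V = area × height, once with a first-order Taylor expansion (a local expansion-based method) and once with plain Monte Carlo sampling (a simulation-based method). The independent Gaussian inputs, the function names and the numbers are assumptions made for illustration; this is not the propagation scheme of any particular simulation tool.

```python
import math
import random
import statistics


def taylor_sigma_volume(area, sigma_area, height, sigma_height):
    """First-order propagation for V = area * height with independent inputs:
    sigma_V = sqrt((dV/dA * sigma_A)^2 + (dV/dh * sigma_h)^2)."""
    return math.hypot(height * sigma_area, area * sigma_height)


def monte_carlo_sigma_volume(area, sigma_area, height, sigma_height,
                             samples=20_000, seed=42):
    """Sample the inputs, evaluate the model for each sample, and return the
    sample standard deviation of the resulting volumes."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    volumes = [
        rng.gauss(area, sigma_area) * rng.gauss(height, sigma_height)
        for _ in range(samples)
    ]
    return statistics.stdev(volumes)


# Hypothetical building: 100 m² footprint (sigma 2 m²), 5 m mean height (sigma 0.25 m).
sigma_taylor = taylor_sigma_volume(100.0, 2.0, 5.0, 0.25)
sigma_mc = monte_carlo_sigma_volume(100.0, 2.0, 5.0, 0.25)
print(f"Taylor: {sigma_taylor:.1f} m³, Monte Carlo: {sigma_mc:.1f} m³")
```

For small input variability and a nearly linear model the two estimates agree closely, which is the regime where local expansion methods are said above to have advantages; Monte Carlo remains usable when the model is strongly nonlinear, at a higher computational cost.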

Metadata Nomenclature per Data

Building Volume

The volume of an abstract building is calculated as the sum of the volumes of the Solid elements of its geometry. It can be calculated from LoD1 or LoD2 solid geometry by tetrahedralization. An alternative approach is to multiply the area of the ground surface by meanBuildingHeightRel. This is almost the same as the volume of the LoD1 solid geometry, but it only takes the volume above ground into account (if the metadata of minTerrainHeight is 0 or 1). It also makes sense to use this simplified calculation for LoD2 models whose geometry is not solid. The difference between a volume calculated from LoD2 solid geometry and one calculated from ground surface area and meanBuildingHeightRel was investigated in a thesis at EIFER.
units of measurement (uom): m³

Figure (Volume-LoD2-abb1.png): Volume calculated from LoD2 solid geometry by tetrahedralization, compared to ground surface area times meanBuildingHeightRel.

metadata

  • 0 - measured
  • 1 - tetrahedralization of LoD1 solid geometry
  • 2 - tetrahedralization of LoD2 solid geometry
  • 3 - area of ground surface times meanBuildingHeightRel
  • 4 - volume of axis-aligned bounding box
  • 99 - not calculated
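
The two volume calculations (metadata codes 1/2 and 3) can be sketched as follows. The sketch assumes the solid has already been triangulated into a closed, consistently oriented mesh; each triangle then spans a tetrahedron with the origin, and the signed tetrahedron volumes sum to the volume of the solid. The function names and the box used as test geometry are hypothetical and not taken from any existing tool.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Triangle = Tuple[Point, Point, Point]


def solid_volume(triangles: Sequence[Triangle]) -> float:
    """Volume of a closed, consistently oriented triangle mesh."""
    total = 0.0
    for (ax, ay, az), (bx, by, bz), (cx, cy, cz) in triangles:
        # Scalar triple product a . (b x c) = 6 * signed tetrahedron volume.
        total += (ax * (by * cz - bz * cy)
                  - ay * (bx * cz - bz * cx)
                  + az * (bx * cy - by * cx))
    return abs(total) / 6.0


def simplified_volume(ground_area: float, mean_building_height_rel: float) -> float:
    """Metadata code 3: area of ground surface times meanBuildingHeightRel."""
    return ground_area * mean_building_height_rel


def box_triangles(a: float, b: float, c: float, z0: float = 0.0) -> List[Triangle]:
    """Outward-oriented triangles of an a x b x c box whose floor is at height z0."""
    v = [(0, 0, z0), (a, 0, z0), (a, b, z0), (0, b, z0),
         (0, 0, z0 + c), (a, 0, z0 + c), (a, b, z0 + c), (0, b, z0 + c)]
    faces = [(0, 3, 2), (0, 2, 1), (4, 5, 6), (4, 6, 7),   # bottom, top
             (0, 1, 5), (0, 5, 4), (1, 2, 6), (1, 6, 5),   # front, right
             (2, 3, 7), (2, 7, 6), (3, 0, 4), (3, 4, 7)]   # back, left
    return [(v[i], v[j], v[k]) for i, j, k in faces]


# Hypothetical LoD1-like block: 10 m x 8 m footprint, 5 m high, floor at 115.22 m.
mesh_volume = solid_volume(box_triangles(10.0, 8.0, 5.0, z0=115.22))
quick_volume = simplified_volume(10.0 * 8.0, 5.0)
print(f"{mesh_volume:.2f} m³ vs {quick_volume:.2f} m³")
```

For a flat-roofed block the two results coincide; for LoD2 geometry with a pitched roof they generally differ, which is the difference the thesis mentioned above investigated.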

Building Heated/Not directly Heated/Unheated Zones

Several building parameters describe the fact that not all of the building volume extracted from the 3D city model is heated.

Some buildings have an unheated attic storey, while others have a cellar storey, partly below and partly above ground, which may or may not be heated. Moreover, some volumes of the building may not be directly heated, such as staircases and corridors.

Attic storey type

  • 0 - unknown
  • 1 - deduced from geometrical consideration (difference ridgeHeight - eavesHeight)
  • 2 - deduced from aerial photo (presence of roof windows...)
  • 3 - directly collected from building plans

Basement storey type

  • 0 - unknown
  • 1 - deduced from building typologies
  • 2 - deduced from facade photo (presence of windows...)
  • 3 - directly collected from building plans

Proportion of the building not directly heated

  • 0 - unknown
  • 1 - deduced from building typologies and usages
  • 2 - directly collected from building plans

Building Height

The height of a building is given by a set of attributes as defined in the SIG3D modeling guidelines:

  • minTerrainHeight: lowest point of terrain [in absolute coordinates] (Min Geländepunkt)
  • minEavesHeightRel: lowest point of eaves [relative to minTerrainHeight] (Min Höhe Trauflinie)
  • maxEavesHeightRel: highest point of eaves [relative to minTerrainHeight] (Max Höhe Trauflinie) (needed? Maybe not)
  • maxRidgeHeightRel: highest point of roof ridge [relative to minTerrainHeight] (Max Höhe Firstlinie)
  • relative mean building height: meanBuildingHeightRel = (minEavesHeightRel + maxRidgeHeightRel)/2

The minTerrainHeight is the reference point for all relative heights.

If these values are not measured during data capture, they need to be calculated from the building geometry.
The minTerrainHeight can be calculated using either the terrain model or the terrain intersection curve. Otherwise, it is set to the min z-value of the building geometry. Using the terrain model is more accurate.
In LoD1, no roof structure is available, so it is practically impossible to estimate the eaves and roof ridge heights. The mean height of the building can be calculated as the difference between the max z-value of the building geometry coordinates and the lowest point of terrain: meanBuildingHeightRel = max z-value - minTerrainHeight.

Under the assumption that no roof overlaps are modeled in LoD2, the lowest point of eaves is the min z-value of all polygons that belong to a roof surface. The highest point of the roof ridge is the max z-value of all polygons that belong to a roof surface. The highest point of eaves is hard to calculate, as eaves and roof ridge are not explicitly given in the data model.

The following example illustrates the calculation of the height attributes. The building geometry is given as a LoD2 solid with BoundarySurfaces.
Figures: Height-Ex1-abb1.png, Height-Ex1-abb2.png

  • minTerrainHeight = 115.22, uom=m, metadata=3
  • minEavesHeightRel = 4.25, uom=m, metadata=1
  • maxEavesHeightRel = 0, uom=m, metadata=99
  • maxRidgeHeightRel = 5.75, uom=m, metadata=1
  • meanBuildingHeightRel = 5.0, uom=m, metadata=2
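
The derivation of these attributes from LoD2 geometry can be sketched as follows, using the metadata codes of the nomenclature given in this section. The function name and the input z-values are assumptions; the z-values were picked only so that the result reproduces the example numbers, they are not read from the actual model.

```python
from typing import Dict, List, Tuple

Attribute = Tuple[float, int]  # (value in metres, metadata code)


def height_attributes(all_z: List[float], roof_z: List[float]) -> Dict[str, Attribute]:
    """Derive the height attributes from LoD2 geometry when nothing was measured.

    all_z:  z-values of all vertices of the building geometry
    roof_z: z-values of the vertices that belong to roof surfaces
    """
    min_terrain = min(all_z)                    # code 3: lowest z of LoD2 geometry
    min_eaves = min(roof_z) - min_terrain       # code 1: min z of roof surfaces
    max_ridge = max(roof_z) - min_terrain       # code 1: max z of roof surfaces
    mean_height = (min_eaves + max_ridge) / 2   # code 2: mean of eaves and ridge
    return {
        "minTerrainHeight": (min_terrain, 3),
        "minEavesHeightRel": (min_eaves, 1),
        "maxEavesHeightRel": (0.0, 99),         # code 99: not available
        "maxRidgeHeightRel": (max_ridge, 1),
        "meanBuildingHeightRel": (mean_height, 2),
    }


# Assumed z-values: floor at 115.22 m, eaves at 119.47 m, ridge at 120.97 m.
attrs = height_attributes(all_z=[115.22, 119.47, 120.97], roof_z=[119.47, 120.97])
for name, (value, code) in attrs.items():
    print(f"{name} = {value:.2f}, uom=m, metadata={code}")
```

Storing the metadata code next to each value keeps the determination method available to any later uncertainty calculation.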


The same building with LoD1 solid geometry. From this model, at least the mean height of the building can be derived. Of course, some attributes might already have been measured during data capture. Let us assume that the highest point of the building was measured and stored in the CityGML attribute measuredHeight. This measured height is very likely similar to maxRidgeHeightRel, so we use it as such.
Figures: Height-Ex1-LoD1-abb1.png, Height-Ex1-LoD1-abb2.png

  • minTerrainHeight = 115.22, uom=m, metadata=3
  • minEavesHeightRel = 0, uom=m, metadata=99
  • maxEavesHeightRel = 0, uom=m, metadata=99
  • maxRidgeHeightRel = 5.75, uom=m, metadata=0
  • meanBuildingHeightRel = 5.0, uom=m, metadata=1


Metadata Nomenclature
minTerrainHeight

  • 0 - measured
  • 1 - intersection with terrain model (DTM)
  • 2 - lowest z-value of LoD1 building geometry
  • 3 - lowest z-value of LoD2 building geometry

minEavesHeightRel

  • 0 - measured
  • 1 - (min z-value of LoD2 roof surface geometries) - minTerrainHeight
  • 99 - not available

maxEavesHeightRel

  • 0 - measured
  • 1 - (max z-value of LoD2 wall surface geometries) - minTerrainHeight (could be a bad guess)
  • 99 - not available

maxRidgeHeightRel

  • 0 - measured
  • 1 - (max z-value of LoD2 roof surface geometries) - minTerrainHeight
  • 99 - not available

meanBuildingHeightRel

  • 0 - measured
  • 1 - (max z-value of LoD1 building geometry) - minTerrainHeight
  • 2 - (minEavesHeightRel + maxRidgeHeightRel)/2
  • 99 - not available

Facade Window Area

The window area is an essential input in building physics, in particular for the calculation of solar gains and transmission heat losses.

Although it can strongly affect the heat demand results, its determination in a LoD1 or LoD2 3D city model is often subject to very rough assumptions.
units of measurement (uom): m²
metadata

  • 0 - no information
  • 1 - assessed from a single window-to-wall ratio, the same for all facades of the district (generally 20%)
  • 2 - assessed from a window-to-wall ratio specific to each building typology (based on benchmark values), the same for all facades of a building
  • 3 - visually determined from facade photos or an on-site visit, for each facade
  • 4 - calculated with an image processing algorithm
  • 5 - directly collected from building plans
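
Codes 1 and 2 both reduce to multiplying a facade area by a window-to-wall ratio; only the origin of the ratio differs. A minimal sketch, assuming a hypothetical function name and a made-up facade area:

```python
from typing import Tuple


def window_area(facade_area_m2: float,
                window_to_wall_ratio: float = 0.20,
                metadata_code: int = 1) -> Tuple[float, int]:
    """Return (window area in m², metadata code).

    The default ratio of 20% with code 1 is the district-wide assumption;
    pass a typology-specific ratio together with code 2 otherwise.
    """
    if not 0.0 <= window_to_wall_ratio <= 1.0:
        raise ValueError("window-to-wall ratio must lie between 0 and 1")
    return facade_area_m2 * window_to_wall_ratio, metadata_code


# Hypothetical facade of 150 m² with the district-wide ratio.
area, code = window_area(150.0)
print(f"window area = {area:.1f}, uom=m2, metadata={code}")
```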

Refurbishment year

The refurbishment year is the last year in which the building envelope was entirely refurbished with energy efficiency measures (the year of a window replacement or of an isolated roof insulation does not count).

The refurbishment year cannot be used directly in the simulation algorithm. Nevertheless, it is very useful in the pre-processing calculations to update the original building physics parameters.

The absence of a refurbishment year is also information, if it is recorded deliberately.
units of measurement (uom): year
metadata

  • 0 - no information
  • 1 - either information about absence of refurbishment, or known refurbishment year.

U-values and Wall structure

The U-values of the walls, roofs and basement slabs of existing buildings are rarely (actually never) given directly at city scale for a 3D city model.

Depending on the availability of contextual information (building or refurbishment year, outside insulation thickness, etc.), different determination methods can be used, with different levels of uncertainty.
units of measurement (uom): W/(m²·K)
metadata

  • 0 - no information
  • 1 - deduced from the building type and building year (without refurbishment information)
  • 2 - deduced from the building type and building year (with known refurbishment year)
  • 3 - deduced from the building type and building year, refined with outside insulation thickness per facade
  • 4 - measured on-site
  • 5 - directly collected from the building specifications


Future Developments

30.8.2013: SIG3D AG Modellierung Meeting on Metadata in CityGML

Wiki SIG3D AG Modellierung: http://en.wiki.modeling.sig3d.org/

Relevant standards on metadata:

  • ISO 19115 Geographic Information - Metadata
  • ISO 19139 Geographic information -- Metadata -- XML schema implementation
  • German translation of the ISO fields: [1]
  • ALKIS3D / GeoInfoDoc7: metadata for the entire data set as well as at feature / attribute level. Very likely to become the national standard in the German cadastre. [2]

CityGML metadata should be based on ISO 19115. It is not necessary to support all 400 metadata fields; a useful subset should be selected. Extensions might be necessary, especially for LoD; see the publication "Nonn/Zipf – Metadaten für 3D-Stadtmodelle – Untersuchungen der Eignung von ISO 19115 und Möglichkeiten der Erweiterung": [3] (final version of the paper?)

Clause 2.4 (data quality) is very well suited to our needs; see "2.4.2 Herkunft" and "2.4.2.2 Bearbeitungsschritt".

CityGML Change Request CR 13-029 "Meta Data for City Model": adds a metadata description to the entire data set (header information). A new change request will be submitted to add metadata at feature level and at "group of features" level.

Remarks: The CityGML model in Enterprise Architect will be provided by SIG3D / OGC. Currently, two different models are available. The next OGC TC meeting will decide which one becomes the model to work with at OGC for the specification of CityGML 3.0.

Useful ADEs:

  • solarADE http://www.citygmlwiki.org/index.php/Solar_ADE can be provided by M.O.S.S.
  • UtilityNetworkADE http://www.citygmlwiki.org/index.php/CityGML_UtilityNetworkADE

References:

  1. Armen Der Kiureghian, Ove Ditlevsen, Aleatory or epistemic? Does it matter?, Structural Safety, Volume 31, Issue 2, March 2009, Pages 105–112
  2. Hermann G. Matthies, Quantifying uncertainty: modern computational representation of probability and applications, Extreme Man-Made and Natural Hazards in Dynamics of Structures, NATO Security through Science Series, 2007, 105-135, DOI: 10.1007/978-1-4020-5656-7_4
  3. S. H. Lee and W. Chen, A comparative study of uncertainty propagation methods for black-box-type problems, Structural and Multidisciplinary Optimization Volume 37, Number 3 (2009), 239-253, DOI: 10.1007/s00158-008-0234-7
  4. Template:Cite book