Data structures

iTEM defines metadata using the SDMX information model, in order to specify the contents and various formats/representations of both the historical and model projection data flows and data sets.

Overview

This section briefly summarizes the contents of the iTEM SDMX metadata (structure.xml). It does not give a complete or exhaustive terminology for SDMX; see the Resources page in the sdmx documentation for further reading.

By describing the structure itself, we allow for multiple representations that are suitable for different purposes, yet easily interoperable.

General

All the data structures have a uniform resource name (URN) like: urn:sdmx:org.sdmx.infomodel.datastructure.DataStructureDefinition=iTEM:HISTORICAL(0.1). This identifies:

  • what kind of object it is (here, a data structure definition, or DSD);

  • who the responsible Organization or Agency is (iTEM);

  • the ID of the object (HISTORICAL); and

  • a version number (0.1).

Specific data sets need not copy the entire DSD; they merely refer to it by its URN. The DSD can be updated over time, incrementing the version, while references to older versions remain valid.
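The URN components listed above can be separated with a regular expression. The following is a minimal sketch using only the standard library; the parse_urn helper and its pattern are illustrative assumptions, not part of the item or sdmx packages.

```python
import re

# Hypothetical pattern for URNs shaped like
# urn:sdmx:org.sdmx.infomodel.<package>.<Class>=<agency>:<id>(<version>)
URN_PATTERN = re.compile(
    r"urn:sdmx:org\.sdmx\.infomodel\.(?P<package>\w+)\.(?P<klass>\w+)"
    r"=(?P<agency>[^:]+):(?P<id>[^(]+)\((?P<version>[^)]+)\)"
)


def parse_urn(urn: str) -> dict:
    """Split an SDMX URN into its component parts."""
    match = URN_PATTERN.fullmatch(urn)
    if match is None:
        raise ValueError(f"Not a recognized SDMX URN: {urn!r}")
    return match.groupdict()


parts = parse_urn(
    "urn:sdmx:org.sdmx.infomodel.datastructure."
    "DataStructureDefinition=iTEM:HISTORICAL(0.1)"
)
# Kind of object, agency, ID and version, in that order
print(parts["klass"], parts["agency"], parts["id"], parts["version"])
```

Because the version is part of the URN, a reference to version 0.1 keeps pointing at the same structure even after a later version is published.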

Concept scheme TRANSPORT

This scheme includes concepts that are commonly used as dimensions or attributes for transport data. These are typically represented by a set of discrete codes.

Each Concept is an SDMX ‘Item’, so it has an id (usually in upper case), optionally a name and a description (both of which can be multilingual), and zero or more annotations.

Concept scheme TRANSPORT_MEASURE

This scheme includes concepts that are measures, i.e. the thing that is measured by the quantity (magnitude and unit) of a particular observation.

Concept scheme MODELING

This scheme includes concepts related to model-based research, including MODEL, SCENARIO, etc.

Code lists

structure.xml includes lists of codes used to represent particular concepts. For instance, CL_LCA_SCOPE gives Codes with the ids “TTW”, “WTT”, and “WTW”, alongside the standard “_Z” code. (As Items, Codes can also have plain-language names and longer descriptions. The ID is used for a short, machine-readable representation.)

A statement like “this data set has a dimension LCA_SCOPE that represents the concept LCA_SCOPE using the code list CL_LCA_SCOPE” is clear and unambiguous about the structure of that particular data set. A second data set can be described as having “an attribute LCA_SCOPE that represents the concept LCA_SCOPE using the code list CL_LCA_SCOPE”; this is a distinct structure and representation, but completely interoperable with the first.
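To make the interoperability point concrete, the sketch below uses plain dictionaries rather than the sdmx package. The observation layout (key, attributes, value) and the lca_scope helper are hypothetical; only the code IDs come from CL_LCA_SCOPE.

```python
# Code IDs from CL_LCA_SCOPE
CL_LCA_SCOPE = {"_Z", "TTW", "WTT", "WTW"}

# Hypothetical observation from a data set where LCA_SCOPE is a dimension:
# the code is part of the key that identifies the observation
obs_a = {"key": {"LCA_SCOPE": "TTW"}, "attributes": {}, "value": 1.0}

# Hypothetical observation from a data set where LCA_SCOPE is an attribute:
# the code is attached to the observation, but is not part of its key
obs_b = {"key": {}, "attributes": {"LCA_SCOPE": "TTW"}, "value": 2.0}


def lca_scope(obs: dict) -> str:
    """Return the LCA_SCOPE code, wherever the structure places it."""
    code = obs["key"].get("LCA_SCOPE") or obs["attributes"].get("LCA_SCOPE")
    if code not in CL_LCA_SCOPE:
        raise ValueError(f"{code!r} is not a code in CL_LCA_SCOPE")
    return code


# Both structures yield the same code from the same code list
print(lca_scope(obs_a), lca_scope(obs_b))
```

Since both data sets draw on the same concept and code list, code that knows where each structure places LCA_SCOPE can compare the two directly.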

Some standard IDs are used in multiple code lists, mirroring other applications of SDMX. These include:

  • _Z: “Not applicable”, when it does not make logical sense to give a value on this concept/dimension for this data.

  • _T: “Total”, the sum of all data. This is sometimes called “All”.

  • _X: “Not allocated/unknown”, data not associated to any other code in the list.
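One way to read these standard IDs is as a consistency rule: the value for “_T” should equal the sum over the other codes, including “_X”. The sketch below checks that rule on made-up numbers; the total_is_consistent helper is hypothetical, and the rule itself is an interpretation of the descriptions above rather than something iTEM is documented to enforce.

```python
# Made-up values for a single concept, keyed by code ID
values = {"AIR": 3, "LAND": 5, "_X": 1, "_T": 9}


def total_is_consistent(values: dict) -> bool:
    """Check that the "_T" value equals the sum of all other coded values.

    "_Z" (not applicable) carries no quantity, so it is left out of the sum;
    "_X" (not allocated/unknown) is counted as part of the total.
    """
    parts = sum(v for code, v in values.items() if code not in ("_T", "_Z"))
    return parts == values["_T"]


print(total_is_consistent(values))
```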

To obtain and use data structure information in code that works with iTEM data, use generate(). For example:

>>> from item.structure import generate

# Generate an SDMX "structure message" containing all the data structures
>>> sm = generate()

# Select the historical data structure definition
>>> sm.structure["HISTORICAL"]
<DataStructureDefinition iTEM:HISTORICAL(0.1)>

# Show the dimensions of the data structure
>>> dsd = sm.structure["HISTORICAL"]
>>> dsd.dimensions
<DimensionDescriptor: <Dimension SERVICE>; <Dimension MODE>; <Dimension VEHICLE>; <Dimension FUEL>; <Dimension TECHNOLOGY>; <Dimension AUTOMATION>; <Dimension OPERATOR>; <Dimension POLLUTANT>; <Dimension LCA_SCOPE>; <Dimension FLEET>; <MeasureDimension VARIABLE>>

# Get one dimension
>>> dim = dsd.dimensions.get("LCA_SCOPE")
>>> dim
<Dimension LCA_SCOPE>

# Navigate from the dimension to the list of codes used to represent it
>>> dim.local_representation.enumerated
<Codelist iTEM:CL_LCA_SCOPE(0.1) (4 items)>

# Show the codes in this code list
>>> codelist = dim.local_representation.enumerated
>>> codelist.items
{'_Z': <Code _Z: Not applicable>,
 'TTW': <Code TTW: Tank-to-wheels>,
 'WTT': <Code WTT: Well-to-tank>,
 'WTW': <Code WTW: Well-to-wheels>}

# Show the valid concepts to appear in the VARIABLE dimension
>>> dsd.dimensions.get("VARIABLE").local_representation.enumerated.items
{'ACTIVITY': <Concept ACTIVITY: Transport activity>,
 'ENERGY': <Concept ENERGY: Energy>,
 'ENERGY_INTENSITY': <Concept ENERGY_INTENSITY: Energy intensity of activity>,
 'EMISSION': <Concept EMISSION: Emission>,
 'GDP': <Concept GDP: Gross Domestic Product>,
 'LOAD_FACTOR': <Concept LOAD_FACTOR: Load factor>,
 'POPULATION': <Concept POPULATION: Population>,
 'PRICE': <Concept PRICE: Price>,
 'SALES': <Concept SALES: Sales>,
 'STOCK': <Concept STOCK: Stock>}

Code reference

item.structure.generate() → sdmx.message.StructureMessage

Return the SDMX data structures for iTEM data.

item.structure.make_template(output_path: Optional[pathlib.Path] = None, verbose: bool = True)

Generate a data template.

Outputs files containing all keys specified for the iTEM HISTORICAL data structure definition. The files are produced in two formats:

  • *.csv: comma-separated values

  • *.xlsx: Microsoft Excel.

…and in three variants:

  • full.*: with full dimensionality for every concept.

  • condensed.*: with a reduced number of dimensions, with labels for some dimensions combining labels for others in shorter, conventional, human-readable form.

  • index.*: an index or map between the two above versions.

See also

collapse

item.structure.base.CL_AREA = (<Code _X: Not allocated/unspecified>, <Code B0: European Union (current composition)>, <Code B4: European Union (27 countries)>, <Code B5: European Union (28 countries)>, <Code W0: World>)

Codes for the REF_AREA dimension, from SDMX codelist ESTAT:CL_AREA(1.8).

item.structure.base.CODELISTS = {'AREA': (<Code _X: Not allocated/unspecified>, <Code B0: European Union (current composition)>, <Code B4: European Union (27 countries)>, <Code B5: European Union (28 countries)>, <Code W0: World>), 'AUTOMATION': (<Code _T: Total>, <Code _Z: Not applicable>, <Code HUMAN: Human>, <Code AV: Automated>), 'FLEET': (<Code _T: Total>, <Code _Z: Not applicable>, <Code NEW>, <Code USED>), 'FUEL': (<Code _T: Total>, <Code _Z: Not applicable>, <Code LIQUID: All liquid>, <Code GAS: Gas>, <Code H2: Hydrogen>, <Code ELEC: Electricity>), 'LCA_SCOPE': (<Code _Z: Not applicable>, <Code TTW: Tank-to-wheels>, <Code WTT: Well-to-tank>, <Code WTW: Well-to-wheels>), 'MODE': (<Code _T: Total>, <Code _Z: Not applicable>, <Code AIR: Aviation>, <Code LAND: All land transport modes.>, <Code WATER: Water>, <Code PIPE: Pipeline>), 'OPERATOR': (<Code _T: Total>, <Code _Z: Not applicable>, <Code OWN: Own-supplied>, <Code HIRE: Hired>), 'POLLUTANT': (<Code _Z: Not applicable>, <Code GHG: GHG>, <Code AQ>), 'SERVICE': (<Code _T: Total>, <Code _Z: Not applicable>, <Code P: Passenger>, <Code F: Freight>), 'TECHNOLOGY': (<Code _T: Total>, <Code _Z: Not applicable>, <Code IC: Combustion>, <Code ELEC: Electric>, <Code FC: Fuel cell>), 'VEHICLE': (<Code _T: Total>, <Code _Z: Not applicable>, <Code LDV: Light-duty vehicle>, <Code BUS: Bus>, <Code TRUCK: Truck>, <Code 2W+3W>)}

Codes for various code lists.

item.structure.base.CONCEPT_SCHEMES = [<ConceptScheme TRANSPORT (10 items)>, <ConceptScheme MODELING (2 items)>, <ConceptScheme TRANSPORT_MEASURE (10 items)>]

Concept schemes.

item.structure.base.CONSTRAINTS = (<ContentConstraint GENERAL0>, <ContentConstraint GENERAL1>, <ContentConstraint GENERAL2>, <ContentConstraint GENERAL3>, <ContentConstraint GENERAL4: Technology/fuel constraints>, <ContentConstraint GENERAL5>, <ContentConstraint GENERAL6>, <ContentConstraint ACTIVITY>, <ContentConstraint ACTIVITY_VEHICLE>, <ContentConstraint EMISSIONS>, <ContentConstraint ENERGY>, <ContentConstraint ENERGY_INTENSITY>, <ContentConstraint LOAD_FACTOR>, <ContentConstraint PRICE_FUEL>, <ContentConstraint PRICE_POLLUTANT>, <ContentConstraint SALES>, <ContentConstraint STOCK>)

Constraints applying to DSDs.

item.structure.base.DATA_STRUCTURES = (<DataStructureDefinition ACTIVITY>, <DataStructureDefinition ACTIVITY_VEHICLE>, <DataStructureDefinition EMISSIONS>, <DataStructureDefinition ENERGY>, <DataStructureDefinition ENERGY_INTENSITY>, <DataStructureDefinition GDP>, <DataStructureDefinition POPULATION>, <DataStructureDefinition PRICE_FUEL>, <DataStructureDefinition PRICE_POLLUTANT>, <DataStructureDefinition LOAD_FACTOR>, <DataStructureDefinition SALES>, <DataStructureDefinition STOCK>, <DataStructureDefinition HISTORICAL>, <DataStructureDefinition MODEL>)

Main iTEM data structures.

item.structure.base.VERSION = '0.1'

Current version of all data structures.

Todo

Allow a different version for each particular structure, e.g. code list.

item.structure.base.anno(**kwargs) → Dict[str, List[sdmx.model.Annotation]]

Store kwargs as annotations on an AnnotableArtefact for later use.

item.structure.base.exclude(**kwargs)

Return “_data_content_region” annotation content to exclude multiple codes.

item.structure.base.exclude_z(dims: str) → List[Dict]

Return “_data_content_region” annotation content to exclude “_Z” codes.

Parameters

dims – Space-separated list of dimensions on which to exclude “_Z” codes.
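The intended effect of such an exclusion can be illustrated by enumerating keys and dropping those that use “_Z” on the named dimensions. This is a sketch of the effect only: the keys_excluding_z helper is hypothetical and does not reproduce the annotation content that exclude_z() actually returns. The code IDs are taken from the SERVICE and FLEET code lists.

```python
from itertools import product

# Code IDs for two dimensions, from the SERVICE and FLEET code lists
CODES = {
    "SERVICE": ["_T", "_Z", "P", "F"],
    "FLEET": ["_T", "_Z", "NEW", "USED"],
}


def keys_excluding_z(codes: dict, dims: str):
    """Yield every key except those with "_Z" on a dimension named in dims.

    dims is a space-separated list of dimension IDs, mirroring the argument
    of exclude_z().
    """
    excluded = set(dims.split())
    names = list(codes)
    for combo in product(*(codes[name] for name in names)):
        key = dict(zip(names, combo))
        if any(key.get(dim) == "_Z" for dim in excluded):
            continue
        yield key


keys = list(keys_excluding_z(CODES, "SERVICE"))
# "_Z" is dropped for SERVICE only; FLEET keeps all four of its codes
print(len(keys))
```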

item.structure.sdmx.cr_from(info: dict, dsd: sdmx.model.DataStructureDefinition) → sdmx.model.CubeRegion

Create a CubeRegion from a simple dict of info.

item.structure.sdmx.cr_from_anno(obj: sdmx.model.ContentConstraint, dsd: sdmx.model.DataStructureDefinition) → None

Convert an annotation on obj into a CubeRegion constraint.

item.structure.sdmx.dks_from_anno(obj: sdmx.model.ContentConstraint, dsd: sdmx.model.DataStructureDefinition) → None

Convert an annotation on obj into a DataKeySet constraint.

item.structure.sdmx.generate() → sdmx.message.StructureMessage

Return the SDMX data structures for iTEM data.

item.structure.sdmx.get_cdc()

Retrieve the CROSS_DOMAIN_CONCEPTS from the SDMX Global Registry.

item.structure.sdmx.merge_dsd(sm: sdmx.message.StructureMessage, target: str, others: List[str], fill_value: str = '_Z') → sdmx.model.DataSet

‘Merge’ 2 or more data structure definitions.

item.structure.sdmx.merge_general_constraints(cc: sdmx.model.ContentConstraint, dsd: sdmx.model.DataStructureDefinition, sm: sdmx.message.StructureMessage) → None

Merge general constraints from sm into cc if relevant to dsd.

item.structure.sdmx.prepare_dsd(dsd: sdmx.model.DataStructureDefinition, sm: sdmx.message.StructureMessage)

Populate data structures within dsd.

The following utility functions are used by make_template():

add_unit(key, concept)

Add units to a key.

collapse(row)

Collapse multiple concepts into fewer columns.

name_for_id(dsd, ids)

Return a nested dict for use with pandas.DataFrame.replace().

item.structure.template.add_unit(key: Dict, concept: sdmx.model.Concept) → None

Add units to a key.

item.structure.template.collapse(row: pandas.core.series.Series) → pandas.core.series.Series

Collapse multiple concepts into fewer columns.

  • VARIABLE label is formatted using the labels for LCA_SCOPE, POLLUTANT, and/or FLEET.

  • MODE label is formatted using the labels for SERVICE, VEHICLE, AUTOMATION and/or OPERATOR.
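The general idea of folding several dimensions into two labels can be sketched as follows. The exact label format used by collapse() is not stated here, so the “ / ” separator, the rule of skipping “_Z” and “_T”, and the collapse_sketch helper operating on a plain dict (rather than a pandas Series) are all assumptions made for illustration.

```python
# Dimensions folded into each of the two condensed labels
VARIABLE_PARTS = ("LCA_SCOPE", "POLLUTANT", "FLEET")
MODE_PARTS = ("SERVICE", "VEHICLE", "AUTOMATION", "OPERATOR")


def collapse_sketch(row: dict) -> dict:
    """Fold some dimension labels into VARIABLE and MODE (illustrative only)."""
    skip = {"_Z", "_T", ""}

    def join(base: str, others: tuple) -> str:
        # Append only labels that add information to the base label
        extra = [row[d] for d in others if row.get(d, "") not in skip]
        return " / ".join([row[base]] + extra)

    # Keep every column that is not folded into another label
    folded = set(VARIABLE_PARTS) | set(MODE_PARTS)
    result = {k: v for k, v in row.items() if k not in folded}
    result["VARIABLE"] = join("VARIABLE", VARIABLE_PARTS)
    result["MODE"] = join("MODE", MODE_PARTS)
    return result


row = {
    "VARIABLE": "EMISSION", "LCA_SCOPE": "TTW", "POLLUTANT": "GHG", "FLEET": "_Z",
    "MODE": "LAND", "SERVICE": "P", "VEHICLE": "LDV",
    "AUTOMATION": "_T", "OPERATOR": "_T",
}
print(collapse_sketch(row))
```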

item.structure.template.make_template(output_path: Optional[pathlib.Path] = None, verbose: bool = True)

Generate a data template.

Outputs files containing all keys specified for the iTEM HISTORICAL data structure definition. The files are produced in two formats:

  • *.csv: comma-separated values

  • *.xlsx: Microsoft Excel.

…and in three variants:

  • full.*: with full dimensionality for every concept.

  • condensed.*: with a reduced number of dimensions, with labels for some dimensions combining labels for others in shorter, conventional, human-readable form.

  • index.*: an index or map between the two above versions.

See also

collapse

item.structure.template.name_for_id(dsd: sdmx.model.DataStructureDefinition, ids: List[str]) → Mapping[str, Dict[str, str]]

Return a nested dict for use with pandas.DataFrame.replace().

For the concept schemes in ids (e.g. ‘mode’), the id attribute of a particular Concept (e.g. ‘air’) is replaced with its name (e.g. ‘Aviation’).
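The shape of such a nested dict can be sketched without the sdmx package. Here the CODES table stands in for the information that the real function reads from a DSD, and name_for_id_sketch is a hypothetical helper; the code IDs and names are taken from the MODE and SERVICE code lists.

```python
from typing import Dict, List, Mapping

# Hypothetical stand-in for a DSD: {dimension ID: {code ID: code name}}
CODES: Dict[str, Dict[str, str]] = {
    "MODE": {"AIR": "Aviation", "WATER": "Water", "PIPE": "Pipeline"},
    "SERVICE": {"P": "Passenger", "F": "Freight"},
}


def name_for_id_sketch(
    codes: Mapping[str, Mapping[str, str]], ids: List[str]
) -> Dict[str, Dict[str, str]]:
    """Return {column: {id: name}} for only the requested columns."""
    return {id_: dict(codes[id_]) for id_ in ids}


mapping = name_for_id_sketch(CODES, ["MODE"])
print(mapping)
```

A mapping of this {column: {old: new}} shape can be passed directly to pandas.DataFrame.replace(), which then substitutes values only within the named columns.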