Input data¶
Every input data set has a corresponding Python module to adjust the unprocessed data provided directly by the upstream data source to the common iTEM structure.
These modules are not invoked directly, but through the function process(), e.g.:

# Process upstream data for data set T009; return the results
process(9)

process() loads and makes use of dataset-specific configuration, checks, and additional code from the corresponding submodule, while automating common cleaning steps.
See:

- the function documentation for a complete description of these steps.
- the green [source] link next to each function (e.g. T001.process()) to access and inspect the source code for the dataset-specific cleaning steps.
This pattern reduces duplicated code in each dataset-specific submodule, while remaining flexible to upstream data formats.
HOWTO add upstream data sources or sets¶
1. Add the data set entry to sources.yaml.
2. Copy, rename, and modify an existing module, e.g. T012.py.
3. Extend the tests to ensure this data set is tested.
4. Update the docstrings in the code and this documentation.
Common code¶
-
item.historical.
COUNTRY_NAME
= {'azerbaidjan': 'AZE', 'bolivia (plurinational state of)': 'BOL', 'bosnia': 'BIH', 'bosnia-herzegovina': 'BIH', 'brunei': 'BRN', 'cape verde': 'CPV', 'china, hong kong sar': 'HKG', 'china, macao sar': 'MAC', 'china, taiwan province of china': 'TWN', 'congo kinshasa ': 'COD', 'congo_the democratic republic of the': 'COD', "cote d'ivoire": 'CIV', "dem. people's republic of korea": 'PRK', 'democratic republic of the congo': 'COD', 'former yugoslav republic of macedonia, the': 'MKD', 'germany (until 1990 former territory of the frg)': 'DEU', 'holy see': 'VAT', 'hong-kong': 'HKG', 'iran': 'IRN', 'iran (islamic republic of)': 'IRN', 'ivory coast': 'CIV', 'korea': 'KOR', 'libyan arab jamahiriya': 'LBY', 'macedonia': 'MKD', 'macedonia, the former yugoslav republic of': 'MKD', 'micronesia (fed. states of)': 'FSM', 'moldavia': 'MDA', 'montenegro, republic of': 'MNE', 'palestine': 'PSE', 'republic of korea': 'KOR', 'reunion': 'REU', 'russia': 'RUS', 'saint helena': 'SHN', 'serbia and montenegro': 'SCG', 'serbia, republic of': 'SRB', 'south korea': 'KOR', 'state of palestine': 'PSE', 'swaziland': 'SWZ', 'syria': 'SYR', 'taiwan_province of china': 'TWN', 'tanzania_united republic of': 'TZA', 'the former yugoslav republic of macedonia': 'MKD', 'united states virgin islands': 'VIR', 'venezuela (bolivarian republic of)': 'VEN', 'virgin islands_british': 'VGB', 'wallis and futuna islands': 'WLF'}¶ Non-ISO 3166 names that appear in 1 or more data sets. These are used in
iso_alpha_3()
to replace names before they are looked up using pycountry.
-
item.historical.
OUTPUT_PATH
= PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/transportenergy/envs/master/lib/python3.7/site-packages/item/data/historical/output')¶ Path for output from
process()
.
-
item.historical.
cache_results
(id_str: str, df: pandas.core.frame.DataFrame) → None[source]¶ Write df to
OUTPUT_PATH
in two file formats. The files written are:

- id_str-clean.csv, in long (previously ‘programming-friendly’ or ‘PF’) format, i.e. with all years or other time periods in a TIME_PERIOD column and one observation per row.
- id_str-clean-wide.csv, in wide (previously ‘user-friendly’ or ‘UF’) format, with one column per year/TIME_PERIOD. For convenience, this file has two additional columns:
  - NAME: the ISO 3166 name that corresponds to the alpha-3 code appearing in the REF_AREA column.
  - ITEM_REGION: the name of the iTEM region to which the data correspond.
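The difference between the two formats can be illustrated with pandas; the REF_AREA and TIME_PERIOD column names are from the description above, while VALUE as the observation column is an assumption:

```python
import pandas as pd

# Long format, as in id_str-clean.csv: one observation per row, all time
# periods in a single TIME_PERIOD column.
long = pd.DataFrame({
    "REF_AREA": ["CAN", "CAN", "USA", "USA"],
    "TIME_PERIOD": [2000, 2001, 2000, 2001],
    "VALUE": [1.0, 1.1, 9.0, 9.5],
})

# Wide format, as in id_str-clean-wide.csv: one column per TIME_PERIOD.
wide = (
    long.pivot(index="REF_AREA", columns="TIME_PERIOD", values="VALUE")
    .reset_index()
)
```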
-
item.historical.
dim_id_for_column_name
(name: str) → str[source]¶ Return a dimension ID in the
HISTORICAL
structure for a column name.
-
item.historical.
fetch_source
(id: Union[int, str], use_cache: bool = True) → pathlib.Path[source]¶ Fetch and cache data from source id.
The remote data is fetched using the API for the particular source. A network connection is required.
-
item.historical.
fill_values_for_dataflow
(dataflow_id: Optional[str]) → Dict[str, str][source]¶ Return a dictionary of fill values for the data flow dataflow_id.
-
item.historical.
get_area_name_map
() → Dict[str, str][source]¶ Return a mapping from lower-case names in
CL_AREA
to IDs.
-
item.historical.
get_country_name
(code: str) → str[source]¶ Return the country name for a country’s ISO 3166 alpha-3 code.
-
item.historical.
get_item_region
(code: str) → str[source]¶ Return iTEM region for a country’s ISO 3166 alpha-3 code, or “N/A”.
-
item.historical.
input_file
(id: int)[source]¶ Return the path to a cached, raw input data file for data source id.
CSV files are located in the ‘historical input’ data path. If more than one file has a name beginning with “T{id}”, the last sorted file is returned.
-
item.historical.
iso_alpha_3
(name: str) → str[source]¶ Return ISO 3166 alpha-3 code for a country name.
- Parameters
name (str) – Country name. This is looked up in the pycountry ‘name’, ‘official_name’, or ‘common_name’ field. Replacements from
COUNTRY_NAME
are applied.
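A minimal sketch of this lookup order: COUNTRY_NAME replacements are applied first, then the remaining names are looked up. The real function uses the pycountry package for the second step; a tiny inline table stands in for it here so the example is self-contained:

```python
# Excerpt of the COUNTRY_NAME replacements shown above.
COUNTRY_NAME = {"russia": "RUS", "south korea": "KOR"}

# Hypothetical stand-in for pycountry's 'name' / 'official_name' /
# 'common_name' lookup used by the real iso_alpha_3().
PYCOUNTRY_STANDIN = {"france": "FRA", "canada": "CAN"}


def iso_alpha_3_sketch(name: str) -> str:
    key = name.lower()
    if key in COUNTRY_NAME:  # non-ISO names handled by replacement
        return COUNTRY_NAME[key]
    return PYCOUNTRY_STANDIN[key]  # pycountry lookup in the real code
```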
-
item.historical.
process
(id: Union[int, str]) → pandas.core.frame.DataFrame[source]¶ Process a data set given its id.
Performs the following common processing steps:
1. Fetch the unprocessed upstream data, or load it from cache.
2. Load a module defining dataset-specific processing steps. This module is in a file named e.g. T001.py.
3. Call the dataset’s (optional) check() method. This method receives the input data frame as an argument, and can make one or more assertions to ensure the data is in the expected format. If assert False or any other exception occurs here, processing fails.
4. Drop columns in the dataset’s (optional) COLUMNS['drop'] list.
5. Call the dataset-specific (required) process() method. This method receives the data frame from step (4), performs any additional processing, and returns a data frame.
6. If the REF_AREA dimension is not already populated, assign ISO 3166 alpha-3 codes, using a column containing country names: either COLUMNS['country_name'] or the default, ‘Country’. See iso_alpha_3().
7. Assign values to other dimensions:
   - From the dataset’s (optional) DATAFLOW variable. This variable indicates one of the data flows and corresponding data structure definitions (DSDs) in the iTEM data structures. For each dimension in the “full” (HISTORICAL) DSD but not in this dataflow, fill in with “_Z” (not applicable) values.
   - From the dataset’s (optional) COMMON_DIMS dict.
8. Order columns according to the HISTORICAL data structure.
9. Check for missing values or missing dimension labels. A fully cleaned data set has none.
10. Output data to two files. See cache_results().
- Parameters
id (int) – Data source id.
- Returns
The processed data.
- Return type
pandas.DataFrame
-
item.historical.
source_str
(id: Union[int, str]) → str[source]¶ Return the canonical string name (e.g.
"T001"
) for a data source.
-
item.historical.
REGION
¶ Map from ISO 3166 alpha-3 code to iTEM region name.
-
item.historical.
SOURCES
← contents of sources.yaml¶ The current version of the file is always accessible at https://github.com/transportenergy/metadata/blob/master/historical/sources.yaml
T000¶
Data cleaning code and configuration for T000.
-
item.historical.T000.
COLUMNS
= {'drop': ['COUNTRY', 'VARIABLE', 'YEAR', 'Unit', 'Unit Code', 'PowerCode Code', 'PowerCode', 'Reference Period Code', 'Reference Period', 'Flag Codes', 'Flags']}¶ Columns to drop from the raw data.
-
item.historical.T000.
COMMON_DIMS
= {'automation': '_T', 'operator': '_T', 'service': 'P', 'source': 'International Transport Forum', 'technology': '_T', 'unit': '10^9 passenger-km / yr', 'variable': 'Activity'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T000.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
-
item.historical.T000.
mode_and_vehicle_type
(variable_name)[source]¶ Determine ‘mode’ and ‘vehicle type’ from ‘variable’.
The rules implemented are:
Variable                                       Mode  Vehicle type
Rail passenger transport                       Rail  All
Road passenger transport by buses and coaches  Road  Bus
Road passenger transport by passenger cars     Road  LDV
Total inland passenger transport               All   All
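The rules in the table above can be sketched as follows; the actual implementation of mode_and_vehicle_type() may differ in structure:

```python
def mode_and_vehicle_type_sketch(variable_name: str):
    """Return (mode, vehicle type) per the rules table above."""
    if variable_name == "Rail passenger transport":
        return "Rail", "All"
    elif variable_name == "Road passenger transport by buses and coaches":
        return "Road", "Bus"
    elif variable_name == "Road passenger transport by passenger cars":
        return "Road", "LDV"
    elif variable_name == "Total inland passenger transport":
        return "All", "All"
    raise ValueError(variable_name)
```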
T000:
url: https://stats.oecd.org/index.aspx?queryid=79863
name: "Passenger transport: Inland passenger transport"
fetch:
type: SDMX
source: OECD
resource_id: ITF_PASSENGER_TRANSPORT
key: .T-PASS-TOT-INLD+T-PASS-RL-TOT+T-PASS-RD-TOT+T-PASS-RD-CAR+T-PASS-RD-BUS
validate: false
T001¶
Data cleaning code and configuration for T001.
This module:
Detects and corrects #32, a data error in the upstream source where China observation values for years 1990 to 2001 inclusive are too low by 2 orders of magnitude (see also #57).
-
item.historical.T001.
COLUMNS
= {'drop': ['COUNTRY', 'VARIABLE', 'YEAR', 'Flag Codes', 'Flags', 'PowerCode Code', 'PowerCode', 'Reference Period Code', 'Reference Period', 'Unit Code', 'Unit']}¶ Columns to drop from the raw data.
-
item.historical.T001.
COMMON_DIMS
= {'automation': '_T', 'mode': 'Shipping', 'operator': '_T', 'service': 'F', 'source': 'International Transport Forum', 'technology': '_T', 'variable': 'Activity', 'vehicle': 'Coastal'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T001.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
-
item.historical.T001.
FIX_32
= False¶ Flag for whether #32 is detected by check() and should be fixed by process().
-
item.historical.T001.
process
(df)[source]¶ Process data set T001.
Drop null values.
Convert from Mt km / year to Gt km / year.
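The two steps above, sketched with pandas: 1 Gt = 1000 Mt, so the unit conversion is a division by 1000. ‘Value’ as the observation column name is an assumption:

```python
import pandas as pd

df = pd.DataFrame({"Value": [1500.0, None, 250.0]})
df = df.dropna(subset=["Value"])    # drop null values
df["Value"] = df["Value"] / 1000.0  # Mt km / year → Gt km / year
```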
T001:
name: Coastal Transport
fetch:
type: SDMX
source: OECD
resource_id: ITF_GOODS_TRANSPORT
key: .T-SEA-CAB
validate: false
T002¶
Data cleaning code and configuration for T002.
-
item.historical.T002.
COLUMNS
= {'drop': ['COUNTRY', 'VARIABLE', 'YEAR', 'Unit Code', 'PowerCode Code', 'PowerCode', 'Reference Period Code', 'Reference Period', 'Flag Codes', 'Flags']}¶ Columns to drop from the raw data.
-
item.historical.T002.
COMMON_DIMS
= {'automation': '_T', 'fuel': '_T', 'operator': '_T', 'service': 'Freight', 'source': 'International Transport Forum', 'technology': '_T', 'vehicle': 'Container'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T002.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
T002:
name: Container Transport
fetch:
type: SDMX
source: OECD
resource_id: ITF_GOODS_TRANSPORT
key: .T-CONT-RL-TEU+T-CONT-RL-TON+T-CONT-SEA-TEU+T-CONT-SEA-TON
validate: false
T003¶
Data cleaning code and configuration for T003.
The input data contains the variable names in VARIABLE_MAP. A new sum, mode=”Inland ex. pipeline”, is computed as the sum of the variables in PARTIAL, i.e. excluding “Pipelines transport”.
-
item.historical.T003.
COLUMNS
= {'drop': ['COUNTRY', 'VARIABLE', 'YEAR', 'Flag Codes', 'Flags', 'PowerCode', 'PowerCode Code', 'Reference Period Code', 'Reference Period', 'Unit Code', 'Unit']}¶ Columns to drop from the raw data.
-
item.historical.T003.
COMMON_DIMS
= {'automation': '_T', 'service': 'F', 'source': 'International Transport Forum', 'technology': '_T', 'unit': 'Gt km / year', 'variable': 'Activity'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T003.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
-
item.historical.T003.
PARTIAL
= ['Rail freight transport', 'Road freight transport', 'Inland waterways freight transport']¶ Variables to include in a partial sum.
-
item.historical.T003.
VARIABLE_MAP
= {'Inland waterways freight transport': {'mode': 'Shipping', 'vehicle': 'Inland'}, 'Pipelines transport': {'mode': 'Pipeline', 'vehicle': 'Pipeline'}, 'Rail freight transport': {'mode': 'Rail'}, 'Road freight transport': {'mode': 'Road'}, 'Road freight transport for hire and reward': {'mode': 'Road', 'operator': 'HIRE'}, 'Road freight transport on own account': {'mode': 'Road', 'operator': 'OWN'}, 'Total inland freight transport': {'mode': 'Inland'}}¶ Mapping from Variable to mode and vehicle_type dimensions.
-
item.historical.T003.
process
(df)[source]¶ Process data set T003.
Remove null values.
Convert units from Mt km / year to Gt km / year.
Lookup and assign “MODE” and “VEHICLE” dimensions based on “VARIABLE”, using VARIABLE_MAP.
Compute partial sums that exclude pipelines.
Concatenate the partial sums to the original data.
Sort.
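The partial-sum step can be sketched with a pandas groupby over the PARTIAL variables documented above; the REF_AREA, TIME_PERIOD, and Value column names are assumptions:

```python
import pandas as pd

PARTIAL = ["Rail freight transport", "Road freight transport",
           "Inland waterways freight transport"]

df = pd.DataFrame({
    "Variable": ["Rail freight transport", "Road freight transport",
                 "Inland waterways freight transport", "Pipelines transport"],
    "REF_AREA": ["CAN"] * 4,
    "TIME_PERIOD": [2000] * 4,
    "Value": [1.0, 2.0, 3.0, 9.0],
})

# Sum only the PARTIAL variables, i.e. excluding "Pipelines transport".
partial_sum = (
    df[df["Variable"].isin(PARTIAL)]
    .groupby(["REF_AREA", "TIME_PERIOD"], as_index=False)["Value"].sum()
    .assign(MODE="Inland ex. pipeline")
)
```

The real process() then concatenates these rows to the original data.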
T003:
name: Inland Freight Transport
fetch:
type: SDMX
source: OECD
resource_id: ITF_GOODS_TRANSPORT
key: .T-GOODS-TOT-INLD+T-GOODS-RL-TOT+T-GOODS-RD-TOT+T-GOODS-RD-REW+T-GOODS-RD-OWN+T-GOODS-IW-TOT+T-GOODS-PP-TOT
validate: false
T004¶
Data cleaning code and configuration for T004.
Notes:
The input data does not express the units, which are single vehicles.
Todo
The input data have labels like “- LPG” in the “Fuel type” column, with the hyphen possibly indicating a hierarchical code list. Find a reference to this code list.
The code currently uses some inconsistent labels, such as:
“Liquid-Bio” (no spaces) vs. “Liquid - Fossil” (spaces).
“Natural Gas Vehicle” vs. “Conventional” (word “Vehicle” is omitted).
Fix these after PR #62 is merged by using code lists for these dimensions.
Add code to fetch this source automatically. It does not have a clearly-defined API.
Capture and preserve the metadata provided by the UNECE data interface.
-
item.historical.T004.
COLUMNS
= {'drop': ['Frequency']}¶ Columns to drop from the raw data.
-
item.historical.T004.
COMMON_DIMS
= {'fleet': 'NEW', 'mode': 'Road', 'source': 'UNECE', 'unit': 'vehicle', 'variable': 'Sales'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T004.
CSV_SEP
= ';'¶ Separator character for
pandas.read_csv()
.
-
item.historical.T004.
DATAFLOW
= 'SALES'¶ iTEM data flow matching the data from this source.
-
item.historical.T004.
MAP
= {'Fuel type': {'- Bi-fuel vehicles': ('IC', 'BIOFUEL'), '- Biodiesel': ('IC', 'BIODIESEL'), '- Bioethanol': ('IC', 'BIOETH'), '- Compressed natural gas (CNG)': ('IC', 'CNG'), '- Diesel (excluding hybrids)': ('NONHYB', 'DIESEL'), '- Electricity': ('BEV', 'ELEC'), '- Hybrid electric-diesel': ('HYBRID', 'DIESEL'), '- Hybrid electric-petrol': ('HYBRID', 'PETROL'), '- Hydrogen and fuel cells': ('FC', 'H2'), '- LPG': ('IC', 'LPG'), '- Liquefied natural gas (LNG)': ('IC', 'LNG'), '- Petrol (excluding hybrids)': ('NONHYB', 'GASOLINE'), '- Plug-in hybrid diesel-electric': ('PHEV-G', 'ELEC'), '- Plug-in hybrid petrol-electric': ('PHEV-D', 'ELEC'), 'Alternative (total)': ('Alternative', 'Alternative'), 'Diesel': ('IC', 'DIESEL'), 'Petrol': ('IC', 'GASOLINE'), 'Total': ('_T', '_T'), '_dims': ('TECHNOLOGY', 'FUEL')}, 'Type of vehicle': {'New light goods vehicles': ('F', 'Light Truck'), 'New lorries (vehicle wt over 3500 kg)': ('F', 'Heavy Truck'), 'New motor coaches, buses and trolley buses': ('F', 'Bus'), 'New passenger cars': ('P', 'LDV'), 'New road tractors': ('F', 'Medium Truck'), '_dims': ('SERVICE', 'VEHICLE')}}¶ Mapping between existing values and values to be assigned.
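The mechanism of such a mapping can be sketched as follows: the special '_dims' key names the target dimensions, and every other key maps an input label to a tuple of values, one per dimension (entries excerpted from MAP above; the assignment loop is an illustration, not the package's actual code):

```python
import pandas as pd

MAP = {
    "Fuel type": {
        "_dims": ("TECHNOLOGY", "FUEL"),
        "Diesel": ("IC", "DIESEL"),
        "Petrol": ("IC", "GASOLINE"),
    }
}

df = pd.DataFrame({"Fuel type": ["Diesel", "Petrol"]})
for column, mapping in MAP.items():
    # Assign one new column per dimension named in '_dims'.
    for i, dim in enumerate(mapping["_dims"]):
        df[dim] = [mapping[label][i] for label in df[column]]
```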
T004:
url: https://w3.unece.org/PXWeb2015/pxweb/en/STAT/STAT__40-TRTRANS__03-TRRoadFleet/08_en_TRRoadNewVehF_r.px/?rxid=674effaa-3926-4d2e-9d6d-abfd7dd196b8
name: New Road Vehicle Registrations by Vehicle Category and Fuel Type
T005¶
-
item.historical.T005.
COLUMNS
= {'drop': ['IPCC_description', 'IPCC-Annex', 'Name', 'World Region']}¶ Columns to drop from the raw data.
-
item.historical.T005.
COMMON_DIMS
= {'fuel': '_T', 'lca_scope': 'TTW', 'pollutant': 'CO2', 'service': '_T', 'source': 'JRC', 'technology': '_T', 'variable': 'Emissions', 'vehicle': '_T'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T005.
DATAFLOW
= 'EMISSIONS'¶ iTEM data flow matching the data from this source.
-
item.historical.T005.
MAP_MODE
= {'1.A.3.a': 'Air', '1.A.3.b': 'Road', '1.A.3.c': 'Rail', '1.A.3.d': 'Water', '1.A.3.e': 'Other'}¶ Map from IPCC emissions category codes to iTEM
CL_MODE
values. The actual descriptions appear in the IPCC_description
column, which is discarded:
1.A.3.a: Civil Aviation
1.A.3.b: Road Transportation
1.A.3.c: Railways
1.A.3.d: Water-borne Navigation
1.A.3.e: Other Transportation
-
item.historical.T005.
process
(df)[source]¶ Process T005.
Select only measures with IDs beginning “1.A.3”.
Map from the IPCC emissions category (e.g. “1.A.3.a”) to mode (e.g. “Air”); see map_mode().
Melt from wide to long format.
Drop NA values.
Use “_X” (not allocated/unspecified) as the region for international shipping and aviation.
Convert from Mt/a to Gt/a.
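The selection and mapping steps can be sketched with pandas, using the MAP_MODE values documented above; ‘IPCC’ as the category-code column name is an assumption:

```python
import pandas as pd

MAP_MODE = {"1.A.3.a": "Air", "1.A.3.b": "Road", "1.A.3.c": "Rail",
            "1.A.3.d": "Water", "1.A.3.e": "Other"}

df = pd.DataFrame({"IPCC": ["1.A.3.a", "1.A.3.b", "1.B.2"],
                   "Value": [1.0, 2.0, 3.0]})

# Select only measures with IDs beginning "1.A.3" (transport categories).
df = df[df["IPCC"].str.startswith("1.A.3")].copy()

# Map category codes to iTEM modes, e.g. "1.A.3.a" → "Air".
df["MODE"] = df["IPCC"].map(MAP_MODE)
```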
T005:
name: Passenger Road Vehicle Fleet and rate per thousand inhabitants by Vehicle Category
fetch:
type: OpenKAPSARC
dataset_id: passenger-road-vehicle-fleet-and-rate-per-thousand-inhabitants-by-vehicle-catego
url: https://datasource.kapsarc.org/explore/dataset/passenger-road-vehicle-fleet-and-rate-per-thousand-inhabitants-by-vehicle-catego
T006¶
-
item.historical.T006.
COLUMNS
= {'drop': ['Frequency', 'Measure']}¶ Columns to drop from the raw data.
-
item.historical.T006.
COMMON_DIMS
= {'automation': '_T', 'operator': '_T', 'service': 'F', 'source': 'Eurostat', 'technology': '_T', 'unit': 'percent', 'variable': 'Activity, share of volume'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T006.
CSV_SEP
= ';'¶ Separator character for
pandas.read_csv()
.
-
item.historical.T006.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
T006:
name: Passenger Transport
fetch:
type: OpenKAPSARC
dataset_id: passenger-transport
url: https://datasource.kapsarc.org/explore/dataset/passenger-transport/
T007¶
-
item.historical.T007.
COLUMNS
= {'drop': ['Frequency', 'Measure']}¶ Columns to drop from the raw data.
-
item.historical.T007.
COMMON_DIMS
= {'automation': '_T', 'operator': '_T', 'service': 'P', 'source': 'Eurostat', 'technology': '_T', 'unit': 'percent', 'variable': 'Activity, share of distance'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T007.
CSV_SEP
= ';'¶ Separator character for
pandas.read_csv()
.
-
item.historical.T007.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
T007:
name: New Passenger Car Registrations by Fuel type
fetch:
type: OpenKAPSARC
dataset_id: new-passenger-car-registrations-by-fuel-type
url: https://datasource.kapsarc.org/explore/dataset/new-passenger-car-registrations-by-fuel-type/
T008¶
-
item.historical.T008.
COLUMNS
= {'drop': ['Frequency']}¶ Columns to drop from the raw data.
-
item.historical.T008.
COMMON_DIMS
= {'fuel': '_T', 'mode': 'Road', 'service': 'Passenger', 'source': 'UNECE', 'technology': '_T', 'variable': 'Stock'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T008.
CSV_SEP
= ';'¶ Separator character for
pandas.read_csv()
.
-
item.historical.T008.
DATAFLOW
= 'STOCK'¶ iTEM data flow matching the data from this source.
T008:
name: New Road Vehicle Registrations by Vehicle Category and Fuel type
fetch:
type: OpenKAPSARC
dataset_id: new-road-vehicle-registrations-by-vehicle-category-and-fuel-type
url: https://datasource.kapsarc.org/explore/dataset/new-road-vehicle-registrations-by-vehicle-category-and-fuel-type/
T009¶
Data cleaning code and configuration for T009.
-
item.historical.T009.
DATAFLOW
= 'STOCK'¶ iTEM data flow matching the data from this source.
-
item.historical.T009.
FETCH
= True¶ If True, fetch data directly from the source; otherwise, use a cached copy.
-
item.historical.T009.
map_service
(value)[source]¶ Determine ‘service’ dimension based on a vehicle type.
-
item.historical.T009.
process
(df)[source]¶ Process input data for data set T009.
Assign “SERVICE” based on “VEHICLE” values.
Assign “TECHNOLOGY” by stripping “- ” prefix from “fuel_type_name” values.
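The prefix-stripping step can be sketched with pandas string methods; the fuel_type_name label is from the description above, and the sample values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"fuel_type_name": ["- LPG", "- Bioethanol", "Diesel"]})

# Strip the leading "- " prefix, where present, to form "TECHNOLOGY".
df["TECHNOLOGY"] = df["fuel_type_name"].str.replace("^- ", "", regex=True)
```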
T009:
name: Road Vehicle Fleet by Vehicle Category and Fuel Type
fetch:
type: OpenKAPSARC
dataset_id: road-vehicle-fleet-by-vehicle-category-and-fuel-type
url: https://datasource.kapsarc.org/explore/dataset/road-vehicle-fleet-by-vehicle-category-and-fuel-type/
T010¶
Data cleaning code and configuration for T010.
-
item.historical.T010.
COLUMNS
= {'country_name': 'REGIONS/COUNTRIES'}¶ Column name to map to ISO 3166 alpha-3 codes.
-
item.historical.T010.
COMMON_DIMS
= {'mode': 'Road', 'service': 'Freight', 'source': 'International Organization of Motor Vehicle Manufacturers', 'technology': '_T', 'unit': '10^6 vehicle', 'variable': 'Stock', 'vehicle': '_T'}¶ Dimensions and attributes which do not vary across this data set.
NB “_T”, the code for “Total”, is used for the ‘TECHNOLOGY’ and ‘VEHICLE’ dimensions, since this data set provides totals.
-
item.historical.T010.
DATAFLOW
= 'STOCK'¶ iTEM data flow matching the data from this source.
-
item.historical.T010.
process
(df)[source]¶ Process data set T010.
Melt from wide to long format.
Remove the ‘,’ thousands separators from the values in the ‘VALUE’ column; convert to float.
Drop null values.
Convert units from 10³ vehicles to 10⁶ vehicles.
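The separator-stripping and conversion steps can be sketched with pandas, using the ‘VALUE’ column name from the description above and illustrative values:

```python
import pandas as pd

df = pd.DataFrame({"VALUE": ["12,345", "1,000", None]})

df = df.dropna(subset=["VALUE"])  # drop null values
# Remove ',' thousands separators; convert to float.
df["VALUE"] = df["VALUE"].str.replace(",", "", regex=False).astype(float)
```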
T010:
name: Volume of passenger transport relative to GDP
fetch:
type: OpenKAPSARC
dataset_id: volume-of-passenger-transport-relative-to-gdp
url: https://datasource.kapsarc.org/explore/dataset/volume-of-passenger-transport-relative-to-gdp/
T012¶
Data cleaning code and configuration for T012.
-
item.historical.T012.
COLUMNS
= {'country_name': 'Region, subregion, country or area *', 'drop': ['Index', 'Variant', 'Notes', 'Country code', 'Parent code']}¶ Column names:
- drop: to drop from the raw data.
- country_name: to map to ISO 3166 codes.
-
item.historical.T012.
COMMON_DIMS
= {'source': 'United Nations', 'unit': '10^6 people', 'variable': 'Population'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T012.
DATAFLOW
= 'POPULATION'¶ iTEM data flow matching the data from this source.
-
item.historical.T012.
process
(df)[source]¶ Process data set T012.
Select only rows with Type == "Country/Area"; then drop this column.
Replace “Channel Islands” (ISO 3166 numeric code 830, which does not exist) with 831 (Jersey), the larger of the two Channel Islands (compared to 832, Guernsey).
Melt from wide to long format.
Remove spaces from strings in the “Value” column; convert to numeric.
Drop null values.
Convert units from 10³ persons to 10⁶ persons.
T012:
name: Modal split of passenger transport
fetch:
type: OpenKAPSARC
dataset_id: modal-split-of-freight-transport
url: https://datasource.kapsarc.org/explore/dataset/modal-split-of-freight-transport/