Input data¶
Every input data set has a corresponding Python module to adjust the unprocessed data provided directly by the upstream data source to the common iTEM structure.
These modules are not invoked directly, but through the function process(), e.g.:

# Process upstream data for data set T009; return the results
process(9)

process() loads and makes use of dataset-specific configuration, checks, and additional code from the corresponding submodule, while automating common cleaning steps.
See:

- the function documentation for a complete description of these steps.
- the green [source] link next to each function (e.g. T001.process()) to access and inspect the source code for the dataset-specific cleaning steps.
This pattern reduces duplicated code in each dataset-specific submodule, while remaining flexible to upstream data formats.
HOWTO add upstream data sources or sets¶
1. Add the data set entry to sources.yaml.
2. Copy, rename, and modify an existing module, e.g. T012.py.
3. Extend the tests to ensure this data set is tested.
4. Update the docstrings in the code and this documentation.
Common code¶
-
item.historical.
COUNTRY_NAME
= {'azerbaidjan': 'AZE', 'bolivia (plurinational state of)': 'BOL', 'bosnia': 'BIH', 'bosnia-herzegovina': 'BIH', 'brunei': 'BRN', 'cape verde': 'CPV', 'china, hong kong sar': 'HKG', 'china, macao sar': 'MAC', 'china, taiwan province of china': 'TWN', 'congo kinshasa ': 'COD', 'congo_the democratic republic of the': 'COD', "cote d'ivoire": 'CIV', "dem. people's republic of korea": 'PRK', 'democratic republic of the congo': 'COD', 'former yugoslav republic of macedonia, the': 'MKD', 'germany (until 1990 former territory of the frg)': 'DEU', 'holy see': 'VAT', 'hong-kong': 'HKG', 'iran': 'IRN', 'iran (islamic republic of)': 'IRN', 'ivory coast': 'CIV', 'korea': 'KOR', 'libyan arab jamahiriya': 'LBY', 'macedonia': 'MKD', 'macedonia, the former yugoslav republic of': 'MKD', 'micronesia (fed. states of)': 'FSM', 'moldavia': 'MDA', 'montenegro, republic of': 'MNE', 'palestine': 'PSE', 'republic of korea': 'KOR', 'reunion': 'REU', 'russia': 'RUS', 'saint helena': 'SHN', 'serbia and montenegro': 'SCG', 'serbia, republic of': 'SRB', 'south korea': 'KOR', 'state of palestine': 'PSE', 'swaziland': 'SWZ', 'syria': 'SYR', 'taiwan_province of china': 'TWN', 'tanzania_united republic of': 'TZA', 'the former yugoslav republic of macedonia': 'MKD', 'united states virgin islands': 'VIR', 'venezuela (bolivarian republic of)': 'VEN', 'virgin islands_british': 'VGB', 'wallis and futuna islands': 'WLF'}¶ Non-ISO 3166 names that appear in 1 or more data sets. These are used in
iso_alpha_3()
to replace names before they are looked up using pycountry.
-
item.historical.
OUTPUT_PATH
= PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/transportenergy/envs/master/lib/python3.7/site-packages/item/data/historical/output')¶ Path for output from
process()
.
-
item.historical.
cache_results
(id_str: str, df: pandas.core.frame.DataFrame) → None[source]¶ Write df to
OUTPUT_PATH
in two file formats. The files written are:

- id_str-clean.csv, in long (previously ‘programming-friendly’ or ‘PF’) format, i.e. with all years or other time periods in a TIME_PERIOD column and one observation per row.
- id_str-clean-wide.csv, in wide (previously ‘user-friendly’ or ‘UF’) format, with one column per year/TIME_PERIOD. For convenience, this file has two additional columns:
  - NAME: the ISO 3166 name that corresponds to the alpha-3 code appearing in the REF_AREA column.
  - ITEM_REGION: the name of the iTEM region to which the data correspond.
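The difference between the two formats can be illustrated with pandas; the REF_AREA and TIME_PERIOD column names are from the description above, while VALUE as the observation column is an assumption:

```python
import pandas as pd

# Long format, as in id_str-clean.csv: one observation per row, all time
# periods in a single TIME_PERIOD column.
long = pd.DataFrame({
    "REF_AREA": ["CAN", "CAN", "USA", "USA"],
    "TIME_PERIOD": [2000, 2001, 2000, 2001],
    "VALUE": [1.0, 1.1, 9.0, 9.5],
})

# Wide format, as in id_str-clean-wide.csv: one column per TIME_PERIOD.
wide = (
    long.pivot(index="REF_AREA", columns="TIME_PERIOD", values="VALUE")
    .reset_index()
)
```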
-
item.historical.
dim_id_for_column_name
(name: str) → str[source]¶ Return a dimension ID in the
HISTORICAL
structure for a column name.
-
item.historical.
fetch_source
(id: Union[int, str], use_cache: bool = True) → pathlib.Path[source]¶ Fetch and cache data from source id.
The remote data is fetched using the API for the particular source. A network connection is required.
-
item.historical.
fill_values_for_dataflow
(dataflow_id: Optional[str]) → Dict[str, str][source]¶ Return a dictionary of fill values for the data flow dataflow_id.
-
item.historical.
get_area_name_map
() → Dict[str, str][source]¶ Return a mapping from lower-case names in
CL_AREA
to IDs.
-
item.historical.
get_country_name
(code: str) → str[source]¶ Return the country name for a country’s ISO 3166 alpha-3 code.
-
item.historical.
get_item_region
(code: str) → str[source]¶ Return iTEM region for a country’s ISO 3166 alpha-3 code, or “N/A”.
-
item.historical.
input_file
(id: int)[source]¶ Return the path to a cached, raw input data file for data source id.
CSV files are located in the ‘historical input’ data path. If more than one file has a name beginning with “T{id}”, the last sorted file is returned.
-
item.historical.
iso_alpha_3
(name: str) → str[source]¶ Return ISO 3166 alpha-3 code for a country name.
- Parameters
name (str) – Country name. This is looked up in the pycountry ‘name’, ‘official_name’, or ‘common_name’ field. Replacements from
COUNTRY_NAME
are applied.
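A minimal sketch of this lookup order: COUNTRY_NAME replacements are applied first, then the remaining names are looked up. The real function uses the pycountry package for the second step; a tiny inline table stands in for it here so the example is self-contained:

```python
# Excerpt of the COUNTRY_NAME replacements shown above.
COUNTRY_NAME = {"russia": "RUS", "south korea": "KOR"}

# Hypothetical stand-in for pycountry's 'name' / 'official_name' /
# 'common_name' lookup used by the real iso_alpha_3().
PYCOUNTRY_STANDIN = {"france": "FRA", "canada": "CAN"}


def iso_alpha_3_sketch(name: str) -> str:
    key = name.lower()
    if key in COUNTRY_NAME:  # non-ISO names handled by replacement
        return COUNTRY_NAME[key]
    return PYCOUNTRY_STANDIN[key]  # pycountry lookup in the real code
```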
-
item.historical.
process
(id: Union[int, str]) → pandas.core.frame.DataFrame[source]¶ Process a data set given its id.
Performs the following common processing steps:
1. Fetch the unprocessed upstream data, or load it from cache.
2. Load a module defining dataset-specific processing steps. This module is in a file named e.g. T001.py.
3. Call the dataset’s (optional) check() method. This method receives the input data frame as an argument, and can make one or more assertions to ensure the data is in the expected format. If assert False or any other exception occurs here, processing fails.
4. Drop columns in the dataset’s (optional) COLUMNS['drop'] list.
5. Call the dataset-specific (required) process() method. This method receives the data frame from step (4), performs any additional processing, and returns a data frame.
6. If the REF_AREA dimension is not already populated, assign ISO 3166 alpha-3 codes, using a column containing country names: either COLUMNS['country_name'] or the default, ‘Country’. See iso_alpha_3().
7. Assign values to other dimensions:
   - From the dataset’s (optional) DATAFLOW variable. This variable indicates one of the data flows and corresponding data structure definitions (DSDs) in the iTEM data structures. For each dimension in the “full” (HISTORICAL) DSD but not in this dataflow, fill in with “_Z” (not applicable) values.
   - From the dataset’s (optional) COMMON_DIMS dict.
8. Order columns according to the HISTORICAL data structure.
9. Check for missing values or missing dimension labels. A fully cleaned data set has none.
10. Output data to two files. See cache_results().
- Parameters
id (int) – Data source id.
- Returns
The processed data.
- Return type
pandas.DataFrame
-
item.historical.
source_str
(id: Union[int, str]) → str[source]¶ Return the canonical string name (e.g.
"T001"
) for a data source.
-
item.historical.
REGION
¶ Map from ISO 3166 alpha-3 code to iTEM region name.
-
item.historical.
SOURCES
← contents of sources.yaml¶ The current version of the file is always accessible at https://github.com/transportenergy/metadata/blob/master/historical/sources.yaml
T000¶
Data cleaning code and configuration for T000.
-
item.historical.T000.
COLUMNS
= {'drop': ['COUNTRY', 'VARIABLE', 'YEAR', 'Unit', 'Unit Code', 'PowerCode Code', 'PowerCode', 'Reference Period Code', 'Reference Period', 'Flag Codes', 'Flags']}¶ Columns to drop from the raw data.
-
item.historical.T000.
COMMON_DIMS
= {'automation': '_T', 'operator': '_T', 'service': 'P', 'source': 'International Transport Forum', 'technology': '_T', 'unit': '10^9 passenger-km / yr', 'variable': 'Activity'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T000.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
-
item.historical.T000.
mode_and_vehicle_type
(variable_name)[source]¶ Determine ‘mode’ and ‘vehicle type’ from ‘variable’.
The rules implemented are:
Variable                                       Mode  Vehicle type
Rail passenger transport                       Rail  All
Road passenger transport by buses and coaches  Road  Bus
Road passenger transport by passenger cars     Road  LDV
Total inland passenger transport               All   All
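The rules in the table above can be sketched as follows; the actual implementation of mode_and_vehicle_type() may differ in structure:

```python
def mode_and_vehicle_type_sketch(variable_name: str):
    """Return (mode, vehicle type) per the rules table above."""
    if variable_name == "Rail passenger transport":
        return "Rail", "All"
    elif variable_name == "Road passenger transport by buses and coaches":
        return "Road", "Bus"
    elif variable_name == "Road passenger transport by passenger cars":
        return "Road", "LDV"
    elif variable_name == "Total inland passenger transport":
        return "All", "All"
    raise ValueError(variable_name)
```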
T000:
url: https://stats.oecd.org/index.aspx?queryid=79863
name: "Passenger transport: Inland passenger transport"
fetch:
type: SDMX
source: OECD
resource_id: ITF_PASSENGER_TRANSPORT
key: .T-PASS-TOT-INLD+T-PASS-RL-TOT+T-PASS-RD-TOT+T-PASS-RD-CAR+T-PASS-RD-BUS
validate: false
T001¶
Data cleaning code and configuration for T001.
This module:
Detects and corrects #32, a data error in the upstream source where China observation values for years 1990 to 2001 inclusive are too low by 2 orders of magnitude (see also #57).
-
item.historical.T001.
COLUMNS
= {'drop': ['COUNTRY', 'VARIABLE', 'YEAR', 'Flag Codes', 'Flags', 'PowerCode Code', 'PowerCode', 'Reference Period Code', 'Reference Period', 'Unit Code', 'Unit']}¶ Columns to drop from the raw data.
-
item.historical.T001.
COMMON_DIMS
= {'automation': '_T', 'mode': 'Shipping', 'operator': '_T', 'service': 'F', 'source': 'International Transport Forum', 'technology': '_T', 'variable': 'Activity', 'vehicle': 'Coastal'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T001.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
-
item.historical.T001.
FIX_32
= False¶ Flag for whether #32 is detected by check() and should be fixed by process().
-
item.historical.T001.
process
(df)[source]¶ Process data set T001.
Drop null values.
Convert from Mt km / year to Gt km / year.
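The two steps above, sketched with pandas: 1 Gt = 1000 Mt, so the unit conversion is a division by 1000. ‘Value’ as the observation column name is an assumption:

```python
import pandas as pd

df = pd.DataFrame({"Value": [1500.0, None, 250.0]})
df = df.dropna(subset=["Value"])    # drop null values
df["Value"] = df["Value"] / 1000.0  # Mt km / year → Gt km / year
```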
T001:
name: Coastal Transport
fetch:
type: SDMX
source: OECD
resource_id: ITF_GOODS_TRANSPORT
key: .T-SEA-CAB
validate: false
T002¶
Data cleaning code and configuration for T002.
-
item.historical.T002.
COLUMNS
= {'drop': ['COUNTRY', 'VARIABLE', 'YEAR', 'Unit Code', 'PowerCode Code', 'PowerCode', 'Reference Period Code', 'Reference Period', 'Flag Codes', 'Flags']}¶ Columns to drop from the raw data.
-
item.historical.T002.
COMMON_DIMS
= {'automation': '_T', 'fuel': '_T', 'operator': '_T', 'service': 'Freight', 'source': 'International Transport Forum', 'technology': '_T', 'vehicle': 'Container'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T002.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
T002:
name: Container Transport
fetch:
type: SDMX
source: OECD
resource_id: ITF_GOODS_TRANSPORT
key: .T-CONT-RL-TEU+T-CONT-RL-TON+T-CONT-SEA-TEU+T-CONT-SEA-TON
validate: false
T003¶
Data cleaning code and configuration for T003.
The input data contains the variable names in VARIABLE_MAP. A new sum, mode=”Inland ex. pipeline”, is computed as the sum of the variables in PARTIAL, i.e. excluding “Pipelines transport”.
-
item.historical.T003.
COLUMNS
= {'drop': ['COUNTRY', 'VARIABLE', 'YEAR', 'Flag Codes', 'Flags', 'PowerCode', 'PowerCode Code', 'Reference Period Code', 'Reference Period', 'Unit Code', 'Unit']}¶ Columns to drop from the raw data.
-
item.historical.T003.
COMMON_DIMS
= {'automation': '_T', 'service': 'F', 'source': 'International Transport Forum', 'technology': '_T', 'unit': 'Gt km / year', 'variable': 'Activity'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T003.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
-
item.historical.T003.
PARTIAL
= ['Rail freight transport', 'Road freight transport', 'Inland waterways freight transport']¶ Variables to include in a partial sum.
-
item.historical.T003.
VARIABLE_MAP
= {'Inland waterways freight transport': {'mode': 'Shipping', 'vehicle': 'Inland'}, 'Pipelines transport': {'mode': 'Pipeline', 'vehicle': 'Pipeline'}, 'Rail freight transport': {'mode': 'Rail'}, 'Road freight transport': {'mode': 'Road'}, 'Road freight transport for hire and reward': {'mode': 'Road', 'operator': 'HIRE'}, 'Road freight transport on own account': {'mode': 'Road', 'operator': 'OWN'}, 'Total inland freight transport': {'mode': 'Inland'}}¶ Mapping from Variable to mode and vehicle_type dimensions.
-
item.historical.T003.
process
(df)[source]¶ Process data set T003.
Remove null values.
Convert units from Mt km / year to Gt km / year.
Lookup and assign “MODE” and “VEHICLE” dimensions based on “VARIABLE”, using VARIABLE_MAP.
Compute partial sums that exclude pipelines.
Concatenate the partial sums to the original data.
Sort.
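The partial-sum step can be sketched with a pandas groupby over the PARTIAL variables documented above; the REF_AREA, TIME_PERIOD, and Value column names are assumptions:

```python
import pandas as pd

PARTIAL = ["Rail freight transport", "Road freight transport",
           "Inland waterways freight transport"]

df = pd.DataFrame({
    "Variable": ["Rail freight transport", "Road freight transport",
                 "Inland waterways freight transport", "Pipelines transport"],
    "REF_AREA": ["CAN"] * 4,
    "TIME_PERIOD": [2000] * 4,
    "Value": [1.0, 2.0, 3.0, 9.0],
})

# Sum only the PARTIAL variables, i.e. excluding "Pipelines transport".
partial_sum = (
    df[df["Variable"].isin(PARTIAL)]
    .groupby(["REF_AREA", "TIME_PERIOD"], as_index=False)["Value"].sum()
    .assign(MODE="Inland ex. pipeline")
)
```

The real process() then concatenates these rows to the original data.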
T003:
name: Inland Freight Transport
fetch:
type: SDMX
source: OECD
resource_id: ITF_GOODS_TRANSPORT
key: .T-GOODS-TOT-INLD+T-GOODS-RL-TOT+T-GOODS-RD-TOT+T-GOODS-RD-REW+T-GOODS-RD-OWN+T-GOODS-IW-TOT+T-GOODS-PP-TOT
validate: false
T004¶
Data cleaning code and configuration for T004.
Notes:
The input data does not express the units, which are single vehicles.
Todo
The input data have labels like “- LPG” in the “Fuel type” column, with the hyphen possibly indicating a hierarchical code list. Find a reference to this code list.
The code currently uses some inconsistent labels, such as:
“Liquid-Bio” (no spaces) vs. “Liquid - Fossil” (spaces).
“Natural Gas Vehicle” vs. “Conventional” (word “Vehicle” is omitted).
Fix these after PR #62 is merged by using code lists for these dimensions.
Add code to fetch this source automatically. It does not have a clearly-defined API.
Capture and preserve the metadata provided by the UNECE data interface.
-
item.historical.T004.
COLUMNS
= {'drop': ['Frequency']}¶ Columns to drop from the raw data.
-
item.historical.T004.
COMMON_DIMS
= {'fleet': 'NEW', 'mode': 'Road', 'source': 'UNECE', 'unit': 'vehicle', 'variable': 'Sales'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T004.
CSV_SEP
= ';'¶ Separator character for
pandas.read_csv()
.
-
item.historical.T004.
DATAFLOW
= 'SALES'¶ iTEM data flow matching the data from this source.
-
item.historical.T004.
MAP
= {'Fuel type': {'- Bi-fuel vehicles': ('IC', 'BIOFUEL'), '- Biodiesel': ('IC', 'BIODIESEL'), '- Bioethanol': ('IC', 'BIOETH'), '- Compressed natural gas (CNG)': ('IC', 'CNG'), '- Diesel (excluding hybrids)': ('NONHYB', 'DIESEL'), '- Electricity': ('BEV', 'ELEC'), '- Hybrid electric-diesel': ('HYBRID', 'DIESEL'), '- Hybrid electric-petrol': ('HYBRID', 'PETROL'), '- Hydrogen and fuel cells': ('FC', 'H2'), '- LPG': ('IC', 'LPG'), '- Liquefied natural gas (LNG)': ('IC', 'LNG'), '- Petrol (excluding hybrids)': ('NONHYB', 'GASOLINE'), '- Plug-in hybrid diesel-electric': ('PHEV-G', 'ELEC'), '- Plug-in hybrid petrol-electric': ('PHEV-D', 'ELEC'), 'Alternative (total)': ('Alternative', 'Alternative'), 'Diesel': ('IC', 'DIESEL'), 'Petrol': ('IC', 'GASOLINE'), 'Total': ('_T', '_T'), '_dims': ('TECHNOLOGY', 'FUEL')}, 'Type of vehicle': {'New light goods vehicles': ('F', 'Light Truck'), 'New lorries (vehicle wt over 3500 kg)': ('F', 'Heavy Truck'), 'New motor coaches, buses and trolley buses': ('F', 'Bus'), 'New passenger cars': ('P', 'LDV'), 'New road tractors': ('F', 'Medium Truck'), '_dims': ('SERVICE', 'VEHICLE')}}¶ Mapping between existing values and values to be assigned.
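The mechanism of such a mapping can be sketched as follows: the special '_dims' key names the target dimensions, and every other key maps an input label to a tuple of values, one per dimension (entries excerpted from MAP above; the assignment loop is an illustration, not the package's actual code):

```python
import pandas as pd

MAP = {
    "Fuel type": {
        "_dims": ("TECHNOLOGY", "FUEL"),
        "Diesel": ("IC", "DIESEL"),
        "Petrol": ("IC", "GASOLINE"),
    }
}

df = pd.DataFrame({"Fuel type": ["Diesel", "Petrol"]})
for column, mapping in MAP.items():
    # Assign one new column per dimension named in '_dims'.
    for i, dim in enumerate(mapping["_dims"]):
        df[dim] = [mapping[label][i] for label in df[column]]
```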
T004:
url: https://w3.unece.org/PXWeb2015/pxweb/en/STAT/STAT__40-TRTRANS__03-TRRoadFleet/08_en_TRRoadNewVehF_r.px/?rxid=674effaa-3926-4d2e-9d6d-abfd7dd196b8
name: New Road Vehicle Registrations by Vehicle Category and Fuel Type
T005¶
-
item.historical.T005.
COLUMNS
= {'drop': ['IPCC_description', 'IPCC-Annex', 'Name', 'World Region']}¶ Columns to drop from the raw data.
-
item.historical.T005.
COMMON_DIMS
= {'fuel': '_T', 'lca_scope': 'TTW', 'pollutant': 'CO2', 'service': '_T', 'source': 'JRC', 'technology': '_T', 'variable': 'Emissions', 'vehicle': '_T'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T005.
DATAFLOW
= 'EMISSIONS'¶ iTEM data flow matching the data from this source.
-
item.historical.T005.
MAP_MODE
= {'1.A.3.a': 'Air', '1.A.3.b': 'Road', '1.A.3.c': 'Rail', '1.A.3.d': 'Water', '1.A.3.e': 'Other'}¶ Map from IPCC emissions category codes to iTEM
CL_MODE
values. The actual descriptions appear in the IPCC_description
column, which is discarded:
1.A.3.a: Civil Aviation
1.A.3.b: Road Transportation
1.A.3.c: Railways
1.A.3.d: Water-borne Navigation
1.A.3.e: Other Transportation
-
item.historical.T005.
process
(df)[source]¶ Process T005.
Select only measures with IDs beginning “1.A.3”.
Map from the IPCC emissions category (e.g. “1.A.3.a”) to mode (e.g. “Air”); see map_mode().
Melt from wide to long format.
Drop NA values.
Use “_X” (not allocated/unspecified) as the region for international shipping and aviation.
Convert from Mt/a to Gt/a.
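The selection and mapping steps can be sketched with pandas, using the MAP_MODE values documented above; ‘IPCC’ as the category-code column name is an assumption:

```python
import pandas as pd

MAP_MODE = {"1.A.3.a": "Air", "1.A.3.b": "Road", "1.A.3.c": "Rail",
            "1.A.3.d": "Water", "1.A.3.e": "Other"}

df = pd.DataFrame({"IPCC": ["1.A.3.a", "1.A.3.b", "1.B.2"],
                   "Value": [1.0, 2.0, 3.0]})

# Select only measures with IDs beginning "1.A.3" (transport categories).
df = df[df["IPCC"].str.startswith("1.A.3")].copy()

# Map category codes to iTEM modes, e.g. "1.A.3.a" → "Air".
df["MODE"] = df["IPCC"].map(MAP_MODE)
```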
T005:
name: Passenger Road Vehicle Fleet and rate per thousand inhabitants by Vehicle Category
fetch:
type: OpenKAPSARC
dataset_id: passenger-road-vehicle-fleet-and-rate-per-thousand-inhabitants-by-vehicle-catego
url: https://datasource.kapsarc.org/explore/dataset/passenger-road-vehicle-fleet-and-rate-per-thousand-inhabitants-by-vehicle-catego
T006¶
-
item.historical.T006.
COLUMNS
= {'drop': ['Frequency', 'Measure']}¶ Columns to drop from the raw data.
-
item.historical.T006.
COMMON_DIMS
= {'automation': '_T', 'operator': '_T', 'service': 'F', 'source': 'Eurostat', 'technology': '_T', 'unit': 'percent', 'variable': 'Activity, share of volume'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T006.
CSV_SEP
= ';'¶ Separator character for
pandas.read_csv()
.
-
item.historical.T006.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
T006:
name: Passenger Transport
fetch:
type: OpenKAPSARC
dataset_id: passenger-transport
url: https://datasource.kapsarc.org/explore/dataset/passenger-transport/
T007¶
-
item.historical.T007.
COLUMNS
= {'drop': ['Frequency', 'Measure']}¶ Columns to drop from the raw data.
-
item.historical.T007.
COMMON_DIMS
= {'automation': '_T', 'operator': '_T', 'service': 'P', 'source': 'Eurostat', 'technology': '_T', 'unit': 'percent', 'variable': 'Activity, share of distance'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T007.
CSV_SEP
= ';'¶ Separator character for
pandas.read_csv()
.
-
item.historical.T007.
DATAFLOW
= 'ACTIVITY'¶ iTEM data flow matching the data from this source.
T007:
name: New Passenger Car Registrations by Fuel type
fetch:
type: OpenKAPSARC
dataset_id: new-passenger-car-registrations-by-fuel-type
url: https://datasource.kapsarc.org/explore/dataset/new-passenger-car-registrations-by-fuel-type/
T008¶
-
item.historical.T008.
COLUMNS
= {'drop': ['Frequency']}¶ Columns to drop from the raw data.
-
item.historical.T008.
COMMON_DIMS
= {'fuel': '_T', 'mode': 'Road', 'service': 'Passenger', 'source': 'UNECE', 'technology': '_T', 'variable': 'Stock'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T008.
CSV_SEP
= ';'¶ Separator character for
pandas.read_csv()
.
-
item.historical.T008.
DATAFLOW
= 'STOCK'¶ iTEM data flow matching the data from this source.
T008:
name: New Road Vehicle Registrations by Vehicle Category and Fuel type
fetch:
type: OpenKAPSARC
dataset_id: new-road-vehicle-registrations-by-vehicle-category-and-fuel-type
url: https://datasource.kapsarc.org/explore/dataset/new-road-vehicle-registrations-by-vehicle-category-and-fuel-type/
T009¶
Data cleaning code and configuration for T009.
-
item.historical.T009.
DATAFLOW
= 'STOCK'¶ iTEM data flow matching the data from this source.
-
item.historical.T009.
FETCH
= True¶ If True, fetch data directly from the source; otherwise, use a cached copy.
-
item.historical.T009.
map_service
(value)[source]¶ Determine ‘service’ dimension based on a vehicle type.
-
item.historical.T009.
process
(df)[source]¶ Process input data for data set T009.
Assign “SERVICE” based on “VEHICLE” values.
Assign “TECHNOLOGY” by stripping “- ” prefix from “fuel_type_name” values.
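The prefix-stripping step can be sketched with pandas string methods; the fuel_type_name label is from the description above, and the sample values are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"fuel_type_name": ["- LPG", "- Bioethanol", "Diesel"]})

# Strip the leading "- " prefix, where present, to form "TECHNOLOGY".
df["TECHNOLOGY"] = df["fuel_type_name"].str.replace("^- ", "", regex=True)
```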
T009:
name: Road Vehicle Fleet by Vehicle Category and Fuel Type
fetch:
type: OpenKAPSARC
dataset_id: road-vehicle-fleet-by-vehicle-category-and-fuel-type
url: https://datasource.kapsarc.org/explore/dataset/road-vehicle-fleet-by-vehicle-category-and-fuel-type/
T010¶
Data cleaning code and configuration for T010.
-
item.historical.T010.
COLUMNS
= {'country_name': 'REGIONS/COUNTRIES'}¶ Column name to map to ISO 3166 alpha-3 codes.
-
item.historical.T010.
COMMON_DIMS
= {'mode': 'Road', 'service': 'Freight', 'source': 'International Organization of Motor Vehicle Manufacturers', 'technology': '_T', 'unit': '10^6 vehicle', 'variable': 'Stock', 'vehicle': '_T'}¶ Dimensions and attributes which do not vary across this data set.
NB “_T”, the code for “Total”, is used for the ‘TECHNOLOGY’ and ‘VEHICLE’ dimensions, since this data set provides totals.
-
item.historical.T010.
DATAFLOW
= 'STOCK'¶ iTEM data flow matching the data from this source.
-
item.historical.T010.
process
(df)[source]¶ Process data set T010.
Melt from wide to long format.
Remove the ‘,’ thousands separators from the values in the ‘VALUE’ column; convert to float.
Drop null values.
Convert units from 10³ vehicles to 10⁶ vehicles.
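The separator-stripping and conversion steps can be sketched with pandas, using the ‘VALUE’ column name from the description above and illustrative values:

```python
import pandas as pd

df = pd.DataFrame({"VALUE": ["12,345", "1,000", None]})

df = df.dropna(subset=["VALUE"])  # drop null values
# Remove ',' thousands separators; convert to float.
df["VALUE"] = df["VALUE"].str.replace(",", "", regex=False).astype(float)
```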
T010:
name: Volume of passenger transport relative to GDP
fetch:
type: OpenKAPSARC
dataset_id: volume-of-passenger-transport-relative-to-gdp
url: https://datasource.kapsarc.org/explore/dataset/volume-of-passenger-transport-relative-to-gdp/
T012¶
Data cleaning code and configuration for T012.
-
item.historical.T012.
COLUMNS
= {'country_name': 'Region, subregion, country or area *', 'drop': ['Index', 'Variant', 'Notes', 'Country code', 'Parent code']}¶ Column names:
- drop: to drop from the raw data.
- country_name: to map to ISO 3166 codes.
-
item.historical.T012.
COMMON_DIMS
= {'source': 'United Nations', 'unit': '10^6 people', 'variable': 'Population'}¶ Dimensions and attributes which do not vary across this data set.
-
item.historical.T012.
DATAFLOW
= 'POPULATION'¶ iTEM data flow matching the data from this source.
-
item.historical.T012.
process
(df)[source]¶ Process data set T012.
Select only rows with Type == "Country/Area"; then drop this column.
Replace “Channel Islands” (ISO 3166 numeric code 830, which does not exist) with 831 (Jersey), the larger of the two Channel Islands (compared to 832, Guernsey).
Melt from wide to long format.
Remove spaces from strings in the “Value” column; convert to numeric.
Drop null values.
Convert units from 10³ persons to 10⁶ persons.
T012:
name: Modal split of passenger transport
fetch:
type: OpenKAPSARC
dataset_id: modal-split-of-freight-transport
url: https://datasource.kapsarc.org/explore/dataset/modal-split-of-freight-transport/