Introduction
The SDMX initiative sets standards to facilitate the exchange of statistical data and metadata using modern information technology. SDMX has also been published as an ISO International Standard (ISO 17369).
The operations in this API support SDMX 2.1 artefacts and implement version 2.1 of the SDMX Guidelines for the use of Web Services.
To make the most of this guide, a basic knowledge of XML and REST webservices is required.
The main elements are referred to as SDMX artefacts. Below are some terms used in SDMX and their definitions:
Dataset: a collection of related observations, organized according to a predefined structure
Data Structure Definition (DSD): metadata describing the structure and organization of a dataset, including the statistical concepts used within the dataset and the code lists attached to them
Dimensions: concepts that determine the dataset’s "physical" structure
Codelist: a code list is a predefined list from which some statistical coded concepts take their values. Each code list has the following properties:
identifier (it provides a unique identification within the set of code lists specified by a structural definitions maintenance agency);
name (also unique);
description (a description of the purpose of the code list); and
code value length (either an exact or a maximum number of characters and a type, i.e. numeric or alphanumeric).
Attributes: give additional information about the concepts used and do not affect the dataset structure itself
Dataflow: a structure which describes, categorizes and constrains the allowable content of a dataset that providers supply for different reference periods
Concept scheme: the descriptive information for an arrangement or division of concepts into groups based on characteristics, which the objects have in common. A concept scheme is a maintained list of concepts that are used in key family and metadata structure definitions (Definitions from EUROSTAT SDMX info space and OECD Glossary of statistical terms)
For in-depth details, see also the learning section of SDMX.org or the formal definition of the SDMX information model.
About versioned artefacts
All main artefacts can be versioned. In the current API, only the following structural artefacts are versioned: Code lists, Concept Schemes and Data structure definitions.
This means that each version of such an artefact is identified by a version number and is safe to copy or cache for further reference.
Other artefacts (Dataflow, ContentConstraint) always have the default version '1.0' and need to be requested again to pick up updates.
Retrieving Structural metadata artefacts
Starting from the online data code of a dataset of choice, it is possible to query the API for detailed metadata on this data
Looking up in the metadata of a dataset
in the SDMX Dataflow
Taking the dataset ISOC_CI_ID_H as an example, its main information is available in its Dataflow SDMX artefact.
This resource is not versioned, so 1.0 and latest can be used interchangeably.
The minimal response always contains the dataset label and the reference to the versioned DSD currently used by the dataset:
<s:Dataflow id="ISOC_CI_ID_H" urn="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=ESTAT:ISOC_CI_ID_H(1.0)" agencyID="ESTAT" version="1.0" isFinal="false">
  <c:Name xml:lang="de">Haushalte - Verfügbarkeit von Internet-Geräten</c:Name>
  <c:Name xml:lang="en">Households - devices to access the internet</c:Name>
  <c:Name xml:lang="fr">Ménages - dispositifs pour accéder à l'internet</c:Name>
  <s:Structure>
    <Ref id="ISOC_CI_ID_H" version="28.0" agencyID="ESTAT" package="datastructure" class="DataStructure"/>
  </s:Structure>
</s:Dataflow>
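As a minimal sketch of how such a response could be read, the Python snippet below extracts the label and the DSD reference from a shortened copy of the Dataflow above. The wrapper element `s:Dataflows`, the function name `describe_dataflow` and the namespace URIs bound to the `s:` and `c:` prefixes are assumptions (the URIs are the standard SDMX-ML 2.1 ones); a real response carries more envelope elements.

```python
import xml.etree.ElementTree as ET

# Assumed SDMX-ML 2.1 namespace URIs for the s: and c: prefixes.
STRUCTURE_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
COMMON_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the Dataflow above, wrapped in a hypothetical root
# element so that the namespace prefixes are declared.
SAMPLE_DATAFLOW = f"""
<s:Dataflows xmlns:s="{STRUCTURE_NS}" xmlns:c="{COMMON_NS}">
  <s:Dataflow id="ISOC_CI_ID_H" agencyID="ESTAT" version="1.0" isFinal="false">
    <c:Name xml:lang="en">Households - devices to access the internet</c:Name>
    <s:Structure>
      <Ref id="ISOC_CI_ID_H" version="28.0" agencyID="ESTAT" package="datastructure" class="DataStructure"/>
    </s:Structure>
  </s:Dataflow>
</s:Dataflows>
"""


def describe_dataflow(xml_text, lang="en"):
    """Return the label and the DSD reference of the first Dataflow found."""
    root = ET.fromstring(xml_text)
    flow = next(root.iter(f"{{{STRUCTURE_NS}}}Dataflow"))
    # Pick the name in the requested language, if present.
    label = next(
        (n.text for n in flow.findall(f"{{{COMMON_NS}}}Name") if n.get(XML_LANG) == lang),
        None,
    )
    structure = flow.find(f"{{{STRUCTURE_NS}}}Structure")
    # Match Ref on its local name so the lookup works with or without a namespace.
    ref = next(el for el in structure if el.tag.rsplit("}", 1)[-1] == "Ref")
    return {
        "label": label,
        "dsd_agency": ref.get("agencyID"),
        "dsd_id": ref.get("id"),
        "dsd_version": ref.get("version"),
    }


info = describe_dataflow(SAMPLE_DATAFLOW)
print(info["label"], info["dsd_id"], info["dsd_version"])
```

The returned agency, identifier and version are the three values needed to request the referenced data structure definition.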
Additionally, a set of annotations (omitted from the Dataflow XML above) provides further information:
Annotation type | Description | Value(s) (in AnnotationTitle or multi-lingual AnnotationText) |
---|---|---|
OBS_COUNT | Number of statistical observations in the dataset | 95814 |
OBS_PERIOD_OVERALL_OLDEST | Oldest TIME position reported in an observation | 2002 |
OBS_PERIOD_OVERALL_LATEST | Latest TIME position reported in an observation | 2014 |
UPDATE_STRUCTURE | Timestamp when the dataset structure last changed | 2021-02-08T23:00:00+0100 |
UPDATE_DATA | Timestamp when the dataset data last changed | 2023-05-10T11:00:00+0200 |
ESMS_HTML | Link to Reference Metadata page | https://ec.europa.eu/eurostat/cache/metadata/en/isoc_i_esms.htm |
ESMS_SDMX | Link to Reference Metadata archive | https://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing?file=metadata/isoc_i_esms.sdmx.zip |
SOURCE_INSTITUTIONS | Source institution | Eurostat |
in the referenced data structure definition (DSD)
From the reference present in the dataflow, it is possible to query for the corresponding SDMX Data Structure Definition
DSD Link | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/datastructure/ESTAT/ISOC_CI_ID_H/28.0 |
---|
These resources are versioned, so the version given in the reference must be used to ensure consistency.
This definition lists the dimensions used to define the time-series of the dataset.
The order of the dimensions is needed later to build the series-key filter in the Data query.
For each dimension, a reference is provided:
- to the concept holding the dimension label
- to the code list holding the codes and labels for the dimension positions
These code lists are reference metadata and may contain more codes and labels than the ones used by a specific dataset.
To know the list of positions present in the dataset, please refer to the Content Constraint artefact (next section).
Additionally, the DSD defines:
- the mandatory TIME_PERIOD dimension, whose values are expressed using ISO 8601
- the primary measure OBS_VALUE, holding the value of the statistical observation
- the optional attribute OBS_FLAG, holding the statistical status of the value (also referred to as flags)
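The dimension order can be read from the DSD programmatically. The sketch below is illustrative only: `SAMPLE_DSD` is a hand-written, heavily abbreviated fragment (an assumption, built from the ISOC_CI_ID_H series-key template given in the Data query section), and `series_key_dimensions` is a hypothetical helper name. A real DSD also carries the concept and code list references for each dimension.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated DSD fragment (an assumption for illustration only).
SAMPLE_DSD = """
<s:DataStructure xmlns:s="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
                 id="ISOC_CI_ID_H" agencyID="ESTAT" version="28.0">
  <s:DataStructureComponents>
    <s:DimensionList id="DimensionDescriptor">
      <s:Dimension id="freq" position="1"/>
      <s:Dimension id="indic_is" position="2"/>
      <s:Dimension id="unit" position="3"/>
      <s:Dimension id="hhtyp" position="4"/>
      <s:Dimension id="geo" position="5"/>
      <s:TimeDimension id="TIME_PERIOD" position="6"/>
    </s:DimensionList>
  </s:DataStructureComponents>
</s:DataStructure>
"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def series_key_dimensions(xml_text):
    """Return the ids of the series-key dimensions, ordered by position.

    The time dimension is a separate element (TimeDimension) and is skipped.
    """
    root = ET.fromstring(xml_text)
    dims = [el for el in root.iter() if local_name(el.tag) == "Dimension"]
    # If 'position' is missing, the stable sort keeps the document order.
    dims.sort(key=lambda el: int(el.get("position", 0)))
    return [el.get("id") for el in dims]


print(".".join(series_key_dimensions(SAMPLE_DSD)).upper())
```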
in the SDMX Content Constraint
Content Constraint Link | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/contentconstraint/ESTAT/ISOC_CI_ID_H/1.0 |
---|
In the API, SDMX Content Constraint artefacts are used to define the available positions: for each dimension, they list the position codes that are used at least once to refer to a statistical observation.
From its content constraint, we can see that the ISOC_CI_ID_H dataset
- provides annual data (freq = A)
- provides data for 14 indicators
- provides data in 2 units
- provides a breakdown on hhtyp ("Type of Household")
- provides data for EU aggregates and countries
- provides data from 2002 to 2010 and for 2014
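A summary like the list above can be derived by collecting the codes listed for each dimension. The following sketch assumes that the constraint lists them as `KeyValue` elements containing `Value` children inside a `CubeRegion`, as in the SDMX-ML 2.1 schemas. `SAMPLE_CONSTRAINT` is a hand-written, abbreviated fragment and `available_positions` is a hypothetical helper; elements are matched on their local name so that the exact namespace prefixes do not matter.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated constraint (an assumption for illustration only);
# the real artefact lists every dimension of the dataset.
SAMPLE_CONSTRAINT = """
<s:ContentConstraint xmlns:s="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/structure"
                     xmlns:c="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
                     id="ISOC_CI_ID_H" agencyID="ESTAT" version="1.0">
  <s:CubeRegion include="true">
    <c:KeyValue id="freq"><c:Value>A</c:Value></c:KeyValue>
    <c:KeyValue id="geo"><c:Value>EA</c:Value><c:Value>EU27_2020</c:Value></c:KeyValue>
  </s:CubeRegion>
</s:ContentConstraint>
"""


def local_name(tag):
    """Strip the '{namespace}' part that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def available_positions(xml_text):
    """Map each dimension id to the list of codes listed in the constraint."""
    root = ET.fromstring(xml_text)
    positions = {}
    for el in root.iter():
        if local_name(el.tag) == "KeyValue":
            positions[el.get("id")] = [
                child.text for child in el if local_name(child.tag) == "Value"
            ]
    return positions


for dimension, codes in available_positions(SAMPLE_CONSTRAINT).items():
    print(dimension, len(codes), codes)
```

A dimension whose `KeyValue` contains something other than `Value` children (for example a time range) would come back with an empty list in this sketch.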
retrieve several artefacts in a single response
It is not necessary to make these calls one by one.
Starting again from the Dataflow, it is possible to include the referenced artefacts at different levels of detail (CS = concept schemes, CL = code lists):
Scope | Link |
---|---|
Dataflow + DSD | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/ISOC_CI_ID_H/1.0?references=children |
Dataflow + DSD + CS and CL | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/ISOC_CI_ID_H/1.0?references=descendants |
Dataflow + DSD + CS and CL filtered on the constraints | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/ISOC_CI_ID_H/1.0?references=descendants&detail=referencepartial |
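The links in this table follow a regular pattern, which can be captured in a small helper. This is only a sketch: `structure_url` and its parameter names are hypothetical, and it covers only the path and query patterns that appear in this guide.

```python
from urllib.parse import urlencode

BASE = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1"


def structure_url(resource, artefact_id, version=None, agency="ESTAT",
                  references=None, detail=None):
    """Build a structural metadata URL following the patterns shown in this guide.

    resource    -- e.g. "dataflow", "datastructure", "contentconstraint"
    artefact_id -- artefact identifier, e.g. "ISOC_CI_ID_H"
    version     -- e.g. "1.0" or "28.0"; left out of the path when None
    """
    path = f"{BASE}/{resource}/{agency}/{artefact_id}"
    if version is not None:
        path += f"/{version}"
    # Only parameters that were actually given end up in the query string.
    params = {}
    if references is not None:
        params["references"] = references
    if detail is not None:
        params["detail"] = detail
    return f"{path}?{urlencode(params)}" if params else path


print(structure_url("dataflow", "ISOC_CI_ID_H", "1.0", references="children"))
print(structure_url("dataflow", "ISOC_CI_ID_H", "1.0",
                    references="descendants", detail="referencepartial"))
```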
Special case of Dataset listing
Instead of specifying a dataset code in the dataflow request, the ALL keyword can be used to retrieve the list of all Eurostat datasets in one request.
Special case of Metadata harvesting
Similarly to the request on ALL dataflows, it is possible to get the latest version of all artefacts of a specified type:
Scope | Link |
---|---|
All code lists | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/codelist/ESTAT/all |
All concept schemes | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/conceptscheme/ESTAT/all |
All data structure definitions | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/datastructure/ESTAT/all |
Data query
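In SDMX 2.1, the data query uses the dataflow identifier directly, without the agencyID.
A query on the dataflow identifier alone, such as https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H, retrieves the complete dataset in the default format: SDMX-ML 2.1 Generic Data.
The data file is composed of time-series, each identified by a series key and containing observations, like the one shown below: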
<g:Series>
  <g:SeriesKey>
    <g:Value id="geo" value="EA"/>
    <g:Value id="hhtyp" value="TOTAL"/>
    <g:Value id="unit" value="PC_HH_IACC"/>
    <g:Value id="indic_is" value="H_IPC"/>
    <g:Value id="freq" value="A"/>
  </g:SeriesKey>
  <g:Obs> <g:ObsDimension value="2003"/> <g:ObsValue value="96.27"/> </g:Obs>
  <g:Obs> <g:ObsDimension value="2004"/> <g:ObsValue value="96.02"/> </g:Obs>
  <g:Obs> <g:ObsDimension value="2005"/> <g:ObsValue value="96.27"/> </g:Obs>
  <g:Obs> <g:ObsDimension value="2006"/> <g:ObsValue value="96.23"/> </g:Obs>
  <g:Obs> <g:ObsDimension value="2007"/> <g:ObsValue value="96.59"/> </g:Obs>
  <g:Obs> <g:ObsDimension value="2008"/> <g:ObsValue value="84.89"/> </g:Obs>
  <g:Obs> <g:ObsDimension value="2009"/> <g:ObsValue value="96.81"/> </g:Obs>
  <g:Obs> <g:ObsDimension value="2010"/> <g:ObsValue value="97.16"/> </g:Obs>
  <g:Obs> <g:ObsDimension value="2014"/> <g:ObsValue value="95.63"/> </g:Obs>
</g:Series>
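A series like this one could be turned into plain Python structures as sketched below. The namespace URI bound to `g:` is assumed to be the standard SDMX-ML 2.1 generic data namespace; the `g:DataSet` wrapper and the name `read_series` are hypothetical, and the sample keeps only the first two observations of the series above.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the g: prefix (SDMX-ML 2.1 generic data).
GENERIC_NS = "http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic"

# Shortened copy of the series above inside a hypothetical wrapper element.
SAMPLE_DATA = f"""
<g:DataSet xmlns:g="{GENERIC_NS}">
  <g:Series>
    <g:SeriesKey>
      <g:Value id="geo" value="EA"/>
      <g:Value id="hhtyp" value="TOTAL"/>
      <g:Value id="unit" value="PC_HH_IACC"/>
      <g:Value id="indic_is" value="H_IPC"/>
      <g:Value id="freq" value="A"/>
    </g:SeriesKey>
    <g:Obs> <g:ObsDimension value="2003"/> <g:ObsValue value="96.27"/> </g:Obs>
    <g:Obs> <g:ObsDimension value="2004"/> <g:ObsValue value="96.02"/> </g:Obs>
  </g:Series>
</g:DataSet>
"""


def read_series(xml_text):
    """Return a list of (series_key, observations) pairs.

    series_key   -- dict mapping dimension id to code
    observations -- dict mapping period to float value
    """
    ns = {"g": GENERIC_NS}
    root = ET.fromstring(xml_text)
    result = []
    for series in root.iter(f"{{{GENERIC_NS}}}Series"):
        key = {v.get("id"): v.get("value")
               for v in series.findall("g:SeriesKey/g:Value", ns)}
        observations = {}
        for obs in series.findall("g:Obs", ns):
            period = obs.find("g:ObsDimension", ns)
            value = obs.find("g:ObsValue", ns)
            # Skip observations that carry no value in this sketch.
            if period is None or value is None or value.get("value") is None:
                continue
            observations[period.get("value")] = float(value.get("value"))
        result.append((key, observations))
    return result


for key, observations in read_series(SAMPLE_DATA):
    print(key["geo"], observations)
```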
It is possible to further customize the query to retrieve only the needed data or to request a different output format:
Filtering on series-keys
Filtering in the SDMX REST web service is done on the series keys, following the dimension order specified in the DSD.
For ISOC_CI_ID_H, the series-key template is as follows:
FREQ.INDIC_IS.UNIT.HHTYP.GEO
with the following syntax:
- a blank means no filtering for this dimension
- several values for a dimension must be separated by a '+' character
Scope | Details on the series-keys filter | Link |
---|---|---|
Single time-series fully specified | A.H_IPC.PC_HH_IACC.TOTAL.EA (one value given for each dimension) | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H/A.H_IPC.PC_HH_IACC.TOTAL.EA |
EU27 and EA data | ....EU27_2020+EA (the first four dimensions are left blank, so only GEO is filtered) | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H/....EU27_2020+EA |
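The key syntax described above can be sketched as a small function. `build_key` is a hypothetical helper; the dimension order is the ISOC_CI_ID_H template given above and would have to be taken from the DSD for any other dataset.

```python
# Dimension order of ISOC_CI_ID_H, as given by the series-key template above.
DIMENSIONS = ["FREQ", "INDIC_IS", "UNIT", "HHTYP", "GEO"]


def build_key(filters, dimensions=DIMENSIONS):
    """Build a series-key filter from a mapping of dimension to code(s).

    Dimensions missing from 'filters' are left blank (no filtering);
    several codes for one dimension are joined with '+'.
    """
    unknown = set(filters) - set(dimensions)
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")
    parts = []
    for dimension in dimensions:
        codes = filters.get(dimension, [])
        if isinstance(codes, str):  # allow a single code given as a string
            codes = [codes]
        parts.append("+".join(codes))
    return ".".join(parts)


print(build_key({"FREQ": "A", "INDIC_IS": "H_IPC", "UNIT": "PC_HH_IACC",
                 "HHTYP": "TOTAL", "GEO": "EA"}))
print(build_key({"GEO": ["EU27_2020", "EA"]}))
```

Rejecting unknown dimension names is a design choice of this sketch: a misspelled dimension would otherwise be silently ignored and return unfiltered data.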
Filtering on time period
Filtering the observations to be returned based on their TIME_PERIOD value is controlled via a FROM-TO filter, using the query parameters startPeriod and endPeriod.
A fully specified single time-series, as in the example above but here for EU27_2020, can be restricted to the period 2008 to 2010 as follows:
https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H/A.H_IPC.PC_HH_IACC.TOTAL.EU27_2020?startPeriod=2008&endPeriod=2010
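As an illustration, a data query URL with optional query parameters could be assembled as follows. `data_url` is a hypothetical helper name; the parameter names startPeriod and endPeriod are the ones described above.

```python
from urllib.parse import urlencode

BASE = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1"


def data_url(dataset, key="", **params):
    """Build a data query URL.

    dataset -- dataflow identifier, e.g. "ISOC_CI_ID_H"
    key     -- optional series-key filter, placed in the path as-is
    params  -- query parameters such as startPeriod and endPeriod;
               parameters set to None are left out
    """
    url = f"{BASE}/data/{dataset}"
    if key:
        url += f"/{key}"
    query = {name: value for name, value in params.items() if value is not None}
    return f"{url}?{urlencode(query)}" if query else url


print(data_url("ISOC_CI_ID_H", "A.H_IPC.PC_HH_IACC.TOTAL.EU27_2020",
               startPeriod=2008, endPeriod=2010))
```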
Supported format
An additional format parameter makes it possible to request the response in a different format:
Format | Description | Link |
---|---|---|
SDMX-ML 2.1 Structured Data | More compact XML format | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H?format=SDMX_2.1_STRUCTURED |
SDMX-CSV 1.0 | SDMX standardized CSV format | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H?format=SDMX-CSV |
TSV | Eurostat specific format | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H?format=TSV |
JSONstat | JSON format usable with JSON-stat toolkit | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H?format=JSON |
Compression
An additional compressed parameter makes it possible to optimize network transfer by retrieving the content compressed as GZIP:
Compressed TSV data link | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ISOC_CI_ID_H?format=TSV&compressed=true |
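A client then has to decompress the response before using it. The sketch below assumes that the body arrives as a raw GZIP stream that the HTTP client has not already decoded, and that the text is UTF-8 encoded; both are assumptions, as are the helper names `fetch_text` and `decode_body`. The download itself is only defined here, not executed.

```python
import gzip
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1"


def decode_body(body, encoding="utf-8"):
    """Return the response body as text, decompressing it if it is GZIP.

    GZIP streams start with the magic bytes 0x1f 0x8b; anything else is
    treated as uncompressed content. The UTF-8 default is an assumption.
    """
    if body[:2] == b"\x1f\x8b":
        body = gzip.decompress(body)
    return body.decode(encoding)


def fetch_text(dataset, fmt="TSV", compressed=True, timeout=60):
    """Download a dataset in the given format and return it as text."""
    query = {"format": fmt}
    if compressed:
        query["compressed"] = "true"
    url = f"{BASE}/data/{dataset}?{urlencode(query)}"
    with urlopen(url, timeout=timeout) as response:
        return decode_body(response.read())


# Offline demonstration on locally compressed bytes:
print(decode_body(gzip.compress("a\tb\n".encode("utf-8"))) == "a\tb\n")
```

Calling `fetch_text("ISOC_CI_ID_H")` would request the compressed TSV link shown above.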
---|
Retrieving Navigation artefacts
It is worth mentioning that secondary artefacts exist to represent, as SDMX artefacts, the classification of datasets into categories (also referred to as the Navigation Tree in Eurostat):
- Category Scheme: a hierarchy of categories
- Categorisation: each categorisation assigns one dataset to a category of a Category Scheme
Scope | Link |
---|---|
All category schemes | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/categoryscheme/ESTAT/all |
All categorisations | https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/categorisation/ESTAT/all |