Child pages
  • API - FAQ - TSV data format

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

The TSV format available in the SDMX 2.1 and 3.0 APIs is the only a format specific to Eurostat.

This format originates from the tab-delimited data files provided previously by Eurostat via Eurostat Bulk Download Facility.

While usage of standard format issued from SDMX standard is recommended, this format is kept for compatibility with existing clients and ease of use.

...

– Time series lines are sorted in ascending alphabetical order on their seriesKeys identifier, i.e on the  the first column,

(warning) ATTENTION: time-series for which there is "no data available" at all are NOT present in the tsv file

...

First line of the TSV file, for example from the header line of  https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/ei_isin_m?format=TSV

SeriesKeys column (first column)

...

(minus) TODO dimensions label from the concept in TSV?

Observation column(s)

In the header line, other columns contains the observation time period

Observation columns are sorted in ascending 
order

(warning) Trailing space is important to align columns when 

ascending order. The notation follows SDMX and ISO8601 standards  ( (info) characters in bold are fixed) 

(warning) There is a trailing space in each column that correspond to the separator between observation value and abservation statuses, it is important to align columns when considering both tab and space as column separator


PeriodFormatExample
yearYYYY2015
semester YYYY-SN2015-S1
quarter YYYY-QN2015-Q4
monthYYYY-MM2015-02
weekYYYY-WNN2015-W01
dayYYYY-MM-DD2015-12-31

...


Data line(s)

...

SeriesKeys column (first column

...

(minus) seriesKeys

Observation column

(minus) Not application vs Not available

EXAMPLE

)

sequence of dimension codes separated by a comma providing the seriesKeys identifier for the data time series for the current line.

Code Block
M,I2015,CA,B,IS-IP,AT

This information uniquely identify the time-series of the dataset.

Dimension codePosition codePosition label
freqMMonthly
unitI2015Index, 2015=100
s_adjCACalendar adjusted data, not seasonally adjusted data
nace_r2BMining and quarrying
indicIS-IPProduction index
geoATAustria

Observation column(s)

– Other columns of the first line: sequence of codes corresponding to the items of 
the dimension. 
All other columns but the first line : represent the sequence of values.observation value and statuses corresponding to the time period in the header line.

When an observation value is not available a special character colon is displayed instead ":"

Where available, status flags are attached to values. The separator used between values 
values and flags statuses is a blank. If there are no flags, the value is followed by a blank.
– The decimal symbol used in the files is the dot ‘.’.

Hints for Excel users

Should review the section 3. HINTS FOR EXCEL USERS from Bulk merged with the updated guide in the Migrating PDF

Important changesRaw input from Migrating to API TSV

Should be converted to a confluence page with minor adaptation

...

simple space.

(warning) When an observation have not statuses, there remains a trailing space in the column to keep columns aligned when considering both tab and space as column separator


Following excerpt show two time-series for which there are data since the first available time position

Image Added

For Ireland (IE) there are observation values without statuses, so the extra space character.

For Netherlands (NL) the observalue values have a "provisional" status so the p status flag is attached to the value, separated by a space

Special case : datasets with multiple time frequencies

When a dataset contains time series of different frequencies, the data file contains one row for each frequency respecting the general rule of alphabetical order.

The columns will be ordered according to the relevant frequency code alphabetical order, so for a same year and a dataset with Annual, Quarterly and Monthly data, the order will be

Annual, then Monthly and finally Quarterly observations, please see below example come from AVIA_GOR_ME dataset

freq unit tra_meas airp_pr\TIME_PERIOD ...2019 2019-01 2019-02...2019-112019-122019-Q12019-Q22019-Q32019-Q4...
ATFRM_BRD ME_LYPG_AT_LOWW ...70.6::...::::::...
[ ... ] 
MTFRM_BRD ME_LYPG_AT_LOWW ...:5.14.4...6.75.8::::...
[ ... ] 
QTFRM_BRD ME_LYPG_AT_LOWW ...:::...::14.715.219.221.4...

Due to this format, this situation must extend the meaning of the ':' special character from "not available" to "not applicable" as it is logically that there cannot be data for 2019 on the monthly time-serie.