Overview

The TSV format available in the SDMX 2.1 and 3.0 APIs is the only format specific to Eurostat.

This format originates from the tab-delimited data files previously provided via the Eurostat Bulk Download Facility.

While using one of the standard formats defined by the SDMX standard is recommended, this format is kept for compatibility with existing clients and for ease of use.

Details on the format

TSV API responses

Raw input from legacy documentation

‘Tsv’ files are flat files in which each line holds a ‘tab delimited’ sequence of values (usually a time series), instead of one value per line/record as in SDMX-CSV.

– Contains one Header line then one or more Data lines
– In most files the sequences of values are time series.
– For datasets without the dimension time (e.g. ‘area of the regions’), or that cover only one period of time, the sequences of values are not time series but another dimension, e.g. geographical series.
– The columns (or fields or cells) of the records are ‘tab delimited’.

– Time series lines are sorted in ascending alphabetical order on their seriesKeys identifier, i.e. on the first column.

(warning) ATTENTION: time series are in descending order (for explanation see the chapter ‘Hints for Excel users’).
– ATTENTION: cells for which there is "no data available" at all are NOT stored in the tsv files on the bulk download, as doing so would explode the size of the tsv files. When using the on-line extraction tools like Tables, Graphs and Maps or Data Explorer, such cells contain the symbol ":".
EXAMPLE
Dataset with time series (with made-up values)
present in the tsv file


Info

In the examples below,

  • orange dot represents a space
  • orange arrow represents a tabulation

Header line

First line of the TSV file, for example

[Image: example header line]

SeriesKeys column (first column)


Sequence of dimension codes separated by commas, giving the format of the seriesKeys identifier used in the data lines, followed by a back slash and the time dimension code, which is always TIME_PERIOD in the SDMX standard (\TIME_PERIOD).




Code Block
freq,unit,s_adj,

...

nace_r2,

...

indic,geo\TIME_PERIOD

For each of these dimension codes there is a corresponding SDMX codelist with the same code, also available in TSV format.
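As a minimal sketch of how a client might read the first header cell described above, the following Python splits it into the series dimension codes and the column dimension code. The function name `parse_header_key` and the shortened sample header are illustrative assumptions, not part of the API.

```python
def parse_header_key(first_cell: str) -> tuple[list[str], str]:
    """Split the first header cell into (series dimension codes, column dimension code).

    Assumes the layout described in the text: comma-separated dimension
    codes, then a back slash, then the code of the column dimension.
    """
    series_part, _, column_dim = first_cell.strip().partition("\\")
    dimensions = [code.strip() for code in series_part.split(",") if code.strip()]
    return dimensions, column_dim.strip()


# Hypothetical, shortened header cell used only for illustration.
dims, column_dim = parse_header_key("freq,unit,geo\\TIME_PERIOD")
print(dims, column_dim)
```

The returned dimension codes can then be used to look up the corresponding codelists.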

(minus) TODO dimensions label from the concept in TSV?


Observation column(s)

In the header line, the other columns contain the observation time periods.

Observation columns are sorted in ascending order.

(warning) Trailing space is important to align columns when 

The notation follows the SDMX and ISO 8601 standards ((info) characters in bold are fixed).

Period    Format      Example
year      YYYY        2015
semester  YYYY-SN     2015-S1
quarter   YYYY-QN     2015-Q4
month     YYYY-MM     2015-02
week      YYYY-WNN    2015-W01
day       YYYY-MM-DD  2015-12-31
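The period notations in the table can be recognised with regular expressions. The sketch below is one possible approach, not a definitive implementation: the name `classify_period` and the accepted digit ranges (for example S1 to S2 and Q1 to Q4) are assumptions, and surrounding spaces are stripped before matching.

```python
import re

# One pattern per period type from the table; digit ranges are assumptions.
PERIOD_PATTERNS = {
    "year": re.compile(r"\d{4}"),
    "semester": re.compile(r"\d{4}-S[12]"),
    "quarter": re.compile(r"\d{4}-Q[1-4]"),
    "month": re.compile(r"\d{4}-(0[1-9]|1[0-2])"),
    "week": re.compile(r"\d{4}-W\d{2}"),
    "day": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def classify_period(label: str):
    """Return the period type of a header label, or None if it matches no pattern."""
    text = label.strip()  # header cells may carry a trailing space
    for name, pattern in PERIOD_PATTERNS.items():
        if pattern.fullmatch(text):
            return name
    return None


for example in ["2015", "2015-S1", "2015-Q4", "2015-02", "2015-W01", "2015-12-31"]:
    print(example, classify_period(example))
```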

(minus) TODO note on multi-freq

Data line(s)

First column

(minus) seriesKeys

Observation column

(minus) Not applicable vs Not available




EXAMPLE
time 2004m05 2004m04 2004m03 2004m02
mio-eur,nsa,ext_eurozone,net,bp-100,eurozone 11148 10660 13398 9437
mio-eur,nsa,ext_eurozone,net,bp-200,eurozone 3386e 539 -185 -432
mio-eur,nsa,ext_eurozone,net,bp-300,eurozone -5626e -6696i 1902 919
mio-eur,nsa,ext_eurozone,net,bp-379,eurozone -5758e -4165 -3970 -4703
mio-eur,nsa,ext_eurozone,net,bp-993,eurozone 3151.5e 338.7i 11146.1 5221.0
mio-eur,nsa,ext_eurozone,net,bp-994,eurozone 2314 669 543 2113
mio-eur,nsa,ext_eurozone,net,bp-010,eurozone 5465.1 1006.5 11689.0 7334.3
– First line: header.
– Other lines: records with the sequence of values.
– First column, first line: sequence of codes separated by a comma, followed by a code separated by a back slash ‘\’.
  The codes separated by a comma ‘,’ are the ‘names’ of the dimensions used for identifying each (time) series.
  For each of these codes there is a file (with the same name plus the extension ‘dic’) in the directory dic.
  The code separated by a back slash ‘\’ is the ‘name’ of the dimension of the sequence of values, e.g. ‘time’ (if this is a time series) or ‘geo’ (in the case of a geographical series).
– First column except the first line: sequence of codes separated by a comma ‘,’ that represent the ‘names’ of the items (or instances or positions) of the dimensions. The label/title of these codes can be found in the ‘dic’ file that has the same name as the corresponding dimension.
– Other columns of the first line: sequence of codes corresponding to the items of the dimension.
– All other columns but the first line: sequence of values.
  Where available, flags are attached to values. The separator used between values and flags is a blank. If there are no flags, the value is followed by a blank.
– The decimal symbol used in the files is the dot ‘.’.

Note for Excel users: these files can be straightforwardly opened in Excel (see chapter ‘Hints for Excel users’).
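To illustrate the record layout described in the list above, here is a hedged Python sketch that splits one data line into its series key and its observations. It assumes tab-delimited cells, a blank between value and flag as stated in the text, and that a ':' cell means no data; the function names and the sample line (with made-up values) are hypothetical.

```python
def parse_cell(cell: str):
    """Return (value, flags) for one observation cell.

    Assumes a blank separates the value from its flags, and treats an
    empty cell or ':' as "no data available" (value None).
    """
    parts = cell.split()
    if not parts or parts[0] == ":":
        return None, " ".join(parts[1:])
    # The decimal symbol is the dot, so float() can parse the value directly.
    return float(parts[0]), " ".join(parts[1:])


def parse_data_line(line: str):
    """Split a data line into (series key codes, list of (value, flags))."""
    cells = line.rstrip("\n").split("\t")
    series_key = [code.strip() for code in cells[0].split(",")]
    observations = [parse_cell(cell) for cell in cells[1:]]
    return series_key, observations


# Made-up sample line: a flagged value, unflagged values, and a missing cell.
sample = "mio-eur,nsa,eurozone\t3386 e\t539 \t-185.5 \t: "
key, obs = parse_data_line(sample)
print(key)
print(obs)
```

Cells where the value and flag are not separated by a blank would raise a `ValueError` in this sketch and would need extra handling.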

Hints for Excel users

...

Should copy, refresh and review section 3. HINTS FOR EXCEL USERS

...

from Bulk merged with the updated guide in the Migrating PDF


Important changes

Raw input from Migrating to API TSV

Should be converted to a confluence page with minor adaptation

...