
Context

This page provides examples of data download requests via API links, to support recurring data retrieval.

If you find yourself periodically repeating the download of the same datasets in the Data Browser "Downloads" interface, you can automate the task with a scripting solution of your choice.

The examples below are given to help with such automation. Feel free to apply the same logic in the scripting language of your choice.

Example 1: periodically downloading a small set of datasets

This is done in two steps:

  • first, the download program needs to know which datasets you are interested in;
  • then, you have to launch the appropriate script in PowerShell to download them.

Step 1: retrieve the list of your (selected) datasets

This step helps you retrieve a file named Filtered_Items_List_EN.txt that contains tab-separated values. See the detailed description of this file structure.

Such a delimiter-separated file is easy to use in scripting languages.

Step-by-step:

  • Go to the Data Browser Downloads page
  • Select the datasets you are interested in. In the example below: AACT_ALI01 and AACT_EAA01
  • In ‘Download operations’, select ‘Download filtered list of items (txt)’ as shown in the screenshot below.

  • You will get a zip file with a text file named Filtered_Items_List_EN.txt containing tab-separated values.
  • Do not forget to unzip your file and save it, for example in your user folder ‘C:\Users\your user name\Filtered_Items_List_EN.txt’.
  • Example:

    Download a sample file that contains the listing as selected in the screenshot above.
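
Once the file is unzipped, you can quickly check which columns it exposes with a small PowerShell sketch (the column names, such as Code and the ‘Data download url’ variants, are the ones used by the scripts below):

# peek at the first record to list the available columns and their values
Import-CSV .\Filtered_Items_List_EN.txt -Delimiter "`t" | Select-Object -First 1 | Format-List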


Remark

The complete list of datasets can be downloaded by using ‘Download full list of items (txt)’, as used in the next example.


Step 2: download the data files

Now that the list of datasets is ready, the script below has to be run in Windows PowerShell. This tool is available to most PC users in version 5 by default, and has been available for all platforms since version 7.

Running PowerShell commands on Windows

  • To run it, use the Windows search, type PowerShell and run the app.
  • Copy/paste the script below into the PowerShell window and press ‘Enter’ to execute it.
  • As a result, the two files (from the above example) will be copied into the current folder, which by default will be your user directory ‘C:\Users\your user name’.

For further information, please consult the PowerShell online documentation.

2.1 Uncompressed files

By chaining a few PowerShell commands (the Import-CSV and ForEach-Object cmdlets and the WebClient.DownloadFile method), one can easily download the wanted content from the specified URL (SDMX-CSV in this example) to the current folder.

# read the tab-separated list and download each dataset in SDMX-CSV format
Import-CSV .\Filtered_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
}

Please note that this example uses the Code value and the "Data download url (csv)" value.
By adapting the script, you will be able to download any format proposed in your ‘Filtered_Items_List_EN.txt’ (see the sketch after this list):

  • Data download url (tsv)
  • Data download url (sdmx)
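
For instance, a minimal sketch of the same loop adapted to the TSV column (assuming the same file layout as above) could be:

# same loop, using the TSV download URL column instead of the CSV one
Import-CSV .\Filtered_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile($_."Data download url (tsv)", "$pwd\$($_.Code).tsv")
}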

2.2 Compressed files

To save bandwidth and disk space, you can retrieve the GZ-compressed files by adding the &compressed=true parameter to the download URL.

The snippet below extends the previous example retrieving SDMX-CSV data, now as compressed GZIP files.

# download each dataset as a GZIP-compressed SDMX-CSV file
Import-CSV .\Filtered_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile("$($_."Data download url (csv)")&compressed=true", "$pwd\$($_.Code).csv.gz")
}

2.3 Compressed files, then uncompressed locally

If you need the uncompressed files, you can still optimise the network transfer and use a PowerShell function to decompress the received files.

# from https://securitytidbits.wordpress.com/2017/04/14/powershell-and-gzip-compression/comment-page-1/
Function Gunzip-File([ValidateScript({Test-Path $_})][string]$File){
 
    $srcFile = Get-Item -Path $File
    # target file name: same path without the .gz extension
    $newFileName = Join-Path -Path $srcFile.DirectoryName -ChildPath $srcFile.BaseName
 
    try
    {
        $srcFileStream = New-Object System.IO.FileStream($srcFile.FullName,([IO.FileMode]::Open),([IO.FileAccess]::Read),([IO.FileShare]::Read))
        $dstFileStream = New-Object System.IO.FileStream($newFileName,([IO.FileMode]::Create),([IO.FileAccess]::Write),([IO.FileShare]::None))
        $gzip = New-Object System.IO.Compression.GZipStream($srcFileStream,[System.IO.Compression.CompressionMode]::Decompress)
        $gzip.CopyTo($dstFileStream)
    } 
    catch
    {
        Write-Host "$($_.Exception.Message)" -ForegroundColor Red
    }
    finally
    {
        # dispose only the streams that were actually created
        if ($gzip) { $gzip.Dispose() }
        if ($srcFileStream) { $srcFileStream.Dispose() }
        if ($dstFileStream) { $dstFileStream.Dispose() }
    }
}

# download compressed, decompress locally, then remove the archive
Import-CSV .\Filtered_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile("$($_."Data download url (csv)")&compressed=true", "$pwd\$($_.Code).csv.gz")
  Gunzip-File "$pwd\$($_.Code).csv.gz"
  Remove-Item "$pwd\$($_.Code).csv.gz"
}

Example 2: periodically downloading a bigger set of datasets

This example is similar to the first one, but the dataset selection is now part of the script instead of being done in the Data Browser interface using ‘Download filtered list of items (txt)’.

The script is simply based on the file obtained via ‘Download full list of items (txt)’, which has to be downloaded first.

Pre-condition: having the full list of datasets

Along with your download in the Data Browser, you can request to receive the full list of items.

In your archive you will find a file named Full_Items_List_EN.txt that contains tab-separated values. This is the same format as the filtered file used in the previous example.

Execution: select and download the data files

To define the datasets to download, you can apply a condition on the code. In the example below, all the TPS* datasets will be downloaded in one go.

Import-CSV .\Full_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  If($_.Code.StartsWith("TPS")) { # download only TPS datasets
    (New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
  }
}

Depending on your needs, you can develop your script further, as illustrated by the sketch below.

Please be careful to verify that:

  1. the content of the input file is what you expect
  2. a condition is set within the script
  3. the compress option is enabled for huge requests, to optimise network traffic

Otherwise you could easily start downloading the complete scope of Eurostat data, which would require a large amount of disk space.
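
As an illustration, a minimal sketch combining the code condition above with the compressed download option from Example 1 (same assumed file layout) could be:

# filter on the dataset code and download GZIP-compressed SDMX-CSV files
Import-CSV .\Full_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  If($_.Code.StartsWith("TPS")) { # download only TPS datasets
    (New-Object System.Net.WebClient).DownloadFile("$($_."Data download url (csv)")&compressed=true", "$pwd\$($_.Code).csv.gz")
  }
}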

Example 3: keeping up to date with a single dataset in Node.js

To know whether a dataset contains data changes, one can check the date via its product page (example for TPS00001), via the list in the "Downloads" interface (see Example 1) or by consulting the table of contents.

Another way is to get this information from the SDMX Dataflow artefact associated with the dataset. This artefact can be retrieved from the API in either XML or JSON format.

Using JSON can simplify the parsing/processing for users who are familiar with JavaScript / Node.js.
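
For users of the earlier PowerShell examples, a minimal sketch of the same check in PowerShell (assuming the UPDATE_DATA annotation structure used by the Node.js script below) could be:

# retrieve the dataflow in JSON and read the UPDATE_DATA annotation date
$code = "T2020_10"
$json = Invoke-RestMethod "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/${code}?format=json"
$lastUpdate = ($json.extension.annotation | Where-Object { $_.type -eq "UPDATE_DATA" }).date
Write-Host "$code last updated at $lastUpdate"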

The example below is a Node.js script to conditionally download data files for one dataset, based on a specified date and time.

(warning) To execute this script, you need a Node.js installation with the node-fetch dependency available (e.g. via npm install node-fetch).


#!/usr/bin/env node

// library to make HTTP calls
const fetch = require('node-fetch');
// library to write to the filesystem
const fs = require('fs');
 
// util method to download a file from a URL and write it to the filesystem
const downloadFile = (async (url, path) => {
  const res = await fetch(url);
  const fileStream = fs.createWriteStream(path);
  await new Promise((resolve, reject) => {
    res.body.pipe(fileStream);
    res.body.on("error", reject);
    fileStream.on("finish", resolve);
  });
});
 
//last time program was run (you should change this value or make it a parameter of your script)
const lastCheck = '2020-09-19T18:00';
// online data code to lookup (you should change this value or make it a parameter of your script)
const onlineDataCode = "T2020_10";
 
(async () => {
  //retrieve information on dataset in JSON
  const response = await fetch('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/' + onlineDataCode + '?format=json');
  const json = await response.json();
   
  //extract last data update date
  var lastUpdateDate = null;
  for (var i = 0; i < json.extension.annotation.length; i++) {
    //console.log(json.extension.annotation[i]);
    if (json.extension.annotation[i].type == "UPDATE_DATA") {
      lastUpdateDate = json.extension.annotation[i].date;
    }
  }
   
  //compare last data update date with our own
  if (Date.parse(lastUpdateDate) > Date.parse(lastCheck)) {
    console.log(onlineDataCode + " needs update, last updated at " + lastUpdateDate);
    //download needed format (remove the line you do not need)
    console.log("Downloading data in TSV for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '?format=TSV', onlineDataCode +'.tsv');
    console.log("Downloading data in SDMX-CSV for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '?format=SDMX-CSV', onlineDataCode +'.sdmx.csv');
    console.log("Downloading data in SDMX-ML 2.1 for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode , onlineDataCode +'.sdmx.xml');
    console.log("Downloading structure in SDMX-ML 2.1 for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/' + onlineDataCode +'?references=descendants&details=referencepartial', onlineDataCode +'.sdmx.structure.xml');

    // or extract specific time-series in json-stat to feed visualisations..
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '/A.....FR+LU+DE?format=JSON', onlineDataCode +'_FR_LU_DE.json');
    // or use the Eurostat statistics query format, since the SDMX REST query may be less readable (too many dots, need to know the order of dimensions)
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/' + onlineDataCode + '?geo=FR&geo=LU&geo=DE', onlineDataCode +'_FR_LU_DE.stats.json');
  } else {
    console.log(onlineDataCode + " already up to date, last updated at " + lastUpdateDate);
  }
 
})();



