API - Examples - periodic data download

Context

This page provides examples of data download requests via API links, in order to support recurrent data retrievals.

If you find yourself periodically repeating the download of some datasets from the Data Browser "Downloads" interface, you can automate it with a scripting solution of your choice.

The examples below are meant to help you get started with such automation. Feel free to apply the same logic in the scripting language of your choice.

Example 1: periodically downloading a small set of datasets

This is done in two steps:

  • first the download program needs to know which datasets you are interested in;
  • then you have to launch the appropriate script in PowerShell to download them.

Step 1: retrieve the list of your selected datasets

This step helps you retrieve a file named Selection_Items_List_EN.txt that contains tab-separated values. See the detailed description of this file structure.

Such a delimiter-separated file is easy to process in scripting languages.
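
For instance, you can quickly check which columns the file exposes with PowerShell (a minimal sketch, assuming the file was saved in the current folder):

# List the column names found in the header of the selection list
(Import-Csv .\Selection_Items_List_EN.txt -Delimiter "`t")[0].PSObject.Properties.Name

The scripts below rely on the ‘Code’ column and the ‘Data download url (...)’ columns.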

Step-by-step:

  • Go to the Data Browser Downloads page
  • Select the datasets you are interested in. In this example: AACT_ALI01 and AACT_EAA01
  • In ‘Bulk actions’, select ‘Download selection list’

  • You will get a zip file containing a text file named Selection_Items_List_EN.txt with tab-separated values.
  • Do not forget to unzip the file and save it, for example in your user folder: ‘C:\Users\your user name\Selection_Items_List_EN.txt’.
  • Example:

    Download the sample file that contains the listing for the selection described above.


Remark

The complete list of datasets can be downloaded by using ‘Download inventory’, as shown in the next example.


Step 2: download the data files

Now that the list of datasets is ready, the script below has to be run in Windows PowerShell. This tool should be available to most PC users by default (version 5), and has been available on all platforms since version 7.

Running PowerShell commands on Windows

  • To run it, use the Windows search, type PowerShell and run the app.
  • Copy/paste the script below into the PowerShell window and press ‘Enter’ to execute it.
  • As a result, the two files (from the above example) will be copied into the current folder, which by default is your user directory ‘C:\Users\your user name’.

For further information, please consult the PowerShell online documentation.
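
If you are unsure which version is installed, you can print the built-in version table:

# Display the installed PowerShell version
$PSVersionTable.PSVersion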

2.1 Uncompressed files

By chaining a few PowerShell cmdlets (Import-CSV, ForEach-Object and WebClient.DownloadFile), you should be able to download the desired content from the specified URL (SDMX-CSV in this example) to the current folder.

Import-CSV .\Selection_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
}

Please note that this example uses the "Code" value and the "Data download url (csv)" value.
By adapting the script, you can download any other format proposed in your ‘Selection_Items_List_EN.txt’, for example:

  • Data download url (tsv)
  • Data download url (sdmx)
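
For instance, here is a minimal variant of the loop from section 2.1 that retrieves the TSV format instead (a sketch, assuming the same selection list file):

# Same loop as above, but using the TSV download URL column
Import-CSV .\Selection_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile($_."Data download url (tsv)", "$pwd\$($_.Code).tsv")
}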

2.2 Compressed files

To save bandwidth and disk space, you can retrieve GZ-compressed files by adding the &compressed=true parameter to the download URL.

The example below extends the previous one, retrieving the same SDMX-CSV data as compressed GZIP files.

Import-CSV .\Selection_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile("$($_."Data download url (csv)")&compressed=true", "$pwd\$($_.Code).csv.gz")
}

2.3 Compressed files, then uncompressed locally

In case you need the uncompressed files, you can still optimise the network transfer by using a PowerShell function to uncompress the received files.

# from https://securitytidbits.wordpress.com/2017/04/14/powershell-and-gzip-compression/comment-page-1/
Function Gunzip-File([ValidateScript({Test-Path $_})][string]$File){
 
    $srcFile = Get-Item -Path $File
    $newFileName = Join-Path -Path $srcFile.DirectoryName -ChildPath $srcFile.BaseName
 
    try
    {
        $srcFileStream = New-Object System.IO.FileStream($srcFile.FullName,([IO.FileMode]::Open),([IO.FileAccess]::Read),([IO.FileShare]::Read))
        $dstFileStream = New-Object System.IO.FileStream($newFileName,([IO.FileMode]::Create),([IO.FileAccess]::Write),([IO.FileShare]::None))
        $gzip = New-Object System.IO.Compression.GZipStream($srcFileStream,[System.IO.Compression.CompressionMode]::Decompress)
        $gzip.CopyTo($dstFileStream)
    } 
    catch
    {
        Write-Host "$_.Exception.Message" -ForegroundColor Red
    }
    finally
    {
        # guard the Dispose calls in case a stream could not be opened
        if ($gzip) { $gzip.Dispose() }
        if ($srcFileStream) { $srcFileStream.Dispose() }
        if ($dstFileStream) { $dstFileStream.Dispose() }
    }
}

Import-CSV .\Selection_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile("$($_."Data download url (csv)")&compressed=true", "$pwd\$($_.Code).csv.gz")
  Gunzip-File "$pwd\$($_.Code).csv.gz"
  Remove-Item "$pwd\$($_.Code).csv.gz"
}

Example 2: periodically downloading a bigger set of datasets

This example is similar to the first one, but the dataset selection is now part of the script rather than done in the Data Browser interface via ‘Download selection list’.

The script is based on the inventory file obtained via ‘Download inventory’, which has to be downloaded first.

Pre-condition: having the full list of datasets

From the Downloads section you can request the full list of items via ‘Download inventory’.

You will download a file named inventory.txt that contains tab-separated values, in the same format as the selection list used in the previous example.

Execution: select and download the data files

To define the datasets to download, you can apply a condition on the code. In the example below, all TPS* datasets are downloaded in one go.

Import-CSV .\inventory.txt -Delimiter "`t" | ForEach-Object {
  If($_.Code.StartsWith("TPS")) { # download only TPS datasets
    (New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
  }
}

Depending on your needs, you could further develop your script, for example by automating the download of inventory.txt from https://ec.europa.eu/eurostat/api/dissemination/files/inventory?type=data&lang=en, as sketched below.
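
A minimal sketch of that automation, reusing the WebClient approach from the examples above:

# Download the full inventory (tab-separated values) into the current folder
(New-Object System.Net.WebClient).DownloadFile("https://ec.europa.eu/eurostat/api/dissemination/files/inventory?type=data&lang=en", "$pwd\inventory.txt")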

Please be careful to verify that:

  1. the content of the input file is what you expect
  2. a condition is set within the script
  3. the compress option is enabled for huge requests, to optimise network traffic

Otherwise you could easily end up downloading the complete scope of Eurostat data, which would require a large amount of disk space.
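
As an illustration, here is a sketch combining points 2 and 3: the condition on the code limits the scope, and the &compressed=true parameter reduces network traffic:

Import-CSV .\inventory.txt -Delimiter "`t" | ForEach-Object {
  If($_.Code.StartsWith("TPS")) { # the condition limits the scope of the download
    (New-Object System.Net.WebClient).DownloadFile("$($_."Data download url (csv)")&compressed=true", "$pwd\$($_.Code).csv.gz")
  }
}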

Alternate execution: select and download the data files that are more recent than a specified date

If you download data once a week or once a month, you can skip datasets without updates since your last execution time (to be set manually in the script).

Import-CSV .\inventory.txt -Delimiter "`t" | ForEach-Object {
  If($_.Code.StartsWith("TPS")) { # download only TPS datasets
    $compareDate = [datetime]::ParseExact('31-05-2024', 'dd-MM-yyyy', $null)
    $datasetDate = [datetime]::ParseExact($_."Last data change", 'yyyy-MM-d\THH:mm:ssK', $null)
    If($datasetDate -ge $compareDate) { # download only datasets whose last update date is after the comparison date
      Write-Host Download $_.Code because last data change $datasetDate is more recent than $compareDate
      (New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
    }
  }
}

Extension: validate and re-download in case of API async response

Even if data files are prepared in advance, it may occasionally happen that the final file contains an "API async response" instead of data.

<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns0:syncResponse xmlns:ns0="http://estat.ec.europa.eu/disschain/soap/extraction">
            <processingTime>412</processingTime>
            <queued>
                <id>98de05ea-540a-43d3-903b-7c9e14faf808</id>
                <status>SUBMITTED</status>
            </queued>
        </ns0:syncResponse>
    </env:Body>
</env:Envelope>

Please see https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Asynchronous+API for more details.

The following script is proposed to validate your downloaded files.

(warning) This script assumes that you are working in an initially empty folder where all the CSV files present come from a recently run download.

(info) It is proposed to save this script in a PowerShell executable file (extension .ps1) so you can run it and see its output in the Windows PowerShell terminal.

check.ps1
$folderPath = "." # work in the current folder
$fileExtension = "*.csv" # scan files with the csv extension
$pattern = "^<env:Envelope" # search for files whose first line matches the async response pattern
$inventoryFile = "inventory.txt"

# Get all files of specified extension in the specified directory
$files = Get-ChildItem -Path $folderPath -Filter $fileExtension

# Display count in terminal
Write-Host Searching in $files.Length files for API async responses

# Initialize an empty array to store matching filenames
$matchingDatasetCodes = @()

foreach ($file in $files) {
    # Read the first line of the file
    $firstLine = Get-Content $file.FullName -First 1

    # Check if the first line matches the pattern
    if ($firstLine -match $pattern) {
        # collect the dataset code
        $matchingDatasetCodes += [System.IO.Path]::GetFileNameWithoutExtension($file)
    }
}

# Uncomment next line to Write the matching dataset codes to a text file
# $matchingDatasetCodes | Out-File -FilePath "Output.txt"

# Report in terminal about matching dataset codes
Write-Host Found $matchingDatasetCodes.Length files with API async responses

# Redo download for matching dataset codes
Import-CSV .\$inventoryFile -Delimiter "`t" | ForEach-Object {
  If($matchingDatasetCodes -contains $_.Code) {
    Write-Host Re-downloading $_.Code
    (New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
  }
}
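
For example, assuming the script above was saved as check.ps1 in the folder containing the downloaded CSV files, it can be run as follows (the -ExecutionPolicy Bypass flag helps if local script execution is restricted on your machine):

cd 'C:\Users\your user name'
powershell -ExecutionPolicy Bypass -File .\check.ps1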

Example 3: keeping up to date for a single dataset in Node.js

To know if a dataset contains data changes, one can check the date via its product page (example for TPS00001), via the list in the "Downloads" interface (see example 1), or by consulting the table of contents.

Another way is to get this information from the SDMX Dataflow artefact associated with the dataset. This artefact can be retrieved from the API in either XML or JSON format.

Using JSON can simplify the parsing/processing for users who are familiar with JavaScript / Node.js.

The example below is a Node.js script that conditionally downloads the data files for one dataset updated since a specified date and time.

(warning) To execute this script, Node.js needs to be installed, together with the node-fetch library it uses for HTTP calls.


#!/usr/bin/env node

// library to make HTTP calls
const fetch = require('node-fetch');
// library to write to the filesystem
const fs = require('fs');
 
//util method to write to the filesystem
const downloadFile = (async (url, path) => {
  const res = await fetch(url);
  const fileStream = fs.createWriteStream(path);
  await new Promise((resolve, reject) => {
    res.body.pipe(fileStream);
    res.body.on("error", reject);
    fileStream.on("finish", resolve);
  });
});
 
//last time program was run (you should change this value or make it a parameter of your script)
const lastCheck = '2020-09-19T18:00';
// online data code to lookup (you should change this value or make it a parameter of your script)
const onlineDataCode = "T2020_10";
 
(async () => {
  //retrieve information on dataset in JSON
  const response = await fetch('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/' + onlineDataCode + '?format=json');
  const json = await response.json();
   
  //extract last data update date
  var lastUpdateDate = null;
  for (var i = 0; i < json.extension.annotation.length; i++) {
    //console.log(json.extension.annotation[i]);
    if (json.extension.annotation[i].type == "UPDATE_DATA")
      lastUpdateDate = json.extension.annotation[i].date;
  }
   
  //compare last data update date with our own
  if (Date.parse(lastUpdateDate) > Date.parse(lastCheck)) {
    console.log(onlineDataCode + " need update " + " last updated at " + lastUpdateDate);
    //download needed format (remove the line you do not need)
    console.log("Downloading data in TSV for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '?format=TSV', onlineDataCode +'.tsv');
    console.log("Downloading data in SDMX-CSV for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '?format=SDMX-CSV', onlineDataCode +'.sdmx.csv');
    console.log("Downloading data in SDMX-ML 2.1 for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode , onlineDataCode +'.sdmx.xml');
    console.log("Downloading structure in SDMX-ML 2.1 for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/' + onlineDataCode +'?references=descendants&details=referencepartial', onlineDataCode +'.sdmx.structure.xml');

    // or extract specific time-series in JSON-stat to feed visualisations...
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '/A.....FR+LU+DE?format=JSON', onlineDataCode +'_FR_LU_DE.json');
    // or use the Eurostat statistics query format, as the SDMX REST query may be less readable (too many dots, and the order of dimensions must be known)
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/' + onlineDataCode + '?geo=FR&geo=LU&geo=DE', onlineDataCode +'_FR_LU_DE.stats.json');
  } else {
    console.log(onlineDataCode + " already up to date " + " last updated at " + lastUpdateDate);
  }
 
})();
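
If you prefer to stay in PowerShell, the same last-update check can be sketched with Invoke-RestMethod, mirroring the Node.js logic above (the dataset code, check date and download format are values to adapt to your needs):

$onlineDataCode = "T2020_10"
$lastCheck = [datetime]'2020-09-19T18:00'

# Retrieve the dataflow artefact in JSON and extract the UPDATE_DATA annotation date
$json = Invoke-RestMethod "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/${onlineDataCode}?format=json"
$lastUpdate = ($json.extension.annotation | Where-Object { $_.type -eq "UPDATE_DATA" }).date

If([datetime]$lastUpdate -gt $lastCheck) { # download only if the dataset changed since the last check
  Write-Host "$onlineDataCode needs update, last updated at $lastUpdate"
  (New-Object System.Net.WebClient).DownloadFile("https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/${onlineDataCode}?format=TSV", "$pwd\$onlineDataCode.tsv")
} else {
  Write-Host "$onlineDataCode already up to date, last updated at $lastUpdate"
}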



