Child pages
  • API - Examples - periodic data download

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • first the download program needs to know which datasets you are interested in;
  • then you have to launch the appropriate script in Powershell PowerShell to download them.

Step 1 : retrieve the list of your (selected) datasets

This step will helps you to retrieve a file named Filtered_Items_List_EN.txt that contains tab-separated values. See detailed description of this file structure

Such delimited-separator file is easy to use in scripting languages.

Step-by-step:

  • Go to the Data Browser Downloads page
  • Select the datasets you are interested in. In the example below: AACT_ALI01 and AACT_EAA01
  • In ‘Download operations’‘Bulk actions’, select ‘Download filtered list of items (txt)’ selection list’ as shown in the screenshot below, 

Image RemovedImage Added

  • You will get a zip file with a text file named FilteredSelection_Items_List_EN.txt containing tab-separated values. See the detailed description of this file structure.
  • Do not forget to unzip your file and save it for example in your user folder ‘C:\Users\your user name\FilteredSelection_Items_List_EN.txt’. 
  • Example :

You will find in the downloaded archive a file named Filtered_Items_List_EN.txt that contains tab-separated values. See detailed description of this file structure

...

Such delimited-separator file is easy to use in scripting languages.


Infoinfo
titleRemark

The complete list of datasets can be downloaded by using ‘Download full list of items (txt)’ inventory’ as used in next example.

Image RemovedImage Added


Step 2 : download the data files

...

By connecting a few Powershell cmdlets (Import-CSVForEach-Object and WebClient.DownloadFile), one can easily you should be able to download the wanted desired content from the specified URL (SDMX-CSV in this example) to the current a folder.

Code Block
languagepowershell
Import-CSV .\FilteredSelection_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
}

Please note that this example uses the CODE value and the "Data download url (csv)" value.    
By adapting the code (replace the part highlighted in red)script, you will be able to download any format proposed in your ‘Filtered_Items_List_EN.txt’:

...

Code Block
languagepowershell
Import-CSV .\FilteredSelection_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile("$($_."Data download url (csv)")&compressed=true", "$pwd\$($_.Code).csv.gz")
}

...

Code Block
languagepowershell
# from https://securitytidbits.wordpress.com/2017/04/14/powershell-and-gzip-compression/comment-page-1/
Function Gunzip-File([ValidateScript({Test-Path $_})][string]$File){
 
    $srcFile = Get-Item -Path $File
    $newFileName = Join-Path -Path $srcFile.DirectoryName -ChildPath $srcFile.BaseName
 
    try
    {
        $srcFileStream = New-Object System.IO.FileStream($srcFile.FullName,([IO.FileMode]::Open),([IO.FileAccess]::Read),([IO.FileShare]::Read))
        $dstFileStream = New-Object System.IO.FileStream($newFileName,([IO.FileMode]::Create),([IO.FileAccess]::Write),([IO.FileShare]::None))
        $gzip = New-Object System.IO.Compression.GZipStream($srcFileStream,[System.IO.Compression.CompressionMode]::Decompress)
        $gzip.CopyTo($dstFileStream)
    } 
    catch
    {
        Write-Host "$_.Exception.Message" -ForegroundColor Red
    }
    finally
    {
        $gzip.Dispose()
        $srcFileStream.Dispose()
        $dstFileStream.Dispose()
    }
}

Import-CSV .\FilteredSelection_Items_List_EN.txt -Delimiter "`t" | ForEach-Object {
  (New-Object System.Net.WebClient).DownloadFile("$($_."Data download url (csv)")&compressed=true", "$pwd\$($_.Code).csv.gz")
  Gunzip-File "$pwd\$($_.Code).csv.gz"
  Remove-Item "$pwd\$($_.Code).csv.gz"
}

...

This example is similar to the first one but the dataset selection is now part of the script and not done in the Data Browser interface using the ‘Download filtered list of items (txt)’selection list’.

The script is simply based on the file ‘Download full list of items (txt)’ inventory’ that has to be downloaded first.

Pre-condition : having the full list of datasets

Along with your download in Data Browser, From the Downloads section you can request to receive the full list of items,  via Download inventory

Image RemovedImage Added

You will find in your archive download a file named Full_Items_List_ENinventory.txt that contains tab-separated values. This is the same format as the filtered one used in previous example.

...

Execution : select and download the data files

To define the datasets to download, you can apply a condition on the code. In below example all the TPS* datasets will be downloaded in one go.

...

Depending on your need you could further develop your script., for example start with automating teh download of inventory.txt from https://ec.europa.eu/eurostat/api/dissemination/files/inventory?type=data&lang=en

Warning

Please

Warning

Please be cautious to verify

  1. the content of the input file
  2. a condition is set within the script
  3. the compress option is enabled for huge requests to optimise network traffic
Otherwise you could easily start downloading the complete scope of Eurostat data, that would require a large amount of disk space.

Otherwise you could easily start downloading the complete scope of Eurostat data, that would require a large amount of disk space.

Alternate execution : select and download the data files that are more recent than a specified date

In case you download data once a week / once a month, you could skip data without updates since your last execution time (to be set manually in the script)

Code Block
Import-CSV .\inventory.txt -Delimiter "`t" | ForEach-Object {
  If($_.Code.StartsWith("TPS")) { # download only TPS datasets
    $compareDate = [datetime]::ParseExact('31-05-2024', 'dd-MM-yyyy', $null)
	$datasetDate = [datetime]::ParseExact($_."Last data change", 'yyyy-MM-d\THH:mm:ssK', $null)
	If($datasetDate -ge $compareDate) { # download only dataset with last update date after
		Write-Host Download $_.Code because last data change $datasetDate is more recent than $compareDate
		(New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
	}
  }
}

Extension: validate and re-download in case of API async response

Even if data files are prepared in advance,  it may seldom happen that instead of data an "API async response" is present in the final file.

Code Block
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
    <env:Header/>
    <env:Body>
        <ns0:syncResponse xmlns:ns0="http://estat.ec.europa.eu/disschain/soap/extraction">
            <processingTime>412</processingTime>
            <queued>
                <id>98de05ea-540a-43d3-903b-7c9e14faf808</id>
                <status>SUBMITTED</status>
            </queued>
        </ns0:syncResponse>
    </env:Body>
</env:Envelope>

Please see API - Detailed guidelines - Asynchronous API for more details.


Following script is proposed to validate your downloaded files 

(warning) This script takes the assumption that you are working in an initially empty folder where all the CSV files present are from a download ran recently.

(info) It is proposed to save this script in a powershell executable file (extension ps1) so you can run and see its output in the windows PowerShell terminal

Code Block
languagepowershell
titlecheck.ps1
$folderPath = "." # work in current folder
$fileExtension= "*.csv" # scan files with csv extensions"
$pattern = "^<env:Envelope" #search for file with particular async response pattern in first line"
$inventoryFile = "inventory.txt"

# Get all files of specified extension in the specified directory
$files = Get-ChildItem -Path $folderPath -Filter $fileExtension

# Display count in terminal
Write-Host Searching in $files.Length files for API async responses

# Initialize an empty array to store matching filenames
$matchingDatasetCodes = @()

foreach ($file in $files) {
    # Read the first line of the file
    $firstLine = Get-Content $file.FullName -First 1

    # Check if the first line matches the pattern
    if ($firstLine -match $pattern) {
		# collect dataset code
        $matchingDatasetCodes += [System.IO.Path]::GetFileNameWithoutExtension($file)
    }
}

# Uncomment next line to Write the matching dataset codes to a text file
# $matchingDatasetCodes | Out-File -FilePath "Output.txt"

# Report in terminal about matching dataset codes
Write-Host Found $matchingDatasetCodes.Length with API async responses

# Redo download for matching dataset codes
Import-CSV .\$inventoryFile -Delimiter "`t" | ForEach-Object {
  If($matchingDatasetCodes -contains $_.Code) {
	Write-Host Re-downloading $_.Code
    (New-Object System.Net.WebClient).DownloadFile($_."Data download url (csv)", "$pwd\$($_.Code).csv")
  }
}

Example 3: keeping up to date for a single dataset in Node.js

...


Code Block
languagejs
linenumberstrue
#!/usr/bin/env node

// library to make HTTP calls
const fetch = require('node-fetch');
// library to write to the filesystem
const fs = require('fs');
 
//util method to write to the filesystem
const downloadFile = (async (url, path) => {
  const res = await fetch(url);
  const fileStream = fs.createWriteStream(path);
  await new Promise((resolve, reject) => {
    res.body.pipe(fileStream);
    res.body.on("error", reject);
    fileStream.on("finish", resolve);
  });
});
 
//last time program was run (you should change this value or make it a parameter of your script)
const lastCheck = '2020-09-19T18:00';
// online data code to lookup (you should change this value or make it a parameter of your script)
const onlineDataCode = "T2020_10";
 
(async () => {
  //retrieve information on dataset in JSON
  const response = await fetch('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/' + onlineDataCode + '?format=json');
  const json = await response.json();
   
  //extract last data udpate date
  var lastUpdateDate = null;
  for (var i = 0 ; i < json.extension.annotation.length; i++) {
    //console.log(json.extension.annotation[i]);
    if (json.extension.annotation[i].type == "UPDATE_DATA")
      lastUpdateDate = json.extension.annotation[i].date;
    }
   
  //compare last data update date with our own
  if (Date.parse(lastUpdateDate) > Date.parse(lastCheck)) {
    console.log(onlineDataCode + " need update " + " last updated at " + lastUpdateDate);
    //download needed format (remove the line you do not need)
    console.log("Downloading data in TSV for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '?format=TSV', onlineDataCode +'.tsv');
    console.log("Downloading data in SDMX-CSV for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '?format=SDMX-CSV', onlineDataCode +'.sdmx.csv');
    console.log("Downloading data in SDMX-ML 2.1 for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode , onlineDataCode +'.sdmx.xml');
    console.log("Downloading structure in SDMX-ML 2.1 for " + onlineDataCode );
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/' + onlineDataCode +'?references=descendants&details=referencepartial', onlineDataCode +'.sdmx.structure.xml');

    // or extract specific time-series in json-stat to feed visualisations..
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/' + onlineDataCode + '/A.....FR+LU+DE?format=JSON', onlineDataCode +'_FR_LU_DE.json');
    // or use Eurostat statistics query format, because SDMX REST have may be less readable depending the query (may have too many dots, need to know the order of dims)
    downloadFile('https://ec.europa.eu/eurostat/api/dissemination/statistics/1.0/data/' + onlineDataCode + '?geo=FR&geo=LU&geo=DE', onlineDataCode +'_FR_LU_DE.stats.json');
  } else {
    console.log(onlineDataCode + " already up to date " + " last updated at " + lastUpdateDate);
  }
 
})();

...