
The data supplied to Eurostat are validated according to the following procedure:

Figure 59 – Validation process (GSBPM notation)  


Step 4.3 is the first GSBPM sub-process in which validation checks are carried out. These checks relate only to a single instance of a dataset.

Step 5.3 is where level 2 validation takes place. This GSBPM sub-process is dedicated specifically to validation; it is named 'Review and validate'. It examines the data to identify potential problems, errors and discrepancies, and is also referred to as input data validation. At this stage, the new data file is checked against the corresponding time series using predefined validation rules applied in a set order. Where problems are found, suspicious or erroneous data are marked for manual inspection. This stage also checks whether all data for the reference year were reported, i.e. a completeness check.

Step 6.2 is named 'Validate outputs'. In this sub-process, statisticians validate the quality of the outputs produced against a general quality framework and against expectations.
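The level 2 mechanism described above (predefined rules applied in a set order, with failing data marked for manual inspection) can be sketched as follows. This is a minimal illustration, not Eurostat's implementation. The field names `HOLD_ID` and `UAA`, the rule name and the `Flag` structure are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A record is one row of the data file; field names used here are hypothetical.
Record = dict[str, object]


@dataclass
class Flag:
    """A suspicious value marked for manual inspection."""
    record_id: str
    rule: str
    message: str


# A rule inspects one record and returns a message if it fails, or None if it passes.
Rule = Callable[[Record], Optional[str]]


def run_rules(records: list[Record], rules: list[tuple[str, Rule]]) -> list[Flag]:
    """Apply the named rules in their predefined order and collect the flags."""
    flags: list[Flag] = []
    for name, rule in rules:  # the order of `rules` is the order of execution
        for rec in records:
            message = rule(rec)
            if message is not None:
                flags.append(Flag(str(rec["HOLD_ID"]), name, message))
    return flags


# Example with one hypothetical rule on hypothetical fields.
rules = [
    ("uaa_non_negative", lambda r: "UAA is negative" if r["UAA"] < 0 else None),
]
records = [{"HOLD_ID": "H1", "UAA": 12.0}, {"HOLD_ID": "H2", "UAA": -1.0}]
flags = run_rules(records, rules)
```

Keeping rules as an ordered list of named functions makes the execution order explicit. It also means each flag records which rule raised it.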

Structural validation (STRUVAL)

Once the data arrive in Eurostat's input hall, they are verified against the defined SDMX structure files. The input hall is not visible to NSIs, but a report is sent in response to each data delivery. Data files are accepted and passed on to content validation only when they are syntactically correct and well formed. This corresponds to level 0 (structural) validation. The Structural Validation Service (STRUVAL) performs the structural validation of statistical data files based on a set of predefined validation rules contained in a Data Structure Definition (DSD). Structural validation by STRUVAL is the first step in a sequence of automated data validation activities that Eurostat carries out before the collected data are statistically processed and disseminated. STRUVAL returns a validation report to the data provider listing the failures detected in the dataset, which must be corrected before resubmission. The service verifies:

  • that the transmitted file is an accepted and processable format (SDMX-ML, SDMX-CSV);
  • that the dataset contains the structures as defined in the DSD, including dataflow definition, code lists, concepts, key families and constraints;
  • that the values contained within the dataset follow basic requirements defined in terms of completeness, data format, data consistency and constraints applied.
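The structural checks listed above can be illustrated with a small sketch for a CSV delivery. It assumes a hypothetical set of required columns standing in for the structures a DSD would define, and checks only that those columns exist and that `OBS_VALUE` is numeric. STRUVAL itself works from the actual DSD and supports SDMX-ML as well, which this sketch does not attempt.

```python
import csv
import io

# Hypothetical column set standing in for the structures defined in a DSD.
REQUIRED_COLUMNS = {"DATAFLOW", "REF_AREA", "HOLD_ID", "VARIABLE", "OBS_VALUE"}


def structural_errors(csv_text: str) -> list[str]:
    """Return structural failures found in a CSV data file (empty list if none)."""
    errors: list[str] = []
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        errors.append(f"missing columns: {sorted(missing)}")
        return errors  # row checks are meaningless without the expected structure
    # The header is line 1, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        try:
            float(row["OBS_VALUE"])
        except (TypeError, ValueError):  # TypeError covers a short row (None value)
            errors.append(f"line {line_no}: OBS_VALUE is not numeric")
    return errors
```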

Content validation (CONVAL)

Closely linked to this is level 1 validation, a basic content validation performed with the EDIT tool. It checks the records within the data file. First, a semantic check of each record is made. Then a set of validation rules for intra-file checks is applied.
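The two stages above can be sketched as follows: a record-level check first, then a rule that needs the whole file. This is not how the EDIT tool is configured. It is a minimal illustration with two hypothetical rules: every value other than the identifier must be a non-negative number, and holding identifiers (`HOLD_ID`, a hypothetical field name) must be unique within the file.

```python
from collections import Counter


def record_errors(rec: dict) -> list[str]:
    """Record-level check (hypothetical rule): values must be non-negative numbers."""
    errors = []
    for key, value in rec.items():
        if key == "HOLD_ID":
            continue
        if not isinstance(value, (int, float)) or value < 0:
            errors.append(f"{rec.get('HOLD_ID')}: invalid value for {key}")
    return errors


def intra_file_errors(records: list[dict]) -> list[str]:
    """Intra-file check (hypothetical rule): holding identifiers must be unique."""
    counts = Counter(r.get("HOLD_ID") for r in records)
    return [f"duplicate HOLD_ID {hid}" for hid, n in counts.items() if n > 1]


def content_check(records: list[dict]) -> list[str]:
    """Run the record-level checks first, then the intra-file checks."""
    errors = [e for r in records for e in record_errors(r)]
    errors += intra_file_errors(records)
    return errors
```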

Figure 60 – Schematic of the input hall (green highlights are items visible to NSI)

In practice, this is an iterative process. Only after the data have passed all structural and content validation steps can they be processed to produce dissemination products.

Completeness

The completeness of the file is verified.
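One way to picture a completeness check is to list, per holding, the mandatory variables that are missing or empty. The sketch below does this. The list of mandatory variables is a hypothetical example, not the list defined for IFS.

```python
# Hypothetical list of variables that must be reported for every holding.
MANDATORY = ("HOLD_ID", "REF_AREA", "UAAT", "UAAS")


def incomplete_records(records: list[dict]) -> dict[str, list[str]]:
    """Map each holding to the mandatory variables it lacks (absent, None or empty)."""
    gaps: dict[str, list[str]] = {}
    for i, rec in enumerate(records):
        missing = [v for v in MANDATORY if rec.get(v) in (None, "")]
        if missing:
            # Fall back to the row position when the identifier itself is missing.
            key = str(rec.get("HOLD_ID") or f"row {i + 1}")
            gaps[key] = missing
    return gaps
```

A reported value of 0 is treated as present. Only absent, `None` or empty-string values count as gaps.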

Codes

Codes used for categorical fields are checked against the list of valid codes given in this manual.
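A code-list check amounts to a set-membership test per categorical field. In the sketch below, the field `ORGANIC` and both code lists are hypothetical excerpts; the authoritative lists are the ones given in this manual. Fields that are absent are skipped and left to the completeness check.

```python
# Hypothetical excerpt of code lists; the real lists are those given in this manual.
CODE_LISTS = {
    "REF_AREA": {"BE", "BG", "CZ", "DK"},
    "ORGANIC": {"0", "1"},
}


def invalid_codes(records: list[dict]) -> list[tuple]:
    """Return (holding, field, value) for every categorical value not in its code list."""
    problems = []
    for rec in records:
        for field, valid in CODE_LISTS.items():
            value = rec.get(field)
            if value is not None and value not in valid:
                problems.append((rec.get("HOLD_ID"), field, value))
    return problems
```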

Coverage

The thresholds defined a priori for IFS (see 2.2 Coverage) are checked against the values delivered. In-scope holdings are those that meet at least one of the conditions listed in Table 16 – Thresholds according to Annex II of Regulation (EU) 2018/1091.

Table 24 – Thresholds according to Annex II of Regulation (EU) 2018/1091

| Item | Variables | Threshold |
|---|---|---|
| Utilised agricultural area | UAA = UAAT + UAAS | 5 ha |
| Arable land | ARA = ARAT + ARAS = (C0000T + P0000T + R1000T + R2000T + R9000T + I0000T + V0000_S0000T + N0000T + G0000T + E0000T + ARA99T + Q0000T) + (V0000_S0000S + N0000S + ARA09S) | 2 ha |
| Potatoes | R1000T | 0.5 ha |
| Fresh vegetables and strawberries | V0000_S0000T | 0.5 ha |
| Aromatic, medicinal and culinary plants, flowers and ornamental plants, seeds and seedlings, nurseries | I5000T + N0000T + E0000T + L0000T | 0.2 ha |
| Fruit trees, berries, nut trees, citrus fruit trees, other permanent crops excluding nurseries, excluding vineyards and excluding olive trees | F0000T + T0000T + PECR9_H9000T | 0.3 ha |
| Vineyards | W1000T | 0.1 ha |
| Olive trees | O1000T | 0.3 ha |
| Greenhouses | UAAS | 100 m² |
| Cultivated mushrooms | U1000 | 100 m² |
| Livestock | A2010 × 0.4 + A2020 × 0.7 + A2130 × 1 + A2230 × 0.8 + A2300F × 1 + A2300G × 0.8 + A4100 × 0.1 + A4200 × 0.1 + A3110 × 0.027 + A3120 × 0.5 + A3130 × 0.3 + A5140 × 0.007 + A5110O × 0.014 + A5230 × 0.03 + A5210 × 0.01 + A5220 × 0.02 + A5410 × 0.35 + A5240_5300 × 0.001 + A6111 × 0.02 | 1.7 livestock units |
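The coverage rule in Table 24 (a holding is in scope if it reaches at least one threshold) can be sketched as below. Three assumptions apply and are not stated in the table: all areas are reported in hectares (so 100 m² = 0.01 ha), livestock variables are numbers of heads multiplied by the coefficients in the table, and "reaches" means greater than or equal to.

```python
# ASSUMPTION: areas are in hectares and livestock in heads; 100 m² = 0.01 ha.
ARA_CODES = (
    "C0000T", "P0000T", "R1000T", "R2000T", "R9000T", "I0000T", "V0000_S0000T",
    "N0000T", "G0000T", "E0000T", "ARA99T", "Q0000T",
    "V0000_S0000S", "N0000S", "ARA09S",
)

AREA_THRESHOLDS_HA = (
    (("UAAT", "UAAS"), 5.0),                          # utilised agricultural area
    (ARA_CODES, 2.0),                                  # arable land
    (("R1000T",), 0.5),                                # potatoes
    (("V0000_S0000T",), 0.5),                          # fresh vegetables and strawberries
    (("I5000T", "N0000T", "E0000T", "L0000T"), 0.2),   # aromatic plants ... nurseries
    (("F0000T", "T0000T", "PECR9_H9000T"), 0.3),       # fruit and other permanent crops
    (("W1000T",), 0.1),                                # vineyards
    (("O1000T",), 0.3),                                # olive trees
    (("UAAS",), 0.01),                                 # greenhouses: 100 m²
    (("U1000",), 0.01),                                # cultivated mushrooms: 100 m²
)

LSU_COEFFICIENTS = {
    "A2010": 0.4, "A2020": 0.7, "A2130": 1.0, "A2230": 0.8, "A2300F": 1.0,
    "A2300G": 0.8, "A4100": 0.1, "A4200": 0.1, "A3110": 0.027, "A3120": 0.5,
    "A3130": 0.3, "A5140": 0.007, "A5110O": 0.014, "A5230": 0.03, "A5210": 0.01,
    "A5220": 0.02, "A5410": 0.35, "A5240_5300": 0.001, "A6111": 0.02,
}
LSU_THRESHOLD = 1.7


def livestock_units(holding: dict[str, float]) -> float:
    """Weighted sum of livestock heads using the coefficients of Table 24."""
    return sum(holding.get(code, 0.0) * coef for code, coef in LSU_COEFFICIENTS.items())


def in_scope(holding: dict[str, float]) -> bool:
    """True if the holding reaches at least one of the thresholds of Table 24."""
    for codes, threshold in AREA_THRESHOLDS_HA:
        if sum(holding.get(code, 0.0) for code in codes) >= threshold:
            return True
    return livestock_units(holding) >= LSU_THRESHOLD
```

Because the sums are floating point, a holding lying exactly on a threshold may need a small tolerance in a production check.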

NSNE variables

The coherence between the reported NSNE variables and the microdata file is checked. The validation rules for this cross-validation are presented in Annex V.

Geographic compliance

Cross-checks are run between the geographic units (REF_AREA vs. REGION vs. GEO_LCT).
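A geographic cross-check of this kind might look like the sketch below. It relies on one fact: NUTS region codes begin with the two-letter country code. Everything else is an assumption. `LOCATION_TO_REGION` is a hypothetical lookup from GEO_LCT values to the region containing them, and a REGION reported at a coarser NUTS level is accepted if it is a prefix of the looked-up region.

```python
# Hypothetical lookup from a GEO_LCT value to the NUTS region that contains it.
LOCATION_TO_REGION = {"LOC001": "BE21", "LOC002": "BE31"}


def geo_errors(rec: dict) -> list[str]:
    """Cross-check REF_AREA, REGION and GEO_LCT of one record."""
    errors = []
    ref_area, region, location = rec["REF_AREA"], rec["REGION"], rec["GEO_LCT"]
    # NUTS region codes start with the country code used as REF_AREA.
    if not region.startswith(ref_area):
        errors.append(f"REGION {region} is not in REF_AREA {ref_area}")
    expected = LOCATION_TO_REGION.get(location)
    if expected is None:
        errors.append(f"unknown GEO_LCT {location}")
    elif not expected.startswith(region):  # allows REGION at a coarser NUTS level
        errors.append(f"GEO_LCT {location} lies in {expected}, not in {region}")
    return errors
```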

Aggregations

Aggregated results are checked for plausibility.

Time series consistency

Aggregated results are compared by Eurostat against FSS and IFS data from previous surveys.
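A simple way to compare new aggregates with earlier FSS and IFS results is to flag values that fall outside the range observed in previous surveys, widened by a tolerance. The sketch below takes that approach. The method and the 20% default tolerance are hypothetical choices, not Eurostat parameters. Previous values are assumed to be positive.

```python
def time_series_flags(current: dict[str, float],
                      history: dict[str, list[float]],
                      tolerance: float = 0.2) -> dict[str, str]:
    """Flag aggregates outside the range of earlier surveys widened by `tolerance`.

    `tolerance` is a hypothetical default; earlier values are assumed positive.
    """
    flags = {}
    for variable, value in current.items():
        past = history.get(variable)
        if not past:
            flags[variable] = "no earlier survey value to compare with"
            continue
        low, high = min(past) * (1 - tolerance), max(past) * (1 + tolerance)
        if not low <= value <= high:
            flags[variable] = f"{value} outside [{low:.1f}, {high:.1f}]"
    return flags
```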

Cross domain validation

Aggregated results are compared with crop statistics data for the same survey year. Large differences require further investigation or justification. Data suppliers are asked to check such differences before transmitting data to Eurostat, using, for example, a relative difference of more than 10% as a first indication.
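The 10% indication can be applied per variable as in the sketch below. It assumes that the crop statistics value is used as the reference (denominator) for the relative difference, which the text does not specify. Variables with a zero reference are skipped because no relative difference can be computed.

```python
def cross_domain_flags(ifs: dict[str, float],
                       crops: dict[str, float],
                       threshold: float = 0.10) -> dict[str, float]:
    """Return variables whose relative difference from crop statistics exceeds threshold.

    ASSUMPTION: the crop statistics value is the reference (denominator).
    """
    flags = {}
    for variable, reference in crops.items():
        if variable not in ifs or reference == 0:
            continue  # nothing to compare, or relative difference undefined
        rel_diff = abs(ifs[variable] - reference) / abs(reference)
        if rel_diff > threshold:
            flags[variable] = round(rel_diff, 3)
    return flags
```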