
Regulation (EU) 2018/1091 states that "the Commission is to respect the confidentiality of the data transmitted in line with Regulation (EC) No 223/2009 of the European Parliament and of the Council. The necessary protection of confidentiality of data should be ensured, among other means, by limiting the use of the location parameters to spatial analysis of information and by appropriate aggregation when publishing statistics. For that reason a harmonised approach for the protection of confidentiality and quality aspects for data dissemination should be developed, while making efforts to render online access to official statistics easy and user-friendly".

Article 3 of Regulation (EC) No 223/2009 defines 'confidential data' as data which allow a statistical unit (i.e. the person, company or organisation to which the data refer) to be identified, either directly or indirectly, thereby disclosing individual information. To determine whether a statistical unit is identifiable, account shall be taken of all relevant means that might reasonably be used by a third party to identify the statistical unit.

The risk of a statistical unit being identified is the only factor that qualifies data as confidential. It is not important which information is disclosed and if this information is sensitive or not. In this light, one cannot argue that some variables (e.g. crops, livestock) are less sensitive than others (labour force).

GDPR


On 8 February 2018, the Directors-General and Presidents of the National Statistical Institutes (NSIs) and of the European Union's statistical authority (Eurostat) met at an informal workshop on the implications of the GDPR for European statistics. In the conclusions issued, the participants:

  1. acknowledged the high relevance of the GDPR implementation for the production of high quality official statistics and for maintaining the confidence of the respondents providing personal data for statistical purposes;
  2. recognised that in almost all Member States procedures have been initiated to enact derogations from the data subjects' rights referred to in some or all of the following Articles of the GDPR: 15 (access), 16 (rectification), 18 (restriction) and 21 (objection);
  3. agreed that the same derogations should apply across all statistical domains and should not be domain-specific;
  4. acknowledged that the NSIs and other statistical authorities (ONAs) are responsible for the protection of all personal data they process, both those collected in the framework of an EU regulation and those collected for purely national interests;
  5. noted that appropriate derogations in national law, when granted, could in most cases be sufficient to effectively address the potential ramifications of the GDPR and the specific needs of the statistical production in each Member State;
  6. agreed that, in the interest of harmonising the protection of the data subjects' rights in the field of official statistics, additional uniform derogations at EU level, notably in Regulation 223/2009, could be useful and should be considered once enough experience in the application of the GDPR has been collected; in this respect discussion at expert level should be organised at a later stage;
  7. agreed to share experience and best practice in addressing the implications of the GDPR for official statistics at the national level; to this end, a collaborative platform will be created by Eurostat to store and share examples of national provisions and justifications for derogations;
  8. emphasised the need to establish constructive dialogue with data protection authorities at national and European level in order to clarify the specificities of statistical production, including a better understanding of statistical methodology and existing safeguards.

Data storage and dissemination

To be developed

Confidentiality measures for microdata

See 9.5.3 - Scientific use files

Confidentiality measures for location

In the presence of geospatial data, disclosure control experts face a paradox. On the one hand, such data need more protection because they make identification easier; on the other hand, they offer many possibilities for analysis, which users do not want to see distorted too much by data suppression. Disclosure risk is higher for geospatial data:

  • firstly, because belonging to a geographical area may give information to the intruder about some attributes (e.g. 100 percent of inhabitants of a square are unemployed). This is called categorisation risk, and it increases in the case of spatial data because of Tobler's "first law of geography" which states that "everything interacts with everything, but two close objects are more likely to do so than two distant objects";
  • secondly, because of the so-called identification risk. Among the characteristics one may share with someone, a common geographic area leads to a higher probability of identifying that person (one probably knows a neighbour better than someone with whom one shares any other characteristic). Moreover, identification of addresses has recently become possible with the development of open access tools like Google Street View. As a result, population density is a fundamental predictor of disclosure risk: the lower the density, the higher the disclosure risk. That is why confidentiality thresholds can differ between countries;
  • finally, disclosure risk can increase with the geographic differencing issue, when data is disseminated at different levels (hierarchical or not).

Technically, the dissemination classification (zoning, administrative boundaries, or regular tessellations such as grid squares) is a categorical variable like any other (an additional dimension of tabular data). It is therefore possible to deal with disclosure risk with no geographical consideration. Nevertheless, a geographically intelligent management of disclosure issues will preserve the underlying spatial phenomenon. A risk-utility compromise has to be made, using relevant distortion indicators (EFGS & Eurostat, 2017).

The risk of identifying holdings by crossing census gridded data with the proposed scientific use files (see 9.5.3 - Scientific use files) is close to zero, for the following reasons:

  • Gridded information does not include the number of holdings explaining each characteristic. Only the aggregated number of holdings is provided.
  • Gridded information is tabular information; it represents more than one holding and is therefore treated with the standard method for disclosure control, like any other tabulation, using the following algorithm:
    • If the value of the cell is explained by 4 or fewer holdings, or if more than 85% of the value is explained by 1 or 2 holdings, then the information is not disclosed.
    • The minimum value that a user will observe is 10 holdings.
    • The minimum observed total data in any variable that is not disclosed in the grids due to disclosure control represents 10% of the total of the variable in the EU (a strict disclosure control algorithm is used).
  • The method used for locating the holding has a high uncertainty. It is not guaranteed that the holding is actually located in the grid cell where it is shown, because:
    • Coordinates are rounded to a 10 km INSPIRE grid.
    • Holdings are represented as points although they are in fact polygons; big farms may be present in more than one grid cell, but only one X,Y coordinate pair represents the holding.
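The coordinate rounding mentioned in the last bullets can be illustrated with a minimal sketch. It assumes projected coordinates in metres and identifies a cell by its lower-left corner; the function name `snap_to_grid` and the sample coordinates are hypothetical and not taken from the Eurostat production system.

```python
import math

CELL_SIZE_M = 10_000  # 10 km grid cell, expressed in metres


def snap_to_grid(x, y, cell_size=CELL_SIZE_M):
    """Return the lower-left corner of the grid cell containing point (x, y)."""
    return (math.floor(x / cell_size) * cell_size,
            math.floor(y / cell_size) * cell_size)


# Two holdings several kilometres apart fall into the same 10 km cell,
# so the published cell does not reveal their exact positions.
print(snap_to_grid(4_321_987.0, 3_210_456.0))  # (4320000, 3210000)
print(snap_to_grid(4_329_100.0, 3_215_999.0))  # (4320000, 3210000)
```

Because only the cell is kept, any point inside the 10 km by 10 km square is equally compatible with the published location.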

Figure 65 – Agricultural holding density (number of farms per km² of UAA)


In order to protect confidentiality in the case of very large holdings, where it is possible that only one farm exists in a cell of the grid, the position of the farm can be allocated to the nearest neighbouring cell with at least one other holding. If none of the 8 neighbouring cells (chosen in random order) has at least one other holding, the search is extended to more distant cells until a suitable grid cell is found. As far as possible, the chosen cell should lie within the same NUTS3 region as the original cell. A cell is considered to belong to a NUTS3 region if its lower-left coordinate is inside the polygon that defines the NUTS3 region at the 1:100.000 scale.

Figure 66 – If only one farm at a location, assign it to a random neighbouring cell within the same NUTS3; if still not possible, enlarge the area. 
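The relocation described above can be sketched as follows. This is only one possible reading of the rule, not the production algorithm: cells are indexed by hypothetical integer (column, row) pairs, the search first looks for a cell in the same NUTS3 region in rings of increasing distance, and only if none is found does it accept a cell from another region. The names `ring`, `relocate` and `max_radius` are assumptions made for the example.

```python
import random


def ring(cell, radius):
    """Cells at Chebyshev distance `radius` from `cell` (the 8 neighbours for radius 1)."""
    cx, cy = cell
    return [(cx + dx, cy + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)
            if max(abs(dx), abs(dy)) == radius]


def relocate(cell, holdings, nuts3, max_radius=5, rng=None):
    """Choose a nearby cell that already contains at least one holding.

    holdings: dict mapping cell -> number of holdings in that cell
    nuts3:    dict mapping cell -> NUTS3 code of that cell
    Returns the chosen cell, or None if nothing is found within max_radius.
    """
    rng = rng or random.Random()
    home_region = nuts3.get(cell)
    # First pass: same NUTS3 region only; second pass: any region (assumed fallback).
    for same_region_only in (True, False):
        for radius in range(1, max_radius + 1):
            candidates = [c for c in ring(cell, radius)
                          if holdings.get(c, 0) >= 1
                          and (not same_region_only or nuts3.get(c) == home_region)]
            if candidates:
                return rng.choice(candidates)  # random pick among equally near cells
    return None


holdings = {(0, 0): 1, (1, 0): 3, (0, 2): 5}
nuts3 = {(0, 0): "R1", (1, 0): "R2", (0, 2): "R1"}
print(relocate((0, 0), holdings, nuts3))  # (0, 2): nearest occupied cell in region R1
```

Whether a nearer cell in another region should be preferred over a more distant cell in the same region is a design choice that the text leaves open; the sketch favours the same region.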

Multi-resolution grids

Multi-resolution grids are represented by a hierarchical structure through two associations. Each StatisticalGrid instance can be associated with a lower and/or an upper resolution grid through the Hierarchical relation association. A StatisticalGridCell belonging to a given StatisticalGrid is composed of the overlapping cells of its grid's lower resolution grid, and composes the cell it overlaps in its grid's higher resolution grid. Lower and upper StatisticalGridCells are associated through the Hierarchical composition.

Figure 71 – INSPIRE Grid

Source: https://inspire.ec.europa.eu/id/document/tg/su 

Confidentiality for tabular data

Eurostat disseminates a high number of statistical tables on its website and in response to specific user requests. All these tabular data are treated for primary confidentiality. Primary confidentiality concerns tabular cell data whose dissemination would permit attribute disclosure. The two main reasons for declaring data to be primary confidential are too few units in a cell and dominance of 1 or 2 units in a cell. These two reasons have been considered for FSS and for IFS. The following section presents the procedure used by Eurostat starting with 2020 IFS.

Procedure starting with 2020 IFS

The procedure consists of the following methods:

  1. threshold rule (suppression due to small counts): suppression of extrapolated number of holdings and of extrapolated aggregated values of variables describing those holdings;
  2. dominance rule (suppression due to dominance by one or two units): suppression of extrapolated number of holdings and of extrapolated aggregated values of variables describing those holdings;
  3. rounding: all non-confidential extrapolated number of holdings and extrapolated aggregated values of variables are rounded to the nearest multiple of 10.

The methods are applied in the order indicated above. Each method is applied only to data that have not already been suppressed by a previous method; applying a later method to data that are already suppressed would not change the result.

Notation

cell c

Any category or breakdown in which the records fall following application of one or more dimensions; in case of more dimensions, the cell is formed at the intersection of the dimensions (e.g. NUTS2 regions and economic size of holdings)

In Eurobase tables, the cell is formed at the intersection of the classifying dimensions.

wi

The extrapolation factor (the weight) of holding i in the sample of holdings nc falling into a specific cell c

xi

The value of variable x of the holding i in the sample of holdings nc falling into a specific cell c

x is a quantitative (numeric) variable (e.g. number of hectares of cereals)

Y

The extrapolated aggregated value of variable x describing the holdings falling in cell c.

For a total: $Y = \sum_{i \in n_c} w_i x_i$

For an average: $Y = \frac{\sum_{i \in n_c} w_i x_i}{W}$, where W is the extrapolated number of holdings entering the average (see the definition of W in this section).

The majority of indicators published in tables are totals.

In Eurobase tables, the extrapolated aggregated value of a variable is in the dimension 'unit'. This dimension is used for the computation of the indicators.

W

The extrapolated number of holdings in cell c whose values contribute to the extrapolated aggregated value of variable x for that cell.

For a total: $W = \sum_{i \in n_c,\ x_i > 0} w_i$

For an average:

  • If the denominator of the average is the holdings with xi>0: $W = \sum_{i \in n_c,\ x_i > 0} w_i$
  • If the denominator of the average is all holdings, regardless of whether they have xi>0 or xi=0: $W = \sum_{i \in n_c} w_i$

In the case of a total (Y), a holding with value xi=0 does not contribute to it; therefore that holding (and the other holdings it represents) is not counted in W.

In the case of an average (Y), there are two possibilities, depending on the definition of the average Y:

  • If the average is computed by using only holdings with xi>0, then W is computed as for a total.
  • If the average is computed by using holdings with xi>0 and xi=0, then W counts all holdings, because holdings with xi=0 influence the average.

W is rounded to 0 decimals.

XMAX

the highest value xi (the highest non-extrapolated value) of the holdings falling in cell c

XMAX2

the second highest value xi (the second highest non-extrapolated value) of the holdings falling in cell c

XMAX2 can be equal to XMAX

WMAX

the extrapolation factor (the weight) wi of XMAX

WMAX is rounded to 0 decimals.

WMAX2

the extrapolation factor (the weight) wi of XMAX2 

WMAX2 is rounded to 0 decimals.

YMAX

the extrapolated value WMAX×XMAX

YMAX2

the extrapolated value WMAX2×XMAX2

YMAX2 can be equal to YMAX

The Eurobase tables in SAS have the following "intermediary statistics" calculated for each cell:

  • WGT: equal to W (rounded to 0 decimals, as mentioned above)
  • TOTAL_WGT: equal to the extrapolated number of holdings in cell c, irrespective of whether their values contribute to the extrapolated aggregated value of variable x for that cell (not rounded)
  • WGT_HOLD1: equal to WMAX (rounded to 0 decimals, as mentioned above)
  • WGT_HOLD2: equal to WMAX + WMAX2 (rounded to 0 decimals, as mentioned above)
  • HOLDING1: equal to YMAX × 100 / Y
  • HOLDING2: equal to (YMAX + YMAX2) × 100 / Y
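As an illustration, the intermediary statistics listed above could be derived from the records of one cell roughly as follows. This is a minimal Python sketch for the case of totals, not the SAS programme used for the Eurobase tables. The function names are hypothetical, rounding to 0 decimals is assumed to round halves up, and YMAX and YMAX2 are assumed to be computed with the unrounded weights, with rounding applied only to the weight counts.

```python
import math


def round_half_up(value):
    """Round to 0 decimals, halves rounded up (assumed tie rule)."""
    return math.floor(value + 0.5)


def intermediary_statistics(records):
    """records: (w_i, x_i) pairs of the sampled holdings in one cell (totals case)."""
    # Holdings with x_i > 0, sorted by the unit-level value x_i, largest first.
    contributing = sorted((r for r in records if r[1] > 0),
                          key=lambda r: r[1], reverse=True)
    if not contributing:
        raise ValueError("no holding contributes to the cell total")
    y = sum(w * x for w, x in contributing)
    wmax, xmax = contributing[0]
    wmax2, xmax2 = contributing[1] if len(contributing) > 1 else (0, 0)
    ymax, ymax2 = wmax * xmax, wmax2 * xmax2
    return {
        "WGT": round_half_up(sum(w for w, _ in contributing)),
        "TOTAL_WGT": sum(w for w, _ in records),
        "WGT_HOLD1": round_half_up(wmax),
        "WGT_HOLD2": round_half_up(wmax) + round_half_up(wmax2),
        "HOLDING1": ymax * 100 / y,
        "HOLDING2": (ymax + ymax2) * 100 / y,
    }


# Three sampled holdings: (weight, hectares of cereals).
print(intermediary_statistics([(2, 430), (3, 40), (2, 10)]))
```

For this cell, HOLDING1 is 86.0 and HOLDING2 is 98.0, i.e. the largest holding value alone accounts for 86% of the cell total.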

The following sections present the procedure for totals, as most indicators are totals. For averages, the procedure is the same, except that the definition of W should be adjusted to include the holdings whose values affect the average (as described in detail in the Notation section above).

Cell suppression due to small counts

This method is also called "threshold rule" or "frequency rule".

Tables display aggregated values Y for variables x1, x2, …, xn calculated over certain population cells c1, c2, …, cn. The aggregates are calculated using the extrapolation factors.

For each total Y (for each aggregated extrapolated value of variable x), calculated for a cell c:

  • First, the programme computes the extrapolated number of holdings (W) which contribute to Y (i.e. which have a non-zero record value for x) and rounds W to 0 decimals.
  • Then, if that extrapolated number of holdings is higher than 0 and lower than or equal to 4, the programme suppresses:
    • the extrapolated number of holdings (W) for cell c (if planned to be disseminated) and
    • the extrapolated aggregated value of variable x (Y) for cell c.

As can be seen above, what is evaluated is a pair: the extrapolated number of holdings contributing to variable x and the aggregated extrapolated value of variable x. Either both components of the pair are published or both are suppressed. However, tables also include the total extrapolated number of holdings (with no specific characteristic such as contributing to variable x, e.g. cultivating cereals where x is cereals). This total extrapolated number of holdings is also suppressed if it is > 0 AND ≤ 4.

The Eurobase tables in SAS present the standard FLAG_CODE=A for the cells that are to be suppressed because of the threshold rule. In the code list on "confidentiality status" defined by the SDMX Statistical Working Group, the flag A stands for "Primary confidentiality due to small counts".


Examples from Eurobase tables in SAS:

FSS:

IFS:

Subsequently, SAS assigns the FLAG_USER c to the data to be suppressed.

The tables disseminated on Eurostat website and following user requests use only the standard flag :c.

Example on methodology:

Suppose that in the microdata, there are 3 sampled holdings (records) belonging to a certain NUTS 2 region, farm type and farm size:

| Holding identifier | wi (extrapolation factor) | xi (cereals) | yi = wi × xi |
|---|---|---|---|
| 1 | 2 | 430 | 860 |
| 2 | 3 | 0 | 0 |
| 3 | 2 | 10 | 20 |

Using the microdata, a table displays the aggregated value of several variables for this breakdown (cell). One of the variables is cereals (variable x). The total aggregated value for cereals is:

Y=860+0+20=880. We have to decide whether to disseminate or suppress this value.

Only 2 sampled holdings cultivate cereals.

Let's calculate W=2+2=4. Only 4 population holdings contribute to the total Y=880 of the cell.

As W=4, the value Y=880 is suppressed and flagged confidential in the table.

If the extrapolated number of holdings with cereals (W=4) is planned to be disseminated in the table, this number is also suppressed and flagged confidential.
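The worked example above can be reproduced with a short Python sketch of the threshold rule. The function name `threshold_rule` is hypothetical, and rounding W to 0 decimals is assumed to round halves up; the threshold of 4 is the one stated in the text.

```python
import math

THRESHOLD = 4  # cells with 0 < W <= 4 are suppressed


def threshold_rule(records, threshold=THRESHOLD):
    """Return (Y, W, suppressed) for one cell.

    records: (w_i, x_i) pairs of the sampled holdings falling in the cell.
    """
    y = sum(w * x for w, x in records)
    # Only holdings with a non-zero value contribute to W (totals case).
    w_contributing = math.floor(sum(w for w, x in records if x > 0) + 0.5)
    suppressed = 0 < w_contributing <= threshold
    return y, w_contributing, suppressed


cell = [(2, 430), (3, 0), (2, 10)]  # (weight, hectares of cereals)
print(threshold_rule(cell))  # (880, 4, True)
```

The holding with weight 3 and no cereals does not count towards W, which is why W is 4 and not 7.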

Remarks:

  • The method suppresses data when the number of holdings in the population is less than or equal to 4, whereas in other domains suppression is applied when that number is less than or equal to 3. The data in our domain may therefore be overprotected.
  • The method suppresses data depending on the extrapolated number of holdings and not on the number of holdings in the sample.

The reason is that the data are about the population in the cell and the purpose is to protect the population data in the cell. Users do not know how many units are in the sample of the cell and the sample size is not important, because the data concerns the population of the cell. If for example, 3 sampled holdings with flowers represent 20 population holdings with flowers in the cell, then the method does not protect the total number of hectares of flowers which is associated to the 20 holdings (the threshold is 4 and 20 is higher than 4). The method does not and should not unnecessarily protect the data (which can occur in the case of a high number of population holdings corresponding to a low number of sampled holdings).

  • The method suppresses data depending on the number of holdings that contribute to the aggregated value (for totals, having a nonzero value) of a variable in a cell.

In the above numeric example, 7 population holdings fall in a certain breakdown, but only 4 population holdings of that breakdown cultivate cereals. The method protects the total number of hectares of cereals associated with the 4 holdings with cereals. The table may be designed to display 7 population holdings in the cell or 4 population holdings with cereals in the cell, depending for example on whether the table covers only the holdings cultivating cereals or additionally covers other aspects of agriculture. If the table is designed to display 4 as the number of holdings with cereals, this number is suppressed. It is noted that, even though x is defined as a quantitative variable, the treatment implicitly also covers variables which are counts (number of holdings with certain variables higher than 0, e.g. cereals).

Cell suppression due to dominance by one or two units

This method is also called "dominance rule". In agricultural statistics, the distribution of a variable is often skewed: big farms are fewer than small farms and within a particular cell, 1 or 2 farms might be dominant. This would make it easy to disclose the information on the dominant farm with a high level of accuracy. That is why cells with dominant farms are confidential.

For each total Y (for each aggregated extrapolated value of variable x), calculated for a cell c:

  • First, the programme computes the extrapolated number of holdings (W) which contribute to Y (i.e. which have a non-zero record value for x) and rounds W to 0 decimals;
  • If W > 4, then the programme:
    • sorts the records by the values xi, names the highest value XMAX, its corresponding weight WMAX, the second highest value XMAX2 and its corresponding weight WMAX2;
    • rounds WMAX and WMAX2 to 0 decimals;
  • and if

    (XMAX×WMAX, where WMAX ≤ 2)

    OR

    (XMAX×WMAX + XMAX2×WMAX2, where WMAX + WMAX2 ≤ 2)

    represents more than 85% of the extrapolated aggregated value of that cell (Y), then the programme suppresses:

    • the extrapolated number of holdings (W) for cell c (if planned to be disseminated) and
    • the extrapolated aggregated value of variable x (Y) for cell c.

The Eurobase FSS tables in SAS present the standard FLAG_CODE= T for the cells that are to be suppressed because of the dominance rule. In the code list on "confidentiality status" defined by the SDMX Statistical Working Group, the flag T stands for “Primary confidentiality due to dominance by two units”. Code "T" is going to be changed to code "G" as indicated below for IFS.

Example from a Eurobase table in SAS (FSS):



The Eurobase IFS tables in SAS present the standard FLAG_CODE= G for the cells that are to be suppressed because of the dominance rule. In the code list on "confidentiality status" defined by the SDMX Statistical Working Group, the flag G stands for “Primary confidentiality due to dominance by one or two units”. 

Example from a Eurobase table in SAS (IFS)


Subsequently, SAS assigns the FLAG_USER c to the data to be suppressed.

The tables disseminated on Eurostat website and following user requests use only the standard flag :c.


Example 1 on methodology:

Suppose that in the microdata, there are 3 sampled holdings (records) belonging to a specific NUTS 2 region, farm type and farm size:

| Holding identifier | wi (extrapolation factor) | xi (cereals) | yi = wi × xi |
|---|---|---|---|
| 1 | 2 | 430 | 860 |
| 2 | 3 | 40 | 120 |
| 3 | 2 | 10 | 20 |

Using the microdata, the table above displays the aggregated value of several variables for this breakdown (cell). One of the variables is cereals (variable x). The total aggregated value for cereals is:

Y=860+120+20=1000

There are clearly two dominant farms which together account for more than 85% of the total 1 000, and the cell is expected to be confidential. Let's calculate W=2+3+2=7. We continue:

XMAX=430, WMAX=2

XMAX2=40, WMAX2=3

The condition (XMAX×WMAX)/Y>85% and WMAX≤2 is met so the value Y=1000 is suppressed and flagged confidential in the table.

If the extrapolated number of holdings (W=7) is planned to be disseminated in the table, this number is also suppressed and flagged confidential.

Example 2 on methodology:

Suppose that in the microdata, there are 3 sampled holdings (records) belonging to a specific NUTS 2 region, farm type and farm size:

| Holding identifier | wi (extrapolation factor) | xi (cereals) | yi = wi × xi |
|---|---|---|---|
| 1 | 0.6 | 300 | 180 |
| 2 | 1.4 | 200 | 280 |
| 3 | 5.0 | 30 | 150 |

The first sampled holding has a weight lower than 1 because of calibration.

Using the microdata, a table displays the aggregated value of several variables for this breakdown (cell). One of the variables is cereals (variable x). The total aggregated value for cereals is:

Y=180+280+150=610

Let's calculate W=0.6+1.4+5.0=7. We continue:

XMAX=300, WMAX=0.6

XMAX2=200, WMAX2=1.4

Neither of the two conditions, (XMAX×WMAX)/Y>85% and WMAX≤2, or (XMAX×WMAX+XMAX2×WMAX2)/Y>85% and (WMAX+WMAX2)≤2, is fulfilled. The conditions on the weights are fulfilled in both cases, but the 85% threshold is not exceeded in either case. So the data are not suppressed and not flagged as confidential.
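Both examples can be checked with a minimal Python sketch of the dominance rule. It is an illustration under stated assumptions rather than the production code: the function names are hypothetical, rounding to 0 decimals is assumed to round halves up, and the extrapolated values in the 85% comparison are assumed to use the unrounded weights, while the rounded weights are used only in the condition on the number of units.

```python
import math


def round_half_up(value):
    """Round to 0 decimals, halves rounded up (assumed tie rule)."""
    return math.floor(value + 0.5)


def dominance_rule(records, share=85.0, max_units=2, threshold=4):
    """Return True if the cell is suppressed because of dominance.

    records: (w_i, x_i) pairs of the sampled holdings falling in the cell.
    """
    contributing = sorted((r for r in records if r[1] > 0),
                          key=lambda r: r[1], reverse=True)
    y = sum(w * x for w, x in contributing)
    w_total = round_half_up(sum(w for w, _ in contributing))
    if w_total <= threshold:
        return False  # empty cell, or already covered by the threshold rule
    wmax, xmax = contributing[0]
    wmax2, xmax2 = contributing[1] if len(contributing) > 1 else (0, 0)
    one_unit = (round_half_up(wmax) <= max_units
                and 100 * wmax * xmax / y > share)
    two_units = (round_half_up(wmax) + round_half_up(wmax2) <= max_units
                 and 100 * (wmax * xmax + wmax2 * xmax2) / y > share)
    return one_unit or two_units


print(dominance_rule([(2, 430), (3, 40), (2, 10)]))        # Example 1: True
print(dominance_rule([(0.6, 300), (1.4, 200), (5.0, 30)]))  # Example 2: False
```

In Example 1 the first condition is met (860 out of 1 000 is 86%); in Example 2 the two largest values together reach only about 75% of 610.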

Remarks:

  • The method suppresses data depending on the extrapolated number of holdings and not on the number of holdings in the sample.

The reason is that the data are about the population in the cell and the purpose is to protect the population data.

  • From Example 2, it is noted that while XMAX is associated to the biggest contributing holding (the first holding), the corresponding extrapolated value YMAX=WMAX×XMAX is not the highest yi. yi is the highest for the second holding. It depends on the weights. Therefore, YMAX is not necessarily the highest yi.

For the identification of the dominant farms, the method considers the highest value xi (and not the highest value yi) because xi is the value at unit (farm) level (it is xi that serves to identify the most dominant single farms).

  • The dominance rule identifies as confidential those cases where the sum of the extrapolation factors is less than or equal to 2. Let's suppose that in a cell, the extrapolation factor attached to the highest value (let's say 100) is higher than 2, let's say 3. So there are 3 holdings with the highest value in the population: 100, 100, 100. This situation is not identified by the dominance rule to need protection (since 3 is higher than the threshold 2). Indeed, there is no need of a protection, because 2 values (100 and 100) cannot account together for more than 85% of the total of the cell, because there is at least the third same value 100 contributing to the cell. So the first two values can account at most for 66.7%, and never more than 85% of the total of the cell. In addition, if the weights are higher the sampling errors are typically higher and there is additional 'perturbation' of the true value because of sampling errors. Moreover, the users do not know which holdings were included in the sample and which ones provided the data.
  • Weights can be slightly higher than 1. For instance the largest has value 100 and weight 1.1 and the second largest has value 100 and weight 1.2. The estimated value for these two units (230) is over 85% of the total cell value of let's say, 260. In this case protection is needed, as they will probably represent only 2 holdings in the population (and not 3). That is why WMAX and WMAX2 are rounded to 0 decimals, as mentioned in the Notation section. After they are rounded to 0 decimals, the condition (WMAX+WMAX2)≤2 is met.
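The arithmetic behind the last two remarks can be written out explicitly. In the first case the cell total is at least 300, so the share of two of the three equal values is bounded; in the second case the cell total of 260 is the illustrative value used in the remark, and rounding is assumed to be to the nearest integer.

```latex
\[
\frac{100 + 100}{100 + 100 + 100} = \frac{200}{300} \approx 66.7\% < 85\%
\]
\[
\frac{1.1 \times 100 + 1.2 \times 100}{260} = \frac{230}{260} \approx 88.5\% > 85\%,
\qquad
\operatorname{round}(1.1) + \operatorname{round}(1.2) = 1 + 1 = 2 \le 2
\]
```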

Rounding

Where not suppressed in the previous steps:

  • the extrapolated number of holdings in cells is rounded to the nearest multiple of 10, and
  • the extrapolated aggregated values of all variables in cells are rounded to the nearest multiple of 10.
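The rounding step can be sketched in a few lines of Python. The function names are hypothetical, and rounding halves up is an assumption; under that assumption, the smallest number of holdings that passes the threshold rule (5) is shown as 10.

```python
import math


def round_to_10(value):
    """Round to the nearest multiple of 10 (halves rounded up, an assumed tie rule)."""
    return 10 * math.floor(value / 10 + 0.5)


def publish(w, y, suppressed):
    """Return the (W, Y) pair as disseminated: (None, None) if suppressed, rounded otherwise."""
    if suppressed:
        return None, None
    return round_to_10(w), round_to_10(y)


print(publish(7, 610, suppressed=False))   # (10, 610)
print(publish(7, 1000, suppressed=True))   # (None, None)
```

Rounding is applied only to cells that survived the threshold and dominance rules, which matches the order in which the methods are applied.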

Overall assessment of the procedure and possible improvements

The procedure generally ensures a good data treatment related to primary confidentiality. However there are some specific issues which can be improved. The following table presents the problems identified, the possible improvements and the current analysis and proposals.

Table 26 – Problems, possible improvements and proposals for application of primary confidentiality

  1. Problem description: The procedure suppresses data when the number of holdings in the population is less than or equal to 4, whereas in other domains suppression is applied when that number is less than or equal to 3. When the extrapolated number of holdings is in the interval (3; 4], the data are overprotected, as it is not realistic for a holding (knowing its own contribution and the total of a cell) to derive the other individual contributions, except perhaps if that holding is part of an enterprise operating several holdings.
     Possible improvement: No longer suppress data when the extrapolated number of holdings is in the interval (3; 4].
     Analysis / proposal: This improvement can be easily applied.

  2. Problem description: Data rounding causes inconsistencies (sums do not add up to totals). The reason is that rounding is applied to individual cells and to totals independently. Totals on rows or columns are not calculated as the sum of the cells concerned.
     Possible improvement: To render the totals consistent with the sums of cells, a possible solution is the implementation of controlled rounding (using Tau-Argus). This involves rounding the tabular data to a pre-specified base while ensuring additivity of totals.
     Analysis / proposal: The controlled rounding procedure causes loss of accuracy in individual cells, by trying to maintain accuracy of totals. It might therefore be more appropriate simply to warn users that cells do not add up to totals because of the rounding (and suppression) of cell data.

  3. Problem description: Knowing its own contribution and the total of a cell, the second largest contributor can estimate the minimum and maximum value of the largest contributor. The minimum value of the largest contributor is the value of the second largest contributor, while the maximum value of the largest contributor is the difference between the total and the second largest contributor.
     Possible improvement: A solution is applying the p% rule, according to which a cell is safe if the cell total minus the two largest contributors exceeds p% of the largest. This rule gives sufficient uncertainty that the second largest contributor cannot determine the size of the largest contributor.
     Analysis / proposal: It has to be assessed whether this rule provides any added value, considering the very small likelihood of this type of disclosure in our domain, and also considering the dominance rule already in place (which, like the p% rule, is a concentration rule).

  4. Problem description: A suppressed cell can be recalculated with some margin as the difference between the total and the sum of the other cells.
     Possible improvement: A solution is applying secondary confidentiality. Secondary confidentiality means treating a non-confidential cell as confidential in order to prevent disclosure of a confidential cell, by making it impossible for a user to recalculate the values of confidential cells.
     Analysis / proposal: Secondary confidentiality would need to be implemented for multiple tables at the same time. It is not clear that this is technically possible and feasible.
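The p% rule and the bounds available to the second largest contributor, both described in the table above, can be illustrated with a short Python sketch. The function names, the contribution values and the choice p = 10 are hypothetical; the text does not specify a value of p.

```python
def p_percent_safe(contributions, p):
    """True if the cell total minus the two largest contributions exceeds p% of the largest.

    contributions: non-empty list of individual contributions to the cell total.
    """
    ranked = sorted(contributions, reverse=True)
    largest = ranked[0]
    second = ranked[1] if len(ranked) > 1 else 0
    remainder = sum(ranked) - largest - second
    return remainder > p / 100 * largest


def bounds_on_largest(total, second_largest):
    """Interval that the second largest contributor can derive for the largest one."""
    return second_largest, total - second_largest


print(p_percent_safe([500, 300, 150, 50], p=10))  # True: remainder 200 exceeds 50
print(p_percent_safe([900, 80, 20], p=10))        # False: remainder 20 does not exceed 90
print(bounds_on_largest(1000, 80))                # (80, 920)
```

In the second cell the remainder is so small that the second largest contributor could estimate the largest contribution quite closely, which is the situation the p% rule is meant to flag.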

Limitations to the application of secondary confidentiality

As mentioned in Table 26, secondary confidentiality would need to be implemented for several tables at the same time. This may not be feasible, considering:

  • the numerous tables disseminated on Eurostat website and through ad-hoc requests;
  • that the whole publication programme should be reviewed in an integrated way. When a new table is created, the other tables need updating.
  • the suppression of cells should have the same pattern for different reference periods. A value for a confidential cell from other reference periods is usually a good basis for estimating and therefore disclosing data.

It is to be noted that:

  • The rounding to the nearest multiple of 10 would prevent recalculation of the exact values to some extent.
  • When data are estimated from a sample, estimated values deviate from the true values, which would additionally prevent recalculation of the exact values to some extent.

In 1993, Eurostat analysed the possibility of such recalculations and found that:
  • most of the derived values are not reliable; there are negative solutions and solutions outside an interval of ±50% in relation to the true value;
  • a few derivations come very close to the real value.
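The effect of rounding on recalculation by differencing can be made concrete with a small Python sketch. It assumes that the total and every other cell are rounded independently to the nearest multiple of 10, that the variable cannot be negative, and it ignores sampling error; the function name and the figures are hypothetical.

```python
def differencing_interval(rounded_total, rounded_other_cells, base=10):
    """Interval for a suppressed cell derived from a rounded total and rounded cells.

    Each published value may differ from the unrounded value by up to base / 2,
    so the uncertainty grows with the number of published values involved.
    """
    estimate = rounded_total - sum(rounded_other_cells)
    margin = (base / 2) * (len(rounded_other_cells) + 1)
    return max(0, estimate - margin), estimate + margin


# A row total of 1 000 with three published cells and one suppressed cell.
print(differencing_interval(1000, [610, 250, 110]))  # (10.0, 50.0)
```

The point estimate for the suppressed cell is 30, but rounding alone widens it to an interval from 10 to 50; with estimates from a sample the uncertainty is larger still.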

It was concluded not to suppress data in cells, because "the procedure involves iterations of treatment of derivation while no possibility of derivation can be realistically excluded. It also entails loss of data involving no risk of disclosure".

Besides the methods already implemented (suppression, rounding) or discussed in the above table, there are other methods, e.g. table redesign (collapsing rows/columns), controlled tabular adjustment (selectively adjusting cell values: unsafe cells are replaced by either of their closest safe values, and other cell values are adjusted to restore additivity) and perturbation (adding random noise to cell values). Each method has pros and cons. To decide on the most suitable solution, a balance has to be struck between confidentiality and reliability, i.e. the confidentiality treatment should be effective without unnecessarily jeopardising the accuracy and usability of the results.

Changes of the procedure over the years

Once a change was made, it was applied to the whole time series when the tables were updated.

Further changes in IFS 2020

For the dominance treatment, the condition that the extrapolated number of holdings (W) which contribute to Y (i.e. which have a non-zero record value for x ) should be less than 10, was removed.

IFS 2020 compared to FSS 2016

Until 2017/FSS 2016, the confidentiality treatment included specific methods related to cell suppression due to small counts and due to dominance by one or two units for the data from United Kingdom, as described below.

Cell suppression due to small counts

  • General method: evaluates if the extrapolated number of holdings (W) which contribute to Y (i.e. which have a non-zero record value for x) is less than or equal to 4.
  • Specific method for United Kingdom: evaluates if the sampled number of holdings which contribute to Y (i.e. which have a non-zero record value for x) is less than 3.

Cell suppression due to dominance by one or two units

  • General method: evaluates if the extrapolated number of holdings (W) which contribute to Y (i.e. which have a non-zero record value for x) is less than 10.
  • Specific method for United Kingdom: evaluates if the extrapolated number of holdings (W) which contribute to Y (i.e. which have a non-zero record value for x) is less than 50.

Starting with 2017/IFS 2020, the general method has been applied to all countries, including United Kingdom.

FSS 2016 compared to FSS 2013

Suppression vs "0" values

Until FSS 2013, the data in confidential cells were not suppressed but replaced with 0. Users could not distinguish between a real "0" and a confidential value and, as a result, also had difficulty understanding why sums of cells did not match the totals. Starting with FSS 2016, the data in confidential cells have been suppressed and flagged as confidential.

Cell suppression due to small counts

Until FSS 2013, for those cells where the extrapolated number of holdings (W) which contribute to Y was 5, 6, 7, 8 or 9, W was replaced with either 0 or 10, based on a pseudo-probabilistic method. When W was replaced with 0, Y was also replaced with 0. Starting with FSS 2016, this additional confidentiality treatment was dropped because:

  • The data had been overprotected
  • The average value of some variables (calculated by users considering the disseminated W ) was misleading.


