Take it or Leaf it 2021


Statistical disclosure control when publishing on Thematic Maps

Text: Martijn ter Steege

Statistical disclosure control is essential when publishing data. Public authorities, such as Statistics Netherlands (Dutch: CBS), do research about various topics. Some of these topics are more sensitive than others: for instance, publishing the average age of a group of people is not sensitive, but publishing the average income is. Therefore, the privacy of each individual within a group should be guaranteed. In short, statistical disclosure control makes sure that everyone’s privacy is ensured.

The next question is: how can we make sure that each individual’s privacy is ensured? In a previous paper by Douwe Hut [1], this was done by adding noise using the (p,q)-rule. This rule makes sure that it is impossible to estimate an individual contribution to within p% when the lower bounds of all other contributions are known to within q%. I won’t go into too much detail, so if you want to read about this, I’d recommend reading [1].

In my Bachelor assignment, I looked into another mechanism that protects group data: the Pufferfish framework. Don’t ask me or my supervisor why this method is called Pufferfish, because we do not know. What we do know is that this method was explained in [2], but that not a lot of further research was done. Therefore, I chose to investigate this method and to check whether it has the potential to be used. The Pufferfish framework is based on the following equations:

$$P(M(T) = \omega \mid s_i, \theta) \le e^{\varepsilon} \, P(M(T) = \omega \mid s_j, \theta),$$

$$P(M(T) = \omega \mid s_j, \theta) \le e^{\varepsilon} \, P(M(T) = \omega \mid s_i, \theta),$$

where T denotes the data, the function M denotes a mechanism to protect the data, ω is a possible output of M, $s_i$ and $s_j$ are mutually exclusive statements about the data (for instance, $s_i$: ‘individual h is not in the data set’, $s_j$: ‘individual h is in the data set’), and θ denotes all information that is known about the data. Satisfying both equations is sufficient to have Pufferfish privacy, where ε determines the strength of the privacy. The reader can easily verify that¹ these equations can be rewritten into

$$e^{-\varepsilon} \le \frac{P(s_i \mid M(T) = \omega, \theta) \,/\, P(s_j \mid M(T) = \omega, \theta)}{P(s_i \mid \theta) \,/\, P(s_j \mid \theta)} \le e^{\varepsilon}$$

using some basic probability theory. Don’t look this up in my report, because I didn’t even compute this. It is good to understand the interpretation of this computation. We can interpret

$$\frac{P(s_i \mid \theta)}{P(s_j \mid \theta)}$$

as the initial odds, that is, how much more likely $s_i$ is to be true than $s_j$. This can be seen as pre-knowledge, which is not influenced by the published data. Similarly,

$$\frac{P(s_i \mid M(T) = \omega, \theta)}{P(s_j \mid M(T) = \omega, \theta)}$$

can be interpreted as the post-knowledge, the knowledge you have after observing the data. Because the bounds are $e^{\pm\varepsilon}$, and ε should be small for strong privacy, both bounds are close to 1. Hence, both probability fractions should be nearly equal, so it can be assumed that almost no new information can be obtained by evaluating the data. That means that a mechanism M ensures privacy when this equation holds.

¹ I always wanted to use this myself!
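For readers who do want to verify the rewriting step, here is a short sketch using Bayes’ rule. It assumes that ω denotes a possible output of the mechanism M (a symbol introduced here for illustration) and that all probabilities that appear in a denominator are nonzero.

```latex
% Bayes' rule for each statement s_k, with k = i or k = j:
\[
P(s_k \mid M(T) = \omega, \theta)
  = \frac{P(M(T) = \omega \mid s_k, \theta)\, P(s_k \mid \theta)}
         {P(M(T) = \omega \mid \theta)} .
\]
% Dividing the case k = i by the case k = j cancels the common denominator:
\[
\frac{P(s_i \mid M(T) = \omega, \theta)}{P(s_j \mid M(T) = \omega, \theta)}
  = \frac{P(M(T) = \omega \mid s_i, \theta)}{P(M(T) = \omega \mid s_j, \theta)}
    \cdot \frac{P(s_i \mid \theta)}{P(s_j \mid \theta)} .
\]
% The two Pufferfish inequalities bound the first factor on the right:
\[
e^{-\varepsilon}
  \le \frac{P(M(T) = \omega \mid s_i, \theta)}{P(M(T) = \omega \mid s_j, \theta)}
  \le e^{\varepsilon} ,
\]
% so the ratio of posterior odds to prior odds lies between
% e^{-\varepsilon} and e^{\varepsilon}.
```

In words: the published output can shift the odds of $s_i$ against $s_j$ by at most a factor $e^{\varepsilon}$ in either direction.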

For my research, I proved that adding noise drawn from the Laplace distribution satisfies the aforementioned property. This Laplace noise depends on the number of contributions in the group and the privacy parameter ε: a larger group or a larger ε means less noise. I won’t go into much detail here, but if you want to see the proof, then you should read my paper.

Another mechanism I managed to create was the relative error mechanism. This mechanism does not add noise; instead, it multiplies the data by noise. The reason why this is useful is that we can now protect data relatively: we can say that our mechanism protects the data within k%. This multiplicative noise is also based on the Laplace distribution, but it involves a logarithm as well. Now we have two mechanisms that can be applied for group privacy.

Looking at my research title, I haven’t talked about the publication of thematic maps yet. Thematic maps can be visualized in several ways, but in my research I focussed on publishing grid maps. In Figure 1 you can see such a grid map that is used by Statistics Netherlands.
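To give a rough feeling for how the two noise mechanisms described above could be applied to the cells of a grid map, here is a minimal Python sketch. Everything specific in it is an assumption made for illustration: the function names, the `sensitivity` parameter, the noise scale `sensitivity / (count * epsilon)` and the grid values are hypothetical, and they are not the scales derived in the paper. The sketch only reproduces the qualitative behaviour: a larger group or a larger ε gives less noise, and the relative mechanism adds Laplace noise on a logarithmic scale.

```python
import numpy as np


def additive_laplace(values, counts, epsilon, sensitivity=1.0, rng=None):
    """Return values plus Laplace noise, one independent draw per cell.

    ASSUMPTION: the scale sensitivity / (counts * epsilon) is illustrative.
    It only mimics "larger group or larger epsilon means less noise".
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    scale = sensitivity / (np.asarray(counts, dtype=float) * epsilon)
    return values + rng.laplace(loc=0.0, scale=scale)


def relative_laplace(values, counts, epsilon, sensitivity=1.0, rng=None):
    """Return values times exp(Laplace noise): noise added on the log scale.

    Requires strictly positive values because of the logarithm.
    ASSUMPTION: same illustrative scale as in additive_laplace.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("values must be strictly positive")
    scale = sensitivity / (np.asarray(counts, dtype=float) * epsilon)
    return np.exp(np.log(values) + rng.laplace(loc=0.0, scale=scale))


# Hypothetical 2x3 grid map: average income per cell and group size per cell.
avg_income = np.array([[31000.0, 42000.0, 28000.0],
                       [55000.0, 39000.0, 47000.0]])
contributors = np.array([[12, 150, 8],
                         [40, 300, 25]])

rng = np.random.default_rng(seed=2021)
noisy_additive = additive_laplace(avg_income, contributors, epsilon=0.5,
                                  sensitivity=1000.0, rng=rng)
noisy_relative = relative_laplace(avg_income, contributors, epsilon=0.5,
                                  sensitivity=1.0, rng=rng)

print(np.round(noisy_additive))
print(np.round(noisy_relative))
```

Cells with few contributors (such as the cell with 8) receive noticeably more noise than cells with hundreds of contributors. In the relative variant the perturbation is a multiplicative factor, so it scales with the size of the value itself.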

