Improving Anonymization Clustering
ISSN der Zeitschrift
Gesellschaft für Informatik e.V.
Microaggregation is a technique to preserve privacy when confidential information about individuals shall be used by third parties. A basic property to be established is called k-anonymity. It requires that identifying information about individuals should not be unique, instead there has to be a group of size at least k that looks identical. This is achieved by clustering individuals into appropriate groups and then averaging the identifying information. The question arises how to select these groups such that the information loss by averaging is minimal. This problem has been shown to be NP-hard. Thus, several heuristics called MDAV, V-MDAV,... have been proposed for finding at least a suboptimal clustering. This paper proposes a more sophisticated, but still efficient strategy called MDAV* to construct a good clustering. The question whether to extend a group locally by individuals close by or to start a new group with such individuals is investigated in more depth. This way, a noticeable lower information loss can be achieved which is shown by applying MDAV* to several established benchmarks of real data and also to specifically designed random data.