
Thursday, March 3, 2011

Privacy-Preservation Techniques in Data Mining

Privacy Preservation in Data Mining

The ultimate goal of data mining is prediction (Acquisti, Gritzalis, Lambrinoudakis, & Vimercati, 2008). Data mining has many uses in today's organizations, particularly in consumer-focused industries such as finance, retail, and marketing.

Data mining is the process of gathering and analyzing data from different perspectives and summarizing it into useful information. That information can be used to increase sales, cut overhead, or uncover correlations that were not known to exist. A company that wants to target its advertising dollars properly will turn to data mining: predictive data mining can identify the right target audience and, further, reveal that audience's likes, dislikes, and habits (Anissimov, 2011).

Several types of algorithms are used in data mining. Clustering algorithms, for example, take a set of data that may or may not have any obvious meaning and group it in ways that may not be visible to the naked eye. K-means clustering is a method of cluster analysis that partitions a number of observations into K clusters, assigning each observation to the cluster with the nearest center (mean). The result is an estimate of the centers of the natural clusters in the data, along with a clearer picture of how those groups differ from one another (Wagstaff, Cardie, Rogers, & Schroedl, 2001).
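To make the K-means description concrete, here is a minimal sketch of the standard K-means loop (often called Lloyd's algorithm) in plain Python. It is an illustration only: the function name `kmeans`, picking K random observations as the starting centers, and the `iterations` and `seed` parameters are my own assumptions. It is also the plain algorithm, not the constrained variant with background knowledge that Wagstaff et al. describe.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain K-means: returns (centers, labels) for a list of coordinate tuples.

    Assumption for this sketch: the starting centers are k distinct
    observations chosen at random with a fixed seed.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each observation joins the cluster whose center is nearest.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of the observations assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[c])  # an empty cluster keeps its old center
        if new_centers == centers:  # no center moved, so the clustering has converged
            break
        centers = new_centers
    return centers, labels


# Hypothetical toy data: two obvious groups, one near (1, 1) and one near (8, 8).
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

On these six toy points the three observations near (1, 1) and the three near (8, 8) end up in separate clusters, with centers at their respective means. On real data the result can depend on the starting centers, which is why practical implementations usually run several random starts.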

A primary concern for data mining research is developing data collection methods that protect the privacy of the individual, and a productive direction for future work is the development of techniques that incorporate privacy concerns. Specifically, the question is this: since the primary task in data mining is building models of aggregated data, can we develop accurate models without access to precise information in individual data records? The concern is concrete. With data mining, a retailer could use point-of-sale records of past purchases to send promotions targeted at an individual's purchase history.
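One well-known family of answers to that question is randomization: each record is distorted with random noise before it is collected, and the miner works only with aggregates. The sketch below illustrates that idea with additive noise; the technique is swapped in by me as an example, since the post does not name one, and the function name `perturb`, the uniform noise, the `spread` value, and the generated ages are all hypothetical. Because the noise has a mean of zero, it largely cancels out in the average even though no individual age is reported exactly.

```python
import random
import statistics


def perturb(values, spread, seed=None):
    """Return a copy of `values` with independent uniform noise in [-spread, spread] added to each record."""
    rng = random.Random(seed)
    return [v + rng.uniform(-spread, spread) for v in values]


# Hypothetical data: the true ages of 10,000 customers.
rng = random.Random(42)
true_ages = [rng.randint(18, 80) for _ in range(10_000)]

# The data miner only ever receives the distorted ages.
noisy_ages = perturb(true_ages, spread=20, seed=7)

print(f"true mean:    {statistics.mean(true_ages):.2f}")
print(f"noisy mean:   {statistics.mean(noisy_ages):.2f}")
print(f"first record: true={true_ages[0]}, reported={noisy_ages[0]:.1f}")
```

The two means come out close while individual reported ages can be off by up to 20 years. Recovering the full distribution of the original values, rather than just the mean, requires an additional reconstruction step that this sketch does not attempt.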

In the corporate world, data mining is used most often to identify trends and predict future behavior. It is used to build models and decision support systems that give management information it can use to sell products more efficiently. Its use is not limited to general corporate settings, however: it appears in retail and pharmaceutical sales, and even the Department of Defense uses it to predict the likelihood of an attack with greater accuracy (Palace, 1996).

When it comes to personal information, it is important to note that the privacy of individuals should never be sacrificed. Acquisti et al. give the example of insurance companies sharing patient-record data with a doctor's office: some data must be kept confidential, while other data can be shared. Similarly, an airline could share sensitive information about a single individual with law enforcement without compromising the privacy of the entire passenger list. There have been many cases in which different law enforcement agencies did not pool their information; with data mining, they could have shared what they knew without handing over the entire file they held on the individual in question, making identification and apprehension easier (Acquisti et al., 2008).
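The airline example can be made a little more concrete. One simple, and admittedly weak, way for two parties to find the names they have in common without exchanging their lists in the clear is to replace each identifier with a keyed hash (HMAC) and compare only the hashes. The sketch below uses Python's standard `hmac` and `hashlib` modules; the function names `blind` and `match`, the shared key, and the sample names are hypothetical. This is not a full private set intersection protocol: anyone who holds the key can test guesses of low-entropy identifiers such as names.

```python
import hashlib
import hmac


def blind(identifiers, key):
    """Map the HMAC-SHA256 digest of each identifier (under a shared key) back to the identifier."""
    return {
        hmac.new(key, ident.encode("utf-8"), hashlib.sha256).hexdigest(): ident
        for ident in identifiers
    }


def match(airline_ids, watchlist_ids, key):
    """Return only the airline's identifiers whose digests also appear in the agency's list.

    In a real exchange the agency would send only its digests; the airline
    compares them locally and reveals just the matching records.
    """
    airline_blinded = blind(airline_ids, key)
    watch_digests = set(blind(watchlist_ids, key))
    return sorted(airline_blinded[d] for d in airline_blinded.keys() & watch_digests)


# Hypothetical example data.
passengers = ["A. Smith", "B. Jones", "C. Lee"]
watchlist = ["B. Jones", "D. Brown"]
key = b"hypothetical-shared-key"
print(match(passengers, watchlist, key))  # prints ['B. Jones']
```

Only the overlapping name is disclosed; the rest of the passenger list never leaves the airline in readable form, and the agency's other names stay hidden behind their digests.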


Acquisti, A., Gritzalis, S., Lambrinoudakis, C., & Vimercati, S. D. C. d. (Eds.). (2008). Digital Privacy: Theory, Technologies, and Practices. New York: Auerbach Publications.

Anissimov, M. (2011). What is data mining? Retrieved 2011, from

Palace, B. (1996). Data mining [Technology note prepared for Management 274A]. Anderson Graduate School of Management, UCLA.

Wagstaff, K., Cardie, C., Rogers, S., & Schroedl, S. (2001). Constrained K-means clustering with background knowledge. In Proceedings of the Eighteenth International Conference on Machine Learning.