Results 1 - 4 of 4
Using Anonymized Data for Classification
- Proc. IEEE 25th Int'l Conf. Data Eng. (ICDE), 2009
Cited by 14 (0 self)
Abstract. In recent years, anonymization methods have emerged as an important tool for preserving individual privacy when releasing privacy-sensitive data sets. This interest in anonymization techniques has resulted in a plethora of methods for anonymizing data under different privacy and utility assumptions. At the same time, there has been little research on how to use the anonymized data effectively for data mining in general and for distributed data mining in particular. In this paper, we propose a new approach for building classifiers using anonymized data by modeling anonymized data as uncertain data. In our method, we do not assume any probability distribution over the data. Instead, we propose collecting all necessary statistics during anonymization and releasing these together with the anonymized data. We show that releasing such statistics does not violate anonymity. Experiments spanning various alternatives in both local and distributed data mining settings reveal that our method performs better than heuristic approaches for handling anonymized data.
An Efficient Representation Model of Distance Distribution Between Two Uncertain Objects
Cited by 1 (0 self)
Abstract. In this paper, we consider the problem of efficiently computing the distance distribution between two uncertain objects. This computation is important for many uncertain query evaluation tasks (e.g., range queries, nearest-neighbour queries) and uncertain data mining tasks (e.g., classification, clustering and outlier detection). However, existing approaches involve distance computations between samples of the two objects, which is very computationally intensive. On the one hand, it is expensive to calculate and store the actual distribution of the possible distance values between two uncertain objects. On the other hand, the expected distance (the weighted average of the pair-wise distances among samples of two uncertain objects) provides very limited information and also restricts the definitions and usefulness of queries and mining tasks. In this paper, we propose several approaches to approximate the actual distance distribution between two given objects. Experiments on real data and synthetic data show that our approaches DAGO and DAGS produce approximations in a very short time with acceptable accuracy (about 90% or above). We suggest that these approaches make it practical for the research communities to define and develop more powerful queries and data mining tasks based on the distance distribution instead of the expected distance.
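To make the contrast in this abstract concrete, the following is a minimal sketch of the brute-force baseline it describes: the exact distance distribution and the expected distance computed from all pair-wise sample distances. It is not DAGO or DAGS, whose details the abstract does not give. The representation of an uncertain object as a list of weighted sample points, the Euclidean metric, and all function and variable names are assumptions made for illustration.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def distance_distribution(a, b):
    """Exact distance distribution between two uncertain objects.

    Assumption: each object is a list of (point, weight) samples whose
    weights sum to 1. Returns sorted (distance, probability) pairs.
    Cost is O(len(a) * len(b)), the expense the abstract refers to.
    """
    mass = {}
    for (p, wp), (q, wq) in product(a, b):
        d = dist(p, q)
        # Pairs that yield the same distance pool their probability.
        mass[d] = mass.get(d, 0.0) + wp * wq
    return sorted(mass.items())


def expected_distance(a, b):
    """Weighted average of the pair-wise distances among samples."""
    return sum(d * pr for d, pr in distance_distribution(a, b))


def prob_within(a, b, radius):
    """Probability that the two objects lie within `radius` of each other."""
    return sum(pr for d, pr in distance_distribution(a, b) if d <= radius)


# Hypothetical example objects with two equally likely samples each.
obj_a = [((0.0, 0.0), 0.5), ((0.0, 2.0), 0.5)]
obj_b = [((0.0, 1.0), 0.5), ((3.0, 0.0), 0.5)]

print(distance_distribution(obj_a, obj_b))
print(expected_distance(obj_a, obj_b))
print(prob_within(obj_a, obj_b, 2.0))
```

In this toy case the expected distance is a little above 2, yet half of the probability mass sits at distance 1. A range query with radius 2 can use that information only if the full distribution is available, which is the limitation of the expected distance that the abstract points out.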
Probabilistic Granule-Based Inside and Nearest Neighbor Queries
Cited by 1 (1 self)
Abstract. The development of location-based services and advances in the field of mobile computing have motivated an intensive research effort devoted to the efficient processing of location-dependent queries. In this context, it is usually assumed that location data are expressed at a fine geographic precision. Adding support for location granules means that the user is able to use his/her own terminology for locations (e.g., GPS, cities, states, provinces), which may have an impact on the semantics of the query, the way the results are presented, and the performance of the query processing. Along with its advantages, the management of these location granules introduces new challenges for query processing. In this paper, we analyze two popular location-dependent constraints, inside and nearest neighbors, and enhance them with the possibility of specifying location granules. In this context, we study the problem that arises when the locations of the objects are subject to some imprecision.