CLOSENESS: A NEW PRIVACY MEASURE FOR DATA PUBLISHING
ABSTRACT
The k-anonymity privacy requirement for publishing microdata requires that each
equivalence class (i.e., a set of records that are indistinguishable from each
other with respect to certain “identifying” attributes) contains at least k
records. Recently, several authors have recognized that k-anonymity cannot
prevent attribute disclosure. The notion of l-diversity has been proposed to
address this; l-diversity requires that each equivalence class has at least l
well-represented values for each sensitive attribute. In this project, we show
that l-diversity has a number of limitations. In particular, it is neither
necessary nor sufficient to prevent attribute disclosure. Motivated by these
limitations, we propose a new notion of privacy called “closeness.” We first
present the base model t-closeness, which requires that the distribution of a
sensitive attribute in any equivalence class is close to the distribution of
the attribute in the overall table. We then propose a more flexible privacy
model called (n,t)-closeness that offers higher utility. We describe our
desiderata for designing a distance measure between two probability
distributions and present two distance measures. We discuss the rationale for
using closeness as a privacy measure and illustrate its advantages through
examples and experiments.
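As a concrete illustration of the t-closeness requirement stated above, the following sketch checks whether every equivalence class's sensitive-attribute distribution lies within distance t of the overall distribution. Total variation distance is used here because, for categorical attributes with equal ground distances, it coincides with the Earth Mover's Distance; the function names are our own, not from the paper:

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a sensitive attribute as a dict."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}

def variational_distance(p, q):
    """Total variation distance between two distributions.
    For categorical attributes with equal ground distances, the
    EMD measure reduces to this value."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(classes, table_values, t):
    """True if every equivalence class's sensitive-attribute
    distribution is within distance t of the overall distribution."""
    overall = distribution(table_values)
    return all(variational_distance(distribution(c), overall) <= t
               for c in classes)
```

For example, splitting a table of two "flu" and two "cancer" records into two mixed classes gives distance 0 and passes, whereas splitting it into an all-"flu" class and an all-"cancer" class gives distance 0.5 and fails for small t.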
SYSTEM ANALYSIS
PROBLEM DEFINITION
The problem of information disclosure has been studied extensively in the
framework of statistical databases, and a number of information disclosure
limitation techniques have been designed for data publishing. The first
category of work aims at devising privacy requirements. A few subsequent works
recognize that the adversary also has knowledge of the distribution of the
sensitive attribute in each group. t-Closeness additionally assumes that the
distribution of the sensitive attribute in the overall table is public
information. Privacy-preserving data publishing has been extensively studied
in several other aspects. First, background knowledge presents additional
challenges in defining privacy requirements. To generalize the data, we use
the Mondrian algorithm, which partitions the high-dimensional space into
regions and encodes the data points in each region by the region's
representative. On the theoretical side, optimal k-anonymity has been proved
NP-hard, and approximation algorithms for finding the anonymization that
suppresses the fewest cells have been proposed.
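The median-cut idea behind the Mondrian algorithm mentioned above can be sketched as follows. This is a simplified illustration assuming numeric quasi-identifiers, not the paper's exact implementation:

```python
def mondrian_partition(records, k, dims):
    """Recursively split records at the median of the widest
    quasi-identifier dimension; stop cutting when a split would
    create a region smaller than k (the k-anonymity parameter).
    `records` is a list of tuples; `dims` are the indices of the
    quasi-identifier attributes."""
    def spread(d):
        vals = [r[d] for r in records]
        return max(vals) - min(vals)

    # choose the dimension with the widest range of values
    d = max(dims, key=spread)
    records = sorted(records, key=lambda r: r[d])
    mid = len(records) // 2
    left, right = records[:mid], records[mid:]
    if len(left) < k or len(right) < k or spread(d) == 0:
        return [records]  # cannot cut further; emit one region
    return mondrian_partition(left, k, dims) + mondrian_partition(right, k, dims)
```

Each returned region would then be encoded by a representative (for example, the value range on each quasi-identifier dimension).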
Existing System
In the existing system, several methods have recognized that k-anonymity
cannot prevent attribute disclosure: it does not maintain privacy adequately
and is of limited use for formulating census evaluations. Earlier systems did
not make secure data publishing possible and do not provide sufficient
protection against attribute disclosure. Some hospitals only maintain a
database of each patient's history, and there is no integrated structure in
the existing systems: anyone can enter the application and view a patient's
history without any security check. These are the major drawbacks of the
existing applications. To overcome these problems, future applications must
provide an integrated solution with privacy protection built in.
Problems in Existing System
K-anonymity cannot prevent attribute disclosure.
It is not able to maintain privacy.
There is no integrated structure in the existing systems.
Proposed System
Here a novel privacy notion called "closeness" is proposed. Starting from the
assumption that the adversary knows the global distribution of the sensitive
attribute, the base model t-closeness is first formalized; it requires that
the distribution of a sensitive attribute in any equivalence class be close to
the distribution of that attribute in the overall table. While the released
table gives useful information to researchers, it presents a disclosure risk
to the individuals whose data are in the table. The objective, therefore, is
to limit the disclosure risk to an acceptable level while maximizing the
benefit. Closeness achieves a better balance between privacy and utility than
existing privacy models such as l-diversity and t-closeness. Finally, the
effectiveness of the closeness model, in both privacy protection and utility
preservation, is evaluated through an integrated hospital website with a real
data set.
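The more flexible (n,t)-closeness model mentioned in the abstract can be sketched as follows, under the assumption that candidate "natural" supersets of each equivalence class are supplied (the paper derives them from the generalization structure). Total variation distance stands in for the paper's EMD-based measures, and all names here are illustrative:

```python
from collections import Counter

def _dist(values):
    """Empirical distribution of a sensitive attribute as a dict."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}

def _tvd(p, q):
    """Total variation distance between two distributions."""
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0))
                     for v in set(p) | set(q))

def satisfies_nt_closeness(class_values, natural_supersets, n, t):
    """An equivalence class satisfies (n,t)-closeness if some natural
    superset containing at least n records has a sensitive-attribute
    distribution within distance t of the class's own distribution."""
    p = _dist(class_values)
    return any(len(s) >= n and _tvd(p, _dist(s)) <= t
               for s in natural_supersets)
```

Relaxing the comparison from the whole table to a large enough superset is what lets (n,t)-closeness retain more utility than t-closeness while still bounding what the adversary learns.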
Hardware Requirements
Processor : Pentium IV
RAM : 512 MB
Hard Disk : 40 GB
Monitor : 15" Color Monitor
Keyboard : Multimedia
Mouse : Optical

Software Requirements
Framework : Visual Studio 2005
Front End : ASP.NET 2.0
Code Behind : C#.NET
Database : SQL Server 2000