Secure Mining of
Association Rules in Horizontally Distributed Databases
ABSTRACT:
We propose a protocol for secure mining of
association rules in horizontally distributed databases. The current leading protocol
is that of Kantarcioglu and Clifton. Our protocol, like theirs, is based on the
Fast Distributed Mining (FDM) algorithm of Cheung et al. which is an unsecured
distributed version of the Apriori algorithm. The main ingredients in our
protocol are two novel secure multi-party algorithms — one that computes the
union of private subsets that each of the interacting players hold, and another
that tests the inclusion of an element held by one player in a subset held by
another. Our protocol offers enhanced privacy with respect to the protocol. In
addition, it is simpler and is significantly more efficient in terms of
communication rounds, communication cost and computational cost.
EXISTING SYSTEM:
Kantarcioglu and Clifton studied that problems and devised
a protocol for its solution. The main part of the protocol is a sub-protocol
for the secure computation of the union of private subsets that are held by the
different players. (The private subset of a given player, as we explain below,
includes the item sets that are s-frequent in his partial database. That is the
most costly part of the protocol and its implementation relies upon
cryptographic primitives such as commutative encryption, oblivious transfer,
and hash functions. This is also the only part in the protocol in which the
players may extract from their view of the protocol information on other
databases, beyond what is implied by the final output and their own input. While
such leakage of information renders the protocol not perfectly secure, the
perimeter of the excess information is explicitly bounded and it is argued
there that such information leakage is innocuous, whence acceptable from a practical
point of view.
DISADVANTAGES
OF EXISTING SYSTEM:
v Insufficient
security, simplicity and efficiency are not well in the databases, not sure in
privacy in an existing system.
v While
our solution is still not perfectly secure, it leaks excess information only to
a small number (three) of possible coalitions, unlike the protocol of that
discloses information also to some single players.
v Our
protocol may leak is less sensitive than the excess information leaked by the
protocol.
PROPOSED SYSTEM:
The protocol that we propose here computes a
parameterized family of functions, which we call threshold functions, in which
the two extreme cases correspond to the problems of computing the union and
intersection of private subsets. Those are in fact general-purpose protocols
that can be used in other contexts as well. Another problem of secure
multiparty computation that we solve here as part of our discussion is the set
inclusion problem; namely, the problem where Alice holds a private subset of
some ground set, and Bob holds an element in the ground set, and they wish to
determine whether Bob’s element is within Alice’s subset, without revealing to either
of them information about the other party’s input beyond the above described
inclusion.
ADVANTAGES
OF PROPOSED SYSTEM:
v We
proposed a protocol for secure mining of association rules in horizontally
distributed databases that improves significantly upon the current leading
protocol in terms of privacy and efficiency.
v The
main ingredient in our proposed protocol is a novel secure multi-party protocol
for computing the union (or intersection) of private subsets that each of the
interacting players holds.
MODULES:
1. Privacy Preserving Data Mining
2. Distributed Computation
3. Frequent Itemsets
4. Association Rules
MODULES
DESCRIPTION:
1. Privacy Preserving Data Mining:
One, in
which the data owner and the data miner are two different entities, and
another, in which the data is distributed among several parties who aim to
jointly perform data mining on the unified corpus of data that they hold. In the first setting, the goal is to protect
the data records from the data miner. Hence, the data owner aims at anonym zing
the data prior to its release. The main approach in this context is to apply
data perturbation. The idea is that. Computation and communication costs versus
the number of transactions N the perturbed data can be used to infer
general trends in the data, without revealing original record information. In
the second setting, the goal is to perform data mining while protecting the
data records of each of the data owners from the other data owners. This is a
problem of secure multiparty computation. The usual approach here is
cryptographic rather than probabilistic.
2. Distributed Computation:
We compared the performance of two
secure implementations of the FDM algorithm Section In the first implementation
(denoted FDM-KC), we executed the unification step using Protocol UNIFI-KC,
where the commutative cipher was 1024-bit RSA in the second implementation
(denoted FDM) we used our Protocol UNIFI, where the keyed-hash function was
HMAC. In both implementations, we implemented Step 5 of the FDM algorithm in
the secure manner that was described in later. We tested the two
implementations with respect to three measures:
1)
Total computation time of the complete protocols (FDMKC and FDM) over all
players. That measure includes the Apriori computation time, and the time to
identify the globally s-frequent item sets, as described in later.
2) Total
computation time of the unification protocols only (UNIFI-KC and UNIFI) over
all players. 3) Total message size. We ran three experiment sets, where each
set tested the dependence of the above measures on a different parameter: • N — the number of transactions in the unified
database,
3. Frequent Itemsets:
We describe here the solution that was
proposed by Kantarcioglu and Clifton. They onsidered two possible settings. If
the required output includes all globally s-frequent item sets, as well as the sizes of their
supports, then the values of Δ(x) can be revealed for all. In such a case, those values
may be computed using a secure summation protocol, where the private addend of Pm is suppm(x) − sNm. The more interesting setting, however, is
the one where the support sizes are not part of the required output. We proceed
to discuss it.
4. Association Rules:
Once the set Fs of all s-frequent itemsets is found, we may proceed to look for
all (s, c)-association rules (rules with support at
least sN and
confidence at least c). In order to derive from Fs all (s, c)-association rules in an efficient manner we rely upon the
straightforward lemma.
SYSTEM
REQUIREMENTS:
HARDWARE REQUIREMENTS:
•
System : Pentium
IV 2.4 GHz.
•
Hard Disk : 40 GB.
•
Floppy Drive : 1.44 Mb.
•
Monitor : 15 VGA
Colour.
•
Mouse : Logitech.
•
Ram : 512 Mb.
SOFTWARE REQUIREMENTS:
•
Operating system : - Windows XP.
•
Coding Language :
ASP.NET, C#.Net.
•
Data Base :
SQL Server 2005
REFERENCE:
Tamir Tassa “Secure Mining of Association Rules in
Horizontally Distributed Databases” IEEE
TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 26, NO. 4, APRIL 2014
No comments:
Post a Comment