Keyword Query Routing
ABSTRACT:
Keyword search
is an intuitive paradigm for searching linked data sources on the web. We
propose to route keywords only to relevant sources to reduce the high cost of
processing keyword search queries over all sources. We propose a novel method
for computing top-k routing plans based on their potentials to contain results
for a given keyword query. We employ a keyword-element relationship summary
that compactly represents relationships between keywords and the data elements
mentioning them. A multilevel scoring mechanism is proposed for computing the
relevance of routing plans based on scores at the level of keywords, data
elements, element sets, and subgraphs that connect these elements. Experiments
carried out using 150 publicly available sources on the web showed that valid
plans (precision@1 of 0.92) that are highly relevant (mean reciprocal rank of
0.89) can be computed in 1 second on average on a single PC. Further, we show
routing greatly helps to improve the performance of keyword search, without
compromising its result quality.
AIM:
Linked data describes a method of publishing
structured data so that it can be interlinked and become more useful. Keyword
search is an intuitive paradigm for searching linked data sources on the web.
We propose to route keywords only to relevant sources to reduce the high cost
of processing keyword search queries over all sources. In this project, we
implement top-k routing plans ranked by their potential to contain results for a
given keyword query.
SYNOPSIS:
In recent years the Web has evolved from a global information
space of linked documents to one where both documents and data are linked.
Underpinning this evolution is a set of best practices for publishing and
connecting structured data on the Web known as Linked Data. The adoption of the
Linked Data best practices has led to the extension of the Web with a global
data space connecting data from diverse domains such as people, companies,
books, scientific publications, films, music, television and radio programmes,
genes, proteins, drugs and clinical trials, online communities, statistical and
scientific data, and reviews. This Web of Data enables new types of
applications. There are generic Linked Data browsers which allow users to start
browsing in one data source and then navigate along links into related data
sources. There are Linked Data search engines that crawl the Web of Data by
following links between data sources and provide expressive query capabilities
over aggregated data, similar to how a local database is queried today. The Web
of Data also opens up new possibilities for domain-specific applications.
Unlike Web 2.0 mashups which work against a fixed set of data sources, Linked
Data applications operate on top of an unbound, global data space. This enables
them to deliver more complete answers as new data sources appear on the Web.
We
propose to investigate the problem of keyword query routing for keyword search
over a large number of structured and Linked Data sources. Routing keywords
only to relevant sources can reduce the high cost of searching for structured
results that span multiple sources. To the best of our knowledge, the work
presented in this paper represents the first attempt to address this problem.
We
use a graph-based data model to characterize individual data sources. In that
model, we distinguish between an element-level data graph representing
relationships between individual data elements, and a set-level data graph,
which captures information about groups of elements. This set-level graph
essentially captures a part of the Linked Data schema on the web that is
represented in RDFS, i.e., relations between classes. Often, a schema might be
incomplete or simply not exist for RDF data on the web. In such a case, a
pseudoschema can be obtained by computing a structural summary such as a
dataguide.
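The distinction between the two graph levels can be illustrated with a minimal Java sketch. It is not a dataguide computation and not the paper's implementation: it simply lifts each element-level edge to the classes of its endpoints, assuming every entity has at most one known type. The class and method names (`SetLevelGraphSketch`, `deriveSetLevel`) and the sample entities are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: deriving a set-level graph from an element-level graph. */
final class SetLevelGraphSketch {

    /** A directed, labeled edge between two nodes (entities or classes). */
    static final class Edge {
        final String source, label, target;

        Edge(String source, String label, String target) {
            this.source = source;
            this.label = label;
            this.target = target;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Edge)) return false;
            Edge e = (Edge) o;
            return source.equals(e.source) && label.equals(e.label) && target.equals(e.target);
        }

        @Override public int hashCode() { return Objects.hash(source, label, target); }

        @Override public String toString() { return source + " -" + label + "-> " + target; }
    }

    /**
     * Lifts every element-level edge to the classes (sets) of its endpoints.
     * Edges whose endpoints have no known type are skipped (an assumption of this sketch).
     */
    static Set<Edge> deriveSetLevel(List<Edge> elementEdges, Map<String, String> typeOf) {
        Set<Edge> setEdges = new LinkedHashSet<>();
        for (Edge e : elementEdges) {
            String from = typeOf.get(e.source);
            String to = typeOf.get(e.target);
            if (from != null && to != null) {
                setEdges.add(new Edge(from, e.label, to));
            }
        }
        return setEdges;
    }

    public static void main(String[] args) {
        Map<String, String> typeOf = new HashMap<>();
        typeOf.put("berlin", "City");
        typeOf.put("munich", "City");
        typeOf.put("germany", "Country");

        List<Edge> elementEdges = Arrays.asList(
                new Edge("berlin", "locatedIn", "germany"),
                new Edge("munich", "locatedIn", "germany"));

        // Both element-level edges collapse into a single set-level edge.
        System.out.println(deriveSetLevel(elementEdges, typeOf));
    }
}
```

Because many element-level edges collapse into one set-level edge, the set-level graph is typically much smaller than the data it summarizes.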
EXISTING SYSTEM:
Existing work
can be categorized into two main categories:
- Schema-based approaches
- Schema-agnostic approaches
There are schema-based approaches
implemented on top of off-the-shelf databases. A keyword query is processed by
mapping keywords to elements of the database (called keyword elements). Then,
using the schema, valid join sequences are derived, which are then employed to
join (“connect”) the computed keyword elements to form so-called candidate
networks representing possible results to the keyword query.
Schema-agnostic approaches operate
directly on the data. Structured results are computed by exploring the
underlying data graph. The goal is to find structures in the data called
Steiner trees (Steiner graphs in general), which connect keyword elements.
Various kinds of algorithms have been proposed for the efficient exploration of
keyword search results over data graphs, which might be very large. Examples
are bidirectional search and dynamic programming.
Existing work on keyword search relies
on an element-level model (i.e., data graphs) to compute keyword query results.
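To make the schema-agnostic idea concrete, the following Java sketch explores an element-level data graph to connect two keyword elements. It uses plain breadth-first search rather than the bidirectional search or dynamic programming algorithms mentioned above. For exactly two keyword elements in an unweighted graph, a shortest path is a minimal connecting structure; more keywords require a proper Steiner tree algorithm. All names and the sample graph are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: connecting two keyword elements by exploring the data graph. */
final class KeywordConnectionSketch {

    /** Adds an undirected edge between elements a and b. */
    static void link(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    /** Returns a shortest path from start to goal, or an empty list if none exists. */
    static List<String> shortestPath(Map<String, Set<String>> graph, String start, String goal) {
        Map<String, String> parent = new HashMap<>(); // visited node -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(goal)) {
                // Walk the predecessor chain back to the start.
                LinkedList<String> path = new LinkedList<>();
                for (String n = goal; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            for (String next : graph.getOrDefault(node, Collections.<String>emptySet())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        link(graph, "city:Berlin", "country:Germany");
        link(graph, "country:Germany", "currency:Euro");

        // Connects the elements matching the keywords "Berlin" and "Euro".
        System.out.println(shortestPath(graph, "city:Berlin", "currency:Euro"));
    }
}
```

Even this simple exploration touches every reachable element in the worst case, which is why searching all sources at the element level becomes expensive.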
DISADVANTAGES OF EXISTING SYSTEM:
- The number of potential results may increase exponentially with the number of
sources and the links between them. Yet most of the results may be unnecessary,
especially when they are not relevant to the user.
- For the routing problem, existing approaches need to compute results capturing
specific elements at the data level.
- Keywords are routed to all sources, which may or may not be relevant.
PROPOSED SYSTEM:
We propose to
route keywords only to relevant sources to reduce the high cost of processing
keyword search queries over all sources. We propose a novel method for
computing top-k routing plans based on their potentials to contain results for
a given keyword query. We employ a keyword-element relationship summary that
compactly represents relationships between keywords and the data elements
mentioning them. A multilevel scoring mechanism is proposed for computing the
relevance of routing plans based on scores at the level of keywords, data
elements, element sets, and subgraphs that connect these elements. We propose
to investigate the problem of keyword query routing for keyword search over a
large number of structured and Linked Data sources.
ADVANTAGES OF PROPOSED SYSTEM:
- Routing keywords only to relevant sources can reduce the high cost of
searching for structured results that span multiple sources.
- The routing plans produced can be used to compute results from multiple
sources.
MODULES:
- Linked Data Generation
- Key level Mapping
- Multilevel Inter relationship
- Routing Plan
MODULES DESCRIPTION:
Linked Data Generation
The GeoNames
services make it possible to add geospatial semantic information to the World
Wide Web. Each of the more than 6.2 million GeoNames toponyms now has a unique
URL with a corresponding XML web service. In this project we use the Country
Info, Time Zone, and Finance Info services. The data model used here resembles
RDF data, where entities stand for RDF resources, data values stand for RDF
literals, and relations and attributes correspond to RDF triples. While it is
primarily used to model RDF Linked Data on the web, such a graph model is
sufficiently general to capture XML and relational data.
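As an illustration of this RDF-like model, the Java sketch below represents relations and attributes as triples and builds a simple keyword index from the literal values, mapping each keyword to the entities that mention it. The `Triple` class, the `ex:` identifiers, and the tokenization rule (lowercase, split on non-word characters) are assumptions made for this example; they are not taken from the GeoNames services or from the paper.

```java
import java.util.*;

/** Hypothetical sketch: an RDF-like triple model with a keyword-to-entity index. */
final class TripleModelSketch {

    /** A relation (object is an entity) or an attribute (object is a literal value). */
    static final class Triple {
        final String subject, predicate, object;
        final boolean objectIsLiteral;

        Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }
    }

    /** Maps each keyword found in a literal value to the entities that mention it. */
    static Map<String, Set<String>> keywordIndex(List<Triple> triples) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Triple t : triples) {
            if (!t.objectIsLiteral) {
                continue; // relations link entities and carry no keywords in this sketch
            }
            for (String token : t.object.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    index.computeIfAbsent(token, k -> new TreeSet<>()).add(t.subject);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<Triple> triples = Arrays.asList(
                new Triple("ex:berlin", "name", "Berlin", true),
                new Triple("ex:germany", "name", "Germany", true),
                new Triple("ex:berlin", "locatedIn", "ex:germany", false));

        System.out.println(keywordIndex(triples));
    }
}
```

The entities returned by such an index play the role of keyword elements: the data elements that mention a given keyword.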
Key level Mapping
The set-level
graph essentially captures a part of the Linked Data schema on the web that is
represented in RDFS, i.e., relations between classes. Often, a schema might be
incomplete or simply not exist for RDF data on the web. In such a case, a
pseudoschema can be obtained by computing a structural summary such as a
dataguide. A set-level data graph can be derived from a given schema or a
generated pseudoschema. The web of data is modeled as a web graph
W = (G, N, E), where G is the set of all data graphs, N is the set of all nodes,
and E is the set of all “internal” edges that connect elements within a
particular source.
Multilevel Inter relationship
The search space
of keyword query routing is modeled using a multilevel inter-relationship
graph, which captures the inter-relationships between elements at different
levels. A keyword is mentioned in some entity descriptions at the element
level. Entities at the element level are associated with a set-level element
via type. A set-level element is contained in a source. There is an edge
between two keywords if two elements at the element level mentioning these
keywords are connected via a path. We propose a ranking scheme that deals with
relevance at many levels.
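The keyword-level edge rule can be sketched in Java as follows. This is a simplified illustration, not the paper's summary construction: it treats the element-level graph as undirected, places no bound on path length, and uses a union-find structure so that two keywords are related whenever some elements mentioning them fall into the same connected component. All class and method names are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: keyword edges derived from connectivity between elements. */
final class KeywordRelationSketch {
    private final Map<String, String> parent = new HashMap<>();        // union-find forest
    private final Map<String, Set<String>> mentions = new HashMap<>(); // keyword -> elements

    /** Returns the representative of x's connected component, with path compression. */
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String root = x;
        while (!root.equals(parent.get(root))) {
            root = parent.get(root);
        }
        String cur = x;
        while (!cur.equals(root)) {
            String next = parent.get(cur);
            parent.put(cur, root);
            cur = next;
        }
        return root;
    }

    /** Records an element-level edge, merging the components of a and b. */
    void addEdge(String a, String b) {
        parent.put(find(a), find(b));
    }

    /** Records that the given element mentions the given keyword. */
    void addMention(String keyword, String element) {
        find(element); // make sure the element is registered
        mentions.computeIfAbsent(keyword, k -> new HashSet<>()).add(element);
    }

    /** True if some element mentioning k1 is connected via a path to one mentioning k2. */
    boolean related(String k1, String k2) {
        Set<String> roots = new HashSet<>();
        for (String e : mentions.getOrDefault(k1, Collections.<String>emptySet())) {
            roots.add(find(e));
        }
        for (String e : mentions.getOrDefault(k2, Collections.<String>emptySet())) {
            if (roots.contains(find(e))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        KeywordRelationSketch summary = new KeywordRelationSketch();
        summary.addEdge("city1", "country1");
        summary.addEdge("country1", "currency1");
        summary.addMention("berlin", "city1");
        summary.addMention("euro", "currency1");
        summary.addMention("tokyo", "city2");

        System.out.println(summary.related("berlin", "euro"));  // connected via country1
        System.out.println(summary.related("berlin", "tokyo")); // no connecting path
    }
}
```

A practical summary would also record which sources the connected elements belong to, since routing ultimately selects sources rather than individual elements.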
Routing Plan
Given the web
graph W = (G, N, E) and a keyword query K, the mapping K → 2^G that associates
a query with a set of data graphs is called a keyword routing plan RP. A plan
RP is considered valid w.r.t. K when the union of its data graphs contains a
result for K. The problem of keyword query
routing is to find the top-k keyword routing plans based on their relevance to
a query. A relevant plan should correspond to the information need as intended
by the user.
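The top-k selection can be sketched in Java under strong simplifying assumptions. Here a plan is any set of sources that together mention every query keyword, which is only a necessary condition for validity, since a valid plan must also contain a connected result. The score used here (the sum of the best per-keyword scores divided by the number of sources) is an invented stand-in for the multilevel scoring mechanism, and the brute-force enumeration of source subsets is only practical for a handful of sources. All names and numbers are hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: ranking candidate routing plans and keeping the top k. */
final class RoutingPlanSketch {

    /** A candidate plan: a set of sources with a relevance score. */
    static final class Plan {
        final Set<String> sources;
        final double score;

        Plan(Set<String> sources, double score) {
            this.sources = sources;
            this.score = score;
        }

        @Override public String toString() { return sources + " score=" + score; }
    }

    /**
     * keywordScores maps source -> (keyword -> score); a missing keyword means "not mentioned".
     * Enumerates all non-empty subsets of sources, keeps those covering every query keyword,
     * and returns the k best by the simplified score described above.
     */
    static List<Plan> topK(Map<String, Map<String, Double>> keywordScores,
                           List<String> query, int k) {
        List<String> sources = new ArrayList<>(keywordScores.keySet());
        List<Plan> plans = new ArrayList<>();
        for (int mask = 1; mask < (1 << sources.size()); mask++) {
            Set<String> chosen = new TreeSet<>();
            for (int i = 0; i < sources.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    chosen.add(sources.get(i));
                }
            }
            double total = 0;
            boolean covers = true;
            for (String keyword : query) {
                double best = 0;
                for (String s : chosen) {
                    best = Math.max(best, keywordScores.get(s).getOrDefault(keyword, 0.0));
                }
                if (best == 0) {
                    covers = false; // no chosen source mentions this keyword
                    break;
                }
                total += best;
            }
            if (covers) {
                plans.add(new Plan(chosen, total / chosen.size()));
            }
        }
        plans.sort((a, b) -> Double.compare(b.score, a.score));
        return new ArrayList<>(plans.subList(0, Math.min(k, plans.size())));
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> scores = new TreeMap<>();
        scores.put("geonames", Collections.singletonMap("berlin", 0.9));
        scores.put("finance", Collections.singletonMap("currency", 0.8));
        Map<String, Double> countryInfo = new HashMap<>();
        countryInfo.put("berlin", 0.4);
        countryInfo.put("currency", 0.5);
        scores.put("countryinfo", countryInfo);

        for (Plan p : topK(scores, Arrays.asList("berlin", "currency"), 2)) {
            System.out.println(p);
        }
    }
}
```

Dividing by the number of sources favors compact plans, which reflects the motivation for routing: fewer sources to search means lower query processing cost.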
SYSTEM REQUIREMENTS:
HARDWARE REQUIREMENTS:
- System : Pentium IV 2.4 GHz
- Hard Disk : 40 GB
- Floppy Drive : 1.44 MB
- Monitor : 15 VGA Colour
- Mouse : Logitech
- RAM : 512 MB
SOFTWARE REQUIREMENTS:
- Operating System : Windows XP/7
- Coding Language : Java/J2EE
- IDE : NetBeans 7.4
- Database : MySQL
REFERENCE:
Thanh Tran and Lei Zhang, “Keyword Query Routing,” IEEE Transactions on
Knowledge and Data Engineering, vol. 26, no. 2, February 2014.