BinRank: Scaling Dynamic Authority-BasedSearch Using Materialized
Subgraphs
ABSTRACT:
Dynamic
authority-based keyword search algorithms, such as Object Rank and personalized
PageRank, leverage semantic link information to provide high quality, high
recall search in databases, and the Web. Conceptually, these algorithms require
a query time PageRank-style iterative computation over the full graph.
This
computation is too expensive for large graphs and not feasible at query time.
Alternatively, building an index of precompiled results for some or all
keywords involves very expensive preprocessing. We introduce BinRank, a system
that approximates Object Rank results by utilizing a hybrid approach inspired
by materialized views in traditional query processing.
We
materialize a number of relatively small subsets of the data graph in such a
way that any keyword query can be answered by running ObjectRank on only one of
the sub graphs. BinRank generates the subgraphs by partitioning all the terms
in the corpus based on their co-occurrence, executing ObjectRank for each
partition using the terms to generate a set of random walk starting points, and
keeping only those objects that receive non-negligible scores. The
intuition is that a subgraph that contains all objects and links relevant to a
set of related terms should have all the information needed to rank objects
with respect to one of these terms.
We
demonstrate that BinRank can achieve subsecond query execution time on the
English Wikipedia data set, while producing high-quality search results that
closely approximate the results of ObjectRank on the original graph.
The
Wikipedia link graph contains about 108 edges, which is at least two orders of
magnitude larger than what prior state of the art dynamic authority-based
search systems have been able to demonstrate.
Our
experimental evaluation investigates the trade-off between query execution
time, quality of the results, and storage requirements of BinRank.
No comments:
Post a Comment