eCite Digital Repository
Efficient query processing for XML keyword queries based on the IDList index
Citation
Zhou, J and Bao, Z and Wang, W and Zhao, J and Meng, X, Efficient query processing for XML keyword queries based on the IDList index, The VLDB Journal, 23, (1) pp. 25-50. ISSN 0949-877X (2014) [Refereed Article]
Copyright Statement
Copyright Springer-Verlag Berlin Heidelberg 2013
DOI: 10.1007/s00778-013-0313-2
Abstract
Keyword search over XML data has attracted considerable research effort over the last decade. One of the fundamental research problems is how to efficiently answer a given keyword query with respect to a certain query semantics. We found that the key factor behind the inefficiency of existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword k consists of ordered nodes that directly or indirectly contain k. We then show that finding keyword query results based on the smallest lowest common ancestor and exclusive lowest common ancestor semantics can be reduced to the ordered set intersection problem, which has been heavily optimized due to its application in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions and with or without using additional indexes. We further propose several algorithms that are based on hash search to simplify the operation of finding common nodes from all involved IDLists. We have conducted an extensive set of experiments using many state-of-the-art algorithms and several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
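The reduction described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the integer node IDs assigned in document order, the `parent` map, the toy document, and the function names `build_idlists`, `intersect_sorted`, and `slca` are all assumptions made for illustration. It covers only the smallest lowest common ancestor semantics, using a plain merge-style intersection with no additional index.

```python
from typing import Dict, List, Optional

# Hypothetical toy document: node IDs are assumed to follow document order.
# 1 = bib, 2 and 5 = paper, 3 and 6 = title, 4 and 7 = author.
parent: Dict[int, Optional[int]] = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 5}
# Nodes whose own text contains each keyword (direct containment).
direct: Dict[str, List[int]] = {"xml": [3, 6], "bao": [4], "zhou": [7]}


def build_idlists(parent, direct):
    """For each keyword, list in sorted order every node that contains it
    directly or indirectly (i.e. the direct nodes plus all their ancestors)."""
    idlists = {}
    for keyword, nodes in direct.items():
        ids = set()
        for node in nodes:
            cur = node
            # Stop early once we reach a node already recorded:
            # its ancestors were added on an earlier walk.
            while cur is not None and cur not in ids:
                ids.add(cur)
                cur = parent[cur]
        idlists[keyword] = sorted(ids)
    return idlists


def intersect_sorted(a, b):
    """Merge-style intersection of two ascending lists of node IDs."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def slca(parent, idlists, query):
    """Nodes that contain every query keyword and have no descendant that does."""
    if not query:
        return []
    # Start from the shortest list so intermediate results stay small.
    lists = sorted((idlists.get(k, []) for k in query), key=len)
    common = lists[0]
    for other in lists[1:]:
        common = intersect_sorted(common, other)
    # The common set is closed under ancestors, so a common node has a common
    # descendant exactly when it is the parent of some common node.
    parents_of_common = {parent[n] for n in common}
    return [n for n in common if n not in parents_of_common]


idlists = build_idlists(parent, direct)
print(slca(parent, idlists, ["xml", "bao"]))
print(slca(parent, idlists, ["bao", "zhou"]))
```

In this toy document, the query "xml bao" resolves to the first paper node (2), while "bao zhou" resolves only at the root (1), because no single paper contains both author names. Each shared ancestor appears once per list, so it is compared once during the merge.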
Item Details
Item Type: | Refereed Article |
---|---|
Keywords: | xml keyword search, set intersection |
Research Division: | Information and Computing Sciences |
Research Group: | Data management and data science |
Research Field: | Data management and data science not elsewhere classified |
Objective Division: | Information and Communication Services |
Objective Group: | Information systems, technologies and services |
Objective Field: | Information systems, technologies and services not elsewhere classified |
UTAS Author: | Bao, Z (Dr Zhifeng Bao) |
ID Code: | 90346 |
Year Published: | 2014 |
Web of Science® Times Cited: | 15 |
Deposited By: | Information and Communication Technology |
Deposited On: | 2014-03-31 |
Last Modified: | 2014-08-11 |
Downloads: | 0 |