eCite Digital Repository

Efficient query processing for XML keyword queries based on the IDList index


Zhou, J and Bao, Z and Wang, W and Zhao, J and Meng, X, Efficient query processing for XML keyword queries based on the IDList index, The VLDB Journal, 23, (1) pp. 25-50. ISSN 0949-877X (2014) [Refereed Article]

Copyright Statement

Copyright Springer-Verlag Berlin Heidelberg 2013

DOI: 10.1007/s00778-013-0313-2


Keyword search over XML data has attracted considerable research effort in the last decade, and one of the fundamental research problems is how to efficiently answer a given keyword query w.r.t. a certain query semantics. We found that the key cause of inefficiency in existing methods is that they all suffer heavily from the common-ancestor-repetition problem. In this paper, we propose a novel form of inverted list, namely the IDList; the IDList for keyword k consists of ordered nodes that directly or indirectly contain k. We then show that finding keyword query results based on the smallest lowest common ancestor and exclusive lowest common ancestor semantics can be reduced to the ordered set intersection problem, which has been heavily optimized due to its application in areas such as information retrieval and database systems. We propose several algorithms that exploit set intersection in different directions, with or without using additional indexes. We further propose several algorithms that are based on hash search to simplify the operation of finding common nodes from all involved IDLists. We have conducted an extensive set of experiments using many state-of-the-art algorithms and several large-scale datasets. The results demonstrate that our proposed methods outperform existing methods by up to two orders of magnitude in many cases.
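The reduction described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a simplified illustration that assumes node IDs are integers assigned in document order, that each node's parent is known, and that only the smallest lowest common ancestor (SLCA) semantics is wanted. The function names (build_idlists, intersect_sorted, slca) and the toy tree are hypothetical and chosen only for this example.

```python
def build_idlists(parent, keywords_at):
    """Build one sorted ID list per keyword.

    parent:      dict node_id -> parent node_id (root maps to None);
                 IDs are assumed to follow document order.
    keywords_at: dict node_id -> set of keywords the node directly contains.
    The list for keyword k holds every node that directly or
    indirectly (through a descendant) contains k.
    """
    idsets = {}
    for node, words in keywords_at.items():
        for w in words:
            ids = idsets.setdefault(w, set())
            cur = node
            # Walk up to the root; stop early once an ancestor is already
            # recorded, because all of its ancestors are recorded too.
            while cur is not None and cur not in ids:
                ids.add(cur)
                cur = parent[cur]
    return {w: sorted(ids) for w, ids in idsets.items()}


def intersect_sorted(a, b):
    """Merge-style intersection of two sorted ID lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def slca(idlists, parent, query):
    """Return SLCA nodes for the query keywords (simplified sketch)."""
    if not query:
        return []
    # Start from the shortest list to keep intermediate results small.
    lists = sorted((idlists.get(k, []) for k in query), key=len)
    common = lists[0]
    for other in lists[1:]:
        common = intersect_sorted(common, other)
    # 'common' now holds every node whose subtree contains all keywords.
    # That set is closed under ancestors, so a node is an SLCA exactly
    # when none of its children is also in the set.
    has_common_child = {parent[n] for n in common if parent[n] is not None}
    return [n for n in common if n not in has_common_child]


# Toy document (hypothetical): node 1 is the root with children 2, 5, 8.
parent = {1: None, 2: 1, 3: 2, 4: 2, 5: 1, 6: 5, 7: 5, 8: 1, 9: 8, 10: 8}
keywords_at = {3: {"xml"}, 4: {"bao"}, 6: {"xml"}, 7: {"wang"},
               9: {"keyword"}, 10: {"bao"}}
idlists = build_idlists(parent, keywords_at)

print(idlists["xml"])                        # [1, 2, 3, 5, 6]
print(slca(idlists, parent, ["xml", "bao"]))  # [2]
```

In this sketch each common ancestor appears once in the intersection result, so it is examined once rather than being recomputed for every combination of keyword occurrences beneath it. The exclusive lowest common ancestor semantics and the hash-search variants mentioned in the abstract need additional bookkeeping that is not shown here.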

Item Details

Item Type: Refereed Article
Keywords: XML keyword search, set intersection
Research Division: Information and Computing Sciences
Research Group: Data management and data science
Research Field: Data management and data science not elsewhere classified
Objective Division: Information and Communication Services
Objective Group: Information systems, technologies and services
Objective Field: Information systems, technologies and services not elsewhere classified
UTAS Author: Bao, Z (Dr Zhifeng Bao)
ID Code: 90346
Year Published: 2014
Web of Science® Times Cited: 15
Deposited By: Information and Communication Technology
Deposited On: 2014-03-31
Last Modified: 2014-08-11
