eCite Digital Repository

Efficient provenance storage for relational queries


Bao, Z and Kohler, H and Wang, L and Zhou, X and Sadiq, S, Efficient provenance storage for relational queries, Proceedings of the 21st ACM International Conference on Information and Knowledge Management, 29 October - 2 November 2012, Maui, Hawaii, pp. 1352-1361. ISBN 978-1-4503-1156-4 (2012) [Refereed Conference Paper]

Copyright Statement

Copyright 2012 ACM

DOI: 10.1145/2396761.2398439


Provenance information is vital in many application areas because it helps explain data lineage and derivation. However, storing fine-grained provenance information can be expensive. In this paper, we present a framework for storing provenance information about data derived via database queries. We first propose a provenance tree data structure that mirrors the query structure and thereby makes it possible to avoid redundant storage of information about the derivation process. We then investigate two approaches for reducing storage costs. The first applies two reduction rules to provenance trees. The second is a dynamic programming solution that optimizes the selection of query tree nodes at which provenance information is stored. The optimization algorithm runs in time polynomial in the query size and linear in the size of the provenance information, thus enabling provenance tracking and optimization without incurring large overheads. Experiments show that our approaches achieve significantly lower storage costs than existing approaches.

Item Details

Item Type: Refereed Conference Paper
Keywords: data quality, provenance data, index, storage
Research Division: Information and Computing Sciences
Research Group: Data management and data science
Research Field: Data management and data science not elsewhere classified
Objective Division: Information and Communication Services
Objective Group: Information systems, technologies and services
Objective Field: Information systems, technologies and services not elsewhere classified
UTAS Author: Bao, Z (Dr Zhifeng Bao)
ID Code: 92181
Year Published: 2012
Deposited By: Information and Communication Technology
Deposited On: 2014-06-09
Last Modified: 2014-12-08
