File(s) under permanent embargo
Efficient provenance storage for relational queries
conference contribution
posted on 2023-05-23, 08:58 authored by Bao, Z, Kohler, H, Wang, L, Zhou, X, Sadiq, S
Provenance information is vital in many application areas as it helps explain data lineage and derivation. However, storing fine-grained provenance information can be expensive. In this paper, we present a framework for storing provenance information relating to data derived via database queries. In particular, we first propose a provenance tree data structure which matches the query structure and thereby presents a possibility to avoid redundant storage of information regarding the derivation process. Then we investigate two approaches for reducing storage costs. The first approach utilizes two ingenious rules to achieve reduction on provenance trees. The second one is a dynamic programming solution, which provides a way of optimizing the selection of query tree nodes where provenance information should be stored. The optimization algorithm runs in polynomial time in the query size and is linear in the size of the provenance information, thus enabling provenance tracking and optimization without incurring large overheads. Experiments show that our approaches guarantee significantly lower storage costs than existing approaches.
Publication title
Proceedings of the 21st ACM International Conference on Information and Knowledge Management
Pagination
1352-1361
ISBN
978-1-4503-1156-4
Department/School
School of Information and Communication Technology
Publisher
Association for Computing Machinery
Place of publication
United States of America
Event title
21st ACM International Conference on Information and Knowledge Management
Event Venue
Maui, Hawaii
Date of Event (Start Date)
2012-10-29
Date of Event (End Date)
2012-11-02
Rights statement
Copyright 2012 ACM
Repository Status
- Restricted