The Importance of Sibling Clustering for Efficient Bulkload of XML Document Trees


Kanne, Carl-Christian; Moerkotte, Guido


PDF: tr_2005_009.pdf - Published (163 kB)

URL: http://ub-madoc.bib.uni-mannheim.de/1137
URN: urn:nbn:de:bsz:180-madoc-11374
Document Type: Working paper
Year of publication: 2005
Publication language: English
Institution: School of Business Informatics and Mathematics > Other - Faculty of Mathematics and Computer Science
MADOC publication series: Publications of the Faculty of Mathematics and Computer Science > Institute of Computer Science > Technical Reports
Subject: 004 Computer science, internet
Subject headings (SWD): Database system, XML, Tree (mathematics)
Keywords: Partitioning, Clustering
Abstract: In an XML Data Store (XDS), importing documents from external sources is a very frequent operation. Since a document import consists of a large number of individual node inserts, it is essentially a small bulkload operation. Hence, efficient bulkload support is crucial for XDSs. At its core, XML bulkload is the transformation of an XML parser's output into the XDS's persistent storage structures. This involves two major subtasks: (1) partitioning a document's logical tree structure into subtrees smaller than a disk page, in a way that is both space-efficient and suitable for later processing, and (2) mapping the subtrees to the XDS's internal page representation. In enterprise-scale environments with very large documents and/or very many parallel bulkloads, task (1) is particularly challenging, as not only disk space consumption but also CPU and main-memory usage are important factors. In this article, we (1) discuss requirements for an XML bulkload module, (2) examine existing algorithms for tree partitioning with respect to their applicability as XML bulkload algorithms, (3) derive a new tree partitioning algorithm, (4) present the design and implementation of the bulkload module used in our Natix XDS, and (5) evaluate the implementation.
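Subtask (1) above, cutting a document tree into pieces that each fit on one page, can be illustrated with a small sketch. This is NOT the algorithm derived in the paper. It is a simplified greedy bottom-up partitioner built on assumptions of our own: each node has an abstract storage weight, every page has the same capacity, and the names `Node`, `partition`, and `cluster_siblings` are hypothetical. When a node and its remaining children do not fit together, the heaviest child residuals are cut off first. With `cluster_siblings=True`, adjacent cut siblings may share one partition, which shows how grouping siblings can reduce the number of partitions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    weight: int  # assumed: storage size of the node itself (e.g. bytes)
    children: list["Node"] = field(default_factory=list)


def partition(root: Node, capacity: int, cluster_siblings: bool = True) -> list[list[str]]:
    """Hypothetical sketch: split a tree into label groups of total weight <= capacity."""
    partitions: list[list[str]] = []

    def visit(node: Node) -> tuple[int, list[str]]:
        if node.weight > capacity:
            raise ValueError(f"node {node.label!r} alone exceeds the capacity")
        # Post-order: reduce every child subtree to a residual that fits on a page.
        residuals = [visit(child) for child in node.children]
        total = node.weight + sum(w for w, _ in residuals)

        # Greedy rule: cut the heaviest child residuals until the rest fits with the node.
        cut: set[int] = set()
        for j in sorted(range(len(residuals)), key=lambda k: residuals[k][0], reverse=True):
            if total <= capacity:
                break
            cut.add(j)
            total -= residuals[j][0]

        # Emit cut residuals as partitions. With sibling clustering, a run of
        # adjacent cut siblings shares one partition while it stays within capacity.
        run_weight, run_labels = 0, []
        for j, (w, labels) in enumerate(residuals):
            if j not in cut or not cluster_siblings or run_weight + w > capacity:
                if run_labels:
                    partitions.append(run_labels)
                run_weight, run_labels = 0, []
            if j in cut:
                run_weight += w
                run_labels = run_labels + labels
        if run_labels:
            partitions.append(run_labels)

        # The node keeps its uncut children; this residual is passed up to the parent.
        kept = [lab for j, (_, labels) in enumerate(residuals) if j not in cut for lab in labels]
        return total, [node.label] + kept

    _, root_labels = visit(root)
    partitions.append(root_labels)
    return partitions


# Usage: a root of weight 1 with five leaf children of weight 3, page capacity 8.
tree = Node("r", 1, [Node(x, 3) for x in "abcde"])
print(partition(tree, 8, cluster_siblings=False))  # [['a'], ['b'], ['c'], ['r', 'd', 'e']]
print(partition(tree, 8))                          # [['a', 'b'], ['c'], ['r', 'd', 'e']]
```

In this toy example, letting the cut siblings `a` and `b` share a partition saves one page compared with giving every cut subtree its own partition. The greedy cut rule is only one simple choice; a real bulkload module also has to respect the CPU and main-memory constraints named in the abstract.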
Additional information: This document is provided by the publication server of Mannheim University Library.




Citation: Kanne, Carl-Christian; Moerkotte, Guido (2005). The Importance of Sibling Clustering for Efficient Bulkload of XML Document Trees. Open Access [Working paper]

