alpha to omega: the G(r)eek alphabet of sampling


Moerkotte, Guido ; Hertzschuch, Axel


[img] PDF
p25-moerkotte-cidr20.pdf - Published

Download (505kB)

URL: https://madoc.bib.uni-mannheim.de/58070
Additional URL: http://cidrdb.org/cidr2020/papers/p25-moerkotte-ci...
URN: urn:nbn:de:bsz:180-madoc-580704
Document Type: Conference or workshop publication
Year of publication: 2020
Book title: CIDR 2020 : 10th Conference on Innovative Data Systems Research, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings
Page range: 1-15
Conference title: CIDR 2020
Location of the conference venue: Online
Date of the conference: 12.-15.01.2020
Place of publication: Amsterdam
Publishing house: CIDR
Publication language: English
Institution: School of Business Informatics and Mathematics > Practical Computer Science III (Moerkotte 1996-)
Pre-existing license: Creative Commons Attribution 3.0 Unported (CC BY 3.0)
Subject: 004 Computer science, internet
Abstract: Sampling is the most versatile and easiest to implement car-dinality estimation method. Therefore, it is implemented inalmost every database management system, commercial ornot. Consequently, the main purpose of this paper is to pro-vide the reader with an intuition about sampling precision.In the context of query optimization, the basic procedurecan be described as follows. From a relationRcontainingntuples, a sample ofm < ntuples is drawn. Then, a querypredicatepis evaluated on themsample tuples, and thenumberkof qualifying sample tuples is recorded. Assumethe evaluation of the same predicatepon the relationRresults inlqualifying tuples. The task now is to producean estimateˆlforlwheren,m,kare given. The standardanswer to this task isˆl=knm. However, there are some (yetunanswered) fundamental questions:1. Is the standard estimator the best way to derive anestimate?2. What are the upper and lower bounds forl?3. How can we derive an estimate that minimizes the q-error?4. How large is the q-error we can expect for this esti-mate?5. For a given maximal allowed q-error, which sample sizemshould we choose?Since sampling is a probabilistic process, we will give prob-abilistic answers to these questions. Further, we show howresult cardinality estimates for selections and joins can significantly be improved.
Additional information: Originalt.: α to ω




Dieser Eintrag ist Teil der Universitätsbibliographie.

Das Dokument wird vom Publikationsserver der Universitätsbibliothek Mannheim bereitgestellt.




Metadata export


Citation


+ Search Authors in

+ Download Statistics

Downloads per month over past year

View more statistics



You have found an error? Please let us know about your desired correction here: E-Mail


Actions (login required)

Show item Show item