alpha to omega: the G(r)eek alphabet of sampling

PDF
p25-moerkotte-cidr20.pdf - Veröffentlichte Version
Download (505kB)

URL:	https://madoc.bib.uni-mannheim.de/58070
Weitere URL:	http://cidrdb.org/cidr2020/papers/p25-moerkotte-ci...
URN:	urn:nbn:de:bsz:180-madoc-580704
Dokumenttyp:	Konferenzveröffentlichung
Erscheinungsjahr:	2020
Buchtitel:	CIDR 2020 : 10th Conference on Innovative Data Systems Research, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings
Seitenbereich:	1-15
Veranstaltungstitel:	CIDR 2020
Veranstaltungsort:	Online
Veranstaltungsdatum:	12.-15.01.2020
Ort der Veröffentlichung:	Amsterdam
Verlag:	CIDR
Sprache der Veröffentlichung:	Englisch
Einrichtung:	Fakultät für Wirtschaftsinformatik und Wirtschaftsmathematik > Practical Computer Science III (Moerkotte 1996-)
Bereits vorhandene Lizenz:	Creative Commons Namensnennung 3.0 Unported (CC BY 3.0)
Fachgebiet:	004 Informatik
Abstract:	Sampling is the most versatile and easiest to implement car-dinality estimation method. Therefore, it is implemented inalmost every database management system, commercial ornot. Consequently, the main purpose of this paper is to pro-vide the reader with an intuition about sampling precision.In the context of query optimization, the basic procedurecan be described as follows. From a relationRcontainingntuples, a sample ofm < ntuples is drawn. Then, a querypredicatepis evaluated on themsample tuples, and thenumberkof qualifying sample tuples is recorded. Assumethe evaluation of the same predicatepon the relationRresults inlqualifying tuples. The task now is to producean estimateˆlforlwheren,m,kare given. The standardanswer to this task isˆl=knm. However, there are some (yetunanswered) fundamental questions:1. Is the standard estimator the best way to derive anestimate?2. What are the upper and lower bounds forl?3. How can we derive an estimate that minimizes the q-error?4. How large is the q-error we can expect for this esti-mate?5. For a given maximal allowed q-error, which sample sizemshould we choose?Since sampling is a probabilistic process, we will give prob-abilistic answers to these questions. Further, we show howresult cardinality estimates for selections and joins can significantly be improved.
Zusätzliche Informationen:	Originalt.: α to ω