Sampling is the most versatile and easiest to implement car-dinality estimation method. Therefore, it is implemented inalmost every database management system, commercial ornot. Consequently, the main purpose of this paper is to pro-vide the reader with an intuition about sampling precision.In the context of query optimization, the basic procedurecan be described as follows. From a relationRcontainingntuples, a sample ofm < ntuples is drawn. Then, a querypredicatepis evaluated on themsample tuples, and thenumberkof qualifying sample tuples is recorded. Assumethe evaluation of the same predicatepon the relationRresults inlqualifying tuples. The task now is to producean estimateˆlforlwheren,m,kare given. The standardanswer to this task isˆl=knm. However, there are some (yetunanswered) fundamental questions:1. Is the standard estimator the best way to derive anestimate?2. What are the upper and lower bounds forl?3. How can we derive an estimate that minimizes the q-error?4. How large is the q-error we can expect for this esti-mate?5. For a given maximal allowed q-error, which sample sizemshould we choose?Since sampling is a probabilistic process, we will give prob-abilistic answers to these questions. Further, we show howresult cardinality estimates for selections and joins can significantly be improved.
Additional information:
Originalt.: α to ω
Dieser Eintrag ist Teil der Universitätsbibliographie.
Das Dokument wird vom Publikationsserver der Universitätsbibliothek Mannheim bereitgestellt.