The accuracy of cardinality estimators: Unraveling the evaluation result conundrum


Rashedi, Nazanin ; Moerkotte, Guido



DOI: https://doi.org/10.14778/3749646.3749651
URL: https://dl.acm.org/doi/10.14778/3749646.3749651
Additional URL: https://www.researchgate.net/publication/395274986...
URN: urn:nbn:de:bsz:180-madoc-716385
Document Type: Article
Year of publication: 2025
The title of a journal, publication series: Proceedings of the VLDB Endowment
Volume: 18
Issue number: 11
Page range: 3744-3756
Place of publication: New York, NY
Publishing house: Association for Computing Machinery
ISSN: 2150-8097
Publication language: English
Institution: School of Business Informatics and Mathematics > Practical Computer Science III (Moerkotte 1996-)
Pre-existing license: Creative Commons Attribution, Non-Commercial, No Derivatives 4.0 International (CC BY-NC-ND 4.0)
Subject: 004 Computer science, internet
Abstract: Existing research on the accuracy of cardinality estimators generally suffers from a lack of diversity and sufficient quantity of their experimental datasets, particularly in relation to the claimed scope of the study and the generality of its conclusions. We argue that a sufficiently large number of varied datasets is essential for comprehensive evaluations. However, the prevailing per-dataset evaluation method (PDE), producing one result table per dataset, has so far hindered this necessary expansion of the experiments. Moreover, as we demonstrate, this evaluation method often leaves the reader with contradictory results, where one estimator excels on certain datasets or queries, while another performs better elsewhere. To address these and similar limitations, we propose a multidimensional evaluation framework. This framework unravels the conundrum of analyzing the evaluation results across multiple datasets through the use of discretization. It establishes a robust foundation for aggregating the evaluation results and conducting pairwise comparisons between estimators. Furthermore, it facilitates informed decision making in the presence of conflicting results through a customizable ranking mechanism. To empirically highlight the shortcomings of the aforementioned per-dataset evaluation and demonstrate the advantages of our proposed framework, we conduct a benchmarking study of cardinality estimators, incorporating both learned and traditional approaches. We focus on a fundamental challenge: estimating the cardinality of range queries on a single 2-D geographical relation in a static environment. Despite the apparent simplicity of this task, our findings reveal that many estimators struggle to handle this challenge effectively. To further enhance the quality of our study, we provide valuable insights by addressing some critical aspects that were overlooked in previous benchmarking studies.
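To make the discretization idea from the abstract concrete, the following is a minimal, hypothetical sketch: per-query q-errors (the standard accuracy metric for cardinality estimators) are mapped into coarse quality buckets, bucket counts are aggregated per estimator, and two estimators are compared pairwise by bucket. The bucket boundaries and all function names here are illustrative assumptions, not the actual parameters or API of the paper's framework.

```python
# Illustrative sketch only: the bucket boundaries and function names are
# assumptions for demonstration, not the framework proposed in the paper.

BUCKETS = [2.0, 10.0, 100.0]  # q-error <= 2, <= 10, <= 100, else worst bucket

def q_error(true_card, est_card):
    """Symmetric ratio error; both cardinalities are clamped to >= 1."""
    t, e = max(true_card, 1.0), max(est_card, 1.0)
    return max(t / e, e / t)

def discretize(q):
    """Map a q-error to a bucket index (0 = best)."""
    for i, bound in enumerate(BUCKETS):
        if q <= bound:
            return i
    return len(BUCKETS)

def bucket_histogram(true_cards, est_cards):
    """Aggregate one estimator's per-query results into bucket counts,
    giving a dataset-independent summary that can be merged across datasets."""
    hist = [0] * (len(BUCKETS) + 1)
    for t, e in zip(true_cards, est_cards):
        hist[discretize(q_error(t, e))] += 1
    return hist

def pairwise_wins(true_cards, est_a, est_b):
    """Per-query pairwise comparison on bucket level:
    returns (A strictly better, ties, B strictly better)."""
    a_wins = b_wins = ties = 0
    for t, a, b in zip(true_cards, est_a, est_b):
        ba, bb = discretize(q_error(t, a)), discretize(q_error(t, b))
        if ba < bb:
            a_wins += 1
        elif bb < ba:
            b_wins += 1
        else:
            ties += 1
    return a_wins, ties, b_wins
```

Because the discretized buckets are comparable across datasets, histograms from many datasets can simply be summed, which is the kind of aggregation a per-dataset result table does not permit.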




This entry is part of the university bibliography.

The document is provided by the publication server of the Mannheim University Library.



