A unified framework for frequent sequence mining with subsequence constraints

Beedkar, Kaustubh ; Gemulla, Rainer ; Mertens, Wim

Full text: A Unified Framework for Frequent Sequence Mining with Subsequence Constraints.pdf (published version, 1 MB)

DOI: https://doi.org/10.1145/3321486
URL: https://madoc.bib.uni-mannheim.de/48227
Additional URL: https://dl.acm.org/citation.cfm?id=3321486
URN: urn:nbn:de:bsz:180-madoc-482279
Document Type: Article
Year of publication: 2019
Journal or publication series: ACM Transactions on Database Systems (TODS)
Volume: 44
Issue number: 3
Page range: 11:1-11:42
Place of publication: New York, NY
Publishing house: ACM Press
ISSN: 0362-5915, 1557-4644
Publication language: English
Institution: School of Business Informatics and Mathematics > Practical Computer Science I: Data Analytics (Gemulla 2014-)
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Subject: 004 Computer science, internet
Abstract: Frequent sequence mining methods often make use of constraints to control which subsequences should be mined. A variety of such subsequence constraints has been studied in the literature, including length, gap, span, regular-expression, and hierarchy constraints. In this article, we show that many subsequence constraints, including and beyond those considered in the literature, can be unified in a single framework. A unified treatment allows researchers to study many types of subsequence constraints jointly (instead of each one individually) and helps improve the usability of pattern mining systems for practitioners. In more detail, we propose a set of simple and intuitive “pattern expressions” to describe subsequence constraints and explore algorithms for efficiently mining frequent subsequences under such general constraints. Our algorithms translate pattern expressions to succinct finite-state transducers, which we use as our computational model, and simulate these transducers in a way suitable for frequent sequence mining. Our experimental study on real-world datasets indicates that our algorithms, although more general, are efficient and, when used for sequence mining with prior constraints studied in the literature, competitive with (and in some cases superior to) state-of-the-art specialized methods.
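
Illustration: to make the terms "length constraint" and "gap constraint" from the abstract concrete, the following is a minimal brute-force sketch in Python. It is not the article's method: it uses neither pattern expressions nor finite-state transducers, and the function names (constrained_subsequences, mine) and parameters (max_len, max_gap, min_support) are hypothetical names chosen for this example. It enumerates, for each input sequence, every subsequence that respects the two constraints and keeps those that occur in at least min_support input sequences.

```python
from collections import Counter


def constrained_subsequences(seq, max_len, max_gap):
    """Return the set of subsequences of `seq` that have at most `max_len`
    items and skip at most `max_gap` items between consecutive picks.
    (Hypothetical helper for illustration only.)"""
    results = set()

    def extend(last_idx, prefix):
        results.add(prefix)
        if len(prefix) == max_len:
            return
        # The next item may lie at most max_gap positions beyond the
        # immediate successor of the last picked position.
        stop = min(len(seq), last_idx + max_gap + 2)
        for j in range(last_idx + 1, stop):
            extend(j, prefix + (seq[j],))

    for i in range(len(seq)):
        extend(i, (seq[i],))
    return results


def mine(db, min_support, max_len, max_gap):
    """Count each constrained subsequence once per input sequence and
    keep those supported by at least `min_support` sequences."""
    counts = Counter()
    for seq in db:
        counts.update(constrained_subsequences(seq, max_len, max_gap))
    return {p: c for p, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "x", "b", "c")]
    # Gap 0: only contiguous subsequences qualify.
    print(mine(db, min_support=2, max_len=2, max_gap=0))
    # Gap 1: one item may be skipped between consecutive picks.
    print(mine(db, min_support=2, max_len=2, max_gap=1))
```

With max_gap=0 only contiguous patterns are counted, so the only frequent pair in this toy database is ("b", "c"); raising max_gap to 1 also makes ("a", "b") and ("a", "c") frequent. Exhaustive enumeration of this kind grows quickly with sequence length, which is one reason specialized algorithms are studied.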

This entry is part of the university bibliography.

The document is provided by the publication server of the Mannheim University Library.
