Computing Reviews
Today's Issue Hot Topics Search Browse Recommended My Account Log In
Review Help
Search
Mining top-k high-utility itemsets from a data stream under sliding window model
Dawar S., Sharma V., Goyal V. Applied Intelligence47 (4):1240-1255,2017.Type:Article
Date Reviewed: Apr 19 2018

Nowadays most areas of commerce, science, or business generate data in such copious ways that they just cannot be organized in traditional databases anymore; instead, they must literally be mined out of their heterogeneous sources--which is called data mining. Sometimes, however, data mining operations produce too many or too few results. The first solution that comes to mind is to define a utility threshold, or a boundary under which results are considered irrelevant for the ongoing search; this threshold can be defined according to different criteria. This solution in turn poses another problem: defining the threshold. Many algorithms exist that address and solve this problem. Starting from them, the authors of this paper propose their own novel algorithm to mine data in a more efficient way. Theirs is a one-step algorithm, as opposed to existing ones that perform the same task in two steps: generation of candidate solutions first, followed by verification of their position relative to the threshold.

Their algorithm, instead, evaluates all transactions generated within a given period of time; transactions are grouped in batches and batches are arranged in time frames, called windows. When a given number of transactions arrives, a new batch is created with its associated window, and at the same moment the oldest window is discarded. This method is called sliding windows because windows move forward with time. This way, the algorithm continually monitors transactions and updates results according to the threshold definition.

This, of course, is just a short and succinct summary of the whole paper; the paper itself is far richer and well articulated. It contains a description of the algorithm, then its formal proof of correctness in terms of formal logic, and finally experimental application to both real-world and purpose-built datasets to evaluate its performance against existing algorithms. The paper’s core is the definition in plain, yet formal, English, of what the authors call high-utility itemset mining. Then comes a review of existing algorithms in table form, a plain English explanation of some of them, and a comparison among them. This plain-text description serves as an introduction to the core of the algorithm, an inverted list data structure called iList; it is first introduced, together with its related fields, with formal definitions, then built in pseudocode. Next, the algorithm to mine iLists is described, in text, mathematical language, pseudocode, and a formal proof of correctness. Then comes the part on experimental evaluation: a comparison of the authors’ algorithm against the state-of-the-art T-HUDS algorithm. The experimental setup, hardware, software, and datasets are described, followed by a description of test conditions: tests on both sparse and dense datasets, with different batch sizes and different numbers of items in each transaction.

Results are given both in diagram and table form. The authors find that as size increases, so does execution time; therefore, they face a scalability issue, but so does their competing algorithm, which also often runs out of memory. The paper ends with indications on work to be done to solve this inconvenience and above all adapt this somehow still experimental solution to big data technologies like Apache Storm and Apache Spark in a production environment. This of course constitutes the most exciting part of the paper, which alone is a valid reason to read it.

Reviewer:  Andrea Paramithiotti Review #: CR145985 (1807-0392)
Bookmark and Share
  Featured Reviewer  
 
Data Mining (H.2.8 ... )
 
Would you recommend this review?
yes
no
Other reviews under "Data Mining": Date
Feature selection and effective classifiers
Deogun J. (ed), Choubey S., Raghavan V. (ed), Sever H. (ed) Journal of the American Society for Information Science 49(5): 423-434, 1998. Type: Article
May 1 1999
Rule induction with extension matrices
Wu X. (ed) Journal of the American Society for Information Science 49(5): 435-454, 1998. Type: Article
Jul 1 1998
Predictive data mining
Weiss S., Indurkhya N., Morgan Kaufmann Publishers Inc., San Francisco, CA, 1998. Type: Book (9781558604032)
Feb 1 1999
more...

E-Mail This Printer-Friendly
Send Your Comments
Contact Us
Reproduction in whole or in part without permission is prohibited.   Copyright 1999-2024 ThinkLoud®
Terms of Use
| Privacy Policy