
Web of Proceedings - Francis Academic Press

An Improved Sequential Pattern Mining Algorithm Based on Data Mining


DOI: 10.25236/icess.2019.236

Author(s)

Lili Wang

Corresponding Author

Lili Wang

Abstract

This paper improves the PrefixSpan algorithm and proposes the ISPM (Improved Sequential Pattern Mining) algorithm. ISPM greatly reduces the number of projected databases that must be constructed, which improves the efficiency of sequential pattern mining. The algorithm also introduces the concept of a sequential pattern value and reorders the mined sequential patterns by this value, so that the most important patterns can be identified. Experiments verify the efficiency of ISPM under different minimum support thresholds, different types of datasets, and different dataset sizes.

In practical applications with very large datasets, however, ISPM still runs into efficiency bottlenecks. We therefore propose a Map-Reduce version of ISPM. Using distributed processing, it splits a large mining task into multiple smaller tasks and mines sequential patterns on them in parallel across the nodes of a Hadoop cluster. Two experiments evaluate this algorithm: the first measures its speedup on Hadoop relative to a single machine, and the second tests its efficiency on datasets of different sizes. Together, the experiments indicate that the algorithm improves mining efficiency on large datasets.
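The abstract does not describe how ISPM reduces projected-database construction, so the following is only context: a minimal sketch of baseline PrefixSpan (not ISPM), simplified under the assumption that every sequence element is a single item rather than an itemset. The function name `prefixspan` and the counter `projections_built` are illustrative and do not come from the paper; the counter records how many projected databases the baseline builds, which is the cost ISPM is said to reduce.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Baseline PrefixSpan for sequences of single items (no itemsets).

    Illustrative sketch, not the paper's ISPM. Returns (patterns,
    projections_built): patterns maps each frequent sequential pattern
    (a tuple) to its support count; projections_built counts how many
    projected databases were constructed.
    """
    patterns = {}
    projections_built = 0

    def mine(prefix, projected):
        nonlocal projections_built
        # Support = number of suffixes (one per original sequence) containing the item.
        support = defaultdict(int)
        for suffix in projected:
            for item in set(suffix):
                support[item] += 1
        for item in sorted(support):
            count = support[item]
            if count < min_support:
                continue  # infrequent items cannot extend the prefix
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Build the projected database for new_prefix: keep the part of
            # each suffix after the first occurrence of item, dropping empties.
            new_projected = []
            for suffix in projected:
                if item in suffix:
                    rest = suffix[suffix.index(item) + 1:]
                    if rest:
                        new_projected.append(rest)
            projections_built += 1
            mine(new_prefix, new_projected)

    mine((), [tuple(s) for s in sequences])
    return patterns, projections_built


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    found, built = prefixspan(db, min_support=2)
    print(found)
    print("projected databases built:", built)
```

In this baseline every frequent prefix triggers one projected-database construction, so `projections_built` grows with the number of frequent patterns found; on the three-sequence example above it builds 5 projected databases for 5 frequent patterns.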

Keywords

Sequential pattern mining, Projected database, PrefixSpan, Large datasets, Map-Reduce