Research of the Data Mining Engine Based on Big Data

Mei Wan

DOI: 10.25236/ictmic.2020.013

Corresponding Author

Mei Wan

Abstract

With the rapid development of Internet technology, ever more data is being accumulated, and data volumes have grown from the gigabyte (GB) scale to the terabyte (TB) or even petabyte (PB) scale. To discover the potential value in this data, it is common practice to apply various data mining algorithms flexibly according to the actual situation. Data mining has been thoroughly exploited on traditional small data sets, where it has proven its value and practical significance. On large data sets, however, implementing data mining algorithms faces challenges in execution efficiency, algorithm parallelism, and platform usability. To address these problems, this paper studies a data mining engine for big data that uses Spark as its core engine. Based on Spark's in-memory computing operators, a number of traditional data mining algorithms are parallelized so that they can run in a cluster environment and can therefore be applied effectively to big data. Following a system layering method, the data mining system is then designed in layers to realize a complete big data mining platform.
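The abstract does not say which algorithms were parallelized or how. The sketch below is only an illustration of the general pattern, under stated assumptions. It uses k-means as a hypothetical stand-in for "a traditional data mining algorithm" and uses plain Python lists as stand-ins for Spark partitions. Each partition independently computes per-cluster partial sums and counts, which corresponds to the "map" side. The partial results are then merged pairwise, which corresponds to the "reduce" side. In PySpark, the same shape could be expressed with `RDD.mapPartitions` followed by `RDD.reduce`. All function names here (`partial_stats`, `merge`, `kmeans_step`, `kmeans`) are hypothetical.

```python
from functools import reduce
from typing import List, Tuple

Point = Tuple[float, float]
Stats = Tuple[List[List[float]], List[int]]  # per-cluster [sum_x, sum_y], counts


def closest(p: Point, centers: List[Point]) -> int:
    """Index of the center nearest to p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: (p[0] - centers[i][0]) ** 2 + (p[1] - centers[i][1]) ** 2)


def partial_stats(partition: List[Point], centers: List[Point]) -> Stats:
    """'Map' side: runs on one partition only, with no shared state,
    so on a cluster each partition could be handled by a different executor."""
    k = len(centers)
    sums = [[0.0, 0.0] for _ in range(k)]
    counts = [0] * k
    for p in partition:
        i = closest(p, centers)
        sums[i][0] += p[0]
        sums[i][1] += p[1]
        counts[i] += 1
    return sums, counts


def merge(a: Stats, b: Stats) -> Stats:
    """'Reduce' side: associative merge of two partial results."""
    sums = [[x[0] + y[0], x[1] + y[1]] for x, y in zip(a[0], b[0])]
    counts = [x + y for x, y in zip(a[1], b[1])]
    return sums, counts


def kmeans_step(partitions: List[List[Point]], centers: List[Point]) -> List[Point]:
    """One iteration: map over partitions, reduce the partial stats, recompute centers."""
    sums, counts = reduce(merge, (partial_stats(part, centers) for part in partitions))
    # Keep the old center if a cluster received no points.
    return [(s[0] / c, s[1] / c) if c else centers[i]
            for i, (s, c) in enumerate(zip(sums, counts))]


def kmeans(partitions: List[List[Point]], centers: List[Point],
           iterations: int = 10) -> List[Point]:
    for _ in range(iterations):
        centers = kmeans_step(partitions, centers)
    return centers


if __name__ == "__main__":
    # Three toy "partitions" of 2-D points.
    parts = [[(0.0, 0.0), (0.0, 1.0)],
             [(10.0, 10.0), (10.0, 11.0)],
             [(1.0, 0.0), (11.0, 10.0)]]
    print(kmeans(parts, [(0.0, 0.0), (10.0, 10.0)], iterations=5))
```

Because `merge` is associative and each call to `partial_stats` depends only on its own partition and the broadcast centers, the result does not depend on how the data is split. That property is what lets an iterative algorithm like this run in parallel across a cluster.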

Keywords

Big data, Data mining, Search engine