Nov 11, 2013

Overview of data mining

When people talk about data mining, sometimes they refer to the methods used in this field, such as machine learning.Sometimes they refer to specific data of interest, such as text mining or video mining. Many other times, people use the term "big data" simply referring to the infrastructure of data mining, such as Hadoop or Cassandra. 

Given all of these various references, newcomers of the data mining field feel like lost in a wonderland. 
Here we give an overview picture that connects all these components together. Essentially we can think the data mining field consists of 4 major layers. At the top layer is the basic methodology, such as machine learning or frequent pattern mining. The second layer is the application to various data types. For social networks, this is graph mining. For sensors and mobile data, this is stream data mining. The third layer is the infrastructure, where Hadoop, NoSQL and other environments are invented to support large data movement and retrieval. Finally, at the fourth layer, we are concerned creating a data mining team, understanding the people profiles that support all of the above operations. 

Given this four-layer separation, we can easily see how various discussions on data mining fall in the picture. For example, the hot topic of "deep learning" belongs to the machine learning layer. More specifically, it is part of Unsupervised Learning.  Another topic "Natural Language Processing" (NLP) is part of mining  text data. Not surprisingly,  this field uses machine learning extensively as its core methodology. 


  1. This comment has been removed by a blog administrator.

  2. Thank you for the informative post on the main components of data mining. It is important to understand these components in order to use the data productively. Another important component is to be sure to clean the data before mining to remove duplicates and such.

  3. What is Data Mining? What are some of the commercial DM tools? What do DM professionals do on a day to day basis, i.e project life cycle? How are queries constructed? Is it the same as SQL?
    Can I have the answers of all my questions here?

  4. Eczema es a menudo llamada Obat Eksim Gatal Di Dada eczema, o dermatitis es una inflamación severa obat kolesterol que causa la formación de pequeñas ampollas o burbujas obat kolesterol di apotik (vesículas) en la piel hasta que finalmente reventar y expulsar el líquido.