Knowledge discovery in databases (KDD) is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns or models in data (Fayyad et al., 1996). Data mining (DM) is a step in the knowledge discovery process consisting of particular data mining algorithms that, under some acceptable computational efficiency limitations, find patterns or models in data. This is the most cited of the numerous definitions of data mining and knowledge discovery. However, for many, data mining is a synonym for knowledge discovery, and in this tutorial we will largely adopt this view. An overview of the crucial steps of the standard DM process is given in the following image.
Practical data mining requires a lot more than applying sophisticated techniques like neural networks or decision trees to a table of data. This is why we have decided to use a practical, process-oriented view of data mining as the surface layer of this tutorial. From this layer, the reader will find links to explanations of individual terms and techniques. The techniques involved in data mining represent a blend of statistics, pattern recognition, and machine learning. We do not intend to give detailed descriptions of individual techniques here, but rather to explain where and why they should be used. However, we will suggest locations on the Web that have, in our opinion, interesting material on the particular topic.
It is difficult to write about the topic without using domain-specific terminology. To avoid lengthy explanations in the text, there is a separate Glossary section where data mining terms are explained in greater detail.
Based on practical, real-world experience, CRISP-DM (CRoss Industry Standard Process for Data Mining) has been defined by a consortium of companies that have applied data mining since its infancy. We will stick to that broad picture of the process, which is given below, as the surface layer of the tutorial.
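For readers who prefer code to diagrams, the CRISP-DM process can be sketched as a small state machine. This is only an illustrative sketch: the six phase names follow the published CRISP-DM reference model, but the names Phase and next_phase, and the single results_acceptable flag that sends the project from evaluation back to business understanding, are hypothetical simplifications introduced here. The real process allows more movement back and forth between phases than this shows.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The six phases of the CRISP-DM reference model, in nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current: Phase, results_acceptable: bool = True) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after deployment.

    Simplifying assumption: the only backward step modeled here is from
    EVALUATION to BUSINESS_UNDERSTANDING when the results do not meet
    the business goals.
    """
    if current is Phase.EVALUATION and not results_acceptable:
        return Phase.BUSINESS_UNDERSTANDING
    if current is Phase.DEPLOYMENT:
        return None
    return Phase(current.value + 1)


def walk(start: Phase = Phase.BUSINESS_UNDERSTANDING) -> list:
    """List the phases visited in one uninterrupted pass through the process."""
    visited = []
    phase: Optional[Phase] = start
    while phase is not None:
        visited.append(phase)
        phase = next_phase(phase)
    return visited
```

Calling walk() lists one straight pass through all six phases, while next_phase(Phase.EVALUATION, results_acceptable=False) shows the loop back to the start that makes the process iterative rather than linear.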