
Modelling of data for Data mining

Modelling

Because of the novelty and abundance of techniques and algorithms available in the modelling phase, this is the most interesting part of the data mining process. We have therefore devoted a special section of the tutorial to a description of data mining modelling techniques. The important stages in the modelling phase are the following:

Selection of modelling technique

The choice of technique was first addressed earlier in the project, when the problem and the data mining goals were specified. At this stage, however, once the data has been prepared for modelling, we can still choose a more appropriate technique than the one specified at the start of the process.
When choosing among the numerous available data mining (DM) modelling techniques, one has to keep in mind the main task of the project and how it relates to the main divisions of DM tools by problem type. The first division is by the type of knowledge discovery task one wants to accomplish: prediction or description. Note that many DM modelling tools generate models that solve a prediction task and, at the same time, provide an informative description of the structure behind the data, which also serves as a solution to a descriptive task. Generally, the goals of prediction and description tasks are achieved by applying one of the primary data mining methods. The table below relates data mining problem types to appropriate modelling techniques.

Classification: Rule induction methods, Decision trees, Neural networks, K-nearest neighbors, Case based reasoning
Prediction: Regression analysis, Regression trees, Neural networks, K-nearest neighbors
Dependency analysis: Correlation analysis, Regression analysis, Association rules, Bayesian networks, Inductive logic programming
Data description and summarization: Statistical techniques, OLAP
Segmentation or clustering: Clustering techniques, Neural networks, Visualization methods

Generate test design

Before building a model, we need to define a procedure for testing the model's quality and validity. For example, in supervised data mining tasks such as classification, it is common to use error rates as quality measures for data mining models. We therefore typically separate the data into a training set and a test set, build the model on the training set, and estimate its quality on the separate test set.
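The split-then-estimate procedure described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data, the 70/30 split, and the helper names (`train_test_split`, `predict_1nn`, `error_rate`) are made up for the example, and a 1-nearest-neighbor classifier stands in for whatever modelling technique the project actually uses.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into (training, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, x):
    """Return the label of the training example closest to x
    (squared Euclidean distance)."""
    nearest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)),
    )
    return nearest[1]

def error_rate(train, test):
    """Fraction of test examples misclassified by the 1-NN model."""
    wrong = sum(1 for x, y in test if predict_1nn(train, x) != y)
    return wrong / len(test)

# Hypothetical toy data: two well-separated clusters labelled "a" and "b".
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "a") for _ in range(50)]
data += [((rng.gauss(10, 1), rng.gauss(10, 1)), "b") for _ in range(50)]

# The model only ever sees `train`; `test` is used solely for the estimate.
train, test = train_test_split(data, test_fraction=0.3, seed=1)
test_error = error_rate(train, test)
print(f"train size={len(train)}  test size={len(test)}  test error={test_error:.3f}")
```

The key design point is that the test examples play no part in building the model, so the error rate measured on them is an estimate of performance on unseen data rather than a measure of how well the model memorised its inputs.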

Building a model

Once the modelling tool (or tools) has been chosen, we run it on the prepared dataset, typically generating several different models. All modelling tools have a number of parameters that govern the model generation process. Choosing good parameter values for the problem at hand is an iterative process, and the choice has to be properly explained and supported by results. The resulting models should be properly interpreted and their performance explained.
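As a rough illustration of this iterative process, the sketch below builds one k-nearest-neighbors model per candidate value of the parameter k and records the error of each, so that the final choice is backed by results. The data, the candidate values, and the function names are assumptions made for the example. It also assumes a common practice that the text does not spell out: parameters are compared on a validation set carved out of the training data, so that the final test set stays untouched.

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k training examples nearest to x."""
    by_distance = sorted(train, key=lambda row: (row[0] - x) ** 2)
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

def error_rate(train, held_out, k):
    """Fraction of held-out examples misclassified by k-NN."""
    wrong = sum(1 for x, y in held_out if knn_predict(train, x, k) != y)
    return wrong / len(held_out)

# Hypothetical toy data: one numeric feature, two overlapping classes.
rng = random.Random(0)
data = [(rng.gauss(0, 1), 0) for _ in range(60)]
data += [(rng.gauss(2, 1), 1) for _ in range(60)]
rng.shuffle(data)
train, validation = data[:80], data[80:]

# One model per parameter value; odd k avoids ties between the two classes.
results = {k: error_rate(train, validation, k) for k in (1, 3, 5, 9, 15)}
best_k = min(results, key=results.get)

for k, err in results.items():
    print(f"k={k:2d}  validation error={err:.3f}")
print("selected k:", best_k)
```

Keeping the full `results` table, not just the winner, is what allows the parameter choice to be explained and supported afterwards.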

Model assessment

Once the models are generated, they are interpreted according to the existing domain knowledge and the data mining success criteria. Domain experts judge the results (models) within the domain context, while data miners apply data mining criteria (accuracy on the test set, lift or gain tables, etc.).
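To make the lift and gain criteria concrete, here is a minimal sketch of how such a table can be computed from model scores on a test set. The scores and labels are invented for illustration, and `lift_table` is a hypothetical helper, not a function from any particular tool. Cases are ranked by descending score and cut into equal-sized bins; cumulative gain is the share of all positives captured so far, and lift is that gain divided by the share of cases examined.

```python
def lift_table(scores, labels, n_bins=5):
    """Return rows of (bin, cases_examined, cumulative_gain, lift).

    scores: model scores, higher means more likely positive.
    labels: true outcomes, 1 for positive and 0 for negative.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    n = len(ranked)
    rows = []
    cum_pos = 0
    for b in range(1, n_bins + 1):
        start = (b - 1) * n // n_bins
        end = b * n // n_bins
        cum_pos += sum(label for _, label in ranked[start:end])
        gain = cum_pos / total_pos   # share of all positives captured so far
        lift = gain / (end / n)      # relative to selecting cases at random
        rows.append((b, end, gain, lift))
    return rows

# Hypothetical test-set scores and true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

table = lift_table(scores, labels, n_bins=5)
for b, examined, gain, lift in table:
    print(f"bin {b}: top {examined:2d} cases  gain={gain:.2f}  lift={lift:.2f}")
```

A lift well above 1 in the first bins means the model concentrates positives among its highest-scored cases; by construction the lift of the last bin is 1, since all cases have then been examined.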


