Structure of Data Mining modelling techniques

Data Mining (DM) modelling techniques come from a number of different research fields, such as machine learning, signal processing, evolutionary computing and statistics. This variety, together with the large number of different algorithms, can be confusing for potential users. A useful way to reduce the confusion is to stress that all DM modelling techniques share a similar structure. They can all be described by three main components:

  1. Model (or knowledge) representation

    • This is the functional form of the model(s) used by the algorithm. Formally, a model can be represented as a function y=f(x,P), where x represents the input (a set of attribute-value pairs) and P represents the specific parameters describing the particular model. For example, in the case of a Decision tree algorithm f is represented by a graph of nodes and edges, while for a Rule induction algorithm it is a particular set of rules in CNF or DNF. Important issues related to model representation are: the form of data that the model handles (continuous, discrete-integer, categorical, or all of these), the explanatory power of the representation, the function approximation capabilities (linear, nonlinear), and the form of the output of the model.


  2. Estimation criteria

    • Given a particular representation f, the estimation criterion evaluates how well a particular set of parameters P fits the data. This criterion is internal to the specific DM modelling technique and should not be confused with the evaluation measures used to assess already built models; those measures are treated in a separate section of this tutorial. The estimation criterion therefore scores the different instances of the representation f that are constructed during the search through the space of all possible instances. That search is carried out by the search method, which is the third basic component of DM modelling techniques and is explained next. Typical characteristics of an estimation criterion include: its sensitivity and robustness for a particular model as a function of sample size and the dimensionality of the problem; and its underlying assumptions (probabilistic, logical, independent sampling). The estimation criterion differs significantly from technique to technique; it is a consequence of the particular model representation and the applied search method.


  3. Search method

    • Given a representational form and an estimation criterion, the search method is the specific algorithm that governs the search through the space of all describable representations, guided by the estimation criterion. This basically means that, once the model representation and the estimation criterion are fixed, DM modelling techniques work like optimization algorithms. Search algorithms have some typical characteristics: the basic search methodology (greedy, exhaustive, heuristic, hill-climbing); the complexity of the search (whether it is only a parameter search or has an additional loop over model structures); and the control of the search (stopping criteria related to time and memory complexity).
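
The three components can be made concrete with a small sketch. The example below is only an illustration under stated assumptions, not a description of any particular DM technique: it assumes a linear model for y=f(x,P) with P = (slope, intercept), mean squared error as the estimation criterion, and a simple coordinate-wise hill-climbing procedure as the search method. The names `criterion` and `hill_climb`, the step sizes and the toy data are all hypothetical choices made for this example.

```python
# Illustrative sketch only: a linear model, squared error and
# hill-climbing are assumed here to show the three components.

# 1. Model representation: y = f(x, P), with P = (slope, intercept).
def f(x, P):
    slope, intercept = P
    return slope * x + intercept

# 2. Estimation criterion: how well a parameter set P fits the data
#    (mean squared error; lower is better).
def criterion(P, data):
    return sum((f(x, P) - y) ** 2 for x, y in data) / len(data)

# 3. Search method: hill-climbing through the parameter space,
#    guided by the estimation criterion.
def hill_climb(data, start=(0.0, 0.0), step=1.0, min_step=1e-6, max_iters=10000):
    best = start
    best_score = criterion(best, data)
    iters = 0
    # Control of the search: stop on a small step or an iteration budget.
    while step > min_step and iters < max_iters:
        iters += 1
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                candidate = list(best)
                candidate[i] += delta
                candidate = tuple(candidate)
                score = criterion(candidate, data)
                if score < best_score:
                    best, best_score, improved = candidate, score, True
        if not improved:
            step /= 2  # no better neighbour found: refine the step size
    return best, best_score

# Toy, noise-free data generated from y = 2x + 1 (hypothetical example).
data = [(x, 2 * x + 1) for x in range(5)]
best_P, best_score = hill_climb(data)
print("parameters:", best_P, "criterion:", best_score)
```

On this toy data the search should recover a slope close to 2 and an intercept close to 1. Replacing any one of the three functions (for example, a different form for `f`, absolute error in `criterion`, or an exhaustive grid search instead of `hill_climb`) gives a different technique with the same overall structure.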

The descriptions of the individual DM modelling techniques, which can be found through the links given in the DM Modelling techniques section, show how the properties of these three components differ from technique to technique.
