Monday, 26 August 2013

Some of the Main Techniques For Data Mining

Data mining is the process of extracting relationships from large data sets. It is an area of Computer Science that has received significant commercial interest. In this article I will detail a few of the most common methods of data mining analysis.

Association rule discovery: Association rule discovery methods are used to extract associations from data sets. The technique was originally developed on supermarket purchase data. An association rule is a rule of the form X -> Y. An example might be "If a customer purchases milk, this implies (->) that the customer will also purchase bread". Each association rule has a support value and a confidence value. The support is the percentage of all entries (transactions, in this case) that contain every item in the rule: for example, the percentage of all transactions in which both milk and bread were purchased. The confidence is the percentage of the transactions satisfying the left-hand side of the rule that also satisfy the right-hand side: in this case, the percentage of transactions containing milk that also contain bread. Given a minimum support and a minimum confidence specified by the user, association discovery methods extract every association rule in the data set that meets both thresholds.
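To make the support and confidence definitions concrete, here is a minimal sketch in Python. The five transactions are invented toy data, and the function names are only illustrative; a real association rule miner would search many candidate rules rather than score a single one.

```python
# Hypothetical toy data: each set is one supermarket transaction.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread"},
    {"eggs"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    lhs_count = sum(1 for t in transactions if lhs <= t)
    both_count = sum(1 for t in transactions if (lhs | rhs) <= t)
    return both_count / lhs_count if lhs_count else 0.0

# Score the rule milk -> bread.
milk_bread_support = support({"milk", "bread"}, transactions)
milk_bread_confidence = confidence({"milk"}, {"bread"}, transactions)
print(milk_bread_support, milk_bread_confidence)
```

With this toy data, milk and bread appear together in 2 of 5 transactions (support of 40%), and 2 of the 3 transactions containing milk also contain bread (confidence of about 67%).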

Cluster Analysis: Cluster analysis is the process of taking one or more numerical fields and assigning each record to a cluster based on the values of those fields. These clusters represent groups of points that are close to each other. For example, if you watch a documentary on space, you will see that galaxies contain a lot of stars and planets. There are many galaxies in space, but the stars and planets are not randomly located; they are clumped together in the groups we call galaxies. A cluster analysis method is used to find these sorts of groups. If a cluster analysis method were applied to the stars in space, it might find that each galaxy is a cluster and assign the same cluster identification to every star in a given galaxy. This cluster identification then becomes another field in the data set and can be used in further data mining analysis. For example, you might use a cluster id field to form association rules with other fields in the data set.
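As a rough illustration, the sketch below uses k-means, one common clustering method (the article does not name a specific algorithm). The star coordinates, the choice of two clusters, and the starting centers are all assumptions made up for the example.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain k-means: assign each point to its nearest center, then move
    each center to the mean of its assigned points, and repeat."""
    centers = list(centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance.
        labels = [
            min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: mean of each cluster (an empty cluster keeps its center).
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers

# Hypothetical 2-D star positions forming two well-separated "galaxies".
stars = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

# Starting centers are picked by hand here so the result is repeatable.
cluster_ids, centers = kmeans(stars, centers=[stars[0], stars[3]])
print(cluster_ids)
```

The returned cluster_ids list holds one cluster identification per star, which is the extra field that could be added to the data set for further analysis.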

Decision Trees: Decision trees are used to form a tree of decisions from a data set to help predict a value. For example, if you were looking at a data set used to predict whether a potential loan applicant would be a credit risk, a tree of decisions would be formed based on factors in the data set. The tree may contain decisions such as whether the applicant has defaulted on a loan before, the age of the applicant, whether the applicant is employed, the applicant's income and the total repayments on the loan. You could then follow this tree of decisions to say, for example, that if an applicant has never defaulted on a loan before, is employed, has an income in the top 15 percent for the country, and is asking for a relatively low loan amount, then there is a very low risk of default.
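The sketch below shows how such a tree might be followed once it has been formed. The tree is written by hand rather than learned from data, and the field names, the 10,000 loan threshold, and the "high" and "medium" outcomes are hypothetical; only the "very low" path comes from the example above.

```python
def default_risk(applicant):
    """Walk a small hand-written decision tree and return a risk label."""
    # First decision: has the applicant defaulted before?
    if applicant["defaulted_before"]:
        return "high"  # assumed outcome, not from the article
    # Second decision: is the applicant employed?
    if not applicant["employed"]:
        return "high"  # assumed outcome, not from the article
    # Third decision: income in the top 15 percent and a relatively low loan.
    # The 10,000 cut-off for "relatively low" is a made-up threshold.
    if applicant["income_percentile"] >= 85 and applicant["loan_amount"] <= 10_000:
        return "very low"
    return "medium"  # assumed outcome for everything else

# Hypothetical applicant matching the article's example path.
applicant = {
    "defaulted_before": False,
    "employed": True,
    "income_percentile": 90,
    "loan_amount": 5_000,
}
risk = default_risk(applicant)
print(risk)
```

In practice, a decision tree method chooses the questions and thresholds from the data itself rather than having them written in by hand.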

These are some of the more common techniques among the many that are applied to analyzing large data sets. They have proved useful for gathering information and relationships from data that may otherwise be too large to interpret well.




Source: http://ezinearticles.com/?Some-of-the-Main-Techniques-For-Data-Mining&id=4210436
