Data mining is the process of applying these methods to data with the intention of uncovering hidden patterns. It has been used for many years by businesses, governments and scientists to sift through volumes of data, such as airline passenger records, census data and supermarket scanner data, to produce market research reports.
An important reason for using data mining is to assist in analyzing collections of behavioral observations. Such data is vulnerable to collinearity because of unknown interrelations among the variables. An unavoidable fact of data mining is that the data being analyzed may not be representative of the whole domain, and therefore may not contain examples of critical behaviors and relationships that exist in other parts of the domain.
To overcome this problem, the analysis can be augmented with experiment-based and other approaches, such as choice modelling for human-generated data. In these situations, inherent correlations can be either controlled for or removed altogether during construction of the experimental design.
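One simple way to spot the collinearity mentioned above is to compute the correlation between pairs of predictor variables before modelling. The sketch below is only an illustration: the variable names, the sample values and the 0.9 cut-off are assumptions made for the example, not part of any standard.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical observations: two predictors that move almost in lockstep
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0]
store_visits = [101.0, 119.0, 152.0, 181.0, 198.0]

r = pearson(ad_spend, store_visits)
# The 0.9 cut-off is an illustrative assumption, not a fixed rule
collinear = abs(r) > 0.9
```

A pairwise check like this only catches collinearity between two variables at a time; relationships spread across three or more variables need other diagnostics.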
Data mining usually involves four categories of tasks:
Classification – Arranges data into predefined groups. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include decision trees, nearest neighbor, naive Bayes classification and neural networks.
Clustering – Like classification, but the groups are not predefined, so the algorithm tries to group similar items together.
Regression – Attempts to find a function that models the data with the least error.
Association rule learning – Searches for relationships between variables. For example, a supermarket might gather data about customers’ buying behavior. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes called market basket analysis.
Now let us look at some examples of where data mining is used in the real world.
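Before turning to those examples, the market basket idea described above can be made concrete with a small sketch. It counts how often pairs of products appear together and keeps rules whose support and confidence exceed chosen thresholds. The baskets, the function name `pair_rules` and both thresholds are hypothetical, and the code is restricted to pairs of items; it is not a full association rule miner such as Apriori.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of products per customer visit
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) rules for pairs of items."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Share of baskets with lhs that also contain rhs
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

With these sample baskets, a rule such as "milk -> butter" survives because every basket that contains milk also contains butter, which is the kind of pattern a retailer could use when planning promotions.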
In human genetics research, an important goal is to understand the mapping between inter-individual variation in human DNA sequences and variability in disease susceptibility. Simply put, the aim is to find out how changes in an individual's DNA sequence affect the risk of developing diseases such as cancer. This is very important for improving the diagnosis, prevention and treatment of these diseases. The data mining technique used for this task is known as multifactor dimensionality reduction.
In the field of electrical power engineering, data mining techniques are widely used for condition monitoring of high voltage electrical equipment. The purpose of condition monitoring is to obtain valuable information on the health of the equipment's insulation. Data clustering techniques such as the self-organizing map (SOM) have been applied to the vibration monitoring and analysis of transformer on-load tap changers (OLTC).
Using vibration monitoring, it can be observed that each tap change operation generates a signal that contains information about the condition of the tap changer contacts and the drive mechanisms. Naturally, different tap positions generate different signals. However, there is considerable variability among normal-condition signals for exactly the same tap position. SOM has been used to detect abnormal conditions and to estimate the nature of the abnormalities.
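As a rough illustration of how a SOM can flag abnormal signals, the sketch below trains a tiny one-dimensional map on feature vectors from normal operations. It then treats a new vector as abnormal when its distance to the nearest map node (the quantization error) is unusually large. Everything specific here is an assumption made for the example: the two features, the sample values, the map size, the decay schedule and the threshold factor of 2. It is not the method used in the tap changer studies.

```python
import random
from math import dist, exp

def train_som(samples, n_nodes=4, epochs=50, seed=0):
    """Train a 1-D self-organizing map; returns the node weight vectors."""
    rng = random.Random(seed)
    # Initialise each node from a randomly chosen training sample
    nodes = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1 - frac)                          # decaying learning rate
        radius = max(0.5, (n_nodes / 2) * (1 - frac))  # shrinking neighbourhood
        for x in rng.sample(samples, len(samples)):
            # Best matching unit: the node closest to the sample
            bmu = min(range(n_nodes), key=lambda i: dist(nodes[i], x))
            for i, w in enumerate(nodes):
                # Nodes near the BMU on the map are pulled harder
                h = exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return nodes

def quantization_error(nodes, x):
    """Distance from x to its nearest map node."""
    return min(dist(w, x) for w in nodes)

# Hypothetical (peak amplitude, burst duration) features of normal tap changes
normal = [(1.00, 0.50), (1.10, 0.52), (0.90, 0.48), (1.05, 0.55),
          (0.95, 0.45), (1.00, 0.53), (1.08, 0.47), (0.92, 0.51)]

nodes = train_som(normal)
# Illustrative threshold: twice the worst error seen on normal data
threshold = 2 * max(quantization_error(nodes, x) for x in normal)

def is_abnormal(x):
    return quantization_error(nodes, x) > threshold

print(is_abnormal((2.0, 1.5)))  # a signal far from anything seen in training
```

Training only on normal signals lets the map absorb the natural variability between operations, so that only vectors well outside that range exceed the threshold.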