What is data mining?

Data mining is the process of extracting potential and valuable knowledge (models or rules) from large amounts of data.

1. What can data mining do?

1) Data mining can do the following six different things (analysis methods):

· Classification

· Estimation (valuation)

· Prediction

· Affinity grouping or association rules

· Clustering

· Description and visualization

2) Categories of data mining

The six analysis methods above fall into two categories: direct data mining and indirect data mining.

· Direct data mining

The goal is to use the available data to build a model that describes one specific variable (which can be thought of as an attribute of a database table, that is, a column) for the remaining data.

· Indirect data mining

No specific target variable is selected and described with a model; instead, relationships are established among all the variables.

· Classification, estimation, and prediction belong to direct data mining; the last three (affinity grouping or association rules, clustering, and description and visualization) belong to indirect data mining

3) Introduction to various analysis methods

· Classification

First, select a training set of already-classified records from the data and apply a classification technique to it to build a classification model. The model is then used to classify the records that have not yet been classified.

Example:

a. Credit card applicants, classified as low, medium, and high risk

b. Assign customers to predefined customer segments

Note: The number of classes is fixed and defined in advance.
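
As a minimal sketch of the process described above, the following Python example builds a classifier from a small pre-classified training set and then applies it to unclassified records. The applicant features (income and debt), the toy data, and the choice of a 1-nearest-neighbor rule are all illustrative assumptions; the text does not prescribe a particular classification technique.

```python
import math

# Hypothetical training set: (income, debt) in thousands -> predefined risk class.
TRAINING_SET = [
    ((20.0, 15.0), "high"),
    ((25.0, 18.0), "high"),
    ((50.0, 10.0), "medium"),
    ((55.0, 12.0), "medium"),
    ((90.0, 2.0), "low"),
    ((95.0, 5.0), "low"),
]


def classify(record, training_set=TRAINING_SET):
    """Return the class of the training record closest to `record`.

    This is a 1-nearest-neighbor rule, chosen only because it is short;
    any other classification technique could take its place.
    """
    _, nearest_label = min(
        training_set, key=lambda item: math.dist(item[0], record)
    )
    return nearest_label


if __name__ == "__main__":
    # Unclassified applicants are assigned to one of the predefined classes.
    print(classify((88.0, 4.0)))
    print(classify((22.0, 16.0)))
```

Note that the set of possible answers is exactly the set of labels present in the training set, which matches the point that the classes are fixed in advance.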

· Estimation

Estimation is similar to classification. The difference is that classification produces a discrete output, while estimation produces a continuous value; the number of classes is fixed in advance, whereas the estimated quantity is not limited to a fixed set of values.

Example:

a. Estimate the number of children in a family based on purchasing patterns

b. Estimate the income of a family based on purchasing patterns

c. Estimating the value of real estate

Generally speaking, estimation can serve as a preliminary step for classification. Given some input data, the value of an unknown continuous variable is obtained through estimation, and the record is then classified according to preset thresholds. For example, in the home loan business, a bank uses estimation to give each customer a score between 0 and 1, and then assigns a loan grade based on thresholds.
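
The two-step pattern described above (estimate a continuous score, then classify it by thresholds) might be sketched as follows. The scoring formula, the threshold values, and the grade names "A", "B", and "C" are hypothetical placeholders; a real lender would fit the estimator from historical data.

```python
def estimate_score(income, debt):
    """Estimate a continuous score in [0, 1].

    The formula (1 minus the debt-to-income ratio, clamped to [0, 1])
    is a made-up placeholder, not a real credit scoring model.
    """
    if income <= 0:
        return 0.0
    return max(0.0, min(1.0, 1.0 - debt / income))


# Hypothetical thresholds, listed from the highest minimum score down.
THRESHOLDS = [(0.8, "A"), (0.5, "B"), (0.0, "C")]


def loan_grade(score, thresholds=THRESHOLDS):
    """Map a continuous score to a discrete grade using preset thresholds."""
    for minimum, grade in thresholds:
        if score >= minimum:
            return grade
    # Scores below every threshold fall into the last (lowest) grade.
    return thresholds[-1][1]


if __name__ == "__main__":
    score = estimate_score(income=100.0, debt=40.0)
    print(score, loan_grade(score))
```

Keeping the estimator and the thresholds separate means the grade boundaries can be changed without rebuilding the scoring model.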

· Prediction

Usually, prediction works through classification or estimation: a model is derived through classification or estimation, and the model is then used to predict unknown variables. In this sense, prediction does not really need to be treated as a separate category.

The purpose of prediction is to forecast unknown variables in the future. Such a prediction takes time to verify: a certain amount of time must pass before its accuracy is known.

· Affinity grouping or association rules

Determine which things will happen together.

Example:

a. When supermarket customers buy A, they often also buy B, that is, A => B (association rule)

b. After a customer purchases A, they purchase B some time later (sequence analysis)
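
One common way to quantify a rule such as A => B is through its support and confidence. These two measures are standard in association rule mining but are not defined in the text above, and the shopping baskets below are made-up data, so treat this as an illustrative sketch only.

```python
def count_containing(baskets, items):
    """Number of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= set(basket))


def support(baskets, items):
    """Fraction of all baskets that contain every item in `items`."""
    if not baskets:
        return 0.0
    return count_containing(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent."""
    with_antecedent = count_containing(baskets, antecedent)
    if with_antecedent == 0:
        return 0.0
    both = count_containing(baskets, set(antecedent) | set(consequent))
    return both / with_antecedent


# Hypothetical supermarket baskets.
BASKETS = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "D"},
]

if __name__ == "__main__":
    print(support(BASKETS, ["A", "B"]))       # share of baskets with both A and B
    print(confidence(BASKETS, ["A"], ["B"]))  # share of A-baskets that also hold B
```

In this toy data, A and B appear together in 3 of 5 baskets, and 3 of the 4 baskets containing A also contain B. This sketch covers only the co-occurrence case (a); sequence analysis (b) would additionally need purchase timestamps.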

· Clustering

Clustering groups records so that similar records are placed in the same cluster. The difference between clustering and classification is that clustering does not rely on predefined classes and does not require a training set.

Examples:

a. The clustering of some specific symptoms may indicate a specific disease

b. Clusters of customers who rent different types of VCDs may indicate that the members belong to different subcultures

Clustering is often used as the first step in data mining. For example, for a question such as "Which type of promotion do customers respond to best?", it may be more effective to first cluster the entire customer base, grouping customers into their respective clusters, and then answer the question for each cluster separately.
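
To illustrate grouping records without predefined classes or a training set, here is a minimal sketch using k-means. The text does not name a clustering algorithm, so k-means is an assumption, as are the customer features and the toy data.

```python
import math


def k_means(points, k, iterations=20):
    """Group points into k clusters and return one cluster index per point.

    The initial centers are simply the first k points, which keeps this
    sketch deterministic; real implementations usually choose them randomly.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels


# Hypothetical customers: (rentals per month, average spend per rental).
CUSTOMERS = [(1, 10), (9, 40), (2, 12), (10, 42), (1, 11), (8, 38)]

if __name__ == "__main__":
    print(k_means(CUSTOMERS, k=2))
```

The returned indices carry no meaning by themselves; interpreting what each cluster represents is left to the analyst, which is why a follow-up analysis per cluster is often the next step.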

· Description and visualization

This is a way of presenting the results of data mining.

2. Business background of data mining

Data mining first requires the collection of a large amount of data in a business environment, and then requires the knowledge mined to be valuable.

For a business, value comes down to three things: reducing expenses, increasing revenue, or raising the stock price.

1) Data mining as a research tool

2) Data mining to improve process control

3) Data mining as a marketing tool

4) Data mining as a customer relationship management (CRM) tool

3. Technical background of data mining

1) Data mining technology has three main parts: algorithms and techniques; data; modeling capabilities

2) Data mining and machine learning

· Machine learning is a product of the development of computer science and artificial intelligence (AI)

· Machine learning is divided into two learning methods: self-organized learning (such as neural networks) and inducing rules from examples (such as decision trees)

· Origin of data mining

Data mining was proposed in the 1980s, when AI turned to practical applications after investment in AI research projects had failed. It is an emerging area of AI research oriented towards commercial applications. The term "data mining" was chosen to indicate that there is no technical overlap with the work of statisticians, actuaries, or economists who have long been engaged in predictive modeling.

3) Data mining and statistics

Statistics has also begun to support data mining. Statistics includes predictive algorithms (regression), sampling, experience-based design, and so on.

4) Data mining and decision support systems

· Data warehouse

· OLAP (online analytical processing), data marts, multidimensional databases

· Integration of decision support tools

Integrate the data warehouse, OLAP, and data mining to form an enterprise decision analysis environment.

4. Social background of data mining

Data mining and personal prediction: data mining claims to be able to predict customer behavior by analyzing historical data, but in fact customers themselves may not know what they will do next. The results of data mining are therefore not as mysterious as people imagine, and they cannot be completely correct.

Customer behavior is related to the social environment, so data mining itself is also affected by the social background. For example, the credit rating model for bank credit card customers has been very successful in the United States, but it may not be suitable for China.